From 2afb3b1bd0cc5bb61a3d12f80d3221b75b028876 Mon Sep 17 00:00:00 2001
From: Jay Vucic <jay.vucic@gmail.com>
Date: Tue, 26 May 2026 10:47:42 -0700
Subject: [PATCH 1/3] feat(skills): add NIST 800-61r3 reviewer suite and
 project-manager skill

Adds 6 NIST 800-61r3 IR analysis skills (after-action reviewer, CSF mapper,
gap analyzer, maturity scorer, policy reviewer, recommendation auditor),
accompanying agent definition, and project-manager OS dashboard skill.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .gitignore                                    |   2 +
 agents/nist-800-61r3-ir-reviewer.md           | 266 ++++++++++++++++
 .../SKILL.md                                  | 269 ++++++++++++++++
 skills/nist-800-61r3-csf-mapper/SKILL.md      | 171 ++++++++++
 skills/nist-800-61r3-gap-analyzer/SKILL.md    | 218 +++++++++++++
 skills/nist-800-61r3-maturity-scorer/SKILL.md | 299 ++++++++++++++++++
 skills/nist-800-61r3-policy-reviewer/SKILL.md | 248 +++++++++++++++
 .../SKILL.md                                  | 270 ++++++++++++++++
 skills/project-manager/README.md              |  27 ++
 skills/project-manager/SKILL.md               |  88 ++++++
 10 files changed, 1858 insertions(+)
 create mode 100644 agents/nist-800-61r3-ir-reviewer.md
 create mode 100644 skills/nist-800-61r3-after-action-reviewer/SKILL.md
 create mode 100644 skills/nist-800-61r3-csf-mapper/SKILL.md
 create mode 100644 skills/nist-800-61r3-gap-analyzer/SKILL.md
 create mode 100644 skills/nist-800-61r3-maturity-scorer/SKILL.md
 create mode 100644 skills/nist-800-61r3-policy-reviewer/SKILL.md
 create mode 100644 skills/nist-800-61r3-recommendation-auditor/SKILL.md
 create mode 100644 skills/project-manager/README.md
 create mode 100644 skills/project-manager/SKILL.md

diff --git a/.gitignore b/.gitignore
index e43b0f9..4c312bf 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,3 @@
 .DS_Store
+
+.claude/worktrees/
diff --git a/agents/nist-800-61r3-ir-reviewer.md b/agents/nist-800-61r3-ir-reviewer.md
new file mode 100644
index 0000000..f0992c1
--- /dev/null
+++ b/agents/nist-800-61r3-ir-reviewer.md
@@ -0,0 +1,266 @@
+---
+name: nist-800-61r3-ir-reviewer
+description: "Composite IR reviewer agent grounded in NIST SP 800-61r3 (April 2025). Accepts any cybersecurity document — IR plan, playbook, incident report, after-action report, IR policy, risk assessment — classifies it, runs the appropriate nist-800-61r3-* skill sequence, and synthesizes a full NIST IR Evaluation Report with CSF coverage map, priority gap list, R-item audit, maturity score, and high-ROI improvement roadmap. Use when asked to 'review this against NIST 800-61r3', 'NIST IR evaluation', 'full NIST review', or 'evaluate our IR program'."
+model: opus
+color: blue
+---
+
+You are a senior cybersecurity incident response reviewer with deep expertise in NIST SP 800-61r3 (April 2025) — *Incident Response Recommendations and Considerations for Cybersecurity Risk Management: A CSF 2.0 Community Profile*. You have read every page of the standard and know every CSF element, recommendation, consideration, and note it contains.
+
+Your role is to evaluate cybersecurity documents against the SP 800-61r3 Community Profile and produce actionable, citation-precise evaluation reports. You are methodical, not bureaucratic. You cite specific CSF IDs and R/C/N item numbers. You distinguish between R-items ("should do") and C-items ("should consider") and never conflate them. You call out real gaps and you acknowledge what's working.
+
+---
+
+## Your Knowledge Base
+
+**Document structure:** SP 800-61r3 is organized as a CSF 2.0 Community Profile in two tables:
+- **Table 2** — Preparation & Lessons Learned: covers GV (Govern), ID (Identify), PR (Protect)
+- **Table 3** — Incident Response: covers DE (Detect), RS (Respond), RC (Recover) — all elements are High priority
+
+**New in r3 vs r2:** The previous four-phase model (Preparation → Detection & Analysis → Containment/Eradication/Recovery → Post-Incident Activity) has been superseded by the CSF 2.0 six-function model. Never use r2 phase language unless explicitly mapping legacy documents.
+
+**Priority system:**
+- **High** — Core incident response activity for most organizations (all Table 3)
+- **Medium** — Directly supports incident response (selected Table 2)
+- **Low** — Indirectly supports incident response (remaining Table 2)
+
+**Annotation types:**
+- **R** = Recommendation ("the organization should do this") — highest obligation
+- **C** = Consideration ("the organization should consider doing this") — lower obligation
+- **N** = Note (informational only) — not scored
+
+---
+
+## Skill Sequence
+
+You orchestrate these skills in sequence based on document type:
+
+```
+ALL DOCUMENTS:
+  1. nist-800-61r3-csf-mapper      → Coverage map (CSF element → Addressed/Partial/Not Found)
+  2. nist-800-61r3-gap-analyzer    → Priority gap list (High/Medium/Low)
+  3. nist-800-61r3-recommendation-auditor → R-item audit (Met/Partial/Not Met)
+  4. nist-800-61r3-maturity-scorer → Weighted score + maturity level
+
+TYPE-SPECIFIC (add to sequence):
+  IR Policy / IR Plan → nist-800-61r3-policy-reviewer (§2.3 element check)
+  After-Action / Lessons Learned → nist-800-61r3-after-action-reviewer
+  Document spans both types → run both additional skills
+```
+
+---
+
+## Workflow
+
+### Phase 1: Document Intake
+
+When a user submits a document for review:
+
+1. **Acknowledge** the document and confirm you received it
+2. **Classify** the document type:
+   - IR Plan
+   - Playbook (technical or operational)
+   - Incident Report (active or closed)
+   - After-Action Report / Post-Incident Report / Lessons Learned
+   - IR Policy
+   - Risk Assessment
+   - Combined/Multiple documents (whole-program assessment)
+   - Unknown — ask for clarification
+3. **Confirm scope**: Full review (all functions) or targeted (specific CSF function)?
+4. **Note document metadata** if present: version, date, author, organization type
+
+### Phase 2: CSF Mapping
+
+Run `nist-800-61r3-csf-mapper`:
+- Produce the coverage table grouped by Function (DE → RS → RC first, then GV → ID → PR)
+- Identify the coverage profile expected for the document type
+- Flag immediately if Table 3 (Incident Response) elements are largely absent — this is a critical finding regardless of document type
+
+### Phase 3: Gap Analysis
+
+Run `nist-800-61r3-gap-analyzer` using the coverage map:
+- Produce the prioritized gap list (Critical → Significant → Minor → Medium → Low)
+- For each Critical gap, note which other elements it blocks or degrades
+- Identify the single highest-impact gap fix
+
+### Phase 4: Recommendation Audit
+
+Run `nist-800-61r3-recommendation-auditor` scoped to document type:
+- Audit all in-scope R-items
+- Separate Table 3 R-items (all High) from Table 2 R-items
+- Compute R-item compliance percentage per function
+- Flag Top 3 Not Met R-items by business/legal impact
+
+### Phase 5: Type-Specific Review (if applicable)
+
+**For IR Policy / IR Plan:** Run `nist-800-61r3-policy-reviewer`
+- Check all 8 §2.3 required policy elements
+- Check GV.RR-02 role/authority designations
+- Check ID.IM-04 plan maintenance framework
+
+**For After-Action Report / Lessons Learned:** Run `nist-800-61r3-after-action-reviewer`
+- Check RC.RP-06 completeness (incident + response + lessons)
+- Check RS.AN-03 root cause depth
+- Check lessons actionability score
+
+### Phase 6: Maturity Scoring
+
+Run `nist-800-61r3-maturity-scorer`:
+- Apply weighted scoring model (High=3, Medium=2, Low=1)
+- Compute per-function scores
+- Assign maturity level (1–5)
+- Identify top 5 high-ROI improvement actions with score uplift
+
+### Phase 7: Synthesize NIST IR Evaluation Report
+
+Produce the final report (format below).
+
+---
+
+## NIST IR Evaluation Report Format
+
+```
+╔══════════════════════════════════════════════════════════════════╗
+║              NIST SP 800-61r3 IR EVALUATION REPORT              ║
+╚══════════════════════════════════════════════════════════════════╝
+
+Document:       [name, version, date if available]
+Document Type:  [classified type]
+Organization:   [if known]
+Standard:       NIST SP 800-61r3 (April 2025), CSF 2.0 Community Profile
+Reviewed by:    nist-800-61r3-ir-reviewer
+Review Date:    [date]
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+MATURITY SCORE & LEVEL
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+Overall Score: XX%    Maturity Level: [1–5] — [Name]
+
+Function Heatmap:
+  DE (Detect)   [bar]  XX%
+  RS (Respond)  [bar]  XX%
+  RC (Recover)  [bar]  XX%
+  GV (Govern)   [bar]  XX%
+  ID (Identify) [bar]  XX%
+  PR (Protect)  [bar]  XX%
+
+Table 3 (IR readiness):  XX%
+Table 2 (Preparation):   XX%
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+CRITICAL GAPS  (must address before next incident)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+[C1] [CSF ID] — [Short title]
+     Status: Missing / Partial
+     Requirement: "[verbatim R-item text]" ([CSF ID.Rx])
+     Impact: [why this matters operationally]
+     Remediation: [specific, practical action]
+
+[C2] ...
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+SIGNIFICANT GAPS  (address within 30 days)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+[S1] ...
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+RECOMMENDATION AUDIT SUMMARY
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+High-priority R-items (Table 3): Met: X  Partial: X  Not Met: X
+Medium-priority R-items (Table 2): Met: X  Partial: X  Not Met: X
+R-item compliance: XX% (High), XX% (Medium)
+
+Top 3 unmet R-items by impact:
+  1. [CSF ID.Rx] — [description] — [business/legal impact]
+  2. ...
+  3. ...
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+[TYPE-SPECIFIC SECTION — if applicable]
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+For IR Policy: §2.3 POLICY ELEMENTS CHECKLIST
+For AAR:       AFTER-ACTION COMPLETENESS REVIEW
+
+[checklist output from respective skill]
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+HIGH-ROI IMPROVEMENT ROADMAP
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+Rank  Element     Score Impact  Action
+────  ──────────  ────────────  ──────────────────────────────────
+ 1    [ID]        +X.X%         [specific action]
+ 2    ...
+ ...
+
+Completing all 5 actions would raise score to approximately XX% (Level X).
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+WHAT'S WORKING
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+[3–5 specific strengths with CSF IDs]
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+MEDIUM & LOW PRIORITY GAPS  (address in next program review cycle)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+[condensed list, no full descriptions needed]
+```
+
+---
+
+## Behavioral Rules
+
+**Citation precision:**
+- Always cite the specific CSF ID (e.g., RS.MA-02, not just "Respond")
+- Always cite the specific R/C/N item number (e.g., RS.MA-02.R1, not just "RS.MA-02")
+- Quote the verbatim requirement text for every Critical and Significant gap
+
+**Tone:**
+- Direct and practical — write for IR practitioners, not compliance officers
+- Acknowledge what works; don't lead with only deficiencies
+- Distinguish between a weak document and one that simply omits topics outside its expected scope
+
+**Scope discipline:**
+- Only evaluate elements relevant to the document type — don't flag an incident report for missing IR policy elements
+- Explicitly state scope assumptions at the top of the report
+- If a document is ambiguous in type, classify conservatively and note the assumption
+
+**r3 vs r2:**
+- This agent evaluates against SP 800-61r3 (April 2025) exclusively
+- If a document references r2 phase language, note it and map to r3 equivalents using Table 1
+- Do not penalize for r2 terminology if the underlying requirement is addressed
+
+**Escalation signals:**
+- If the document has essentially zero Table 3 coverage, pause after Phase 2 and ask whether the user intended to submit a different document — zero IR coverage on an "IR Plan" is a red flag
+- If the document appears to be from a federal agency context, note that GV.OC-03.R1 (regulatory notification requirements) has additional FISMA implications
+
+---
+
+## Quick Reference: Trigger → Skill Mapping
+
+| User says... | This agent does... |
+|---|---|
+| "Review this [any doc] against NIST" | Full evaluation (all phases) |
+| "Quick check against 800-61r3" | Phases 1–4 only (classify + mapper + gaps + R-audit), skip scoring |
+| "Score our IR program" | Phase 1 + 6 only (classify + score) |
+| "Is this playbook NIST-compliant?" | Phases 1–4 and 6 (mapper + gaps + R-audit + score), Table 3 focus |
+| "Review this policy" | Phases 1–4 + policy-reviewer |
+| "Review this AAR" | Phases 1–4 + after-action-reviewer |
+| "What's our maturity level?" | Phase 6 only if prior assessment provided; else full run |
+
+---
+
+## NEVER
+
+- **NEVER invent a CSF ID** — only use IDs that appear in SP 800-61r3's actual tables
+- **NEVER skip the "What's Working" section** — finding only gaps is incomplete and demoralizing; acknowledge strengths
+- **NEVER use SP 800-61r2 phase names (Preparation / Detection & Analysis / Containment)** in findings — use CSF 2.0 Function names
+- **NEVER produce a maturity score without the per-function breakdown** — the overall score alone is meaningless
+- **NEVER mark a score as "NIST compliant"** — SP 800-61r3 is a recommendations document, not a compliance standard; use "aligned" or "covers X% of SP 800-61r3 recommendations"
diff --git a/skills/nist-800-61r3-after-action-reviewer/SKILL.md b/skills/nist-800-61r3-after-action-reviewer/SKILL.md
new file mode 100644
index 0000000..1b03491
--- /dev/null
+++ b/skills/nist-800-61r3-after-action-reviewer/SKILL.md
@@ -0,0 +1,269 @@
+---
+name: nist-800-61r3-after-action-reviewer
+description: "Evaluate post-incident reports, after-action reports (AARs), and lessons-learned documents against NIST SP 800-61r3's post-incident activity requirements — specifically RS.AN-03, RS.AN-06/07/08, RC.RP-06, and ID.IM-03. Checks root cause completeness, evidence integrity, magnitude validation, stakeholder communications, recovery verification, and whether lessons are actionable. Use when asked to 'review this after-action report', 'evaluate our lessons learned', or 'post-incident review check'."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 After-Action Reviewer
+
+Evaluate post-incident reports, after-action reports (AARs), and lessons-learned documents against NIST SP 800-61r3 (April 2025) post-incident requirements.
+
+---
+
+## Goal
+
+SP 800-61r3 makes specific recommendations for what an after-action report should contain and how the lessons-learned process feeds back into IR program improvement. This skill checks that an AAR captures everything the standard recommends: not just "what happened" but "how we know," "what we missed," and "what we're changing."
+
+---
+
+## Inputs
+
+- The after-action report, post-incident report, or lessons-learned document (text or file path)
+- Incident type (data breach, ransomware, account takeover, DoS, insider, etc.) — affects scope
+- Was this a major incident? (affects whether senior leadership update requirements apply)
+
+---
+
+## Workflow
+
+```
+1. Check RC.RP-06 — After-Action Report Completeness
+   ↓
+2. Check RS.AN-03 — Root Cause Analysis Quality
+   ↓
+3. Check RS.AN-06/07/08 — Evidence & Records Integrity
+   ↓
+4. Check RS.CO-02/03 — Communication Retrospective
+   ↓
+5. Check RC.RP-04/05 — Recovery Verification
+   ↓
+6. Check ID.IM-03 — Lessons-Learned Loop Closure
+   ↓
+7. Generate AAR Review Report
+```
+
+### Step 1: RC.RP-06 — After-Action Report Completeness
+
+**RC.RP-06.R1** states: "Prepare an after-action report that documents the incident itself, the response and recovery actions taken, and lessons learned."
+
+Check that the AAR contains all three mandated components:
+
+**Component A — The Incident**
+- [ ] Incident type and classification (per RS.MA-03 categorization)
+- [ ] Initial detection method and timestamp (links to DE.AE)
+- [ ] Timeline of significant events from first indicator to recovery completion
+- [ ] Systems, data, and services affected
+- [ ] Estimated or confirmed impact (financial, operational, reputational, regulatory)
+- [ ] Threat actor characterization if known (TTPs, attribution confidence)
+
+**Component B — Response and Recovery Actions**
+- [ ] Triage and validation actions (RS.MA-02)
+- [ ] Incident management decisions and rationale (RS.MA-03, RS.MA-04)
+- [ ] Containment measures applied (RS.MI-01)
+- [ ] Eradication actions taken (RS.MI-02)
+- [ ] Recovery steps executed in order (RC.RP-02, RC.RP-04)
+- [ ] Recovery completion declaration (RC.RP-06)
+- [ ] Notifications sent (RS.CO-02, RS.CO-03) with timestamps and recipients
+
+**Component C — Lessons Learned**
+- [ ] What worked well
+- [ ] What did not work or was delayed
+- [ ] Root cause(s) — not just proximate cause but systemic (see Step 2)
+- [ ] Specific, assignable improvement actions (not vague "improve monitoring")
+- [ ] Owner and deadline for each improvement action
+- [ ] Which CSF Function each improvement targets
+
+### Step 2: RS.AN-03 — Root Cause Analysis Quality
+
+SP 800-61r3 RS.AN-03 lists four recommended analysis activities:
+
+**RS.AN-03.R1** — "Determine the sequence of events that have occurred during the incident and which assets and resources were involved in each of those events."
+- Check: Is there a complete, timestamped event sequence?
+- Check: Are affected assets enumerated at each stage?
+- Gap signal: Report says "attacker gained access" without explaining the full chain
+
+**RS.AN-03.R2** — "Determine what vulnerabilities, threats, and threat actors were directly or indirectly involved in the incident."
+- Check: Is the specific vulnerability or misconfiguration that enabled the incident identified?
+- Check: Are indirect enablers noted (e.g., missing MFA, stale accounts, excessive permissions)?
+- Gap signal: Report blames "phishing" without identifying which controls failed or were absent
+
+**RS.AN-03.R3** — "Analyze the incident to find the underlying or systemic root causes."
+- Check: Does the AAR go beyond the proximate cause (the exploit/vector) to systemic causes?
+- Systemic cause examples: patch management failure, training gap, monitoring blind spot, inadequate access controls, architectural weakness
+- Technique check: Was a 5-Whys or fishbone analysis performed?
+- Gap signal: Report identifies root cause as "an employee clicked a phishing link" — this is a proximate cause, not systemic
+
+**RS.AN-03.R4** — "Check any deployed cyber deception technology for additional information on attacker behavior."
+- Check: If honeypots, deception tokens, or canary files were deployed — were they checked?
+- Mark N/A if no deception technology is deployed
+
+### Step 3: RS.AN-06/07/08 — Evidence & Records Integrity
+
+**RS.AN-06.R1** — "Safeguard the confidentiality and integrity of incident response records; ensure only authorized personnel have access."
+- Check: Does the AAR reference how IR records were protected?
+- Check: Is there a statement about who had access to incident data?
+- Note: The incident lead is responsible for records safeguarding
+
+**RS.AN-07.R1** — "Collect and retain evidence per evidence preservation procedures and data retention policies; consider factors including the possibility of prosecution."
+- Check: Were forensic images, logs, or artifacts collected and documented?
+- Check: Is there reference to chain of custody procedures?
+- Check: Was prosecution possibility considered and documented?
+- Check: Are retention timelines for collected evidence noted?
+
+**RS.AN-08.R1** — "Look for indicators of compromise, evidence of persistence, and other signs of an incident on both the assets known to be targeted and other potential targets."
+- Check: Was the incident scope validated beyond the initially identified systems?
+- Check: Were adjacent/related assets checked for IoCs?
+- Check: Was persistence mechanism identified and confirmed removed?
+- Gap signal: Report addresses only initially identified host without checking lateral spread
+
+### Step 4: RS.CO-02/03 — Communication Retrospective
+
+Review whether required notifications were actually performed:
+
+**RS.CO-02.R2** — "Follow established procedures concerning incident coordination: what must be reported, to whom, and at what times."
+- Check: Were all required internal notifications made (leadership, legal, HR, asset owners)?
+- Check: Were notification timestamps documented?
+
+**RS.CO-02.R3** — "Perform notifications in compliance with incident notification laws/regulations."
+- Check: Were regulatory/legal notification obligations triggered?
+- Check: If triggered, were they met within required timeframes?
+- Check: Is there documentation of the notification decision (even if decided not required)?
+
+**RS.CO-02.R5** — "Notify law enforcement and regulatory bodies per criteria in IR plan."
+- Check: Was law enforcement notification considered and decision documented?
+
+**RS.CO-03.R2** — "Regularly update senior leadership on status of major incidents."
+- Check (major incidents only): Were senior leadership updates provided during the incident?
+- Check: Is there a log of leadership communications?
+
+### Step 5: RC.RP-04/05 — Recovery Verification
+
+**RC.RP-04.R1** — "Validate that essential services are restored in the appropriate order."
+- Check: Does the AAR document the restoration sequence?
+- Check: Were critical services restored before non-critical?
+
+**RC.RP-04.R2** — "Work with system owners to confirm successful restoration and return to normal operations."
+- Check: Did system owners sign off on recovery?
+- Check: Is there documentation of owner confirmations?
+
+**RC.RP-04.R3** — "Monitor performance of restored systems to verify adequacy of restoration."
+- Check: Was there a post-recovery monitoring period?
+- Check: Were any anomalies detected post-restoration?
+
+**RC.RP-05.R1** — "Check restored assets for IoCs; remediate root causes before production use."
+- Check: Were restored systems scanned for IoCs before going back online?
+- Check: Were root causes remediated before production restoration?
+
+**RC.RP-05.R2** — "Verify correctness and adequacy of restoration actions before putting restored system online."
+- Check: Was there a pre-production verification step?
+
+### Step 6: ID.IM-03 — Lessons-Learned Loop Closure
+
+**ID.IM-03.N3** — "Creating follow-up reports or holding 'lessons learned' meetings when an incident's recovery efforts are concluding, especially if the incident was major."
+
+**ID.IM-03.N2** — "Improvements that affect IR can be made to the IR program itself (plan, policy, procedures) or to other aspects of cybersecurity risk management (e.g., identifying TTPs not currently blocked by safeguards or flagged by detection technologies)."
+
+Check:
+- [ ] Was a lessons-learned meeting held? (attendees, date documented?)
+- [ ] Do improvement actions target IR program changes (not just technical fixes)?
+- [ ] Do any lessons feed back to Preparation (GV, ID, PR functions)?
+- [ ] Are improvement actions SMART (Specific, Measurable, Assignable, Realistic, Time-bound)?
+- [ ] Is there a tracking mechanism for improvement actions?
+
+**Lessons-captured vs. lessons-actionable assessment:**
+- Count total lessons identified
+- Count lessons with assigned owner AND deadline
+- Ratio = actionability score
+
+---
+
+## Output Format
+
+```
+=== NIST SP 800-61r3 After-Action Review ===
+Document: [AAR name/incident reference]
+Incident type: [ransomware / data breach / etc.]
+Severity: [major / significant / minor]
+Standard: NIST SP 800-61r3 (April 2025)
+
+━━━ RC.RP-06 — AFTER-ACTION REPORT COMPLETENESS ━━━
+
+Component A — Incident Documentation
+[✓] Incident type and classification: Present (§1)
+[✓] Detection method and timestamp: Present (§2.1 — DE alert at 14:23 UTC)
+[~] Event timeline: Partial — gaps between 16:00–19:00 UTC unexplained
+[✗] Threat actor characterization: Missing — vector identified but no TTP analysis
+Gap: RC.RP-06.R1 requires documenting the incident itself; threat actor info
+needed for ID.IM-03 improvement loop.
+
+Component B — Response and Recovery Actions
+[✓] Triage and validation: Present (§3)
+[✓] Containment measures: Present (§4.1)
+[✗] Recovery steps: Missing — report notes systems restored but no sequence
+documented. RC.RP-04.R1 requires restoration order validation.
+
+Component C — Lessons Learned
+[✓] What worked: Present (§7)
+[✗] Actionable improvements: 3 lessons identified, 0 have assigned owners or
+deadlines. The ID.IM-03 check (Step 6) expects SMART improvement actions.
+
+━━━ RS.AN-03 — ROOT CAUSE ANALYSIS ━━━
+
+[✓] R1 — Event sequence: Complete timeline from initial access to containment
+[~] R2 — Vulnerability identification: Phishing vector identified; no analysis
+of why MFA bypass was possible or why the phishing email bypassed email filtering
+[✗] R3 — Systemic root cause: Report identifies "employee clicked phishing link"
+as root cause — this is proximate, not systemic. No 5-Whys or structural analysis.
+Gap: Missing analysis of: why MFA was not enforced on this account class,
+why email filtering did not catch this variant, whether this reflects a pattern.
+[N/A] R4 — No deception technology deployed per §1.3
+
+━━━ RS.AN-06/07/08 — EVIDENCE & RECORDS ━━━
+[✓] RS.AN-06.R1 — IR records access control referenced (§6.2)
+[~] RS.AN-07.R1 — Evidence collected but no retention timeline documented
+[✗] RS.AN-08.R1 — Scope validation missing: no evidence adjacent systems
+were checked for lateral movement or IoC presence
+
+━━━ RS.CO COMMUNICATIONS RETROSPECTIVE ━━━
+[✓] Internal notifications made with timestamps (§5.1)
+[~] RS.CO-02.R3 — Regulatory notification decision not documented (HIPAA
+applicability not assessed in AAR)
+[✓] RS.CO-03.R2 — Senior leadership received 3 updates during incident
+
+━━━ RC.RP RECOVERY VERIFICATION ━━━
+[~] RC.RP-04.R2 — System owner confirmations: verbal only, not documented
+[✗] RC.RP-05.R1 — No pre-production IoC scan of restored systems documented
+
+━━━ ID.IM-03 — LESSONS LEARNED LOOP ━━━
+Lessons identified: 3
+Lessons with owner + deadline: 0
+Actionability score: 0% — all lessons are observations, none are assigned actions
+
+━━━ SUMMARY ━━━
+Critical gaps: RS.AN-03.R3 (systemic root cause), RC.RP-05.R1 (IoC scan),
+              ID.IM-03 actionability (no assigned owners)
+Significant gaps: RS.AN-08.R1 (scope validation), RC.RP-04.R2 (owner sign-off)
+Assessment: AAR captures what happened but will not improve the IR program.
+The absence of systemic root cause analysis means the same incident is likely
+to recur. The 0% actionability on lessons learned means no improvements will
+be tracked or implemented.
+```
+
+---
+
+## Deliverable
+
+A section-by-section AAR review with citation-anchored findings, a lessons-actionability score, and specific improvement recommendations. Suitable for:
+- IR program quality assurance
+- Compliance audit support
+- AAR revision guidance
+- Input to `nist-800-61r3-maturity-scorer`
+
+---
+
+## NEVER
+
+- **NEVER accept proximate cause as root cause** — "user clicked a link" is never a root cause; the systemic failure enabling that click is
+- **NEVER mark lessons learned as sufficient without owner + deadline** — unassigned lessons are observations, not improvements
+- **NEVER skip the scope validation check (RS.AN-08)** — unvalidated scope is a top cause of incidents recurring on adjacent systems
+- **NEVER omit the communications retrospective** — notification failures have legal and regulatory consequences that surface later
diff --git a/skills/nist-800-61r3-csf-mapper/SKILL.md b/skills/nist-800-61r3-csf-mapper/SKILL.md
new file mode 100644
index 0000000..113c743
--- /dev/null
+++ b/skills/nist-800-61r3-csf-mapper/SKILL.md
@@ -0,0 +1,171 @@
+---
+name: nist-800-61r3-csf-mapper
+description: "Map content from any cybersecurity document (playbook, IR plan, policy, incident report) to NIST SP 800-61r3 CSF 2.0 elements. Produces a coverage table showing which Functions, Categories, and Subcategories are addressed, partially addressed, or absent. Use when asked to 'map this to NIST', 'what CSF elements does this cover', or 'tag this against 800-61r3'."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 CSF Mapper
+
+Map any cybersecurity document to the CSF 2.0 Community Profile defined in NIST SP 800-61r3 (April 2025).
+
+---
+
+## Goal
+
+Produce a structured coverage table that shows which CSF 2.0 elements from the SP 800-61r3 Community Profile are:
+- **Addressed** — content explicitly satisfies the element
+- **Partial** — content touches the element but incompletely
+- **Not Found** — element is absent from the document
+
+---
+
+## Inputs
+
+- The document to analyze (paste text, provide file path, or describe it)
+- Document type (ask if not obvious): IR Plan, Playbook, Incident Report, After-Action Report, IR Policy, Risk Assessment, Other
+
+---
+
+## Workflow
+
+```
+1. Classify Document
+   ↓
+2. Determine Relevant CSF Scope
+   ↓
+3. Scan & Tag Content
+   ↓
+4. Build Coverage Table
+   ↓
+5. Output Summary
+```
+
+### Step 1: Classify Document
+
+Identify the document type before mapping — different types have different expected coverage profiles:
+
+| Document Type        | Primary CSF Scope Expected |
+|----------------------|---------------------------|
+| IR Plan              | GV.PO, GV.RR, ID.IM-04, RS (all), RC (all) |
+| Playbook             | RS.MA, RS.AN, RS.MI, DE.AE-08, RC.RP |
+| Incident Report      | RS.MA, RS.AN, RS.CO, RS.MI, RC.RP |
+| After-Action Report  | RS.AN-03, RS.AN-06/07/08, RC.RP-06, ID.IM-03 |
+| IR Policy            | GV.PO, GV.RR, ID.IM-04 (§2.3 elements) |
+| Risk Assessment      | ID.RA (all), GV.RM, ID.AM |
+
+### Step 2: Determine Relevant CSF Scope
+
+SP 800-61r3 defines two tables:
+
+**Table 2 — Preparation & Lessons Learned** (GV, ID, PR):
+- GV: Govern (GV.OC, GV.RM, GV.RR, GV.PO, GV.OV, GV.SC)
+- ID: Identify (ID.AM, ID.RA, ID.IM)
+- PR: Protect (PR.AA, PR.AT, PR.DS, PR.PS, PR.IR)
+
+**Table 3 — Incident Response** (DE, RS, RC) — all elements are High priority:
+- DE: Detect (DE.CM, DE.AE)
+- RS: Respond (RS.MA, RS.AN, RS.CO, RS.MI)
+- RC: Recover (RC.RP, RC.CO)
+
+For most documents, focus first on Table 3 elements (all High priority), then Table 2.
+
+### Step 3: Scan & Tag Content
+
+Read the document and for each meaningful section/paragraph:
+1. Identify which CSF Subcategory it most closely satisfies
+2. Assess completeness: does it fully satisfy the element or only partially?
+3. Note the source location (section number, paragraph, or quote)
+
+**Key semantic anchors** — phrases that signal specific CSF elements:
+
+| If document mentions...                        | Maps to          |
+|------------------------------------------------|------------------|
+| Continuous monitoring, anomaly detection       | DE.CM            |
+| Incident declaration criteria, threshold       | DE.AE-08         |
+| Triage, severity assessment, validate          | RS.MA-02         |
+| Incident categorization, incident type         | RS.MA-03         |
+| Escalation, elevation, resource increase       | RS.MA-04         |
+| Recovery initiation criteria                   | RS.MA-05         |
+| Root cause analysis, event sequence            | RS.AN-03         |
+| Evidence collection, chain of custody          | RS.AN-07         |
+| Incident magnitude, IoC scope                  | RS.AN-08         |
+| Stakeholder notification, breach notification  | RS.CO-02         |
+| Information sharing, ISAC, threat intel share  | RS.CO-03         |
+| Containment, isolate, quarantine               | RS.MI-01         |
+| Eradication, persistence removal, patch        | RS.MI-02         |
+| Recovery plan execution, restore operations    | RC.RP-01/02      |
+| Backup integrity, clean restore verification   | RC.RP-03/05      |
+| After-action report, lessons learned           | RC.RP-06, ID.IM-03 |
+| IR policy, policy elements                     | GV.PO, ID.IM-04  |
+| Roles, responsibilities, authority             | GV.RR-02         |
+
+### Step 4: Build Coverage Table
+
+Output a table with these columns:
+
+| CSF ID | Priority | Description (brief) | Coverage | Evidence/Location |
+|--------|----------|---------------------|----------|-------------------|
+
+Coverage values: `Addressed` / `Partial` / `Not Found`
+
+Group by Function: DE → RS → RC (Table 3 first), then GV → ID → PR (Table 2).
+
+Only include elements that are relevant to the document type (skip Low-priority Table 2 elements unless the document explicitly covers them).
+
+### Step 5: Output Summary
+
+After the table, provide:
+1. **Coverage counts**: X/Y High-priority elements addressed
+2. **Top gaps**: The 3–5 most significant missing elements by priority
+3. **Document type assessment**: Does the coverage match what's expected for this document type?
+
+---
+
+## Output Format
+
+```
+=== CSF 2.0 Coverage Map — [Document Name/Type] ===
+Standard: NIST SP 800-61r3 (April 2025)
+Document Type: [classified type]
+
+TABLE 3 — INCIDENT RESPONSE (All High Priority)
+────────────────────────────────────────────────────────────────────────
+CSF ID      | Description                              | Coverage    | Location
+────────────────────────────────────────────────────────────────────────
+DE.CM       | Continuous monitoring of assets          | Partial     | §2.1
+DE.CM-01    | Network monitoring                       | Addressed   | §2.1.a
+DE.CM-09    | Computing HW/SW/runtime monitoring       | Not Found   | —
+DE.AE       | Adverse event analysis                   | Partial     | —
+DE.AE-02    | Events analyzed to understand activity   | Addressed   | §3.2
+DE.AE-08    | Incident declaration criteria            | Not Found   | —
+RS.MA       | Incident management                      | Partial     | §4
+RS.MA-01    | IR plan executed on incident declaration | Addressed   | §4.1
+RS.MA-02    | Incidents triaged and validated          | Partial     | §4.2 (no severity criteria)
+...
+
+TABLE 2 — PREPARATION & LESSONS LEARNED (Mixed Priority)
+[only elements relevant to document type]
+...
+
+SUMMARY
+────────
+High-priority (Table 3) coverage: 9/23 elements addressed, 5 partial, 9 not found
+Medium-priority coverage: 4/12 addressed
+Top gaps: DE.AE-08 (incident criteria), RS.MA-02 (triage), RS.AN-03 (root cause), RS.CO-02 (notifications), RC.RP-06 (after-action)
+Assessment: Coverage is below expected for an IR Plan — RS.CO and RC sections are largely absent.
+```
+
+---
+
+## Deliverable
+
+A coverage table + summary that can be handed directly to `nist-800-61r3-gap-analyzer` for prioritized gap analysis or to `nist-800-61r3-maturity-scorer` for scoring.
+
+---
+
+## NEVER
+
+- **NEVER invent coverage** — only mark Addressed if the document explicitly addresses the element
+- **NEVER skip Table 3** — all DE/RS/RC elements are High priority; always include them
+- **NEVER map to SP 800-61r2 lifecycle phases** — r2 has no element IDs; this skill maps to r3/CSF 2.0 IDs only (no legacy "Preparation → Detection & Analysis" phases)
+- **NEVER map vague language to specific elements** — "we monitor our systems" alone does not satisfy DE.CM-01 through DE.CM-09 individually
diff --git a/skills/nist-800-61r3-gap-analyzer/SKILL.md b/skills/nist-800-61r3-gap-analyzer/SKILL.md
new file mode 100644
index 0000000..1756cbb
--- /dev/null
+++ b/skills/nist-800-61r3-gap-analyzer/SKILL.md
@@ -0,0 +1,218 @@
+---
+name: nist-800-61r3-gap-analyzer
+description: "Identify prioritized gaps in any cybersecurity document against the NIST SP 800-61r3 CSF 2.0 Community Profile. Accepts raw document content or a coverage map from nist-800-61r3-csf-mapper and outputs a priority-ranked gap list (High/Medium/Low) with specific SP 800-61r3 citation anchors. Use when asked to 'find gaps in this IR plan', 'what's missing from this playbook', or 'gap analysis against 800-61r3'."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 Gap Analyzer
+
+Identify and prioritize gaps in any cybersecurity document against the CSF 2.0 Community Profile defined in NIST SP 800-61r3 (April 2025).
+
+---
+
+## Goal
+
+Produce a prioritized gap list — sorted High → Medium → Low — for every CSF 2.0 element from SP 800-61r3 that is missing or incomplete in the reviewed document. Each gap includes the SP 800-61r3 citation, the specific requirement not met, and a remediation suggestion.
+
+---
+
+## Inputs
+
+- The document to analyze (text, file path, or description), OR
+- A coverage map previously produced by `nist-800-61r3-csf-mapper`
+- Document type (if not already classified)
+
+---
+
+## Workflow
+
+```
+1. Obtain Coverage Map (run mapper if not already done)
+   ↓
+2. Load SP 800-61r3 Element Baseline
+   ↓
+3. Identify Gaps
+   ↓
+4. Classify & Prioritize
+   ↓
+5. Generate Gap Report
+```
+
+### Step 1: Obtain Coverage Map
+
+If a coverage map exists from `nist-800-61r3-csf-mapper`, use it directly. Otherwise, perform inline mapping against the document before gap analysis.
+
+### Step 2: SP 800-61r3 Element Baseline
+
+The complete element set from SP 800-61r3, organized by priority:
+
+**HIGH PRIORITY — Table 3 (Incident Response)**
+
+| CSF ID     | Description |
+|------------|-------------|
+| DE         | Detect Function |
+| DE.CM      | Continuous Monitoring |
+| DE.CM-01   | Networks/network services monitored |
+| DE.CM-02   | Physical environment monitored |
+| DE.CM-03   | Personnel activity and technology usage monitored |
+| DE.CM-06   | External service provider activities monitored |
+| DE.CM-09   | Computing HW/SW/runtime/data monitored |
+| DE.AE      | Adverse Event Analysis |
+| DE.AE-02   | Events analyzed to understand associated activities |
+| DE.AE-03   | Information correlated from multiple sources |
+| DE.AE-04   | Estimated impact and scope of adverse events understood |
+| DE.AE-06   | Information on adverse events provided to authorized staff/tools |
+| DE.AE-07   | CTI and contextual information integrated into analysis |
+| DE.AE-08   | Incidents declared when adverse events meet incident criteria |
+| RS         | Respond Function |
+| RS.MA      | Incident Management |
+| RS.MA-01   | IR plan executed on incident declaration |
+| RS.MA-02   | Incident reports triaged and validated |
+| RS.MA-03   | Incidents categorized and prioritized |
+| RS.MA-04   | Incidents escalated or elevated as needed |
+| RS.MA-05   | Recovery initiation criteria applied |
+| RS.AN      | Incident Analysis |
+| RS.AN-03   | Root cause and event sequence established |
+| RS.AN-06   | Investigation actions recorded with integrity |
+| RS.AN-07   | Incident data/metadata collected with integrity preserved |
+| RS.AN-08   | Incident magnitude estimated and validated |
+| RS.CO      | Incident Reporting and Communication |
+| RS.CO-02   | Internal and external stakeholders notified |
+| RS.CO-03   | Information shared with designated stakeholders |
+| RS.MI      | Incident Mitigation |
+| RS.MI-01   | Incidents contained |
+| RS.MI-02   | Incidents eradicated |
+| RC         | Recover Function |
+| RC.RP      | Incident Recovery Plan Execution |
+| RC.RP-01   | Recovery portion of IR plan executed |
+| RC.RP-02   | Recovery actions selected, scoped, prioritized, performed |
+| RC.RP-03   | Backup/restoration asset integrity verified |
+| RC.RP-04   | Critical mission functions considered post-incident |
+| RC.RP-05   | Restored asset integrity verified, normal status confirmed |
+| RC.RP-06   | End of incident recovery declared, documentation completed |
+| RC.CO      | Incident Recovery Communication |
+| RC.CO-03   | Recovery activities/progress communicated to stakeholders |
+| RC.CO-04   | Public updates shared using approved methods/messaging |
+
+**MEDIUM PRIORITY — Table 2 (selected high-value elements)**
+
+| CSF ID     | Description |
+|------------|-------------|
+| GV.OC-03   | Legal, regulatory, contractual requirements understood |
+| GV.RR      | Roles, responsibilities, authorities established |
+| GV.RR-02   | Roles/responsibilities documented and enforced |
+| GV.PO      | Organizational cybersecurity policy established |
+| GV.OV-01   | Risk management strategy outcomes reviewed |
+| GV.SC-08   | Relevant suppliers included in IR planning/response |
+| ID.AM-01   | HW inventories maintained |
+| ID.AM-02   | SW/services inventories maintained |
+| ID.RA-02   | CTI received from sharing forums/sources |
+| ID.RA-05   | Threats/vulns/likelihoods used to inform risk response |
+| ID.RA-06   | Risk responses chosen, tracked, communicated |
+| ID.IM-01   | Improvements identified from evaluations |
+| ID.IM-02   | Improvements from security tests and exercises |
+| ID.IM-03   | Improvements from operational execution |
+| ID.IM-04   | IR plans established, maintained, improved |
+| PR.DS-11   | Backups created, protected, maintained, tested |
+| PR.PS-04   | Log records generated for continuous monitoring |
+
+**LOW PRIORITY — Table 2 (informational)**
+
+All remaining GV, ID, PR elements not listed above.
+
+### Step 3: Identify Gaps
+
+For each element in the baseline, determine its status in the coverage map:
+- `Not Found` → Gap
+- `Partial` → Partial Gap (flag as gap with lower severity modifier)
+- `Addressed` → No gap
+
+### Step 4: Classify & Prioritize
+
+For each gap, assign:
+
+**Severity** (for High-priority elements):
+- **Critical**: Element is absent, or so incomplete that incident handling is directly impaired (RS.MA-02, DE.AE-08, RS.AN-03, RS.MI-01, RC.RP-06)
+- **Significant**: Any other High-priority element that is absent, including one with a workaround in place
+- **Minor**: Partial coverage that needs strengthening but does not block incident handling
+
+**Gap type**:
+- **Missing**: No evidence of the element
+- **Incomplete**: Element addressed but key sub-requirements absent
+
+### Step 5: Generate Gap Report
+
+---
+
+## Output Format
+
+```
+=== NIST SP 800-61r3 Gap Analysis Report ===
+Document: [name/type]
+Standard: NIST SP 800-61r3 (April 2025)
+Analysis date: [date]
+
+━━━ CRITICAL GAPS (High Priority — Table 3) ━━━
+
+[C1] DE.AE-08 — Incident Declaration Criteria
+     Status: Missing
+     Requirement: "Apply incident criteria to known and assumed characteristics
+     of analyzed activity, and consider known false positives to determine
+     whether an incident should be declared." (DE.AE-08.R1)
+     Impact: Without defined declaration criteria, incident response may start
+     too late or on false positives — undermining RS.MA-02 triage.
+     Remediation: Define specific observable thresholds (e.g., confirmed C2
+     beacon, lateral movement detected, data staged for exfil) that constitute
+     an incident declaration trigger.
+
+[C2] RS.MA-02 — Incident Triage and Validation
+     Status: Partial (severity categories present, no urgency criteria)
+     Requirement: "Perform a preliminary review of a new incident report to
+     verify that a cybersecurity incident has occurred, then estimate the
+     severity and urgency needed to respond to it." (RS.MA-02.R1)
+     Impact: Severity tiers exist but no time-based urgency criteria — teams
+     cannot determine response SLAs.
+     Remediation: Add urgency criteria alongside severity (e.g., P1 = respond
+     within 1hr, P2 = 4hr, P3 = 24hr).
+
+...
+
+━━━ SIGNIFICANT GAPS (High Priority) ━━━
+
+[S1] RS.CO-02 — Stakeholder Notification
+     ...
+
+━━━ MEDIUM PRIORITY GAPS (Table 2) ━━━
+
+[M1] ID.IM-04 — IR Plans Maintained and Improved
+     ...
+
+━━━ SUMMARY ━━━
+
+Critical gaps:    3  (must address before next incident)
+Significant gaps: 5  (address within 30 days)
+Minor gaps:       4  (address in next program review cycle)
+Medium gaps:      6
+Low gaps:         2
+
+Highest-impact single fix: Adding DE.AE-08 incident declaration criteria would
+immediately improve RS.MA-02, RS.MA-03, and RS.MA-04 effectiveness.
+```
+
+---
+
+## Deliverable
+
+A prioritized, citation-anchored gap report suitable for:
+- Direct input to `nist-800-61r3-maturity-scorer`
+- Action items for IR program improvement backlog
+- Audit findings documentation
+
+---
+
+## NEVER
+
+- **NEVER mark a gap as Critical unless it is in Table 3 AND directly impacts incident handling**
+- **NEVER suggest remediation that adds bureaucracy** — all suggestions should be practical, not compliance theater
+- **NEVER conflate Missing with Partial** — partial coverage is progress; acknowledge what exists
+- **NEVER omit citation anchors** — every finding must cite the specific CSF ID and R/C/N item from SP 800-61r3
diff --git a/skills/nist-800-61r3-maturity-scorer/SKILL.md b/skills/nist-800-61r3-maturity-scorer/SKILL.md
new file mode 100644
index 0000000..e59c492
--- /dev/null
+++ b/skills/nist-800-61r3-maturity-scorer/SKILL.md
@@ -0,0 +1,299 @@
+---
+name: nist-800-61r3-maturity-scorer
+description: "Score an organization's IR program maturity against the NIST SP 800-61r3 CSF 2.0 Community Profile using a weighted scoring model tied to element priority (High/Medium/Low). Produces a per-function heatmap, weighted overall score, maturity level (1-5), and the highest-leverage improvement actions. Use when asked to 'score this against NIST', 'maturity assessment', 'how mature is our IR program', or 'NIST compliance score'."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 Maturity Scorer
+
+Score an organization's IR program maturity against the complete CSF 2.0 Community Profile from NIST SP 800-61r3 (April 2025).
+
+---
+
+## Goal
+
+Produce a quantitative maturity score grounded in SP 800-61r3's own priority system. High-priority elements (all Table 3 elements) have the most weight; Medium-priority elements (Table 2 selected) have moderate weight; Low-priority elements contribute minimally. Output a score, maturity level, per-function heatmap, and the top improvement actions by ROI.
+
+---
+
+## Inputs
+
+- Coverage data from any of:
+  - Raw document (will map inline)
+  - Coverage map from `nist-800-61r3-csf-mapper`
+  - Gap report from `nist-800-61r3-gap-analyzer`
+  - Recommendation audit from `nist-800-61r3-recommendation-auditor`
+  - Combined inputs from multiple documents (whole-program assessment)
+- Assessment mode: **Document** (single doc) or **Program** (entire IR program across multiple docs)
+
+---
+
+## Workflow
+
+```
+1. Determine Element Coverage
+   ↓
+2. Apply Weighted Scoring Model
+   ↓
+3. Load Element Registry with Weights
+   ↓
+4. Compute Per-Function and Overall Scores
+   ↓
+5. Assign Maturity Level
+   ↓
+6. Identify High-ROI Improvements
+   ↓
+7. Generate Maturity Report
+```
+
+### Step 1: Determine Element Coverage
+
+For each SP 800-61r3 element, assign a coverage score:
+- **1.0** — Fully addressed (evidence is explicit and complete)
+- **0.5** — Partially addressed (element present but key sub-requirements missing)
+- **0.0** — Not found or absent
+
+If working from a raw document, perform inline mapping before scoring.
+
+### Step 2: Weighted Scoring Model
+
+SP 800-61r3's own priority system drives the weights:
+
+| Priority | Source | Weight |
+|----------|--------|--------|
+| High     | Table 3 (all DE/RS/RC elements) | 3 |
+| Medium   | Table 2 selected GV/ID/PR elements | 2 |
+| Low      | Table 2 remaining GV/ID/PR elements | 1 |
+
+**Score formula per function:**
+```
+Function_score = Σ(element_coverage × element_weight) / Σ(element_weight)
+```
+
+**Overall weighted score:**
+```
+Overall = Σ(all element_coverage × element_weight) / Σ(all element_weight)
+```
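+
+As a minimal sketch of the two formulas above (the function name `weighted_score` and the sample coverage data are illustrative assumptions, not part of SP 800-61r3), the same ratio can be applied to one function's elements or to the whole registry:
+
+```python
+def weighted_score(elements):
+    """Return sum(coverage * weight) / sum(weight) for (coverage, weight) pairs."""
+    total_weight = sum(weight for _, weight in elements)
+    if total_weight == 0:
+        return 0.0  # no elements in scope, so there is nothing to score
+    return sum(coverage * weight for coverage, weight in elements) / total_weight
+
+
+# Hypothetical coverage data: one (coverage, weight) pair per element.
+detect = [(1.0, 3), (0.5, 3), (0.0, 3)]  # three High-priority elements
+govern = [(1.0, 2), (0.0, 1)]            # one Medium and one Low element
+
+detect_score = weighted_score(detect)            # 4.5 / 9
+overall_score = weighted_score(detect + govern)  # 6.5 / 12
+```
+
+Pooling every element into one list, rather than averaging the per-function scores, keeps the overall figure weighted by element priority as the formula requires.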
+
+### Step 3: SP 800-61r3 Element Registry with Weights
+
+**HIGH WEIGHT (3) — Table 3 Elements**
+
+| CSF ID | Weight | Description |
+|--------|--------|-------------|
+| DE.CM | 3 | Continuous Monitoring (parent) |
+| DE.CM-01 | 3 | Network monitoring |
+| DE.CM-02 | 3 | Physical environment monitoring |
+| DE.CM-03 | 3 | Personnel activity monitoring |
+| DE.CM-06 | 3 | External service provider monitoring |
+| DE.CM-09 | 3 | Computing HW/SW/runtime monitoring |
+| DE.AE | 3 | Adverse Event Analysis (parent) |
+| DE.AE-02 | 3 | Events analyzed |
+| DE.AE-03 | 3 | Information correlated from multiple sources |
+| DE.AE-04 | 3 | Impact and scope estimated |
+| DE.AE-06 | 3 | Info provided to authorized staff/tools |
+| DE.AE-07 | 3 | CTI integrated into analysis |
+| DE.AE-08 | 3 | Incident declaration criteria |
+| RS.MA | 3 | Incident Management (parent) |
+| RS.MA-01 | 3 | IR plan executed on declaration |
+| RS.MA-02 | 3 | Incidents triaged and validated |
+| RS.MA-03 | 3 | Incidents categorized and prioritized |
+| RS.MA-04 | 3 | Incidents escalated/elevated |
+| RS.MA-05 | 3 | Recovery initiation criteria applied |
+| RS.AN | 3 | Incident Analysis (parent) |
+| RS.AN-03 | 3 | Root cause established |
+| RS.AN-06 | 3 | Investigation actions recorded with integrity |
+| RS.AN-07 | 3 | Incident data collected with integrity |
+| RS.AN-08 | 3 | Incident magnitude estimated |
+| RS.CO | 3 | Incident Communication (parent) |
+| RS.CO-02 | 3 | Stakeholders notified |
+| RS.CO-03 | 3 | Information shared with stakeholders |
+| RS.MI | 3 | Incident Mitigation (parent) |
+| RS.MI-01 | 3 | Incidents contained |
+| RS.MI-02 | 3 | Incidents eradicated |
+| RC.RP | 3 | Recovery Plan Execution (parent) |
+| RC.RP-01 | 3 | Recovery plan executed |
+| RC.RP-02 | 3 | Recovery actions performed |
+| RC.RP-03 | 3 | Backup integrity verified |
+| RC.RP-04 | 3 | Critical functions considered post-incident |
+| RC.RP-05 | 3 | Restored asset integrity verified |
+| RC.RP-06 | 3 | Recovery declared, documentation completed |
+| RC.CO | 3 | Recovery Communication (parent) |
+| RC.CO-03 | 3 | Recovery progress communicated |
+| RC.CO-04 | 3 | Public updates shared |
+
+**MEDIUM WEIGHT (2) — Table 2 Selected**
+
+| CSF ID | Weight |
+|--------|--------|
+| GV.RR | 2 |
+| GV.RR-02 | 2 |
+| GV.PO | 2 |
+| GV.OC-03 | 2 |
+| GV.SC-08 | 2 |
+| GV.OV-01 | 2 |
+| ID.RA-02 | 2 |
+| ID.RA-05 | 2 |
+| ID.RA-06 | 2 |
+| ID.AM-01 | 2 |
+| ID.AM-02 | 2 |
+| ID.IM-01 | 2 |
+| ID.IM-02 | 2 |
+| ID.IM-03 | 2 |
+| ID.IM-04 | 2 |
+| PR.DS-11 | 2 |
+| PR.PS-04 | 2 |
+
+**LOW WEIGHT (1) — Table 2 Remaining**
+
+All remaining GV, ID, PR elements not listed above (GV.OC-01/02/04/05, GV.RM-01–07, GV.RR-01/03/04, GV.PO-01/02, GV.OV-02/03, GV.SC-01–07/09/10, ID.AM-03/04/05/07/08, ID.RA-01/03/04/07/08/09/10, and the PR.AA, PR.AT, PR.DS, PR.PS, PR.IR elements other than PR.DS-11 and PR.PS-04)
+
+### Step 4: Per-Function Scores
+
+Compute a score for each of the 6 CSF Functions independently, using only the elements belonging to that function.
+
+Convert to percentage and render a visual bar:
+```
+DE (Detect)  [████████░░]  80%
+RS (Respond) [██████░░░░]  60%
+RC (Recover) [████████░░]  80%
+GV (Govern)  [██████░░░░]  55%
+ID (Identify)[████░░░░░░]  45%
+PR (Protect) [████░░░░░░]  40%
+```
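+
+One way to produce such a bar is sketched below. The helper name `render_bar` and its integer-percent input are assumptions made for illustration. Python's built-in `round` rounds halves to the nearest even integer, which happens to reproduce the bars shown above for 55% and 45%.
+
+```python
+def render_bar(percent, width=10):
+    """Render an integer percentage (0-100) as a fixed-width bar."""
+    filled = round(percent * width / 100)  # halves round to the even neighbor
+    filled = max(0, min(width, filled))    # clamp out-of-range input
+    return "[" + "█" * filled + "░" * (width - filled) + "]"
+
+
+# Hypothetical per-function percentages.
+for label, percent in [("DE (Detect)", 80), ("GV (Govern)", 55)]:
+    print(f"{label:<13}{render_bar(percent)}  {percent}%")
+```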
+
+### Step 5: Maturity Level Assignment
+
+Map the overall weighted score to a maturity level:
+
+| Score | Level | Name | Characterization |
+|-------|-------|------|-----------------|
+| 0–29% | 1 | Initial | No consistent IR practices; ad hoc and reactive |
+| 30–49% | 2 | Developing | Basic IR exists but is undocumented or inconsistently applied |
+| 50–69% | 3 | Defined | Documented IR program exists; key processes established |
+| 70–84% | 4 | Managed | Measured IR program; performance tracked; continuous improvement in place |
+| 85–100% | 5 | Optimizing | IR integrated into all cybersecurity risk management; lessons feed back continuously |
+
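+These five levels are this skill's own scale. They correspond only loosely to the four CSF 2.0 Implementation Tiers (Partial, Risk Informed, Repeatable, Adaptive) and to the improvement concepts in ID.IM; a maturity level is not a CSF Tier.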
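+
+The band lookup can be sketched as follows. The names `MATURITY_LEVELS` and `maturity_level` are hypothetical, and treating each band's lower bound as inclusive (so a fractional score such as 29.9% stays at Level 1) is an assumption, since the table lists whole percentages only. The sketch covers the score bands alone and does not perform the ID.IM verification that Levels 4 and 5 also require.
+
+```python
+# (lower bound in percent, level, name), highest band first.
+MATURITY_LEVELS = [
+    (85, 5, "Optimizing"),
+    (70, 4, "Managed"),
+    (50, 3, "Defined"),
+    (30, 2, "Developing"),
+    (0, 1, "Initial"),
+]
+
+
+def maturity_level(score_percent):
+    """Map an overall score (0-100) to a (level, name) pair."""
+    if not 0 <= score_percent <= 100:
+        raise ValueError("score_percent must be between 0 and 100")
+    for lower_bound, level, name in MATURITY_LEVELS:
+        if score_percent >= lower_bound:
+            return level, name
+
+
+level, name = maturity_level(61)  # falls in the 50-69% band
+```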
+
+### Step 6: High-ROI Improvements
+
+Identify the top 5 improvements with highest score impact:
+
+For each Not Found or Partial element, compute:
+- **Score uplift**: (1.0 - current_coverage) × weight / total_weighted_points
+- **Dependencies unlocked**: how many other elements improve if this one is addressed
+
+Rank by (score_uplift × 3) + (dependencies_unlocked × 2), with score_uplift expressed in percentage points so that the two terms are on comparable scales.
+
+Common high-ROI improvements (based on typical IR programs):
+1. **DE.AE-08** (incident declaration criteria) — weight 3; enables RS.MA-01 through RS.MA-05
+2. **ID.IM-04** (IR plan maintenance) — weight 2; foundational for all Table 3 elements
+3. **RS.AN-03** (root cause analysis) — weight 3; enables ID.IM-03 improvement loop
+4. **RS.CO-02** (stakeholder notifications) — weight 3; legal/regulatory exposure reduction
+5. **PR.PS-04** (log records) — weight 2; enables DE.AE-02/03/07 detection capabilities
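+
+The ranking rule above can be sketched as below. The function name `rank_improvements`, the tuple layout, and the sample values are hypothetical. Expressing the uplift in percentage points is an assumption made here so that it is comparable in size to the dependency count.
+
+```python
+def rank_improvements(candidates, total_weighted_points, top_n=5):
+    """Rank gaps by (uplift in percentage points * 3) + (dependencies * 2).
+
+    candidates: iterable of (element_id, coverage, weight, dependencies_unlocked).
+    Returns the top_n entries as (element_id, uplift) pairs, best first.
+    """
+    ranked = []
+    for element_id, coverage, weight, dependencies in candidates:
+        uplift = (1.0 - coverage) * weight * 100 / total_weighted_points
+        ranked.append((uplift * 3 + dependencies * 2, element_id, uplift))
+    ranked.sort(key=lambda row: row[0], reverse=True)  # highest rank value first
+    return [(element_id, round(uplift, 1)) for _, element_id, uplift in ranked[:top_n]]
+
+
+# Hypothetical gaps, assuming 100 total weighted points in the registry.
+gaps = [
+    ("RS.CO-02", 0.0, 3, 0),  # absent, unlocks nothing else
+    ("DE.AE-08", 0.0, 3, 5),  # absent, unlocks five elements
+    ("PR.PS-04", 0.5, 2, 4),  # partial, unlocks four elements
+]
+top = rank_improvements(gaps, total_weighted_points=100)
+```
+
+In this example PR.PS-04 outranks RS.CO-02 despite its smaller uplift, because the dependency term rewards elements that unblock others.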
+
+### Step 7: Generate Maturity Report
+
+---
+
+## Output Format
+
+```
+╔══════════════════════════════════════════════════════════════════╗
+║     NIST SP 800-61r3 IR Maturity Assessment                     ║
+║     Standard: NIST SP 800-61r3 (April 2025), CSF 2.0            ║
+╚══════════════════════════════════════════════════════════════════╝
+
+Assessment Mode: [Document / Program]
+Document(s): [list]
+Assessment Date: [date]
+
+━━━ MATURITY SCORE ━━━
+
+Overall Weighted Score: 61%
+┌─────────────────────────────┐
+│  MATURITY LEVEL 3: DEFINED  │
+│  Score range: 50–69%        │
+└─────────────────────────────┘
+Documented IR program exists with key processes established.
+Gaps remain in continuous improvement loop and detection coverage.
+
+━━━ PER-FUNCTION SCORES ━━━
+
+Function     Coverage  Score  Bar
+──────────── ───────── ────── ──────────────────────
+DE (Detect)   11/14 el   74%  ███████████████░░░░░
+RS (Respond)  15/25 el   58%  ████████████░░░░░░░░
+RC (Recover)  10/12 el   80%  ████████████████░░░░
+GV (Govern)    9/18 el   50%  ██████████░░░░░░░░░░
+ID (Identify)  8/18 el   44%  █████████░░░░░░░░░░░
+PR (Protect)   7/15 el   47%  █████████░░░░░░░░░░░
+
+Strongest function:  RC (Recover) — 80%
+Weakest function:    ID (Identify) — 44%
+Table 3 (IR) score:  67%  ← primary operational readiness indicator
+Table 2 (Prep) score: 47%  ← preparation and prevention readiness
+
+━━━ ELEMENT COVERAGE DETAIL ━━━
+
+High Priority (Table 3):  24/39 elements addressed (3 partial)
+Medium Priority (Table 2): 9/17 elements addressed (2 partial)
+Low Priority (Table 2):   11/24 elements addressed
+
+━━━ TOP 5 HIGH-ROI IMPROVEMENTS ━━━
+
+Rank  Element        Score Uplift  Dependencies  Action
+────  ─────────────  ────────────  ────────────  ──────────────────────────────
+ 1    DE.AE-08        +3.2%         5 unlocked    Define incident declaration
+                                                  criteria with false-positive
+                                                  guidance
+ 2    RS.AN-03        +2.8%         3 unlocked    Implement RCA methodology
+                                                  (5-Whys or fishbone) in all
+                                                  AAR procedures
+ 3    PR.PS-04        +1.8%         4 unlocked    Implement centralized log
+                                                  management covering all
+                                                  SP 800-61r3 DE.CM asset types
+ 4    ID.IM-04.R2     +1.6%         2 unlocked    Establish formal IR plan
+                                                  review cycle (annual +
+                                                  post-major-incident trigger)
+ 5    RS.CO-02.R3     +2.4%         0 unlocked    Document regulatory
+                                                  notification obligations by
+                                                  jurisdiction and sector
+
+Implementing all 5 would raise score to approximately 73% (Level 4: Managed).
+
+━━━ BENCHMARKING ━━━
+
+Level 3 (Defined) organizations typically:
+✓ Have documented IR policies and procedures
+✓ Have incident classification and triage processes
+✗ Lack continuous improvement mechanisms
+✗ Have gaps in detection coverage (DE elements)
+✗ Do not systematically measure IR program performance
+
+This assessment is consistent with a Level 3 organization.
+
+━━━ TREND (if prior assessment available) ━━━
+Prior score: N/A
+Change: N/A
+```
+
+---
+
+## Deliverable
+
+A scored maturity report with function-level heatmap, maturity level assignment, and prioritized improvement roadmap. Suitable for:
+- Executive reporting on IR program state
+- Security program roadmap planning
+- Board/leadership cybersecurity briefings
+- Benchmarking across assessment cycles
+
+---
+
+## NEVER
+
+- **NEVER report a score without showing the per-function breakdown** — an overall score without function detail masks where the real problems are
+- **NEVER assign Maturity Level 4 or 5 without verifying the ID.IM improvement loop** — a program that doesn't measure itself cannot be Managed or Optimizing
+- **NEVER use the score as a compliance checkbox** — maturity scores measure program capability, not regulatory compliance; explicitly state this limitation
+- **NEVER round up partial coverage** — 0.5 is 0.5; a half-addressed element is not the same as a fully addressed one
diff --git a/skills/nist-800-61r3-policy-reviewer/SKILL.md b/skills/nist-800-61r3-policy-reviewer/SKILL.md
new file mode 100644
index 0000000..a0e62c4
--- /dev/null
+++ b/skills/nist-800-61r3-policy-reviewer/SKILL.md
@@ -0,0 +1,248 @@
+---
+name: nist-800-61r3-policy-reviewer
+description: "Evaluate IR policy and procedure documents against NIST SP 800-61r3 Section 2.3 required policy elements and the GV.PO/GV.RR/ID.IM-04 CSF categories. Checks for all 8 required policy elements, role authority designations, and plan maintenance requirements. Use when asked to 'review this IR policy', 'check our incident response policy', 'is this policy complete', or 'policy completeness check'."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 Policy Reviewer
+
+Evaluate IR policies, plans, and procedure documents against the requirements defined in NIST SP 800-61r3 Section 2.3 and the associated CSF 2.0 GV/ID elements.
+
+---
+
+## Goal
+
+SP 800-61r3 Section 2.3 explicitly defines the required elements of an IR policy. This skill checks that a policy document contains all required elements, properly designates roles and authorities, and establishes the plan maintenance framework needed to stay current.
+
+---
+
+## Inputs
+
+- The IR policy, IR plan, or procedures document to review (text or file path)
+- Organization type (federal agency, private sector, MSSP) — affects regulatory notification requirements
+
+---
+
+## Workflow
+
+```
+1. Load Section 2.3 Policy Checklist
+   ↓
+2. Check 8 Required Policy Elements
+   ↓
+3. Check Roles & Authorities (GV.RR-02)
+   ↓
+4. Check Plan Framework (ID.IM-04)
+   ↓
+5. Check Regulatory & Notification Requirements (GV.OC-03)
+   ↓
+6. Generate Policy Review Report
+```
+
+### Step 1: Section 2.3 Policy Checklist
+
+SP 800-61r3 §2.3 notes that most IR policies include the same key elements. This skill treats those elements as its required checklist:
+
+| # | Element | CSF Anchor |
+|---|---------|-----------|
+| 1 | Statement of management commitment | GV.RR-01 |
+| 2 | Purpose and objectives of the policy | GV.PO |
+| 3 | Scope of the policy (to whom/what it applies, under what circumstances) | GV.PO |
+| 4 | Definition of events, cybersecurity incidents, investigations, and related terms | DE.AE-08 |
+| 5 | Roles, responsibilities, and authorities (including who has authority to confiscate/disconnect/shut down assets) | GV.RR-02 |
+| 6 | Guidelines for prioritizing incidents, estimating severity, initiating recovery, maintaining/restoring operations, and other key actions | RS.MA-03, RS.MA-05 |
+| 7 | Performance measures | ID.IM-01 |
+| 8 | Shared responsibility model with third parties (MSSPs, CSPs, ISPs) if applicable | GV.SC-08, GV.RR-02 |
+
+### Step 2: Check 8 Required Policy Elements
+
+For each element, determine:
+- **Present** — policy explicitly addresses this element
+- **Partial** — element is referenced but incomplete (e.g., roles listed but no authority designations)
+- **Missing** — no evidence of the element
+
+**Detailed checks per element:**
+
+**Element 1 — Management Commitment**
+- Is there a signed/dated endorsement from executive leadership?
+- Does it reference resource allocation for IR?
+- CSF: GV.RR-01 — "Organizational leadership is responsible and accountable for cybersecurity risk"
+
+**Element 2 — Purpose and Objectives**
+- Does the policy state why it exists?
+- Does it reference reducing incident impact and improving response effectiveness?
+- CSF: GV.PO
+
+**Element 3 — Scope**
+- Who does this apply to? (employees, contractors, third parties, specific systems)
+- What environments are covered? (cloud, on-prem, OT/ICS, mobile)
+- Are exclusions documented?
+- CSF: GV.PO
+
+**Element 4 — Definitions**
+- Are "event," "adverse event," and "cybersecurity incident" defined per SP 800-61r3 Appendix B?
+- Is "incident" distinguished from "event"? (Incidents jeopardize the confidentiality, integrity, or availability of information, or constitute policy violations.)
+- Are investigation-related terms defined?
+- CSF: DE.AE-08
+
+**Element 5 — Roles, Responsibilities, and Authorities**
+- Are incident response roles explicitly named (incident lead, handlers, legal, HR, comms, leadership)?
+- Does the policy designate which roles have authority to:
+  - Confiscate assets
+  - Disconnect systems from the network
+  - Shut down technology assets
+- Are third-party roles (MSSP, CSP, law enforcement) addressed?
+- CSF: GV.RR-02.R1, GV.RR-02.R2
+
+**Element 6 — Prioritization, Severity, and Recovery Guidelines**
+- Are incident severity/priority tiers defined (e.g., P1–P4 or Critical/High/Medium/Low)?
+- Are risk evaluation factors listed? SP 800-61r3 RS.MA.N2 suggests: asset criticality, functional impact, data impact, stage of observed activity, threat actor characterization, recoverability
+- Are recovery initiation criteria defined? (RS.MA-05.R1)
+- Are time-based response SLAs included?
+- CSF: RS.MA-03.R1, RS.MA-05.R1
+
+**Element 7 — Performance Measures**
+- Are IR program performance metrics defined?
+- Examples: mean time to detect, mean time to respond, mean time to recover, % incidents meeting SLAs, % staff trained
+- Is there a review cycle for measuring performance against these metrics?
+- CSF: ID.IM-01.R1 — "Periodically evaluate IR program performance to identify problems"
+
+**Element 8 — Shared Responsibility Model**
+- If the organization uses MSSPs/CSPs/ISPs: are their responsibilities documented?
+- Are contracts/SLAs referenced?
+- Are information flow and coordination authorities defined?
+- Are restrictions on third-party actions documented (e.g., cannot share sanitized incident info with other customers)?
+- CSF: GV.SC-08, GV.RR-02.R3
+
+### Step 3: Roles & Authorities Deep Check (GV.RR-02)
+
+Check specifically for GV.RR-02.R1 and R2 (R3, third-party responsibilities, is checked under Element 8):
+
+**R1 — All IR roles documented in organizational policies:**
+- [ ] Incident Response Team / SOC roles documented
+- [ ] Legal counsel role in IR documented
+- [ ] HR role documented (GV.RR-02 + RS.CO-03.R3)
+- [ ] Public affairs / media relations role documented
+- [ ] Physical security and facilities role documented
+- [ ] Asset owners role documented
+- [ ] Leadership decision authority documented
+
+**R2 — All appropriate individuals designated the authority to fulfill IR responsibilities:**
+- [ ] Named or role-titled individuals with authority to isolate/disconnect systems
+- [ ] Named or role-titled individuals with authority to invoke business continuity plans
+- [ ] Named or role-titled individuals with authority to engage law enforcement
+- [ ] Escalation/elevation triggers and authority chain documented (RS.MA-04)
+
+### Step 4: Plan Framework Check (ID.IM-04)
+
+SP 800-61r3 ID.IM-04 has four recommendations for all cybersecurity plans:
+
+- **R1** — Synchronize business continuity plans with incident response plans
+  - Check: Does the policy reference BCP/DRP alignment?
+- **R2** — Review and update all cybersecurity plans periodically or when significant improvements are identified
+  - Check: Is there a review/update cycle defined? (Annual, post-incident trigger?)
+- **R3** — Base each plan on the organization's unique requirements, mission, size, structure, and functions
+  - Check: Is the policy tailored to the org or clearly a generic template?
+- **R4** — Each plan identifies the resources and management support needed
+  - Check: Are resource requirements (staff, tools, budget) referenced?
+
+### Step 5: Regulatory & Notification Check (GV.OC-03)
+
+Based on organization type, verify the policy addresses applicable notification requirements:
+
+**GV.OC-03.R1** — Cybersecurity requirements include all IR-related requirements (incident notification, data breach reporting)
+
+Check:
+- [ ] Notification obligations referenced (FISMA, HIPAA, PCI-DSS, GDPR, state breach laws — as applicable)
+- [ ] Breach notification timelines addressed (RS.CO-02.R3)
+- [ ] Law enforcement notification criteria defined (RS.CO-02.R5)
+- [ ] Regulatory body notification criteria defined
+
+### Step 6: Generate Policy Review Report
+
+---
+
+## Output Format
+
+```
+=== NIST SP 800-61r3 IR Policy Review ===
+Document: [policy name/version]
+Organization type: [federal / private / MSSP]
+Standard: NIST SP 800-61r3 §2.3 + CSF 2.0 GV/ID elements
+
+━━━ SECTION 2.3 REQUIRED ELEMENTS CHECKLIST ━━━
+
+[✓] Element 1 — Management Commitment
+    Found: Executive VP signature, dated 2025-01-15; §1.1
+    Note: Does not reference specific resource commitments (FTE, budget)
+
+[✗] Element 4 — Definitions
+    Missing: "Event" and "adverse event" are not defined.
+    The policy uses "incident" broadly without distinguishing security events
+    from confirmed cybersecurity incidents per FISMA 2014 definition.
+    Required by: DE.AE-08, SP 800-61r3 Appendix B
+    Fix: Add a definitions section with SP 800-61r3 Appendix B terms at minimum.
+
+[~] Element 5 — Roles, Responsibilities, and Authorities
+    Partial: Roles listed (IR Team, Legal, HR) but authority to disconnect
+    systems is not designated to any specific role.
+    Required by: GV.RR-02.R2 — "All appropriate individuals should be
+    designated the authority necessary to fulfill their IR responsibilities"
+    Fix: Add explicit authority table specifying who can isolate/disconnect/
+    shut down each asset class.
+
+[✓] Element 6 — Prioritization Guidelines
+    Found: Four-tier severity model (P1-P4) with response SLAs; §3.2
+    Note: Recovery initiation criteria (RS.MA-05.R1) are absent — policy
+    defines when to start responding but not when to start recovering.
+
+[✗] Element 7 — Performance Measures
+    Missing: No IR performance metrics defined anywhere in the document.
+    Required by: ID.IM-01.R1
+    Fix: Define at minimum: MTTD, MTTR, % incidents within SLA,
+    % staff completing IR training annually.
+
+━━━ ROLES & AUTHORITIES (GV.RR-02) ━━━
+[✓] IR Team roles documented
+[✓] Legal counsel role documented
+[~] HR role referenced but not scoped for insider threat (RS.CO-03.R3)
+[✗] Physical security role not mentioned
+[✗] No named authority to invoke BCP/DRP during incident
+
+━━━ PLAN FRAMEWORK (ID.IM-04) ━━━
+[~] R1 — BCP alignment mentioned in passing, not formally linked (§1.4)
+[✓] R2 — Annual review cycle defined (§6.1)
+[✗] R3 — Document appears to be a generic template; lacks org-specific context
+[~] R4 — "Resources as needed" language present but no specifics
+
+━━━ REGULATORY/NOTIFICATION (GV.OC-03) ━━━
+[~] Notification obligations referenced but no specific laws cited
+[✗] No breach notification timelines defined
+
+━━━ SUMMARY ━━━
+Present:  4 / 8 required elements fully addressed
+Partial:  2 / 8 elements partially addressed
+Missing:  2 / 8 required elements absent
+
+Critical gaps: Element 4 (definitions) and Element 7 (performance measures)
+Policy assessment: Not ready for operational use without addressing critical gaps.
+Estimated effort to remediate: LOW — most gaps are additions, not rewrites.
+```
+
+---
+
+## Deliverable
+
+A checklist-based policy review report with element-by-element findings, direct SP 800-61r3 citations, and specific remediation guidance. Suitable for:
+- IR policy audit findings
+- Input to `nist-800-61r3-maturity-scorer`
+- Policy revision task list
+
+---
+
+## NEVER
+
+- **NEVER accept "implied" policy elements** — if an element is not explicit in the document, it is Missing
+- **NEVER skip the authority designations check** — GV.RR-02.R2 is one of the most commonly missed requirements
+- **NEVER recommend creating a separate document for each gap** — all elements should be in the policy; suggest additions to the existing document
+- **NEVER skip the definitions check** — undefined terms (especially "event" vs. "incident") cause real operational confusion during actual incidents
diff --git a/skills/nist-800-61r3-recommendation-auditor/SKILL.md b/skills/nist-800-61r3-recommendation-auditor/SKILL.md
new file mode 100644
index 0000000..f8c2868
--- /dev/null
+++ b/skills/nist-800-61r3-recommendation-auditor/SKILL.md
@@ -0,0 +1,270 @@
+---
+name: nist-800-61r3-recommendation-auditor
+description: "Audit whether a cybersecurity document satisfies the R-tagged (Recommendation) items from NIST SP 800-61r3's CSF 2.0 Community Profile. R-items are the things SP 800-61r3 says organizations *should* do — the highest-signal findings. Outputs a Met/Partial/Not Met assessment per R-item with direct quotes from the standard. Use when asked to 'check all the recommendations', 'audit R items', or 'what recommendations aren't met'."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 Recommendation Auditor
+
+Audit whether a document satisfies the R-tagged recommendations from the SP 800-61r3 CSF 2.0 Community Profile (April 2025).
+
+---
+
+## Goal
+
+SP 800-61r3 uses three annotation types in its Community Profile tables:
+- **R** = Recommendation: "the organization **should** do this"
+- **C** = Consideration: "the organization **should consider** doing this"
+- **N** = Note: informational only
+
+This skill focuses on **R-items**, the "should" recommendations, because they are the highest-signal findings in an audit. C-items are assessed secondarily. N-items are informational only and not scored.
+
+---
+
+## Inputs
+
+- Document to audit (text, file path, or description)
+- Document type (ask if unclear)
+- Scope: Full audit (all R-items) or targeted (specific CSF Function)
+
+---
+
+## Workflow
+
+```
+1. Load R-item Catalog
+   ↓
+2. Classify Document & Set Scope
+   ↓
+3. Audit Each R-item
+   ↓
+4. Score Results
+   ↓
+5. Generate Audit Report
+```
+
+### Step 1: R-item Catalog
+
+The complete R-item catalog from SP 800-61r3, grouped by priority tier:
+
+**HIGH PRIORITY — Table 3 (all R-items here are critical)**
+
+**DE.CM (Continuous Monitoring)**
+- `DE.CM.R1` — Continuous monitoring must cover: networks/network services; computing HW/SW/runtime/data; physical environment; personnel activity and technology usage; external service provider activities
+- `DE.CM.R2` — Tune monitoring technologies to reduce false positives and false negatives to acceptable levels
+- `DE.CM-01.R1` — Monitoring must include wired/wireless networks, network communications/flows, network services (DNS, BGP), and unauthorized/rogue networks
+- `DE.CM-02.R1` — Physical environment monitoring must include all access attempts, movement of people/equipment, and tampering signs
+- `DE.CM-03.R1` — Personnel activity monitoring must include anomalous user activity, authentication attempts, and deception technology
+- `DE.CM-06.R1` — External provider monitoring must include remote/on-site admin, maintenance activities, deviations from expected cloud/ISP behavior
+- `DE.CM-09.R1` — Monitor email, web, file sharing, collaboration services for malware/phishing/exfil
+- `DE.CM-09.R2` — Monitor authentication attempts for credential attacks and unauthorized credential use
+- `DE.CM-09.R3` — Monitor software/hardware configurations for deviations from security baselines
+- `DE.CM-09.R4` — Monitor hardware/software including cybersecurity protection mechanisms for tampering/failure/compromise
+- `DE.CM-09.R5` — Monitor endpoints for cyber health issues (missing patches, malware, unauthorized software)
+
+**DE.AE (Adverse Event Analysis)**
+- `DE.AE.R1` — Rely on technical solutions to filter large event datasets to human-viewable subsets
+- `DE.AE.R2` — Proactively find incidents earlier in the attack life cycle
+- `DE.AE-02.R1` — Use SIEM/SOAR to continuously monitor log events for known malicious/suspicious activity
+- `DE.AE-02.R2` — Use up-to-date CTI in log analysis tools to improve detection accuracy
+- `DE.AE-02.R3` — Regularly conduct manual reviews for technologies that cannot be sufficiently auto-monitored
+- `DE.AE-03.R1` — Constantly transfer log data from sources to a small number of log servers
+- `DE.AE-03.R2` — Use event correlation technology (SIEM/SOAR) to gather related data from multiple sources
+- `DE.AE-03.R3` — Use CTI to correlate events among log sources
+- `DE.AE-04.R1` — Estimate impact/scope through automated (SIEM/SOAR) and/or manual means; review and refine estimates
+- `DE.AE-06.R1` — Generate alerts and provide to cybersecurity/IR tools and staff (SOC, incident responders)
+- `DE.AE-06.R2` — Make log analysis findings accessible to incident responders at all times
+- `DE.AE-07.R1` — Integrate up-to-date CTI and asset inventories into adverse event analysis
+- `DE.AE-07.R2` — Rapidly acquire and analyze vulnerability disclosures from suppliers/vendors/advisories
+- `DE.AE-08.R1` — Apply incident criteria to analyzed activity; consider known false positives to determine incident declaration
+
+**RS.MA (Incident Management)**
+- `RS.MA.R1` — Do not handle incidents on first-come, first-served basis — use risk evaluation factors
+- `RS.MA.R2` — Base triage/prioritization/escalation/recovery on risk evaluation factors
+- `RS.MA.R3` — Track incident response status with: summary, IoCs, status/timeframe per action, next steps
+- `RS.MA-01.R1` — Detection technologies should automatically report confirmed incidents
+- `RS.MA-02.R1` — Perform preliminary review to verify incident occurred, estimate severity and urgency
+- `RS.MA-02.R2` — Have mechanisms for third parties to report possible incidents; monitor reports seriously
+- `RS.MA-03.R1` — Perform detailed review to categorize by incident type (data breach, ransomware, account takeover, DoS)
+- `RS.MA-03.R2` — Prioritize based on scope, likely impact, time-critical nature, resource availability
+- `RS.MA-03.R3` — Select incident response strategies balancing recovery speed vs. investigation need
+- `RS.MA-04.R1` — Track and validate status of all ongoing incidents to identify escalation needs
+- `RS.MA-05.R1` — Apply incident recovery criteria to determine when recovery should be initiated
+- `RS.MA-05.R2` — Consider operational disruption of recovery activities when deciding recovery timing
+
+**RS.AN (Incident Analysis)**
+- `RS.AN-03.R1` — Determine sequence of events and which assets/resources were involved in each event
+- `RS.AN-03.R2` — Determine which vulnerabilities, threats, and threat actors were involved (direct/indirect)
+- `RS.AN-03.R3` — Analyze incident for underlying or systemic root causes
+- `RS.AN-03.R4` — Check any deployed cyber deception technology for additional attacker behavior info
+- `RS.AN-06.R1` — Safeguard confidentiality and integrity of IR records; restrict to authorized personnel only
+- `RS.AN-07.R1` — Collect and retain evidence per evidence preservation procedures and data retention policies; consider prosecution possibility
+- `RS.AN-08.R1` — Look for IoCs, evidence of persistence, and other signs on both targeted and potential target assets
+
+**RS.CO (Incident Reporting and Communication)**
+- `RS.CO.R1` — Have mechanisms in place in advance to coordinate with affected parties about incidents
+- `RS.CO-02.R1` — When incident analyzed/prioritized, coordinate with appropriate individuals inside and outside the organization
+- `RS.CO-02.R2` — Follow established procedures: what must be reported, to whom, at what times
+- `RS.CO-02.R3` — Perform notifications in compliance with incident notification laws/regulations for organization's sectors and geographies
+- `RS.CO-02.R4` — Notify affected third parties of data breaches per regulatory, legal, contractual requirements
+- `RS.CO-02.R5` — Notify law enforcement and regulatory bodies per criteria in IR plan and management approval
+- `RS.CO-03.R1` — Securely share information consistent with response plans and information sharing agreements
+- `RS.CO-03.R2` — Regularly update senior leadership on status of major incidents
+- `RS.CO-03.R3` — Notify human resources when malicious insider activity has occurred
+- `RS.CO-03.R4` — Establish and follow media communications procedures for IR that comply with org policies
+
+**RS.MI (Incident Mitigation)**
+- `RS.MI-01.R1` — Allow incident handlers to manually select and perform containment actions in addition to or instead of automated measures
+- `RS.MI-02.R1` — Identify all affected hosts and services so all flaws and weaknesses can be remediated
+
+**RC.RP (Recovery Plan Execution)**
+- `RC.RP-01.R1` — Begin recovery procedures during or after incident response processes
+- `RC.RP-01.R2` — Inform all individuals with recovery responsibilities about plans and required authorizations
+- `RC.RP-02.R1` — Recovery actions must account for timeliness, precision, and reliability
+- `RC.RP-02.R2` — Select recovery actions based on IR plan criteria and available resources
+- `RC.RP-02.R3` — Change planned recovery actions based on reassessment of needs/resources
+- `RC.RP-03.R1` — Check restoration assets for indicators of compromise, file corruption, and integrity issues before use
+- `RC.RP-04.R1` — Validate that essential services are restored in the appropriate order
+- `RC.RP-04.R2` — Work with system owners to confirm successful restoration and return to normal operations
+- `RC.RP-04.R3` — Monitor performance of restored systems to verify adequacy of restoration
+- `RC.RP-05.R1` — Check restored assets for IoCs; remediate root causes before production use
+- `RC.RP-05.R2` — Verify correctness and adequacy of restoration actions before putting restored system online
+- `RC.RP-06.R1` — Prepare an after-action report documenting the incident, response/recovery actions, and lessons learned
+
+**RC.CO (Recovery Communication)**
+- `RC.CO-03.R1` — Securely share recovery information including restoration progress
+- `RC.CO-03.R2` — Regularly update senior leadership on recovery status and restoration progress for major incidents
+- `RC.CO-03.R3` — Follow contract rules for incident information sharing between org and suppliers
+- `RC.CO-03.R4` — Coordinate crisis communication between the organization and its critical suppliers
+- `RC.CO-04.R1` — Follow org's breach notification procedures for recovering from a data breach incident
+- `RC.CO-04.R2` — Explain the steps being taken to recover and to prevent recurrence
+
+**MEDIUM PRIORITY — Table 2 (selected)**
+
+- `GV.OC-03.R1` — Cybersecurity requirements include all IR-related requirements (incident notification, data breach reporting)
+- `GV.RM-03.R1` — Incident-related decision-making informed by other risk types (privacy, operational, safety, reputational, AI)
+- `GV.RM-06` — Have a standardized method for calculating, documenting, categorizing, and prioritizing cybersecurity risks
+- `GV.RR.R1` — Cybersecurity roles/responsibilities/authorities should include incident response
+- `GV.RR-02.R1` — All roles/responsibilities involving IR should be documented in organizational policies
+- `GV.RR-02.R2` — All appropriate individuals/parties should be designated the authority to fulfill IR responsibilities
+- `GV.PO.R1` — Cybersecurity policies should include an incident response policy
+- `GV.SC-05.R1` — Supply chain risk management requirements include cybersecurity performance, vulnerability/threat/incident disclosure/sharing
+- `GV.SC-08` — Relevant suppliers included in incident planning, response, and recovery activities
+- `ID.AM-01.R1` — Make current, auto-updated inventories of internal/external hardware available for vulnerability finding, monitoring, and shadow IT detection
+- `ID.AM-02.R1` — Make current, auto-updated inventories of internal/external SW/services available
+- `ID.RA-01.R1` — Understand current vulnerabilities to make informed risk decisions (all types: FW, misconfigs, design weaknesses, physical, integrity violations)
+- `ID.RA-03.R1` — Identify internal/external threats during routine operations and from CTI
+- `ID.RA-05.R1` — Use existing risk estimation mechanisms for incident response purposes
+- `ID.IM-01.R1` — Periodically evaluate IR program performance to identify problems and deficiencies
+- `ID.IM-04.R1` — Synchronize business continuity plans with incident response plans
+- `ID.IM-04.R2` — Review and update all cybersecurity plans periodically or when significant improvements are needed
+- `ID.IM-04.R3` — Base each cybersecurity plan on the organization's unique requirements, mission, size, structure
+- `ID.IM-04.R4` — Each plan identifies resources and management support needed
+- `PR.AT-02.R1` — Role-based training includes incident-related responsibilities
+- `PR.DS-11` — Backups created, protected, maintained, and tested (particularly important for recovery)
+
+### Step 2: Classify Document & Set Scope
+
+Determine relevant R-item scope based on document type:
+
+| Document Type       | Primary R-item Scope |
+|---------------------|----------------------|
+| IR Plan             | All Table 3 R-items + ID.IM-04.R1–R4 + GV.PO.R1 + GV.RR-02.R1/R2 |
+| Playbook            | RS.MA, RS.AN, RS.MI, RC.RP R-items + DE.AE-08.R1 |
+| Incident Report     | RS.MA-02–05, RS.AN-03, RS.AN-06/07/08, RS.CO-02/03, RC.RP-06 |
+| After-Action Report | RS.AN-03, RC.RP-06.R1, ID.IM-03 (no R), RS.AN-06.R1, RS.AN-07.R1 |
+| IR Policy           | GV.PO.R1, GV.RR-02.R1/R2, ID.IM-04.R1–R4, GV.OC-03.R1 |
+
+### Step 3: Audit Each R-item
+
+For each in-scope R-item:
+1. Search document for evidence satisfying the requirement
+2. Assign status:
+   - **Met** — document clearly satisfies the R-item requirement
+   - **Partial** — document addresses the concept but misses key sub-requirements
+   - **Not Met** — no evidence found
+   - **N/A** — not applicable to this document type
+
+### Step 4: Score Results
+
+Count Met/Partial/Not Met across priority tiers. Compute:
+- High-priority R-items: (Met + Partial/2) ÷ Total
+- Weighted score: ((High Met × 3) + (Medium Met × 2)) ÷ ((High Total × 3) + (Medium Total × 2))
+
+### Step 5: Generate Audit Report
+
+---
+
+## Output Format
+
+```
+=== NIST SP 800-61r3 Recommendation Audit ===
+Document: [name/type]
+Scope: [Full / Table 3 / targeted function]
+R-items audited: [N]
+
+━━━ NOT MET — High Priority (Table 3) ━━━
+
+[✗] DE.AE-08.R1 — Incident Declaration Criteria
+    Requirement: "Apply incident criteria to known and assumed characteristics
+    of analyzed activity, and consider known false positives to determine
+    whether an incident should be declared."
+    Finding: No incident declaration criteria or thresholds defined in document.
+    Severity: Critical — incidents may be declared inconsistently or too late.
+
+[✗] RS.MA-03.R1 — Incident Categorization
+    Requirement: "Perform a more detailed review to categorize incidents by
+    type (e.g., data breach, ransomware, account takeover, denial of service)."
+    Finding: Document references 'incident types' but provides no categorization
+    taxonomy or guidance for categorizing a specific incident.
+    Severity: Significant.
+
+━━━ PARTIAL — High Priority (Table 3) ━━━
+
+[~] RS.MA-02.R1 — Triage and Validation
+    Requirement: "Perform preliminary review... estimate severity AND urgency."
+    Finding: Severity tiers (P1–P4) are defined, but no urgency or time-based
+    response SLA criteria are present.
+    Gap: Add time-to-respond criteria alongside severity classification.
+
+━━━ MET — High Priority (Table 3) ━━━
+
+[✓] RS.MI-01.R1 — Manual Containment Selection
+    Evidence: §5.2 "Incident handlers may override automated containment and
+    manually select containment actions based on incident context."
+
+...
+
+━━━ MEDIUM PRIORITY (Table 2) ━━━
+[✗] GV.PO.R1 — IR policy not referenced in document
+[✓] ID.IM-04.R2 — Review cycle stated as annual in §1.3
+...
+
+━━━ SCORECARD ━━━
+High-priority R-items:   Met: 14  Partial: 6  Not Met: 8  (Total: 28)
+Medium-priority R-items: Met: 8   Partial: 3  Not Met: 7  (Total: 18)
+Recommendation compliance: 61% (High), 53% (Medium)
+
+Top 3 highest-impact Not Met items:
+1. DE.AE-08.R1 — Incident declaration criteria (blocks all RS.MA)
+2. RS.CO-02.R3 — Regulatory notification compliance (legal exposure)
+3. RC.RP-06.R1 — After-action report requirement (no lessons-learned loop)
+```
+
+---
+
+## Deliverable
+
+A citation-anchored audit report with Met/Partial/Not Met assessments for all in-scope R-items, suitable for:
+- Direct input to `nist-800-61r3-maturity-scorer`
+- Audit findings for compliance documentation
+- IR program improvement backlog items
+
+---
+
+## NEVER
+
+- **NEVER mark an R-item Met without direct documentary evidence** — inference does not count
+- **NEVER omit the verbatim R-item text from SP 800-61r3** — every finding must include the exact requirement
+- **NEVER conflate C-items with R-items** — C-items are "should consider" (lower obligation); track them separately
+- **NEVER assess C-items unless R-items for the same element are fully covered** — prioritize the mandatory recommendations first
diff --git a/skills/project-manager/README.md b/skills/project-manager/README.md
new file mode 100644
index 0000000..f9a22c1
--- /dev/null
+++ b/skills/project-manager/README.md
@@ -0,0 +1,27 @@
+# project-manager
+
+Central OS Agent for multi-agent development workflows.
+
+Tracks live agent states, maintains `.agent_os_state.json`, enforces file-level dependency locking between agents, and renders a scannable CLI dashboard on demand.
+
+## Triggers
+
+- `/project-manager`
+- "system status", "agent dashboard", "project status"
+- "register agent", "check blockers", "update agent status"
+
+## Dashboard output
+
+```
+## 🖥️ AGENT OS DIGITAL TWIN DASHBOARD
+-----------------------------------------------------------------------
+[Agent ID]    [Active Skill]     [Git Status]    [Current Task]   [Status]
+-----------------------------------------------------------------------
+Agent-01      git, web-scrap     2 Commits Ahead Fix Login Bug    ⚙️ RUNNING
+Agent-02      db-schema, sql     Clean           Migrate Users    ⏸️ IDLE
+Agent-03      jest, coverage     Uncommitted     Write UI Tests   🛑 BLOCKED (by 01)
+```
+
+## State file
+
+Persists agent state to `.agent_os_state.json` in the working directory root.
diff --git a/skills/project-manager/SKILL.md b/skills/project-manager/SKILL.md
new file mode 100644
index 0000000..7fd4fb7
--- /dev/null
+++ b/skills/project-manager/SKILL.md
@@ -0,0 +1,88 @@
+---
+name: project-manager
+description: >
+  Central OS Agent for multi-agent development workflows. Tracks live agent states (ID,
+  specialization, work item, active files, skills, git status, health/blockers), maintains
+  .agent_os_state.json, enforces dependency locking between agents, and renders a scannable
+  CLI dashboard. Use when asked for "system status", "agent dashboard", "project status",
+  or to track/manage agents across a multi-agent workflow.
+---
+
+You are the Central Operating System (OS) Agent for a multi-agent development workflow. Your job is to aggregate, visualize, and orchestrate the live operational states of all active agents working on the project.
+
+## Core State Tracking
+
+Track these metrics for every agent:
+- **Agent ID & Specialization** (e.g., UI-Agent, Auth-Agent, Test-Agent)
+- **Current Work Item / Ticket ID** (linked to the project roadmap)
+- **Active File Paths** (exact files the agent is currently reading or writing)
+- **Skills Loaded** (e.g., git-tools, db-migrator, regex-parser, test-runner)
+- **Git/Push Status** (e.g., Clean, 3 commits ahead, Uncommitted changes, Merge conflict)
+- **Health & Blockers** (e.g., Idle, Executing, Blocked by Agent-02)
+
+## State Management Rules
+
+1. **State Synchronization**: Create and maintain `.agent_os_state.json` in the root directory. Read and update this file whenever an agent reports a state change.
+2. **Dependency Locking**: If Agent-A has changes in "Pending Review" that touch files Agent-B needs, automatically flag Agent-B as "Blocked" and notify the supervisor.
+3. **Skill Management**: Dynamically recommend or inject specific tool scripts or "skills" into an agent's workspace based on their current execution errors or Git push failures.
+4. **Terminal Dashboard**: Provide a scannable CLI dashboard layout showing holistic project health.
+
+## State File Schema
+
+Maintain `.agent_os_state.json` with this shape:
+
+```json
+{
+  "last_updated": "ISO-8601 timestamp",
+  "agents": {
+    "Agent-01": {
+      "specialization": "auth",
+      "ticket": "TICKET-42",
+      "active_files": ["src/auth/login.ts"],
+      "skills": ["git", "web-scrap"],
+      "git_status": "2 commits ahead",
+      "health": "RUNNING",
+      "blocker": null
+    }
+  },
+  "locks": {
+    "src/auth/login.ts": "Agent-01"
+  }
+}
+```
+
+When asked to update an agent's state, read the current `.agent_os_state.json`, apply the change, and write it back. If the file doesn't exist, create it with empty `agents` and `locks` objects.
+
+## System Status Dashboard Output
+
+When asked for "system status", output this exact markdown structure:
+
+```
+## 🖥️ AGENT OS DIGITAL TWIN DASHBOARD
+-----------------------------------------------------------------------
+[Agent ID]    [Active Skill]     [Git Status]    [Current Task]   [Status]
+-----------------------------------------------------------------------
+Agent-01      git, web-scrap     2 Commits Ahead Fix Login Bug    ⚙️ RUNNING
+Agent-02      db-schema, sql     Clean           Migrate Users    ⏸️ IDLE
+Agent-03      jest, coverage     Uncommitted     Write UI Tests   🛑 BLOCKED (by 01)
+
+### 🔄 Active Interactivity Matrix:
+- Agent-01 is modifying `src/auth/login.ts`.
+- Agent-03 execution paused: Target file `src/auth/login.ts` is locked by Agent-01.
+
+### 🛠️ Required Actions:
+- [ ] Merge Agent-01 branch to release Agent-03.
+- [ ] Load `performance-profiler` skill to Agent-02 for the upcoming database migration.
+```
+
+Status icons: ⚙️ RUNNING | ⏸️ IDLE | 🛑 BLOCKED | ✅ DONE | ⚠️ ERROR
+
+## Commands
+
+- **"system status"** / **"dashboard"**: Render the full dashboard from `.agent_os_state.json`.
+- **"register agent [ID] as [specialization]"**: Add a new agent entry to state.
+- **"update [Agent-ID] status to [status]"**: Update an agent's health field.
+- **"agent [ID] is working on [file]"**: Lock the file to that agent, check for conflicts.
+- **"agent [ID] finished [file]"**: Release the file lock.
+- **"check blockers"**: List every blocked agent and what is blocking it.
+- **"recommend skills for [Agent-ID]"**: Based on their current git/error state, suggest skills to inject.

From 3c0e6d53aee738c42a3c190c5a0d8d5361a99e7a Mon Sep 17 00:00:00 2001
From: Jay Vucic <jay.vucic@gmail.com>
Date: Fri, 29 May 2026 12:18:16 -0700
Subject: [PATCH 2/3] =?UTF-8?q?feat(skills):=20SKL-REMED=20full=20library?=
 =?UTF-8?q?=20remediation=20=E2=80=94=20Sprints=200-7?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Sprint 0 — P0 bug fixes:
- domain-name-brainstormer: remove fabricated availability claims
- jira: fix MCP namespace (mcp__atlassian__ → mcp__claude_ai_Atlassian_Rovo__), remove broken jira-link-issues ref
- mui: fix Grid v2 size prop syntax, replace broken useMuiSnackbar hook ref
- meme-factory: resolve ~n vs %0A encoding contradiction (canonical: %0A)
- reducing-entropy: inline worse-is-better philosophy, fix broken reference
- professional-communication: wire orphaned remote-async-communication.md trigger

Sprint 1 — Library templates:
- NEVER-list-template.md, Mindset-Navigation-Philosophy-template.md
- references-directory-convention.md, README-policy.md, error-handling-stub-template.md

Sprints 2-5 — All 50 skills remediated:
- Canonical Mindset/Navigation/Philosophy/NEVER/When-Things-Go-Wrong sections added to every skill
- 19 F-tier skills: full structural rebuilds with expert knowledge injection
- 16 D-tier skills: NEVER lists expanded, error handling added, references/ created
- 15 C-tier skills: targeted gap fixes, inline content moved to references/

Sprint 7 — Structural repairs + new skills:
- New skills: skill-creator, api-design, security-review, nist-800-61r3-incident-coordinator
- NIST CSF element registry consolidated into nist-800-61r3-shared/references/
- react-dev: 461 → 96 lines; 5 reference files created
- 10 cross-skill Navigation callouts wired
- 5 frontmatter descriptions improved (react-dev, humanizer, command-creator, game-changing-features, ship-learn-next)
- Orphaned root-level files relocated to references/ (react-useeffect, writing-clearly-and-concisely)
- session-handoff evals/ directory removed (Pattern 8 auxiliary files)

Library baseline → post-remediation:
- Average score: 75/120 (D) → est. 87-90/120 (C+)
- F-grade skills: 19 → 0
- Total skills: 50 → 55

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 ...125002-agents-md-refactor-config-update.md | 111 +++
 Mindset-Navigation-Philosophy-template.md     | 225 +++++
 NEVER-list-template.md                        | 101 +++
 README-policy.md                              |  71 ++
 error-handling-stub-template.md               | 193 +++++
 references-directory-convention.md            | 135 +++
 skills/agent-md-refactor/SKILL.md             | 294 ++-----
 .../references/anti-patterns-catalog.md       |  88 ++
 .../references/refactor-workflow.md           | 103 +++
 skills/api-design/SKILL.md                    | 412 ++++++++++
 .../backend-to-frontend-handoff-docs/SKILL.md | 163 ++--
 .../references/TEMPLATE.md                    | 161 ++++
 skills/c4-architecture/SKILL.md               | 316 +++----
 skills/codex/SKILL.md                         | 138 ++--
 .../references/model-sandbox-reference.md     |  55 ++
 skills/command-creator/SKILL.md               | 252 +++---
 skills/commit-work/SKILL.md                   | 151 ++--
 skills/crafting-effective-readmes/SKILL.md    | 114 +--
 skills/daily-meeting-update/SKILL.md          | 433 +++-------
 .../references/async-remote-patterns.md       | 111 +++
 .../references/integration-details.md         |  97 +++
 skills/database-schema-designer/SKILL.md      | 735 ++---------------
 skills/datadog-cli/SKILL.md                   | 166 ++--
 skills/dependency-updater/SKILL.md            | 511 ++----------
 .../references/commands-by-language.md        | 194 +++++
 skills/design-system-starter/SKILL.md         | 626 +++-----------
 .../SKILL.md                                  | 278 ++-----
 skills/domain-name-brainstormer/SKILL.md      | 237 ++----
 .../references/naming-frameworks.md           |  83 ++
 skills/draw-io/SKILL.md                       | 346 +++-----
 skills/excalidraw/SKILL.md                    | 261 ++----
 .../references/delegation-templates.md        | 130 +++
 skills/feedback-mastery/SKILL.md              | 322 ++------
 .../frontend-to-backend-requirements/SKILL.md | 217 ++---
 .../references/elicitation-questions.md       |  47 ++
 .../references/output-template.md             |  77 ++
 .../references/worked-example.md              | 137 ++++
 skills/game-changing-features/SKILL.md        | 295 ++-----
 .../references/opportunity-categories.md      |  56 ++
 skills/gemini/SKILL.md                        | 272 ++----
 skills/gemini/references/model-comparison.md  |  43 +
 skills/gemini/references/prompt-patterns.md   |  76 ++
 skills/gepetto/SKILL.md                       |  87 +-
 skills/humanizer/SKILL.md                     | 453 ++--------
 skills/humanizer/references/patterns.md       | 225 +++++
 skills/jira/SKILL.md                          | 190 +++--
 skills/jira/references/cloud-vs-server.md     | 176 ++++
 skills/jira/references/mcp.md                 |  60 +-
 skills/lesson-learned/SKILL.md                | 167 ++--
 .../references/git-edge-cases.md              |  98 +++
 .../references/output-template.md             |  70 ++
 skills/marp-slide/SKILL.md                    | 336 +++-----
 skills/meme-factory/README.md                 |   2 +-
 skills/meme-factory/SKILL.md                  | 364 ++-------
 skills/mermaid-diagrams/SKILL.md              | 289 +++----
 skills/mui/SKILL.md                           | 535 ++----------
 skills/naming-analyzer/SKILL.md               | 377 ++-------
 .../references/safe-rename-checklist.md       |  69 ++
 .../semantic-analysis-frameworks.md           |  92 +++
 .../SKILL.md                                  | 274 +++----
 .../references/aar-completeness-checklist.md  |  48 ++
 .../comms-recovery-lessons-checklist.md       |  97 +++
 .../root-cause-evidence-checklist.md          |  73 ++
 skills/nist-800-61r3-csf-mapper/SKILL.md      | 218 +++--
 .../references/csf-element-baseline.md        |  12 +
 skills/nist-800-61r3-gap-analyzer/SKILL.md    | 227 ++---
 .../references/csf-element-baseline.md        |  12 +
 .../SKILL.md                                  | 345 ++++++++
 skills/nist-800-61r3-maturity-scorer/SKILL.md | 185 ++---
 .../references/element-registry.md            |  13 +
 skills/nist-800-61r3-policy-reviewer/SKILL.md | 249 +++---
 .../references/policy-checklist.md            | 123 +++
 .../SKILL.md                                  | 275 +++----
 .../references/r-item-catalog.md              | 189 +++++
 .../references/csf-element-registry.md        | 209 +++++
 skills/openapi-to-typescript/SKILL.md         | 405 ++-------
 .../references/edge-cases.md                  | 157 ++++
 skills/perplexity/SKILL.md                    | 182 ++---
 skills/plugin-forge/SKILL.md                  | 248 ++----
 skills/professional-communication/SKILL.md    | 286 ++-----
 skills/project-manager/SKILL.md               | 147 ++--
 .../references/dashboard-format.md            |  50 ++
 .../references/state-schema.md                |  47 ++
 skills/qa-test-planner/SKILL.md               | 772 ++----------------
 .../qa-test-planner/references/heuristics.md  | 164 ++++
 skills/react-dev/SKILL.md                     | 425 ++--------
 .../references/component-patterns.md          | 110 +++
 .../references/generic-components.md          | 172 ++++
 skills/react-dev/references/hooks-typing.md   | 145 ++++
 .../react-dev/references/react-19-changes.md  | 110 +++
 .../react-dev/references/server-components.md | 129 +++
 skills/react-useeffect/SKILL.md               | 109 ++-
 .../{ => references}/alternatives.md          |   0
 .../{ => references}/anti-patterns.md         |   0
 skills/reducing-entropy/SKILL.md              | 131 ++-
 .../adding-reference-mindsets.md              |  96 ---
 .../references/worse-is-better.md             |  49 ++
 skills/requirements-clarity/README.md         | 260 ------
 skills/requirements-clarity/SKILL.md          | 353 ++------
 .../references/clarity-scoring.md             |  41 +
 .../references/prd-template.md                |  81 ++
 skills/security-review/SKILL.md               | 176 ++++
 skills/session-handoff/SKILL.md               | 215 ++---
 .../evals/model-expectations.md               | 170 ----
 .../evals/results-opus-baseline.md            |  88 --
 .../session-handoff/evals/setup_test_env.py   | 415 ----------
 .../session-handoff/evals/test-scenarios.md   | 222 -----
 skills/ship-learn-next/SKILL.md               | 350 ++------
 .../references/content-extraction.md          |  60 ++
 .../references/rep-templates.md               |  86 ++
 skills/skill-creator/SKILL.md                 | 158 ++++
 .../skill-creator/references/antipatterns.md  | 104 +++
 skills/skill-judge/README.md                  | 245 ------
 skills/skill-judge/SKILL.md                   | 751 ++---------------
 .../references/failure-patterns.md            |  74 ++
 .../skill-judge/references/scoring-rubric.md  | 182 +++++
 .../skill-judge/references/worked-example.md  | 122 +++
 skills/web-to-markdown/README.md              | 204 -----
 skills/web-to-markdown/SKILL.md               | 116 +--
 .../references/wait-strategies.md             |  62 ++
 skills/writing-clearly-and-concisely/SKILL.md | 135 +--
 .../{ => references}/signs-of-ai-writing.md   |   0
 122 files changed, 10993 insertions(+), 12583 deletions(-)
 create mode 100644 .claude/handoffs/2026-05-27-125002-agents-md-refactor-config-update.md
 create mode 100644 Mindset-Navigation-Philosophy-template.md
 create mode 100644 NEVER-list-template.md
 create mode 100644 README-policy.md
 create mode 100644 error-handling-stub-template.md
 create mode 100644 references-directory-convention.md
 create mode 100644 skills/agent-md-refactor/references/anti-patterns-catalog.md
 create mode 100644 skills/agent-md-refactor/references/refactor-workflow.md
 create mode 100644 skills/api-design/SKILL.md
 create mode 100644 skills/backend-to-frontend-handoff-docs/references/TEMPLATE.md
 create mode 100644 skills/codex/references/model-sandbox-reference.md
 create mode 100644 skills/daily-meeting-update/references/async-remote-patterns.md
 create mode 100644 skills/daily-meeting-update/references/integration-details.md
 create mode 100644 skills/dependency-updater/references/commands-by-language.md
 create mode 100644 skills/domain-name-brainstormer/references/naming-frameworks.md
 create mode 100644 skills/excalidraw/references/delegation-templates.md
 create mode 100644 skills/frontend-to-backend-requirements/references/elicitation-questions.md
 create mode 100644 skills/frontend-to-backend-requirements/references/output-template.md
 create mode 100644 skills/frontend-to-backend-requirements/references/worked-example.md
 create mode 100644 skills/game-changing-features/references/opportunity-categories.md
 create mode 100644 skills/gemini/references/model-comparison.md
 create mode 100644 skills/gemini/references/prompt-patterns.md
 create mode 100644 skills/humanizer/references/patterns.md
 create mode 100644 skills/jira/references/cloud-vs-server.md
 create mode 100644 skills/lesson-learned/references/git-edge-cases.md
 create mode 100644 skills/lesson-learned/references/output-template.md
 create mode 100644 skills/naming-analyzer/references/safe-rename-checklist.md
 create mode 100644 skills/naming-analyzer/references/semantic-analysis-frameworks.md
 create mode 100644 skills/nist-800-61r3-after-action-reviewer/references/aar-completeness-checklist.md
 create mode 100644 skills/nist-800-61r3-after-action-reviewer/references/comms-recovery-lessons-checklist.md
 create mode 100644 skills/nist-800-61r3-after-action-reviewer/references/root-cause-evidence-checklist.md
 create mode 100644 skills/nist-800-61r3-csf-mapper/references/csf-element-baseline.md
 create mode 100644 skills/nist-800-61r3-gap-analyzer/references/csf-element-baseline.md
 create mode 100644 skills/nist-800-61r3-incident-coordinator/SKILL.md
 create mode 100644 skills/nist-800-61r3-maturity-scorer/references/element-registry.md
 create mode 100644 skills/nist-800-61r3-policy-reviewer/references/policy-checklist.md
 create mode 100644 skills/nist-800-61r3-recommendation-auditor/references/r-item-catalog.md
 create mode 100644 skills/nist-800-61r3-shared/references/csf-element-registry.md
 create mode 100644 skills/openapi-to-typescript/references/edge-cases.md
 create mode 100644 skills/project-manager/references/dashboard-format.md
 create mode 100644 skills/project-manager/references/state-schema.md
 create mode 100644 skills/qa-test-planner/references/heuristics.md
 create mode 100644 skills/react-dev/references/component-patterns.md
 create mode 100644 skills/react-dev/references/generic-components.md
 create mode 100644 skills/react-dev/references/hooks-typing.md
 create mode 100644 skills/react-dev/references/react-19-changes.md
 create mode 100644 skills/react-dev/references/server-components.md
 rename skills/react-useeffect/{ => references}/alternatives.md (100%)
 rename skills/react-useeffect/{ => references}/anti-patterns.md (100%)
 delete mode 100644 skills/reducing-entropy/adding-reference-mindsets.md
 create mode 100644 skills/reducing-entropy/references/worse-is-better.md
 delete mode 100644 skills/requirements-clarity/README.md
 create mode 100644 skills/requirements-clarity/references/clarity-scoring.md
 create mode 100644 skills/requirements-clarity/references/prd-template.md
 create mode 100644 skills/security-review/SKILL.md
 delete mode 100644 skills/session-handoff/evals/model-expectations.md
 delete mode 100644 skills/session-handoff/evals/results-opus-baseline.md
 delete mode 100755 skills/session-handoff/evals/setup_test_env.py
 delete mode 100644 skills/session-handoff/evals/test-scenarios.md
 create mode 100644 skills/ship-learn-next/references/content-extraction.md
 create mode 100644 skills/ship-learn-next/references/rep-templates.md
 create mode 100644 skills/skill-creator/SKILL.md
 create mode 100644 skills/skill-creator/references/antipatterns.md
 delete mode 100644 skills/skill-judge/README.md
 create mode 100644 skills/skill-judge/references/failure-patterns.md
 create mode 100644 skills/skill-judge/references/scoring-rubric.md
 create mode 100644 skills/skill-judge/references/worked-example.md
 delete mode 100644 skills/web-to-markdown/README.md
 create mode 100644 skills/web-to-markdown/references/wait-strategies.md
 rename skills/writing-clearly-and-concisely/{ => references}/signs-of-ai-writing.md (100%)

diff --git a/.claude/handoffs/2026-05-27-125002-agents-md-refactor-config-update.md b/.claude/handoffs/2026-05-27-125002-agents-md-refactor-config-update.md
new file mode 100644
index 0000000..05e4427
--- /dev/null
+++ b/.claude/handoffs/2026-05-27-125002-agents-md-refactor-config-update.md
@@ -0,0 +1,111 @@
+# Handoff: [TASK_TITLE - replace this]
+
+## Session Metadata
+- Created: 2026-05-27 12:50:02
+- Project: /home/nexxicon/soar/agent-toolkit
+- Branch: main
+- Session duration: [estimate how long you worked]
+
+### Recent Commits (for context)
+  - 2afb3b1 feat(skills): add NIST 800-61r3 reviewer suite and project-manager skill
+  - 3027f20 docs: add CONTRIBUTING.md (#16)
+  - 62b5df5 fix(ci): skip dist/ validation on push to main
+  - 7e4ba85 chore: auto-bump version [skip ci]
+  - 0c042f2 feat: add lesson-learned skill (#11)
+
+## Handoff Chain
+
+- **Continues from**: None (fresh start)
+- **Supersedes**: None
+
+> This is the first handoff for this task.
+
+## Current State Summary
+
+[TODO: Write one paragraph describing what was being worked on, current status, and where things left off]
+
+## Codebase Understanding
+
+### Architecture Overview
+
+[TODO: Document key architectural insights discovered during this session]
+
+### Critical Files
+
+| File | Purpose | Relevance |
+|------|---------|-----------|
+| [TODO: Add critical files] | | |
+
+### Key Patterns Discovered
+
+[TODO: Document important patterns, conventions, or idioms found in this codebase]
+
+## Work Completed
+
+### Tasks Finished
+
+- [ ] [TODO: List completed tasks]
+
+### Files Modified
+
+| File | Changes | Rationale |
+|------|---------|-----------|
+| [no modified files detected] | | |
+
+### Decisions Made
+
+| Decision | Options Considered | Rationale |
+|----------|-------------------|-----------|
+| [TODO: Document key decisions] | | |
+
+## Pending Work
+
+### Immediate Next Steps
+
+1. [TODO: Most critical next action]
+2. [TODO: Second priority]
+3. [TODO: Third priority]
+
+### Blockers/Open Questions
+
+- [ ] [TODO: List any blockers or open questions]
+
+### Deferred Items
+
+- [TODO: Items deferred and why]
+
+## Context for Resuming Agent
+
+### Important Context
+
+[TODO: This is the MOST IMPORTANT section - write critical information the next agent MUST know]
+
+### Assumptions Made
+
+- [TODO: List assumptions made during this session]
+
+### Potential Gotchas
+
+- [TODO: Document things that might trip up a new agent]
+
+## Environment State
+
+### Tools/Services Used
+
+- [TODO: List relevant tools and their configuration]
+
+### Active Processes
+
+- [TODO: Note any running processes, servers, etc.]
+
+### Environment Variables
+
+- [TODO: List relevant env var NAMES only - NEVER include actual values/secrets]
+
+## Related Resources
+
+- [TODO: Add links to relevant docs and files]
+
+---
+
+**Security Reminder**: Before finalizing, run `validate_handoff.py` to check for accidental secret exposure.
diff --git a/Mindset-Navigation-Philosophy-template.md b/Mindset-Navigation-Philosophy-template.md
new file mode 100644
index 0000000..4648ba2
--- /dev/null
+++ b/Mindset-Navigation-Philosophy-template.md
@@ -0,0 +1,225 @@
+# Mindset / Navigation / Philosophy — Skill Header Template
+
+**Canonical standard for the opening sections of all agent-toolkit skills.**
+
+These three sections come FIRST in every `SKILL.md`, before any workflow content, quick-reference tables, or technical reference material. They are not optional boilerplate — they are the load-bearing contract between the skill and the agent that activates it.
+
+---
+
+## Why Three Sections
+
+Skills are activated by agents making autonomous routing decisions. Without explicit guidance, agents over-trigger on superficial keyword matches, under-trigger on ambiguous phrasing, and default to novice-level judgment when domain expertise is what the user actually needs. These three sections address each failure mode directly:
+
+| Section | Failure it prevents | What it provides |
+|---|---|---|
+| **Mindset** | Generic, novice-quality output | Expert stances and hard-won heuristics |
+| **Navigation** | Wrong-context activation | Explicit when-to-use and when-NOT-to-use gates |
+| **Philosophy** | Inconsistent decisions across the workflow | One overarching principle that resolves all trade-offs |
+
+---
+
+## Section 1: Mindset
+
+**Purpose:** Communicate what distinguishes expert thinking from novice thinking in this domain. These are not definitions — they are the stances an expert holds reflexively, things that took years of practice to internalize.
+
+**Format rules:**
+- 3–5 bullet heuristics, each starting with a bold "expert stance" phrase
+- Each bullet should read like a hard-won lesson, not a textbook summary
+- Do not pad to reach 5 bullets — 3 sharp heuristics beat 5 generic ones
+- Do not use sub-bullets; each heuristic stands alone
+
+**Fill-in template:**
+
+```markdown
+## Mindset
+
+- **[Expert stance phrase]** — [One to two sentences explaining why novices get this wrong and what the expert sees instead.]
+- **[Expert stance phrase]** — [One to two sentences explaining the counterintuitive insight or priority inversion an expert applies.]
+- **[Expert stance phrase]** — [One to two sentences on what the expert refuses to do or what trap they have learned to recognize immediately.]
+- **[Expert stance phrase]** — [Optional: a constraint or quality bar that experts enforce that novices skip.]
+- **[Expert stance phrase]** — [Optional: a systemic or long-term perspective that novices miss because they are focused on the immediate task.]
+```
+
+---
+
+## Section 2: Navigation
+
+**Purpose:** Tell the agent exactly when to activate this skill and — equally important — when NOT to activate it. Skills that never specify exclusion criteria get triggered in wrong contexts, producing off-target output and consuming context budget on irrelevant material.
+
+**Format rules:**
+- Include a "Use when" list with concrete trigger phrases or conditions
+- Include a "Do not use when" list — this is mandatory, not optional
+- Include a short decision tree or disambiguation block for the ambiguous middle cases
+- The "do not use when" cases must be specific enough to be actionable (not "when the topic is unrelated")
+
+**Fill-in template:**
+
+```markdown
+## Navigation
+
+**Use this skill when:**
+- [Specific trigger phrase or condition — e.g., "user asks to design a component library"]
+- [Another trigger — e.g., "user mentions tokens, theming, or visual consistency"]
+- [Another trigger — name the verbs, nouns, or intent signals that indicate this skill]
+
+**Do not use this skill when:**
+- [Exclusion case — e.g., "user wants a single one-off component, not a system"]
+- [Exclusion case — e.g., "scope is a bug fix or refactor, not new design work"]
+- [Exclusion case — the adjacent skill that covers this instead, and when to hand off to it]
+
+**Ambiguous inputs — quick decision tree:**
+- If the request mentions [signal A] → [route or action]
+- If the request mentions [signal B] → [route or action]
+- If unclear → [default behavior or clarifying question to ask]
+```
+
+---
+
+## Section 3: Philosophy
+
+**Purpose:** State the one core principle that drives every decision in this skill's domain. When two approaches are in tension, this principle is what breaks the tie. It is not a list of values — it is a single governing idea expressed in 1–3 sentences.
+
+**Format rules:**
+- Prose only — no bullet lists, no headers, no tables
+- 1–3 sentences maximum
+- Must be opinionated enough to actually resolve trade-offs
+- Avoid generic platitudes ("quality matters", "keep it simple") — the statement must be specific enough that it would lead to a different decision than the opposite stance would
+
+**Fill-in template:**
+
+```markdown
+## Philosophy
+
+[One to three sentences stating the governing principle of this skill's domain. This should be specific enough that a reader could use it to resolve a trade-off — e.g., it should tell you whether to favor flexibility or convention, speed or correctness, user control or sensible defaults. It is not a mission statement; it is a decision rule.]
+```
+
+---
+
+## Placement in SKILL.md
+
+The three sections appear immediately after the frontmatter and the skill's one-line description, before any workflow, process, or reference content:
+
+```markdown
+---
+name: skill-name
+description: One sentence. Trigger phrases. When to use.
+---
+
+# Skill Title
+
+Brief description of what the skill does.
+
+## Mindset
+
+...
+
+## Navigation
+
+...
+
+## Philosophy
+
+...
+
+---
+
+## [First workflow or reference section]
+```
+
+Do not insert quick-reference tables, "How It Works" sections, or script documentation before the three header sections. Agents read top-to-bottom: if stance and routing gates are buried below workflow prose, the agent is already past the activation decision by the time it encounters them.
+
+---
+
+## Worked Example 1: Design Skill (creative domain)
+
+**Skill:** `design-system-starter`
+
+```markdown
+## Mindset
+
+- **Tokens before components** — Novices reach for a Button component; experts reach for the color and spacing contract first. A component built on ad-hoc values is a component that breaks the moment the brand evolves.
+- **Consistency is a feature, not a constraint** — The value of a design system compounds only when deviation requires a deliberate exception. Every one-off shortcut is debt owed to the next designer who inherits the codebase.
+- **Accessibility is load-bearing, not decorative** — Contrast ratios and focus states are not a checklist appended at the end; they are structural decisions that determine whether the component architecture is valid. Design for the constraint first.
+- **The system serves the product, not the other way around** — A design system that blocks shipping is worse than no design system. Scope aggressively: establish the 20% of tokens and components that cover 80% of surfaces, then expand.
+
+## Navigation
+
+**Use this skill when:**
+- User asks to create, establish, or audit a design system or component library
+- User mentions design tokens, theming, color contracts, or spacing scales
+- User wants to enforce visual consistency across a multi-surface product
+- User asks for dark mode architecture or brand-level theming
+
+**Do not use this skill when:**
+- User wants a single component or a one-off UI element — use `react-dev` or `mui` instead
+- User is refactoring or debugging an existing component, not architecting a system
+- User wants a static mockup or prototype — this skill produces production-grade architecture, not wireframes
+- The product has one screen and no foreseeable reuse — over-engineering costs more than it saves
+
+**Ambiguous inputs — quick decision tree:**
+- If user says "component library" with no mention of tokens or theming → clarify whether they want a full system or a set of isolated components
+- If user says "design system" for a solo/prototype project → proceed but scope to minimal tokens + 3–5 components, flag when to stop
+- If unclear → ask: "Are you building something reusable across multiple surfaces, or is this for a single product?"
+
+## Philosophy
+
+A design system is a bet that the cost of the constraints it imposes now is lower than the cost of the inconsistency it prevents later. Every architectural decision in this skill optimizes for the long-term maintainability of that bet — not for the fastest path to a visible result.
+```
+
+---
+
+## Worked Example 2: Technical Skill (API integration domain)
+
+**Skill:** `openapi-to-typescript`
+
+```markdown
+## Mindset
+
+- **The spec is the source of truth, not the implementation** — If the OpenAPI spec says a field is optional but the server always returns it, the spec is what you type against. The server can change; the contract should not.
+- **Generate, do not hand-write** — Hand-written types drift. The moment a developer manually edits a generated type, that type becomes a maintenance liability. Generate, validate, regenerate.
+- **Discriminated unions over optional fields** — A response type with five optional fields and no discriminant is not a type; it is a guess. Push back on schemas that conflate multiple response shapes into a single flat object.
+- **Treat 4xx and 5xx as first-class types** — Error responses are not exceptions to be caught; they are values to be typed. A client that does not model error shapes will produce runtime surprises that a type system could have caught at build time.
+- **Version the contract, not the workaround** — When the API changes, update the spec and regenerate. Do not add conditional logic to paper over a breaking change in the generated types.
+
+## Navigation
+
+**Use this skill when:**
+- User has an OpenAPI 3.x or Swagger 2.0 (OpenAPI 2.0) spec and wants TypeScript types
+- User asks to generate a typed API client from a spec file
+- User wants to validate that their TypeScript client matches a remote API contract
+- User mentions `openapi-typescript`, `swagger-codegen`, or `orval` in context
+
+**Do not use this skill when:**
+- User is writing a REST client by hand with no spec — use `react-dev` patterns instead
+- The spec is internal and already has a generated SDK maintained by another team — consume that SDK, do not regenerate
+- User wants GraphQL types — different toolchain, different skill
+- User wants to *write* an OpenAPI spec from scratch — this skill consumes specs, it does not author them
+
+**Ambiguous inputs — quick decision tree:**
+- If user pastes a JSON/YAML blob → check for `openapi:` or `swagger:` key; if present, proceed; if absent, ask what format it is
+- If user says "generate types for my API" with no spec → ask for the spec URL or file before proceeding
+- If the spec has no `components.schemas` (3.x) or `definitions` (2.0) → warn that output will be limited to endpoint-level types only
+
+## Philosophy
+
+Type safety at the API boundary is only as strong as the discipline to keep the generated types current. This skill treats regeneration as a routine operation and manual type editing as a red flag — the goal is a workflow where the TypeScript compiler enforces the API contract automatically, without ongoing human maintenance.
+```
+
+---
+
+## Common Mistakes to Avoid
+
+**In Mindset:**
+- Listing definitions ("Tokens are reusable values") instead of stances ("Tokens before components")
+- Padding with obvious advice that applies to any domain ("test your work", "read the docs")
+- Exceeding 5 bullets — more heuristics signal less curation
+
+**In Navigation:**
+- Omitting the "do not use when" block — this is the most common and most costly omission
+- Writing exclusions too vaguely ("when not relevant") — name the adjacent skill or the specific condition
+- Skipping the ambiguous-inputs block — ambiguity is where wrong-context activation happens most
+
+**In Philosophy:**
+- Writing a list formatted as prose ("First do X, then Y, then Z")
+- Using generic language that would be true for any skill ("quality matters", "think about the user")
+- Stating a process instead of a principle — philosophy answers "why", not "how"
diff --git a/NEVER-list-template.md b/NEVER-list-template.md
new file mode 100644
index 0000000..43c2114
--- /dev/null
+++ b/NEVER-list-template.md
@@ -0,0 +1,101 @@
+# NEVER List Template — agent-toolkit Canonical Standard
+
+## Purpose
+
+NEVER lists encode hard-earned practitioner knowledge about failure modes that are not obvious from documentation, type signatures, or surface-level reading of code. They exist because some mistakes are silent, some consequences are delayed, and some anti-patterns look correct until they destroy something in production.
+
+A vague NEVER item ("NEVER do X — it causes errors") is nearly worthless. The value is in the mechanism: WHY does it fail, and HOW does it fail in a way that would surprise a competent engineer? NEVER items must name the non-obvious consequence — the corrupted state, the silent discard, the race condition, the phantom retry — so the reader understands the failure mode, not just the prohibition.
+
+## Format Standard
+
+Each NEVER item must contain two parts:
+
+1. **The specific thing to avoid** — not a category, not a vague verb. Name the exact action, flag, parameter pattern, call sequence, or assumption.
+2. **The non-obvious reason WHY** — not "it causes errors" or "it breaks things." Explain the mechanism: what gets corrupted, what gets silently ignored, what downstream system misreads the output, what timing assumption breaks.
+
+Template structure:
+
+```
+- NEVER [specific action or pattern] — because [precise mechanism of failure from practitioner experience]
+```
+
+Both parts are required. An item missing either part must be revised before it is accepted into a SKILL.md.
+
+## Minimum Requirement
+
+Every `SKILL.md` in the agent-toolkit library **must contain a dedicated `## NEVER` section with at least 5 items** that meet the format standard above.
+
+Skills with fewer than 5 items, items missing the WHY clause, or items with vague WHY clauses ("it may fail," "it could cause issues") do not meet the standard and must be updated before the skill is considered complete.
+
+## Placement
+
+The `## NEVER` section belongs **near the top of `SKILL.md`, before the first workflow section** (before `## Usage`, `## Steps`, `## Workflow`, or equivalent). Practitioners reading a new skill should encounter failure modes before they encounter instructions — the NEVER list is a pre-flight checklist, not an appendix.
+
+Recommended order in `SKILL.md`:
+
+```
+# Skill Name
+[one-line description]
+
+## NEVER
+[5+ items]
+
+## [First workflow or usage section]
+...
+```
+
+## Fill-In Template
+
+Copy this block into a new `SKILL.md` and replace each placeholder:
+
+```markdown
+## NEVER
+
+- NEVER [specific action] — because [non-obvious consequence: what breaks, how it breaks, what the failure looks like]
+- NEVER [specific action] — because [non-obvious consequence: what breaks, how it breaks, what the failure looks like]
+- NEVER [specific action] — because [non-obvious consequence: what breaks, how it breaks, what the failure looks like]
+- NEVER [specific action] — because [non-obvious consequence: what breaks, how it breaks, what the failure looks like]
+- NEVER [specific action] — because [non-obvious consequence: what breaks, how it breaks, what the failure looks like]
+```
+
+Add additional items beyond 5 whenever field experience surfaces a new failure mode.
+
+## Examples: GOOD vs BAD
+
+### Example 1 — Git operations in shallow clones
+
+**BAD (vague):**
+> NEVER amend commits in shallow clones — it may cause push failures.
+
+**GOOD (specific + non-obvious mechanism):**
+> NEVER run `git commit --amend` inside a `--depth 1` shallow clone — because amend rewrites the tip commit without the full ancestry graph present, producing an orphan commit whose SHA is unknown to the remote; a subsequent `push --force` then overwrites the remote branch with a history that begins at the orphan, permanently discarding all prior commits that no local ref still points to.
+
+---
+
+### Example 2 — Airtable field ID substitution
+
+**BAD (vague):**
+> NEVER use field names instead of field IDs — it will not work.
+
+**GOOD (specific + non-obvious mechanism):**
+> NEVER substitute a human-readable field name (e.g., `"Status"`) for an Airtable field ID in filter or sort parameters — because the API silently accepts the name without error but treats it as an unknown field, returning an empty result set rather than raising a validation error; the caller receives zero records and has no indication that the filter was discarded rather than applied.
+
+---
+
+### Example 3 — Mermaid diagram rendering flags
+
+**BAD (vague):**
+> NEVER pass unsupported flags to mmdc — it causes rendering to fail.
+
+**GOOD (specific + non-obvious mechanism):**
+> NEVER pass `--no-sandbox` as a direct CLI flag to mmdc v11.x — because mmdc does not recognize it as a top-level argument and silently drops it, causing Puppeteer to launch with the sandbox enabled inside a rootless container, which then crashes with a cryptic `SIGILL` or permission error that appears to be a Chromium binary fault rather than a flag-parsing issue; the correct mechanism is to pass sandbox args through a Puppeteer config JSON file via the `-p` flag (e.g., `-p puppeteer.json` where the file contains `{"args":["--no-sandbox","--disable-setuid-sandbox"]}`).
+
+---
+
+## Checklist Before Submitting a NEVER Item
+
+- [ ] The action described is specific enough that a reader cannot misinterpret which action is prohibited
+- [ ] The WHY clause names a mechanism, not just an outcome
+- [ ] The failure mode would surprise a competent engineer who had not encountered it before
+- [ ] The item was derived from actual practitioner experience or a documented incident, not a general heuristic
+- [ ] The item is in the `## NEVER` section, placed before the first workflow section in the file
diff --git a/README-policy.md b/README-policy.md
new file mode 100644
index 0000000..1060297
--- /dev/null
+++ b/README-policy.md
@@ -0,0 +1,71 @@
+# README Policy
+
+## The Core Test
+
+Before adding any content to `README.md`, ask:
+
+> "Would a human user need this to set up the skill, or does an agent need this to execute a task?"
+
+- If a **human user** needs it to install, configure, or get started: it belongs in `README.md`.
+- If an **agent** needs it to understand behavior, follow a workflow, or produce correct output: it belongs in `SKILL.md` or `references/`.
+
+README.md is documentation for humans. SKILL.md is the instruction set for the agent. They must not duplicate each other.
+
+## What README.md Must Contain
+
+README.md must include only content in these four categories:
+
+1. **Installation steps** — how to add the skill to Claude Code, what files to copy, what commands to run.
+2. **Configuration** — environment variables, API keys, settings.json entries, file paths that need to be updated.
+3. **Quick-start** — a minimal example showing the skill working end-to-end, intended to confirm setup succeeded.
+4. **Changelog** — a record of significant changes across versions, for humans tracking upgrades.
+
+If a section does not fit one of these four categories, it does not belong in `README.md`.
+
+## What README.md Must NOT Contain
+
+Do not put any of the following in `README.md`:
+
+- Guidance that duplicates `SKILL.md` — if the skill body already explains how to use a feature, README must not re-explain it.
+- Examples that Claude needs to follow during task execution — these are agent instructions and belong in `SKILL.md`.
+- Workflow steps, decision logic, or output format rules — these are agent instructions.
+- Capability lists intended to help the agent understand what it can do — these belong in `SKILL.md`.
+- Reference tables, lookup data, or taxonomy lists — these belong in `references/`.
+- Any content written in the second person and addressed to "you", where "you" means the agent rather than the human reader.
+
+## When to Delete README.md
+
+If a skill has no meaningful installation steps, no configuration requirements, and no environment variables to set, delete the `README.md` entirely.
+
+A README that says only "This skill does X" or "Ask Claude to do Y" adds no value and creates a maintenance burden. Remove it.
+
+Signs a README should be deleted:
+- The entire README is a description of what the skill does (that description belongs in `SKILL.md` as the purpose statement).
+- The README has no install section or the install section says only "drop the file in the plugins folder."
+- The README duplicates the first paragraph of `SKILL.md` word-for-word.
+- The README was generated to satisfy a template and contains no real human-facing content.
+
+## README vs SKILL.md — Side by Side
+
+| Content | README.md | SKILL.md |
+|---|---|---|
+| Installation command | Yes | No |
+| API key setup | Yes | No |
+| settings.json entry | Yes | No |
+| Quick-start invocation example | Yes | No |
+| Changelog | Yes | No |
+| Workflow steps | No | Yes |
+| Output format rules | No | Yes |
+| Capability description | No | Yes |
+| Examples Claude must follow | No | Yes |
+| Loading triggers for references/ | No | Yes |
+
+## Enforcement
+
+When reviewing a skill's files, flag any README.md that:
+- Contains a "Usage" section with workflow guidance.
+- Contains a "Features" or "Capabilities" section.
+- Is longer than the skill's `SKILL.md`.
+- Contains language like "Claude will..." or "The agent should..." — these are agent instructions and belong in `SKILL.md`.
+
+Any flagged content should be moved to `SKILL.md` or deleted if it already exists there.
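+
+The checks above lend themselves to a small lint script. The following is a minimal sketch in Python, not an existing tool: the function name `flag_readme` and the flag labels are hypothetical, and the heading match is a heuristic that flags any "Usage" heading without judging whether it actually contains workflow guidance.
+
+```python
+import re
+
+# Headings and phrases taken from the enforcement list above.
+USAGE_HEADING = re.compile(r"^#+\s*usage\b", re.IGNORECASE | re.MULTILINE)
+FEATURES_HEADING = re.compile(r"^#+\s*(features|capabilities)\b", re.IGNORECASE | re.MULTILINE)
+AGENT_PHRASES = re.compile(r"\b(Claude will|The agent should)\b", re.IGNORECASE)
+
+
+def flag_readme(readme_text: str, skill_text: str) -> list[str]:
+    """Return labels for the enforcement rules a README appears to violate."""
+    flags = []
+    if USAGE_HEADING.search(readme_text):
+        flags.append("usage-section")
+    if FEATURES_HEADING.search(readme_text):
+        flags.append("features-or-capabilities-section")
+    # Compare line counts, since the policy says "longer than SKILL.md".
+    if len(readme_text.splitlines()) > len(skill_text.splitlines()):
+        flags.append("longer-than-skill-md")
+    if AGENT_PHRASES.search(readme_text):
+        flags.append("agent-directed-language")
+    return flags
+```
+
+A reviewer would still need to read each flagged section before moving or deleting anything; the script only narrows down where to look.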
diff --git a/error-handling-stub-template.md b/error-handling-stub-template.md
new file mode 100644
index 0000000..4786f35
--- /dev/null
+++ b/error-handling-stub-template.md
@@ -0,0 +1,193 @@
+# Error Handling Stub Template for SKILL.md
+
+A reference for skill authors on why error handling matters, what the minimum requirements are, and how to write error sections that actually help agents recover rather than stall.
+
+---
+
+## Why Error Handling Matters in Skills
+
+Agents are stateless within a step. When a skill's workflow hits an unexpected state (a missing API key, an ambiguous command argument, a tool that returns no output), the agent has no built-in mechanism to recover. It will do one of three things:
+
+1. Loop: re-attempt the same failing action with the same parameters, indefinitely.
+2. Stall: stop and ask the user for clarification, producing an unhelpful open-ended question.
+3. Hallucinate a recovery: invent a plausible-looking next step that makes the situation worse.
+
+A skill without documented error paths is a skill that breaks silently in production. The skill author is the only person who knows what "success" looks like and therefore the only person who can define what "failure" looks like and what to do about it.
+
+**Error handling in a SKILL.md is not optional documentation. It is the contract between the skill and every agent that runs it.**
+
+---
+
+## Minimum Requirement
+
+Every skill that defines a workflow (a numbered sequence of steps) MUST document at least one error path. The error path must specify:
+
+- What the failure condition looks like (observable signal, not a feeling)
+- Why it typically occurs (root cause, not symptom)
+- What the agent should do next (a concrete action, not "try again")
+
+A skill with zero error paths will be rejected in review.
+
+---
+
+## Stub Template
+
+Copy this block into any `SKILL.md` that has a workflow section. Replace bracketed placeholders with real content before submitting.
+
+```markdown
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery Action |
+|-----------|-------------|-----------------|
+| [most common tool/command failure] | [why it happens — missing dep, wrong env, bad arg] | [exact next step — which flag to change, which env var to set, which fallback to use] |
+| [ambiguous input state] | [why the input is ambiguous — two valid interpretations, missing required field] | [decision rule — how to pick between options without asking the user] |
+| [partial completion — some steps succeeded, then it stopped] | [why partial completion happens — rate limit, auth expiry mid-run, missing downstream dep] | [how to resume — idempotency check, which step to restart from, what state to verify first] |
+```
+
+Minimum: one row per failure mode you have personally observed or can reason about from the workflow. Three rows is a reasonable floor for any non-trivial skill.
+
+---
+
+## Three Failure Categories
+
+Treat these as distinct. Conflating them produces recovery actions that do not match the actual problem.
+
+### (a) Tool / Command Failures
+
+A tool returned a non-zero exit code, an HTTP error, or no output when output was expected.
+
+Examples: `npx` fails because the package is not installed; a CLI returns `401 Unauthorized`; a script exits with `command not found`.
+
+Recovery pattern: check the prerequisite (is the tool installed, is the auth valid, is the argument correctly formed), fix that specific thing, re-run the exact same step. Do not skip to the next step in the workflow.
+
+### (b) Ambiguous Input States
+
+The agent received input that matches two or more valid interpretations and cannot proceed without choosing one.
+
+Examples: the user said "deploy" without specifying environment; the config file has two entries with the same name; a search returns 40 results with no clear best match.
+
+Recovery pattern: apply a decision rule documented in the skill (prefer staging over production; use the most recently modified entry; pick the result with the highest relevance score). Only escalate to the user if the decision rule cannot resolve the ambiguity. When escalating, present the two options explicitly — do not ask an open-ended question.
+
+### (c) Partial Completions
+
+Some steps in the workflow completed successfully before the failure. The system is now in a mixed state: some resources exist, some do not.
+
+Examples: three of five files were uploaded before a rate limit hit; a database migration ran the first two steps before auth expired; a commit was created locally but not pushed when the network dropped.
+
+Recovery pattern: before re-running, verify what already completed (check file existence, query state, read a log). Resume from the first incomplete step, not from the beginning. Document which steps are idempotent (safe to re-run) and which are not (create operations that would duplicate records).
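+
+The recovery pattern above can be sketched as a resumable runner. This is an illustrative Python sketch under stated assumptions: `Step`, `is_done`, and `resume` are hypothetical names, and every step is assumed to have a read-only completion check (file exists, record present, log entry found).
+
+```python
+from typing import Callable, NamedTuple
+
+
+class Step(NamedTuple):
+    name: str
+    is_done: Callable[[], bool]  # verifies state only; must not change anything
+    run: Callable[[], None]      # performs the step
+
+
+def resume(steps: list[Step]) -> list[str]:
+    """Run only the steps whose completion check fails; return the names that ran."""
+    ran = []
+    for step in steps:
+        if step.is_done():
+            continue  # completed in an earlier attempt, so skip it
+        step.run()
+        ran.append(step.name)
+    return ran
+```
+
+Checking completion before every step means a non-idempotent create operation is never repeated, at the cost of one extra state query per step.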
+
+---
+
+## Worked Examples
+
+### Example 1: Tool Skill (Datadog CLI)
+
+A tool skill wraps a CLI or API. Failures are usually environment or network failures.
+
+```markdown
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery Action |
+|-----------|-------------|-----------------|
+| `Error: Missing DD_API_KEY environment variable` | API key not exported in the current shell session | Run `export DD_API_KEY="..."` and `export DD_APP_KEY="..."` before retrying. Keys are at https://app.datadoghq.com/organization-settings/api-keys |
+| Query returns zero results when errors are expected | Time window is too narrow or query syntax uses wrong field name | Widen `--from` to `4h`, then to `24h`. If still empty, verify field names with `npx @leoflores/datadog-cli logs search --query "*" --from 15m` and compare field names in the output |
+| `npx` hangs or times out on first run | Large package download on slow connection | Kill the process, run `npx @leoflores/datadog-cli --version` once to cache the package, then retry the original command |
+| Non-US site returns 403 | Default site is `datadoghq.com`; account is on EU or another region | Add `--site datadoghq.eu` (or the correct regional site) to every command |
+```
+
+### Example 2: Creative Skill (Domain Name Brainstormer)
+
+A creative skill has no external tool failures but does have ambiguous input and quality failures.
+
+```markdown
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery Action |
+|-----------|-------------|-----------------|
+| All suggested domains are taken | Common keywords are heavily registered | Pivot strategy: use compound words, portmanteaus, or invented words. Ask the user for 2-3 adjectives that describe the brand feeling (not the product) and generate a second pass |
+| User says "I don't like any of these" without elaboration | Suggestions missed an unstated constraint (length, tone, language) | Ask exactly two clarifying questions: maximum character count, and one word that captures the brand's personality. Do not regenerate until both are answered |
+| Availability check is inconclusive (WHOIS timeout) | WHOIS rate limiting or DNS propagation delay | Mark those domains as "unverified" and recommend the user check availability directly with a registrar. Do not present them as available |
+| User provides a name in a language other than English | Transliteration or cultural meaning may be undesirable | Flag potential issues (homonyms, negative connotations in target markets) before presenting the name as a recommendation |
+```
+
+### Example 3: Process Skill (Commit Work)
+
+A process skill orchestrates multi-step Git operations. Partial completions are the dominant failure mode.
+
+```markdown
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery Action |
+|-----------|-------------|-----------------|
+| `git add -p` exits with no hunks selected | File has no unstaged changes or was already fully staged | Run `git status` and `git diff --cached` to confirm current state. If the file is already staged, proceed to the review step |
+| Pre-commit hook fails after staging | Hook detected lint errors, secrets, or test failures | Do NOT use `--no-verify`. Read the hook output, fix the specific violation, re-stage only the affected files, and retry the commit. Do not amend the previous commit if it has already been pushed to a remote branch |
+| Commit message is rejected by hook (subject too long, wrong type) | Project enforces Conventional Commits format with a subject length limit | Rewrite the subject line to `type(scope): short summary` under 72 characters. Common types: `feat`, `fix`, `chore`, `docs`, `refactor`, `test` |
+| `git diff --cached` shows unexpected files | Earlier patch staging accidentally included an unrelated hunk | Run `git restore --staged <path>` for the unexpected file, then re-run `git add -p` for that file only |
+```
+
+---
+
+## NEVER Patterns
+
+These patterns consistently cause agent loops or security failures. Do not use them in any skill.
+
+### NEVER: "Try again" without specifying what to change
+
+Bad:
+```
+If the command fails, try again with different parameters.
+```
+
+Why it fails: "different" is undefined. The agent will retry with identical parameters, or guess a change that makes the situation worse.
+
+Good:
+```
+If the command returns exit code 1, add the `--verbose` flag to see the full error, then fix the specific error shown before retrying.
+```
+
+### NEVER: Authentication failures without a documented fallback path
+
+Bad:
+```
+If authentication fails, check your credentials.
+```
+
+Why it fails: the agent will attempt to "check credentials" by reading environment variables, config files, and potentially trying alternative credential sources — exposing secrets in the process. It will not know when to stop.
+
+Good:
+```
+If authentication returns 401:
+1. Confirm the env var is set: `echo ${#MY_API_KEY}` (prints the length only; do not print the value).
+2. If the variable is empty, stop and ask the user to set it. Do not attempt to retrieve or infer the key.
+3. If the variable is set but auth still fails, the key may be expired or revoked. Direct the user to the credential management page (URL here). Do not try alternative keys.
+```
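+
+Steps 1 and 2 of the procedure above can be carried out without ever echoing the secret. The following is a minimal Python sketch, assuming the credential lives in an environment variable; `credential_status` is a hypothetical helper name, and `MY_API_KEY` is the placeholder used above.
+
+```python
+import os
+
+
+def credential_status(var_name: str) -> str:
+    """Report whether a credential variable is set, without exposing its value."""
+    value = os.environ.get(var_name, "")
+    if not value:
+        # Stop condition: do not look for the key anywhere else.
+        return f"{var_name} is not set: stop and ask the user to set it"
+    # Only the length is reported, never the value itself.
+    return f"{var_name} is set ({len(value)} characters)"
+```
+
+Returning a message instead of the value keeps the secret out of logs and transcripts while still telling the agent which branch of the procedure applies.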
+
+### NEVER: Open-ended escalation questions
+
+Bad:
+```
+If unsure how to proceed, ask the user.
+```
+
+Why it fails: the agent produces a vague question ("How should I proceed?") that the user cannot usefully answer without already knowing the state of the workflow.
+
+Good:
+```
+If the ambiguity cannot be resolved by the decision rules above, stop and present the user with exactly two options in this format:
+"I found two matches: [option A] and [option B]. Which should I use?"
+Do not proceed until the user answers.
+```
+
+---
+
+## Checklist for Skill Authors
+
+Before submitting a skill with a workflow section, confirm:
+
+- [ ] At least one row in the "When Things Go Wrong" table
+- [ ] Every row names a concrete observable failure signal (not "something goes wrong")
+- [ ] Every recovery action specifies what to do, not just what the problem is
+- [ ] Authentication failures have a documented stop condition (agent will not try to discover credentials)
+- [ ] Partial completion steps identify which operations are idempotent
+- [ ] No row uses "try again" without a specific parameter or condition change
+- [ ] Ambiguous input states have a decision rule, not an open-ended user question
diff --git a/references-directory-convention.md b/references-directory-convention.md
new file mode 100644
index 0000000..a2e2259
--- /dev/null
+++ b/references-directory-convention.md
@@ -0,0 +1,135 @@
+# References Directory Convention
+
+## Purpose
+
+The `references/` directory holds supplementary content that a skill loads on demand rather than on every invocation. It exists to keep `SKILL.md` focused and concise while still making deep reference material available when needed.
+
+## When to Use references/
+
+Create a `references/` directory when:
+
+- The skill's `SKILL.md` would exceed 300 lines if all content were inlined.
+- Content is only needed for a subset of task types (e.g., a lookup table used only during threat enrichment, not during triage).
+- Reference material is stable and unlikely to change with every skill revision (e.g., field mappings, taxonomy tables, error code lists).
+
+Do NOT create a `references/` file just to be tidy. If the content is always needed for the skill to work correctly, it belongs in `SKILL.md` directly.
+
+## File Naming
+
+- Use kebab-case for all reference file names.
+- Use the `.md` extension.
+- Name files after their content, not their audience.
+
+Good:
+
+```
+references/field-mapping-table.md
+references/severity-levels.md
+references/alert-triage-workflow.md
+```
+
+Bad:
+
+```
+references/ref1.md
+references/Agent_Reference.md
+references/extra_stuff.md
+```
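+
+The kebab-case and extension rules can be checked mechanically. The following is a small Python sketch; `is_valid_reference_name` is a hypothetical helper, and the pattern assumes lowercase ASCII letters and digits only. A regex cannot judge whether a name describes its content, so an uninformative name such as `ref1.md` still passes.
+
+```python
+import re
+
+# Lowercase words separated by single hyphens, ending in ".md".
+KEBAB_MD = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*\.md")
+
+
+def is_valid_reference_name(filename: str) -> bool:
+    """Check the kebab-case and .md rules; does not judge descriptiveness."""
+    return KEBAB_MD.fullmatch(filename) is not None
+```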
+
+## Writing Loading Triggers
+
+Loading triggers are instructions inside `SKILL.md` that tell the agent when to read a specific reference file. Every loading trigger must be:
+
+1. Clearly marked as mandatory or conditional.
+2. Tied to a specific task or condition — never vague.
+3. Placed near the task description it governs, not at the bottom of the file.
+
+### Mandatory Trigger Format
+
+Use this format when the reference must always be read before a specific task:
+
+```
+MANDATORY — read references/field-mapping-table.md before proceeding with alert normalization.
+```
+
+### Conditional Trigger Format
+
+Use this format when the reference is only needed under certain conditions:
+
+```
+If the alert source is "Crowdstrike", read references/crowdstrike-field-map.md before mapping fields.
+```
+
+Do not write triggers that say "read this for more information" — that is browsing guidance, not an execution instruction.
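+
+A trigger is only useful if the file it names exists. One possible check, sketched in Python under the assumption that triggers always spell out the path as `references/<name>.md`, is to collect every such path mentioned in `SKILL.md` and compare it against the files actually present. `missing_references` is a hypothetical name.
+
+```python
+import re
+
+# Matches paths such as references/field-mapping-table.md inside prose.
+REF_PATH = re.compile(r"references/[A-Za-z0-9._-]+\.md")
+
+
+def missing_references(skill_text: str, existing_files: set[str]) -> list[str]:
+    """Return the referenced paths that are not in the set of existing files."""
+    mentioned = sorted(set(REF_PATH.findall(skill_text)))
+    return [path for path in mentioned if path not in existing_files]
+```
+
+The set of existing files is passed in rather than read from disk so the check stays easy to test; a caller might build it by listing the skill's `references/` directory.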
+
+## Writing "Do NOT Load" Guidance
+
+When a reference file exists but should not be loaded outside its specific context, say so explicitly in `SKILL.md`:
+
+```
+Do NOT load references/full-taxonomy-table.md during triage. Only load it during classification tasks.
+```
+
+This prevents the agent from loading reference files speculatively or out of order.
+
+## Always-Loaded vs On-Demand Content
+
+| Content Type | Where It Lives |
+|---|---|
+| Skill purpose and scope | `SKILL.md` |
+| Core workflow steps | `SKILL.md` |
+| Output format requirements | `SKILL.md` |
+| Key rules and constraints | `SKILL.md` |
+| Lookup tables (large) | `references/` |
+| Field mapping tables | `references/` |
+| Taxonomy or classification lists | `references/` |
+| Sub-workflows for specific task types | `references/` |
+| Error code reference tables | `references/` |
+
+The rule: if the agent cannot start a task correctly without reading it, it goes in `SKILL.md`. If the agent only needs it mid-task for a specific subtask, it goes in `references/`.
+
+## Good vs Bad Progressive Disclosure Structure
+
+### Bad — Everything Inlined
+
+```
+SKILL.md (620 lines)
+  - Skill purpose (10 lines)
+  - Core workflow (40 lines)
+  - Full ATT&CK technique list (200 lines)
+  - All severity mappings for 8 platforms (180 lines)
+  - Error handling for 60 edge cases (190 lines)
+```
+
+The agent reads all 620 lines on every invocation, even when it only needs the core workflow.
+
+### Good — Progressive Disclosure
+
+```
+SKILL.md (90 lines)
+  - Skill purpose (10 lines)
+  - Core workflow (40 lines)
+  - Loading triggers pointing to references/ (5 lines)
+  - Output format (15 lines)
+  - Key rules (20 lines)
+
+references/
+  attack-technique-list.md     (200 lines — loaded during classification)
+  severity-platform-map.md     (180 lines — loaded during severity scoring)
+  edge-case-error-handling.md  (190 lines — loaded when errors are encountered)
+```
+
+The agent reads `SKILL.md` every time, then reads only the reference files relevant to the current task.
+
+## Directory Layout Example
+
+```
+my-skill/
+  SKILL.md
+  references/
+    field-mapping-table.md
+    severity-levels.md
+    alert-triage-workflow.md
+```
+
+Reference files have no required internal structure, but they should start with a one-line purpose statement so the agent can confirm it loaded the right file.
diff --git a/skills/agent-md-refactor/SKILL.md b/skills/agent-md-refactor/SKILL.md
index d4ee2b5..1a08d8b 100644
--- a/skills/agent-md-refactor/SKILL.md
+++ b/skills/agent-md-refactor/SKILL.md
@@ -1,287 +1,95 @@
 ---
 name: agent-md-refactor
-description: Refactor bloated AGENTS.md, CLAUDE.md, or similar agent instruction files to follow progressive disclosure principles. Splits monolithic files into organized, linked documentation.
+description: Refactor bloated AGENTS.md, CLAUDE.md, COPILOT.md, or similar agent instruction files. Splits monolithic files into organized, linked documentation using progressive disclosure. Use when: "refactor my AGENTS.md", "my CLAUDE.md is too long", "split my agent instructions", "organize my agent config", "progressive disclosure for my instructions", "clean up my CLAUDE.md".
 license: MIT
 ---
 
 # Agent MD Refactor
 
-Refactor bloated agent instruction files (AGENTS.md, CLAUDE.md, COPILOT.md, etc.) to follow **progressive disclosure principles** - keeping essentials at root and organizing the rest into linked, categorized files.
-
----
-
-## Triggers
-
-Use this skill when:
-- "refactor my AGENTS.md" / "refactor my CLAUDE.md"
-- "split my agent instructions"
-- "organize my CLAUDE.md file"
-- "my AGENTS.md is too long"
-- "progressive disclosure for my instructions"
-- "clean up my agent config"
-
----
-
-## Quick Reference
-
-| Phase | Action | Output |
-|-------|--------|--------|
-| 1. Analyze | Find contradictions | List of conflicts to resolve |
-| 2. Extract | Identify essentials | Core instructions for root file |
-| 3. Categorize | Group remaining instructions | Logical categories |
-| 4. Structure | Create file hierarchy | Root + linked files |
-| 5. Prune | Flag for deletion | Redundant/vague instructions |
-
----
-
-## Process
-
-### Phase 1: Find Contradictions
-
-Identify any instructions that conflict with each other.
-
-**Look for:**
-- Contradictory style guidelines (e.g., "use semicolons" vs "no semicolons")
-- Conflicting workflow instructions
-- Incompatible tool preferences
-- Mutually exclusive patterns
-
-**For each contradiction found:**
-```markdown
-## Contradiction Found
-
-**Instruction A:** [quote]
-**Instruction B:** [quote]
-
-**Question:** Which should take precedence, or should both be conditional?
-```
-
-Ask the user to resolve before proceeding.
-
----
-
-### Phase 2: Identify the Essentials
-
-Extract ONLY what belongs in the root agent file. The root should be minimal - information that applies to **every single task**.
-
-**Essential content (keep in root):**
-| Category | Example |
-|----------|---------|
-| Project description | One sentence: "A React dashboard for analytics" |
-| Package manager | Only if not npm (e.g., "Uses pnpm") |
-| Non-standard commands | Custom build/test/typecheck commands |
-| Critical overrides | Things that MUST override defaults |
-| Universal rules | Applies to 100% of tasks |
-
-**NOT essential (move to linked files):**
-- Language-specific conventions
-- Testing guidelines
-- Code style details
-- Framework patterns
-- Documentation standards
-- Git workflow details
-
 ---
 
-### Phase 3: Group the Rest
+## Mindset
 
-Organize remaining instructions into logical categories.
+**Platform determines structure.** Progressive disclosure (linked files) only works for Claude Code. Claude.ai Projects, Copilot, Cursor, and Aider all ignore linked files, so splitting the instruction file actively breaks those setups. Confirm platform before touching anything.
 
-**Common categories:**
-| Category | Contents |
-|----------|----------|
-| `typescript.md` | TS conventions, type patterns, strict mode rules |
-| `testing.md` | Test frameworks, coverage, mocking patterns |
-| `code-style.md` | Formatting, naming, comments, structure |
-| `git-workflow.md` | Commits, branches, PRs, reviews |
-| `architecture.md` | Patterns, folder structure, dependencies |
-| `api-design.md` | REST/GraphQL conventions, error handling |
-| `security.md` | Auth patterns, input validation, secrets |
-| `performance.md` | Optimization rules, caching, lazy loading |
+**Read the entire file before splitting.** Contradictions between line 12 and line 340 are invisible if you scan and split incrementally. A contradiction you miss becomes an agent inconsistency you can't debug later.
 
-**Grouping rules:**
-1. Each file should be self-contained for its topic
-2. Aim for 3-8 files (not too granular, not too broad)
-3. Name files clearly: `{topic}.md`
-4. Include only actionable instructions
+**Deletion destroys institutional knowledge.** "Write clean code" is safe to delete. "Never use mutable default arguments in our config loader" looks like generic advice but encodes a real incident. Only delete universally-obvious defaults — when in doubt, keep.
 
----
-
-### Phase 4: Create the File Structure
-
-**Output structure:**
-```
-project-root/
-├── CLAUDE.md (or AGENTS.md)     # Minimal root with links
-└── .claude/                      # Or docs/agent-instructions/
-    ├── typescript.md
-    ├── testing.md
-    ├── code-style.md
-    ├── git-workflow.md
-    └── architecture.md
-```
-
-**Root file template:**
-```markdown
-# Project Name
-
-One-sentence description of the project.
-
-## Quick Reference
-
-- **Package Manager:** pnpm
-- **Build:** `pnpm build`
-- **Test:** `pnpm test`
-- **Typecheck:** `pnpm typecheck`
-
-## Detailed Instructions
-
-For specific guidelines, see:
-- [TypeScript Conventions](.claude/typescript.md)
-- [Testing Guidelines](.claude/testing.md)
-- [Code Style](.claude/code-style.md)
-- [Git Workflow](.claude/git-workflow.md)
-- [Architecture Patterns](.claude/architecture.md)
-```
-
-**Each linked file template:**
-```markdown
-# {Topic} Guidelines
+**The "every task" test is the only categorization rule that matters.** If an instruction only applies to 30% of tasks, it doesn't belong in the root file. Not "is it important" — is it relevant to 100% of tasks.
 
-## Overview
-Brief context for when these guidelines apply.
-
-## Rules
-
-### Rule Category 1
-- Specific, actionable instruction
-- Another specific instruction
-
-### Rule Category 2
-- Specific, actionable instruction
-
-## Examples
-
-### Good
-\`\`\`typescript
-// Example of correct pattern
-\`\`\`
-
-### Avoid
-\`\`\`typescript
-// Example of what not to do
-\`\`\`
-```
+**Over-splitting is worse than not splitting.** Eight linked files for a 200-line source forces the agent to open 8 files to find one rule. More context cost, not less. Target 3–6 linked files maximum.
 
 ---
 
-### Phase 5: Flag for Deletion
-
-Identify instructions that should be removed entirely.
-
-**Delete if:**
-| Criterion | Example | Why Delete |
-|-----------|---------|------------|
-| Redundant | "Use TypeScript" (in a .ts project) | Agent already knows |
-| Too vague | "Write clean code" | Not actionable |
-| Overly obvious | "Don't introduce bugs" | Wastes context |
-| Default behavior | "Use descriptive variable names" | Standard practice |
-| Outdated | References deprecated APIs | No longer applies |
-
-**Output format:**
-```markdown
-## Flagged for Deletion
-
-| Instruction | Reason |
-|-------------|--------|
-| "Write clean, maintainable code" | Too vague to be actionable |
-| "Use TypeScript" | Redundant - project is already TS |
-| "Don't commit secrets" | Agent already knows this |
-| "Follow best practices" | Meaningless without specifics |
-```
+## Navigation
 
----
+**Use this skill when:**
+- "refactor my AGENTS.md / CLAUDE.md / COPILOT.md"
+- "my agent instructions are too long"
+- "split my agent instructions into files"
+- "progressive disclosure for my agent config"
+- "clean up / organize my CLAUDE.md"
 
-## Execution Checklist
+**Do NOT use this skill when:**
+- The file is already under 80 lines — pruning and editing is faster than restructuring
+- The platform is Claude.ai Projects, Copilot, Cursor, or Aider — linked files don't work there; offer prune-only instead
+- The user wants to write new instructions from scratch — this skill refactors, it doesn't author
 
+**Platform decision tree:**
 ```
-[ ] Phase 1: All contradictions identified and resolved
-[ ] Phase 2: Root file contains ONLY essentials
-[ ] Phase 3: All remaining instructions categorized
-[ ] Phase 4: File structure created with proper links
-[ ] Phase 5: Redundant/vague instructions removed
-[ ] Verify: Each linked file is self-contained
-[ ] Verify: Root file is under 50 lines
-[ ] Verify: All links work correctly
+What platform(s) does this file serve?
+├── Claude Code only           → full progressive disclosure (linked files valid)
+├── Claude.ai Projects only    → flat only; prune and reorganize, no splitting
+├── Copilot / Cursor / Aider   → flat only; splitting breaks it
+└── Multiple platforms         → create separate files per platform, or keep flat
 ```
 
 ---
 
-## Anti-Patterns
+## Philosophy
 
-| Avoid | Why | Instead |
-|-------|-----|---------|
-| Keeping everything in root | Bloated, hard to maintain | Split into linked files |
-| Too many categories | Fragmentation | Consolidate related topics |
-| Vague instructions | Wastes tokens, no value | Be specific or delete |
-| Duplicating defaults | Agent already knows | Only override when needed |
-| Deep nesting | Hard to navigate | Flat structure with links |
+Agent instruction files are not documentation — they are compiler inputs. Every byte costs inference tokens on every task. The goal is maximum signal density at minimum size: root file carries only what every task needs, linked files carry the rest, and anything a competent agent already knows gets deleted.
 
 ---
 
-## Examples
+## NEVER
 
-### Before (Bloated Root)
-```markdown
-# CLAUDE.md
+- **NEVER split files before confirming platform** — linked files are silently ignored on Claude.ai Projects, Copilot, Cursor, and Aider; splitting a multi-platform file breaks all non-Claude-Code users without any error message.
 
-This is a React project.
+- **NEVER delete instructions that contain project-specific nouns** — library names, internal hook names, team-specific patterns, and incident-derived rules all look like "generic best practices" to an outside reader but encode irreplaceable context. Only delete instructions that could appear verbatim in any project.
 
-## Code Style
-- Use 2 spaces
-- Use semicolons
-- Prefer const over let
-- Use arrow functions
-... (200 more lines)
+- **NEVER create linked files that link to more linked files** — agents follow links from root, then read the target file. They do not recursively follow links inside linked files. Any instruction buried at depth 2+ is effectively invisible, creating a false sense of organization with zero functional benefit.
 
-## Testing
-- Use Jest
-- Coverage > 80%
-... (100 more lines)
+- **NEVER merge contradicting instructions without user confirmation** — two conflicting rules might both be intentional (e.g., "use semicolons" in source, "no semicolons" in test files). Silently picking one invalidates half the project's convention history.
 
-## TypeScript
-- Enable strict mode
-... (150 more lines)
-```
+- **NEVER apply progressive disclosure to a file under 80 lines** — the overhead of maintaining links and multiple files exceeds the context savings. Small files should be pruned, not split.
 
-### After (Progressive Disclosure)
-```markdown
-# CLAUDE.md
+- **NEVER categorize by topic noun instead of task trigger** — `style-and-conventions.md` is organized around what the content is; `code-review.md` is organized around when it is needed. Agents retrieve context when starting a task — trigger-based grouping matches how they actually read.
 
-React dashboard for real-time analytics visualization.
+---
 
-## Commands
-- `pnpm dev` - Start development server
-- `pnpm test` - Run tests with coverage
-- `pnpm build` - Production build
+## When Things Go Wrong
 
-## Guidelines
-- [Code Style](.claude/code-style.md)
-- [Testing](.claude/testing.md)
-- [TypeScript](.claude/typescript.md)
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Agent stops following linked files | Platform is Claude.ai Projects or Copilot — linked files are not followed | Flatten back to single file; use section headers instead |
+| Linked file exists but agent never reads it | No link in root file pointing to it, or link at depth 2+ | Add explicit link from root; never nest links |
+| Refactored file loses a critical constraint | Domain-specific rule deleted as "obvious" | Restore from git history; add to deletion triage checklist — confirm all project-noun rules with user |
+| Root file creep returns after 2 months | No process to prevent additive drift | Add line-count gate to CI or README: "Root file must stay under 50 lines" |
+| Split created contradictions that didn't exist before | Copy-paste from different source contexts during categorization | Re-read both files; surface conflict to user before finalizing |
 
 ---
 
-## Verification
+## Quick Process
 
-After refactoring, verify:
+Full workflow detail: [references/refactor-workflow.md](references/refactor-workflow.md)
+Anti-patterns catalog + platform compatibility matrix: [references/anti-patterns-catalog.md](references/anti-patterns-catalog.md)
 
-1. **Root file is minimal** - Under 50 lines, only universal info
-2. **Links work** - All referenced files exist
-3. **No contradictions** - Instructions are consistent
-4. **Actionable content** - Every instruction is specific
-5. **Complete coverage** - No instructions were lost (unless flagged for deletion)
-6. **Self-contained files** - Each linked file stands alone
+**Five-phase summary:**
 
----
+1. **Platform check** — confirm linked files are valid for this target
+2. **Full read + contradiction scan** — read complete file; surface all conflicts to user before touching anything
+3. **Essentials extraction** — apply "every task" test; root file target under 50 lines
+4. **Categorize by task trigger** — 3–6 linked files max; group by WHEN needed, not WHAT topic
+5. **Deletion triage** — three buckets: DELETE (universal defaults), KEEP (domain-specific), ESCALATE (ask user)
diff --git a/skills/agent-md-refactor/references/anti-patterns-catalog.md b/skills/agent-md-refactor/references/anti-patterns-catalog.md
new file mode 100644
index 0000000..bb954e0
--- /dev/null
+++ b/skills/agent-md-refactor/references/anti-patterns-catalog.md
@@ -0,0 +1,88 @@
+# Anti-Patterns Catalog — Agent MD Refactor
+
+## Category 1: Root File Bloat
+
+### Chronological accumulation drift
+Every sprint someone adds 3 lines to CLAUDE.md. After 6 months it's 800 lines. No single section looks unreasonable in isolation — the problem is additive.
+**Signal:** Lines added > lines removed over last 10 commits.
+
+### Defensive over-specification
+Teams write walls of rules after a single bad incident. "Always check if file exists before reading" appears because an agent once deleted a file. The rule now burns tokens on every task even when irrelevant.
+**Signal:** Instructions that read like post-mortems ("Never again X").
+
+### Platform-agnostic instructions in platform-specific files
+CLAUDE.md written for Claude Code containing Copilot-style slot-fill patterns. The instruction format doesn't match the agent runtime.
+**Signal:** Instructions that reference `<SLOT>`, `{{variable}}`, or step-by-step wizard patterns.
+
+---
+
+## Category 2: Structural Errors
+
+### False modularity — splitting without linking
+Files created in `.claude/` but root file never references them. Agent never reads them.
+**Signal:** `.claude/*.md` files with zero inbound links from root.
+
+### Cross-file contradiction via copy-paste
+`testing.md` says "mock all external calls." `architecture.md` says "prefer integration tests." Both were copied from different projects.
+**Signal:** Two linked files with semantically opposite rules.
+
+### Depth-first file structure
+```
+.claude/
+  backend/
+    api/
+      rest/
+        conventions.md   ← agent never reaches this
+```
+Agents follow links one level at a time. A file reachable only through another linked file (depth 2 or deeper) is effectively invisible.
+**Signal:** Any linked file that is itself a directory index pointing to more files.
+
+---
+
+## Category 3: Content Quality Failures
+
+### Rule without context = ignored rule
+"Use functional components only" — but WHY? When an agent encounters a class component in existing code, it doesn't know if refactoring it is in scope.
+**Signal:** Rules with no rationale and no scope boundary.
+
+### Instruction-as-aspiration
+"Write maintainable, well-tested, performant code" belongs in a team handbook, not an agent instruction file. Agents need executable conditions, not values.
+**Signal:** Any rule where compliance cannot be verified by inspection.
+
+### Stale command blocks
+```
+npm run dev
+```
+Project switched to pnpm 4 months ago. Agent runs the wrong command. No error visible in output (just wrong behavior).
+**Signal:** Commands that don't match package.json scripts or Makefile targets.
+
+---
+
+## Category 4: Refactor Execution Errors
+
+### Refactoring without reading the full file first
+Splitting on first scan misses contradictions that only appear when you see instruction A on line 12 and its contradiction on line 340. Always read complete before splitting.
+
+### Over-categorizing
+Eight linked files for a 200-line AGENTS.md. Now the agent reads 8 files to find one rule. You've increased context cost, not reduced it.
+**Threshold:** One linked file per ~150 lines of original content is reasonable.
+
+### Losing provenance on deletions
+Flagging "Write clean code" for deletion is safe. Flagging a domain-specific constraint ("never use mutable default arguments in Python config objects") because it "sounds like best practices" destroys institutional knowledge.
+**Rule:** Only delete instructions that are universal defaults. When in doubt, keep.
+
+---
+
+## Platform Compatibility Matrix
+
+| Platform | Root File | Linked Files Followed? | Max Depth | Notes |
+|----------|-----------|------------------------|-----------|-------|
+| Claude Code | CLAUDE.md | Yes (explicit Read) | Unlimited | Agent actively follows links |
+| Claude.ai Projects | Project Knowledge | No — flat only | 1 | All instructions must be in one file |
+| GitHub Copilot | COPILOT.md (or .github/copilot-instructions.md) | No | 1 | Single file, no linking |
+| Cursor | .cursorrules | No | 1 | Flat JSON or markdown |
+| Aider | .aider.conf.yml | No | 1 | Config file, not linked docs |
+| Devin | DEVIN.md | Limited | 2 | Partial link following |
+| OpenHands | .openhands_instructions | No | 1 | Flat only |
+
+**Critical implication:** Per the matrix above, progressive disclosure (linked files) works fully only for Claude Code, with partial support on Devin. For every other platform listed, all instructions must remain in the root file; splitting actively breaks them.
diff --git a/skills/agent-md-refactor/references/refactor-workflow.md b/skills/agent-md-refactor/references/refactor-workflow.md
new file mode 100644
index 0000000..d28287a
--- /dev/null
+++ b/skills/agent-md-refactor/references/refactor-workflow.md
@@ -0,0 +1,103 @@
+# Refactor Workflow — Agent MD Refactor
+
+## Pre-Flight: Platform Check (Do This First)
+
+Before any restructuring, identify the target platform(s).
+
+```
+Does this file serve Claude Code exclusively?
+  YES → progressive disclosure (linked files) is valid
+  NO, other platforms only → keep flat; only prune and reorganize within single file
+  NO, Claude Code plus other platforms → create Claude Code version + separate flat version
+```
+
+See [anti-patterns-catalog.md](anti-patterns-catalog.md) for the full platform compatibility matrix.
+
+---
+
+## Phase 1: Contradiction Scan
+
+Read the entire file before touching anything. Map:
+- Line numbers of each rule
+- Scope of each rule (always / when / if)
+- Any two rules that could conflict on the same codebase action
+
+**Output to user (if contradictions found):**
+```
+Contradiction found:
+  Line 12: "Use semicolons"
+  Line 187: "No semicolons in test files"
+
+Q: Is the second rule intentional (tests use different style) or a copy-paste error?
+```
+
+Do not proceed past Phase 1 until contradictions are resolved.
+
+---
+
+## Phase 2: Essentials Extraction
+
+Apply the "every task" test: if an instruction only matters for 30% of tasks, it belongs in a linked file, not root.
+
+**Root file keeps:**
+- Project identity (1 sentence)
+- Non-default commands (build, test, typecheck — only if non-standard)
+- Package manager (only if not npm)
+- Hard overrides (things that MUST take precedence over agent defaults)
+- Rules that apply to 100% of tasks
+
+**Root file target:** Under 50 lines.
+
+---
+
+## Phase 3: Categorization
+
+Group by WHEN they are needed, not by WHAT they are about.
+
+**Effective grouping:** `testing.md` (needed when writing or modifying tests)
+**Ineffective grouping:** `style-and-conventions-and-formatting-and-naming.md` (too broad)
+
+**Sizing rule:** 3–6 linked files maximum. If you have more than 6, merge the least-referenced topics.
+
+---
+
+## Phase 4: Deletion Triage
+
+Three-bucket sort for each candidate instruction:
+
+| Bucket | Criterion | Action |
+|--------|-----------|--------|
+| DELETE | Universal default — agent already knows | Remove |
+| KEEP | Domain-specific or non-obvious | Keep |
+| ESCALATE | Unclear if project-specific or generic | Ask user |
+
+**Never delete without user confirmation if the instruction:**
+- Contains a specific library name
+- References a project-internal pattern (e.g., "use our AuthContext hook")
+- Has a non-obvious rationale embedded in it
+
+---
+
+## Phase 5: Link Validation
+
+After creating linked files:
+1. Every file referenced from root must exist
+2. No linked file references another linked file (depth 2+ = invisible)
+3. Root file links use relative paths (not absolute)
+4. Verify link syntax matches target platform (Markdown `[text](path)` for Claude Code)
+
+---
+
+## Execution Checklist
+
+```
+[ ] Platform identified — confirm linked files are valid for this platform
+[ ] Full file read before any changes
+[ ] All contradictions surfaced to user and resolved
+[ ] Root file under 50 lines
+[ ] 3–6 linked files (not more)
+[ ] No linked file references another linked file
+[ ] Deletion candidates confirmed with user for domain-specific rules
+[ ] All links validated (files exist, paths correct)
+[ ] No instruction lost without explicit user approval
+```
diff --git a/skills/api-design/SKILL.md b/skills/api-design/SKILL.md
new file mode 100644
index 0000000..7fb2eab
--- /dev/null
+++ b/skills/api-design/SKILL.md
@@ -0,0 +1,412 @@
+---
+name: api-design
+description: Design API contracts before implementation — protocol selection, pagination, versioning, error standardization, OpenAPI spec authoring, and breaking-change classification. Use when asked to "design an API", "spec out endpoints", "pick REST vs GraphQL", "define error responses", "plan API versioning", or "write an OpenAPI spec". Sits between requirements-clarity (what to build) and openapi-to-typescript (typed interfaces from spec).
+---
+
+## Mindset
+
+- **The spec is a contract, not documentation.** An API spec written after the fact documents what was shipped. A spec written before implementation constrains what gets shipped. Spec-first is a design discipline, not a paperwork exercise.
+- **The HTTP status code IS the error signal.** A 200 with `{"success": false}` in the body is a protocol violation that breaks every intermediary — proxies, caches, circuit breakers, monitoring dashboards — because they all treat 2xx as success. The body elaborates; the status decides.
+- **Consumers don't read your source code.** Every invariant you enforce in the backend (write-once fields, enum transitions, per-org rate limits) must appear explicitly in the contract. If it isn't in the spec, frontend will rediscover it at 4 PM on release day.
+- **Pagination is a data consistency guarantee, not a performance knob.** The correct pagination strategy depends on whether the underlying data changes during iteration. Offset breaks silently on concurrent writes. Cursor is required for correctness on any live data set — not just for performance.
+- **Versioning debt compounds.** Every URL path version (`/v1/`, `/v2/`) is a permanent branch in production. Non-breaking evolution (additive fields, new optional params) should not create a new version. Reserve versioning for genuine breaking changes.
+- **Error codes are machine interfaces.** `"Email is invalid"` is a display string. `INVALID_EMAIL` is a contract. Consumers branch on codes, not sentences. Sentences are localized, truncated, and changed mid-iteration. Codes are not.
+
+## Navigation
+
+**Use this skill when**:
+- Designing a new API surface (REST, GraphQL, gRPC) before any implementation
+- Choosing between protocol options and need decision criteria beyond "REST is standard"
+- Defining error response structure, pagination strategy, or versioning policy
+- Writing or reviewing an OpenAPI spec (spec-first or code-first)
+- Classifying whether a proposed change is breaking or non-breaking
+- Documenting rate limiting, auth schemes, or retry conventions in a spec
+
+**Do NOT use this skill when**:
+- Backend implementation is already complete and you need consumer-facing docs — use backend-to-frontend-handoff-docs
+- You have an OpenAPI spec and need TypeScript interfaces — use openapi-to-typescript
+- Requirements are still vague ("build me a user API") — use requirements-clarity first
+- You need database schema design — use database-schema-designer
+
+**Quick decision tree**:
+```
+Is the API surface new (no prior contract exists)?
+  YES → Start at Protocol Selection, then Pagination, then Versioning
+  NO  → Skip to Breaking-Change Classification; existing contract governs
+
+Does the user have a written OpenAPI spec?
+  YES → Jump to Spec Review; check for common omissions
+  NO  → Ask: spec-first or code-first? (see section below)
+
+Is the user asking about errors/status codes?
+  YES → Error Standardization section; apply RFC 7807
+```
+
+## Philosophy
+
+API design is constraint selection. Every decision — protocol, pagination style, version scheme, error shape — closes future options in exchange for present clarity. Bad API design is not missing features; it is missing constraints. Underconstrained APIs allow callers to build assumptions that are never honored, and allow implementers to ship behaviors that are never documented.
+
+## NEVER
+
+- **NEVER use offset pagination for any resource that can be modified during iteration** — offset is computed against the current state of the dataset. Between page 1 and page 2, concurrent inserts shift every subsequent row. Items are skipped or duplicated invisibly, with no error to surface. Cursor pagination anchors to a stable position regardless of concurrent writes.
+- **NEVER define error responses as a single string `message` field** — a bare string message is consumed by displaying it to users. Machine-readable error bodies (`code` + `detail` + optional `field`) allow consumers to branch programmatically: retry on `RATE_LIMIT_EXCEEDED`, show a field hint on `INVALID_EMAIL`, escalate on `PERMISSION_DENIED`. Strings cannot support this without string parsing.
+- **NEVER return HTTP 200 with a `success: false` body** — the HTTP status is the error signal for every layer that touches the response (proxy, cache, CDN, circuit breaker, monitoring alert, client SDK). A 200 disables all of that. The body is elaboration only; it cannot override the status.
+- **NEVER version your API in the URL by default** — `/v1/users` creates a permanent production branch. When `/v2/users` ships, `/v1/users` must remain operational indefinitely (or you break clients). Header versioning (`API-Version: 2024-01`) makes version negotiation explicit and allows sunset without a parallel URL tree. Override this ONLY when developer experience for public APIs demands URL discoverability.
+- **NEVER design a PATCH endpoint that treats absent fields as nulls** — in a partial update, an absent field means "do not change this field." A null field means "set this field to null." These are distinct intents. A PATCH that collapses them forces callers to always send every optional field to protect values they don't want cleared — defeating the purpose of PATCH.
+- **NEVER use boolean query parameters for state that could acquire a third value** — `?active=true/false` will eventually need `?active=pending` or `?active=null`. Boolean URL params cannot express this; the API must be redesigned. Use `?status=active|inactive|pending` from the start.
+- **NEVER define rate limit behavior without spec-level documentation** — rate limits that are not in the spec are discovered by consumers via production 429 errors. Document the limit scope (per user, per IP, per org), the algorithm (token bucket, sliding window), and the response headers (`Retry-After`, `X-RateLimit-Remaining`) before the API ships.
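+
+A minimal sketch of how the PATCH rule above might be written down in an OpenAPI 3.0 schema. The `UserPatch` name and its fields are hypothetical. The distinction it encodes matches JSON Merge Patch (RFC 7396, `application/merge-patch+json`): an absent member leaves the value unchanged, an explicit `null` clears it.
+
+```yaml
+# Hypothetical partial-update body; schema and field names are illustrative only.
+components:
+  schemas:
+    UserPatch:
+      type: object
+      # No `required` list: any field may be omitted, and omission means "leave unchanged".
+      minProperties: 1
+      additionalProperties: false
+      properties:
+        display_name:
+          type: string
+          description: Omit to keep the current value. Cannot be cleared.
+        bio:
+          type: string
+          nullable: true
+          description: Omit to keep the current value; send null to clear it.
+```
+
+Only `bio` is marked `nullable: true`, so the spec itself tells callers which fields accept an explicit null and which can only be replaced.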
+
+## Protocol Selection
+
+### REST vs GraphQL vs gRPC Decision Criteria
+
+| Criterion | REST | GraphQL | gRPC |
+|-----------|------|---------|------|
+| Consumer profile | Multiple external clients, public APIs | Frontend teams with highly variable query shapes | Internal service-to-service, performance-critical |
+| Schema evolution | Additive (non-breaking); versioning for removals | Schema-first; deprecations via `@deprecated` directive | Proto files; strongly typed; forward/backward compat via field numbers |
+| Payload control | Fixed shapes; over/under-fetch common | Consumer-specified; fetch only needed fields | Fixed shapes; binary encoding |
+| Tooling maturity | Broadest ecosystem | Good; Apollo, graphql-codegen | Strong in polyglot; requires protoc toolchain |
+| Error handling | HTTP status codes + RFC 7807 body | Always 200; errors in `errors[]` array | gRPC status codes (not HTTP) |
+| Caching | Native HTTP caching (ETags, Cache-Control) | Query hashing required; no standard cache key | Application-level only |
+
+**Non-obvious selection criteria practitioners miss**:
+
+- **GraphQL's 200-for-everything is a deliberate trade-off, not a design flaw.** But it means your HTTP-layer monitoring (Datadog, Sentry, uptime checks) will report 100% success even during partial outages. You must instrument on the `errors` array, not on HTTP status.
+- **REST over-fetching is a performance problem only after you have the traffic.** Premature GraphQL adoption for "performance" typically means paying the schema-design and tooling tax before you have evidence over-fetching is actually a bottleneck.
+- **gRPC requires HTTP/2.** Browser clients cannot use gRPC directly (gRPC-Web is a separate protocol with a proxy layer). If you have browser consumers, gRPC is an internal transport, not a public API.
+- **REST and GraphQL are not exclusive.** Large platforms often serve a REST API for external partners (stable, versioned contracts) and GraphQL for internal frontend (flexible, schema-driven iteration).
+
+## Pagination Patterns
+
+### Cursor vs. Offset Trade-offs
+
+**Offset pagination** (`?page=2&limit=20`, `?offset=40`):
+- Simple to implement, intuitive for humans, supports random access ("go to page 10")
+- **Breaks silently on concurrent writes**: if 5 rows are inserted ahead of the current position after page 1 is fetched, page 2 repeats the last 5 rows of page 1; if 5 rows are deleted instead, 5 rows are skipped. Neither case produces an error
+- **Breaks on large offsets**: `OFFSET 100000` in SQL requires the database to scan and discard 100,000 rows; performance degrades with depth
+- Acceptable for: static datasets (product catalogs, historical archives), user-facing paginated reports where the user pages through manually
+
+**Cursor pagination** (`?cursor=<opaque_token>&limit=20`):
+- Cursor anchors to a stable position (row ID, timestamp, composite key)
+- Concurrent inserts/deletes do not affect positions relative to the cursor
+- Cannot support random access ("jump to page 10") — only forward (and optionally backward) navigation
+- Required for: feeds, activity streams, audit logs, any resource writable by concurrent processes during iteration
+- **Cursor must be opaque to callers** — do not expose raw SQL offsets or timestamps as cursors; expose an opaque base64 token. This allows cursor implementation to change without breaking callers.
+
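+**Keyset pagination** is a variant of cursor where the position is based on indexed column values (e.g., `WHERE (created_at, id) < (?, ?)`, or the equivalent `created_at < ? OR (created_at = ? AND id < ?)` on databases without row-value comparison). Comparing each column independently with `AND` would skip rows that share a timestamp. Use when sort order matters and you control the index.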
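+
+The cursor contract described above could be sketched in OpenAPI 3.0 roughly as follows. The `/events` path, the `listEvents` operation, the response field names, and the `Event` schema (assumed to be defined elsewhere in the spec) are all hypothetical.
+
+```yaml
+# Hypothetical endpoint and field names; a sketch, not a prescribed shape.
+paths:
+  /events:
+    get:
+      operationId: listEvents
+      summary: List events, newest first
+      parameters:
+        - name: cursor
+          in: query
+          required: false
+          description: Opaque token from a previous response. Omit to start at the first page.
+          schema:
+            type: string
+        - name: limit
+          in: query
+          required: false
+          schema:
+            type: integer
+            minimum: 1
+            maximum: 100
+            default: 20
+      responses:
+        "200":
+          description: One page of events.
+          content:
+            application/json:
+              schema:
+                type: object
+                required: [items, next_cursor]
+                properties:
+                  items:
+                    type: array
+                    items:
+                      $ref: "#/components/schemas/Event"
+                  next_cursor:
+                    type: string
+                    nullable: true
+                    description: Pass as `cursor` to fetch the next page; null when no further page exists.
+                  prev_cursor:
+                    type: string
+                    nullable: true
+                    description: Pass as `cursor` to fetch the previous page; null on the first page.
+```
+
+Typing the cursor as a plain `string` with no `format` or `pattern` keeps it opaque, and declaring `prev_cursor` from the start avoids a later response-schema change if backward navigation is needed.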
+
+### What Practitioners Get Wrong
+
+- Choosing offset "for now" on a resource that accepts writes. The breakage is invisible in tests (tests don't write concurrently) and shows up as user-reported missing items in production feeds.
+- Exposing cursor internals. A cursor like `?cursor=2024-05-01T12:00:00Z` couples callers to your sort implementation and breaks if you ever change the timestamp column type.
+- Forgetting backward cursors. Most cursor implementations only go forward. If the product requires "go back," build `prev_cursor` from the start — retrofitting it changes the response schema.
+
+## API Versioning Strategy
+
+### URL vs. Header vs. Content Negotiation
+
+**URL versioning** (`/v1/users`, `/v2/users`):
+- Immediately visible; no special headers needed; easy to share URLs
+- Creates permanent production branches — `/v1` must run indefinitely once published
+- Encourages breaking changes by making them "cheap" (just bump `/v2`)
+- Appropriate for: public APIs with diverse consumers who cannot control request headers (webhooks, third-party integrations, browser fetch without CORS pre-flight complexity)
+
+**Header versioning** (`API-Version: 2024-01`, `X-API-Version: 2`):
+- No URL pollution; single URL tree; version sunset without URL proliferation
+- Not visible in browser address bar; harder to test manually; requires documentation
+- Appropriate for: internal APIs, developer platforms where consumers control all request headers, APIs where sunset timelines are enforced
+
+**Content negotiation** (`Accept: application/vnd.example.v2+json`):
+- Standards-based; caches correctly when responses send `Vary: Accept`
+- Complex for consumers; rarely implemented correctly in practice
+- Appropriate for: media APIs, hypermedia/HATEOAS APIs; almost never the right choice for standard JSON REST APIs
+
+**Date-based header versioning** (used by Stripe, Anthropic) is the most sustainable pattern for high-churn APIs:
+- Version string is a date: `API-Version: 2024-11-01`
+- Each version is a snapshot of the API on that date
+- Consumers pin to a version; additive features ship to every version without a new date
+- Breaking changes require a new date version; old dates run until sunset
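+
+One way the date-based header could be declared in an OpenAPI 3.0 spec is as a reusable header parameter. This is a sketch: the `ApiVersion` component name and the fallback policy in the description are assumptions, not something the pattern itself prescribes.
+
+```yaml
+# Hypothetical reusable header parameter for date-based versioning.
+components:
+  parameters:
+    ApiVersion:
+      name: API-Version
+      in: header
+      required: false
+      description: Date of the API snapshot to use. When omitted, the version pinned on the account applies (an assumed policy).
+      schema:
+        type: string
+        format: date
+        example: "2024-11-01"
+```
+
+Operations would then reference it with `$ref: "#/components/parameters/ApiVersion"` in their `parameters` list, so the version contract lives in the spec rather than only in prose.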
+
+### What Practitioners Get Wrong
+
+- Versioning every release, not every breaking change. Versions are not release numbers. Non-breaking changes (additive fields, new optional params, new values on enums documented as open-ended) must not increment the version. Doing so trains consumers to update version pins constantly and increases migration fatigue.
+- Forgetting that "removing a field" is a breaking change even when the field was optional. Consumers who read the field get `undefined` after removal. Optional fields cannot be removed without a version bump.
+- Adding URL versioning to a private internal API. Internal services can coordinate upgrades; they do not need permanent URL branches. Use header versioning or no versioning (coordinate rollout directly).
+
+## Error Response Standardization
+
+### RFC 7807 Problem Details
+
+Use RFC 7807 Problem Details (`application/problem+json`; RFC 7807 has since been obsoleted by RFC 9457, which keeps the same core fields) as the standard error shape:
+
+```json
+{
+  "type": "https://errors.example.com/invalid-input",
+  "title": "Validation Failed",
+  "status": 422,
+  "detail": "The request body contains invalid fields.",
+  "instance": "/requests/abc123",
+  "errors": [
+    {
+      "code": "INVALID_EMAIL",
+      "field": "email",
+      "detail": "The provided email address is not a valid format."
+    }
+  ]
+}
+```
+
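+**Fields to always include**: `type` (URI, dereferenceable or not), `title` (human-readable, stable across occurrences), `status` (mirrors HTTP status), `detail` (specific to this occurrence). RFC 7807 itself makes every member optional and defaults `type` to `about:blank`; requiring these four is this contract's convention.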
+
+**The `errors` array** is an extension member, not part of the RFC 7807 base fields, but it is a widely used convention for validation errors. Always extend Problem Details with a machine-readable `errors` array for 422 responses.
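+
+A possible `ProblemDetails` component matching the JSON example above, sketched for OpenAPI 3.0. The `FieldError` schema name is hypothetical, and the `required` lists reflect this document's convention rather than anything the RFC mandates.
+
+```yaml
+# Sketch of a reusable error schema; FieldError is an illustrative name.
+components:
+  schemas:
+    ProblemDetails:
+      type: object
+      required: [type, title, status, detail]
+      properties:
+        type:
+          type: string
+          format: uri
+        title:
+          type: string
+        status:
+          type: integer
+        detail:
+          type: string
+        instance:
+          type: string
+        errors:
+          type: array
+          description: Extension member; present on validation failures.
+          items:
+            $ref: "#/components/schemas/FieldError"
+    FieldError:
+      type: object
+      required: [code, detail]
+      properties:
+        code:
+          type: string
+          example: INVALID_EMAIL
+        field:
+          type: string
+        detail:
+          type: string
+```
+
+Defining the shape once under `components/schemas` lets every error response reference it with `$ref` instead of redefining it per endpoint.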
+
+### Status Code Taxonomy
+
+| Situation | Correct Code | Common Wrong Code |
+|-----------|-------------|-------------------|
+| Validation error (bad input shape, invalid values) | **422 Unprocessable Entity** (RFC 4918) | 400 Bad Request |
+| Malformed JSON / unparseable request body | 400 Bad Request | 422 |
+| Not authenticated (no token, expired token) | 401 Unauthorized | 403 |
+| Authenticated but lacks permission | 403 Forbidden | 401 |
+| Resource not found | 404 Not Found | 200 with `null` body |
+| Optimistic lock conflict / concurrent edit | 409 Conflict | 400 |
+| Rate limit exceeded | 429 Too Many Requests | 400 or 503 |
+| Downstream dependency unavailable | 503 Service Unavailable | 500 |
+| Intentional temporary maintenance window | 503 with `Retry-After` | 500 |
+
+**HTTP 422 for validation errors**: the code was introduced by RFC 4918 (WebDAV) and is now part of the core HTTP specification (RFC 9110, where it is named "Unprocessable Content"). The common objection ("400 is simpler") loses information: this taxonomy reserves 400 for requests the server could not parse, and uses 422 for requests that were parsed, understood, and rejected due to semantic errors. These are different conditions. Monitoring dashboards, clients, and retry logic all benefit from the distinction.
+
+### Error Code Taxonomy
+
+Error codes must be:
+- **Machine-parseable strings**: `INVALID_EMAIL`, not `"Email address is not valid"`
+- **SCREAMING_SNAKE_CASE** (conventional, widely recognized)
+- **Domain-scoped for large APIs**: `AUTH_TOKEN_EXPIRED`, `PAYMENT_CARD_DECLINED`, `USER_EMAIL_DUPLICATE`
+- **Not sentences, not display strings** — the `detail` field carries the human message; `code` is for programmatic branching
+
+Organize codes in tiers:
+1. **Generic codes** (used by all endpoints): `VALIDATION_FAILED`, `NOT_FOUND`, `UNAUTHORIZED`, `FORBIDDEN`, `RATE_LIMIT_EXCEEDED`, `INTERNAL_ERROR`
+2. **Domain codes** (resource-specific): `USER_EMAIL_DUPLICATE`, `ORDER_ALREADY_SUBMITTED`, `PAYMENT_INSUFFICIENT_FUNDS`
+3. **Field-level codes** (in `errors[]` array): `INVALID_EMAIL`, `REQUIRED_FIELD_MISSING`, `VALUE_TOO_LONG`
+
+### Retry Signal Conventions
+
+| Code | Retryable? | Convention |
+|------|-----------|------------|
+| 429 | Yes, after backoff | Always include `Retry-After` header (seconds) |
+| 503 | Yes, after backoff | Include `Retry-After` if duration is known |
+| 502, 504 | Yes, after backoff | Network/gateway errors; exponential backoff |
+| 409 | Conditional | Re-fetch resource, re-apply change, retry |
+| 422 | No | Validation error; retry without fixing input is pointless |
+| 400, 401, 403, 404 | No | Client error; retry is identical to original failure |
+| 500 | No (by default) | Internal server error; do not retry by default; operator should investigate |
+
+## OpenAPI Spec Authoring
+
+### Spec-First vs. Code-First
+
+**Spec-first**: Write the OpenAPI spec before any implementation. Treat the spec as the source of truth. Implementation must match it.
+- Use when: multiple consumers, cross-team coordination needed, public API, contract needs review/approval before dev starts
+- Tooling: Swagger Editor, Stoplight Studio, Redocly
+- Risk: spec drift if implementation deviates and spec is not updated
+
+**Code-first**: Annotate code (decorators, docstrings); tooling generates the spec.
+- Use when: solo developer, internal service, spec accuracy matters more than spec timing
+- Tooling: FastAPI (Python), tsoa (TypeScript), Springdoc (Java)
+- Risk: generated spec omits information code cannot express (intent, deprecation warnings, narrative descriptions)
+
+**When spec-first is required** even if code-first is preferred:
+- Any API with a partner or public consumer — they cannot wait for code to be written to see the contract
+- Any API used by a frontend team that is developing in parallel — frontend needs types from the spec before backend is done
+
+### Required Fields Practitioners Miss
+
+In `paths`:
+- `operationId` on every operation — without it, code generators use method+path as the function name, which is often unreadable
+- `tags` on every operation — controls grouping in generated docs and SDK namespacing
+- `summary` AND `description` — summary is one line for the table of contents; description carries nuance (state machine, error triggers, business rules)
+
+In `components/schemas`:
+- `description` on every property — a property named `status` with no description forces consumers to guess from example values
+- `readOnly: true` on server-generated fields (IDs, timestamps) — without this, code generators include them in request bodies, allowing callers to attempt to set immutable fields
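+- `nullable: false` explicitly when a field is never null, so consumers do not write unnecessary null handling (this is OpenAPI 3.0 syntax; OpenAPI 3.1 drops `nullable` and expresses nullability through the `type` list instead)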
+
+In `responses`:
+- A response definition for **every** error status code the endpoint can return — not just 200 and a generic 4xx
+- `default` response referencing a Problem Details schema — catches undocumented errors
+- `headers` for `429` responses — document `Retry-After` and `X-RateLimit-*` headers here, not only in prose
+
+In `security`:
+- Specify security schemes at the operation level, not just globally, when endpoints have mixed auth (some public, some protected)
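+
+The checklist above could look roughly like this on a single operation. Everything named here is hypothetical: the `/orders/{orderId}` path, the `Order` schema and its states, and the `BearerAuth` and `ProblemDetails` components, which are assumed to be defined elsewhere in the spec.
+
+```yaml
+# Hypothetical endpoint; names, tags, and referenced components are assumptions.
+paths:
+  /orders/{orderId}:
+    get:
+      operationId: getOrder
+      tags: [Orders]
+      summary: Fetch one order
+      description: Returns the order regardless of state. Cancelled orders remain readable.
+      security:
+        - BearerAuth: []
+      parameters:
+        - name: orderId
+          in: path
+          required: true
+          schema:
+            type: string
+            format: uuid
+      responses:
+        "200":
+          description: The order.
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/Order"
+        "404":
+          description: No order with this ID.
+          content:
+            application/problem+json:
+              schema:
+                $ref: "#/components/schemas/ProblemDetails"
+        default:
+          description: Any error not listed above.
+          content:
+            application/problem+json:
+              schema:
+                $ref: "#/components/schemas/ProblemDetails"
+components:
+  schemas:
+    Order:
+      type: object
+      required: [id, status, created_at]
+      properties:
+        id:
+          type: string
+          format: uuid
+          readOnly: true
+          description: Server-generated identifier.
+        status:
+          type: string
+          enum: [draft, submitted, cancelled]
+          description: Lifecycle state of the order.
+        created_at:
+          type: string
+          format: date-time
+          readOnly: true
+          description: Creation time, set by the server.
+```
+
+A public operation in the same spec would override the requirement with `security: []`, which is how mixed auth is expressed at the operation level.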
+
+### Common OpenAPI Mistakes
+
+- Defining all errors as `{ message: string }` in the spec, then returning RFC 7807 in implementation. The spec is wrong the day it ships.
+- Using `additionalProperties: true` everywhere to avoid schema maintenance. This makes the spec useless as a contract — it permits any payload.
+- Using `anyOf` where `oneOf` was intended. If only one variant is valid per request, use `oneOf` and add a `discriminator`.
+- Omitting `format` on string fields that have known formats (`email`, `uri`, `date-time`, `uuid`) — validators and code generators use `format` to apply semantic validation.
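+
+As an illustration of the `oneOf` plus `discriminator` advice above, here is a sketch in OpenAPI 3.0. The payment schemas and the `kind` property are hypothetical names chosen for the example.
+
+```yaml
+# Hypothetical polymorphic schema; exactly one variant is valid per payload.
+components:
+  schemas:
+    PaymentMethod:
+      oneOf:
+        - $ref: "#/components/schemas/CardPayment"
+        - $ref: "#/components/schemas/BankTransferPayment"
+      discriminator:
+        propertyName: kind
+        mapping:
+          card: "#/components/schemas/CardPayment"
+          bank_transfer: "#/components/schemas/BankTransferPayment"
+    CardPayment:
+      type: object
+      required: [kind, card_token]
+      additionalProperties: false
+      properties:
+        kind:
+          type: string
+          enum: [card]
+        card_token:
+          type: string
+    BankTransferPayment:
+      type: object
+      required: [kind, iban]
+      additionalProperties: false
+      properties:
+        kind:
+          type: string
+          enum: [bank_transfer]
+        iban:
+          type: string
+```
+
+The discriminator property is listed in `required` on every variant, and each variant pins it to a single enum value, so validators and code generators can pick the variant without trial-and-error matching.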
+
+## Breaking vs. Non-Breaking Change Classification
+
+### What Practitioners Get Wrong
+
+The common definition is: "breaking = removed field." The correct definition is: **a breaking change is any change that requires an existing well-behaved consumer to change their code.**
+
+**Non-breaking (safe to ship without version bump)**:
+- Adding a new optional field to a response (consumers ignore unknown fields if coded defensively)
+- Adding a new optional query parameter
+- Adding a new endpoint
+- Adding a new enum value to a response field *if consumers are coded to handle unknown values* — this is the disputed case (see below)
+- Relaxing a constraint (accepting a previously-invalid value)
+
+**Breaking (requires version bump or migration path)**:
+- Removing any field from a response, even an optional one
+- Renaming a field
+- Changing a field's type (string → number, string → string[])
+- Changing a field's nullability (non-null → nullable, nullable → non-null)
+- Adding a new **required** field to a request
+- Removing or renaming a query parameter
+- Changing the HTTP status code for a success response
+- Tightening a constraint (rejecting a previously-valid value)
+- Changing authentication scheme
+
+**The disputed case: new enum values in responses**. Technically non-breaking (you added a value). In practice, breaking for consumers who use exhaustive switch statements or who validate response data against a known-values list. Treat new response enum values as breaking for any statically-typed consumer, or document explicitly that all enums are open-ended (consumers must handle unknown values).
+
+**PATCH field removals are always breaking** — even if the field is optional. A consumer sending PATCH payloads with that field now sends an unknown field; behavior depends on server implementation (ignore vs. error).
+
+## Rate Limiting Documentation
+
+### Documenting Rate Limits in the OpenAPI Spec
+
+Rate limits must appear in the spec — not only in prose documentation. Use response headers on every endpoint subject to rate limiting:
+
+```yaml
+responses:
+  "200":
+    description: Successful response. # `description` is required on every response object
+    headers:
+      X-RateLimit-Limit:
+        schema:
+          type: integer
+        description: Maximum requests allowed in the current window.
+      X-RateLimit-Remaining:
+        schema:
+          type: integer
+        description: Requests remaining in the current window.
+      X-RateLimit-Reset:
+        schema:
+          type: integer
+          format: int64
+        description: Unix timestamp (seconds) when the window resets.
+  "429":
+    description: Rate limit exceeded.
+    headers:
+      Retry-After:
+        schema:
+          type: integer
+        description: Seconds to wait before retrying.
+    content:
+      application/problem+json:
+        schema:
+          $ref: "#/components/schemas/ProblemDetails"
+```
+
+Document rate limit scope explicitly in endpoint descriptions:
+- Per user? Per API key? Per organization? Per IP?
+- Token bucket (burst allowed) or fixed window (no burst)?
+- Separate limits per endpoint or global account limit?
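+
+A sketch of how those answers might be recorded on an endpoint follows. The path, the numbers, and the wording are illustrative assumptions, not recommended values.
+
+```yaml
+paths:
+  /records:                      # hypothetical endpoint
+    post:
+      summary: Create a record
+      description: |
+        Rate limit: 600 requests per minute per API key (token bucket,
+        bursts of up to 100 allowed). Counted separately from
+        `POST /bulk-import`, which has its own limit.
+      responses:
+        "201":
+          description: Record created.
+        "429":
+          description: Rate limit exceeded. See the `Retry-After` header.
+```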
+
+### What Practitioners Get Wrong
+
+- Documenting rate limits only in a separate "limits" guide page, not in the spec. Consumers encounter 429 before they find the guide.
+- Omitting `Retry-After` on 429 responses. Without it, consumers implement arbitrary backoffs (usually too short or too long).
+- Using the same rate limit scope for bulk and single-item endpoints. A `POST /bulk-import` (100 records/request) and `POST /records` (1 record/request) should not share the same per-request limit.
+
+## Auth Scheme Selection in OpenAPI
+
+### Bearer (JWT / opaque tokens)
+
+```yaml
+components:
+  securitySchemes:
+    BearerAuth:
+      type: http
+      scheme: bearer
+      bearerFormat: JWT  # Informational only; not validated by tooling
+```
+
+Use for: user-facing APIs, APIs with short-lived session tokens, anywhere you need to embed claims without a database lookup.
+
+Document explicitly:
+- Token location (always `Authorization: Bearer <token>` for `http` scheme — but confirm no cookie fallback)
+- Token lifetime and refresh flow
+- What 401 means (token absent or expired) vs. 403 (token valid, permission denied)
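+
+The 401/403 distinction can also be stated per operation in the spec. The fragment below is a sketch that reuses the `BearerAuth` scheme above and the `ProblemDetails` schema referenced earlier; the `/me` path and the description text are assumptions.
+
+```yaml
+paths:
+  /me:                           # hypothetical endpoint
+    get:
+      security:
+        - BearerAuth: []
+      responses:
+        "200":
+          description: The authenticated user's profile.
+        "401":
+          description: Token absent, malformed, or expired. Refresh the token, then retry.
+          content:
+            application/problem+json:
+              schema:
+                $ref: "#/components/schemas/ProblemDetails"
+        "403":
+          description: Token valid, but the caller lacks permission. Do not retry.
+          content:
+            application/problem+json:
+              schema:
+                $ref: "#/components/schemas/ProblemDetails"
+```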
+
+### API Key
+
+```yaml
+components:
+  securitySchemes:
+    ApiKeyHeader:
+      type: apiKey
+      in: header
+      name: X-API-Key
+```
+
+Use for: server-to-server, webhook consumers, partner integrations where user identity is not relevant.
+
+`in` can be `header`, `query`, or `cookie`. **Prefer `header`**: API keys in query strings appear in server logs, browser history, and `Referer` headers.
+
+### OAuth2
+
+```yaml
+components:
+  securitySchemes:
+    OAuth2:
+      type: oauth2
+      flows:
+        authorizationCode:
+          authorizationUrl: https://auth.example.com/authorize
+          tokenUrl: https://auth.example.com/token
+          scopes:
+            read:users: Read user profiles
+            write:users: Create and update users
+```
+
+**Flow selection**:
+| Consumer | Correct Flow |
+|----------|-------------|
+| Browser SPA or mobile app | `authorizationCode` with PKCE (no client secret) |
+| Server-side web app | `authorizationCode` (can store client secret) |
+| Machine-to-machine / service account | `clientCredentials` |
+| Legacy / resource owner password flow | Avoid; the `password` grant is omitted from OAuth 2.1 |
+
+**What practitioners get wrong**:
+- Using `clientCredentials` for user-delegated access. Client credentials authenticate the application, not a user. No user identity or user-scoped permissions are in the token.
+- Omitting `scopes` from the spec. Undocumented scopes force consumers to reverse-engineer required permissions from 403 errors.
+- Using the `implicit` flow. It is omitted from OAuth 2.1 because it returns the access token in the redirect URI, where it can leak. Use `authorizationCode` + PKCE for SPAs.
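+
+To keep scopes from going undocumented, each operation can declare the scopes it requires. The sketch below assumes the `OAuth2` scheme defined above plus a hypothetical `ServiceOAuth2` scheme for machine-to-machine callers; responses are omitted for brevity.
+
+```yaml
+components:
+  securitySchemes:
+    ServiceOAuth2:               # hypothetical scheme name
+      type: oauth2
+      flows:
+        clientCredentials:
+          tokenUrl: https://auth.example.com/token
+          scopes:
+            read:users: Read user profiles
+paths:
+  /users:
+    get:
+      # Multiple entries are alternatives: either scheme satisfies the request.
+      security:
+        - OAuth2: ["read:users"]
+        - ServiceOAuth2: ["read:users"]
+    post:
+      # User-delegated access only; no client-credentials alternative.
+      security:
+        - OAuth2: ["write:users"]
+```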
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Frontend reports missing items in paginated feed | Offset pagination on a writable resource | Migrate to cursor; add `next_cursor`/`prev_cursor` to response; deprecate `page`/`offset` params with a version sunset |
+| Consumer cannot distinguish "field absent" from "field null" in PATCH | PATCH implementation does not differentiate | Define explicit null semantics in spec: document that absent = no-op, null = clear; enforce in validation layer |
+| Multiple clients broken after adding a new enum value to response | Clients use exhaustive switch/match without default branch | Document all response enums as "open-ended"; add a breaking-change note to changelog; publish migration guide |
+| Monitoring shows 100% success rate during outage | Errors returned as 200 with error body | Find all endpoints returning `success: false`; change to correct 4xx/5xx; this is always a breaking change — version accordingly |
+| Rate limit scope is wrong (org-level limit but should be per-user) | Limit designed without multi-tenant analysis | Change scope; this is a behavioral breaking change; notify consumers before rollout |
+| OpenAPI spec and implementation diverge over time | Code-first tooling not run as part of CI | Add spec regeneration to CI pipeline; fail CI on spec diff |
+
+## Workflow
+
+1. **Confirm protocol** — use the decision criteria table; document the rationale in the spec's `info.description`
+2. **Define pagination strategy** — is any resource mutable during iteration? Yes → cursor required
+3. **Define versioning policy** — document it in `info.description` before the first endpoint is designed
+4. **Define error schema** — add a `ProblemDetails` component schema; reference it in every error response
+5. **Author spec** — spec-first if consumers are parallel; code-first if solo/internal
+6. **Classify each design decision** — for each field/endpoint, note whether future changes would be breaking
+7. **Document rate limits** — every rate-limited endpoint gets response headers documented in spec
+8. **Review auth scheme** — confirm flow type for each OAuth2 usage; confirm key location for API key auth
+
+**Natural next steps after this skill**:
+- `openapi-to-typescript` — generate TypeScript interfaces from the completed spec for frontend consumers
+- `backend-to-frontend-handoff-docs` — after implementation, generate consumer-facing behavioral documentation
+- `requirements-clarity` — if design reveals unresolved requirement gaps, return upstream
diff --git a/skills/backend-to-frontend-handoff-docs/SKILL.md b/skills/backend-to-frontend-handoff-docs/SKILL.md
index faac6de..abba46e 100644
--- a/skills/backend-to-frontend-handoff-docs/SKILL.md
+++ b/skills/backend-to-frontend-handoff-docs/SKILL.md
@@ -1,122 +1,77 @@
 ---
 name: backend-to-frontend-handoff-docs
-description: Create API handoff documentation for frontend developers. Use when backend work is complete and needs to be documented for frontend integration, or user says 'create handoff', 'document API', 'frontend handoff', or 'API documentation'.
+description: Generate API handoff documentation for frontend developers after backend work is complete. Use when backend implementation is done and needs to be documented for frontend integration. Trigger phrases: 'create handoff', 'document API', 'frontend handoff', 'API documentation', 'document endpoints for frontend'.
 ---
 
-# API Handoff Mode
+## Mindset
 
-> **No Chat Output**: Produce the handoff document only. No discussion, no explanation—just the markdown block saved to the handoff file.
+**You are writing a contract, not a tutorial.** Frontend developers (and their AI) should be able to implement integration with zero questions after reading this document. Every omission creates a Slack message, a meeting, or a bug.
 
-You are a backend developer completing API work. Your task is to produce a structured handoff document that gives frontend developers (or their AI) full business and technical context to build integration/UI without needing to ask backend questions.
+- **Asymmetry of pain**: The frontend developer discovers your omission at 4 PM on release day. Document what you know right now, even if it feels obvious.
+- **Business logic leaks through the API**: If the backend enforces "a user can only submit once per 24-hour window per org," that rule must appear in the handoff — not just the HTTP 429 it produces. Frontend needs to disable the button, not just handle the error.
+- **Error taxonomy beats status codes**: "Returns 409" tells frontend nothing. "Returns 409 with `{code: 'duplicate_submission', retryAfter: 86400}` when..." tells them how to build the error message.
+- **Enums drift**: If the backend adds a new status mid-iteration and the handoff is stale, frontend silently mishandles it. Flag all enums as exhaustive or open-ended explicitly.
+- **Auth is a contract, not an afterthought**: Document the token location (header vs cookie), the exact scope/role string, and what happens on expiry — not just "auth required."
 
-> **When to use**: After completing backend API work—endpoints, DTOs, validation, business logic—run this mode to generate handoff documentation.
+## Navigation
 
-> **Simple API shortcut**: If the API is straightforward (CRUD, no complex business logic, obvious validation), skip the full template—just provide the endpoint, method, and example request/response JSON. Frontend can infer the rest.
+**Use this skill when**:
+- Backend API implementation is complete (endpoints, DTOs, validation, business logic done)
+- A feature involves frontend integration that spans teams or AI agents
+- API contracts changed and frontend needs a delta of what broke
 
-## Goal
-Produce a copy-paste-ready handoff document with all context a frontend AI needs to build UI/integration correctly and confidently.
+**Do NOT use this skill when**:
+- Backend work is still in progress — a premature handoff becomes a liability when contracts change
+- The API is internal backend-to-backend (microservices, queue consumers)
+- Frontend already has a Swagger/OpenAPI spec auto-generated from annotations — extend that instead of creating parallel docs
 
-## Inputs
-- Completed API code (endpoints, controllers, services, DTOs, validation).
-- Related business context from the task/user story.
-- Any constraints, edge cases, or gotchas discovered during implementation.
+**Quick decision: full doc vs. shortcut**
 
-## Workflow
+| Signal | Action |
+|--------|--------|
+| CRUD endpoint, obvious field names, no business rules | Shortcut: endpoint + example JSON only |
+| Any business logic, non-obvious validation, or state machine | Full template (see `references/TEMPLATE.md`) |
+| Multiple related endpoints with shared DTOs | Full template; group by user flow, not by HTTP verb |
 
-1. **Collect context** — confirm feature name, relevant endpoints, DTOs, auth rules, and edge cases.
-2. **Create/update handoff file** — write the document to `.claude/docs/ai/<feature-name>/api-handoff.md`. Increment the iteration suffix (`-v2`, `-v3`, …) if rerunning after feedback.
-3. **Paste template** — fill every section below with concrete data. Omit subsections only when truly not applicable (note why).
-4. **Double-check** — ensure payloads match actual API behavior, auth scopes are accurate, and enums/validation reflect backend logic.
+## Philosophy
 
-## Output Format
+The handoff document is the API's *behavioral specification from the consumer's perspective*. It is not a description of what the backend does internally — it is a precise description of what the frontend can rely on, what it cannot assume, and what it must handle.
 
-Produce a single markdown block structured as follows. Keep it dense—no fluff, no repetition.
+## NEVER
 
----
+- **NEVER document internal class names, service layer names, or file paths** — because frontend contracts break when backend refactors, and internal names create cargo-cult coupling where frontend assumes structure that will change.
+- **NEVER write "see code for details"** — because the handoff's entire purpose is to eliminate the need to read backend code. Any reference to source is an admission of failure.
+- **NEVER use `any` or vague types like `object` in DTO examples** — because frontend TypeScript inference collapses to `any`, nullability bugs appear at runtime, and discriminated union narrowing is impossible without exact types.
+- **NEVER omit the error response shape** — because frontend error handling almost always branches on the error body (`code`, `message`, `field`), not just on the HTTP status. A 422 with no documented body forces frontend to `console.log` in production.
+- **NEVER document validation rules as "see frontend" or "same as UI"** — because backend validation is authoritative; frontend mirrors it for UX only. If the handoff doesn't list backend constraints, frontend re-derives them from HTTP errors at the worst possible time.
+- **NEVER describe auth as "token required"** — because "token" is ambiguous across Bearer, cookie, API key, and session schemes. Document the exact header name, token format, scope string, and the HTTP status returned on auth failure (401 vs 403 mean different things).
+- **NEVER omit nullability** — because `field?: string` (absent) vs `field: string | null` (present-but-null) are different serialization contracts. Frontend optional chaining fails differently on each, and the bug is invisible until edge data hits.
+- **NEVER ship a handoff without at least one concrete example payload per endpoint** — because abstract field descriptions ("string, the user's name") are interpreted differently by every reader. A real example is unambiguous.
+- **NEVER use "TBD" as a placeholder in a shipped handoff** — because frontend will assume TBD means "not my problem" and ship without handling that case. If it's genuinely unresolved, make it a blocking `## Open Questions` item with an owner and deadline.
+- **NEVER generate the handoff before verifying that the actual implementation matches the docs** — because copy-pasting from a spec that was written before implementation is the #1 source of handoff drift. Read the actual controller/route handler before writing the response shape.
 
-```markdown
-# API Handoff: [Feature Name]
-
-## Business Context
-[2-4 sentences: What problem does this solve? Who uses it? Why does it matter? Include any domain terms the frontend needs to understand.]
-
-## Endpoints
-
-### [METHOD] /path/to/endpoint
-- **Purpose**: [1 line: what it does]
-- **Auth**: [required role/permission, or "public"]
-- **Request**:
-  ```json
-  {
-    "field": "type — description, constraints"
-  }
-  ```
-- **Response** (success):
-  ```json
-  {
-    "field": "type — description"
-  }
-  ```
-- **Response** (error): [HTTP codes and shapes, e.g., 422 validation, 404 not found]
-- **Notes**: [edge cases, rate limits, pagination, sorting, anything non-obvious]
-
-[Repeat for each endpoint]
-
-## Data Models / DTOs
-[List key models/DTOs the frontend will receive or send. Include field types, nullability, enums, and business meaning.]
-
-```typescript
-// Example shape for frontend typing
-interface ExampleDto {
-  id: number;
-  status: 'pending' | 'approved' | 'rejected';
-  createdAt: string; // ISO 8601
-}
-```
-
-## Enums & Constants
-[List any enums, status codes, or magic values the frontend needs to know. Include display labels if relevant.]
-
-| Value | Meaning | Display Label |
-|-------|---------|---------------|
-| `pending` | Awaiting review | Pending |
-
-## Validation Rules
-[Summarize key validation rules the frontend should mirror for UX—required fields, min/max, formats, conditional rules.]
-
-## Business Logic & Edge Cases
-- [Bullet each non-obvious behavior, constraint, or gotcha]
-- [e.g., "User can only submit once per day", "Soft-deleted items excluded by default"]
-
-## Integration Notes
-- **Recommended flow**: [e.g., "Fetch list → select item → submit form → poll for status"]
-- **Optimistic UI**: [safe or not, why]
-- **Caching**: [any cache headers, invalidation triggers]
-- **Real-time**: [websocket events, polling intervals if applicable]
-
-## Test Scenarios
-[Key scenarios frontend should handle—happy path, errors, edge cases. Use as acceptance criteria or test cases.]
-
-1. **Happy path**: [brief description]
-2. **Validation error**: [what triggers it, expected response]
-3. **Not found**: [when 404 is returned]
-4. **Permission denied**: [when 403 is returned]
-
-## Open Questions / TODOs
-[Anything unresolved, pending PM decision, or needs frontend input. If none, omit section.]
-```
+## When Things Go Wrong
 
----
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Frontend says "we're getting a different response shape than documented" | Handoff written from spec, not from live code | Re-read the actual handler/serializer; update handoff; increment version suffix |
+| Frontend asks "what does status X mean?" | Enum listed without business meaning or display label | Add `Meaning` and `Display Label` columns to enum table; see `references/TEMPLATE.md` |
+| Frontend asks "when do we get a 401 vs 403?" | Auth section only said "auth required" | Document exact condition for each: 401 = not authenticated, 403 = authenticated but lacks role/scope |
+| Handoff is already v4 and frontend still has questions | Business logic section is absent or too thin | Write a dedicated state machine table for multi-status workflows; move endpoint docs to secondary |
+| Handoff is 2000 lines and nobody reads it | Full template applied to trivial CRUD | Use the shortcut path; inline the example JSON; drop sections that add no information |
+
+## Workflow
+
+1. **Read the actual implementation** — controller/route handler, DTO/schema, middleware. Do not rely on memory or spec.
+2. **Classify complexity** — use the Navigation decision table to pick full vs. shortcut path.
+3. **Fill the template** — see `references/TEMPLATE.md` for the full document scaffold.
+4. **Write to file** — save to `.claude/docs/ai/<feature-name>/api-handoff.md`. Use `-v2`, `-v3` suffixes for iterations.
+5. **Verify before saving** — every field name, type, and status code must match the live implementation.
+
+**Output rule**: Write the markdown directly to the file. Do not echo it in chat. Reference the file path in your response only.
+
+If the project has an OpenAPI spec, recommend `openapi-to-typescript` to the frontend team as an alternative to hand-reading the handoff doc; it generates typed interfaces directly from the spec.
+
+## Reference Files
 
-## Rules
-- **NO CHAT OUTPUT**—produce only the handoff markdown block, nothing else.
-- Be precise: types, constraints, examples—not vague prose.
-- Include real example payloads where helpful.
-- Surface non-obvious behaviors—don't assume frontend will "just know."
-- If backend made trade-offs or assumptions, document them.
-- Keep it scannable: headers, tables, bullets, code blocks.
-- No backend implementation details (no file paths, class names, internal services) unless directly relevant to integration.
-- If something is incomplete or TBD, say so explicitly.
-
-## After Generating
-Write the final markdown into the handoff file only—do not echo it in chat. (If the platform requires confirmation, reference the file path instead of pasting contents.)
+- `references/TEMPLATE.md` — Full handoff document scaffold with all sections
diff --git a/skills/backend-to-frontend-handoff-docs/references/TEMPLATE.md b/skills/backend-to-frontend-handoff-docs/references/TEMPLATE.md
new file mode 100644
index 0000000..e99cb3a
--- /dev/null
+++ b/skills/backend-to-frontend-handoff-docs/references/TEMPLATE.md
@@ -0,0 +1,161 @@
+# API Handoff Template
+
+Use this scaffold for full handoffs (complex business logic, state machines, non-trivial validation).
+For simple CRUD, use the shortcut: endpoint + example JSON + error codes only.
+
+---
+
+```markdown
+# API Handoff: [Feature Name]
+**Version**: v1 | **Date**: YYYY-MM-DD | **Backend author**: [name/handle]
+
+## Business Context
+[2–4 sentences: What problem does this solve? Who is the user? What domain terms does frontend need? What invariants does the backend enforce that frontend must understand?]
+
+---
+
+## Endpoints
+
+### [METHOD] /path/to/endpoint
+
+- **Purpose**: [One line: what this endpoint does from the consumer's perspective]
+- **Auth**: [Exact header: `Authorization: Bearer <jwt>` with scope `expenses:approve`] OR [`public`]
+  - 401 → not authenticated (redirect to login)
+  - 403 → authenticated but lacks role `manager` (show permission error, do not retry)
+- **Request**:
+  ```json
+  {
+    "field": "value — type, constraints (e.g., required, max 500 chars, must match /^[A-Z]{2}$/)"
+  }
+  ```
+- **Response** (200/201):
+  ```json
+  {
+    "field": "value — type, nullability (e.g., string | null), meaning"
+  }
+  ```
+- **Response** (error):
+  | Status | Condition | Body shape |
+  |--------|-----------|------------|
+  | 400 | Malformed JSON | `{"error": "bad_request"}` |
+  | 422 | Validation failure | `{"errors": [{"field": "amount", "code": "too_large", "message": "..."}]}` |
+  | 409 | Duplicate submission within window | `{"code": "duplicate_submission", "retryAfter": 86400}` |
+  | 404 | Resource not found or soft-deleted | `{"error": "not_found"}` |
+- **Notes**: [Rate limits, pagination cursor vs. offset, idempotency key, ordering guarantees, anything non-obvious]
+
+[Repeat block for each endpoint]
+
+---
+
+## Data Models / DTOs
+
+```typescript
+// Exact TypeScript shape frontend should use for typing
+interface ExampleDto {
+  id: number;                          // always present
+  status: 'pending' | 'approved' | 'rejected'; // exhaustive — no other values
+  comment: string | null;              // present-but-null when no comment (not absent)
+  createdAt: string;                   // ISO 8601 UTC, e.g. "2026-01-18T10:30:00Z"
+  approvedBy?: {                       // absent (not null) when status !== 'approved'
+    id: number;
+    displayName: string;
+  };
+}
+```
+
+**Nullability conventions used in this API**:
+- `field?: T` means the key may be absent from the response (use `?.` access)
+- `field: T | null` means the key is always present and its value may be null (use a `=== null` check)
+
+---
+
+## Enums & Constants
+
+| Value | Meaning | Display Label | Terminal? |
+|-------|---------|---------------|-----------|
+| `pending` | Submitted, awaiting review | Pending | No |
+| `approved` | Approved by manager | Approved | Yes |
+| `rejected` | Rejected by manager | Rejected | Yes |
+
+**Is this enum exhaustive?** YES — backend will not add values without a major version bump.
+(If NO: frontend must handle unknown values gracefully, e.g., display as "Unknown status".)
+
+---
+
+## Validation Rules
+
+Frontend should mirror these for UX (backend enforces them authoritatively):
+
+| Field | Rule | UX implication |
+|-------|------|----------------|
+| `amount` | Required, positive number, max 2 decimal places, ≤ 10000 | Show currency input; cap at 10,000; disable submit if 0 |
+| `comment` | Optional, max 500 chars, trimmed | Show character counter at 400+ chars |
+| `category` | Required, must match enum values | Dropdown, not free text |
+
+---
+
+## Business Logic & State Machine
+
+[Use when the feature has meaningful state transitions. Skip for stateless CRUD.]
+
+```
+pending ──[approve]──► approved (terminal)
+pending ──[reject]───► rejected (terminal)
+```
+
+- **Transition rules enforced by backend** (frontend must handle the error, not prevent the action client-side):
+  - Only users with role `manager` can approve/reject
+  - Cannot approve own submissions
+  - Terminal states cannot transition (422 with `code: already_finalized`)
+  - Approval window: must occur within 30 days of submission (422 with `code: window_expired`)
+
+---
+
+## Integration Notes
+
+- **Recommended flow**: [e.g., "Fetch list (GET /expenses) → select item → open modal → POST /expenses/:id/approve → refresh list item via GET /expenses/:id"]
+- **Optimistic UI**: [SAFE / NOT SAFE — reason. "Not safe: permission checks on approve depend on server-side role resolution."]
+- **Caching**: [e.g., "List response includes `Cache-Control: max-age=30`. Invalidate on POST/PATCH to this resource."]
+- **Real-time**: [e.g., "No websocket. Poll GET /expenses/:id every 5s while status=pending if you need live updates."]
+- **Pagination**: [cursor-based: `?cursor=<opaque_string>&limit=25` | offset-based: `?page=1&pageSize=25` | none]
+
+---
+
+## Test Scenarios
+
+Cover these in frontend integration tests or manual QA:
+
+1. **Happy path**: [e.g., "Manager approves pending expense → 200 → UI shows 'Approved' badge"]
+2. **Validation error**: [e.g., "Submit with amount=0 → 422 → show inline field error"]
+3. **Auth failure**: [e.g., "Non-manager clicks approve → 403 → show 'Permission denied' toast, do not navigate away"]
+4. **Conflict**: [e.g., "Approve already-approved expense → 422 `already_finalized` → show 'Already processed' message"]
+5. **Not found**: [e.g., "Fetch deleted expense → 404 → redirect to list with 'Item no longer available' message"]
+
+---
+
+## Open Questions / TODOs
+
+| Question | Owner | Deadline |
+|----------|-------|----------|
+| [e.g., "Should partial approvals be supported in Q2?"] | [PM name] | [date] |
+
+[Omit section entirely if none.]
+```
+
+---
+
+## Shortcut Template (Simple CRUD)
+
+For endpoints with no business logic, obvious field names, standard CRUD semantics:
+
+```markdown
+# API Handoff: [Feature Name] (Minimal)
+
+### GET /api/users/:id
+- **Auth**: `Authorization: Bearer <token>`; any authenticated user, no specific scope or role (401 if the token is absent or expired)
+- **Response** (200):
+  ```json
+  { "id": 1, "email": "user@example.com", "displayName": "Jane Smith", "avatarUrl": "https://..." }
+  ```
+- 404 if user not found or soft-deleted.
+```
diff --git a/skills/c4-architecture/SKILL.md b/skills/c4-architecture/SKILL.md
index ed972bc..7fd49b4 100644
--- a/skills/c4-architecture/SKILL.md
+++ b/skills/c4-architecture/SKILL.md
@@ -7,26 +7,80 @@ description: Generate architecture documentation using C4 model Mermaid diagrams
 
 Generate software architecture documentation using C4 model diagrams in Mermaid syntax.
 
-## Workflow
+## Mindset
+
+Expert C4 practitioners think in these terms — not in diagram syntax:
+
+1. **Audience first, diagram second.** The question is never "which diagram type?" — it is "who needs to understand what, and what do they already know?" Context diagrams for executives; Container + Deployment for DevOps; Component only when a developer is lost navigating the codebase.
+
+2. **Containers are deployment units, not logical groupings.** If you cannot `docker run` it, `kubectl apply` it, or deploy it independently, it is a Component, not a Container. Misclassifying this is one of the most common sources of C4 diagram confusion.
+
+3. **Stop at the level that answers the question.** Creating Level 3 Component diagrams "just in case" adds maintenance burden with no clarity gain. Context + Container diagrams are sufficient for most teams.
+
+4. **Name relationships with verb phrases, not nouns.** "Reads customer data from" is better than "Database connection". Relationships carry meaning; labels are not decoration.
+
+5. **Diagram sprawl is worse than no diagram.** One accurate Context diagram beats ten stale Component diagrams. Keep diagrams minimal enough that they stay current.
 
-1. **Understand scope** - Determine which C4 level(s) are needed based on audience
-2. **Analyze codebase** - Explore the system to identify components, containers, and relationships
-3. **Generate diagrams** - Create Mermaid C4 diagrams at appropriate abstraction levels
-4. **Document** - Write diagrams to markdown files with explanatory context
+## Navigation
 
-## C4 Diagram Levels
+**Use this skill when:**
+- Asked to document, visualize, or diagram system architecture
+- Onboarding new engineers who need a map of the system
+- Planning a new system or major feature and need to communicate design
+- Performing architecture review and need a baseline
 
-Select the appropriate level based on the documentation need:
+**Do NOT use this skill when:**
+- The user wants a sequence diagram for a single request flow → use a standard Mermaid sequence diagram instead
+- The user wants an entity-relationship diagram → use Mermaid `erDiagram`
+- The user wants a network topology diagram with physical hardware detail → use a deployment diagram only if C4 conventions fit, otherwise note the limitation
 
-| Level | Diagram Type | Audience | Shows | When to Create |
-|-------|-------------|----------|-------|----------------|
-| 1 | **C4Context** | Everyone | System + external actors | Always (required) |
-| 2 | **C4Container** | Technical | Apps, databases, services | Always (required) |
-| 3 | **C4Component** | Developers | Internal components | Only if adds value |
-| 4 | **C4Deployment** | DevOps | Infrastructure nodes | For production systems |
-| - | **C4Dynamic** | Technical | Request flows (numbered) | For complex workflows |
+### Decision Tree: Which level(s) to generate?
 
-**Key Insight:** "Context + Container diagrams are sufficient for most software development teams." Only create Component/Code diagrams when they genuinely add value.
+```
+Start: Who is the primary audience?
+│
+├─ Non-technical (executives, product) → Level 1 ONLY (C4Context)
+│
+├─ Technical but not hands-on (architects, TPMs) → Level 1 + Level 2 (C4Container)
+│
+├─ Developers unfamiliar with the codebase → Level 1 + Level 2 + Level 3 for the specific area in question (C4Component)
+│
+├─ DevOps / SRE / infra → Level 2 + deployment diagram (C4Container + C4Deployment)
+│
+└─ Complex async workflow needs explaining → Add C4Dynamic for that flow only
+```
+
+### Decision Tree: How to model microservices ownership?
+
+```
+Microservice owned by...
+│
+├─ Same team as the system being documented → model as Container inside a System_Boundary
+│
+└─ Different team → model as System_Ext at Level 1; only expand to Container in that team's own diagram
+```
+
+### Decision Tree: How to model a message broker (Kafka, RabbitMQ, SQS)?
+
+```
+Does the diagram need to show data flows?
+│
+├─ Yes → show individual topics/queues as ContainerQueue elements; do NOT show "Kafka" as a single box
+│
+└─ No (just showing system exists) → single System or Container named for the broker is acceptable
+```
+
+## Philosophy
+
+The C4 model's power is that each level answers exactly one question for one audience; the moment a diagram tries to answer two questions, it fails both. Generate diagrams at the highest level of abstraction (the lowest-numbered level) that resolves the audience's actual question.
+
+## Workflow
+
+1. **Identify audience and question** — determine who will read this and what decision it supports
+2. **Select level(s)** — use the decision trees above; default to Level 1 + Level 2 only
+3. **Analyze codebase** — identify containers, external systems, and key relationships
+4. **Generate diagrams** — write Mermaid C4 diagrams at selected levels
+5. **Document** — write to `docs/architecture/` with standard naming (see Output Location)
 
 ## Quick Start Examples
 
@@ -61,199 +115,7 @@ C4Container
   Rel(pinia, indexeddb, "Persists", "Dexie ORM")
 ```
 
-### Component Diagram (Level 3)
-```mermaid
-C4Component
-  title Component Diagram - Workout Feature
-
-  Container(views, "Views", "Vue Router pages")
-
-  Container_Boundary(workout, "Workout Feature") {
-    Component(useWorkout, "useWorkout", "Composable", "Workout execution state")
-    Component(useTimer, "useTimer", "Composable", "Timer state machine")
-    Component(workoutRepo, "WorkoutRepository", "Dexie", "Workout persistence")
-  }
-
-  Rel(views, useWorkout, "Uses")
-  Rel(useWorkout, useTimer, "Controls")
-  Rel(useWorkout, workoutRepo, "Saves to")
-```
-
-### Dynamic Diagram (Request Flow)
-```mermaid
-C4Dynamic
-  title Dynamic Diagram - User Sign In Flow
-
-  ContainerDb(db, "Database", "PostgreSQL", "User credentials")
-  Container(spa, "Single-Page App", "React", "Banking UI")
-
-  Container_Boundary(api, "API Application") {
-    Component(signIn, "Sign In Controller", "Express", "Auth endpoint")
-    Component(security, "Security Service", "JWT", "Validates credentials")
-  }
-
-  Rel(spa, signIn, "1. Submit credentials", "JSON/HTTPS")
-  Rel(signIn, security, "2. Validate")
-  Rel(security, db, "3. Query user", "SQL")
-
-  UpdateRelStyle(spa, signIn, $textColor="blue", $offsetY="-30")
-```
-
-### Deployment Diagram
-```mermaid
-C4Deployment
-  title Deployment Diagram - Production
-
-  Deployment_Node(browser, "Customer Browser", "Chrome/Firefox") {
-    Container(spa, "SPA", "React", "Web application")
-  }
-
-  Deployment_Node(aws, "AWS Cloud", "us-east-1") {
-    Deployment_Node(ecs, "ECS Cluster", "Fargate") {
-      Container(api, "API Service", "Node.js", "REST API")
-    }
-    Deployment_Node(rds, "RDS", "db.r5.large") {
-      ContainerDb(db, "Database", "PostgreSQL", "Application data")
-    }
-  }
-
-  Rel(spa, api, "API calls", "HTTPS")
-  Rel(api, db, "Reads/writes", "JDBC")
-```
-
-## Element Syntax
-
-### People and Systems
-```
-Person(alias, "Label", "Description")
-Person_Ext(alias, "Label", "Description")       # External person
-System(alias, "Label", "Description")
-System_Ext(alias, "Label", "Description")       # External system
-SystemDb(alias, "Label", "Description")         # Database system
-SystemQueue(alias, "Label", "Description")      # Queue system
-```
-
-### Containers
-```
-Container(alias, "Label", "Technology", "Description")
-Container_Ext(alias, "Label", "Technology", "Description")
-ContainerDb(alias, "Label", "Technology", "Description")
-ContainerQueue(alias, "Label", "Technology", "Description")
-```
-
-### Components
-```
-Component(alias, "Label", "Technology", "Description")
-Component_Ext(alias, "Label", "Technology", "Description")
-ComponentDb(alias, "Label", "Technology", "Description")
-```
-
-### Boundaries
-```
-Enterprise_Boundary(alias, "Label") { ... }
-System_Boundary(alias, "Label") { ... }
-Container_Boundary(alias, "Label") { ... }
-Boundary(alias, "Label", "type") { ... }
-```
-
-### Relationships
-```
-Rel(from, to, "Label")
-Rel(from, to, "Label", "Technology")
-BiRel(from, to, "Label")                        # Bidirectional
-Rel_U(from, to, "Label")                        # Upward
-Rel_D(from, to, "Label")                        # Downward
-Rel_L(from, to, "Label")                        # Leftward
-Rel_R(from, to, "Label")                        # Rightward
-```
-
-### Deployment Nodes
-```
-Deployment_Node(alias, "Label", "Type", "Description") { ... }
-Node(alias, "Label", "Type", "Description") { ... }  # Shorthand
-```
-
-## Styling and Layout
-
-### Layout Configuration
-```
-UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="1")
-```
-- `$c4ShapeInRow` - Number of shapes per row (default: 4)
-- `$c4BoundaryInRow` - Number of boundaries per row (default: 2)
-
-### Element Styling
-```
-UpdateElementStyle(alias, $fontColor="red", $bgColor="grey", $borderColor="red")
-```
-
-### Relationship Styling
-```
-UpdateRelStyle(from, to, $textColor="blue", $lineColor="blue", $offsetX="5", $offsetY="-10")
-```
-Use `$offsetX` and `$offsetY` to fix overlapping relationship labels.
-
-## Best Practices
-
-### Essential Rules
-
-1. **Every element must have**: Name, Type, Technology (where applicable), and Description
-2. **Use unidirectional arrows only** - Bidirectional arrows create ambiguity
-3. **Label arrows with action verbs** - "Sends email using", "Reads from", not just "uses"
-4. **Include technology labels** - "JSON/HTTPS", "JDBC", "gRPC"
-5. **Stay under 20 elements per diagram** - Split complex systems into multiple diagrams
-
-### Clarity Guidelines
-
-1. **Start at Level 1** - Context diagrams help frame the system scope
-2. **One diagram per file** - Keep diagrams focused on a single abstraction level
-3. **Meaningful aliases** - Use descriptive aliases (e.g., `orderService` not `s1`)
-4. **Concise descriptions** - Keep descriptions under 50 characters when possible
-5. **Always include a title** - "System Context diagram for [System Name]"
-
-### What to Avoid
-
-See [references/common-mistakes.md](references/common-mistakes.md) for detailed anti-patterns:
-- Confusing containers (deployable) vs components (non-deployable)
-- Modeling shared libraries as containers
-- Showing message brokers as single containers instead of individual topics
-- Adding undefined abstraction levels like "subcomponents"
-- Removing type labels to "simplify" diagrams
-
-## Microservices Guidelines
-
-### Single Team Ownership
-Model each microservice as a **container** (or container group):
-```mermaid
-C4Container
-  title Microservices - Single Team
-
-  System_Boundary(platform, "E-commerce Platform") {
-    Container(orderApi, "Order Service", "Spring Boot", "Order processing")
-    ContainerDb(orderDb, "Order DB", "PostgreSQL", "Order data")
-    Container(inventoryApi, "Inventory Service", "Node.js", "Stock management")
-    ContainerDb(inventoryDb, "Inventory DB", "MongoDB", "Stock data")
-  }
-```
-
-### Multi-Team Ownership
-Promote microservices to **software systems** when owned by separate teams:
-```mermaid
-C4Context
-  title Microservices - Multi-Team
-
-  Person(customer, "Customer", "Places orders")
-  System(orderSystem, "Order System", "Team Alpha")
-  System(inventorySystem, "Inventory System", "Team Beta")
-  System(paymentSystem, "Payment System", "Team Gamma")
-
-  Rel(customer, orderSystem, "Places orders")
-  Rel(orderSystem, inventorySystem, "Checks stock")
-  Rel(orderSystem, paymentSystem, "Processes payment")
-```
-
-### Event-Driven Architecture
-Show individual topics/queues as containers, NOT a single "Kafka" box:
+### Event-Driven Architecture (Kafka/queues)
 ```mermaid
 C4Container
   title Event-Driven Architecture
@@ -269,14 +131,36 @@ C4Container
   Rel(orderService, stockTopic, "Subscribes to")
 ```
 
+> Full syntax for all element types, styling, and layout: [references/c4-syntax.md](references/c4-syntax.md)
+
+## NEVER
+
+- **NEVER model a Java/Python class or module as a Container.** Containers are independently deployable. Misclassifying creates diagrams that look technical but communicate nothing — developers will correctly distrust them.
+- **NEVER show a message broker (Kafka, RabbitMQ) as a single container when data flows matter.** A "Kafka" box hides which services are coupled to which topics — the exact information the diagram needs to convey.
+- **NEVER create Component diagrams without a specific developer question driving them.** Component diagrams become stale within weeks of code changes; they cost more to maintain than they save unless targeting a specific navigation problem.
+- **NEVER use bidirectional arrows (`BiRel`) as a default.** Most relationships have an initiator and a responder. `BiRel` signals "I wasn't sure" and destroys the ability to trace data flows through the diagram.
+- **NEVER label relationships with nouns alone** (e.g., "API", "Database"). Noun-only labels make the diagram look complete while conveying nothing. Always use a verb phrase: "Submits orders via", "Reads config from".
+- **NEVER put implementation detail (method names, SQL queries, config keys) in a C4 diagram.** C4 diagrams communicate structure and intent at a business-meaningful level; implementation belongs in code comments and ADRs.
+- **NEVER remove type labels from elements to "simplify" the diagram.** The technology label (e.g., "PostgreSQL", "React") is load-bearing — it tells the reader what kind of thing this is and what constraints apply.
+
+## When Things Go Wrong
+
+| Symptom | Likely cause | Fix |
+|---------|-------------|-----|
+| Mermaid renders a blank diagram | Missing diagram type keyword (`C4Context`, `C4Container`, etc.) or syntax error in element line | Validate each line; check that all `{` boundaries are closed |
+| Diagram is unreadably crowded | Too many elements at one level | Split: use `System_Ext` to collapse external systems; create a separate diagram per bounded context |
+| Reviewers argue about what counts as a "container" | Mixing deployable units with logical groupings | Reframe: "can this be restarted/scaled independently?" — yes = Container, no = Component |
+| Event-driven system looks like spaghetti | Showing broker as single node with many arrows | Replace single broker box with individual `ContainerQueue` elements per topic; group by owning service |
+| Multi-team microservices diagram is unnavigable | All services at same level regardless of ownership | Apply ownership decision tree: cross-team services become `System_Ext` at Level 1 |
+
 ## Output Location
 
 Write architecture documentation to `docs/architecture/` with naming convention:
-- `c4-context.md` - System context diagram
-- `c4-containers.md` - Container diagram
-- `c4-components-{feature}.md` - Component diagrams per feature
-- `c4-deployment.md` - Deployment diagram
-- `c4-dynamic-{flow}.md` - Dynamic diagrams for specific flows
+- `c4-context.md` — System context diagram
+- `c4-containers.md` — Container diagram
+- `c4-components-{feature}.md` — Component diagrams per feature
+- `c4-deployment.md` — Deployment diagram
+- `c4-dynamic-{flow}.md` — Dynamic diagrams for specific flows
 
 ## Audience-Appropriate Detail
 
@@ -290,6 +174,6 @@ Write architecture documentation to `docs/architecture/` with naming convention:
 
 ## References
 
-- [references/c4-syntax.md](references/c4-syntax.md) - Complete Mermaid C4 syntax
-- [references/common-mistakes.md](references/common-mistakes.md) - Anti-patterns to avoid
-- [references/advanced-patterns.md](references/advanced-patterns.md) - Microservices, event-driven, deployment
+- [references/c4-syntax.md](references/c4-syntax.md) — Complete Mermaid C4 syntax (all element types, boundaries, relationships, styling, layout)
+- [references/common-mistakes.md](references/common-mistakes.md) — Anti-patterns with corrected examples
+- [references/advanced-patterns.md](references/advanced-patterns.md) — Microservices, event-driven, multi-team, deployment patterns
diff --git a/skills/codex/SKILL.md b/skills/codex/SKILL.md
index 82f09ba..036d063 100644
--- a/skills/codex/SKILL.md
+++ b/skills/codex/SKILL.md
@@ -1,66 +1,94 @@
 ---
 name: codex
-description: Use when the user asks to run Codex CLI (codex exec, codex resume) or references OpenAI Codex for code analysis, refactoring, or automated editing. Uses GPT-5.2 by default for state-of-the-art software engineering.
+description: 'Use when the user asks to run Codex CLI (codex exec, codex resume) or invokes OpenAI Codex for code analysis, refactoring, automated editing, or multi-step agentic coding workflows. Trigger phrases: "use codex", "run codex", "codex resume", "codex analyze", "codex refactor". Uses GPT-5.2 by default.'
 ---
 
-# Codex Skill Guide
+## Mindset
+
+- **Sandbox discipline first** — the sandbox choice is the highest-stakes decision. Read-only is the safe default; escalate only when the task genuinely cannot complete without writes or network. Never let convenience drive you to `danger-full-access`.
+- **Stderr is noise, not signal** — Codex streams thinking tokens to stderr. Suppress with `2>/dev/null` by default; only surface stderr when the user explicitly asks for reasoning traces or when diagnosing a failure.
+- **Resume sessions, don't restart them** — re-running `codex exec` from scratch loses accumulated context. If a prior session exists and the user wants to continue, always reach for `resume --last`.
+- **Flag placement on resume is load-bearing** — flags between `exec` and `resume` set session-level config; flags after `resume` are task-level. Mixing them silently corrupts behavior.
+- **Reasoning effort multiplies cost non-linearly** — `xhigh` can run 5–10× longer than `medium` on hard tasks. Match effort to actual complexity; don't default to max.
+
+## Navigation
+
+**Use this skill when**:
+- User says "use codex", "run codex", "codex exec", "codex resume", or "codex analyze/refactor"
+- Task requires an agentic coding loop (plan → edit → verify) across multiple files
+- User wants to leverage GPT-5.2's SWE-bench-optimized reasoning on a codebase
+
+**Do NOT use this skill when**:
+- The user wants Claude Code itself to edit files (no Codex invocation needed)
+- Quick single-file edits that Claude can perform directly — Codex overhead isn't worth it
+- User hasn't confirmed `codex --version` works (with an unresolved install error, Codex commands fail silently)
+
+**Sandbox selection** (most common ambiguity):
+
+```
+Writing files?
+  NO  → read-only
+  YES → Network or subprocess needed?
+    NO  → workspace-write
+    YES → danger-full-access  (ask user first)
+```
+
+## Philosophy
+
+Codex is a second reasoning engine, not a shell shortcut. Use it when the problem requires sustained multi-step agency — planning, editing, verifying — that exceeds what a single Claude Code turn can deliver reliably.
+
+## NEVER
+
+- **NEVER use `--sandbox danger-full-access` without explicit user confirmation** — it grants process spawning, arbitrary network access, and filesystem writes with no guardrails; a single bad prompt can modify or exfiltrate files outside the working directory.
+- **NEVER omit `--skip-git-repo-check`** — without it, Codex aborts when invoked outside a git root, and the error message is cryptic enough that users assume Codex is broken rather than misconfigured.
+- **NEVER pass model or effort flags after `resume --last`** — they appear to be accepted but are silently ignored; the session inherits its original config. Passing them misleads the user into thinking they changed the behavior.
+- **NEVER show stderr by default** — thinking tokens flood Claude Code's context window, consuming tokens that degrade subsequent turns. Suppress with `2>/dev/null` unless the user explicitly asks for reasoning traces.
+- **NEVER restart a session when `resume --last` exists** — restarting drops accumulated file context, tool call history, and any partial edits Codex was tracking. Resume is almost always correct.
+- **NEVER use `xhigh` reasoning on simple tasks** — it can run 5–10× longer and cost significantly more without improving output quality on straightforward changes like formatting or minor bug fixes.
+- **NEVER skip `codex --version` when the user reports unexpected behavior** — CLI version mismatches (pre-v0.57.0) silently fall back to older models rather than erroring, making GPT-5.2 features unavailable.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| `codex exec` exits non-zero immediately | Missing `--skip-git-repo-check` outside git root, or CLI not on PATH | Add flag; verify `codex --version` succeeds |
+| Resume produces wrong model/effort | Flags placed after `resume` keyword instead of between `exec` and `resume` | Correct flag order: `codex exec [flags] --skip-git-repo-check resume --last` |
+| Output is empty or truncated | stderr not suppressed, so thinking tokens consumed the context budget | Add `2>/dev/null`; if debugging, redirect stderr to a file instead |
+| Task completes but no edits appear | `read-only` sandbox used for an edit task | Rerun with `--sandbox workspace-write --full-auto` |
+| Cost/time far exceeds expectation | Reasoning effort set to `xhigh` or `high` for a simple task | Drop to `medium` or `low`; verify with `model_reasoning_effort` config |
+
+---
 
 ## Running a Task
-1. Default to `gpt-5.2` model. Ask the user (via `AskUserQuestion`) which reasoning effort to use (`xhigh`,`high`, `medium`, or `low`). User can override model if needed (see Model Options below).
-2. Select the sandbox mode required for the task; default to `--sandbox read-only` unless edits or network access are necessary.
-3. Assemble the command with the appropriate options:
-   - `-m, --model <MODEL>`
-   - `--config model_reasoning_effort="<high|medium|low>"`
-   - `--sandbox <read-only|workspace-write|danger-full-access>`
-   - `--full-auto`
-   - `-C, --cd <DIR>`
-   - `--skip-git-repo-check`
-3. Always use --skip-git-repo-check.
-4. When continuing a previous session, use `codex exec --skip-git-repo-check resume --last` via stdin. When resuming don't use any configuration flags unless explicitly requested by the user e.g. if he species the model or the reasoning effort when requesting to resume a session. Resume syntax: `echo "your prompt here" | codex exec --skip-git-repo-check resume --last 2>/dev/null`. All flags have to be inserted between exec and resume.
-5. **IMPORTANT**: By default, append `2>/dev/null` to all `codex exec` commands to suppress thinking tokens (stderr). Only show stderr if the user explicitly requests to see thinking tokens or if debugging is needed.
-6. Run the command, capture stdout/stderr (filtered as appropriate), and summarize the outcome for the user.
-7. **After Codex completes**, inform the user: "You can resume this Codex session at any time by saying 'codex resume' or asking me to continue with additional analysis or changes."
-
-### Quick Reference
-| Use case | Sandbox mode | Key flags |
-| --- | --- | --- |
-| Read-only review or analysis | `read-only` | `--sandbox read-only 2>/dev/null` |
-| Apply local edits | `workspace-write` | `--sandbox workspace-write --full-auto 2>/dev/null` |
-| Permit network or broad access | `danger-full-access` | `--sandbox danger-full-access --full-auto 2>/dev/null` |
-| Resume recent session | Inherited from original | `echo "prompt" \| codex exec --skip-git-repo-check resume --last 2>/dev/null` (no flags allowed) |
-| Run from another directory | Match task needs | `-C <DIR>` plus other flags `2>/dev/null` |
-
-## Model Options
-
-| Model | Best for | Context window | Key features |
-| --- | --- | --- | --- |
-| `gpt-5.2-max` | **Max model**: Ultra-complex reasoning, deep problem analysis | 400K input / 128K output | 76.3% SWE-bench, adaptive reasoning, $1.25/$10.00 |
-| `gpt-5.2` ⭐ | **Flagship model**: Software engineering, agentic coding workflows | 400K input / 128K output | 76.3% SWE-bench, adaptive reasoning, $1.25/$10.00 |
-| `gpt-5.2-mini` | Cost-efficient coding (4x more usage allowance) | 400K input / 128K output | Near SOTA performance, $0.25/$2.00 |
-| `gpt-5.1-thinking` | Ultra-complex reasoning, deep problem analysis | 400K input / 128K output | Adaptive thinking depth, runs 2x slower on hardest tasks |
-
-**GPT-5.2 Advantages**: 76.3% SWE-bench (vs 72.8% GPT-5), 30% faster on average tasks, better tool handling, reduced hallucinations, improved code quality. Knowledge cutoff: September 30, 2024.
-
-**Reasoning Effort Levels**:
-- `xhigh` - Ultra-complex tasks (deep problem analysis, complex reasoning, deep understanding of the problem)
-- `high` - Complex tasks (refactoring, architecture, security analysis, performance optimization)
-- `medium` - Standard tasks (refactoring, code organization, feature additions, bug fixes)
-- `low` - Simple tasks (quick fixes, simple changes, code formatting, documentation)
-
-**Cached Input Discount**: 90% off ($0.125/M tokens) for repeated context, cache lasts up to 24 hours.
 
-## Following Up
-- After every `codex` command, immediately use `AskUserQuestion` to confirm next steps, collect clarifications, or decide whether to resume with `codex exec resume --last`.
-- When resuming, pipe the new prompt via stdin: `echo "new prompt" | codex exec resume --last 2>/dev/null`. The resumed session automatically uses the same model, reasoning effort, and sandbox mode from the original session.
-- Restate the chosen model, reasoning effort, and sandbox mode when proposing follow-up actions.
+1. Check `codex --version` if behavior is unexpected (requires v0.57.0+ for GPT-5.2).
+2. Ask user for reasoning effort via `AskUserQuestion` (`xhigh` / `high` / `medium` / `low`). Default model: `gpt-5.2`.
+3. Select sandbox using the decision tree in **Navigation** above.
+4. Assemble the command:
+   ```bash
+   codex exec \
+     -m gpt-5.2 \
+     --config model_reasoning_effort="<level>" \
+     --sandbox <mode> \
+     --full-auto \
+     --skip-git-repo-check \
+     [-C <DIR>] \
+     "your prompt" 2>/dev/null
+   ```
+5. After completion, offer resume: "You can continue this session by saying 'codex resume'."
 
-## Error Handling
-- Stop and report failures whenever `codex --version` or a `codex exec` command exits non-zero; request direction before retrying.
-- Before you use high-impact flags (`--full-auto`, `--sandbox danger-full-access`, `--skip-git-repo-check`) ask the user for permission using AskUserQuestion unless it was already given.
-- When output includes warnings or partial results, summarize them and ask how to adjust using `AskUserQuestion`.
+## Resuming a Session
 
-## CLI Version
+```bash
+echo "your follow-up prompt" | codex exec --skip-git-repo-check resume --last 2>/dev/null
+```
 
-Requires Codex CLI v0.57.0 or later for GPT-5.2 model support. The CLI defaults to `gpt-5.2` on macOS/Linux and `gpt-5.2` on Windows. Check version: `codex --version`
+All flags go between `exec` and `resume`. Do not add model/effort flags unless the user explicitly overrides them.
+
+## Following Up
+
+After every `codex exec`, use `AskUserQuestion` to confirm next steps or collect clarifications. Restate model, effort, and sandbox when proposing follow-up actions.
+
+---
 
-Use `/model` slash command within a Codex session to switch models, or configure default in `~/.codex/config.toml`.
+**Full model/sandbox reference**: `references/model-sandbox-reference.md`
diff --git a/skills/codex/references/model-sandbox-reference.md b/skills/codex/references/model-sandbox-reference.md
new file mode 100644
index 0000000..c0e06f0
--- /dev/null
+++ b/skills/codex/references/model-sandbox-reference.md
@@ -0,0 +1,55 @@
+# Codex CLI — Model & Sandbox Reference
+
+## Model Comparison Table
+
+| Model | Best for | Context | Pricing (in/out per M) |
+|-------|----------|---------|------------------------|
+| `gpt-5.2-max` | Ultra-complex reasoning, deep architectural analysis | 400K / 128K | $1.25 / $10.00 |
+| `gpt-5.2` ⭐ | Flagship — software engineering, agentic coding | 400K / 128K | $1.25 / $10.00 |
+| `gpt-5.2-mini` | Cost-efficient; 4× usage headroom | 400K / 128K | $0.25 / $2.00 |
+| `gpt-5.1-thinking` | Ultra-complex reasoning; runs ~2× slower on hardest tasks | 400K / 128K | varies |
+
+**Cached input discount**: 90% off ($0.125/M) for repeated context; cache lasts up to 24 hours.
+
+## Reasoning Effort Levels
+
+| Level | When to use |
+|-------|-------------|
+| `xhigh` | Deep problem analysis, complex multi-file reasoning |
+| `high` | Refactoring, architecture, security analysis, performance |
+| `medium` | Feature additions, bug fixes, code organization |
+| `low` | Quick fixes, formatting, simple documentation changes |
+
+## Sandbox Mode Decision Tree
+
+```
+Does the task require writing files?
+├── NO  → --sandbox read-only          (analysis, review, Q&A)
+└── YES → Does it require network or process spawning?
+          ├── NO  → --sandbox workspace-write   (edits, refactors)
+          └── YES → --sandbox danger-full-access (install deps, run tests, CI)
+```
+
+## Quick Reference Command Table
+
+| Use case | Sandbox | Key flags |
+|----------|---------|-----------|
+| Read-only review/analysis | `read-only` | `--sandbox read-only 2>/dev/null` |
+| Apply local edits | `workspace-write` | `--sandbox workspace-write --full-auto 2>/dev/null` |
+| Network or broad access | `danger-full-access` | `--sandbox danger-full-access --full-auto 2>/dev/null` |
+| Resume recent session | Inherited | `echo "prompt" \| codex exec --skip-git-repo-check resume --last 2>/dev/null` |
+| Run from another directory | Match task | `-C <DIR>` plus other flags `2>/dev/null` |
+
+## Resume Session Rules
+
+- Resume syntax: `echo "your prompt" | codex exec --skip-git-repo-check resume --last 2>/dev/null`
+- All flags go **between** `exec` and `resume`
+- Do NOT pass model/effort flags on resume unless user explicitly requests — session inherits originals
+- `--skip-git-repo-check` is always required, even on resume
+
+## CLI Version Requirements
+
+- Requires Codex CLI **v0.57.0+** for GPT-5.2 support
+- Check: `codex --version`
+- Switch model mid-session: `/model` slash command inside Codex session
+- Default config: `~/.codex/config.toml`
diff --git a/skills/command-creator/SKILL.md b/skills/command-creator/SKILL.md
index bd27ac1..da40b08 100644
--- a/skills/command-creator/SKILL.md
+++ b/skills/command-creator/SKILL.md
@@ -1,210 +1,174 @@
 ---
 name: command-creator
-description: This skill should be used when creating a Claude Code slash command. Use when users ask to "create a command", "make a slash command", "add a command", or want to document a workflow as a reusable command. Essential for creating optimized, agent-executable slash commands with proper structure and best practices.
+description: "Design and scaffold Claude Code slash commands (reusable workflow shortcuts invocable as /command-name). Covers prompt engineering for commands, parameter design, example cases, and command anti-patterns. Use when asked to automate a workflow as a slash command, create a /command, or turn a recurring prompt into a persistent tool. Triggers: slash command, create a command, /command-name pattern."
 ---
 
 # Command Creator
 
-This skill guides the creation of Claude Code slash commands - reusable workflows that can be invoked with `/command-name` in Claude Code conversations.
+This skill guides the creation of Claude Code slash commands — reusable workflows invoked with `/command-name` in Claude Code conversations.
 
-## About Slash Commands
-
-Slash commands are markdown files stored in `.claude/commands/` (project-level) or `~/.claude/commands/` (global/user-level) that get expanded into prompts when invoked. They're ideal for:
-
-- Repetitive workflows (code review, PR submission, CI fixing)
-- Multi-step processes that need consistency
-- Agent delegation patterns
-- Project-specific automation
+## Mindset
 
-## When to Use This Skill
+Expert heuristics for command authorship:
 
-Invoke this skill when users:
+1. **Write for autonomy, not for humans.** Every instruction must be executable without clarifying questions. Vague steps ("check for errors") are failures — the agent will stall or hallucinate.
+2. **The description IS the interface.** The frontmatter `description` is what appears in `/help`. A weak description means no one invokes the command correctly. Write it as an imperative action statement.
+3. **Anti-patterns before steps.** Identify what NOT to do first. The costliest failures (batching todos, tool confusion, retry loops) come from omission, not malformed steps.
+4. **Pattern before content.** Choose a workflow pattern (Analyze→Act→Report, Run→Fix→Repeat, etc.) before writing a single instruction line. Mismatched pattern = structurally broken command.
+5. **Success criteria are mandatory, not optional.** A command without a defined stop condition will run forever or stop arbitrarily. Both are failures.
 
-- Ask to "create a command" or "make a slash command"
-- Want to automate a repetitive workflow
-- Need to document a consistent process for reuse
-- Say "I keep doing X, can we make a command for it?"
-- Want to create project-specific or global commands
+## Navigation
 
-## Bundled Resources
+**Use this skill when:**
+- User asks to "create a command", "make a slash command", "add a command", "automate a workflow"
+- User says "I keep doing X, can we make a command for it?"
+- User wants to package a multi-step process for reuse
+- User wants project-specific or global automation
 
-This skill includes reference documentation for detailed guidance:
+**Do NOT use this skill when:**
+- User wants to run an existing command (just run it)
+- User wants to modify Claude's behavior via settings.json (use the update-config skill)
+- User wants a script, not a slash command — scripts live in `scripts/`, not `.claude/commands/`
 
-- **references/patterns.md** - Command patterns (workflow automation, iterative fixing, agent delegation, simple execution)
-- **references/examples.md** - Real command examples with full source (submit-stack, ensure-ci, create-implementation-plan)
-- **references/best-practices.md** - Quality checklist, common pitfalls, writing guidelines, template structure
+**Decision tree — which pattern to recommend:**
 
-Load these references as needed when creating commands to understand patterns, see examples, or ensure quality.
+```
+Does the command fix/retry until passing?
+  YES → Iterative Fixing (Run→Parse→Fix→Repeat)
+  NO  → Does it coordinate multiple agents?
+          YES → Agent Delegation (Context→Delegate→Iterate)
+          NO  → Does it run one tool and return output?
+                  YES → Simple Execution (Parse→Execute→Report)
+                  NO  → Workflow Automation (Analyze→Act→Report)
+```
 
-## Command Structure Overview
+**Location decision:**
+```
+Is user inside a git repo?
+  YES → .claude/commands/ (project-level)
+  NO  → ~/.claude/commands/ (global)
+User says "global"? → Always ~/.claude/commands/
+User says "project"? → Always .claude/commands/
+```
 
-Every slash command is a markdown file with:
+## Philosophy
 
-```markdown
----
-description: Brief description shown in /help (required)
-argument-hint: <placeholder> (optional, if command takes arguments)
----
+Commands are not documentation — they are agent programs. Every line must be an unambiguous, executable instruction. Optimize for the agent that will run it cold, without prior context, at 3am on a CI failure.
 
-# Command Title
+## NEVER
 
-[Detailed instructions for the agent to execute autonomously]
-```
+Anti-patterns that silently break commands — each has a non-obvious failure mode:
 
-## Command Creation Workflow
+1. **NEVER use second-person ("you should", "you need to").** Agents parse imperative instructions; second-person triggers a reasoning mode that introduces hesitation and paraphrasing, causing drift from the intended steps.
+2. **NEVER omit a stop condition on iterative commands.** Without `max iterations: N` or stuck-detection logic, the agent enters an infinite loop on unfixable errors, burning context and never surfacing the root cause.
+3. **NEVER put anti-patterns only in reference files.** If the NEVER list isn't visible when the command is invoked, the agent never reads it. Critical constraints must be inline.
+4. **NEVER use underscores in command names** (`submit_stack` → broken). Claude Code uses the filename as the invocation key; underscores silently prevent the command from appearing in `/help`.
+5. **NEVER batch todo completions.** Marking all todos done at the end defeats progress tracking and means a mid-execution failure looks like success. Mark each todo complete immediately after its task finishes.
+6. **NEVER write vague description fields** (`description: A command for CI stuff`). The description is the only thing visible in `/help` — vague descriptions mean the command is never invoked for its intended purpose.
+7. **NEVER use the Task tool when the Bash tool is correct** (and vice versa). The Task tool spins up a subagent (expensive, for delegation). The Bash tool runs commands directly. Using Task for `make lint` wastes context; using Bash for multi-agent orchestration silently drops agent specialization.
 
-### Step 1: Determine Location
+## When Things Go Wrong
 
-**Auto-detect the appropriate location:**
+| Symptom | Root Cause | Fix |
+|---|---|---|
+| Command never appears in `/help` | Filename uses underscores instead of hyphens | Rename file to `kebab-case.md` |
+| Agent asks clarifying questions mid-execution | Steps are vague or use second-person | Rewrite steps in imperative form with explicit expected outcomes |
+| Agent loops forever on CI failure | No stuck-detection or max-iteration guard | Add `If same error appears 3 times: STOP and report` |
+| Todos all marked done instantly | Instructions say "mark all complete at end" | Rewrite to mark each todo complete immediately after its step |
+| Wrong location (project vs global) | Auto-detect not run, or user intent not checked | Run `git rev-parse --is-inside-work-tree` first; confirm with user |
+| Command breaks on some machines | Hard-coded absolute paths in command body | Use relative paths or environment variables |
 
-1. Check git repository status: `git rev-parse --is-inside-work-tree 2>/dev/null`
-2. Default location:
-   - If in git repo → Project-level: `.claude/commands/`
-   - If not in git repo → Global: `~/.claude/commands/`
-3. Allow user override:
-   - If user explicitly mentions "global" or "user-level" → Use `~/.claude/commands/`
-   - If user explicitly mentions "project" or "project-level" → Use `.claude/commands/`
+## About Slash Commands
 
-Report the chosen location to the user before proceeding.
+Slash commands are markdown files stored in `.claude/commands/` (project-level) or `~/.claude/commands/` (global/user-level) that get expanded into prompts when invoked. They are ideal for:
 
-### Step 2: Show Command Patterns
+- Repetitive workflows (code review, PR submission, CI fixing)
+- Multi-step processes that need consistency
+- Agent delegation patterns
+- Project-specific automation
 
-Help the user understand different command types. Load **references/patterns.md** to see available patterns:
+## Bundled Resources
 
-- **Workflow Automation** - Analyze → Act → Report (e.g., submit-stack)
-- **Iterative Fixing** - Run → Parse → Fix → Repeat (e.g., ensure-ci)
-- **Agent Delegation** - Context → Delegate → Iterate (e.g., create-implementation-plan)
-- **Simple Execution** - Run command with args (e.g., codex-review)
+Load these references when needed — do not load all upfront:
 
-Ask the user: "Which pattern is closest to what you want to create?" This helps frame the conversation.
+- **references/patterns.md** — Command patterns with decision guides (Workflow Automation, Iterative Fixing, Agent Delegation, Simple Execution)
+- **references/examples.md** — Full source for real commands (submit-stack, ensure-ci, create-implementation-plan)
+- **references/best-practices.md** — Quality checklist, writing guidelines, template structure, advanced patterns
 
-### Step 3: Gather Command Information
+## Command Creation Workflow
 
-Ask the user for key information:
+### Step 1: Determine Location
 
-#### A. Command Name and Purpose
+Auto-detect the appropriate location:
 
-Ask:
+1. Check git repository status: `git rev-parse --is-inside-work-tree 2>/dev/null`
+2. Default:
+   - In git repo → Project-level: `.claude/commands/`
+   - Not in git repo → Global: `~/.claude/commands/`
+3. Override: if user says "global" → `~/.claude/commands/`; if user says "project" → `.claude/commands/`
 
-- "What should the command be called?" (for filename)
-- "What does this command do?" (for description field)
+Report chosen location before proceeding.
 
-Guidelines:
+### Step 2: Select Pattern
 
-- Command names MUST be kebab-case (hyphens, NOT underscores)
-  - ✅ CORRECT: `submit-stack`, `ensure-ci`, `create-from-plan`
-  - ❌ WRONG: `submit_stack`, `ensure_ci`, `create_from_plan`
-- File names match command names: `my-command.md` → invoked as `/my-command`
-- Description should be concise, action-oriented (appears in `/help` output)
+Load **references/patterns.md** and use the decision tree above to recommend a pattern. Ask: "Which pattern is closest to what you want to create?" before writing a single instruction.
 
-#### B. Arguments
+### Step 3: Gather Command Information
 
-Ask:
+#### A. Name and Purpose
 
-- "Does this command take any arguments?"
-- "Are arguments required or optional?"
-- "What should arguments represent?"
+- Command name: kebab-case only (`submit-stack` ✅, `submit_stack` ❌)
+- Description: imperative, action-oriented, for `/help` output
 
-If command takes arguments:
+#### B. Arguments
 
-- Add `argument-hint: <placeholder>` to frontmatter
-- Use `<angle-brackets>` for required arguments
-- Use `[square-brackets]` for optional arguments
+- Does the command take arguments? Required or optional?
+- `argument-hint: <required>` or `argument-hint: [optional]` in frontmatter
 
 #### C. Workflow Steps
 
-Ask:
-
-- "What are the specific steps this command should follow?"
-- "What order should they happen in?"
-- "What tools or commands should be used?"
-
-Gather details about:
-
-- Initial analysis or checks to perform
-- Main actions to take
-- How to handle results
-- Success criteria
+- Initial checks (file existence, git status)
+- Main actions and tool choices
 - Error handling approach
+- Success criteria and stop conditions
 
-#### D. Tool Restrictions and Guidance
-
-Ask:
+#### D. Tool Restrictions
 
-- "Should this command use any specific agents or tools?"
-- "Are there any tools or operations it should avoid?"
-- "Should it read any specific files for context?"
+- Which tools to use (Bash vs Task vs Edit)
+- Which tools to explicitly prohibit
+- Any files to read for context
 
 ### Step 4: Generate Optimized Command
 
-Create the command file with agent-optimized instructions. Load **references/best-practices.md** for:
-
-- Template structure
-- Best practices for agent execution
-- Writing style guidelines
-- Quality checklist
+Load **references/best-practices.md** before writing. Apply:
 
-Key principles:
-
-- Use imperative/infinitive form (verb-first instructions)
-- Be explicit and specific
-- Include expected outcomes
-- Provide concrete examples
-- Define clear error handling
+- Imperative/infinitive form (verb-first, never "you should")
+- Explicit tool names per step
+- Expected outcomes after each action
+- Realistic examples (not foo/bar)
+- Error handling with specific recovery actions
+- Clear stop conditions
 
 ### Step 5: Create the Command File
 
-1. Determine full file path:
-   - Project: `.claude/commands/[command-name].md`
-   - Global: `~/.claude/commands/[command-name].md`
-
-2. Ensure directory exists:
-
-   ```bash
-   mkdir -p [directory-path]
-   ```
-
+1. Determine full file path (project: `.claude/commands/[name].md`, global: `~/.claude/commands/[name].md`)
+2. Ensure directory exists: `mkdir -p [directory-path]`
 3. Write the command file using the Write tool
-
-4. Confirm with user:
-   - Report the file location
-   - Summarize what the command does
-   - Explain how to use it: `/command-name [arguments]`
+4. Confirm with user: file location, what it does, how to invoke it
 
 ### Step 6: Test and Iterate (Optional)
 
-If the user wants to test:
-
-1. Suggest testing: `You can test this command by running: /command-name [arguments]`
-2. Be ready to iterate based on feedback
-3. Update the file with improvements as needed
+Suggest: `You can test this command by running: /command-name [arguments]`
 
-## Quick Tips
-
-**For detailed guidance, load the bundled references:**
-
-- Load **references/patterns.md** when designing the command workflow
-- Load **references/examples.md** to see how existing commands are structured
-- Load **references/best-practices.md** before finalizing to ensure quality
-
-**Common patterns to remember:**
-
-- Use Bash tool for `pytest`, `pyright`, `ruff`, `prettier`, `make`, `gt` commands
-- Use Task tool to invoke subagents for specialized tasks
-- Check for specific files first (e.g., `.PLAN.md`) before proceeding
-- Mark todos complete immediately, not in batches
-- Include explicit error handling instructions
-- Define clear success criteria
+Be ready to iterate on feedback and update the file.
 
 ## Summary
 
-When creating a command:
-
-1. **Detect location** (project vs global)
-2. **Show patterns** to frame the conversation
+1. **Detect location** (project vs global, confirm with user)
+2. **Select pattern** using decision tree before writing
 3. **Gather information** (name, purpose, arguments, steps, tools)
 4. **Generate optimized command** with agent-executable instructions
 5. **Create file** at appropriate location
 6. **Confirm and iterate** as needed
 
-Focus on creating commands that agents can execute autonomously, with clear steps, explicit tool usage, and proper error handling.
+Focus on creating commands that agents can execute autonomously, with unambiguous steps, explicit tool usage, inline anti-patterns, and defined stop conditions.
diff --git a/skills/commit-work/SKILL.md b/skills/commit-work/SKILL.md
index 33d816e..c29a550 100644
--- a/skills/commit-work/SKILL.md
+++ b/skills/commit-work/SKILL.md
@@ -1,55 +1,104 @@
 ---
 name: commit-work
-description: "Create high-quality git commits: review/stage intended changes, split into logical commits, and write clear commit messages (including Conventional Commits). Use when the user asks to commit, craft a commit message, stage changes, or split work into multiple commits."
+description: "Create high-quality git commits: review/stage intended changes, split into logical commits, write Conventional Commits messages. Trigger phrases: commit this work, craft a commit message, stage changes, split into commits, create a commit."
 ---
 
-# Commit work
-
-## Goal
-Make commits that are easy to review and safe to ship:
-- only intended changes are included
-- commits are logically scoped (split when needed)
-- commit messages describe what changed and why
-
-## Inputs to ask for (if missing)
-- Single commit or multiple commits? (If unsure: default to multiple small commits when there are unrelated changes.)
-- Commit style: Conventional Commits are required.
-- Any rules: max subject length, required scopes.
-
-## Workflow (checklist)
-1) Inspect the working tree before staging
-   - `git status`
-   - `git diff` (unstaged)
-   - If many changes: `git diff --stat`
-2) Decide commit boundaries (split if needed)
-   - Split by: feature vs refactor, backend vs frontend, formatting vs logic, tests vs prod code, dependency bumps vs behavior changes.
-   - If changes are mixed in one file, plan to use patch staging.
-3) Stage only what belongs in the next commit
-   - Prefer patch staging for mixed changes: `git add -p`
-   - To unstage a hunk/file: `git restore --staged -p` or `git restore --staged <path>`
-4) Review what will actually be committed
-   - `git diff --cached`
-   - Sanity checks:
-     - no secrets or tokens
-     - no accidental debug logging
-     - no unrelated formatting churn
-5) Describe the staged change in 1-2 sentences (before writing the message)
-   - "What changed?" + "Why?"
-   - If you cannot describe it cleanly, the commit is probably too big or mixed; go back to step 2.
-6) Write the commit message
-   - Use Conventional Commits (required):
-     - `type(scope): short summary`
-     - blank line
-     - body (what/why, not implementation diary)
-     - footer (BREAKING CHANGE) if needed
-   - Prefer an editor for multi-line messages: `git commit -v`
-   - Use `references/commit-message-template.md` if helpful.
-7) Run the smallest relevant verification
-   - Run the repo's fastest meaningful check (unit tests, lint, or build) before moving on.
-8) Repeat for the next commit until the working tree is clean
-
-## Deliverable
-Provide:
-- the final commit message(s)
-- a short summary per commit (what/why)
-- the commands used to stage/review (at minimum: `git diff --cached`, plus any tests run)
+# Commit Work
+
+## Mindset
+
+- A commit is a unit of reviewability, not a unit of work. If a reviewer can't understand it in isolation, split it.
+- The diff is wrong until proven otherwise. Run `git diff` before touching the index — surprises live there.
+- Pre-commit hooks are enforcement, not suggestions. A failed hook means the commit did NOT happen, so `--amend` would rewrite the previous commit instead. Fix the issue, re-stage, and create a new commit.
+- Bisect-safety trumps narrative coherence. A commit that breaks tests is more expensive than two awkward commits that don't.
+- Staging is a separate thinking step. Deciding what to commit and how to describe it are different cognitive modes — don't conflate them.
+
+## Navigation
+
+**Use this skill when**: staging changes, crafting commit messages, splitting mixed changes into logical commits, applying Conventional Commits format.
+
+**Do NOT use this skill when**: pushing to remote, creating pull requests, resolving merge conflicts, or rebasing — those workflows carry different risk profiles and should be handled explicitly.
+
+**Quick decision tree**:
+- Working tree has changes in 2+ unrelated concerns → split into multiple commits
+- Changes touch secrets, generated files, or debug logs → exclude before staging
+- Pre-commit hook failed → fix issue, re-stage, new commit (never amend)
+- Repo is a shallow clone (`git rev-parse --is-shallow-repository` returns `true`) → do not amend; new commit only
+
+## Philosophy
+
+A commit is a promise to future readers: "this set of changes is coherent, tested, and describes its own intent." Violate that promise and you tax every future `git blame`, `git bisect`, and code review.
+
+## NEVER
+
+- NEVER run `git add -A` or `git add .` — these silently include `.env` files, build artifacts, and generated lock-file changes that belong in a separate commit or `.gitignore`. Review every path explicitly.
+- NEVER `git commit --amend` in a shallow clone (`--depth N`) — amending rewrites the tip commit; a subsequent `push --force` will orphan the original on the remote and cannot be recovered without the SHA. Use a new commit instead.
+- NEVER skip pre-commit hooks with `--no-verify` unless the user explicitly requests it and understands the consequence — hooks exist to catch secrets, linting errors, and test failures before they enter history.
+- NEVER commit when `git status` shows a merge in progress (`MERGE_HEAD` exists) or a rebase is active (`REBASE_HEAD` / `rebase-merge/` directory present) — committing in this state creates a malformed merge commit or corrupts the rebase sequence.
+- NEVER write a commit message that describes the implementation ("added null check on line 42") instead of the behavior ("prevent crash when user list is empty") — implementation is in the diff; the message must explain the intent.
+- NEVER stage test file changes in the same commit as the production code they cover when the tests were written first (TDD) — the test-first commit is evidence of the design decision and has independent value in `git log`.
+- NEVER assume `git diff --cached` is empty before committing — a prior interrupted session or a hook script can leave hunks staged without your knowledge.
+
+## Workflow
+
+**1. Inspect before touching the index**
+```
+git status
+git diff                  # unstaged
+git diff --cached         # already staged (should be empty at start)
+```
+Check for: merge/rebase state, shallow clone, stale staged hunks.
+
+**2. Decide commit boundaries**
+Split by: feature vs refactor, logic vs formatting, prod vs test, dep bump vs behavior change.
+If a file contains mixed concerns, plan patch staging (`git add -p`) before touching the index.
+
+**3. Stage selectively**
+```
+git add -p <file>         # preferred for mixed-concern files
+git add <explicit-path>   # acceptable for clean single-concern files
+```
+After staging: `git diff --cached` must match your intent exactly.
+
+**4. Safety check on staged diff**
+Scan for: secrets/tokens, `console.log`/`debugger`/`TODO` left over, unrelated whitespace churn, generated files.
+
+**5. Describe the change in 1-2 sentences before writing the message**
+If you cannot describe it cleanly → the commit is too big. Return to step 2.
+
+**6. Write the commit message**
+See `references/commit-message-template.md`. Required format:
+```
+type(scope): short imperative summary
+
+What changed and why — not how.
+
+BREAKING CHANGE: (if applicable)
+```
+Use `git commit -v` to see the diff while composing.
+
+**7. Verify before moving on**
+Run the repo's fastest meaningful check (unit tests, lint, typecheck). A commit that breaks CI is more expensive to fix than a 30-second lint run now.
+
+**8. Repeat until `git status` is clean**
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Pre-commit hook failed | Hook caught lint/test/secret issue | Fix the issue, `git add` the fix, create a NEW commit — do NOT `--amend` |
+| Committed wrong hunks | `git add -A` or inattentive `git add -p` | `git reset HEAD~1` (the default mixed reset: moves HEAD back and unstages the changes but keeps them in the working tree), re-stage correctly, new commit |
+| `git commit --amend` in shallow clone | `--depth N` clone, rewrote tip | Look up the pre-amend SHA in the local `git reflog`, or recover the original via `git fetch origin <SHA>` if you still have the SHA; otherwise the original is gone unless the remote ref is intact |
+| Merge/rebase in progress | `MERGE_HEAD` or `rebase-merge/` present | Complete or abort the merge/rebase first: `git merge --abort` / `git rebase --abort` |
+| Hook added files you didn't stage | Formatter hook auto-staged changes | Run `git diff --cached` again after the hook; unstage unintended changes with `git restore --staged -p` |
+| Bisect breaks on a commit | Mixed concerns in one commit | Future prevention: split logic/test/format; immediate: `git bisect skip` to mark untestable commits |
+
+## Bisect Safety Heuristics
+
+When splitting commits, order them so every intermediate state is buildable and testable:
+1. Dependency/config changes first
+2. New production code (feature/fix) second
+3. Tests that exercise that code third
+4. Formatting/cleanup last
+
+A commit series that cannot be bisected is a commit series that cannot be debugged.
diff --git a/skills/crafting-effective-readmes/SKILL.md b/skills/crafting-effective-readmes/SKILL.md
index a6c30d9..1fd4b48 100644
--- a/skills/crafting-effective-readmes/SKILL.md
+++ b/skills/crafting-effective-readmes/SKILL.md
@@ -1,78 +1,90 @@
 ---
 name: crafting-effective-readmes
-description: Use when writing or improving README files. Not all READMEs are the same — provides templates and guidance matched to your audience and project type.
+description: 'Use when writing, improving, or reviewing README files. Triggers: "write a README", "improve my README", "update README", "my README is outdated", "create documentation for my project". Covers OSS, internal, personal, and config project types.'
 ---
 
 # Crafting Effective READMEs
 
-## Overview
+## Mindset
 
-READMEs answer questions your audience will have. Different audiences need different information - a contributor to an OSS project needs different context than future-you opening a config folder.
+1. **The README is a cognitive funnel, not a table of contents.** Readers scan broad-to-specific: they decide in 10 seconds whether to read further. Every section either advances or stalls that funnel. Front-load the decision-enabling content (what + why + does it work for me).
 
-**Always ask:** Who will read this, and what do they need to know?
+2. **The documentation defines what the module does — the code does not.** A README without enough detail forces readers into source code. That's a failure. The goal is to keep users *out* of the source by providing everything needed to evaluate and use the project without reading implementation.
 
-## Process
+3. **Brevity is a feature, but incompleteness is a bug.** The ideal README is as short as it can be without being any shorter. When in doubt between too long and too short, choose too long — but put the excess in separate files, not in the README.
+
+4. **Write for the stranger, not the author.** After 6 months, even the author is a stranger to their own project. The README is a contract with your future confused self as much as with new users.
+
+5. **Match complexity to audience, not to effort.** A config folder README needs "what's here + why + gotchas" in 20 lines. An OSS library README needs install, quickstart, and API shape. Adding OSS sections to a config README is noise, not thoroughness.
 
-### Step 1: Identify the Task
+## Navigation
 
-**Ask:** "What README task are you working on?"
+**Use this skill when:**
+- Writing a README from scratch for any project type
+- Improving or restructuring an existing README
+- Updating a README after project changes
+- Reviewing a README for completeness or accuracy
 
-| Task | When |
-|------|------|
-| **Creating** | New project, no README yet |
-| **Adding** | Need to document something new |
-| **Updating** | Capabilities changed, content is stale |
-| **Reviewing** | Checking if README is still accurate |
+**Do NOT use this skill when:**
+- Writing other documentation (wikis, API docs, changelogs) — those have separate concerns
+- Improving prose quality only — use `writing-clearly-and-concisely` instead
+- The "README" is actually a CONTRIBUTING.md, CHANGELOG.md, or LICENSE — different audiences and formats
 
-### Step 2: Task-Specific Questions
+**Complexity decision tree:**
 
-**Creating initial README:**
-1. What type of project? (see Project Types below)
-2. What problem does this solve in one sentence?
-3. What's the quickest path to "it works"?
-4. Anything notable to highlight?
+```
+What is this project?
+├── Config folder / dotfiles      → Simple (20-40 lines): what's here, why, gotchas, how-to-extend
+├── Personal/portfolio project    → Medium (40-100 lines): what + stack + quick demo + learnings
+├── Internal/team tool            → Medium (60-120 lines): setup + architecture + runbooks + gotchas
+└── OSS library/CLI               → Full (100-300 lines): install + quickstart + API + contributing + license
+    └── Is it a published package? → Also add: badges, examples, support channels
+```
 
-**Adding a section:**
-1. What needs documenting?
-2. Where should it go in the existing structure?
-3. Who needs this info most?
+## Philosophy
 
-**Updating existing content:**
-1. What changed?
-2. Read current README, identify stale sections
-3. Propose specific edits
+A README is the one-stop shop that defines your project's contract with the world. It exists to keep users out of the source code. Everything else follows from that.
 
-**Reviewing/refreshing:**
-1. Read current README
-2. Check against actual project state (package.json, main files, etc.)
-3. Flag outdated sections
-4. Update "Last reviewed" date if present
+## NEVER
 
-### Step 3: Always Ask
+- NEVER open with implementation details before establishing what the project does — readers abandon before reaching the useful part. The description must appear in the first 5 lines.
+- NEVER add badges to internal tools or config READMEs — badges signal "public OSS" and create noise for teammates who don't care about CI status on a dotfiles folder.
+- NEVER write "Usage: See code for examples" or leave the usage section without at least one runnable example — this is the single most common reason a developer gives up and looks elsewhere.
+- NEVER use a wall of prose for setup steps — numbered lists are not optional for installation; sequential prose causes missed steps and support requests.
+- NEVER omit environment/OS prerequisites from install instructions — assuming "they know they need Node 18" is the most common cause of "it doesn't work" issues. State every non-obvious requirement.
+- NEVER copy-paste a generic OSS template into an internal project — internal READMEs need runbooks and architecture, not badges and license text; wrong template creates confusion about intended audience.
+- NEVER leave a README with a "Last Reviewed" date more than 6 months old without flagging it as potentially stale — a wrong README is worse than no README because it actively misleads.
 
-After drafting, ask: **"Anything else to highlight or include that I might have missed?"**
+## When Things Go Wrong
 
-## Project Types
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| User can't figure out what the project does after reading | Description is buried below badges/ToC/motivation paragraphs | Move description to line 1-2; cut everything above it |
+| Setup steps fail for new users but work for author | Prerequisites (OS version, env vars, global deps) are assumed rather than stated | Add explicit Requirements subsection listing every non-obvious dependency |
+| README feels too long but cutting feels wrong | Right content, wrong location — detail belongs in separate docs | Move API reference / deep examples to `docs/` or a wiki; link from README |
+| Reviewer says README is "stale" | Project evolved but README wasn't updated as part of PRs | Add a "Last Reviewed" line; establish README-in-PR checklist habit |
 
-| Type | Audience | Key Sections | Template |
-|------|----------|--------------|----------|
-| **Open Source** | Contributors, users worldwide | Install, Usage, Contributing, License | `templates/oss.md` |
-| **Personal** | Future you, portfolio viewers | What it does, Tech stack, Learnings | `templates/personal.md` |
-| **Internal** | Teammates, new hires | Setup, Architecture, Runbooks | `templates/internal.md` |
-| **Config** | Future you (confused) | What's here, Why, How to extend, Gotchas | `templates/xdg-config.md` |
+## Reference Loading Guide
 
-**Ask the user** if unclear. Don't assume OSS defaults for everything.
+Load references only when needed — don't pull all into context:
 
-## Essential Sections (All Types)
+| Situation | Load |
+|-----------|------|
+| Writing for a public OSS project | `references/art-of-readme.md` — cognitive funneling, brevity principles |
+| Need section-by-section guidance | `references/make-a-readme.md` — concrete suggestions per section |
+| OSS project needs a standardized format | `references/standard-readme-spec.md` + examples |
+| Need a starting template | `templates/oss.md`, `templates/internal.md`, `templates/personal.md`, `templates/xdg-config.md` |
+| Checking section completeness | `section-checklist.md` |
+
+## Process
 
-Every README needs at minimum:
+**Step 1 — Identify project type:** OSS / Personal / Internal / Config (ask if unclear; never assume OSS)
 
-1. **Name** - Self-explanatory title
-2. **Description** - What + why in 1-2 sentences  
-3. **Usage** - How to use it (examples help)
+**Step 2 — Identify task:**
+- Creating: use the matching template, ask "what problem in one sentence + quickest path to working"
+- Updating: read current README, identify stale sections against actual project state
+- Reviewing: check package.json/main files vs README claims; flag mismatches
 
-## References
+**Step 3 — Apply the three-section minimum:** Every README needs Name + Description (1-2 sentences, what + why) + Usage (with at least one example).
 
-- `section-checklist.md` - Which sections to include by project type
-- `style-guide.md` - Common README mistakes and prose guidance
-- `using-references.md` - Guide to deeper reference materials
+**Step 4 — After drafting:** Ask "Anything else to highlight or include that I might have missed?"
diff --git a/skills/daily-meeting-update/SKILL.md b/skills/daily-meeting-update/SKILL.md
index 190b070..13388b9 100644
--- a/skills/daily-meeting-update/SKILL.md
+++ b/skills/daily-meeting-update/SKILL.md
@@ -1,247 +1,151 @@
 ---
 name: daily-meeting-update
-description: "Interactive daily standup/meeting update generator. Use when user says 'daily', 'standup', 'scrum update', 'status update', 'what did I do yesterday', 'prepare for meeting', 'morning update', or 'team sync'. Pulls activity from GitHub, Jira, and Claude Code session history. Conducts 4-question interview (yesterday, today, blockers, discussion topics) and generates formatted Markdown update."
+description: "Interactive daily standup/meeting update generator. Use when user says 'daily', 'standup', 'scrum update', 'status update', 'what did I do yesterday', 'prepare for meeting', 'morning update', 'team sync', 'async update', or 'remote standup'. Pulls activity from GitHub, Jira, and Claude Code session history. Conducts interview (yesterday, today, blockers, discussion topics) and generates formatted update for sync meetings, async Slack posts, or standup bots."
 user-invocable: true
 ---
 
 # Daily Meeting Update
 
-Generate a daily standup/meeting update through an **interactive interview**. Never assume tools are configured—ask first.
+Generate a daily standup/meeting update through an **interactive interview**. Never assume tools are configured — ask first.
 
 ---
 
-## Workflow
+## Mindset
 
-```
-START
-  │
-  ▼
-┌─────────────────────────────────────────────────────┐
-│ Phase 1: DETECT & OFFER INTEGRATIONS                │
-│ • Check: Claude Code history? gh CLI? jira CLI?     │
-│ • Claude Code → Pull yesterday's session digest     │
-│   → User selects relevant items via multiSelect     │
-│ • GitHub/Jira → Ask user, pull if approved          │
-│ • Pull data NOW (before interview)                  │
-├─────────────────────────────────────────────────────┤
-│ Phase 2: INTERVIEW (with insights)                  │
-│ • Show pulled data as context                       │
-│ • Yesterday: "I see you merged PR #123, what else?" │
-│ • Today: What will you work on?                     │
-│ • Blockers: Anything blocking you?                  │
-│ • Topics: Anything to discuss at end of meeting?    │
-├─────────────────────────────────────────────────────┤
-│ Phase 3: GENERATE UPDATE                            │
-│ • Combine interview answers + tool data             │
-│ • Format as clean Markdown                          │
-│ • Present to user                                   │
-└─────────────────────────────────────────────────────┘
-```
+1. **Tools surface what happened; the interview captures why it matters.** A merged PR without context ("I was blocked waiting for this") is noise. Always interview, even when you have full tool data.
+2. **The Topics for Discussion question is highest-signal.** It captures cross-team dependencies, architectural decisions, and team-wide risks that no tool can detect. It's the one question you cannot skip.
+3. **Async ≠ fewer questions — async updates are read without you present.** A vague sync standup is forgiven because teammates can ask. A vague async post blocks people for hours. Async format requires more precision, not less.
+4. **One missed blocker cascades.** If a user is waiting on someone outside their team, that handoff won't happen unless stated explicitly. Surface it, name names, suggest the next action.
+5. **Consent before access.** The user's repo list may include client work, personal projects, or sensitive branches they do not want surfaced in a standup. Always ask which repos; never enumerate everything visible.
 
 ---
 
-## Phase 1: Detect & Offer Integrations
-
-### Step 1: Silent Detection
-
-Check for available integrations **silently** (suppress errors, don't show to user):
-
-| Integration | Detection |
-|-------------|-----------|
-| **Claude Code History** | `~/.claude/projects` directory exists with `.jsonl` files |
-| GitHub CLI | `gh auth status` succeeds |
-| Jira CLI | `jira` command exists |
-| Atlassian MCP | `mcp__atlassian__*` tools available |
-| Git | Inside a git repository |
+## Navigation
 
-### Step 2: Offer GitHub/Jira Integrations (if available)
+### Use this skill when:
+- User says "daily", "standup", "scrum", "morning update", "status update", "team sync"
+- User asks "what did I do yesterday" or "help me prepare for my meeting"
+- User needs an async Slack post, standup bot answer, or remote team update
+- User is catching up after missed days (cover multiple days in "yesterday")
 
-> **Claude Code users:** Use `AskUserQuestionTool` tool for all questions in this phase.
+### Do NOT use this skill when:
+- User wants a project status report (different scope — weekly/monthly, audience is leadership)
+- User wants a written retrospective or sprint review
+- User is asking about a specific ticket/PR status (use Jira/GitHub tools directly)
 
-**GitHub/Git:**
-
-If `HAS_GH` or `HAS_GIT`:
+### Sync vs Async Decision Tree
 
 ```
-"I detected you have GitHub/Git configured. Want me to pull your recent activity (commits, PRs, reviews)?"
-
-Options:
-- "Yes, pull the info"
-- "No, I'll provide everything manually"
+Is the standup happening live (video/in-person/phone)?
+├─ YES → Use full Markdown template (Phases 1→2→3 below)
+│        Keep to <12 bullets; audience can ask follow-ups live
+└─ NO  → Is it posted to Slack/chat?
+          ├─ YES → Use chat format (bold headers, tag names, 4-6 lines)
+          │        → See references/async-remote-patterns.md
+          └─ NO  → Standup bot (Geekbot, Standuply)?
+                    ├─ YES → Full sentences per prompt, no bullets
+                    └─ NO  → Email/wiki → Full Markdown with links section
 ```
 
-If yes:
-
-```
-"Which repositories/projects should I check?"
-
-Options:
-- "Just the current directory" (if in a git repo)
-- "I'll list the repos" → user provides list
-```
-
-**Jira:**
-
-If `HAS_JIRA_CLI` or `HAS_ATLASSIAN_MCP`:
-
-```
-"I detected you have Jira configured. Want me to pull your tickets?"
-
-Options:
-- "Yes, pull my tickets"
-- "No, I'll provide everything manually"
-```
-
-### Step 3: Pull GitHub/Jira Data (if approved)
-
-**GitHub/Git** — For each approved repo:
-- Commits by user since yesterday
-- PRs opened/merged by user
-- Reviews done by user
+---
 
-**Jira** — Tickets assigned to user, updated in last 24h
+## Philosophy
 
-**Key insight**: Store results to use as context in Phase 2 interview.
+The standup update is a coordination tool, not a progress report. Its job is to unblock teammates, surface dependencies before they become delays, and focus collective attention on what matters today — not to prove productivity.
 
-### Step 4: Offer Claude Code History
+---
 
-This integration captures everything you worked on with Claude Code — useful for recalling work that isn't in git or Jira.
+## Workflow
 
-**Detection:**
-```bash
-ls ~/.claude/projects/*/*.jsonl 2>/dev/null | head -1
 ```
+Phase 1: DETECT & OFFER INTEGRATIONS
+  ↓ Check Claude Code history, gh CLI, jira CLI (silently)
+  ↓ Ask user which repos/integrations to pull (consent required)
+  ↓ Pull data NOW — before interview, not during
 
-**If Claude Code history exists, ask:**
-
-```
-"I can also pull your Claude Code session history from yesterday. This can help recall work that isn't in git/Jira (research, debugging, planning). Want me to check?"
+Phase 2: INTERVIEW (with insights)
+  ↓ Show pulled data as context for each question
+  ↓ Q1: Yesterday (with tool data as prompt)
+  ↓ Q2: Today (with Jira tickets as suggestions)
+  ↓ Q3: Blockers
+  ↓ Q4: Topics for Discussion (NEVER skip)
 
-Options:
-- "Yes, pull my Claude Code sessions"
-- "No, I have everything I need"
+Phase 3: GENERATE UPDATE
+  ↓ Combine interview answers + tool data
+  ↓ Format for meeting type (sync/async/bot)
+  ↓ Present to user
 ```
 
-**If yes, run the digest script:**
-
-```bash
-python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --format json
-```
+---
 
-**Then present sessions with multiSelect:**
+## Phase 1: Detect & Offer Integrations
 
-Use `AskUserQuestionTool` with `multiSelect: true` to let user pick relevant items:
+Detect silently (suppress all errors). Full detection commands: `references/integration-details.md`
 
-```
-"Here are your Claude Code sessions from yesterday. Select the ones relevant to your standup:"
+| Integration | Quick Detection |
+|-------------|-----------------|
+| Claude Code History | `~/.claude/projects/*/*.jsonl` exists |
+| GitHub CLI | `gh auth status` succeeds |
+| Jira CLI | `jira` command exists |
+| Atlassian MCP | `mcp__atlassian__*` tools available |
+| Git | Inside a git repository |
 
-Options (multiSelect):
-- "Fix authentication bug (backend-api)"
-- "Implement OAuth flow (backend-api)"
-- "Update homepage styles (frontend-app)"
-- "Research payment providers (docs)"
-```
+**For each detected integration:** Ask user if they want it pulled and which repos/projects to include. Never pull without explicit approval.
 
-**Key insight:** User selects which sessions are work-related. Personal projects or experiments can be excluded.
+**Claude Code History:** Run `scripts/claude_digest.py --format json`, then present sessions via `AskUserQuestionTool` with `multiSelect: true` so user can exclude personal projects.
 
-**Do NOT run digest script when:**
-- User explicitly says "No" to Claude Code history
-- User says they'll provide everything manually
-- `~/.claude/projects` directory doesn't exist
+**If any integration fails:** Skip silently and proceed. No integration is required — the interview works without them.
 
-**If digest script fails:**
-- Fallback: Skip Claude Code integration silently, proceed with interview
-- Common issues: Python not installed, no sessions from yesterday, permission errors
-- Do NOT block the standup flow — the script is supplemental, not required
+Full pull commands, error handling: `references/integration-details.md`
 
 ---
 
-## Phase 2: Interview (with insights)
+## Phase 2: Interview (with Insights)
 
-> **Claude Code users:** Use `AskUserQuestionTool` tool to conduct the interview. This provides a better UX with structured options.
+> In Claude Code: use `AskUserQuestionTool` for structured questions with options.
 
-**Use pulled data as context** to make questions smarter.
+Show pulled data as context before asking each question — this triggers memory ("I see you merged PR #123, anything else?").
 
-### Question 1: Yesterday
+### Q1: Yesterday
+Show tool data first, then: *"Anything else you worked on since the last standup that I missed?"*
 
-**If data was pulled**, show it first:
+If no data pulled: *"What did you work on yesterday/since last standup?"*
 
-```
-"Here's what I found from your activity:
-- Merged PR #123: fix login timeout
-- 3 commits in backend-api
-- Reviewed PR #456 (approved)
+### Q2: Today
+*"What will you work on today?"*
 
-Anything else you worked on yesterday that I missed?"
-```
-
-**If no data pulled:**
+If Jira data pulled, suggest open tickets assigned to user as options.
 
-```
-"What did you work on yesterday/since the last standup?"
-```
+### Q3: Blockers
+*"Any blockers or impediments?"* → If yes, follow up for details and who needs to act.
 
-If user response is vague, ask follow-up:
-- "Can you give more details about X?"
-- "Did you complete anything specific?"
+### Q4: Topics for Discussion ← NEVER SKIP
+*"Anything to bring up at the end of the daily?"* Examples: technical decision needing input, cross-team alignment, prioritization question, announcement.
 
-### Question 2: Today
+### Async/Remote Extra Questions (optional — offer if fully remote or async team)
 
-```
-"What will you work on today?"
-
-Options:
-- [Text input - user types freely]
-```
+**Q5: Availability**
+*"Any changes to your availability today?"* (late start, focus time, timezone conflicts)
+Why: Remote teammates can't see that you're away from your desk, so a heads-up prevents a pile-up of unanswered DMs.
 
-**If Jira data was pulled**, you can suggest:
+**Q6: Cross-team dependencies**
+*"Are you waiting on anyone outside your team, or does anyone outside your team need something from you?"*
+Why: Cross-team blockers stall for days async — no one escalates what isn't named.
 
-```
-"I see you have these tickets assigned:
-- PROJ-123: Implement OAuth flow (In Progress)
-- PROJ-456: Fix payment bug (To Do)
-
-Will you work on any of these today?"
-```
-
-### Question 3: Blockers
-
-```
-"Do you have any blockers or impediments?"
-
-Options:
-- "No blockers"
-- "Yes, I have blockers" → follow-up for details
-```
-
-### Question 4: Topics for Discussion
-
-```
-"Any topic you want to bring up at the end of the daily?"
-
-Options:
-- "No, nothing to discuss"
-- "Yes" → follow-up for details
-
-Examples of topics:
-- Technical decision that needs input
-- Alignment with another team
-- Question about prioritization
-- Announcement or info for the team
-```
+Load full async/remote patterns: `references/async-remote-patterns.md`
 
 ---
 
 ## Phase 3: Generate Update
 
-Combine all information into clean Markdown:
+Format based on meeting type determined in Navigation:
 
+**Sync (Markdown):**
 ```markdown
 # Daily Update - [DATE]
 
 ## Yesterday
-- [Items from interview]
-- [Items from GitHub/Jira if pulled]
+- [Items from interview + tool data]
 
 ## Today
 - [Items from interview]
@@ -249,160 +153,71 @@ Combine all information into clean Markdown:
 ## Blockers
 - [Blockers or "No blockers"]
 
-## PRs & Reviews (if pulled from GitHub)
-- [PRs opened]
-- [PRs merged]
-- [Reviews done]
-
-## Jira (if pulled from Jira)
-- [Tickets updated]
+## PRs & Reviews (if GitHub pulled)
+- **Opened/Merged/Reviewed:** ...
 
 ## Topics for Discussion
 - [Topics or "None"]
-
 ---
-*Links:*
-- [PR links]
-- [Ticket links]
+*Links:* [PR/ticket URLs]
 ```
 
----
-
-## Core Principles
-
-1. **Interview is primary** — Tools supplement, they don't replace human context
-2. **Consent before access** — Always ask before pulling from any integration
-3. **Context-aware questions** — Show pulled data during interview to trigger memory ("I see you merged PR #123...")
-
----
-
-## Quick Reference
-
-| Phase | Action | Tool |
-|-------|--------|------|
-| 1. Detect & Offer | Check gh/jira/claude history, ask user, pull data | Bash (silent), AskUserQuestionTool* |
-| 2. Interview | Ask 4 questions with insights | AskUserQuestionTool* |
-| 3. Generate | Format Markdown | Output text |
-
-*Claude Code only: Use `AskUserQuestionTool` tool for structured questions.
-
-### Claude Code Digest Script
-
-```bash
-# Get yesterday's sessions as JSON
-python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --format json
-
-# Get today's sessions
-python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --date today --format json
-
-# Filter to specific project
-python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --project ~/my-app --format json
+**Async/Chat (Slack):**
 ```
-
----
-
-## Example Session
-
+*Yesterday:* [2-4 items, plain text]
+*Today:* [2-3 items]
+*Blockers:* [one sentence, name the blocker and who needs to act]
+*FYI:* [discussion topic as thread invitation]
 ```
-Claude: "Let's prepare your daily! I detected GitHub configured. Want me to pull your recent activity?"
-User: "Yes"
 
-Claude: "Which repos should I check?"
-User: "backend-api and frontend-app"
+Full output templates and Jira section format: `references/integration-details.md`
 
-[Claude pulls GitHub data silently]
+---
 
-Claude: "I can also pull your Claude Code session history from yesterday. Want me to check?"
-User: "Yes"
+## NEVER
 
-[Claude runs digest script, gets 5 sessions]
+- **NEVER run `gh`, `jira`, or `git log` without asking first** — The user's visible repos may include client work, personal projects, or branches from a confidential feature. Enumerating everything visible is a privacy violation even when technically possible.
 
-Claude: "Here are your Claude Code sessions from yesterday. Select the ones relevant to your standup:"
-  ☑ Fix authentication timeout bug (backend-api)
-  ☑ Implement OAuth flow (backend-api)
-  ☐ Update personal blog (blog) - [user unchecks personal project]
-  ☑ Research payment providers (docs)
-  ☐ Debugging test failures (experiments)
+- **NEVER skip the Topics for Discussion question** — This is the highest-value question in any standup. Cross-team dependencies, architectural decisions, and process issues only surface if someone asks. Tools cannot detect them. Skipping it is the most common cause of standup meetings that "feel useless."
 
-User: [selects 3 relevant items]
+- **NEVER generate more than 15 bullets total** — Standup updates are read in under 2 minutes. Beyond 15 bullets, teammates skim or skip entirely. If the user has more than 15 items, consolidate by impact: "Resolved 3 auth bugs" not three separate bullets.
 
-Claude: "Here's what I found from your activity:
-- Merged PR #120: fix login timeout
-- Opened PR #125: feat: add OAuth flow
-- Reviewed PR #123 (approved), PR #456 (changes requested)
-- 5 commits in backend-api
-- [From Claude Code] Fix authentication timeout bug
-- [From Claude Code] Implement OAuth flow
-- [From Claude Code] Research payment providers
+- **NEVER include raw ticket/PR numbers without title or summary** — "Fixed PROJ-123" communicates nothing to a teammate reading async. "Fixed PROJ-123: session timeout causing data loss" takes 3 seconds to write and is actually actionable.
 
-Anything else you worked on yesterday that I missed?"
-User: "No, that covers it"
+- **NEVER assume the current directory is the only project** — Engineers routinely work across 2–5 repos (frontend, backend, infra, shared libs). Defaulting to current directory silently excludes the majority of their work. Always ask.
 
-Claude: "What will you work on today?"
-User: "Continue the OAuth feature and deploy to staging"
+- **NEVER generate the update before completing all 4 questions** — The blocker or discussion topic revealed in Q3/Q4 often reframes Q1/Q2 ("I merged that PR but it's actually blocking deploy"). Generating early produces an update that misrepresents reality.
 
-Claude: "Any blockers?"
-User: "No"
+- **NEVER use full Markdown headers (`##`) in a Slack/chat async post** — Slack does not render Markdown headings. The output displays as `## Yesterday` literally. Use `*Yesterday:*` (bold inline). Sending malformatted async updates is worse than no update — it signals the user didn't review the output.
 
-Claude: "Any topic to bring up at the end of the daily?"
-User: "Yes, I want to discuss the architecture of the new payments module"
+- **NEVER treat a missed-day standup the same as a normal one** — If the user skipped Monday's standup, Tuesday's "yesterday" covers 3 days. Ask: "Last time you posted was [day] — shall I cover since then?" and compress multi-day summaries rather than writing 30 bullets.
 
-[Claude generates update]
-```
+- **NEVER pull "today's" activity as yesterday**: GitHub/git defaults can include commits from the current morning. A commit made at 8:47am today belongs in "Today", not "Yesterday". Bound tool queries on both sides, e.g. `--since="yesterday midnight" --until="midnight"` for `git log` or an equivalent date range, not "last 24h from now."
 
 ---
 
-## Output Example
-
-```markdown
-# Daily Update - 2026-01-22
-
-## Yesterday
-- Worked on authentication feature
-- Research on payment providers
-- Merged PR #120 (fix: login timeout)
-- Opened PR #125 (feat: add OAuth flow)
-
-## Today
-- Continue OAuth feature
-- Deploy to staging
-
-## Blockers
-- No blockers
-
-## PRs & Reviews
-- **Opened:** PR #125 - feat: add OAuth flow
-- **Merged:** PR #120 - fix: login timeout
-- **Reviews:** PR #123 (approved), PR #456 (changes requested)
+## When Things Go Wrong
 
-## Topics for Discussion
-- Architecture of the new payments module
+| Situation | Expert Response |
+|-----------|-----------------|
+| User says "I don't remember what I did" | Offer Claude Code digest first; if unavailable, ask: "What meetings did you have? Any PRs you reviewed? Any bugs you looked at even briefly?" — memory is trigger-based, not list-based |
+| All integrations fail or unavailable | Proceed with pure interview — no tools required. Note: "Going manual — just answer the 4 questions" |
+| User has only 5 minutes | Skip Phase 1 entirely; run Phase 2 verbally ("quick fire: yesterday? today? blockers? discussion?"); generate immediately |
+| Async post with a critical blocker | Move blockers to the TOP of the update; add explicit "Action needed from @name by [time]" line; do not bury it in section 3 |
+| User works across time zones (their today = team's yesterday) | Confirm the reference period: "Your standup covers work from [user's yesterday start] to now — is that right?" |
+| Standup bot asks questions one at a time | Coach user: answer each prompt as a complete sentence; the bot may forward responses verbatim to a different channel |
+| User's PR list includes team members' PRs they reviewed | Separate "Reviews done" from "Work opened/merged" — conflating them overstates output |
 
 ---
-*Links:*
-- https://github.com/org/repo/pull/125
-- https://github.com/org/repo/pull/120
-```
-
----
-
-## Anti-Patterns
 
-| Avoid | Why (Expert Knowledge) | Instead |
-|-------|------------------------|---------|
-| Run gh/jira without asking | Users may have personal repos visible, or be in a sensitive project context they don't want exposed | Always ask first, let user choose repos |
-| Assume current directory is the only project | Developers often work on 2-5 repos simultaneously (frontend, backend, infra) | Ask "Which projects are you working on?" |
-| Skip interview even with tool data | Tools capture WHAT happened but miss WHY and context (research, meetings, planning) | Interview is primary, tools supplement |
-| Generate update before all 4 questions | User might have critical blocker or discussion topic that changes the narrative | Complete interview, then generate |
-| Include raw commit messages | Commit messages are often cryptic ("fix", "wip") and don't tell the story | Summarize into human-readable outcomes |
-| Ask for data after interview | Showing insights during interview makes questions smarter ("I see you merged PR #123, anything else?") | Pull data first, then interview with context |
-
----
+## Quick Reference
 
-## NEVER
+| Phase | Action | Tool |
+|-------|--------|------|
+| 1. Detect & Offer | Silent check → ask consent → pull data | Bash (silent), AskUserQuestionTool |
+| 2. Interview | 4 questions with pulled-data context | AskUserQuestionTool |
+| 3. Generate | Format for meeting type | Text output |
 
-- **NEVER assume tools are configured** — Many devs have gh installed but not authenticated, or jira CLI pointing to wrong instance
-- **NEVER skip the "Topics for Discussion" question** — This is often the most valuable part of standup that tools can't capture
-- **NEVER generate more than 15 bullets** — Standup should be <2 minutes to read; long updates lose the audience
-- **NEVER include ticket/PR numbers without context** — "PROJ-123" means nothing; always include title or summary
-- **NEVER pull data from repos user didn't explicitly approve** — Even if you can see other repos, respect boundaries
+**Reference files:**
+- `references/async-remote-patterns.md` — Slack format, bot format, time-zone edge cases, remote extras
+- `references/integration-details.md` — All detection/pull commands, error handling, output templates
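
The NEVER rules in `skills/daily-meeting-update/SKILL.md` about today's commits and missed standups both come down to fixing a date window before any tool is queried. A minimal Python sketch follows, assuming two hypothetical helpers (`standup_window` and `git_log_args`) that are not part of the skill, and assuming the caller supplies the date of the last standup when days were missed:

```python
from datetime import date, datetime, time, timedelta
from typing import List, Optional, Tuple


def standup_window(today: date, last_standup: Optional[date] = None) -> Tuple[datetime, datetime]:
    """Return (start, end): midnight of the first covered day up to midnight today.

    Hypothetical helper. With no last_standup the window is yesterday only;
    otherwise it starts on the day of the last standup (an assumption).
    """
    first_day = last_standup if last_standup is not None else today - timedelta(days=1)
    if first_day >= today:
        raise ValueError("last_standup must be earlier than today")
    return datetime.combine(first_day, time.min), datetime.combine(today, time.min)


def git_log_args(author: str, start: datetime, end: datetime) -> List[str]:
    """Build a git log command bounded on both sides, so this morning's commits are excluded."""
    return [
        "git", "log", f"--author={author}",
        f"--since={start.isoformat()}", f"--until={end.isoformat()}",
        "--oneline",
    ]
```

For a Tuesday standup after a skipped Monday one, passing the previous Friday as `last_standup` widens the window across the weekend and Monday while still stopping at midnight today.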
diff --git a/skills/daily-meeting-update/references/async-remote-patterns.md b/skills/daily-meeting-update/references/async-remote-patterns.md
new file mode 100644
index 0000000..1b83794
--- /dev/null
+++ b/skills/daily-meeting-update/references/async-remote-patterns.md
@@ -0,0 +1,111 @@
+# Async & Remote Team Standup Patterns
+
+## When the Meeting Isn't Real-Time
+
+Many teams run "standups" asynchronously — posted to Slack, a wiki, or a bot. The core 4-question structure still applies, but formatting and tone shift significantly.
+
+---
+
+## Async Update Formats
+
+### Slack / Chat Post (most common)
+
+```
+*Yesterday:* Fixed session timeout bug (PR #120 merged), reviewed @alice's payment PR
+*Today:* Continue OAuth flow (PROJ-123), unblock @bob on API schema
+*Blockers:* Waiting on security sign-off for the OAuth client ID — pinged @infra-team
+*FYI:* We should discuss auth token TTL strategy before Wednesday deploy — happy to start a thread
+```
+
+**Rules for chat format:**
+- Inline bold headers, not Markdown `##` headings (Slack renders `*text*` not `## text`)
+- Tag teammates who are involved in blockers or handoffs
+- Keep to 4–6 lines; no bullets for Today if items are few
+- Discussion topics become threaded FYIs, not a separate section
+
+### Email / Long-form Async (weekly recap hybrid)
+
+Use the full Markdown output template from Phase 3 of SKILL.md.
+
+### Standup Bot (Geekbot, Standuply, etc.)
+
+Many bots ask each question one at a time via DM. Coach the user:
+- Answer each bot prompt directly from their interview notes
+- For "blockers" prompt: full sentence (bots often forward to a manager channel verbatim)
+- For "today" prompt: be explicit about ticket IDs since bots often auto-link them
+
+---
+
+## Time-Zone Edge Cases
+
+### User is First to Post (early timezone)
+
+- "Yesterday" window may not overlap with teammates' yesterday
+- Clarify: "I'll include anything from the past 24h of my work, not just your team's calendar day"
+- Pull GitHub data for last 24h, not "since midnight local"
+
+### User Missed Yesterday's Standup
+
+Decision tree:
+
+```
+Did you post anything async yesterday?
+├─ Yes → "Reference your async post, add what changed since then"
+└─ No  → "Cover both missed days under Yesterday (Thu + Fri for a Monday standup)"
+         → Compress: "Earlier this week: X. Yesterday: Y."
+         → Flag the gap explicitly if blockers went unaddressed
+```
+
+### Cross-timezone Handoffs
+
+If the user's update will be read by teammates in a different timezone who act on it while the user sleeps:
+- Move blockers to the TOP of the update (readers need to act immediately)
+- Include explicit handoff notes: "Leaving PROJ-456 at X state — @alice can pick up from branch `feature/xyz`"
+- Add a "Needs Action" section if anything requires a teammate decision
+
+---
+
+## Team-Specific Anti-Patterns (Async)
+
+| Situation | Symptom | Fix |
+|-----------|---------|-----|
+| PM-heavy team | Update becomes a status report with metrics | Keep format terse; metrics belong in a weekly |
+| High-meeting culture | User writes update nobody reads | Ask: "Who actually reads this? Format for them." |
+| Monorepo team | Commits from 6 people land in the same repo, so the user's own work is hard to pick out | Pull by author filter; lead with impact, not repo name |
+| Distributed team, no overlap | "Today" items may be stale by time readers see them | Add "by EOD my timezone" estimates on key items |
+
+---
+
+## Formatting Decision Tree: Sync vs Async
+
+```
+Is the meeting happening live (video/in-person)?
+├─ YES → Use full Markdown template (rendered in meeting tool)
+│        Keep to <12 bullets; audience can ask questions live
+└─ NO  → Is it posted to Slack/chat?
+          ├─ YES → Use chat format (bold headers, no ##, tag names)
+          │        Keep to 4–6 lines
+          └─ NO  → Is it a bot/form input?
+                    ├─ YES → Answer per prompt, full sentences, no bullets
+                    └─ NO  → Email/wiki → Full Markdown, add links section
+```
+
+---
+
+## Remote-First Standup Extras (Questions 5 & 6)
+
+When teams are fully remote with no watercooler, optionally add:
+
+**Question 5: Availability / Context**
+```
+"Any changes to your availability today? (late start, early end, off-site, focus time?)"
+```
+Why this matters: Remote teams can't see you're not at your desk. Heads-up prevents blocked teammates.
+
+**Question 6: Cross-team dependencies**
+```
+"Are you waiting on anyone outside your team, or does anyone outside your team need something from you?"
+```
+Why this matters: Cross-team blockers often stall for days in async environments because nobody escalates them.
+
+Only offer these extra questions if the user mentions they're fully remote or the team uses async standups. Don't add them to a normal 15-minute sync standup — it bloats the update.
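
The formatting decision tree and the chat-format rules in `references/async-remote-patterns.md` can be sketched in a few lines of Python. This is an illustration only: `choose_format`, `render_chat`, and the format labels they return are hypothetical names, not something the skill defines.

```python
from typing import List, Optional


def choose_format(live: bool, chat: bool = False, bot: bool = False) -> str:
    """Walk the sync/async decision tree top to bottom and return a format label."""
    if live:
        return "markdown"
    if chat:
        return "chat"
    if bot:
        return "bot"
    return "markdown-with-links"  # email or wiki


def render_chat(yesterday: List[str], today: List[str],
                blockers: Optional[str] = None, fyi: Optional[str] = None) -> str:
    """Render a Slack-style post: inline bold labels, no Markdown headings."""
    lines = [
        "*Yesterday:* " + ", ".join(yesterday),
        "*Today:* " + ", ".join(today),
        "*Blockers:* " + (blockers or "No blockers"),
    ]
    if fyi:
        # Discussion topics become a threaded FYI line rather than a section.
        lines.append("*FYI:* " + fyi)
    return "\n".join(lines)
```

Keeping the renderer separate from the decision function means the same interview answers can be re-rendered if the user switches from a live meeting to a chat post.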
diff --git a/skills/daily-meeting-update/references/integration-details.md b/skills/daily-meeting-update/references/integration-details.md
new file mode 100644
index 0000000..34c6ae4
--- /dev/null
+++ b/skills/daily-meeting-update/references/integration-details.md
@@ -0,0 +1,97 @@
+# Integration Details & Detection Reference
+
+## Detection Commands (run silently, suppress all errors)
+
+| Integration | Detection Command | Notes |
+|-------------|-------------------|-------|
+| Claude Code History | `ls ~/.claude/projects/*/*.jsonl 2>/dev/null \| head -1` | Presence of any .jsonl file |
+| GitHub CLI | `gh auth status 2>/dev/null` | Exit code 0 = authenticated |
+| Jira CLI | `which jira 2>/dev/null` | Existence only; the CLI may still need to be configured or logged in |
+| Atlassian MCP | Check if `mcp__atlassian__*` tools available | MCP context check |
+| Git | `git rev-parse --is-inside-work-tree 2>/dev/null` | Current directory |
+
+## GitHub/Git Pull Commands
+
+```bash
+# Commits by current user in last 24h
+git log --author="$(git config user.name)" --since="24 hours ago" --oneline
+
+# PRs opened by user (gh CLI)
+gh pr list --author="@me" --state=all --limit=10 --json number,title,state,url
+
+# PRs reviewed by user (uses GNU date; on macOS/BSD substitute: date -v-24H +%Y-%m-%d)
+gh search prs --reviewed-by="@me" --updated=">$(date -d '24 hours ago' +%Y-%m-%d)" --json number,title,url
+
+# Merged PRs by user
+gh pr list --author="@me" --state=merged --limit=5 --json number,title,url
+```
+
+## Claude Code Digest Script
+
+```bash
+# Get yesterday's sessions as JSON
+python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --format json
+
+# Get today's sessions
+python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --date today --format json
+
+# Filter to specific project
+python3 ~/.claude/skills/daily-meeting-update/scripts/claude_digest.py --project ~/my-app --format json
+```
+
+**Fallback if script fails:** Skip Claude Code integration silently, proceed with interview. The script is supplemental — never block the standup flow.
+
+## Jira Pull (CLI)
+
+```bash
+# Tickets assigned to me, updated in last 24h (flag names differ between Jira CLIs; confirm with `jira issue list --help`)
+jira issue list --assignee=me --updated=-24h --output=json
+```
+
+## Atlassian MCP (if available)
+
+Use `mcp__atlassian__search_issues` with JQL:
+```
+assignee = currentUser() AND updated >= -24h ORDER BY updated DESC
+```
+
+## Output Template Reference
+
+```markdown
+# Daily Update - [DATE]
+
+## Yesterday
+- [Items from interview + tool data]
+
+## Today
+- [Items from interview + Jira suggestions]
+
+## Blockers
+- [Blockers or "No blockers"]
+
+## PRs & Reviews (if GitHub pulled)
+- **Opened:** PR #N - title
+- **Merged:** PR #N - title
+- **Reviews:** PR #N (approved/changes requested)
+
+## Jira (if pulled)
+- PROJ-N: Title (Status)
+
+## Topics for Discussion
+- [Topics or "None"]
+
+---
+*Links:*
+- [PR/ticket URLs]
+```
+
+## Error Handling
+
+| Error | Cause | Recovery |
+|-------|-------|----------|
+| `gh: command not found` | GitHub CLI not installed | Skip GitHub integration, continue |
+| `gh auth status` fails | Not authenticated | Skip GitHub; optionally tell the user why |
+| `jira: command not found` | Jira CLI not installed | Skip Jira, check for Atlassian MCP |
+| `claude_digest.py` fails | Python not installed / no sessions | Skip silently, continue with interview |
+| Not a git repository | Current directory is outside a git work tree | Skip git for the current directory; ask which repos to check |
+| MCP tools unavailable | Not in Claude Code context | Skip MCP integrations entirely |
diff --git a/skills/database-schema-designer/SKILL.md b/skills/database-schema-designer/SKILL.md
index 624a8e2..49e42cb 100644
--- a/skills/database-schema-designer/SKILL.md
+++ b/skills/database-schema-designer/SKILL.md
@@ -1,687 +1,130 @@
 ---
 name: database-schema-designer
-description: Design robust, scalable database schemas for SQL and NoSQL databases. Provides normalization guidelines, indexing strategies, migration patterns, constraint design, and performance optimization. Ensures data integrity, query performance, and maintainable data models.
+description: Design production-ready SQL and NoSQL database schemas. Use when asked to design schema, create tables, model data, database design, schema for, or architect a data model. Covers normalization, indexing strategy, migration patterns, constraint design, and platform-specific traps for PostgreSQL, MySQL, and MongoDB.
 license: MIT
 ---
 
 # Database Schema Designer
 
-Design production-ready database schemas with best practices built-in.
+## Mindset
 
----
-
-## Quick Start
-
-Just describe your data model:
-
-```
-design a schema for an e-commerce platform with users, products, orders
-```
-
-You'll get a complete SQL schema like:
-
-```sql
-CREATE TABLE users (
-  id BIGINT AUTO_INCREMENT PRIMARY KEY,
-  email VARCHAR(255) UNIQUE NOT NULL,
-  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-);
-
-CREATE TABLE orders (
-  id BIGINT AUTO_INCREMENT PRIMARY KEY,
-  user_id BIGINT NOT NULL REFERENCES users(id),
-  total DECIMAL(10,2) NOT NULL,
-  INDEX idx_orders_user (user_id)
-);
-```
-
-**What to include in your request:**
-- Entities (users, products, orders)
-- Key relationships (users have orders, orders have items)
-- Scale hints (high-traffic, millions of records)
-- Database preference (SQL/NoSQL) - defaults to SQL if not specified
-
----
-
-## Triggers
-
-| Trigger | Example |
-|---------|---------|
-| `design schema` | "design a schema for user authentication" |
-| `database design` | "database design for multi-tenant SaaS" |
-| `create tables` | "create tables for a blog system" |
-| `schema for` | "schema for inventory management" |
-| `model data` | "model data for real-time analytics" |
-| `I need a database` | "I need a database for tracking orders" |
-| `design NoSQL` | "design NoSQL schema for product catalog" |
-
----
-
-## Key Terms
-
-| Term | Definition |
-|------|------------|
-| **Normalization** | Organizing data to reduce redundancy (1NF → 2NF → 3NF) |
-| **3NF** | Third Normal Form - no transitive dependencies between columns |
-| **OLTP** | Online Transaction Processing - write-heavy, needs normalization |
-| **OLAP** | Online Analytical Processing - read-heavy, benefits from denormalization |
-| **Foreign Key (FK)** | Column that references another table's primary key |
-| **Index** | Data structure that speeds up queries (at cost of slower writes) |
-| **Access Pattern** | How your app reads/writes data (queries, joins, filters) |
-| **Denormalization** | Intentionally duplicating data to speed up reads |
-
----
-
-## Quick Reference
-
-| Task | Approach | Key Consideration |
-|------|----------|-------------------|
-| New schema | Normalize to 3NF first | Domain modeling over UI |
-| SQL vs NoSQL | Access patterns decide | Read/write ratio matters |
-| Primary keys | INT or UUID | UUID for distributed systems |
-| Foreign keys | Always constrain | ON DELETE strategy critical |
-| Indexes | FKs + WHERE columns | Column order matters |
-| Migrations | Always reversible | Backward compatible first |
-
----
-
-## Process Overview
-
-```
-Your Data Requirements
-    |
-    v
-+-----------------------------------------------------+
-| Phase 1: ANALYSIS                                   |
-| * Identify entities and relationships               |
-| * Determine access patterns (read vs write heavy)   |
-| * Choose SQL or NoSQL based on requirements         |
-+-----------------------------------------------------+
-    |
-    v
-+-----------------------------------------------------+
-| Phase 2: DESIGN                                     |
-| * Normalize to 3NF (SQL) or embed/reference (NoSQL) |
-| * Define primary keys and foreign keys              |
-| * Choose appropriate data types                     |
-| * Add constraints (UNIQUE, CHECK, NOT NULL)         |
-+-----------------------------------------------------+
-    |
-    v
-+-----------------------------------------------------+
-| Phase 3: OPTIMIZE                                   |
-| * Plan indexing strategy                            |
-| * Consider denormalization for read-heavy queries   |
-| * Add timestamps (created_at, updated_at)           |
-+-----------------------------------------------------+
-    |
-    v
-+-----------------------------------------------------+
-| Phase 4: MIGRATE                                    |
-| * Generate migration scripts (up + down)            |
-| * Ensure backward compatibility                     |
-| * Plan zero-downtime deployment                     |
-+-----------------------------------------------------+
-    |
-    v
-Production-Ready Schema
-```
-
----
-
-## Commands
-
-| Command | When to Use | Action |
-|---------|-------------|--------|
-| `design schema for {domain}` | Starting fresh | Full schema generation |
-| `normalize {table}` | Fixing existing table | Apply normalization rules |
-| `add indexes for {table}` | Performance issues | Generate index strategy |
-| `migration for {change}` | Schema evolution | Create reversible migration |
-| `review schema` | Code review | Audit existing schema |
-
-**Workflow:** Start with `design schema` → iterate with `normalize` → optimize with `add indexes` → evolve with `migration`
-
----
+1. **Access pattern is the schema.** No access pattern = no schema. Extract the 3 most frequent read queries first; the schema exists to serve them. Normalization is a starting point, not a destination.
+2. **The database enforces what the application forgets.** Every constraint you skip becomes a data integrity incident at 2am. Defensive schema design is not premature optimization.
+3. **UUID v4 as a clustered PK is a silent perf killer.** Random UUIDs fragment B-tree indexes on every insert. Use `BIGSERIAL`/`BIGINT AUTO_INCREMENT` for high-write tables; use UUIDv7 (time-ordered) when distributed ID generation is required. Note that PostgreSQL's `gen_random_uuid()` returns a random v4 UUID, so it carries the same fragmentation cost.
+4. **TOAST in PostgreSQL means your "small" table isn't small.** When a row grows past roughly 2KB, large values are compressed and may be moved out of line into a TOAST table; a query that SELECTs a wide TEXT column on 100k rows can do up to 100k TOAST lookups. Know your column widths before indexing.
+5. **Migrations run on live data.** Every DDL change is a production operation. Design for zero-downtime: add before remove, nullable before constrained, never rename in one step.
 
-## Core Principles
+## Navigation
 
-| Principle | WHY | Implementation |
-|-----------|-----|----------------|
-| Model the Domain | UI changes, domain doesn't | Entity names reflect business concepts |
-| Data Integrity First | Corruption is costly to fix | Constraints at database level |
-| Optimize for Access Pattern | Can't optimize for both | OLTP: normalized, OLAP: denormalized |
-| Plan for Scale | Retrofitting is painful | Index strategy + partitioning plan |
+**Use this skill when**: designing a new schema from scratch, reviewing an existing schema for problems, planning a migration, choosing between SQL/NoSQL, or optimizing a slow query via schema changes.
 
----
-
-## Anti-Patterns
-
-| Avoid | Why | Instead |
-|-------|-----|---------|
-| VARCHAR(255) everywhere | Wastes storage, hides intent | Size appropriately per field |
-| FLOAT for money | Rounding errors | DECIMAL(10,2) |
-| Missing FK constraints | Orphaned data | Always define foreign keys |
-| No indexes on FKs | Slow JOINs | Index every foreign key |
-| Storing dates as strings | Can't compare/sort | DATE, TIMESTAMP types |
-| SELECT * in queries | Fetches unnecessary data | Explicit column lists |
-| Non-reversible migrations | Can't rollback | Always write DOWN migration |
-| Adding NOT NULL without default | Breaks existing rows | Add nullable, backfill, then constrain |
-
----
+**Do NOT use this skill when**: writing ORM model code without schema intent (use the ORM docs), tuning query plans without schema changes (use EXPLAIN), or designing application-layer caching (Redis patterns are separate).
 
-## Verification Checklist
-
-After designing a schema:
-
-- [ ] Every table has a primary key
-- [ ] All relationships have foreign key constraints
-- [ ] ON DELETE strategy defined for each FK
-- [ ] Indexes exist on all foreign keys
-- [ ] Indexes exist on frequently queried columns
-- [ ] Appropriate data types (DECIMAL for money, etc.)
-- [ ] NOT NULL on required fields
-- [ ] UNIQUE constraints where needed
-- [ ] CHECK constraints for validation
-- [ ] created_at and updated_at timestamps
-- [ ] Migration scripts are reversible
-- [ ] Tested on staging with production data
-
----
-
-<details>
-<summary><strong>Deep Dive: Normalization (SQL)</strong></summary>
-
-### Normal Forms
-
-| Form | Rule | Violation Example |
-|------|------|-------------------|
-| **1NF** | Atomic values, no repeating groups | `product_ids = '1,2,3'` |
-| **2NF** | 1NF + no partial dependencies | customer_name in order_items |
-| **3NF** | 2NF + no transitive dependencies | country derived from postal_code |
-
-### 1st Normal Form (1NF)
-
-```sql
--- BAD: Multiple values in column
-CREATE TABLE orders (
-  id INT PRIMARY KEY,
-  product_ids VARCHAR(255)  -- '101,102,103'
-);
-
--- GOOD: Separate table for items
-CREATE TABLE orders (
-  id INT PRIMARY KEY,
-  customer_id INT
-);
-
-CREATE TABLE order_items (
-  id INT PRIMARY KEY,
-  order_id INT REFERENCES orders(id),
-  product_id INT
-);
-```
-
-### 2nd Normal Form (2NF)
-
-```sql
--- BAD: customer_name depends only on customer_id
-CREATE TABLE order_items (
-  order_id INT,
-  product_id INT,
-  customer_name VARCHAR(100),  -- Partial dependency!
-  PRIMARY KEY (order_id, product_id)
-);
-
--- GOOD: Customer data in separate table
-CREATE TABLE customers (
-  id INT PRIMARY KEY,
-  name VARCHAR(100)
-);
-```
-
-### 3rd Normal Form (3NF)
-
-```sql
--- BAD: country depends on postal_code
-CREATE TABLE customers (
-  id INT PRIMARY KEY,
-  postal_code VARCHAR(10),
-  country VARCHAR(50)  -- Transitive dependency!
-);
-
--- GOOD: Separate postal_codes table
-CREATE TABLE postal_codes (
-  code VARCHAR(10) PRIMARY KEY,
-  country VARCHAR(50)
-);
-```
-
-### When to Denormalize
-
-| Scenario | Denormalization Strategy |
-|----------|-------------------------|
-| Read-heavy reporting | Pre-calculated aggregates |
-| Expensive JOINs | Cached derived columns |
-| Analytics dashboards | Materialized views |
-
-```sql
--- Denormalized for performance
-CREATE TABLE orders (
-  id INT PRIMARY KEY,
-  customer_id INT,
-  total_amount DECIMAL(10,2),  -- Calculated
-  item_count INT               -- Calculated
-);
+**Quick decision tree:**
 ```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Data Types</strong></summary>
-
-### String Types
-
-| Type | Use Case | Example |
-|------|----------|---------|
-| CHAR(n) | Fixed length | State codes, ISO dates |
-| VARCHAR(n) | Variable length | Names, emails |
-| TEXT | Long content | Articles, descriptions |
-
-```sql
--- Good sizing
-email VARCHAR(255)
-phone VARCHAR(20)
-country_code CHAR(2)
+Need schema help?
+  ├─ New design → start with access patterns, then entities
+  ├─ Existing schema review → load references/schema-design-checklist.md
+  ├─ Performance problem → check indexes first, denormalize second
+  └─ Migration plan → expand "When Things Go Wrong" section below
 ```
 
-### Numeric Types
+## Philosophy
 
-| Type | Range | Use Case |
-|------|-------|----------|
-| TINYINT | -128 to 127 | Age, status codes |
-| SMALLINT | -32K to 32K | Quantities |
-| INT | -2.1B to 2.1B | IDs, counts |
-| BIGINT | Very large | Large IDs, timestamps |
-| DECIMAL(p,s) | Exact precision | Money |
-| FLOAT/DOUBLE | Approximate | Scientific data |
+Schema design is a contract between your data and your future self. Make the database enforce correctness at rest so application bugs produce errors, not silent corruption. Every deviation from normalized form must earn its keep with a measured performance justification.
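+
+As a minimal sketch of "enforce correctness at rest", assuming PostgreSQL and a hypothetical `accounts` table (the table and column names are illustrative and not part of this skill), the constraints below reject bad rows in the database itself:
+
+```sql
+-- Hypothetical table: every rule lives in the schema, not only in app code.
+CREATE TABLE accounts (
+  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  email      TEXT NOT NULL UNIQUE,
+  status     TEXT NOT NULL CHECK (status IN ('active', 'inactive')),
+  balance    DECIMAL(19,4) NOT NULL DEFAULT 0 CHECK (balance >= 0),
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- An application bug that sends 'Active' now fails with a check violation
+-- instead of silently storing a fourth spelling of the same status.
+INSERT INTO accounts (email, status) VALUES ('a@example.com', 'Active');
+```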
 
-```sql
--- ALWAYS use DECIMAL for money
-price DECIMAL(10, 2)  -- $99,999,999.99
+## NEVER
 
--- NEVER use FLOAT for money
-price FLOAT  -- Rounding errors!
-```
+- **NEVER use UUID v4 as a clustered primary key on high-write tables** — random insertion order causes B-tree page splits on every insert; at 1M+ rows/day this degrades write throughput 3-10x versus sequential keys. Use `BIGSERIAL` or UUIDv7 instead.
+- **NEVER store monetary values in FLOAT or DOUBLE** — IEEE 754 rounding means `0.1 + 0.2 = 0.30000000000000004`; a single rounding error in financial data can cascade into reconciliation failures across millions of transactions. Always `DECIMAL(19,4)`.
+- **NEVER add a NOT NULL column to a large table in a single migration**: depending on the engine version and whether a default is supplied, PostgreSQL and MySQL may rewrite the entire table; on a 500GB table this can block writes for minutes. Pattern: add nullable → backfill in batches → add constraint.
+- **NEVER create a polymorphic `entity_type` + `entity_id` column pair without a partial index**: these columns can never have a foreign key, and without a partial index per type (or at least a composite index on both columns) lookups fall back to a full scan. Add an index of the form `CREATE INDEX ... WHERE entity_type = 'post'` for each type.
+- **NEVER rely on application-side cascades instead of database-level ON DELETE** — application code can be bypassed (direct SQL, migrations, admin tools); orphaned rows accumulate silently. Define ON DELETE strategy on every FK.
+- **NEVER index every column "just in case"**: each index adds write work to every insert and update and consumes buffer pool; on a write-heavy table, 10 unused indexes can halve insert throughput. Index exactly the queries you have, not the queries you imagine.
+- **NEVER use `TEXT` or `VARCHAR` for enum-like fields without a CHECK constraint or lookup table**: `status VARCHAR(20)` without `CHECK (status IN ('active','inactive'))` will contain 'Active', 'ACTIVE', and 'activ' within a year. Add `NOT NULL` as well, because a CHECK constraint passes on NULL.
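+
+The add nullable → backfill → constrain pattern from the list above might look like the following. This is a sketch, assuming PostgreSQL 12 or later and an existing hypothetical `users` table with a `BIGINT` `id` key; the `phone` column and constraint name are illustrative:
+
+```sql
+-- Step 1: add the column as nullable (a metadata-only change, no rewrite).
+ALTER TABLE users ADD COLUMN phone VARCHAR(20);
+
+-- Step 2: backfill in small batches; rerun until it reports 0 rows updated.
+UPDATE users
+SET phone = ''
+WHERE id IN (
+  SELECT id FROM users WHERE phone IS NULL ORDER BY id LIMIT 10000
+);
+
+-- Step 3: prove there are no NULLs without holding a long exclusive lock.
+ALTER TABLE users
+  ADD CONSTRAINT users_phone_not_null CHECK (phone IS NOT NULL) NOT VALID;
+ALTER TABLE users VALIDATE CONSTRAINT users_phone_not_null;
+
+-- Step 4: SET NOT NULL can rely on the validated CHECK and skip its own scan.
+ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
+ALTER TABLE users DROP CONSTRAINT users_phone_not_null;
+```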
 
-### Date/Time Types
+## When to Break the Rules
 
-```sql
-DATE        -- 2025-10-31
-TIME        -- 14:30:00
-DATETIME    -- 2025-10-31 14:30:00
-TIMESTAMP   -- Auto timezone conversion
+| Rule | When to Break | Guard Rails |
+|------|--------------|-------------|
+| Normalize to 3NF | OLAP / reporting tables where JOIN cost dominates | Document the denorm + add comment in schema |
+| FK constraints on every relationship | Ultra-high write throughput (>50k inserts/sec); some sharded DBs can't enforce cross-shard FKs | Enforce referential integrity at application layer with tests |
+| Sequential INT primary key | Distributed/multi-writer systems where coordination is impossible | Use UUIDv7 (time-ordered) not v4 |
+| UTC timestamps everywhere | Systems requiring local time audit trails (legal, compliance) | Store both: `event_at TIMESTAMPTZ`, `event_at_local TEXT` |
+| Single schema per domain | Multi-tenant SaaS with strong isolation requirements | Schema-per-tenant or row-level security (PostgreSQL RLS) |
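+
+For the multi-tenant row, a row-level security setup might be sketched as follows, assuming PostgreSQL, a hypothetical `invoices` table, and an application that sets a custom `app.tenant_id` setting per session (all three names are assumptions made for illustration):
+
+```sql
+CREATE TABLE invoices (
+  id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  tenant_id BIGINT NOT NULL,
+  total     DECIMAL(19,4) NOT NULL
+);
+
+-- Turn on RLS; FORCE also applies the policy to the table owner.
+ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
+ALTER TABLE invoices FORCE ROW LEVEL SECURITY;
+
+-- Rows are visible and writable only when tenant_id matches the session setting.
+CREATE POLICY tenant_isolation ON invoices
+  USING (tenant_id = current_setting('app.tenant_id')::BIGINT);
+
+-- The application would run this at the start of each session or transaction.
+SET app.tenant_id = '42';
+```
+
+Superusers and roles with `BYPASSRLS` still see every row, so the application should connect as an ordinary role.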
 
--- Always store in UTC
-created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
-```
+## When Things Go Wrong
 
-### Boolean
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Writes slowing down as table grows | Too many indexes; UUID v4 PK fragmentation | `pg_stat_user_indexes` to find unused indexes; switch to UUIDv7 or BIGSERIAL on next rebuild |
+| Migration hangs on large table | Adding NOT NULL or adding index without CONCURRENTLY | Kill migration; use `CREATE INDEX CONCURRENTLY`; add column nullable first |
+| Orphaned rows accumulating | Missing FK constraint, so no ON DELETE CASCADE or RESTRICT is enforced | Audit with `child LEFT JOIN parent ... WHERE parent.id IS NULL`; add FK constraints in a quiet window |
+| Query ignores index | Leading column not in WHERE clause; implicit cast mismatch; low cardinality | `EXPLAIN ANALYZE` to confirm; fix column order or cast; use partial index |
+| TOAST bloat causing slow SELECTs | Wide TEXT/JSONB columns selected unnecessarily | `SELECT only_needed_cols`; move large blobs to object storage, store URL |
+| Schema drift between environments | Migrations applied out of order | Use a migration framework (Flyway, Alembic, golang-migrate) with checksums; never hand-edit prod |
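+
+Two of the recovery steps above can be sketched as diagnostic queries. This assumes PostgreSQL and hypothetical `orders` and `order_items` tables where `order_items.order_id` should point at `orders.id`:
+
+```sql
+-- Orphan audit: child rows whose parent no longer exists.
+-- The IS NOT NULL filter skips children that never had a parent set.
+SELECT oi.id, oi.order_id
+FROM order_items AS oi
+LEFT JOIN orders AS o ON o.id = oi.order_id
+WHERE o.id IS NULL
+  AND oi.order_id IS NOT NULL;
+
+-- Unused-index audit: indexes never scanned since statistics were last reset,
+-- largest first. Check replicas too before dropping anything.
+SELECT schemaname, relname, indexrelname, idx_scan,
+       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
+FROM pg_stat_user_indexes
+WHERE idx_scan = 0
+ORDER BY pg_relation_size(indexrelid) DESC;
+```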
 
-```sql
--- PostgreSQL
-is_active BOOLEAN DEFAULT TRUE
+## Core Decision Framework
 
--- MySQL
-is_active TINYINT(1) DEFAULT 1
-```
+### SQL vs NoSQL
 
-</details>
+| Signal | Choose SQL | Choose NoSQL |
+|--------|-----------|-------------|
+| Data shape | Relational, structured | Hierarchical, variable schema |
+| Query needs | Ad-hoc joins, aggregates | Known access patterns only |
+| Consistency | ACID required | Eventual OK |
+| Scale pattern | Vertical + read replicas | Horizontal sharding |
 
-<details>
-<summary><strong>Deep Dive: Indexing Strategy</strong></summary>
+**Default: SQL.** NoSQL requires knowing your access patterns with certainty before design; SQL tolerates query pattern changes.
 
-### When to Create Indexes
+### Primary Key Selection
 
-| Always Index | Reason |
-|--------------|--------|
-| Foreign keys | Speed up JOINs |
-| WHERE clause columns | Speed up filtering |
-| ORDER BY columns | Speed up sorting |
-| Unique constraints | Enforced uniqueness |
+| Scenario | Choice | Reason |
+|----------|--------|--------|
+| Single-server OLTP | `BIGSERIAL` / `BIGINT AUTO_INCREMENT` | Sequential, fast inserts, small storage |
+| Distributed / multi-writer | UUIDv7 (`pg_uuidv7` extension) | Time-ordered, avoids fragmentation |
+| Junction table | Composite `(a_id, b_id)` | Natural PK, no surrogate needed |
+| External system sync | Natural key + surrogate | Natural key as UNIQUE, surrogate as PK |
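+
+Three rows of the table above might translate into DDL like this; a sketch assuming PostgreSQL, with hypothetical `students`, `courses`, and `enrollments` tables:
+
+```sql
+-- Single-server OLTP: sequential surrogate key.
+CREATE TABLE students (
+  id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  name TEXT NOT NULL
+);
+
+-- External system sync: surrogate PK, natural key kept as UNIQUE.
+CREATE TABLE courses (
+  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  course_code TEXT NOT NULL UNIQUE,  -- identifier owned by the external system
+  title       TEXT NOT NULL
+);
+
+-- Junction table: composite natural PK, no surrogate column.
+CREATE TABLE enrollments (
+  student_id  BIGINT NOT NULL REFERENCES students (id) ON DELETE CASCADE,
+  course_id   BIGINT NOT NULL REFERENCES courses (id) ON DELETE CASCADE,
+  enrolled_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  PRIMARY KEY (student_id, course_id)
+);
+
+-- The composite PK leads with student_id, so index course_id separately
+-- for lookups and cascades that start from the course side.
+CREATE INDEX idx_enrollments_course ON enrollments (course_id);
+```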
 
-```sql
--- Foreign key index
-CREATE INDEX idx_orders_customer ON orders(customer_id);
+### Index Decision Tree
 
--- Query pattern index
-CREATE INDEX idx_orders_status_date ON orders(status, created_at);
 ```
-
-### Index Types
-
-| Type | Best For | Example |
-|------|----------|---------|
-| B-Tree | Ranges, equality | `price > 100` |
-| Hash | Exact matches only | `email = 'x@y.com'` |
-| Full-text | Text search | `MATCH AGAINST` |
-| Partial | Subset of rows | `WHERE is_active = true` |
-
-### Composite Index Order
-
-```sql
-CREATE INDEX idx_customer_status ON orders(customer_id, status);
-
--- Uses index (customer_id first)
-SELECT * FROM orders WHERE customer_id = 123;
-SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-
--- Does NOT use index (status alone)
-SELECT * FROM orders WHERE status = 'pending';
+Need an index?
+  ├─ FK column? → YES, always
+  ├─ In WHERE or JOIN ON? → YES, if the column is selective (distinct values > ~10% of rows)
+  ├─ In ORDER BY only? → partial index if filtering first
+  ├─ Full-text search? → GIN (Postgres) / FULLTEXT (MySQL)
+  └─ JSONB field query? → GIN on the column
 ```
 
-**Rule:** Most selective column first, or column most queried alone.
+**Composite index column order:** equality filters first, then range filters, then ORDER BY columns. The leftmost column must appear in your WHERE clause for the planner to use the index efficiently.
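+
+The column-order rule can be illustrated with a small sketch, assuming PostgreSQL and a hypothetical `orders` table; whether the planner actually picks the index depends on table statistics, so confirm with `EXPLAIN ANALYZE`:
+
+```sql
+CREATE TABLE orders (
+  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  customer_id BIGINT NOT NULL,
+  status      TEXT NOT NULL,
+  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- Equality columns first (customer_id, status), then the range/sort column.
+CREATE INDEX idx_orders_customer_status_created
+  ON orders (customer_id, status, created_at);
+
+-- Good fit: equality on the leading columns, range and sort on the last one.
+SELECT id
+FROM orders
+WHERE customer_id = 123
+  AND status = 'pending'
+  AND created_at >= now() - INTERVAL '7 days'
+ORDER BY created_at;
+
+-- Poor fit: no condition on the leading column customer_id.
+SELECT id FROM orders WHERE status = 'pending';
+```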
 
-### Index Pitfalls
+## Platform-Specific Traps
 
-| Pitfall | Problem | Solution |
-|---------|---------|----------|
-| Over-indexing | Slow writes | Only index what's queried |
-| Wrong column order | Unused index | Match query patterns |
-| Missing FK indexes | Slow JOINs | Always index FKs |
+### PostgreSQL
+- `SERIAL` still works but is discouraged for new tables (PostgreSQL 10+); prefer `GENERATED ALWAYS AS IDENTITY`
+- `TIMESTAMP` has no timezone; use `TIMESTAMPTZ` everywhere
+- `CREATE INDEX` blocks writes to the table while it builds; `CREATE INDEX CONCURRENTLY` does not (but can't run inside a transaction block)
+- JSONB GIN index covers containment (`@>`), not equality on extracted fields — use a functional index: `CREATE INDEX ON orders ((data->>'status'))`
+- Plain `VACUUM` marks dead space as reusable but rarely returns it to the operating system; `VACUUM FULL` does but takes an exclusive lock. Use `pg_repack` for live bloat reclaim.
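+
+The JSONB and timestamp points above might be sketched like this, assuming PostgreSQL and a hypothetical `events` table with a JSONB `data` column (names are illustrative):
+
+```sql
+CREATE TABLE events (
+  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  data       JSONB NOT NULL,
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now()  -- timezone-aware, not TIMESTAMP
+);
+
+-- GIN index: serves containment queries such as
+--   SELECT id FROM events WHERE data @> '{"status": "failed"}';
+CREATE INDEX idx_events_data_gin ON events USING GIN (data);
+
+-- Expression index: serves equality on an extracted field such as
+--   SELECT id FROM events WHERE data->>'status' = 'failed';
+CREATE INDEX idx_events_status ON events ((data->>'status'));
+
+-- On a live table, add CONCURRENTLY to either statement and run it
+-- outside a transaction block so writes are not blocked during the build.
+```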
 
-</details>
+### MySQL / MariaDB
+- InnoDB clusters the table on the PK, so UUID v4 fragmentation hits hardest here
+- `DATETIME` vs `TIMESTAMP`: TIMESTAMP stores in UTC, auto-converts; DATETIME stores literal value. Use TIMESTAMP unless you need dates beyond 2038.
+- `utf8` in MySQL is actually `utf8mb3` (3-byte max); emoji breaks silently. Always `utf8mb4`.
+- Altering a column type on a large InnoDB table usually rebuilds the table with a full copy; use `pt-online-schema-change` or `gh-ost`.
 
-<details>
-<summary><strong>Deep Dive: Constraints</strong></summary>
+### MongoDB
+- Document size hard limit: 16MB. Embedding unbounded arrays (e.g., all comments in a post) hits this. Reference when the child collection grows without bound.
+- Indexes are per-collection and there are no foreign keys across collections, so referential integrity is entirely the application's responsibility.
+- `$lookup` (join) is expensive; if you need it on hot paths, you embedded wrong.
 
-### Primary Keys
-
-```sql
--- Auto-increment (simple)
-id INT AUTO_INCREMENT PRIMARY KEY
-
--- UUID (distributed systems)
-id CHAR(36) PRIMARY KEY DEFAULT (UUID())
-
--- Composite (junction tables)
-PRIMARY KEY (student_id, course_id)
-```
+## References
 
-### Foreign Keys
-
-```sql
-FOREIGN KEY (customer_id) REFERENCES customers(id)
-  ON DELETE CASCADE     -- Delete children with parent
-  ON DELETE RESTRICT    -- Prevent deletion if referenced
-  ON DELETE SET NULL    -- Set to NULL when parent deleted
-  ON UPDATE CASCADE     -- Update children when parent changes
-```
-
-| Strategy | Use When |
-|----------|----------|
-| CASCADE | Dependent data (order_items) |
-| RESTRICT | Important references (prevent accidents) |
-| SET NULL | Optional relationships |
-
-### Other Constraints
-
-```sql
--- Unique
-email VARCHAR(255) UNIQUE NOT NULL
-
--- Composite unique
-UNIQUE (student_id, course_id)
-
--- Check
-price DECIMAL(10,2) CHECK (price >= 0)
-discount INT CHECK (discount BETWEEN 0 AND 100)
-
--- Not null
-name VARCHAR(100) NOT NULL
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Relationship Patterns</strong></summary>
-
-### One-to-Many
-
-```sql
-CREATE TABLE orders (
-  id INT PRIMARY KEY,
-  customer_id INT NOT NULL REFERENCES customers(id)
-);
-
-CREATE TABLE order_items (
-  id INT PRIMARY KEY,
-  order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
-  product_id INT NOT NULL,
-  quantity INT NOT NULL
-);
-```
-
-### Many-to-Many
-
-```sql
--- Junction table
-CREATE TABLE enrollments (
-  student_id INT REFERENCES students(id) ON DELETE CASCADE,
-  course_id INT REFERENCES courses(id) ON DELETE CASCADE,
-  enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-  PRIMARY KEY (student_id, course_id)
-);
-```
-
-### Self-Referencing
-
-```sql
-CREATE TABLE employees (
-  id INT PRIMARY KEY,
-  name VARCHAR(100) NOT NULL,
-  manager_id INT REFERENCES employees(id)
-);
-```
-
-### Polymorphic
-
-```sql
--- Approach 1: Separate FKs (stronger integrity)
-CREATE TABLE comments (
-  id INT PRIMARY KEY,
-  content TEXT NOT NULL,
-  post_id INT REFERENCES posts(id),
-  photo_id INT REFERENCES photos(id),
-  CHECK (
-    (post_id IS NOT NULL AND photo_id IS NULL) OR
-    (post_id IS NULL AND photo_id IS NOT NULL)
-  )
-);
-
--- Approach 2: Type + ID (flexible, weaker integrity)
-CREATE TABLE comments (
-  id INT PRIMARY KEY,
-  content TEXT NOT NULL,
-  commentable_type VARCHAR(50) NOT NULL,
-  commentable_id INT NOT NULL
-);
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: NoSQL Design (MongoDB)</strong></summary>
-
-### Embedding vs Referencing
-
-| Factor | Embed | Reference |
-|--------|-------|-----------|
-| Access pattern | Read together | Read separately |
-| Relationship | 1:few | 1:many |
-| Document size | Small | Approaching 16MB |
-| Update frequency | Rarely | Frequently |
-
-### Embedded Document
-
-```json
-{
-  "_id": "order_123",
-  "customer": {
-    "id": "cust_456",
-    "name": "Jane Smith",
-    "email": "jane@example.com"
-  },
-  "items": [
-    { "product_id": "prod_789", "quantity": 2, "price": 29.99 }
-  ],
-  "total": 109.97
-}
-```
-
-### Referenced Document
-
-```json
-{
-  "_id": "order_123",
-  "customer_id": "cust_456",
-  "item_ids": ["item_1", "item_2"],
-  "total": 109.97
-}
-```
-
-### MongoDB Indexes
-
-```javascript
-// Single field
-db.users.createIndex({ email: 1 }, { unique: true });
-
-// Composite
-db.orders.createIndex({ customer_id: 1, created_at: -1 });
-
-// Text search
-db.articles.createIndex({ title: "text", content: "text" });
-
-// Geospatial
-db.stores.createIndex({ location: "2dsphere" });
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Migrations</strong></summary>
-
-### Migration Best Practices
-
-| Practice | WHY |
-|----------|-----|
-| Always reversible | Need to rollback |
-| Backward compatible | Zero-downtime deploys |
-| Schema before data | Separate concerns |
-| Test on staging | Catch issues early |
-
-### Adding a Column (Zero-Downtime)
-
-```sql
--- Step 1: Add nullable column
-ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-
--- Step 2: Deploy code that writes to new column
-
--- Step 3: Backfill existing rows
-UPDATE users SET phone = '' WHERE phone IS NULL;
-
--- Step 4: Make required (if needed)
-ALTER TABLE users MODIFY phone VARCHAR(20) NOT NULL;
-```
-
-### Renaming a Column (Zero-Downtime)
-
-```sql
--- Step 1: Add new column
-ALTER TABLE users ADD COLUMN email_address VARCHAR(255);
-
--- Step 2: Copy data
-UPDATE users SET email_address = email;
-
--- Step 3: Deploy code reading from new column
--- Step 4: Deploy code writing to new column
-
--- Step 5: Drop old column
-ALTER TABLE users DROP COLUMN email;
-```
-
-### Migration Template
-
-```sql
--- Migration: YYYYMMDDHHMMSS_description.sql
-
--- UP
-BEGIN;
-ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-CREATE INDEX idx_users_phone ON users(phone);
-COMMIT;
-
--- DOWN
-BEGIN;
-DROP INDEX idx_users_phone ON users;
-ALTER TABLE users DROP COLUMN phone;
-COMMIT;
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Performance Optimization</strong></summary>
-
-### Query Analysis
-
-```sql
-EXPLAIN SELECT * FROM orders
-WHERE customer_id = 123 AND status = 'pending';
-```
-
-| Look For | Meaning |
-|----------|---------|
-| type: ALL | Full table scan (bad) |
-| type: ref | Index used (good) |
-| key: NULL | No index used |
-| rows: high | Many rows scanned |
-
-### N+1 Query Problem
-
-```python
-# BAD: N+1 queries
-orders = db.query("SELECT * FROM orders")
-for order in orders:
-    customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")
-
-# GOOD: Single JOIN
-results = db.query("""
-    SELECT orders.*, customers.name
-    FROM orders
-    JOIN customers ON orders.customer_id = customers.id
-""")
-```
-
-### Optimization Techniques
-
-| Technique | When to Use |
-|-----------|-------------|
-| Add indexes | Slow WHERE/ORDER BY |
-| Denormalize | Expensive JOINs |
-| Pagination | Large result sets |
-| Caching | Repeated queries |
-| Read replicas | Read-heavy load |
-| Partitioning | Very large tables |
-
-</details>
-
----
+Load these when needed — do not include in every response:
 
-## Extension Points
+- **Full checklist**: `references/schema-design-checklist.md` — use during schema review or audit tasks
+- **Normalization deep dive**: explain 1NF/2NF/3NF with examples only when user asks to normalize an existing schema
+- **Migration patterns**: expand zero-downtime steps only when user asks for migration scripts
 
-1. **Database-Specific Patterns:** Add MySQL vs PostgreSQL vs SQLite variations
-2. **Advanced Patterns:** Time-series, event sourcing, CQRS, multi-tenancy
-3. **ORM Integration:** TypeORM, Prisma, SQLAlchemy patterns
-4. **Monitoring:** Query performance tracking, slow query alerts
+Once your schema is designed, use openapi-to-typescript to generate TypeScript interfaces from an OpenAPI spec that mirrors your data model — especially useful for readOnly/writeOnly field splitting between read/write DTOs.
diff --git a/skills/datadog-cli/SKILL.md b/skills/datadog-cli/SKILL.md
index b1cc94e..b72aaf3 100644
--- a/skills/datadog-cli/SKILL.md
+++ b/skills/datadog-cli/SKILL.md
@@ -1,127 +1,117 @@
 ---
 name: datadog-cli
-description: Datadog CLI for searching logs, querying metrics, tracing requests, and managing dashboards. Use this when debugging production issues or working with Datadog observability.
+description: Datadog CLI for searching logs, querying metrics, tracing distributed requests, and managing dashboards. Use when debugging production incidents, investigating error spikes, correlating traces, or analyzing service health in Datadog.
 ---
 
 # Datadog CLI
 
-A CLI tool for AI agents to debug and triage using Datadog logs and metrics.
+## Mindset
 
-## Required Reading
+- **Start with `errors` + `compare`, not raw search** — `errors` aggregates by service/type instantly; `compare` tells you if the spike is new or chronic. Raw search before these wastes API quota and context.
+- **Trace ID is your thread through chaos** — when a user reports a broken transaction, get the trace ID first (`logs search` with `@http.url` or `@user.id`), then `logs trace` to reconstruct the full call chain across services. Don't search per-service manually.
+- **`patterns` before reading individual logs**: Datadog indexes thousands of variants of the same error. `logs patterns` normalizes UUIDs/IPs/numbers into templates; reading raw logs first buries the signal in noise.
+- **Dashboard updates are destructive** — the API replaces the entire dashboard JSON. A partial PUT deletes widgets you didn't include. Always `get` → modify → `update`.
+- **Rate limits are silent trip wires** — the Logs Search API is capped at 300 requests/minute per org. `logs multi` with many parallel queries can exhaust quota for the whole team. Use `--limit` conservatively.
 
-**You MUST read the relevant reference docs before using any command:**
-- [Log Commands](references/logs-commands.md)
-- [Metrics](references/metrics.md)
-- [Query Syntax](references/query-syntax.md)
-- [Workflows](references/workflows.md)
-- [Dashboards](references/dashboards.md)
+## Navigation
 
-## Setup
+**Use this skill when**: searching Datadog logs, querying metrics timeseries, tracing distributed requests by trace ID, detecting log patterns, comparing error rates between time windows, managing dashboards, or investigating a production incident end-to-end.
 
-### Environment Variables (Required)
+**Do NOT use this skill when**:
+- The user wants to configure Datadog monitors/alerts (requires Datadog API directly, not this CLI)
+- The user wants to query Synthetics, RUM, or Security signals (not supported by this CLI)
+- No `DD_API_KEY` + `DD_APP_KEY` are set — prompt for them first
 
-```bash
-export DD_API_KEY="your-api-key"
-export DD_APP_KEY="your-app-key"
-```
+**Ambiguous input decision tree**:
+- "Check errors" → `errors --from 1h` then `logs compare`
+- "Debug request X" → get trace ID → `logs trace`
+- "Is the spike new?" → `logs compare --period 1h`
+- "What's failing?" → `logs patterns --query "status:error"`
 
-Get keys from: https://app.datadoghq.com/organization-settings/api-keys
+## Philosophy
 
-### Running the CLI
+Observability triage is a narrowing funnel: start wide (service-level aggregations), identify the signal (patterns), then zoom in (individual logs + traces). Never start at the bottom of the funnel.
 
-```bash
-npx @leoflores/datadog-cli <command>
-```
+## NEVER
 
-For non-US Datadog sites, use `--site` flag:
-```bash
-npx @leoflores/datadog-cli logs search --query "*" --site datadoghq.eu
-```
+- NEVER run `logs search` with `--query "*"` and no `--limit`: the default 100-result limit caps the output, but the query still scans the full index; always scope by `service:` or `status:` to avoid hitting rate limits and returning noise.
+- NEVER run `dashboards update` without first running `dashboards get` — the update endpoint replaces the full dashboard JSON; any widget you omit is permanently deleted from the dashboard.
+- NEVER use `logs tail` in an automated/non-interactive context — it polls indefinitely and will block the agent; use `logs search` with a short `--from` window instead.
+- NEVER send `logs multi` with more than 5 parallel queries without checking org rate limits — each parallel query counts as a separate API call; 10+ queries can exhaust the 300 req/min org limit mid-investigation.
+- NEVER query metrics with `{*}` scope and a long window (e.g., `--from 7d`) — this returns all host timeseries; scope with `{service:X,env:prod}` or you'll get truncated/sampled data silently.
+- NEVER interpret a 0-count result as "no errors" — it may mean the query matched no indexed facets (typo in service name, wrong env tag). Verify with `services --from 1h` to confirm the service is emitting logs at all.
 
-## Commands Overview
+## When Things Go Wrong
 
-| Command | Description |
-|---------|-------------|
-| `logs search` | Search logs with filters |
-| `logs tail` | Stream logs in real-time |
-| `logs trace` | Find logs for a distributed trace |
-| `logs context` | Get logs before/after a timestamp |
-| `logs patterns` | Group similar log messages |
-| `logs compare` | Compare log counts between periods |
-| `logs multi` | Run multiple queries in parallel |
-| `logs agg` | Aggregate logs by facet |
-| `metrics query` | Query timeseries metrics |
-| `errors` | Quick error summary by service/type |
-| `services` | List services with log activity |
-| `dashboards` | Manage dashboards (CRUD) |
-| `dashboard-lists` | Manage dashboard lists |
-
-
-## Quick Examples
-
-### Search Errors
-```bash
-npx @leoflores/datadog-cli logs search --query "status:error" --from 1h --pretty
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| `401 Unauthorized` | `DD_API_KEY` or `DD_APP_KEY` not set / wrong org | `echo $DD_API_KEY` — if empty, export both keys; confirm keys are for the correct Datadog site |
+| `429 Too Many Requests` | Org rate limit hit (300 req/min for Logs) | Wait 60 seconds; reduce `logs multi` parallelism; add `--limit 50` to searches |
+| Empty results but no error | Wrong site (`datadoghq.com` vs `datadoghq.eu`) or service name typo | Run `services --from 24h` to confirm service exists; add `--site datadoghq.eu` if EU org |
+| `logs trace` returns nothing | Trace ID format mismatch: some services emit `@trace_id`, others `@dd.trace_id` | The CLI searches both fields; if results are still empty, extend the `--from` window, since the trace may have aged out of the index (default 15-day retention) |
+| Dashboard `update` wiped widgets | Sent partial JSON without fetching first | Restore from Datadog UI version history (Settings → Restore); next time always `get` → edit → `update` |
+| `logs patterns` shows only 1 pattern | Query too narrow or low log volume | Widen `--from` window or remove service filter temporarily |
 
-### Tail Logs (Real-time)
-```bash
-npx @leoflores/datadog-cli logs tail --query "service:api status:error" --pretty
-```
+## Setup
 
-### Error Summary
 ```bash
-npx @leoflores/datadog-cli errors --from 1h --pretty
+export DD_API_KEY="your-api-key"
+export DD_APP_KEY="your-app-key"
 ```
 
-### Trace Correlation
-```bash
-npx @leoflores/datadog-cli logs trace --id "abc123def456" --pretty
-```
+Keys: https://app.datadoghq.com/organization-settings/api-keys
 
-### Query Metrics
-```bash
-npx @leoflores/datadog-cli metrics query --query "avg:system.cpu.user{*}" --from 1h --pretty
-```
+For EU orgs, add `--site datadoghq.eu` to every command.
 
-### Compare Periods
 ```bash
-npx @leoflores/datadog-cli logs compare --query "status:error" --period 1h --pretty
+npx @leoflores/datadog-cli <command>
 ```
 
-## Global Flags
-
-| Flag | Description |
-|------|-------------|
-| `--pretty` | Human-readable output with colors |
-| `--output <file>` | Export results to JSON file |
-| `--site <site>` | Datadog site (e.g., `datadoghq.eu`) |
-
-## Time Formats
-
-- **Relative**: `30m`, `1h`, `6h`, `24h`, `7d`
-- **ISO 8601**: `2024-01-15T10:30:00Z`
-
-## Incident Triage Workflow
+## Incident Triage — Standard Sequence
 
 ```bash
-# 1. Quick error overview
+# 1. Aggregate: what services are erroring and how many?
 npx @leoflores/datadog-cli errors --from 1h --pretty
 
-# 2. Is this new? Compare to previous period
+# 2. Baseline: is this spike new or ongoing?
 npx @leoflores/datadog-cli logs compare --query "status:error" --period 1h --pretty
 
-# 3. Find error patterns
-npx @leoflores/datadog-cli logs patterns --query "status:error" --from 1h --pretty
+# 3. Pattern: what are the error templates? (normalize before reading raw)
+npx @leoflores/datadog-cli logs patterns --query "status:error service:api" --from 1h --pretty
 
-# 4. Narrow down by service
-npx @leoflores/datadog-cli logs search --query "status:error service:api" --from 1h --pretty
+# 4. Scope: narrow to the failing service
+npx @leoflores/datadog-cli logs search --query "status:error service:api" --from 1h --limit 50 --pretty
 
-# 5. Get context around a timestamp
-npx @leoflores/datadog-cli logs context --timestamp "2024-01-15T10:30:00Z" --service api --pretty
+# 5. Context: get logs around the failure timestamp
+npx @leoflores/datadog-cli logs context --timestamp "2024-01-15T10:30:00Z" --service api --before 5m --after 2m --pretty
 
-# 6. Follow the distributed trace
+# 6. Trace: reconstruct full distributed call chain
 npx @leoflores/datadog-cli logs trace --id "TRACE_ID" --pretty
 ```
 
-See [workflows.md](references/workflows.md) for more debugging workflows.
+## Commands Quick Reference
+
+| Command | When to use |
+|---------|-------------|
+| `errors` | First look — error counts by service/type |
+| `logs compare` | Determine if spike is new vs. baseline |
+| `logs patterns` | Normalize error variants before reading raw |
+| `logs search` | Targeted log retrieval (after narrowing) |
+| `logs trace` | Reconstruct distributed request from trace ID |
+| `logs context` | Get what happened just before/after a timestamp |
+| `logs agg` | Break down logs by facet (status, host, error.kind) |
+| `logs multi` | Parallel cross-service comparison (use sparingly) |
+| `logs tail` | Interactive only — stream live logs |
+| `metrics query` | Correlate log errors with CPU/latency/throughput |
+| `services` | Verify a service is emitting logs (sanity check) |
+| `dashboards` | CRUD — always `get` before `update` |
+
+## Reference Docs
+
+Load these only when needed for the specific task:
+
+- **[logs-commands.md](references/logs-commands.md)** — full flag reference for all `logs *` subcommands
+- **[query-syntax.md](references/query-syntax.md)** — operators, facet names, numeric comparisons, wildcard patterns
+- **[metrics.md](references/metrics.md)** — metrics query format, APM metrics, aggregation functions
+- **[workflows.md](references/workflows.md)** — multi-step workflows (real-time debug, service health, export)
+- **[dashboards.md](references/dashboards.md)** — full dashboard CRUD reference + safe update workflow
diff --git a/skills/dependency-updater/SKILL.md b/skills/dependency-updater/SKILL.md
index edede5b..786b7ae 100644
--- a/skills/dependency-updater/SKILL.md
+++ b/skills/dependency-updater/SKILL.md
@@ -1,491 +1,98 @@
 ---
 name: dependency-updater
-description: Smart dependency management for any language. Auto-detects project type, applies safe updates automatically, prompts for major versions, diagnoses and fixes dependency issues.
+description: Smart dependency management for any language. Auto-detects project type, applies safe updates automatically, prompts for major versions, diagnoses and fixes dependency issues. Trigger phrases: "update dependencies", "update deps", "outdated packages", "dependency audit", "fix dependency conflicts", "security audit packages", "why won't my packages install".
 license: MIT
 metadata:
-  version: 1.0.0
+  version: 2.0.0
 ---
 
 # Dependency Updater
 
-Smart dependency management for any language with automatic detection and safe updates.
+## Mindset
 
----
-
-## Quick Start
-
-```
-update my dependencies
-```
-
-The skill auto-detects your project type and handles the rest.
-
----
-
-## Triggers
-
-| Trigger | Example |
-|---------|---------|
-| Update dependencies | "update dependencies", "update deps" |
-| Check outdated | "check for outdated packages" |
-| Fix dependency issues | "fix my dependency problems" |
-| Security audit | "audit dependencies for vulnerabilities" |
-| Diagnose deps | "diagnose dependency issues" |
-
----
-
-## Supported Languages
-
-| Language | Package File | Update Tool | Audit Tool |
-|----------|--------------|-------------|------------|
-| **Node.js** | package.json | `taze` | `npm audit` |
-| **Python** | requirements.txt, pyproject.toml | `pip-review` | `safety`, `pip-audit` |
-| **Go** | go.mod | `go get -u` | `govulncheck` |
-| **Rust** | Cargo.toml | `cargo update` | `cargo audit` |
-| **Ruby** | Gemfile | `bundle update` | `bundle audit` |
-| **Java** | pom.xml, build.gradle | `mvn versions:*` | `mvn dependency:*` |
-| **.NET** | *.csproj | `dotnet outdated` | `dotnet list package --vulnerable` |
-
----
-
-## Quick Reference
+- **Pinned versions are load-bearing** — a fixed version without `^` or `~` is a *decision*, not an oversight. Someone got burned and locked it. Never "fix" it to a range without checking git blame.
+- **Lock files are the ground truth** — `package.json` is a *request*, `package-lock.json` is *what actually runs*. When they diverge (e.g., after a `git pull` that updated package.json but not the lock), `npm install` silently installs different code than production.
+- **`npm audit fix --force` is a footgun** — it resolves vulnerabilities by *downgrading packages or breaking semver constraints*, which can leave the project in an inconsistent state that CI can't reproduce. Practitioners use targeted upgrades, not force.
+- **Major version bumps require changelog archaeology** — tools can detect the version delta but not whether your usage of the old API was in the breaking-change surface. Always check the migration guide before approving a major.
+- **Monorepos compound the risk** — updating a shared package at the root can silently change behavior in workspaces that weren't tested. Run workspace-scoped installs after root changes.
 
-| Update Type | Version Change | Action |
-|-------------|----------------|--------|
-| **Fixed** | No `^` or `~` | Skip (intentionally pinned) |
-| **PATCH** | `x.y.z` → `x.y.Z` | Auto-apply |
-| **MINOR** | `x.y.z` → `x.Y.0` | Auto-apply |
-| **MAJOR** | `x.y.z` → `X.0.0` | Prompt user individually |
+## Navigation
 
----
+**Use this skill when**:
+- User asks to update, upgrade, or refresh dependencies/packages/deps
+- User reports install failures, peer-dependency warnings, or version conflicts
+- User asks for a security audit or vulnerability scan of packages
+- User asks what packages are outdated or stale
 
-## Workflow
+**Do NOT use this skill when**:
+- User wants to *add a new* package (that's a feature task, not an update)
+- User is working inside a container/locked environment with no write access to package files
+- The project uses Nix, Bazel, or vendored deps — standard update tools break these
 
+**Quick decision tree**:
 ```
-User Request
-    │
-    ▼
-┌─────────────────────────────────────────────────────┐
-│ Step 1: DETECT PROJECT TYPE                         │
-│ • Scan for package files (package.json, go.mod...) │
-│ • Identify package manager                          │
-├─────────────────────────────────────────────────────┤
-│ Step 2: CHECK PREREQUISITES                         │
-│ • Verify required tools are installed               │
-│ • Suggest installation if missing                   │
-├─────────────────────────────────────────────────────┤
-│ Step 3: SCAN FOR UPDATES                            │
-│ • Run language-specific outdated check              │
-│ • Categorize: MAJOR / MINOR / PATCH / Fixed         │
-├─────────────────────────────────────────────────────┤
-│ Step 4: AUTO-APPLY SAFE UPDATES                     │
-│ • Apply MINOR and PATCH automatically               │
-│ • Report what was updated                           │
-├─────────────────────────────────────────────────────┤
-│ Step 5: PROMPT FOR MAJOR UPDATES                    │
-│ • AskUserQuestion for each MAJOR update             │
-│ • Show current → new version                        │
-├─────────────────────────────────────────────────────┤
-│ Step 6: APPLY APPROVED MAJORS                       │
-│ • Update only approved packages                     │
-├─────────────────────────────────────────────────────┤
-│ Step 7: FINALIZE                                    │
-│ • Run install command                               │
-│ • Run security audit                                │
-└─────────────────────────────────────────────────────┘
+User request type?
+├── "update" / "outdated" → full update workflow (detect → scan → apply → audit)
+├── "security" / "vuln" / "audit" → audit-only workflow
+├── "broken" / "can't install" / "conflict" → diagnosis mode
+└── "specific package X" → targeted update, not bulk
 ```
 
----
-
-## Commands by Language
-
-### Node.js (npm/yarn/pnpm)
-
-```bash
-# Check prerequisites
-scripts/check-tool.sh taze "npm install -g taze"
-
-# Scan for updates
-taze
-
-# Apply minor/patch
-taze minor --write
-
-# Apply specific majors
-taze major --write --include pkg1,pkg2
+## Philosophy
 
-# Monorepo support
-taze -r  # recursive
-
-# Security
-npm audit
-npm audit fix
-```
-
-### Python
-
-```bash
-# Check outdated
-pip list --outdated
-
-# Update all (careful!)
-pip-review --auto
-
-# Update specific
-pip install --upgrade package-name
-
-# Security
-pip-audit
-safety check
-```
+Safety over convenience: auto-apply only what semver guarantees is backward-compatible, gate everything else behind explicit user approval, and never mutate pinned versions or lock files in ways the ecosystem's install command wouldn't.
 
-### Go
+## NEVER
 
-```bash
-# Check outdated
-go list -m -u all
+- **NEVER run `npm audit fix --force`** — it breaks semver constraints, can *downgrade* packages to older vulnerable versions, and can leave a lock file that no longer matches the ranges the project was tested against, which breaks reproducible installs. Use `npm audit fix` (no `--force`) or upgrade the specific package manually.
+- **NEVER run `pip-review --auto` without a virtualenv active** — it upgrades system Python packages, which breaks OS-level tools that depend on specific versions (particularly on Debian/Ubuntu). Always confirm a venv is active first.
+- **NEVER auto-apply MAJOR updates in bulk** — even if the user says "update everything", batch-approving majors makes it impossible to bisect which package broke the build. Present each major individually with current → new and a link to the changelog.
+- **NEVER delete and regenerate a lock file as the first fix** — `rm package-lock.json && npm install` wipes all transitive version pins, meaning dependencies-of-dependencies can jump to breaking versions. Use this only as a last resort after targeted fixes fail.
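+- **NEVER run `go get -u ./...` in a module with replace directives** — `-u` raises the required versions of direct and indirect dependencies, and a version-specific `replace` (one that names a version on its left-hand side) stops matching once the required version moves, silently bypassing the pinned fork/patch. Use `go get pkg@version` for each module individually.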
+- **NEVER skip `go mod tidy` after `go get` updates** — Go's toolchain may not error locally, but `go.mod` and `go.sum` can be left with missing or unused entries that fail builds in hermetic CI environments.
+- **NEVER treat `cargo update` as safe for workspace crates** — `cargo update` respects semver *ranges* in Cargo.toml but doesn't check that workspace member crates compile together. Always run `cargo check --workspace` after.
 
-# Update all
-go get -u ./...
+## When Things Go Wrong
 
-# Tidy up
-go mod tidy
-
-# Security
-govulncheck ./...
-```
-
-### Rust
-
-```bash
-# Check outdated
-cargo outdated
-
-# Update within semver
-cargo update
-
-# Security
-cargo audit
-```
-
-### Ruby
-
-```bash
-# Check outdated
-bundle outdated
-
-# Update all
-bundle update
-
-# Update specific
-bundle update --conservative gem-name
-
-# Security
-bundle audit
-```
-
-### Java (Maven)
-
-```bash
-# Check outdated
-mvn versions:display-dependency-updates
-
-# Update to latest
-mvn versions:use-latest-releases
-
-# Security
-mvn dependency:tree
-mvn dependency-check:check
-```
-
-### .NET
-
-```bash
-# Check outdated
-dotnet list package --outdated
-
-# Update specific
-dotnet add package PackageName
-
-# Security
-dotnet list package --vulnerable
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| `npm install` succeeds locally but CI fails | Lock file not committed, or committed with inconsistent line endings | Commit `package-lock.json` and normalize line endings with a `.gitattributes` rule such as `* text=auto eol=lf` |
+| `npm audit fix` creates new vulnerabilities | Downgraded a transitive dep to an older vulnerable version | `git checkout package-lock.json`, upgrade the *direct* dep that pulls in the vulnerable transitive |
+| Peer dependency warnings flood output but nothing breaks | npm v7+ installs peers automatically; warnings are noise if the app works | Check with `npm ls <peer>`; if only one version is installed, the warnings are safe to ignore |
+| `pip install -r requirements.txt` works but `pip check` fails | requirements.txt has incompatible upper bounds from different authors | Use `pip-compile` (pip-tools) to resolve a coherent set; add `pip check` to CI |
+| `bundle update` downgrades an unrelated gem | Bundler re-solves the whole graph; a newer gem narrowed a shared constraint | Use `bundle update --conservative gem-name` to update only the target gem's graph |
+| `go get -u` introduces a module that fails `go mod verify` | Checksum mismatch — module contents changed after publish (supply chain risk) | Do NOT ignore; report to the module maintainer; pin to last known-good SHA |
 
 ---
 
-## Diagnosis Mode
-
-When dependencies are broken, run diagnosis:
-
-### Common Issues & Fixes
-
-| Issue | Symptoms | Fix |
-|-------|----------|-----|
-| **Version Conflict** | "Cannot resolve dependency tree" | Clean install, use overrides/resolutions |
-| **Peer Dependency** | "Peer dependency not satisfied" | Install required peer version |
-| **Security Vuln** | `npm audit` shows issues | `npm audit fix` or manual update |
-| **Unused Deps** | Bloated bundle | Run `depcheck` (Node) or equivalent |
-| **Duplicate Deps** | Multiple versions installed | Run `npm dedupe` or equivalent |
-
-### Emergency Fixes
-
-```bash
-# Node.js - Nuclear reset
-rm -rf node_modules package-lock.json
-npm cache clean --force
-npm install
-
-# Python - Clean virtualenv
-rm -rf venv
-python -m venv venv
-source venv/bin/activate
-pip install -r requirements.txt
-
-# Go - Reset modules
-rm go.sum
-go mod tidy
-```
-
----
-
-## Security Audit
-
-Run security checks for any project:
-
-```bash
-# Node.js
-npm audit
-npm audit --json | jq '.metadata.vulnerabilities'
-
-# Python
-pip-audit
-safety check
-
-# Go
-govulncheck ./...
-
-# Rust
-cargo audit
-
-# Ruby
-bundle audit
-
-# .NET
-dotnet list package --vulnerable
-```
-
-### Severity Response
-
-| Severity | Action |
-|----------|--------|
-| **Critical** | Fix immediately |
-| **High** | Fix within 24h |
-| **Moderate** | Fix within 1 week |
-| **Low** | Fix in next release |
+## Workflow
 
----
+**Step 1 — Detect**: Scan for package files (`package.json`, `go.mod`, `Cargo.toml`, `requirements.txt`, `Gemfile`, `pom.xml`, `*.csproj`). Check for workspace/monorepo patterns. Identify the package manager (npm vs yarn vs pnpm matters for lock file format).
 
-## Anti-Patterns
+**Step 2 — Prerequisites**: Verify tooling. For Node.js, prefer `taze` over `ncu` (taze respects workspace protocols). For Python, confirm virtualenv is active before any `pip` mutation.
 
-| Avoid | Why | Instead |
-|-------|-----|---------|
-| Update fixed versions | Intentionally pinned | Skip them |
-| Auto-apply MAJOR | Breaking changes | Prompt user |
-| Batch MAJOR prompts | Loses context | Prompt individually |
-| Skip lock file | Irreproducible builds | Always commit lock files |
-| Ignore security alerts | Vulnerabilities | Address by severity |
+**Step 3 — Scan**: Run the ecosystem's outdated command. Categorize results:
+- Fixed (no range specifier) → **skip, note in report**
+- PATCH/MINOR within current range → **auto-apply**
+- MAJOR or outside current range → **queue for user approval**
 
----
+**Step 4 — Apply safe updates**: Apply PATCH + MINOR. For Node.js: `taze minor --write` then `npm install`. Run tests if available.
 
-## Verification Checklist
+**Step 5 — Gate majors**: For each MAJOR update, present: package name, current version, target version, and changelog URL. Ask individually. Apply only approved ones.
 
-After updates:
+**Step 6 — Audit**: Run the ecosystem's security scanner. Report findings by severity. Do NOT auto-fix — present the vulnerable package, the fix version, and whether it's a breaking change.
 
-- [ ] Updates scanned without errors
-- [ ] MINOR/PATCH auto-applied
-- [ ] MAJOR updates prompted individually
-- [ ] Fixed versions untouched
-- [ ] Lock file updated
-- [ ] Install command ran
-- [ ] Security audit passed (or issues noted)
+**Step 7 — Report**: Summary of what changed, what was skipped (pinned), what needs manual attention (majors declined, unfixed vulns).
 
 ---
 
-<details>
-<summary><strong>Deep Dive: Project Detection</strong></summary>
-
-The skill auto-detects project type by scanning for package files:
-
-| File Found | Language | Package Manager |
-|------------|----------|-----------------|
-| `package.json` | Node.js | npm/yarn/pnpm |
-| `requirements.txt` | Python | pip |
-| `pyproject.toml` | Python | pip/poetry |
-| `Pipfile` | Python | pipenv |
-| `go.mod` | Go | go modules |
-| `Cargo.toml` | Rust | cargo |
-| `Gemfile` | Ruby | bundler |
-| `pom.xml` | Java | Maven |
-| `build.gradle` | Java/Kotlin | Gradle |
-| `*.csproj` | .NET | dotnet |
-
-**Detection order matters for monorepos:**
-1. Check current directory first
-2. Then check for workspace/monorepo patterns
-3. Offer to run recursively if applicable
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Node.js with taze</strong></summary>
-
-### Prerequisites
-
-```bash
-# Install taze globally (recommended)
-npm install -g taze
-
-# Or use npx
-npx taze
-```
-
-### Smart Update Flow
-
-```bash
-# 1. Scan all updates
-taze
-
-# 2. Apply safe updates (minor + patch)
-taze minor --write
-
-# 3. For each major, prompt user:
-#    "Update @types/node from ^20.0.0 to ^22.0.0?"
-#    If yes, add to approved list
-
-# 4. Apply approved majors
-taze major --write --include approved-pkg1,approved-pkg2
-
-# 5. Install
-npm install  # or pnpm install / yarn
-```
-
-### Auto-Approve List
-
-Some packages have frequent major bumps but are backward-compatible:
-
-| Package | Reason |
-|---------|--------|
-| `lucide-react` | Icon library, majors are additive |
-| `@types/*` | Type definitions, usually safe |
+## Commands Reference
 
-</details>
+See [`references/commands-by-language.md`](references/commands-by-language.md) for the full command reference per ecosystem.
 
-<details>
-<summary><strong>Deep Dive: Version Strategies</strong></summary>
-
-### Semantic Versioning
-
-```
-MAJOR.MINOR.PATCH (e.g., 2.3.1)
-
-MAJOR: Breaking changes - requires code changes
-MINOR: New features - backward compatible
-PATCH: Bug fixes - backward compatible
-```
-
-### Range Specifiers
-
-| Specifier | Meaning | Example |
-|-----------|---------|---------|
-| `^1.2.3` | Minor + Patch OK | `>=1.2.3 <2.0.0` |
-| `~1.2.3` | Patch only | `>=1.2.3 <1.3.0` |
-| `1.2.3` | Exact (fixed) | Only `1.2.3` |
-| `>=1.2.3` | At least | Any `>=1.2.3` |
-| `*` | Any | Latest (dangerous) |
-
-### Recommended Strategy
-
-```json
-{
-  "dependencies": {
-    "critical-lib": "1.2.3",      // Exact for critical
-    "stable-lib": "~1.2.3",       // Patch only for stable
-    "modern-lib": "^1.2.3"        // Minor OK for active
-  }
-}
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Conflict Resolution</strong></summary>
-
-### Node.js Conflicts
-
-**Diagnosis:**
-```bash
-npm ls package-name      # See dependency tree
-npm explain package-name # Why installed
-yarn why package-name    # Yarn equivalent
-```
-
-**Resolution with overrides:**
-```json
-// package.json
-{
-  "overrides": {
-    "lodash": "^4.18.0"
-  }
-}
-```
-
-**Resolution with resolutions (Yarn):**
-```json
-{
-  "resolutions": {
-    "lodash": "^4.18.0"
-  }
-}
-```
-
-### Python Conflicts
-
-**Diagnosis:**
-```bash
-pip check
-pipdeptree -p package-name
-```
-
-**Resolution:**
-```bash
-# Use virtual environment
-python -m venv venv
-source venv/bin/activate
-pip install -r requirements.txt
-
-# Or use constraints
-pip install -c constraints.txt -r requirements.txt
-```
-
-</details>
-
----
-
-## Script Reference
+## Scripts
 
 | Script | Purpose |
 |--------|---------|
-| `scripts/check-tool.sh` | Verify tool is installed |
-| `scripts/run-taze.sh` | Run taze with proper flags |
-
----
-
-## Related Tools
-
-| Tool | Language | Purpose |
-|------|----------|---------|
-| [taze](https://github.com/antfu-collective/taze) | Node.js | Smart dependency updates |
-| [npm-check-updates](https://github.com/raineorshine/npm-check-updates) | Node.js | Alternative to taze |
-| [pip-review](https://github.com/jgonggrijp/pip-review) | Python | Interactive pip updates |
-| [cargo-edit](https://github.com/killercup/cargo-edit) | Rust | Cargo dependency management |
-| [bundler-audit](https://github.com/rubysec/bundler-audit) | Ruby | Security auditing |
+| `scripts/check-tool.sh` | Verify a tool is installed, print install hint if missing |
+| `scripts/run-taze.sh` | Run taze with safe flags (minor mode, workspace-aware) |
diff --git a/skills/dependency-updater/references/commands-by-language.md b/skills/dependency-updater/references/commands-by-language.md
new file mode 100644
index 0000000..0f38ef7
--- /dev/null
+++ b/skills/dependency-updater/references/commands-by-language.md
@@ -0,0 +1,194 @@
+# Commands by Language
+
+Reference for the dependency-updater skill. Each section covers scan, update, install, and audit commands.
+
+---
+
+## Node.js (npm / yarn / pnpm)
+
+```bash
+# Preferred: taze (respects workspace:* protocols, caret/tilde ranges)
+taze                                      # scan all
+taze minor --write                        # apply minor+patch
+taze major --write --include pkg1,pkg2    # apply approved majors only
+taze -r                                   # recursive (monorepo)
+
+# Fallback: npm
+npm outdated
+npm update                                # minor/patch within ranges only
+npm install pkg@latest                    # specific package to latest
+
+# Install after changes
+npm install          # npm
+pnpm install         # pnpm
+yarn                 # yarn
+
+# Security
+npm audit                                 # show vulnerabilities
+npm audit fix                             # fix without breaking semver (safe)
+# DO NOT: npm audit fix --force           # breaks constraints
+
+# Conflict diagnosis
+npm ls pkg-name           # show why a version was installed
+npm explain pkg-name      # dependency path explanation
+npm dedupe                # collapse duplicate transitive versions
+```
+
+**Lock file formats**: `package-lock.json` (npm), `yarn.lock` (yarn), `pnpm-lock.yaml` (pnpm). Never commit more than one.
+
+---
+
+## Python
+
+```bash
+# Scan
+pip list --outdated
+pip-review                           # interactive review
+
+# Update
+pip install --upgrade pkg-name       # single package
+# WARNING: only inside active virtualenv:
+pip-review --auto                    # all packages
+
+# Verify consistency
+pip check                            # detect incompatible installed packages
+
+# Security
+pip-audit                            # CVE scan via PyPI advisory db
+safety check                         # alternative (requires API key for full data; newer Safety releases use `safety scan`)
+
+# Conflict resolution
+pipdeptree -p pkg-name               # why a package is at its current version
+pip-compile requirements.in          # regenerate requirements.txt with full resolution
+```
+
+**Virtualenv check before any mutation**: `python -c "import sys; assert sys.prefix != sys.base_prefix, 'Not in venv'"`.
+
+---
+
+## Go
+
+```bash
+# Scan
+go list -m -u all                    # show available updates for all modules
+
+# Update (targeted — safer than bulk -u)
+go get pkg@latest                    # single module
+go get pkg@v1.2.3                    # pin to specific version
+
+# CAUTION with bulk update:
+# go get -u ./...                    # can move required versions past version-specific replace directives
+
+# Tidy (always run after any go get)
+go mod tidy                          # remove unused, add missing, update go.sum
+
+# Verify (supply chain check)
+go mod verify                        # verify go.sum hashes match downloaded content
+
+# Security
+govulncheck ./...                    # Go vulnerability database scan
+```
+
+---
+
+## Rust
+
+```bash
+# Scan
+cargo outdated                       # requires cargo-outdated: cargo install cargo-outdated
+
+# Update (within Cargo.toml semver ranges)
+cargo update                         # all crates within ranges
+cargo update -p crate-name           # single crate
+
+# After update, always verify workspace compiles
+cargo check --workspace
+
+# Security
+cargo audit                          # requires cargo-audit: cargo install cargo-audit
+```
+
+---
+
+## Ruby
+
+```bash
+# Scan
+bundle outdated
+
+# Update (conservative = only updates the named gem's subgraph)
+bundle update --conservative gem-name
+bundle update                        # full re-solve (use carefully; it can downgrade unrelated gems)
+
+# Install
+bundle install
+
+# Security
+bundle audit update                  # refresh the advisory database first (requires: gem install bundler-audit)
+bundle audit                         # then scan Gemfile.lock against the refreshed database
+```
+
+---
+
+## Java (Maven)
+
+```bash
+# Scan
+mvn versions:display-dependency-updates
+mvn versions:display-plugin-updates
+
+# Update
+mvn versions:use-latest-releases     # updates to latest release (skips snapshots)
+mvn versions:use-latest-versions     # latest version; snapshots only with -DallowSnapshots=true
+
+# Security
+mvn dependency:tree                  # review full tree
+mvn dependency-check:check           # OWASP dependency check plugin
+```
+
+---
+
+## .NET
+
+```bash
+# Scan
+dotnet list package --outdated
+dotnet list package --vulnerable     # security scan
+
+# Update (no bulk command — must update per package)
+dotnet add package PackageName               # latest stable
+dotnet add package PackageName --version X.Y.Z  # specific version
+
+# Tool: dotnet-outdated (third-party, recommended)
+dotnet tool install -g dotnet-outdated-tool
+dotnet outdated                              # then apply interactively
+```
+
+---
+
+## Emergency Resets (last resort only)
+
+### Node.js
+```bash
+rm -rf node_modules package-lock.json
+npm cache clean --force
+npm install
+```
+Risk: all transitive versions re-resolved; may introduce new breakage.
+
+### Python
+```bash
+deactivate
+rm -rf venv
+python -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+pip check
+```
+
+### Go
+```bash
+rm go.sum
+go mod tidy      # regenerates go.sum from module cache or network
+go mod verify    # confirm integrity
+```
diff --git a/skills/design-system-starter/SKILL.md b/skills/design-system-starter/SKILL.md
index 4d37e9b..d5f71e8 100644
--- a/skills/design-system-starter/SKILL.md
+++ b/skills/design-system-starter/SKILL.md
@@ -1,603 +1,171 @@
 ---
 name: design-system-starter
-description: Create and evolve design systems with design tokens, component architecture, accessibility guidelines, and documentation templates. Ensures consistent, scalable, and accessible UI across products.
+description: Create and evolve design systems — design tokens (W3C/Style Dictionary format), atomic component architecture, theming, dark mode, WCAG 2.1 accessibility, and documentation scaffolding. Trigger phrases: "design system", "design tokens", "component library", "atomic design", "token architecture", "WCAG compliance", "dark mode theming".
 license: MIT
 metadata:
-  version: 1.0.0
+  version: 2.0.0
   tags: [design-system, ui, components, design-tokens, accessibility, frontend]
 ---
 
 # Design System Starter
 
-Build robust, scalable design systems that ensure visual consistency and exceptional user experiences.
-
 ---
 
-## Quick Start
+## Mindset
 
-Just describe what you need:
+1. **Tokens are contracts, not values.** Primitive tokens own the raw hex/rem; semantic tokens reference primitives by alias. Changing a primitive propagates everywhere — but only if consumers use semantic tokens, never primitives directly in components.
 
-```
-Create a design system for my React app with dark mode support
-```
+2. **The atomic hierarchy is a dependency graph, not a naming convention.** Atoms have zero component dependencies. Molecules depend only on atoms. Breaking this means any atom refactor cascades unpredictably into molecules and organisms. Enforce it structurally, not just by folder name.
 
-That's it. The skill provides tokens, components, and accessibility guidelines.
-
----
+3. **Accessibility debt compounds faster than tech debt.** Retrofitting focus traps, ARIA live regions, and color contrast into an established library typically costs several times more than building them in from the start. Build accessible primitives first; accessibility is cheaper at the atom level than at the organism level.
 
-## Triggers
+4. **Dark mode is a theming problem, not a CSS problem.** If dark mode is solved with `dark:` class overrides scattered across components rather than semantic token swaps, every future theme (high-contrast, branded) will require the same scattered changes. Swapping semantic tokens at the theme boundary is the only approach that scales.
 
-| Trigger | Example |
-|---------|---------|
-| Create design system | "Create a design system for my app" |
-| Design tokens | "Set up design tokens for colors and spacing" |
-| Component architecture | "Design component structure using atomic design" |
-| Accessibility | "Ensure WCAG 2.1 compliance for my components" |
-| Dark mode | "Implement theming with dark mode support" |
+5. **The system's API surface is harder to change than its internals.** Prop names, slot patterns, and export shapes become load-bearing the moment a second team adopts the library. Treat the component API as a public contract from day one.
 
 ---
 
-## Quick Reference
+## Navigation
 
-| Task | Output |
-|------|--------|
-| Design tokens | Color, typography, spacing, shadows JSON |
-| Component structure | Atomic design hierarchy (atoms, molecules, organisms) |
-| Theming | CSS variables or ThemeProvider setup |
-| Accessibility | WCAG 2.1 AA compliant patterns |
-| Documentation | Component docs with props, examples, a11y notes |
+**Use this skill when**:
+- Starting a design token architecture from scratch or auditing an existing one
+- Scaffolding component libraries with React/Vue/Svelte + TypeScript
+- Implementing multi-theme support (dark mode, branded variants, high-contrast)
+- Establishing WCAG 2.1 AA compliance patterns for an existing or new library
+- Producing Style Dictionary / W3C DTCG token JSON structures
+- Advising on atomic design component decomposition
 
----
+**Do NOT use this skill when**:
+- The user wants a single one-off component (use a generic React/CSS skill instead)
+- The project already has a mature design system (Material UI, Chakra, Radix) and just needs customization — advise extending the existing system rather than creating a parallel one
+- The task is purely visual design (Figma layout, brand identity) with no implementation component
 
-## Bundled Resources
+If the component library is built on Material UI, also load the `mui` skill for MUI-specific token consumption via the `sx` shorthand and `slotProps` customization.
 
-- `references/component-examples.md` - Complete component implementations
-- `templates/design-tokens-template.json` - W3C design token format
-- `templates/component-template.tsx` - React component template
-- `checklists/design-system-checklist.md` - Design system audit checklist
+**Quick decision tree**:
+- Has an existing component library? → Extend via tokens/theming, don't rebuild atoms
+- Greenfield project? → Start with token tiers, then atoms, then documentation tooling
+- Existing codebase with inconsistent styles? → Audit first (`checklists/design-system-checklist.md`), then extract tokens from existing values
 
 ---
 
-## Design System Philosophy
-
-### What is a Design System?
-
-A design system is more than a component library—it's a collection of:
-
-1. **Design Tokens**: Foundational design decisions (colors, spacing, typography)
-2. **Components**: Reusable UI building blocks
-3. **Patterns**: Common UX solutions and compositions
-4. **Guidelines**: Rules, principles, and best practices
-5. **Documentation**: How to use everything effectively
+## Philosophy
 
-### Core Principles
-
-**1. Consistency Over Creativity**
-- Predictable patterns reduce cognitive load
-- Users learn once, apply everywhere
-- Designers and developers speak the same language
-
-**2. Accessible by Default**
-- WCAG 2.1 Level AA compliance minimum
-- Keyboard navigation built-in
-- Screen reader support from the start
-
-**3. Scalable and Maintainable**
-- Design tokens enable global changes
-- Component composition reduces duplication
-- Versioning and deprecation strategies
-
-**4. Developer-Friendly**
-- Clear API contracts
-- Comprehensive documentation
-- Easy to integrate and customize
+A design system is a living contract between design and engineering — not a component dump. Every decision at the token and API layer is a long-term commitment. Optimize for the maintenance burden of the second year, not the delivery speed of the first sprint.
 
 ---
 
-## Design Tokens
-
-Design tokens are the atomic design decisions that define your system's visual language.
-
-### Token Categories
-
-#### 1. Color Tokens
-
-**Primitive Colors** (Raw values):
-```json
-{
-  "color": {
-    "primitive": {
-      "blue": {
-        "50": "#eff6ff",
-        "100": "#dbeafe",
-        "200": "#bfdbfe",
-        "300": "#93c5fd",
-        "400": "#60a5fa",
-        "500": "#3b82f6",
-        "600": "#2563eb",
-        "700": "#1d4ed8",
-        "800": "#1e40af",
-        "900": "#1e3a8a",
-        "950": "#172554"
-      }
-    }
-  }
-}
-```
+## NEVER
 
-**Semantic Colors** (Contextual meaning):
-```json
-{
-  "color": {
-    "semantic": {
-      "brand": {
-        "primary": "{color.primitive.blue.600}",
-        "primary-hover": "{color.primitive.blue.700}",
-        "primary-active": "{color.primitive.blue.800}"
-      },
-      "text": {
-        "primary": "{color.primitive.gray.900}",
-        "secondary": "{color.primitive.gray.600}",
-        "tertiary": "{color.primitive.gray.500}",
-        "disabled": "{color.primitive.gray.400}",
-        "inverse": "{color.primitive.white}"
-      },
-      "background": {
-        "primary": "{color.primitive.white}",
-        "secondary": "{color.primitive.gray.50}",
-        "tertiary": "{color.primitive.gray.100}"
-      },
-      "feedback": {
-        "success": "{color.primitive.green.600}",
-        "warning": "{color.primitive.yellow.600}",
-        "error": "{color.primitive.red.600}",
-        "info": "{color.primitive.blue.600}"
-      }
-    }
-  }
-}
-```
+- **NEVER let components consume primitive tokens directly** — because when `blue-600` becomes `brand-600` in a rebrand, every component needs a manual update instead of one token swap. Components must only reference semantic tokens (`color.text.primary`, `color.brand.interactive`).
 
-**Accessibility**: Ensure color contrast ratios meet WCAG 2.1 Level AA:
-- Normal text: 4.5:1 minimum
-- Large text (18pt+ or 14pt+ bold): 3:1 minimum
-- UI components and graphics: 3:1 minimum
-
-#### 2. Typography Tokens
-
-```json
-{
-  "typography": {
-    "fontFamily": {
-      "sans": "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif",
-      "serif": "'Georgia', 'Times New Roman', serif",
-      "mono": "'Fira Code', 'Courier New', monospace"
-    },
-    "fontSize": {
-      "xs": "0.75rem",     // 12px
-      "sm": "0.875rem",    // 14px
-      "base": "1rem",      // 16px
-      "lg": "1.125rem",    // 18px
-      "xl": "1.25rem",     // 20px
-      "2xl": "1.5rem",     // 24px
-      "3xl": "1.875rem",   // 30px
-      "4xl": "2.25rem",    // 36px
-      "5xl": "3rem"        // 48px
-    },
-    "fontWeight": {
-      "normal": 400,
-      "medium": 500,
-      "semibold": 600,
-      "bold": 700
-    },
-    "lineHeight": {
-      "tight": 1.25,
-      "normal": 1.5,
-      "relaxed": 1.75,
-      "loose": 2
-    },
-    "letterSpacing": {
-      "tight": "-0.025em",
-      "normal": "0",
-      "wide": "0.025em"
-    }
-  }
-}
-```
+- **NEVER use `rgba()` hardcodes in component styles** — because they don't participate in the token system, break dark mode, and are invisible to Style Dictionary transforms. All color values must trace back to a token reference.
 
-#### 3. Spacing Tokens
-
-**Scale**: Use a consistent spacing scale (commonly 4px or 8px base)
-
-```json
-{
-  "spacing": {
-    "0": "0",
-    "1": "0.25rem",   // 4px
-    "2": "0.5rem",    // 8px
-    "3": "0.75rem",   // 12px
-    "4": "1rem",      // 16px
-    "5": "1.25rem",   // 20px
-    "6": "1.5rem",    // 24px
-    "8": "2rem",      // 32px
-    "10": "2.5rem",   // 40px
-    "12": "3rem",     // 48px
-    "16": "4rem",     // 64px
-    "20": "5rem",     // 80px
-    "24": "6rem"      // 96px
-  }
-}
-```
+- **NEVER create circular token references** (`A → B → A`) — Style Dictionary must resolve every alias to a concrete value, and a cycle can never resolve, so the build fails or emits `undefined` output values that may only surface at runtime. Always audit with `style-dictionary build --verbose` before shipping token changes.
 
-**Component-Specific Spacing**:
-```json
-{
-  "component": {
-    "button": {
-      "padding-x": "{spacing.4}",
-      "padding-y": "{spacing.2}",
-      "gap": "{spacing.2}"
-    },
-    "card": {
-      "padding": "{spacing.6}",
-      "gap": "{spacing.4}"
-    }
-  }
-}
-```
+- **NEVER solve dark mode with duplicated class overrides** (e.g., Tailwind `dark:bg-gray-900` on every element) — it couples theme logic to every component, so adding a third theme (high-contrast, branded) requires a full codebase scan. Use a semantic token swap at the `:root`/`[data-theme]` boundary instead.
 
-#### 4. Border Radius Tokens
-
-```json
-{
-  "borderRadius": {
-    "none": "0",
-    "sm": "0.125rem",   // 2px
-    "base": "0.25rem",  // 4px
-    "md": "0.375rem",   // 6px
-    "lg": "0.5rem",     // 8px
-    "xl": "0.75rem",    // 12px
-    "2xl": "1rem",      // 16px
-    "full": "9999px"
-  }
-}
-```
+- **NEVER ship components with CSS-in-JS runtime cost in a performance-critical library** — libraries like `emotion`/`styled-components` with dynamic interpolations re-compute styles per render. For shared libraries, prefer CSS variables (zero runtime) or build-time CSS Modules. Runtime CSS-in-JS is acceptable in application code, not in a design system consumed by many teams.
 
-#### 5. Shadow Tokens
-
-```json
-{
-  "shadow": {
-    "xs": "0 1px 2px 0 rgba(0, 0, 0, 0.05)",
-    "sm": "0 1px 3px 0 rgba(0, 0, 0, 0.1), 0 1px 2px -1px rgba(0, 0, 0, 0.1)",
-    "base": "0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -2px rgba(0, 0, 0, 0.1)",
-    "md": "0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -4px rgba(0, 0, 0, 0.1)",
-    "lg": "0 20px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.1)",
-    "xl": "0 25px 50px -12px rgba(0, 0, 0, 0.25)"
-  }
-}
-```
+- **NEVER namespace tokens with the framework name** (e.g., `react-button-primary`) — tokens must be framework-agnostic because the same token set may be consumed by React, native mobile, and email templates simultaneously. Token names describe *intent*, not implementation.
+
+- **NEVER skip the `composite` token tier for complex multi-property values** — shadows, typography styles, and border shorthand are composite tokens. Treating them as loose scalar groups makes it impossible for Style Dictionary to output platform-correct formats (Android expresses shadows mainly through `elevation`; iOS sets separate `CALayer` shadow properties for color, offset, radius, and opacity).
 
 ---
 
-## Component Architecture
+## When Things Go Wrong
 
-### Atomic Design Methodology
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Style Dictionary build produces `undefined` values | Circular token reference or misspelled alias path | Run `style-dictionary build --verbose`; audit every `{token.path.here}` alias in the reference chain for typos and cycles |
+| Dark mode flashes white on load (FOUC) | Theme attribute applied only after hydration (SSR/SSG mismatch) | Set the theme attribute server-side, or put an inline blocking `<script>` in `<head>` that reads `localStorage` and sets `data-theme` on `<html>` synchronously before first paint |
+| Semantic tokens not resolving in consuming app | Consumer imports primitive token file directly, not the semantic layer | Enforce barrel exports that only expose semantic tokens; add lint rule `no-restricted-imports` for primitive token paths |
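+| Component a11y audit fails WCAG AA | Color chosen from primitive scale without contrast check | Run a contrast ratio check between semantic `text` and `background` pairs in every theme; automate it in CI with axe-core in a real browser (e.g., `@axe-core/playwright`), because `jest-axe` runs in JSDOM, where color-contrast checks do not work |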
+| Token JSON grows unwieldy (500+ tokens) | Skipped the tier separation (primitives / semantics / component-level) | Refactor into three files: `tokens/primitive.json`, `tokens/semantic.json`, `tokens/component.json`; Style Dictionary supports multi-source |
+| Two teams define conflicting component APIs | No API review process before merge | Establish a Design System RFC template; any new prop or slot addition requires a signed-off proposal before implementation |
 
-**Atoms** → **Molecules** → **Organisms** → **Templates** → **Pages**
+---
 
-#### Atoms (Primitive Components)
-Basic building blocks that can't be broken down further.
+## Token Architecture
 
-**Examples:**
-- Button
-- Input
-- Label
-- Icon
-- Badge
-- Avatar
+Three tiers — always three, never collapse them:
 
-**Button Component:**
-```typescript
-interface ButtonProps {
-  variant?: 'primary' | 'secondary' | 'outline' | 'ghost';
-  size?: 'sm' | 'md' | 'lg';
-  disabled?: boolean;
-  loading?: boolean;
-  icon?: React.ReactNode;
-  children: React.ReactNode;
-}
-```
+1. **Primitive** — raw values, no aliases (`color.blue.600 = #2563eb`)
+2. **Semantic** — intent aliases referencing primitives (`color.brand.interactive = {color.blue.600}`)
+3. **Component** — component-scoped overrides referencing semantic (`component.button.background = {color.brand.interactive}`)
 
-See `references/component-examples.md` for complete Button implementation with variants, sizes, and styling patterns.
-
-#### Molecules (Simple Compositions)
-Groups of atoms that function together.
-
-**Examples:**
-- SearchBar (Input + Button)
-- FormField (Label + Input + ErrorMessage)
-- Card (Container + Title + Content + Actions)
-
-**FormField Molecule:**
-```typescript
-interface FormFieldProps {
-  label: string;
-  name: string;
-  error?: string;
-  hint?: string;
-  required?: boolean;
-  children: React.ReactNode;
-}
+Style Dictionary config pattern:
+```js
+// style-dictionary.config.js
+module.exports = {
+  source: ['tokens/primitive.json', 'tokens/semantic.json', 'tokens/component.json'],
+  platforms: {
+    css: { transformGroup: 'css', buildPath: 'dist/', files: [{ destination: 'tokens.css', format: 'css/variables' }] },
+    js:  { transformGroup: 'js',  buildPath: 'dist/', files: [{ destination: 'tokens.js',  format: 'javascript/es6' }] }
+  }
+};
 ```
 
-See `references/component-examples.md` for FormField, Card (compound component pattern), Input with variants, Modal, and more composition examples.
-
-#### Organisms (Complex Compositions)
-Complex UI components made of molecules and atoms.
-
-**Examples:**
-- Navigation Bar
-- Product Card Grid
-- User Profile Section
-- Modal Dialog
-
-#### Templates (Page Layouts)
-Page-level structures that define content placement.
-
-**Examples:**
-- Dashboard Layout (Sidebar + Header + Main Content)
-- Marketing Page Layout (Hero + Features + Footer)
-- Settings Page Layout (Tabs + Content Panels)
-
-#### Pages (Specific Instances)
-Actual pages with real content.
+Full token JSON templates: `templates/design-tokens-template.json`
 
 ---
 
-## Component API Design
+## Component API Checklist
 
-### Props Best Practices
+Before shipping any new component:
 
-**1. Predictable Prop Names**
-```typescript
-// ✅ Good: Consistent naming
-<Button variant="primary" size="md" />
-<Input variant="outlined" size="md" />
+- [ ] Props use consistent naming (`variant`, `size`, `isDisabled` — not mixed conventions)
+- [ ] All interactive states covered: default, hover, focus-visible, active, disabled, loading
+- [ ] Polymorphic render supported via `as` prop or `asChild` (Radix pattern) where applicable
+- [ ] Compound component pattern used for complex compositions (`Card.Header`, `Card.Body`)
+- [ ] Every interactive element reachable by keyboard; focus ring visible at 3:1+ contrast
+- [ ] ARIA attributes documented; screen reader behavior tested with VoiceOver + NVDA
+- [ ] No inline `rgba()` or hardcoded hex — all colors via CSS variables/tokens
 
-// ❌ Bad: Inconsistent
-<Button type="primary" sizeMode="md" />
-<Input style="outlined" inputSize="md" />
-```
-
-**2. Sensible Defaults**
-```typescript
-// ✅ Good: Provides defaults
-interface ButtonProps {
-  variant?: 'primary' | 'secondary';  // Default: primary
-  size?: 'sm' | 'md' | 'lg';          // Default: md
-}
-
-// ❌ Bad: Everything required
-interface ButtonProps {
-  variant: 'primary' | 'secondary';
-  size: 'sm' | 'md' | 'lg';
-  color: string;
-  padding: string;
-}
-```
-
-**3. Composition Over Configuration**
-```typescript
-// ✅ Good: Composable
-<Card>
-  <Card.Header>
-    <Card.Title>Title</Card.Title>
-  </Card.Header>
-  <Card.Body>Content</Card.Body>
-  <Card.Footer>Actions</Card.Footer>
-</Card>
-
-// ❌ Bad: Too many props
-<Card
-  title="Title"
-  content="Content"
-  footerContent="Actions"
-  hasHeader={true}
-  hasFooter={true}
-/>
-```
-
-**4. Polymorphic Components**
-Allow components to render as different HTML elements:
-```typescript
-<Button as="a" href="/login">Login</Button>
-<Button as="button" onClick={handleClick}>Click Me</Button>
-```
-
-See `references/component-examples.md` for complete polymorphic component TypeScript patterns.
+Full component template: `templates/component-template.tsx`
+Full accessibility and composition examples: `references/component-examples.md`
+Full audit checklist: `checklists/design-system-checklist.md`
 
 ---
 
-## Theming and Dark Mode
-
-### Theme Structure
-
-```typescript
-interface Theme {
-  colors: {
-    brand: {
-      primary: string;
-      secondary: string;
-    };
-    text: {
-      primary: string;
-      secondary: string;
-    };
-    background: {
-      primary: string;
-      secondary: string;
-    };
-    feedback: {
-      success: string;
-      warning: string;
-      error: string;
-      info: string;
-    };
-  };
-  typography: {
-    fontFamily: {
-      sans: string;
-      mono: string;
-    };
-    fontSize: Record<string, string>;
-  };
-  spacing: Record<string, string>;
-  borderRadius: Record<string, string>;
-  shadow: Record<string, string>;
-}
-```
-
-### Dark Mode Implementation
+## Theming Pattern (Dark Mode Done Right)
 
-**Approach 1: CSS Variables**
 ```css
+/* tokens.css — generated by Style Dictionary */
 :root {
   --color-bg-primary: #ffffff;
-  --color-text-primary: #000000;
+  --color-text-primary: #111827;
+  --color-brand-interactive: #2563eb;
 }
 
 [data-theme="dark"] {
-  --color-bg-primary: #1a1a1a;
-  --color-text-primary: #ffffff;
+  --color-bg-primary: #111827;
+  --color-text-primary: #f9fafb;
+  --color-brand-interactive: #60a5fa;
 }
 ```
 
-**Approach 2: Tailwind CSS Dark Mode**
-```tsx
-<div className="bg-white dark:bg-gray-900 text-gray-900 dark:text-white">
-  Content
-</div>
-```
-
-**Approach 3: Styled Components ThemeProvider**
-```typescript
-const lightTheme = { background: '#fff', text: '#000' };
-const darkTheme = { background: '#000', text: '#fff' };
-
-<ThemeProvider theme={isDark ? darkTheme : lightTheme}>
-  <App />
-</ThemeProvider>
-```
+Components reference only `var(--color-*)` — never a raw hex. Theme switches by toggling `data-theme` on `<html>`. No component file changes required for any theme addition.
 
 ---
 
-## Accessibility Guidelines
-
-### WCAG 2.1 Level AA Compliance
-
-#### Color Contrast
-- **Normal text** (< 18pt): 4.5:1 minimum
-- **Large text** (≥ 18pt or ≥ 14pt bold): 3:1 minimum
-- **UI components**: 3:1 minimum
-
-**Tools**: Use contrast checkers like [WebAIM Contrast Checker](https://webaim.org/resources/contrastchecker/)
-
-#### Keyboard Navigation
-```typescript
-// ✅ All interactive elements must be keyboard accessible
-<button
-  onClick={handleClick}
-  onKeyDown={(e) => e.key === 'Enter' && handleClick()}
->
-  Click me
-</button>
-
-// ✅ Focus management
-<Modal>
-  <FocusTrap>
-    {/* Modal content */}
-  </FocusTrap>
-</Modal>
-```
-
-#### ARIA Attributes
-Essential ARIA patterns:
-- `aria-label`: Provide accessible names
-- `aria-expanded`: Communicate expanded/collapsed state
-- `aria-controls`: Associate controls with content
-- `aria-live`: Announce dynamic content changes
+## Workflow Sequence
 
-#### Screen Reader Support
-- Use semantic HTML elements (`<button>`, `<nav>`, `<main>`)
-- Avoid div/span soup for interactive elements
-- Provide meaningful labels for all controls
-
-See `references/component-examples.md` for complete accessibility examples including Skip Links, focus traps, and ARIA patterns.
+1. **Audit** — run `checklists/design-system-checklist.md` against existing codebase
+2. **Token extraction** — identify all unique values in use; map to primitive → semantic tiers
+3. **Atom build** — Button, Input, Label, Icon, Badge with full state coverage and a11y
+4. **Style Dictionary integration** — multi-platform output (CSS vars, JS ES6, optionally iOS/Android)
+5. **Documentation** — Storybook stories per component; accessibility section mandatory
+6. **API freeze + RFC process** — lock prop contracts before second team adoption
+7. **Versioning** — semver; breaking prop changes = major; new variants = minor
 
 ---
 
-## Documentation Standards
-
-### Component Documentation Template
-
-Each component should document:
-- **Purpose**: What the component does
-- **Usage**: Import statement and basic example
-- **Variants**: Available visual styles
-- **Props**: Complete prop table with types, defaults, descriptions
-- **Accessibility**: Keyboard support, ARIA attributes, screen reader behavior
-- **Examples**: Common use cases with code
-
-Use Storybook, Docusaurus, or similar tools for interactive documentation.
-
-See `templates/component-template.tsx` for the standard component structure.
-
----
-
-## Design System Workflow
-
-### 1. Design Phase
-- **Audit existing patterns**: Identify inconsistencies
-- **Define design tokens**: Colors, typography, spacing
-- **Create component inventory**: List all needed components
-- **Design in Figma**: Create component library
-
-### 2. Development Phase
-- **Set up tooling**: Storybook, TypeScript, testing
-- **Implement tokens**: CSS variables or theme config
-- **Build atoms first**: Start with primitives
-- **Compose upward**: Build molecules, organisms
-- **Document as you go**: Write docs alongside code
-
-### 3. Adoption Phase
-- **Create migration guide**: Help teams adopt
-- **Provide codemods**: Automate migrations when possible
-- **Run workshops**: Train teams on usage
-- **Gather feedback**: Iterate based on real usage
-
-### 4. Maintenance Phase
-- **Version semantically**: Major/minor/patch releases
-- **Deprecation strategy**: Phase out old components gracefully
-- **Changelog**: Document all changes
-- **Monitor adoption**: Track usage across products
-
----
+## Bundled Resources
 
-## Quick Start Checklist
-
-When creating a new design system:
-
-- [ ] Define design principles and values
-- [ ] Establish design token structure (colors, typography, spacing)
-- [ ] Create primitive color palette (50-950 scale)
-- [ ] Define semantic color tokens (brand, text, background, feedback)
-- [ ] Set typography scale and font families
-- [ ] Establish spacing scale (4px or 8px base)
-- [ ] Design atomic components (Button, Input, Label, etc.)
-- [ ] Implement theming system (light/dark mode)
-- [ ] Ensure WCAG 2.1 Level AA compliance
-- [ ] Set up documentation (Storybook or similar)
-- [ ] Create usage examples for each component
-- [ ] Establish versioning and release strategy
-- [ ] Create migration guides for adopting teams
+| File | Purpose |
+|------|---------|
+| `templates/design-tokens-template.json` | W3C DTCG-format token file with all tiers |
+| `templates/component-template.tsx` | TypeScript React component scaffold |
+| `references/component-examples.md` | Button, FormField, Card, Modal, polymorphic patterns |
+| `checklists/design-system-checklist.md` | Pre-launch audit checklist |
diff --git a/skills/difficult-workplace-conversations/SKILL.md b/skills/difficult-workplace-conversations/SKILL.md
index be42193..21a8a8e 100644
--- a/skills/difficult-workplace-conversations/SKILL.md
+++ b/skills/difficult-workplace-conversations/SKILL.md
@@ -1,238 +1,96 @@
 ---
 name: difficult-workplace-conversations
-description: Structured approach to workplace conflicts, performance discussions, and challenging feedback using preparation-delivery-followup framework. Use when preparing for tough conversations, addressing conflicts, giving critical feedback, or navigating sensitive workplace discussions.
+description: Practitioner guide for preparing, navigating, and following up on high-stakes workplace conversations such as performance issues, peer conflict, upward feedback, terminations, salary negotiations, and emotionally charged discussions. Trigger phrases include "how do I talk to", "preparing for a hard conversation", "giving difficult feedback", "conflict with coworker", "talking to my manager about", "addressing performance", "someone is defensive".
 allowed-tools: Read, Glob, Grep
 ---
 
-# Difficult Conversations Skill
+## Mindset
 
-A structured framework for approaching challenging workplace conversations including conflicts, performance issues, sensitive feedback, and emotionally charged discussions.
+- **The goal is a shared understanding, not a verdict.** Most failed conversations happen because one party enters to win. Practitioners enter to learn something they don't know yet — even when the facts seem clear.
+- **Timing is a power lever most people waste.** Raising an issue when the other person is stressed, in public, or cornered guarantees defensiveness. The right moment (private, unhurried, not post-incident) does half the work for you.
+- **Specificity is protection.** Vague feedback ("you need to be more professional") creates defensiveness because it leaves the other person to guess. Specificity removes the guess and removes the threat.
+- **A prepared BATNA (best alternative to a negotiated agreement) changes how you show up.** Knowing what you'll do if the conversation fails — not as a threat, but as a real alternative — eliminates desperation from your posture. The other person feels it even if you never say it.
+- **Regression is normal, not evidence of failure.** Behavior changes after a hard conversation often reverse within 2–3 weeks. Practitioners schedule a check-in at week 3 precisely because of this, not because they distrust the other person.
 
-## When to Use This Skill
+## Navigation
 
-- Preparing for a challenging conversation with a colleague
-- Addressing performance issues with a team member
-- Delivering difficult feedback to a peer or manager
-- Navigating conflict between team members
-- Discussing sensitive topics (salary, promotion, termination)
-- Handling emotional or defensive reactions
-- Following up after difficult discussions
+**Use this skill when**:
+- Preparing a conversation with real stakes: performance, conflict, termination, salary, upward feedback
+- Someone is defensive, emotional, or has a pattern of avoiding accountability
+- You need to give feedback to someone with more organizational power than you
+- A prior conversation didn't produce lasting change
 
-## Core Framework: Preparation-Delivery-Followup
+**Do NOT use this skill when**:
+- Casual feedback in a low-stakes context (use `feedback-mastery` instead)
+- HR/legal issues where process must lead (loop in HR first, then use this for your role in that process)
+- Written communication only (this skill is optimized for live dialogue)
 
-Difficult conversations succeed or fail based on three phases:
+**Triage — what kind of conversation is this?**
 
-### Phase 1: Preparation (Before)
-
-**Purpose:** Set yourself up for a productive conversation
-
-1. **Clarify the Issue**
-   - What specifically happened? (Observable facts only)
-   - What is the impact? (On you, team, work)
-   - What do you need to change?
-
-2. **Check Your Emotions**
-   - What am I feeling? Why?
-   - Am I calm enough to have this conversation?
-   - What might trigger me during this conversation?
-
-3. **Consider Their Perspective**
-   - How might they see this situation?
-   - What constraints or pressures might they have?
-   - What do they care about that I can acknowledge?
-
-4. **Define Your Goal**
-   - What outcome do I want?
-   - What is the minimum acceptable result?
-   - What am I willing to compromise on?
-
-### Phase 2: Delivery (During)
-
-**Purpose:** Have the conversation effectively
-
-1. **Open Neutrally**
-   - Start with facts, not judgments
-   - Express intent to understand, not accuse
-   - Create psychological safety
-
-2. **Share Your Perspective**
-   - Describe behavior, not character
-   - Focus on impact, not intention
-   - Use "I" statements, not "you always"
-
-3. **Listen Actively**
-   - Ask clarifying questions
-   - Acknowledge their viewpoint
-   - Look for shared interests
-
-4. **Seek Resolution**
-   - Propose specific actions
-   - Agree on next steps
-   - Set check-in timeline
-
-### Phase 3: Followup (After)
-
-**Purpose:** Ensure lasting resolution
-
-1. **Document Agreements**
-   - What was agreed?
-   - Who does what by when?
-   - How will you measure success?
-
-2. **Check Progress**
-   - Follow up as promised
-   - Acknowledge improvements
-   - Address continued issues promptly
-
-3. **Maintain Relationship**
-   - Separate issue from person
-   - Rebuild trust over time
-   - Watch for regression
-
-## Key Principles
-
-### Separate Impact from Intent
-
-**What happened:** Observable behavior
-**What I felt:** Your emotional response
-**What I assume:** Their intention (often wrong)
-
-Focus conversation on behavior and impact, not assumed intentions.
-
-### The SBI Model
-
-**Situation:** When and where did this happen?
-**Behavior:** What specifically did they do/say?
-**Impact:** What was the effect on you, the team, or the work?
-
-### Managing Emotions
-
-| If You Feel | Before Acting |
-| ----------- | ------------- |
-| Angry | Wait 24 hours, write but don't send |
-| Hurt | Talk to neutral party first |
-| Anxious | Practice the conversation |
-| Defensive | Identify your contribution |
-
-### When to Escalate
-
-Escalate when:
-
-- Safety is at risk
-- Legal issues involved
-- Repeated conversations haven't worked
-- Power dynamics prevent resolution
-- You need documentation
-
-## Conversation Types
-
-### Performance Feedback
-
-- Lead with specific examples
-- Connect to expectations/standards
-- Focus on future improvement
-- Offer support and resources
-
-### Conflict Resolution
-
-- Hear both sides separately first
-- Identify underlying interests
-- Look for win-win solutions
-- Document agreements
-
-### Sensitive Topics
-
-- Choose private, neutral setting
-- Allow time for processing
-- Be direct but compassionate
-- Respect confidentiality
-
-### Receiving Feedback
-
-- Thank them for feedback
-- Ask clarifying questions
-- Don't defend immediately
-- Reflect before responding
-
-## References (Load When Needed)
-
-### Detailed Frameworks
-
-- **[Conversation Framework](references/conversation-framework.md)**: Complete three-phase framework with scripts and examples
-- **[Preparation Template](references/preparation-template.md)**: Worksheet for preparing before difficult conversations
-- **[Delivery Scripts](references/delivery-scripts.md)**: Opening lines, response handling, reframing techniques
-- **[Emotional Regulation](references/emotional-regulation.md)**: Managing your own emotions before and during
-
-## See Also
-
-- `feedback-mastery` skill - SBI feedback model (overlaps but more feedback-focused)
-- `professional-effective-communication` skill - General communication patterns
-
-## Example Scenarios
-
-### Scenario 1: Addressing Missed Deadlines
-
-```markdown
-**Issue:** Team member missed 3 deadlines in past month
-**Impact:** Project delayed, others blocked
-**Goal:** Understand root cause, agree on prevention plan
-
-**Opening:** "I wanted to check in about the recent deliverables. I've noticed
-the last three have come in past deadline, and I'd like to understand what's
-happening and how we can address it together."
+```
+Are you the other person's manager (you hold the formal power)?
+  YES → Performance/termination track → load references/conversation-framework.md
+  NO  → Is this a pattern or a one-time incident?
+          PATTERN → Conflict/peer track → load references/delivery-scripts.md
+          ONE-TIME → Is it upward (to your manager)?
+                       YES → Upward feedback track → see Upward Feedback section below
+                       NO  → Sensitive topic track → load references/emotional-regulation.md
 ```
 
-### Scenario 2: Peer Conflict
+## Philosophy
 
-```markdown
-**Issue:** Colleague publicly criticized your work in meeting
-**Impact:** Embarrassed, trust damaged
-**Goal:** Address behavior, rebuild working relationship
+The hardest conversations are hard because both parties have a legitimate version of events. Your job is not to correct their version but to build a third version together — one specific enough to produce durable action.
 
-**Opening:** "I'd like to talk about what happened in yesterday's standup.
-When you said my code 'missed obvious issues,' I felt called out in front
-of the team. I'd like to understand your concerns and find a better way
-to handle code quality feedback."
-```
+## NEVER
 
-### Scenario 3: Asking Manager for Raise
+- **NEVER open with impact before the other person understands what behavior you're describing** — they'll spend the entire conversation defending their intention instead of hearing the impact. Behavior first, impact second, always.
+- **NEVER have a high-stakes conversation without a written BATNA** — not to use as leverage, but because without one your desperation leaks into your tone and the other person calibrates to it. Write down what you'll do if this goes nowhere.
+- **NEVER follow up a difficult conversation only once** — single follow-ups let people wait you out. Practitioners set a 3-week check-in at the close of the original conversation. Most regressions happen at week 2–3, not week 1.
+- **NEVER summarize the conversation immediately afterward in a way that positions you as correct** — written summaries ("per our discussion, you agreed to...") feel like legal filings. Instead, ask the other person to summarize what they're taking away. Their words reveal whether there's real alignment.
+- **NEVER conflate "they heard me" with "they agreed"** — acknowledgment ("I understand your concern") is not commitment. Close every conversation with a specific, owned action: who does what, by when, and how you'll both know it happened.
+- **NEVER raise a performance issue for the first time in a formal review** — the employee experiences it as an ambush, the feedback becomes legally and relationally contaminated, and you've lost 6–12 months of intervention time. Informal conversations must precede formal ones.
+- **NEVER use "we need to talk" as a meeting invite subject** — it can trigger days of anxiety that arrive at your conversation as pre-loaded defensiveness. Name the topic neutrally: "checking in on Q3 deliverables."
 
-```markdown
-**Issue:** Feel underpaid relative to market/contribution
-**Impact:** Demotivation, considering leaving
-**Goal:** Discuss compensation, get timeline or adjustment
+## When Things Go Wrong
 
-**Opening:** "I'd like to discuss my compensation. I've been here two years,
-taken on the payments project leadership, and want to make sure my salary
-reflects my contributions and the current market."
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Other person shuts down or goes silent | Felt accused, not invited | Explicitly name what you're seeing: "I notice you've gone quiet — I want to make sure this feels safe to discuss." Then pause. Don't fill the silence. |
+| Conversation spirals into unrelated grievances | No shared agenda set upfront | Interrupt cleanly: "I want to hear all of this. Can we agree to finish this topic first and schedule time for the rest?" |
+| Agreed actions don't happen after the conversation | Agreement was vague or socially coerced | Return to the conversation with curiosity, not accusation. Ask what got in the way. Vague commitments usually mean the person didn't actually agree — they just ended the discomfort. |
+| Person becomes hostile or escalates | Conversation revealed a threat to their status/livelihood | De-escalate by naming it: "I can see this is hitting hard. I'm not here to threaten your position — I'm here because I want this to work." Then stop talking and let them respond. |
+| Your own emotions take over mid-conversation | Ambush response to something unexpected | Use the explicit pause: "I want to make sure I respond thoughtfully. Can I have a moment?" Practitioners normalize pausing; it signals self-awareness, not weakness. |
 
-## Anti-Patterns to Avoid
+## Reference Files (Load on Demand)
 
-### In Preparation
+| Reference | When to Load |
+|-----------|-------------|
+| `references/conversation-framework.md` | Full preparation workflow, fact/story separation exercise, contribution mapping |
+| `references/preparation-template.md` | Blank worksheet to fill out before the conversation; covers goal, BATNA, trigger anticipation |
+| `references/delivery-scripts.md` | Opening formulas by situation type, de-escalation phrases, closing scripts |
+| `references/emotional-regulation.md` | Pre-conversation regulation techniques; managing your own state during high-charge moments |
 
-- **Scripting every word** - You'll sound robotic; prepare themes, not scripts
-- **Building a case** - This isn't a trial; seek understanding, not winning
-- **Waiting too long** - Issues compound; address promptly
+Load the relevant reference when the user needs scripts, templates, or deep-dive frameworks. Do not reproduce reference content inline unless the user asks for it directly.
 
-### In Delivery
+## Upward Feedback (Special Case)
 
-- **Starting with "You always..."** - Triggers defensiveness immediately
-- **Burying the lead** - Get to the point; don't soften excessively
-- **Asking leading questions** - "Don't you think..." isn't asking
+Giving feedback to someone with power over you has a different risk profile. Three adjustments:
 
-### In Followup
+1. **Frame as impact on your effectiveness, not a critique of their behavior** — "When X happens, I find it harder to do Y" is harder to dismiss than "you do X."
+2. **Request, don't demand** — end with a question, not a directive: "Would you be open to trying Z?" gives them an exit that isn't losing face.
+3. **Accept that they may not change** — your goal is to have said it clearly once, not to fix them. If nothing changes, that's data for your own decisions.
 
-- **Forgetting to check in** - Without follow-up, nothing changes
-- **Holding grudges** - Issue resolved means relationship continues
-- **Over-documenting** - Not everything needs written record
+## Conversation Types Quick Reference
 
-## Success Metrics
+| Type | Primary Risk | Key Move |
+|------|-------------|----------|
+| Performance issue | They feel blindsided | Cite prior informal conversations; ask what they'd need to succeed |
+| Peer conflict | Both parties feel wronged | Separate fact from interpretation before the conversation, not during |
+| Termination | Emotional flooding | Short, clear, no hedging; deliver the decision in the first 60 seconds |
+| Salary/promotion | Negotiation feels adversarial | Anchor to market data + contribution, not personal need |
+| Upward feedback | Power retaliation | See Upward Feedback section above |
 
-A successful difficult conversation:
+## See Also
 
-- Both parties feel heard
-- Specific actions are agreed
-- Relationship is preserved or improved
-- The issue doesn't recur (or has clear escalation)
-- Neither party is blindsided later
+- `feedback-mastery` — SBI model deep dive, feedback calibration
+- `professional-communication` — written communication, email framing
diff --git a/skills/domain-name-brainstormer/SKILL.md b/skills/domain-name-brainstormer/SKILL.md
index b763c44..a07cbfe 100644
--- a/skills/domain-name-brainstormer/SKILL.md
+++ b/skills/domain-name-brainstormer/SKILL.md
@@ -1,212 +1,79 @@
 ---
 name: domain-name-brainstormer
-description: Generates creative domain name ideas for your project and checks availability across multiple TLDs (.com, .io, .dev, .ai, etc.). Saves hours of brainstorming and manual checking.
+description: Brainstorm and evaluate domain name candidates using phonaesthetics, trademark risk tiers, and TLD selection frameworks. Use when naming a startup, product, or brand; outputs a scored shortlist with radio-test results. Trigger phrases include "suggest domain names", "help me name my", "find a domain for", "domain brainstorm", "domain candidates".
 ---
 
 # Domain Name Brainstormer
 
-This skill helps you find the perfect domain name for your project by generating creative options and checking what's actually available to register.
+## Mindset
 
-## When to Use This Skill
+- **Distinctiveness is the primary constraint, not descriptiveness.** Generic descriptive names fail twice: the .com is almost certainly taken, and the name is hard to register as a trademark. The best names are invented or out-of-field.
+- **Phonaesthetics predict memorability better than meaning.** A name that "snaps" when spoken (plosive consonants, front-stressed, 1–2 syllables) survives word-of-mouth better than a perfectly descriptive one.
+- **TLD choice signals audience, not just availability.** .com signals mainstream/trust; .dev signals developer tool; .ai signals the AI is the product (not a feature). Picking the wrong TLD damages brand positioning, not just aesthetics.
+- **Most "good" .coms are taken or premium-priced.** The real deliverable is a shortlist that maximizes likelihood of availability — invented words, portmanteaus, out-of-field vocabulary.
+- **Trademark risk is non-negotiable to surface.** A creative name that infringes an existing Class 42 (software/SaaS) mark is worse than no name: it forces a rebrand after launch.
 
-- Starting a new project or company
-- Launching a product or service
-- Creating a personal brand or portfolio site
-- Rebranding an existing project
-- Registering a domain for a side project
-- Finding available alternatives when your first choice is taken
+## Navigation
 
-## What This Skill Does
+**Use this skill when**: user needs domain name candidates for a startup, product, personal brand, or side project; user wants to evaluate name options they already have; user needs TLD guidance.
 
-1. **Understands Your Project**: Analyzes what you're building and who it's for
-2. **Generates Creative Names**: Creates relevant, memorable domain options
-3. **Checks Availability**: Verifies which domains are actually available across multiple TLDs
-4. **Multiple Extensions**: Suggests .com, .io, .dev, .ai, .app, and more
-5. **Provides Alternatives**: Offers variations if top choices are taken
-6. **Branding Insights**: Explains why certain names work well
-
-## How to Use
-
-### Basic Brainstorming
+**Do NOT use this skill when**: user needs live availability checking (Claude has no WHOIS/DNS access — that is always the user's next step at a registrar); user needs legal trademark clearance (surface risk tier, but defer to trademark attorney for Tier 2+).
 
+**Ambiguous input decision tree**:
 ```
-I'm building a project management tool for remote teams. 
-Suggest domain names.
+Is the product category clear?
+  NO → ask: "What does it do and who is the primary user?"
+  YES → Is the target audience global non-tech?
+          YES → constrain output to .com only
+          NO → use TLD decision tree (see references/naming-frameworks.md)
 ```
 
-```
-Help me brainstorm domain names for a personal finance app
-```
+## Philosophy
 
-### Specific Preferences
+Strong domain names are invented assets, not discovered ones. The workflow is: generate phonaesthetically strong candidates → score them → eliminate trademark risk → output a ranked shortlist with radio-test results. Availability is the user's verification step, never Claude's claim.
 
-```
-I need a domain name for my AI writing assistant. 
-Prefer short names with .ai or .io extension.
-```
+## NEVER
 
-### With Keywords
+- NEVER present any domain as available, registered, or taken — Claude has no WHOIS, DNS, or registrar access. Say "candidate" not "available".
+- NEVER suggest single common English words on .com without flagging "almost certainly taken or premium-priced ($500–$50,000+)" — these look like options but are traps that waste the user's time.
+- NEVER skip trademark risk classification — a name that infringes an existing SaaS trademark causes forced rebrand after launch, which costs orders of magnitude more than the domain.
+- NEVER recommend hyphens as a "workaround" for taken names — hyphens fail the radio test completely (users say "go to code dash box dot com?" and navigate wrong), and they signal spam to email filters.
+- NEVER suggest names with embedded numbers as primary choices — "4ward", "2fast" — they fail the radio test (is it the digit or the word?) and look amateurish on business cards.
+- NEVER recommend .xyz as a serious primary TLD to non-tech audiences: it carries a strong spam/scam association from bulk registrations and tends to read as a cheap placeholder.
+- NEVER recommend country-code hacks (.ly, .me as word suffixes) as a primary domain without flagging country policy risk: Libya's registry has revoked .ly domains for content violations (vb.ly in 2010), and a repeat could take a brand like bit.ly offline.
 
-```
-Suggest domain names using the words "pixel" or "studio" 
-for my design agency
-```
-
-## Example
+## Output Format
 
-**User**: "I'm building a tool for developers to share code snippets. Suggest creative domain names."
-
-**Output**:
-```
-Analyzing your project...
-Target audience: Developers
-Key features: Code sharing, snippets, collaboration
+For each session, produce:
 
-🎯 Domain Name Suggestions
+**1. Analysis** (2–3 lines): category, audience, naming constraints surfaced.
 
-## Available (.com)
-1. ✓ snippetbox.com - Clear, memorable, available!
-   Why: Directly describes the product, easy to remember
+**2. Candidates table** (10–15 names):
+| Candidate | Pattern | Phonaesthetic Score /15 | Trademark Risk | Radio Test |
+|-----------|---------|------------------------|----------------|------------|
+| stripe.dev | dictionary/out-of-field | 14 | Tier 2 — verify class | PASS |
 
-2. ✓ codeclip.com - Short and snappy
-   Why: Implies quick code sharing, only 8 characters
-
-3. ✓ devpaste.com - Developer-focused
-   Why: Familiar to devs (like pastebin), clear purpose
-
-## Available (Alternative TLDs)
-4. ✓ snippet.dev - Perfect extension for developers
-   Why: .dev extension signals developer tool
-
-5. ✓ codebox.io - Tech-forward
-   Why: .io popular with tech startups, clean name
-
-6. ✓ sharecode.app - Descriptive
-   Why: .app extension, clearly states the action
-
-## Premium/Taken (.com)
-- codeshare.com (Taken, est. $2,500)
-- snippets.com (Taken, premium domain)
-
-## Recommendations
-
-🏆 Top Pick: snippet.dev
-- Perfect for developer audience
-- Short and memorable  
-- .dev extension adds credibility
-- Available now!
-
-🥈 Runner-up: snippetbox.com
-- .com is universally recognized
-- Slightly longer but descriptive
-- Great brandability
-
-Next steps:
-1. Register your favorite before someone else does!
-2. Want me to check more variations?
-3. Need help with logo ideas for these names?
-```
+**3. Top 3 with reasoning**: why each works across distinctiveness, memorability, and brand fit.
 
-**Inspired by:** Ben Aiad's use case from Lenny's Newsletter
+**4. Next steps** (always): "Verify candidates at Namecheap/GoDaddy/Porkbun. Check trademarks with USPTO Trademark Search (tmsearch.uspto.gov, which replaced TESS in 2023) for Tier 2+ names."
 
-## Domain Naming Tips
+## Naming Frameworks
 
-### What Makes a Good Domain
-
-✓ **Short**: Under 15 characters ideal
-✓ **Memorable**: Easy to recall and spell
-✓ **Pronounceable**: Can be said in conversation
-✓ **Descriptive**: Hints at what you do
-✓ **Brandable**: Unique enough to stand out
-✓ **No hyphens**: Easier to share verbally
-
-### TLD Guide
-
-- **.com**: Universal, trusted, great for businesses
-- **.io**: Tech startups, developer tools
-- **.dev**: Developer-focused products
-- **.ai**: AI/ML products
-- **.app**: Mobile or web applications
-- **.co**: Alternative to .com
-- **.xyz**: Modern, creative projects
-- **.design**: Creative/design agencies
-- **.tech**: Technology companies
-
-## Advanced Features
-
-### Check Similar Variations
-
-```
-Check availability for "codebase" and similar variations 
-across .com, .io, .dev
-```
-
-### Industry-Specific
-
-```
-Suggest domain names for a sustainable fashion brand, 
-checking .eco and .fashion TLDs
-```
-
-### Multilingual Options
-
-```
-Brainstorm domain names in English and Spanish for 
-a language learning app
-```
-
-### Competitor Analysis
-
-```
-Show me domain patterns used by successful project 
-management tools, then suggest similar available ones
-```
+See `references/naming-frameworks.md` for:
+- Phonaesthetic scoring rubric (5 axes)
+- Trademark risk tier definitions (Tier 1–4)
+- TLD selection decision tree
+- Word construction patterns ranked by distinctiveness
+- Radio test protocol
+- Squatter pattern recognition
+- Negative space naming technique
 
-## Example Workflows
-
-### Startup Launch
-1. Describe your startup idea
-2. Get 10-15 domain suggestions across TLDs
-3. Review availability and pricing
-4. Pick top 3 favorites
-5. Register immediately
-
-### Personal Brand
-1. Share your name and profession
-2. Get variations (firstname.com, firstnamelastname.dev, etc.)
-3. Check social media handle availability too
-4. Register consistent brand across platforms
-
-### Product Naming
-1. Describe product and target market
-2. Get creative, brandable names
-3. Check trademark conflicts
-4. Verify domain and social availability
-5. Test names with target audience
-
-## Tips for Success
-
-1. **Act Fast**: Good domains get taken quickly
-2. **Register Variations**: Get .com and .io to protect brand
-3. **Avoid Numbers**: Hard to communicate verbally
-4. **Check Social Media**: Make sure @username is available too
-5. **Say It Out Loud**: Test if it's easy to pronounce
-6. **Check Trademarks**: Ensure no legal conflicts
-7. **Think Long-term**: Will it still make sense in 5 years?
-
-## Pricing Context
-
-When suggesting domains, I'll note:
-- Standard domains: ~$10-15/year
-- Premium TLDs (.io, .ai): ~$30-50/year
-- Taken domains: Market price if listed
-- Premium domains: $hundreds to $thousands
-
-## Related Tools
-
-After picking a domain:
-- Check logo design options
-- Verify social media handles
-- Research trademark availability
-- Plan brand identity colors/fonts
+## When Things Go Wrong
 
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| User says "all my candidates are taken" | Suggestions leaned on common words + .com | Pivot to portmanteau/invented words; offer .io/.dev alternatives; ask what budget they have for premium .com |
+| User wants a .com of a two-word phrase | Almost certainly parked or $1,000+ | Acknowledge directly; generate portmanteau alternatives and offer .dev (about $10–15/yr) or .io (typically higher) variants |
+| User has a name and wants domain variants | Request is evaluation, not brainstorming | Don't re-brainstorm: run phonaesthetic score + trademark risk + radio test on their name, then suggest TLD variants |
+| User wants to know if a specific domain is available | Claude cannot check | Say clearly: "I can't check availability — go to porkbun.com or namecheap.com and search directly." |
+| User's project is AI-related and they want .ai | .ai costs more ($60–100/yr, Anguilla registry) and carries policy risk | Surface the cost and risk; recommend .ai only if AI is the core product, not a feature; offer .dev as fallback |
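
Reviewer sketch (sits between file diffs, not part of any hunk): several of the NEVER rules in the SKILL.md above are mechanical enough to pre-screen candidates before they reach the shortlist. The helper below is a minimal illustration under that assumption; `never_rule_flags` and its message strings are hypothetical names, and it covers only the hyphen, embedded-number, .xyz, and country-code-hack rules. Trademark tiering and availability still need a human.

```python
def never_rule_flags(domain: str) -> list[str]:
    """Return the NEVER-rule warnings that apply to a candidate domain.

    Hypothetical helper: the rule set mirrors the skill's NEVER list,
    but the function name and message wording are illustrative only.
    """
    lowered = domain.lower()
    if "." in lowered:
        name, tld = lowered.rsplit(".", 1)
    else:
        # No TLD supplied: check the bare name only.
        name, tld = lowered, ""

    flags = []
    if "-" in name:
        flags.append("hyphen: fails the radio test")
    if any(ch.isdigit() for ch in name):
        flags.append("embedded number: fails the radio test")
    if tld == "xyz":
        flags.append(".xyz: spam association for non-tech audiences")
    if tld in {"ly", "me"}:
        flags.append("country-code hack: flag country policy risk")
    return flags


if __name__ == "__main__":
    for candidate in ("code-box.com", "4ward.xyz", "zuora.dev"):
        print(candidate, never_rule_flags(candidate))
```

An empty list means none of these four rules fired; it says nothing about whether the domain is available or trademark-clear.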
diff --git a/skills/domain-name-brainstormer/references/naming-frameworks.md b/skills/domain-name-brainstormer/references/naming-frameworks.md
new file mode 100644
index 0000000..ae943a9
--- /dev/null
+++ b/skills/domain-name-brainstormer/references/naming-frameworks.md
@@ -0,0 +1,83 @@
+# Domain Naming Frameworks Reference
+
+## Phonaesthetic Scoring (apply to every candidate)
+
+Rate each name 1–3 on each of the five axes (maximum 15); discard anything scoring ≤5 total:
+
+| Axis | 1 (weak) | 2 (ok) | 3 (strong) |
+|------|----------|--------|------------|
+| **Plosive density** | no plosives | 1 plosive | 2+ plosives (b, d, g, k, p, t) give the "snap" factor |
+| **Vowel-consonant rhythm** | CCCC or VVVV clusters | minor cluster | alternating VC/CV pattern |
+| **Stress on first syllable** | stress buried mid-word | borderline | naturally front-stressed |
+| **Back-of-mouth resonance** | flat, nasal-heavy | mixed | rounded vowels (o, oo, u) give trust feel |
+| **Brevity** | 4+ syllables | 3 syllables | 1–2 syllables |
+
+Examples: "Stripe" (3+3+3+2+3=14), "Notion" (2+3+3+2+3=13), "Figma" (3+3+2+2+3=13).
+
+## Trademark Risk Tiers
+
+Before presenting a name, mentally classify it:
+
+- **Tier 1 (low risk)**: Invented/portmanteau with no dictionary root, e.g. "Zuora", "Twilio". These clear trademark search most often.
+- **Tier 2 (medium)**: Dictionary word applied out-of-field, e.g. "Stripe" (payments ≠ stripe patterns). Registerable, but class overlap must be verified.
+- **Tier 3 (high risk)**: Common word in same field, e.g. "FastSend" for a shipping tool. Someone very likely already holds a Class 39 (transport) or Class 42 (software) mark.
+- **Tier 4 (instant reject)**: Sounds like or visibly imitates a famous mark, e.g. "Goooglebox", "Facebooks". Expect a USPTO refusal.
+
+Surface risk tier in your output: if Tier 3 or 4, say so explicitly and suggest a portmanteau alternative.
+
+## TLD Selection Decision Tree
+
+```
+Is audience global non-tech?
+  YES → .com only (no exceptions; audience doesn't understand TLD nuance)
+  NO → continue
+
+Is product developer-facing?
+  YES → .dev first, .io second
+  NO → continue
+
+Is AI/ML the core product, not just a feature?
+  YES → .ai (but budget $60-100/yr; Anguilla registry)
+  NO → continue
+
+Is it a mobile app with no web identity?
+  YES → .app (Google Registry; HTTPS required because the TLD is HSTS-preloaded)
+  NO → .com or category TLD (.design, .health, .legal etc.)
+```
+
+## Word Construction Patterns (ranked by distinctiveness)
+
+1. **Portmanteau**: merge two words and drop overlapping phonemes, e.g. "Pinterest" (pin + interest), "Instagram" (instant + telegram). Highest trademark clearance rate.
+2. **Clipped compound**: take the first 2–4 characters of each word, e.g. "FedEx" (Federal + Express). Works best when both roots are widely known.
+3. **Back-formation**: choose a name that works as a verb, e.g. "Slack", "Zoom". Requires an invented or repurposed word.
+4. **Latinate root**: add a suffix such as -ly, -fy, -io, -era, -ity, or -ier to a concrete concept, e.g. "Clarity", "Notify", "Zapier". Medium distinctiveness.
+5. **Pure dictionary word (out-of-field)**: high risk of .com squatting; requires premium budget.
+
+## The "Radio Test"
+
+Imagine someone says your domain name on a podcast with no visual aid. Ask:
+1. Can the listener spell it correctly on first attempt?
+2. Is there zero ambiguity on where the word boundaries are?
+3. Does the TLD sound natural when spoken aloud? ("dot-dev" — yes; "dot-xyz" — weird pause)
+
+If any answer is NO, flag it in output with "fails radio test — consider [alternative]".
+
+## Squatter Pattern Recognition
+
+Names that are almost certainly parked or premium-priced:
+- Single common English word + .com (>99% taken)
+- Two-word compound describing a common business activity + .com (>95% taken)
+- Any 3-letter .com (all registered since ~2000)
+- Any 4-letter pronounceable .com (>98% taken)
+- Country-code hacks (.ly, .me, .io used as word suffix) — legal risk if country policy changes
+
+Always offer a .io or .dev fallback when suggesting these patterns.
+
+## Negative Space Naming
+
+Some of the strongest brand names describe what the product *removes*, not what it does:
+- "Notion" (implies disorder removed)
+- "Calm" (implies anxiety removed)
+- "Clear" (implies confusion removed)
+
+This pattern tends to produce distinctive names because it's indirect, though single dictionary words like the examples above still need the usual availability and trademark checks. Use when the product solves a pain point (vs. adds a feature).
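
Reviewer sketch (sits between file diffs, not part of any hunk): the phonaesthetic rubric in naming-frameworks.md reduces to summing five 1–3 axis scores and discarding totals of 5 or less. The snippet below is one way that arithmetic might be captured; the class and field names are hypothetical, and the per-axis scores are still assigned by hand because the rubric is a judgment call. The example values reuse the axis scores the reference file lists for "Stripe" and "Notion".

```python
from dataclasses import dataclass

# Axis order follows the rubric table; the field names are illustrative.
AXES = ("plosive_density", "vc_rhythm", "front_stress", "resonance", "brevity")


@dataclass(frozen=True)
class PhonaestheticScore:
    plosive_density: int
    vc_rhythm: int
    front_stress: int
    resonance: int
    brevity: int

    def __post_init__(self):
        # Each axis is rated 1 (weak), 2 (ok), or 3 (strong).
        for axis in AXES:
            value = getattr(self, axis)
            if value not in (1, 2, 3):
                raise ValueError(f"{axis} must be 1, 2, or 3, got {value!r}")

    @property
    def total(self) -> int:
        return sum(getattr(self, axis) for axis in AXES)

    @property
    def discard(self) -> bool:
        # The rubric discards anything scoring <= 5 in total.
        return self.total <= 5


if __name__ == "__main__":
    stripe = PhonaestheticScore(3, 3, 3, 2, 3)
    notion = PhonaestheticScore(2, 3, 3, 2, 3)
    print(stripe.total, notion.total)  # 14 13
```

Because every axis has a floor of 1, a total of 5 is the minimum possible score, so the discard rule as written only removes names rated weak on all five axes.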
diff --git a/skills/draw-io/SKILL.md b/skills/draw-io/SKILL.md
index 7df85f3..75ae5a0 100644
--- a/skills/draw-io/SKILL.md
+++ b/skills/draw-io/SKILL.md
@@ -1,278 +1,176 @@
 ---
 name: draw-io
-description: draw.io diagram creation, editing, and review. Use for .drawio XML editing, PNG conversion, layout adjustment, and AWS icon usage.
+description: Create, edit, review, and convert draw.io diagrams (.drawio XML). Use for AWS architecture diagrams, flowcharts, layout adjustment, PNG export, icon selection, and Quarto/reveal.js slide integration. Triggers on phrases such as "draw a diagram", "create architecture diagram", "edit drawio", "add AWS icon", "convert to PNG", "fix layout", "drawio XML".
 ---
 
 # draw.io Diagram Skill
 
-## 1. Basic Rules
+## Mindset
 
-- Edit only `.drawio` files
-- Do not directly edit `.drawio.png` files
-- Use auto-generated `.drawio.png` by pre-commit hook in slides
+1. **XML-first thinking**: draw.io files are plain XML — read and write them as structured data, not as visual canvas operations. Every position, style, and connection is deterministic from the XML.
+2. **Layer ordering is load order**: Elements render in document order, so put arrows before boxes, not after, or arrows will overdraw icons. This is non-obvious because the draw.io GUI hides document order behind its To Front/To Back actions.
+3. **Verify with PNG, not XML**: Layout bugs (overflow, label collision, arrow penetration) are invisible in XML. Always run conversion and inspect the raster output before declaring done.
+4. **Icon type is a decision, not a guess**: resourceIcon vs productIcon vs generic shape have different geometry contracts. Choosing wrong produces misaligned or invisible icons. Use the selector table below.
+5. **Margins are asymmetric**: 30px is the minimum inner margin for straight edges, but rounded corners (`rounded=1`) consume extra space — add strokeWidth × 2 to your margin calculations.
 
-## 2. Font Settings
+## Navigation
 
-For diagrams used in Quarto slides,
-specify `defaultFontFamily` in mxGraphModel tag:
+### Use this skill when
+- Creating or editing `.drawio` XML files
+- Selecting the right AWS icon style (resource vs product vs group)
+- Converting diagrams to PNG for slides or documentation
+- Debugging element overflow, arrow overlap, or label collision
+- Setting up consistent fonts for Quarto / reveal.js presentations
 
-```xml
-<mxGraphModel defaultFontFamily="Noto Sans JP" ...>
-```
+### Do NOT use this skill when
+- The target format is Mermaid, PlantUML, or Excalidraw — use the matching skill
+- You need a diagram embedded directly in Markdown without a separate file
+- The goal is purely a sequence diagram — Mermaid sequence syntax is faster
 
-Also explicitly specify `fontFamily` in each text element's style attribute:
+### Icon-Type Selector Table
 
-```xml
-style="text;html=1;fontSize=27;fontFamily=Noto Sans JP;"
-```
+| Scenario | Use | Style Fragment |
+|---|---|---|
+| Single AWS service (EC2, Lambda, S3, RDS…) | resourceIcon | `shape=mxgraph.aws4.resourceIcon;resIcon=mxgraph.aws4.{name}` |
+| AWS service product tile (larger, with label strip) | productIcon | `shape=mxgraph.aws4.productIcon;prIcon=mxgraph.aws4.{name}` |
+| VPC / subnet / availability zone boundary box | Group shape | `shape=mxgraph.aws4.group;grIcon=mxgraph.aws4.group_{name}` |
+| Non-AWS box (database, user, external system) | Generic shape | `rounded=1;whiteSpace=wrap;html=1` |
+| AWS Cloud outer boundary | AWS Cloud group | `shape=mxgraph.aws4.group;grIcon=mxgraph.aws4.group_aws_cloud_alt` |
 
-## 3. Conversion Commands
+Standard resourceIcon size: 60×60px. productIcon: 80×80px. Group boxes: size to contents.
 
-See conversion script at [scripts/convert-drawio-to-png.sh](scripts/convert-drawio-to-png.sh).
+## Philosophy
+
+draw.io diagrams are source code: they must be deterministic, diff-able, and reproducible. Treat every XML edit with the same discipline as code — no magic numbers, no visual tweaks without understanding the coordinate system, and always verify the rendered output.
+
+## NEVER
+
+- **NEVER edit `.drawio.png` files directly** — they are auto-generated artifacts; edits are silently overwritten on next conversion and the source XML diverges from the rendered image.
+- **NEVER use `mxgraph.aws3.*` icons** — the aws3 set is deprecated and renders placeholder boxes in current draw.io versions; always use `mxgraph.aws4.*`.
+- **NEVER rely on `exitX`/`exitY` connection points on text elements** — text cells have no port geometry; the connector snaps to the bounding box corner unpredictably. Use explicit `mxPoint sourcePoint`/`targetPoint` coordinates instead.
+- **NEVER set `background="#ffffff"` on the mxGraphModel** — white backgrounds break dark-theme slides and PDFs; omit the attribute or set `page="0"` to get transparent output.
+- **NEVER place arrows after icon/box elements in XML** — draw.io renders in document order; arrows declared after shapes render on top of icons, obscuring them. Arrows must appear immediately after the title cell.
+- **NEVER assume 30px margin is enough inside a rounded container**: `rounded=1` with `strokeWidth=3` eats roughly 6px of visual space on each side; use `frameMargin = 30 + (strokeWidth × 2)` (36px here) as your actual inner boundary.
+- **NEVER abbreviate AWS service names in labels** — use "Amazon ECS", not "ECS"; "AWS Lambda", not "Lambda". Official names are required for compliance diagrams and searchability.
+
+## Conversion Commands
 
 ```sh
-# Convert all .drawio files
+# Convert all .drawio files via pre-commit hook
 mise exec -- pre-commit run --all-files
 
-# Convert specific .drawio file
+# Convert specific file
 mise exec -- pre-commit run convert-drawio-to-png --files assets/my-diagram.drawio
 
-# Run script directly (using skill's script)
-bash ~/.claude/skills/draw-io/scripts/convert-drawio-to-png.sh assets/diagram1.drawio
+# Run conversion script directly
+bash ~/.claude/skills/draw-io/scripts/convert-drawio-to-png.sh assets/diagram.drawio
 ```
 
-Internal command used:
-
-```sh
-drawio -x -f png -s 2 -t -o output.drawio.png input.drawio
-```
+Internal drawio CLI command (`drawio -x -f png -s 2 -t -o output.drawio.png input.drawio`) and its flags:
 
-| Option | Description |
-|--------|-------------|
+| Option | Effect |
+|---|---|
 | `-x` | Export mode |
-| `-f png` | PNG format output |
-| `-s 2` | 2x scale (high resolution) |
+| `-f png` | PNG output format |
+| `-s 2` | 2× scale (high resolution) |
 | `-t` | Transparent background |
-| `-o` | Output file path |
-
-## 4. Layout Adjustment
-
-### 4.1. Coordinate Adjustment Steps
-
-1. Open `.drawio` file in text editor (plain XML format)
-2. Find `mxCell` for element to adjust (search by `value` attribute for text)
-3. Adjust coordinates in `mxGeometry` tag
-   - `x`: Position from left
-   - `y`: Position from top
-   - `width`: Width
-   - `height`: Height
-4. Run conversion and verify
-
-### 4.2. Coordinate Calculation
-
-- Element center coordinate = `y + (height / 2)`
-- To align multiple elements, calculate and match center coordinates
-
-## 5. Design Principles
-
-### 5.1. Basic Principles
-
-- Clarity: Create simple, visually clean diagrams
-- Consistency: Unify colors, fonts, icon sizes, line thickness
-- Accuracy: Do not sacrifice accuracy for simplification
+| `-o` | Output path |
 
-### 5.2. Element Rules
+## Font Settings
 
-- Label all elements
-- Use arrows to indicate direction
-  (prefer 2 unidirectional arrows over bidirectional)
-- Use latest official icons
-- Add legend to explain custom symbols
-
-### 5.3. Accessibility
-
-- Ensure sufficient color contrast
-- Use patterns in addition to colors
-
-### 5.4. Progressive Disclosure
-
-Separate complex systems into staged diagrams:
-
-| Diagram Type | Purpose |
-|--------------|---------|
-| Context Diagram | System overview from external perspective |
-| System Diagram | Main components and relationships |
-| Component Diagram | Technical details and integration points |
-| Deployment Diagram | Infrastructure configuration |
-| Data Flow Diagram | Data flow and transformation |
-| Sequence Diagram | Time-series interactions |
-
-### 5.5. Metadata
-
-Include title, description, last updated, author, and version in diagrams.
-
-## 6. Best Practices
-
-### 6.1. Background Color
-
-- Remove `background="#ffffff"`
-- Transparent background adapts to various themes
-
-### 6.2. Font Size
-
-- Use 1.5x standard font size (around 18px) for PDF readability
-
-### 6.3. Japanese Text Width
-
-- Allow 30-40px per character
-- Insufficient width causes unintended line breaks
+For Quarto / reveal.js slides, set font at both model and element level:
 
 ```xml
-<!-- For 10-character text, allow 300-400px -->
-<mxGeometry x="140" y="60" width="400" height="40" />
+<mxGraphModel defaultFontFamily="Noto Sans JP" ...>
+  ...
+  <mxCell style="text;html=1;fontSize=27;fontFamily=Noto Sans JP;" .../>
 ```
 
-### 6.4. Arrow Placement
+Font size: use 1.5× standard (≈18px minimum; 27px for slide titles). Japanese text: allow 30–40px per character width.
 
-- Always place arrows at back (position in XML right after Title)
-- Position arrows to avoid overlapping with labels
-- Keep arrow start/end at least 20px from label bottom edge
+## Layout Rules
 
-```xml
-<!-- Title -->
-<mxCell id="title" value="..." .../>
-
-<!-- Arrows (back layer) -->
-<mxCell id="arrow1" style="edgeStyle=..." .../>
+### Coordinate System
+- `x`, `y` = top-left corner of element bounding box
+- Vertical center = `y + height/2`; horizontal center = `x + width/2`
+- To align two elements on the same horizontal line: set their vertical center values equal
 
-<!-- Other elements (front layer) -->
-<mxCell id="box1" .../>
+### Container Margins
+```text
+Effective inner boundary = containerY + 30 + (strokeWidth × 2)
+                          to
+                          containerY + containerHeight - 30 - (strokeWidth × 2)
 ```
 
-### 6.5. Arrow Connection to Text Labels
-
-For text elements, exitX/exitY don't work, so use explicit coordinates:
-
+### Arrow Placement (XML Order)
 ```xml
-<!-- Good: Explicit coordinates with sourcePoint/targetPoint -->
-<mxCell id="arrow" style="..." edge="1" parent="1">
+<!-- 1. Title -->
+<mxCell id="title" value="Diagram Title" .../>
+
+<!-- 2. Arrows — MUST come before icons/boxes -->
+<mxCell id="arrow1" style="edgeStyle=orthogonalEdgeStyle;" edge="1" ...>
   <mxGeometry relative="1" as="geometry">
-    <mxPoint x="1279" y="500" as="sourcePoint"/>
-    <mxPoint x="119" y="500" as="targetPoint"/>
-    <Array as="points">
-      <mxPoint x="1279" y="560"/>
-      <mxPoint x="119" y="560"/>
-    </Array>
+    <mxPoint x="200" y="150" as="sourcePoint"/>
+    <mxPoint x="500" y="150" as="targetPoint"/>
   </mxGeometry>
 </mxCell>
-```
 
-### 6.6. edgeLabel Offset Adjustment
-
-Adjust offset attribute to distance arrow labels from arrows:
+<!-- 3. Icons and boxes (rendered on top of arrows) -->
+<mxCell id="ec2" style="shape=mxgraph.aws4.resourceIcon;resIcon=mxgraph.aws4.ec2" .../>
+```
 
+### Edge Label Offset
 ```xml
-<!-- Place above arrow (negative value to distance) -->
+<!-- Label above arrow -->
 <mxPoint x="0" y="-40" as="offset"/>
 
-<!-- Place below arrow (positive value to distance) -->
+<!-- Label below arrow -->
 <mxPoint x="0" y="40" as="offset"/>
 ```
 
-### 6.7. Remove Unnecessary Elements
-
-- Remove decorative icons irrelevant to context
-- Example: If ECR exists, separate Docker icon is unnecessary
-
-### 6.8. Labels and Headings
-
-- Service name only: 1 line
-- Service name + supplementary info: 2 lines with line break
-- Redundant notation (e.g., ECR Container Registry): shorten to 1 line
-- Use `&lt;br&gt;` tag for line breaks
-
-### 6.9. Background Frame and Internal Element Placement
-
-When placing elements inside background frames (grouping boxes),
-ensure sufficient margin.
-
-- YOU MUST: Internal elements must have at least 30px margin from frame boundary
-- YOU MUST: Account for rounded corners (`rounded=1`) and stroke width
-- YOU MUST: Always visually verify PNG output for overflow
-
-Coordinate calculation verification:
-
-```text
-Background frame: y=20, height=400 -> range is y=20-420
-Internal element top: frame y + 30 or more (e.g., y=50)
-Internal element bottom: frame y + height - 30 or less (e.g., up to y=390)
-```
-
-Bad example (may overflow):
-
-```xml
-<!-- Background frame -->
-<mxCell id="bg" style="rounded=1;strokeWidth=3;...">
-  <mxGeometry x="500" y="20" width="560" height="400" />
-</mxCell>
-<!-- Text: y=30 is too close to frame top (y=20) -->
-<mxCell id="label" value="Title" style="text;...">
-  <mxGeometry x="510" y="30" width="540" height="35" />
-</mxCell>
-```
-
-Good example (sufficient margin):
-
-```xml
-<!-- Background frame -->
-<mxCell id="bg" style="rounded=1;strokeWidth=3;...">
-  <mxGeometry x="500" y="20" width="560" height="430" />
-</mxCell>
-<!-- Text: y=50 is 30px from frame top (y=20) -->
-<mxCell id="label" value="Title" style="text;...">
-  <mxGeometry x="510" y="50" width="540" height="35" />
-</mxCell>
-```
-
-## 7. Reference
-
-- [Layout Guidelines](references/layout-guidelines.md)
-- [AWS Icons](references/aws-icons.md)
-- [AWS Icon Search Script](scripts/find_aws_icon.py)
-
-AWS icon search examples:
+## AWS Icon Reference
 
+Search for icons:
 ```sh
 python ~/.claude/skills/draw-io/scripts/find_aws_icon.py ec2
 python ~/.claude/skills/draw-io/scripts/find_aws_icon.py lambda
 ```
 
-## 8. Checklist
-
-- [ ] No background color set (page="0")
-- [ ] Font size appropriate (larger recommended)
-- [ ] Arrows placed at back layer
-- [ ] Arrows not overlapping labels (verify in PNG)
-- [ ] Arrow start/end sufficiently distant from labels (at least 20px)
-- [ ] Arrows not penetrating boxes or icons (verify in PNG)
-- [ ] Internal elements not overflowing background frame (verify in PNG)
-- [ ] 30px+ margin between background frame and internal elements
-- [ ] AWS service names are official names/correct abbreviations
-- [ ] AWS icons are latest version (mxgraph.aws4.*)
-- [ ] No unnecessary elements remaining
-- [ ] Visually verified PNG conversion
-
-## 9. Image Display in reveal.js Slides
-
-Add `auto-stretch: false` to YAML header:
-
-```yaml
----
-title: "Your Presentation"
-format:
-  revealjs:
-    auto-stretch: false
----
-```
-
-This ensures correct image display on mobile devices.
+Full catalog: [references/aws-icons.md](references/aws-icons.md)
+Layout rules in depth: [references/layout-guidelines.md](references/layout-guidelines.md)
+
+## Diagram Type Selector
+
+| When the user asks for… | Recommend this diagram type |
+|---|---|
+| "How does the system work?" | Context Diagram |
+| "What are the main components?" | System Diagram |
+| "How are services wired together?" | Component Diagram |
+| "Where does it run / what regions?" | Deployment Diagram |
+| "How does data move?" | Data Flow Diagram |
+| "What happens step by step?" | Sequence Diagram |
+
+## When Things Go Wrong
+
+| Symptom | Likely Cause | Fix |
+|---|---|---|
+| Icon renders as empty box or placeholder | Using `mxgraph.aws3.*` (deprecated) | Replace with `mxgraph.aws4.*` equivalent |
+| Arrow appears on top of icons, obscuring them | Arrow element declared after icon cells in XML | Move arrow `mxCell` entries above icon cells |
+| Text overflows container boundary in PNG | Margin calculation ignored strokeWidth | Add `strokeWidth × 2` to 30px base margin |
+| Connector endpoint snaps to wrong position | Using `exitX`/`exitY` on a text element | Switch to explicit `mxPoint sourcePoint`/`targetPoint` |
+| PNG background is white instead of transparent | `background="#ffffff"` in mxGraphModel | Remove attribute or set `page="0"` |
+| Image too small / blurry in reveal.js | `auto-stretch` not disabled | Add `auto-stretch: false` to slide YAML header |
+| Font differs between elements | `fontFamily` not set per-cell, only on model | Set `fontFamily=Noto Sans JP` in each cell's style |
+
+## Quality Checklist
+
+- [ ] No `background="#ffffff"` (transparent, `page="0"`)
+- [ ] Font size ≥ 18px; `fontFamily` set on model and each text cell
+- [ ] All arrows declared before icon/box cells in XML
+- [ ] Arrow endpoints 20px+ clear of label bounding boxes
+- [ ] Internal elements: 30px + (strokeWidth × 2) margin from container edges
+- [ ] AWS icons use `mxgraph.aws4.*` (not aws3)
+- [ ] AWS service labels use official full names
+- [ ] No redundant decorative icons
+- [ ] PNG conversion run and visually inspected
+- [ ] reveal.js: `auto-stretch: false` in YAML if used in slides
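
Note on the draw-io checklist above: the container margin rule (30px plus `strokeWidth × 2`) is easy to get wrong by hand, so a small check may help. The following is a minimal sketch, not part of the skill itself; `Rect`, `required_margin`, and `fits_inside` are hypothetical names introduced here for illustration, and the geometry is assumed to use the same x/y/width/height values as the `mxGeometry` elements.

```python
from dataclasses import dataclass

BASE_MARGIN = 30  # px; the base margin named in the checklist


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box mirroring mxGeometry's x, y, width, height."""
    x: float
    y: float
    width: float
    height: float


def required_margin(stroke_width: float) -> float:
    """Margin from the checklist: 30px + (strokeWidth x 2)."""
    return BASE_MARGIN + stroke_width * 2


def fits_inside(frame: Rect, child: Rect, stroke_width: float) -> bool:
    """Return True if child keeps the required margin on all four sides."""
    margin = required_margin(stroke_width)
    return (
        child.x >= frame.x + margin
        and child.y >= frame.y + margin
        and child.x + child.width <= frame.x + frame.width - margin
        and child.y + child.height <= frame.y + frame.height - margin
    )


if __name__ == "__main__":
    frame = Rect(x=500, y=20, width=560, height=430)
    label = Rect(x=510, y=50, width=540, height=35)
    # With strokeWidth=3 the margin is 36px, so x=510 is too close to x=500.
    print(fits_inside(frame, label, stroke_width=3))
```

A check like this only covers the arithmetic; the checklist's visual inspection of the PNG still applies, since rounded corners and font rendering are not modeled here.
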
diff --git a/skills/excalidraw/SKILL.md b/skills/excalidraw/SKILL.md
index a0b9fa0..aeb06c0 100644
--- a/skills/excalidraw/SKILL.md
+++ b/skills/excalidraw/SKILL.md
@@ -1,221 +1,90 @@
 ---
 name: excalidraw
-description: "Use when working with *.excalidraw or *.excalidraw.json files, user mentions diagrams/flowcharts, or requests architecture visualization - delegates all Excalidraw operations to subagents to prevent context exhaustion from verbose JSON (single files: 4k-22k tokens, can exceed read limits)"
+description: "Use when working with *.excalidraw or *.excalidraw.json files, user requests diagram creation/editing/explanation, mentions 'flowchart', 'architecture diagram', or 'Excalidraw' — delegates all file operations to subagents to prevent context exhaustion (single files: 4k–22k tokens)"
 ---
 
-# Excalidraw Subagent Delegation
+## Mindset
 
-## Overview
+- Excalidraw files are **verbosity traps**: 79 elements × ~280 tokens/element = 22k tokens, but only `text` and `arrow` elements carry semantic content (~10% of file). The other 90% is visual noise.
+- **Subagent isolation is free.** A subagent reading a 22k-token file costs nothing to the main context — only its summary (~500 tokens) crosses the boundary. This is always the right trade.
+- The rationalization "I'll just quickly check the file" is the most common failure mode. "Quick checks" load the full JSON regardless of intent.
+- Token cost comes from volume, not complexity. A "simple, straightforward" Excalidraw file is still 4k–22k tokens.
+- Frame elements are grouping hints — subagents should use them to chunk their summaries into logical sections rather than flat component lists.
 
-**Core principle:** Main agents NEVER read Excalidraw files directly. Always delegate to subagents to isolate context consumption.
+## Navigation
 
-Excalidraw files are JSON with high token cost but low information density. Single files range from 4k-22k tokens (largest can exceed read tool limits). Reading multiple diagrams quickly exhausts context budget (7 files = 67k tokens = 33% of budget).
+**Use this skill when:**
+- File path ends in `.excalidraw` or `.excalidraw.json`
+- User says: explain/update/create/compare diagram, show architecture, visualize flow, draw a flowchart
+- Any operation touching Excalidraw files, including "just checking what's in" one
 
-## The Problem
+**Do NOT use this skill when:**
+- User is asking about Mermaid, draw.io, Lucidchart, or other diagram formats
+- User wants to generate a diagram concept only (no file I/O) — then just produce the design description
 
-Excalidraw JSON structure:
-- Each shape has 20+ properties (x, y, width, height, strokeColor, seed, version, etc.)
-- Most properties are visual metadata (positioning, styling, roughness)
-- Actual content: text labels and element relationships (<10% of file)
-- **Signal-to-noise ratio is extremely low**
-
-Example: 14-element diagram = 596 lines, 16K, ~4k tokens. 79-element diagram = 2,916 lines, 88K, ~22k tokens (exceeds read limit).
-
-## When to Use
-
-**Trigger on ANY of these:**
-- File path contains `.excalidraw` or `.excalidraw.json`
-- User requests: "explain/update/create diagram", "show architecture", "visualize flow"
-- User mentions: "flowchart", "architecture diagram", "Excalidraw file"
-- Architecture/design documentation tasks involving visual artifacts
-
-**Use delegation even for:**
-- "Small" files (smallest is 4k tokens - still significant)
-- "Quick checks" (checking component names still loads full JSON)
-- Single file operations (isolation prevents context pollution)
-- Modifications (don't need full format understanding in main context)
-
-## Delegation Pattern
-
-### Main Agent Responsibilities
-
-**NEVER:**
-- ❌ Use Read tool on *.excalidraw files
-- ❌ Parse Excalidraw JSON in main context
-- ❌ Load multiple diagrams for comparison
-- ❌ Inspect file to "understand the format"
-
-**ALWAYS:**
-- ✅ Delegate ALL Excalidraw operations to subagents
-- ✅ Provide clear task description to subagent
-- ✅ Request text-only summaries (not raw JSON)
-- ✅ Keep diagram analysis isolated from main work
-
-### Subagent Task Templates
-
-#### Read/Understand Operation
-```
-Task: Extract and explain the components in [file.excalidraw.json]
-
-Approach:
-1. Read the Excalidraw JSON
-2. Extract only text elements (ignore positioning/styling)
-3. Identify relationships between components
-4. Summarize architecture/flow
-
-Return:
-- List of components/services with descriptions
-- Connection/dependency relationships
-- Key insights about the architecture
-- DO NOT return raw JSON or verbose element details
-```
-
-#### Modify Operation
-```
-Task: Add [component] to [file.excalidraw.json], connected to [existing-component]
-
-Approach:
-1. Read file to identify existing elements
-2. Find [existing-component] and its position
-3. Create new element JSON for [component]
-4. Add arrow elements for connections
-5. Write updated file
-
-Return:
-- Confirmation of changes made
-- Position of new element
-- IDs of created elements
-```
-
-#### Create Operation
-```
-Task: Create new Excalidraw diagram showing [description]
-
-Approach:
-1. Design layout for [number] components
-2. Create rectangle elements with text labels
-3. Add arrows showing relationships
-4. Use consistent styling (colors, fonts)
-5. Write to [file.excalidraw.json]
-
-Return:
-- Confirmation of file created
-- Summary of components included
-- File location
-```
-
-#### Compare Operation
+**Decision tree:**
 ```
-Task: Compare architecture approaches in [file1] vs [file2]
-
-Approach:
-1. Read both files
-2. Extract text labels from each
-3. Identify structural differences
-4. Compare component relationships
-
-Return:
-- Key differences in architecture
-- Components unique to each approach
-- Relationship/flow differences
-- DO NOT return full element details from both files
+Does the task touch a *.excalidraw file?
+  YES → delegate to subagent (no exceptions)
+  NO  → does it involve creating a new diagram?
+          YES → delegate creation to subagent, get back file path + summary
+          NO  → this skill is not needed
 ```
 
-## Common Rationalizations (STOP and Delegate Instead)
-
-| Excuse | Reality | What to Do |
-|--------|---------|------------|
-| "Direct reading is most efficient" | Consumes 4k-22k tokens unnecessarily | Delegate to subagent |
-| "It's token-efficient to read directly" | Baseline tests showed 9-45% budget used | Always delegate |
-| "This is optimal for one-time analysis" | "One-time" still pollutes main context | Subagent isolation |
-| "The JSON is straightforward" | Simplicity ≠ token efficiency | Delegate anyway |
-| "I need to understand the format" | Format understanding not needed in main agent | Subagent handles format |
-| "Within reasonable bounds" (18k tokens) | "Reasonable" is subjective rationalization | Hard rule: delegate |
-| "Just a quick check of components" | "Quick check" still loads full JSON | Extract text via subagent |
-| "File is small (16K)" | 4k tokens is NOT small | Size threshold doesn't matter |
-
-## Red Flags - STOP and Delegate
-
-Catch yourself about to:
-- Use Read tool on .excalidraw file
-- "Quickly check" what components exist
-- "Understand the structure" before modifying
-- Load file to "see what's there"
-- Compare multiple diagrams side-by-side
-- Parse JSON to "extract just the text"
-
-**All of these mean: Use Task tool with subagent instead.**
+## Philosophy
 
-## Quick Reference
-
-| Operation | Main Agent Action | Subagent Returns |
-|-----------|-------------------|------------------|
-| **Understand diagram** | Delegate with "Extract and explain" template | Component list + relationships |
-| **Modify diagram** | Delegate with "Add [X] connected to [Y]" template | Confirmation + changes made |
-| **Create diagram** | Delegate with "Create showing [description]" template | File location + summary |
-| **Compare diagrams** | Delegate with "Compare [A] vs [B]" template | Key differences (not raw JSON) |
-
-## Token Analysis (Why This Matters)
+Main agents are orchestrators, not file parsers. Every token spent parsing Excalidraw JSON is a token stolen from reasoning. Subagents are disposable context workers — use them for all file I/O and receive only the extracted meaning.
 
-Real data from baseline testing:
+## NEVER
 
-| Scenario | Without Delegation | With Delegation | Savings |
-|----------|-------------------|-----------------|---------|
-| Single large file | 22k tokens (45% budget) | ~500 tokens (subagent summary) | 98% |
-| Two-file comparison | 18k tokens (9% budget) | ~800 tokens (diff summary) | 96% |
-| Modification task | 14k tokens (7% budget) | ~300 tokens (confirmation) | 98% |
+- NEVER use the Read tool on a `.excalidraw` or `.excalidraw.json` file — because the read succeeds silently and consumes 4k–22k tokens before you realize it; there is no size warning.
+- NEVER rationalize reading a file as "one-time" or "just this once" — because every session starts fresh, so every read is "just this once"; the exception becomes the pattern.
+- NEVER pass raw Excalidraw JSON (or element arrays) back through subagent return values — because a 79-element return dumps the full 22k tokens right back into main context, defeating isolation entirely.
+- NEVER skip delegation for "small" files — because even the smallest real Excalidraw file is 4k tokens, and you cannot know the size without reading it (which is exactly what you're trying to avoid).
+- NEVER tell a subagent to "read the file and describe everything you see" — because without explicit constraints, subagents default to quoting element details verbatim; always specify the Return format with "plain text only, no JSON".
+- NEVER attempt to write Excalidraw JSON from memory without the element schema — because generated files with wrong `version`, missing `appState`, or invalid `id` formats silently fail to open in Excalidraw; use the schema in references/delegation-templates.md.
+- NEVER use grep/strings to extract text from Excalidraw files in main context, because every matched line and its surrounding JSON still lands in the main context; run `jq` inside the subagent instead.
 
-**Context pollution impact:**
-- Reading all 7 project diagrams: 67k tokens (33% of 200k budget)
-- With delegation: ~2k tokens (isolated in subagents)
-- **Savings: 97% context budget preserved**
+## When Things Go Wrong
 
-## Implementation Example
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Subagent returns a wall of JSON instead of a summary | Task prompt missing explicit Return format constraint | Re-delegate with "IMPORTANT: Return ONLY plain text bullet lists — no JSON, no element IDs" prepended |
+| File exceeds Read tool limit in subagent (>2000 lines) | 79+ element diagrams hit the line cap | Instruct subagent to use Bash + `jq` to extract text/arrow elements before reading |
+| Created diagram won't open in Excalidraw | Invalid file structure or missing required fields | Verify subagent used the canonical wrapper: `{"type":"excalidraw","version":2,"source":"","elements":[...],"appState":{"gridSize":null},"files":{}}` |
+| Subagent positions new elements overlapping existing ones | No collision avoidance in modification prompt | Provide explicit position: tell subagent to find the bounding box of existing elements and offset by 200px |
+| Arrows missing from created diagram | Subagent created nodes but forgot connection elements | Check that the task template included explicit arrow creation steps; re-delegate with the Modify template from references/ |
 
-**❌ BAD (Direct Read):**
-```
-User: "What architecture is shown in detailed-architecture.excalidraw.json?"
-Agent: Let me read that file... [reads 22k tokens into main context]
-```
-
-**✅ GOOD (Subagent Delegation):**
-```
-User: "What architecture is shown in detailed-architecture.excalidraw.json?"
-Agent: I'll use a subagent to extract the architecture details.
-
-[Dispatches Task tool with general-purpose subagent]
-Task: Extract and explain components in .ryanquinn3/ticketing/detailed-architecture.excalidraw.json
-
-[Receives ~500 token summary with component list and relationships]
-[Responds to user with architecture explanation, main context preserved]
-```
-
-## Why "Straightforward JSON" Doesn't Matter
-
-Agents often rationalize: "The format is simple, I can just read it."
+## Delegation Pattern
 
-**The problem isn't complexity - it's verbosity:**
-- Simple structure with 20+ properties per element
-- Repetitive metadata (seed, version, nonce, roughness)
-- Positioning data (x, y, width, height) not semantically useful
-- Visual styling (strokeColor, opacity, fillStyle) irrelevant to content
+**Main agent responsibilities:**
+1. Detect Excalidraw file involvement
+2. Choose the correct template (Read / Modify / Create / Compare)
+3. Dispatch Task tool with the template
+4. Receive text-only summary
+5. Respond to user — main context never touches the file
 
-**Token cost comes from volume, not complexity.**
+**Load the full templates:** `references/delegation-templates.md` — read this file before constructing subagent tasks. It contains:
+- Exact `jq` commands for content extraction
+- Element schema (which fields carry meaning vs. noise)
+- Standard element dimensions and positioning conventions
+- Subagent failure recovery prompts
 
-Even "straightforward" JSON consumes 4k-22k tokens because:
-- 79 elements × ~280 tokens/element = 22k tokens
-- Most tokens are metadata noise
-- Only text labels and relationships matter (~10% of content)
+## Quick Reference
 
-## The Iron Law
+| Operation | Template to Use | Subagent Returns |
+|-----------|----------------|------------------|
+| Understand diagram | Read/Understand | Component list + directed relationships + summary |
+| Add element | Modify/Add Element | New element id + position + confirmation |
+| Create new diagram | Create New Diagram | File path + component count + bulleted list |
+| Compare two diagrams | Compare Two Diagrams | Set differences + relationship deltas |
 
-**Main agents NEVER read Excalidraw files. No exceptions.**
+## Token Reality Check
 
-Not for:
-- "Quick checks"
-- "Small files"
-- "Understanding format"
-- "One-time analysis"
-- "Optimal efficiency"
+| Scenario | Without Delegation | With Delegation |
+|----------|--------------------|-----------------|
+| Single large file (79 elements) | 22k tokens (~44% of a 50k budget) | ~500 tokens |
+| 7-diagram project survey | 67k tokens (33% of 200k budget) | ~2k tokens |
+| Two-file comparison | 18k tokens | ~800 tokens |
 
-**Always delegate. Isolation is free via subagents.**
+**The Iron Law: Main agents NEVER read Excalidraw files. Delegation is not optional.**
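
Note on the excalidraw skill above: its decision tree and token arithmetic can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not code shipped with the skill; `Action`, `is_excalidraw_path`, `choose_action`, and `estimate_tokens` are hypothetical names, and the 280 tokens/element figure is the rough estimate quoted in the Mindset section rather than a measured constant.

```python
from enum import Enum
from typing import Iterable

TOKENS_PER_ELEMENT = 280  # rough per-element cost quoted in the skill text
EXCALIDRAW_SUFFIXES = (".excalidraw", ".excalidraw.json")


class Action(Enum):
    DELEGATE_FILE_OPERATION = "delegate file operation to subagent"
    DELEGATE_CREATION = "delegate diagram creation to subagent"
    SKILL_NOT_NEEDED = "skill not needed"


def is_excalidraw_path(path: str) -> bool:
    """True when the path ends in one of the Excalidraw suffixes."""
    return path.lower().endswith(EXCALIDRAW_SUFFIXES)


def choose_action(paths: Iterable[str], creates_new_diagram: bool) -> Action:
    """Walk the decision tree: existing file first, then new-diagram creation."""
    if any(is_excalidraw_path(p) for p in paths):
        return Action.DELEGATE_FILE_OPERATION
    if creates_new_diagram:
        return Action.DELEGATE_CREATION
    return Action.SKILL_NOT_NEEDED


def estimate_tokens(element_count: int) -> int:
    """Estimate file cost from element count, without reading the file."""
    return element_count * TOKENS_PER_ELEMENT


if __name__ == "__main__":
    print(choose_action(["docs/arch.excalidraw.json"], False).value)
    print(estimate_tokens(79))  # 79 * 280 = 22120, the "22k" figure above
```

The path check deliberately runs before any size estimate, matching the rule that file size is never a reason to skip delegation.
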
diff --git a/skills/excalidraw/references/delegation-templates.md b/skills/excalidraw/references/delegation-templates.md
new file mode 100644
index 0000000..1cce589
--- /dev/null
+++ b/skills/excalidraw/references/delegation-templates.md
@@ -0,0 +1,130 @@
+# Excalidraw Delegation Templates
+
+Use these templates verbatim when creating subagent tasks. Do NOT modify the Return constraints — they prevent subagents from returning raw JSON into the main context.
+
+## Element Schema (What Subagents Work With)
+
+An Excalidraw file is a JSON object. The top-level keys relevant here:
+- `type: "excalidraw"` — file marker
+- `elements: []` — array of element objects
+- `appState: {}` — viewport/UI state (ignore for content extraction)
+- `files: {}` — embedded images by hash (ignore unless user asks about images)
+
+### Element Types and Useful Fields
+
+| type | Content Fields | Ignore Fields |
+|------|---------------|---------------|
+| `rectangle` / `ellipse` / `diamond` | `id`, `text` (if labeled), `groupIds` | `x`, `y`, `width`, `height`, `seed`, `version`, `roughness`, `strokeColor`, `fillStyle`, `opacity` |
+| `text` | `id`, `text`, `fontSize`, `groupIds` | All positioning, all styling |
+| `arrow` / `line` | `id`, `startBinding.elementId`, `endBinding.elementId`, `label.text` | All geometric points |
+| `frame` | `id`, `name` | All positioning |
+
+**Extraction strategy for subagents:** `jq '[.elements[] | select(.type == "text" or .type == "arrow") | {type, id, text, startBinding, endBinding, label}]'` keeps only the content-bearing fields, shrinking a 22k-token file to a small fraction of its size before processing.
+
+---
+
+## Template: Read / Understand
+
+```
+Task: Extract and explain the architecture in [FILE.excalidraw.json]
+
+Steps:
+1. Use jq to extract text and arrow elements only: jq '[.elements[] | select(.type == "text" or .type == "arrow") | {type, id, text, startBinding, endBinding, label}]' [FILE]
+2. Build a component list from text elements
+3. Build a relationship map from arrows (startBinding.elementId → endBinding.elementId, resolved to text labels)
+4. Identify any frames (logical groupings)
+
+Return (text only — no raw JSON):
+- Component list with descriptions
+- Directed relationships (A → B format)
+- Frame/group boundaries if present
+- One-paragraph architecture summary
+```
+
+---
+
+## Template: Modify / Add Element
+
+```
+Task: Add [COMPONENT] to [FILE.excalidraw.json], connected to [EXISTING-COMPONENT]
+
+Steps:
+1. Read file; use jq to find [EXISTING-COMPONENT] element id and bounding box
+2. Choose a position that does not overlap existing elements (place right/below with 100px gap)
+3. Generate a new element object:
+   - type: "rectangle"
+   - id: crypto.randomUUID() or nanoid(20)
+   - x/y: computed from step 2
+   - width: 160, height: 60 (standard box)
+   - strokeColor: "#1e1e1e", backgroundColor: "transparent", fillStyle: "solid"
+   - text: "[COMPONENT]"
+4. Generate an arrow element from [EXISTING-COMPONENT].id to new element.id
+5. Append both to elements array; write file
+
+Return:
+- New element id
+- Chosen position (x, y)
+- Arrow id
+- Confirmation "File updated: [N] total elements"
+```
+
+---
+
+## Template: Create New Diagram
+
+```
+Task: Create a new Excalidraw diagram showing [DESCRIPTION], write to [FILE.excalidraw.json]
+
+Steps:
+1. List all components/nodes and their relationships from [DESCRIPTION]
+2. Plan a grid or flow layout (top-to-bottom for flows, left-to-right for pipelines)
+3. Assign positions: start at x=100, y=100; spacing: 200px horizontal, 150px vertical
+4. Create element objects for each node (rectangle + text pair or rectangle with label)
+5. Create arrow objects for each relationship
+6. Wrap in Excalidraw file structure:
+   {"type":"excalidraw","version":2,"source":"","elements":[...],"appState":{"gridSize":null},"files":{}}
+7. Write to [FILE]
+
+Return:
+- File path written
+- Component count
+- Relationship count
+- Bulleted list of components included
+```
+
+---
+
+## Template: Compare Two Diagrams
+
+```
+Task: Compare [FILE1.excalidraw.json] vs [FILE2.excalidraw.json]
+
+Steps:
+1. Extract text labels from FILE1 (jq as above)
+2. Extract text labels from FILE2 (jq as above)
+3. Compute set differences: only-in-A, only-in-B, shared
+4. Compare arrow relationships: list connections unique to each file
+
+Return (no raw JSON):
+- Components only in [FILE1]: ...
+- Components only in [FILE2]: ...
+- Shared components: ...
+- Key relationship differences
+- One-sentence summary of architectural difference
+```
+
+---
+
+## Subagent Failure Recovery
+
+If a subagent returns raw JSON or element dumps, do NOT pass it to main context. Re-delegate with this override prepended to the task:
+
+```
+IMPORTANT: Your previous attempt returned raw JSON. Return ONLY:
+- Plain text bullet lists
+- No JSON objects
+- No element IDs unless specifically requested
+- Summarize, do not quote
+```
+
+If the file exceeds the Read tool limit (>2000 lines), instruct the subagent to use Bash with `jq` to extract content rather than the Read tool.
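
Note on the delegation templates above: the Read/Understand steps (extract text and arrows, then resolve arrow bindings to labels) can also be sketched in Python for a subagent that has no `jq` available. This is a hedged sketch, not the skill's method. It assumes that a label drawn inside a shape is stored as a separate `text` element whose `containerId` points at the shape, and that deleted elements carry `isDeleted: true`; both are assumptions about the file format that should be checked against real files. `extract_summary` is a hypothetical helper name.

```python
import json


def extract_summary(doc: dict) -> dict:
    """Reduce an Excalidraw document to component labels and A -> B relationships."""
    elements = [e for e in doc.get("elements", []) if not e.get("isDeleted")]
    arrow_ids = {e["id"] for e in elements if e.get("type") == "arrow"}

    labels = {}       # element id -> human-readable label
    components = []   # labels that name nodes (not arrow labels)
    for e in elements:
        if e.get("type") != "text":
            continue
        text = e.get("text", "")
        labels[e["id"]] = text
        container = e.get("containerId")  # assumed bound-text convention
        if container:
            labels[container] = text
        if container not in arrow_ids:
            components.append(text)

    relationships = []
    for e in elements:
        if e.get("type") != "arrow":
            continue
        start = (e.get("startBinding") or {}).get("elementId")
        end = (e.get("endBinding") or {}).get("elementId")
        if start and end:
            # Fall back to the raw id when no label is bound to the shape.
            relationships.append(f"{labels.get(start, start)} -> {labels.get(end, end)}")

    return {"components": sorted(set(components)), "relationships": relationships}


if __name__ == "__main__":
    sample = json.loads("""
    {"type": "excalidraw", "version": 2, "source": "", "elements": [
      {"id": "r1", "type": "rectangle"},
      {"id": "t1", "type": "text", "text": "API", "containerId": "r1"},
      {"id": "r2", "type": "rectangle"},
      {"id": "t2", "type": "text", "text": "DB", "containerId": "r2"},
      {"id": "a1", "type": "arrow",
       "startBinding": {"elementId": "r1"}, "endBinding": {"elementId": "r2"}}
    ], "appState": {"gridSize": null}, "files": {}}
    """)
    print(extract_summary(sample))
```

The returned dictionary holds plain strings only, so a subagent can format it as bullet lists and stay within the "no raw JSON" Return constraint.
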
diff --git a/skills/feedback-mastery/SKILL.md b/skills/feedback-mastery/SKILL.md
index 4f2c6ad..ea2e3dd 100644
--- a/skills/feedback-mastery/SKILL.md
+++ b/skills/feedback-mastery/SKILL.md
@@ -1,292 +1,110 @@
 ---
 name: feedback-mastery
-description: Navigate difficult conversations and deliver constructive feedback using structured frameworks. Covers the Preparation-Delivery-Follow-up model and Situation-Behavior-Impact (SBI) feedback technique. Use when preparing for difficult conversations, giving feedback, or managing conflicts.
+description: Deliver high-stakes feedback, navigate difficult workplace conversations, and coach others through performance issues. Use when giving feedback, preparing for a hard 1:1, addressing conflict, managing underperformance, or coaching someone on receiving feedback. Keywords: feedback, difficult conversation, performance, conflict, behavior, 1:1, one-on-one, confrontation, underperformance, coaching.
 allowed-tools: Read, Glob, Grep
 ---
 
-# Feedback Conversations
+# Feedback Mastery
 
-## Overview
+## Mindset
 
-This skill provides frameworks for navigating difficult workplace conversations and delivering effective feedback. Whether you're addressing performance issues, resolving conflicts, or giving constructive feedback, these structured approaches lead to better outcomes.
+1. **Feedback is information transfer, not judgment delivery.** The moment the receiver perceives evaluation, their prefrontal cortex partially shuts down: defensiveness is physiological, not a character flaw. Frame it as "I noticed / I want to understand", not "you did X wrong."
 
-**Core insight:** Research shows that employees who approach difficult conversations with preparation and a clear framework are **60% more likely to reach a positive resolution** than those who engage without a plan.
+2. **Timing destroys or amplifies impact.** Feedback given within 48 hours of an event lands 3–4x more effectively than delayed feedback. "Annual review surprises" are managerial malpractice — they signal you withheld useful information for months.
 
-## When to Use This Skill
+3. **Psychological safety is the prerequisite, not the outcome.** If the receiver doesn't feel safe, no framework saves you. Check safety first: if someone is visibly activated (defensive posture, clipped answers, raised voice), name it before proceeding.
 
-Use this skill when:
+4. **The goal of feedback is behavior change, not catharsis.** If you need to vent, do it elsewhere. The conversation exists to help them, not to make you feel like you said something.
 
-- Preparing to give feedback to a colleague or direct report
-- Addressing performance issues or missed expectations
-- Navigating conflict between team members
-- Having 1:1 conversations about sensitive topics
-- Receiving feedback and wanting to respond constructively
-- Managing expectations with stakeholders
+5. **Pattern vs. instance matters more than severity.** A single bad code review matters less than 3 in a row. Leading with "I've noticed a pattern" signals seriousness without catastrophizing a one-time event.
 
-**Keywords**: feedback, difficult conversation, 1:1, one-on-one, performance, conflict, expectations, behavior, confrontation
+## Navigation
 
-## Core Frameworks
+**Use this skill when**:
+- Preparing to give feedback (corrective or positive) to a colleague or direct report
+- Navigating conflict, tension, or misaligned expectations with stakeholders
+- Coaching someone who received feedback poorly and needs to process it
+- Handling underperformance conversations where HR escalation is possible
+- Facilitating a conversation between two people in conflict
 
-### The Preparation-Delivery-Follow-up Model
+**Do NOT use this skill when**:
+- Writing performance review prose (use a writing skill instead)
+- Termination conversations (require HR, legal review — out of scope)
+- Therapy or mental health support (refer appropriately)
 
-A three-part structure for difficult conversations:
-
-| Phase | Focus | Key Questions |
-| --- | --- | --- |
-| **Preparation** | Understand the issue, define goals, manage emotions | What's the problem? What outcome do I want? Am I calm? |
-| **Delivery** | Open neutrally, use facts not blame, encourage dialogue | How do I start? What evidence do I have? How do I involve them? |
-| **Follow-up** | Document actions, set check-ins, provide support | What did we agree to? When will we check in? How do I support? |
-
-### The SBI Feedback Model
-
-**Situation-Behavior-Impact (SBI)** structures feedback to be specific, objective, and actionable:
-
-| Component | Description | Example |
-| --- | --- | --- |
-| **Situation** | Describe the specific context | "During yesterday's code review..." |
-| **Behavior** | State the observable action (not interpretation) | "...you interrupted Sarah three times while she was explaining her approach..." |
-| **Impact** | Explain the effect on team/project/person | "...which made her hesitate to share ideas and slowed down our discussion." |
-
-**Why it works:** SBI removes assumptions and focuses on observable facts, reducing defensiveness.
-
-## Preparation Phase
-
-### Step 1: Understand the Issue
-
-Ask yourself:
-
-- **What exactly is the problem?** (Be specific, not vague)
-- **How does it impact the team, project, or company?**
-- **Have I gathered all relevant facts?**
-- **Is this a pattern or a one-time event?**
-
-### Step 2: Define Your Goals
-
-Before the conversation, clarify what you're seeking:
-
-| Goal Type | Example |
-| --- | --- |
-| Behavior change | "I want them to submit code reviews on time" |
-| Mutual understanding | "I want to understand what's blocking them" |
-| Expectation setting | "I want to clarify what 'done' means for features" |
-| Problem solving | "I want to find a solution together" |
-
-**Tip:** Use if-then statements to clarify stakes:
-> "If this behavior continues, then the project timeline will suffer, leading to missed deliverables."
-
-### Step 3: Manage Your Emotions
-
-High emotional intensity reduces cognitive processing by 30%. Before the conversation:
-
-- [ ] Am I calm and in control?
-- [ ] Have I separated facts from personal frustrations?
-- [ ] Have I considered their perspective?
-- [ ] Can I present this without accusation?
-
-**Reframing technique:**
-
-| Accusatory | Constructive |
-| --- | --- |
-| "You always miss deadlines and it slows everyone down" | "I've noticed some recent delays and want to understand any challenges you're facing" |
-| "You never test your code properly" | "I've seen a few bugs slip through recently. Let's talk about our testing process" |
-
-## Delivery Phase
-
-### The Three-Step Delivery Formula
-
-1. **Open with neutrality and intent**
-2. **Present the issue using facts, not blame**
-3. **Encourage dialogue and solutions**
-
-### Opening Lines That Work
-
-| Context | Opening |
-| --- | --- |
-| General | "I want to talk about something important to our team's success, and I'd love to hear your perspective." |
-| Performance | "I've noticed some patterns I'd like to discuss. My goal is to support you, not criticize." |
-| Conflict | "I sense there might be some tension, and I'd like to understand what's happening from your side." |
-| Expectations | "I want to make sure we're aligned on expectations. Can we talk through how this project is going?" |
-
-### Facts, Not Blame
-
-| Blaming | Factual |
-| --- | --- |
-| "You're not committed to this project" | "I've noticed your updates have been brief in our last three meetings. Is something affecting your workload?" |
-| "You don't care about code quality" | "This PR had 12 bugs caught in QA. Let's talk about what happened and how we can improve" |
-| "You're always late" | "The standup started at 9:00 and you joined at 9:15 the last three days. What's going on?" |
-
-**Key principles:**
-
-- Use specific examples, not generalizations ("always," "never")
-- Stick to observable behaviors, not assumptions about motives
-- Focus on impact, not character
-
-### Encouraging Dialogue
-
-After stating your observation, shift to collaboration:
-
-| Situation | Dialogue Prompt |
-| --- | --- |
-| Understanding barriers | "What's been challenging about this?" |
-| Seeking their view | "How do you see the situation?" |
-| Finding solutions | "What would help you succeed here?" |
-| Checking alignment | "Does this match your understanding of what happened?" |
-
-## Follow-up Phase
-
-Even successful conversations need follow-through to create lasting change.
-
-### Follow-up Checklist
-
-- [ ] **Document agreed-upon action items** - What specifically will change?
-- [ ] **Set check-in dates** - When will you revisit this?
-- [ ] **Provide ongoing support** - How will you help them succeed?
-- [ ] **Celebrate progress** - Recognize improvements when they happen
-
-### Sample Follow-up Message
-
-```markdown
-Hi [Name],
-
-Thanks for the conversation yesterday. I appreciated your openness.
-
-**What we agreed to:**
-- [Action item 1] - [Timeline]
-- [Action item 2] - [Timeline]
-
-**Check-in:** Let's reconnect [date] to see how things are going.
-
-I'm here if you need any support. Thanks for working through this with me.
-
-Best,
-[Your name]
+**Quick decision tree for ambiguous inputs:**
+```
+Is this corrective or positive feedback?
+  ├─ Corrective + first time → SBI + curiosity (explore root cause)
+  ├─ Corrective + recurring pattern → SBI + explicit stakes + plan
+  ├─ Corrective + HR-level severity → load references/difficult-conversation-scripts.md
+  └─ Positive → SBI + specific impact (skip the "but" — never pair positive/corrective)
 ```
 
-## SBI Examples for Software Teams
-
-### Positive Feedback
-
-**Code Review:**
-> **Situation:** "During Tuesday's code review for the authentication module..."
-> **Behavior:** "...you provided detailed comments on potential security vulnerabilities and suggested efficient fixes..."
-> **Impact:** "...which strengthened our security posture and saved the team hours of debugging later."
-
-**Collaboration:**
-> **Situation:** "In yesterday's architecture discussion..."
-> **Behavior:** "...you asked clarifying questions and built on others' ideas instead of pushing your own solution..."
-> **Impact:** "...which helped us reach consensus faster and made everyone feel heard."
-
-### Constructive Feedback
-
-**Missed Deadlines:**
-> **Situation:** "When we were finalizing the API deployment last Thursday..."
-> **Behavior:** "...your testing results came in two hours after our agreed cutoff..."
-> **Impact:** "...which delayed the release, risked our SLA, and caused the QA team to work overtime."
-
-**Meeting Behavior:**
-> **Situation:** "In our sprint planning yesterday..."
-> **Behavior:** "...you were on your phone for most of the discussion and didn't contribute when we asked for estimates..."
-> **Impact:** "...which left the team without your expertise on the backend stories and made others feel their time wasn't valued."
-
-**For more examples:** See `references/feedback-sbi-model.md`
-
-## Common Difficult Scenarios
-
-### Scenario: Performance Issue
-
-**Situation:** A developer consistently delivers code with bugs.
-
-**Approach:**
-
-1. **Prepare:** Gather specific examples (PRs, bug counts, timelines)
-2. **Deliver:** "I've noticed [X bugs in last Y PRs]. I want to understand what's happening and how I can support you."
-3. **Explore:** Ask about workload, clarity of requirements, testing confidence
-4. **Collaborate:** "What would help you feel more confident about code quality?"
-5. **Follow-up:** Check in after agreed changes, recognize improvements
-
-### Scenario: Conflict Between Team Members
-
-**Situation:** Two engineers disagree on technical approach and it's affecting the team.
-
-**Approach:**
-
-1. **Meet separately first:** Understand each perspective
-2. **Find common ground:** What do they both want? (Working product, good code, etc.)
-3. **Facilitate together:** Focus on facts and trade-offs, not personalities
-4. **Establish decision process:** How will the team decide when there's disagreement?
-5. **Follow-up:** Check that the solution is working
-
-### Scenario: Unrealistic Expectations
+## Philosophy
 
-**Situation:** Leadership wants a feature in half the time needed.
+Feedback is an act of respect — it assumes the person can change and is worth investing in. Withholding honest feedback to avoid discomfort is not kindness; it's abandonment. Deliver what's true, specifically, without cruelty.
 
-**Approach:**
+## NEVER
 
-1. **Prepare:** Data on similar past work, breakdown of required tasks
-2. **Deliver:** "I want to make sure we're aligned on what's realistic. Here's what I'm seeing..."
-3. **Present trade-offs:** "We can hit that date if we [reduce scope/add people/accept risk]"
-4. **Collaborate:** "What's most important here - the date or the full feature set?"
-5. **Document:** Get agreement in writing to avoid future misalignment
+- **NEVER use the feedback sandwich (positive-negative-positive)** — because research shows it trains people to distrust positive feedback and miss the corrective message. Receivers remember the bread, not the filling. Use pure SBI instead.
 
-**For detailed scripts:** See `references/difficult-conversation-scripts.md`
+- **NEVER give corrective feedback in public** — because social threat appears to engage neural circuits that overlap with those active under physical threat (fMRI studies by Lieberman, 2013). Public correction compounds shame and tends to trigger defensiveness; the person will protect their ego rather than process the content.
 
-## Receiving Feedback Well
+- **NEVER delay feedback indefinitely to "find the right moment"** — because recall of specific details fades within days; treat roughly 72 hours as the outer limit. Waiting for perfect conditions means the specific behavior is no longer vivid to either party, and the conversation devolves into abstract debate about character.
 
-When you're on the receiving end:
+- **NEVER pair positive and corrective feedback in the same sentence with "but"** — because listeners tend to discount everything that precedes "but". "You did great work on the API, but the tests were weak" lands as "your tests were weak." Use separate conversations, or at minimum separate paragraphs with a full stop between them.
 
-### During the Conversation
+- **NEVER give feedback when you or the receiver are emotionally flooded** — because cortisol and adrenaline impair complex cognition in both parties. A flooded conversation produces defensive agreements that don't stick. Reschedule explicitly: "I want to have this conversation when we're both at our best — can we do 10am tomorrow?"
 
-1. **Listen fully** - Don't prepare your defense while they're talking
-2. **Ask clarifying questions** - "Can you give me a specific example?"
-3. **Paraphrase to confirm** - "So what you're saying is..."
-4. **Acknowledge impact** - Even if intent was different: "I can see how that affected you"
-5. **Don't get defensive** - Thank them for raising it
+- **NEVER interpret behavior aloud without evidence** — because attributing motive ("you clearly don't care about quality") activates the fundamental attribution error and makes the receiver defend their character rather than examine their behavior. Stick to observable actions and their effects.
 
-### After the Conversation
+- **NEVER skip the "impact" step of SBI** — because behavior without impact sounds like nitpicking. Impact connects the behavior to something the receiver actually cares about (team success, trust, project outcomes). Without it, feedback feels like personal preference, not professional necessity.
 
-1. **Reflect honestly** - Is there truth in the feedback?
-2. **Identify actions** - What will you do differently?
-3. **Follow up** - Let them know what you're changing
-4. **Ask for ongoing feedback** - Show you're committed to growth
+## Core Technique: SBI (Extended Practitioner Version)
 
-## Quick Reference: Difficult Conversation Checklist
+Claude already knows the basic SBI framework. What it misses:
 
-### Before
+**The "I" in SBI (Impact) must connect to the receiver's values, not just your frustration.** If you know they care about team perception, connect the impact to that. If they care about code quality, connect it there. Generic impact ("it slowed us down") lands weaker than specific impact ("it meant Sarah had to redo 3 hours of work and now doubts whether to surface issues").
 
-- [ ] I understand the specific issue
-- [ ] I have concrete examples
-- [ ] I've defined my goal for the conversation
-- [ ] I'm emotionally regulated
-- [ ] I've considered their perspective
+**SBI+ for recurring patterns:**
+> Situation → Behavior → Impact → **Expectation** → **Stakes**
 
-### During
+Add: "Going forward, I need [specific behavior]. If this continues, [concrete consequence]." The stakes must be real and proportional — don't threaten what you won't follow through on.
 
-- [ ] I opened with neutrality and intent
-- [ ] I stated facts, not blame
-- [ ] I used SBI for specific feedback
-- [ ] I asked for their perspective
-- [ ] I focused on solutions, not just problems
-- [ ] I documented agreed actions
+**Positive SBI is not trivial.** Most managers skip it entirely or give generic praise. Specific positive SBI ("In Thursday's incident review, you asked the question no one else would — which unlocked the root cause and saved us 2 days") builds the safety account that makes corrective feedback land better later.
 
-### After
+## Timing Heuristics
 
-- [ ] I sent a follow-up summary
-- [ ] I scheduled a check-in
-- [ ] I'm providing ongoing support
-- [ ] I'm recognizing progress
+| Scenario | Optimal Timing | Why |
+|----------|---------------|-----|
+| In-meeting behavior | Within same day, privately | Memory sharp; behavior still vivid |
+| Code/work quality | Within 48 hours of delivery | Before they're deep in next task |
+| Interpersonal conflict | After 2–4 hours cooling, same day | Enough distance to avoid flood; close enough to be concrete |
+| Pattern (recurring issue) | Scheduled 1:1, not ad hoc | Signals importance, allows preparation |
+| Crisis/incident behavior | After incident closes, not during | During crisis, feedback = distraction |
 
-## Companion Resources
+## When Things Go Wrong
 
-- `references/feedback-sbi-model.md` - Full SBI framework with more examples
-- `references/difficult-conversation-scripts.md` - Opening lines and responses
-- `references/expectation-alignment.md` - Managing stakeholder expectations
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Receiver goes silent or shuts down | Social threat response activated; they feel ambushed or shamed | Name it: "I notice you've gone quiet — I want to make sure this feels like a conversation, not a verdict. What's your reaction?" |
+| Receiver becomes defensive / attacks back | Behavior was stated as interpretation, not observation; or they feel blindsided | Return to facts: "I want to make sure I'm working from what I actually observed. Can we back up to [specific event]?" |
+| Receiver agrees but nothing changes | Agreement was to end the discomfort, not from genuine buy-in | Explicit follow-through plan required: named actions, dates, check-in. "What specifically will you do differently by [date]?" |
+| You realize mid-conversation you lack enough specifics | Feedback was prepared based on impressions not evidence | Pause honestly: "I realize I should have more specific examples ready. Can we schedule 30 minutes tomorrow when I can come prepared?" |
+| Positive feedback lands flat / receiver dismisses it | Generic praise, or receiver distrusts your motives | Increase specificity: name exact action, exact effect, exact why it mattered. Vague praise is discounted; surgical praise lands. |
 
+## Reference Loading Triggers
 
-## Recommended Reading
+Load `references/difficult-conversation-scripts.md` when:
+- User needs exact opening lines, scripts, or word-for-word phrasing
+- Scenario involves HR-sensitivity, termination risk, or legal exposure
+- User is conflict-averse and needs scripted scaffolding to start
 
-- "Crucial Conversations" by Kerry Patterson & Joseph Grenny
-- "Difficult Conversations" by Stone, Patton, Heen
-- "Radical Candor" by Kim Scott
-- Amy Edmondson's research on psychological safety
+Load `references/feedback-sbi-model.md` when:
+- User needs more SBI examples across different contexts (code reviews, meetings, deliverables)
+- User is learning the framework and needs worked examples
 
----
+Load `references/expectation-alignment.md` when:
+- Issue is stakeholder misalignment, scope creep, or "moving goalposts"
+- User needs to reset expectations without creating conflict
diff --git a/skills/frontend-to-backend-requirements/SKILL.md b/skills/frontend-to-backend-requirements/SKILL.md
index 09352d3..9844170 100644
--- a/skills/frontend-to-backend-requirements/SKILL.md
+++ b/skills/frontend-to-backend-requirements/SKILL.md
@@ -1,193 +1,72 @@
 ---
 name: frontend-to-backend-requirements
-description: Document frontend data needs for backend developers. Use when frontend needs to communicate API requirements to backend, or user says 'backend requirements', 'what data do I need', 'API requirements', or is describing data needs for a UI.
+description: Translate frontend UI needs into backend requirements documents without specifying implementation. Use when frontend needs to communicate API requirements to backend, or user says 'backend requirements', 'what data do I need', 'API requirements', 'what should the backend provide', or is describing data needs for a UI component or screen.
 ---
 
-# Backend Requirements Mode
+## Mindset
 
-You are a frontend developer documenting what data you need from backend. You describe the **what**, not the **how**. Backend owns implementation details.
+- **You are a journalist, not an architect.** Your job is to describe what the user sees and does — not design the system that makes it happen. The moment you specify a field name or endpoint structure, you've crossed the line.
+- **Ambiguity is information.** If you're unsure whether two UI elements share data, say so explicitly. Backend uses those uncertainty signals to design for real usage, not imagined usage.
+- **Requirements age.** A doc that gets one round of feedback and is never touched again is a liability. The Discussion Log section is the most important part — it captures the negotiation, not just the outcome.
+- **"Simple" is not simple.** "Just give me all the user's data" causes N+1 query disasters, over-fetching, and security leaks. Frontend developers who think in terms of rendering often underestimate the cost of data aggregation. Surface that tension in questions.
+- **Boundary enforcement is your constraint, not backend's.** If you catch yourself writing "the response should contain...", stop. That's backend territory. Reframe as "I need to display..." or "the user needs to be able to...".
 
-> **No Chat Output**: ALL responses go to `.claude/docs/ai/<feature-name>/backend-requirements.md`
-> **No Implementation Details**: Don't specify endpoints, field names, or API structure—that's backend's call.
+## Navigation
 
----
-
-## The Point
-
-This mode is for frontend devs to communicate data needs:
-- What data do I need to render this screen?
-- What actions should the user be able to perform?
-- What business rules affect the UI?
-- What states and errors should I handle?
-
-**You're requesting, not demanding.** Backend may push back, suggest alternatives, or ask clarifying questions. That's healthy collaboration.
-
----
-
-## What You Own vs. What Backend Owns
-
-| Frontend Owns | Backend Owns |
-|---------------|--------------|
-| What data is needed | How data is structured |
-| What actions exist | Endpoint design |
-| UI states to handle | Field names, types |
-| User-facing validation | API conventions |
-| Display requirements | Performance/caching |
+**Use this skill when**:
+- A frontend developer needs to communicate data needs to a backend team
+- Planning a new feature that requires new or modified API support
+- Documenting what an existing UI actually needs (often before refactoring an API)
+- The user says any of: "backend requirements", "what data do I need", "API requirements", "what should the backend provide"
 
----
+**Do NOT use this skill when**:
+- The user wants to design the API itself (use an API design skill instead)
+- The user wants to document existing endpoints (use API docs tooling)
+- The backend is a BFF (Backend for Frontend) they own — the boundary dynamic changes
+- There is no backend team; they're working solo with a self-owned API
 
-## Workflow
+**Quick decision**: If the user is describing what they want to *build*, they need API design. If they're describing what they want to *display*, this skill applies.
 
-### Step 1: Describe the Feature
+## Philosophy
 
-Before listing requirements:
+Frontend requirements are a contract negotiation, not a specification. The document's job is to communicate intent and surface uncertainty — not to constrain the backend into a corner before they've had a chance to think.
 
-1. **What is this?** — Screen, flow, component
-2. **Who uses it?** — User type, permissions
-3. **What's the goal?** — What does success look like?
+## NEVER
 
-### Step 2: List Data Needs
+- NEVER specify field names, endpoint paths, HTTP methods, or JSON structure — because doing so anchors the backend to your mental model of the data, which is almost always wrong about something. Backend has constraints (normalization, existing schemas, caching layers) you can't see from the UI.
+- NEVER omit the "why" behind a data need — because without context, backend will make assumptions that optimize for their convenience, not your use case. "Show me the user's name" leads to a different schema decision than "Show the user's name on a public-facing receipt where it must match their legal ID."
+- NEVER write requirements for only the happy path — because backend will build exactly what you describe. If you don't document the empty state, the error state, and the "partially loaded" state, you'll get an API that only handles success.
+- NEVER finalize requirements without explicit open questions — because silence implies agreement. An unchallenged requirement is treated as a signed contract. Forcing the backend to respond to questions surfaces design mismatches before they're coded.
+- NEVER treat the document as one-way — because requirements docs that get no backend response die in a drawer. The Discussion Log is the mechanism that makes this a living document. If backend hasn't responded, the doc isn't done.
+- NEVER describe requirements in terms of component implementation ("the Card component needs...") — because this ties requirements to a UI decision that may change. Describe user goals, not component structure.
 
-For each screen/component, describe:
+## When Things Go Wrong
 
-**Data I need to display:**
-- What information appears on screen?
-- What's the relationship between pieces?
-- What determines visibility/state?
-
-**Actions user can perform:**
-- What can the user do?
-- What's the expected outcome?
-- What feedback should they see?
-
-**States I need to handle:**
-- Loading, empty, error, success
-- Edge cases (partial data, expired, etc.)
-
-### Step 3: Surface Uncertainties
-
-List what you're unsure about:
-- Business rules you don't fully understand
-- Edge cases you're not sure how to handle
-- Places where you're guessing
-
-**These invite backend to clarify or push back.**
-
-### Step 4: Leave Room for Discussion
-
-End with open questions:
-- "Would it make sense to...?"
-- "Should I expect...?"
-- "Is there a simpler way to...?"
+| Situation | Likely Cause | Recovery |
+|-----------|--------------|----------|
+| Backend says the requirements are contradictory | Frontend described two different screens as if they used the same data, but the data is scoped differently | Split the requirements by screen and clarify which user/context scoping applies to each |
+| Requirements doc is ignored entirely | It reads like a spec, not a request — backend feels dictated to | Reframe every bullet as a question or observation; add explicit "let me know if this assumption is wrong" phrases throughout |
+| Backend returns a different shape than expected | The requirements didn't explain the relationship between data entities | Revise the doc to describe how pieces relate ("for each X, I also need to show its Y") rather than listing fields in isolation |
+| Backend says the data doesn't exist | You described display logic as if it were a stored property (e.g., "I need the user's 'completion percentage'") | Distinguish between computed/derived data needs and stored data needs — ask backend if the computation should happen server-side or client-side |
 
 ---
 
-## Output Format
-
-Create `.claude/docs/ai/<feature-name>/backend-requirements.md`:
-
-```markdown
-# Backend Requirements: <Feature Name>
-
-## Context
-[What we're building, who it's for, what problem it solves]
-
-## Screens/Components
-
-### <Screen/Component Name>
-**Purpose**: What this screen does
-
-**Data I need to display**:
-- [Description of data piece, not field name]
-- [Another piece]
-- [Relationships between pieces]
-
-**Actions**:
-- [Action description] → [Expected outcome]
-- [Another action] → [Expected outcome]
-
-**States to handle**:
-- **Empty**: [When/why this happens]
-- **Loading**: [What's being fetched]
-- **Error**: [What can go wrong, what user sees]
-- **Special**: [Any edge cases]
+## Output
 
-**Business rules affecting UI**:
-- [Rule that changes what's visible/enabled]
-- [Permissions that affect actions]
+All output goes to `.claude/docs/ai/<feature-name>/backend-requirements.md` — no chat output.
 
-### <Next Screen/Component>
-...
+Use the template at [references/output-template.md](references/output-template.md).
 
-## Uncertainties
-- [ ] Not sure if [X] should show when [Y]
-- [ ] Don't understand the business rule for [Z]
-- [ ] Guessing that [A] means [B]
+See [references/elicitation-questions.md](references/elicitation-questions.md) for the questions to ask the user before writing the doc.
 
-## Questions for Backend
-- Would it make sense to combine [X] and [Y]?
-- Should I expect [Z] to always be present?
-- Is there existing data I can reuse for [W]?
+See [references/worked-example.md](references/worked-example.md) for a complete realistic example.
 
-## Discussion Log
-[Backend responses, decisions made, changes to requirements]
-```
+## Ownership Boundary (quick reference)
 
----
-
-## Good vs. Bad Requests
-
-### Bad (Dictating Implementation)
-> "I need a GET /api/contracts endpoint that returns an array with fields: id, title, status, created_at"
-
-### Good (Describing Needs)
-> "I need to show a list of contracts. Each item shows the contract title, its current status, and when it was created. User should be able to filter by status."
-
-### Bad (Assuming Structure)
-> "The provider object should be nested inside the contract response"
-
-### Good (Describing Relationship)
-> "For each contract, I need to show who the provider is (their name and maybe logo)"
-
-### Bad (No Context)
-> "I need contract data"
-
-### Good (With Context)
-> "On the dashboard, there's a 'Recent Contracts' widget showing the 5 most recent contracts. User clicks one to go to detail page."
-
----
-
-## Encouraging Pushback
-
-Include these prompts in your requirements:
-
-- "Let me know if this doesn't make sense for how the data is structured"
-- "Open to suggestions on a better approach"
-- "Not sure if this is the right way to think about it"
-- "Push back if this complicates things unnecessarily"
-
-**Good collaboration = frontend describes the problem, backend proposes the solution.**
-
----
-
-## Rules
-
-- **NO IMPLEMENTATION DETAILS**—don't specify endpoints, methods, field names
-- **DESCRIBE, DON'T PRESCRIBE**—say what you need, not how to provide it
-- **INCLUDE CONTEXT**—why you need it helps backend make better choices
-- **SURFACE UNKNOWNS**—don't hide confusion, invite clarification
-- **INVITE PUSHBACK**—explicitly ask for backend's input
-- **UPDATE THE DOC**—add backend responses to Discussion Log
-- **STAY HUMBLE**—you're asking, not demanding
-
----
-
-## After Backend Responds
-
-Update the requirements doc:
-1. Add responses to Discussion Log
-2. Adjust requirements based on feedback
-3. Mark resolved uncertainties
-4. Note any decisions made
-
-The doc becomes the source of truth for what was agreed.
+| Frontend Owns | Backend Owns |
+|---------------|--------------|
+| What data is needed | How data is structured |
+| What actions exist | Endpoint design |
+| UI states to handle | Field names, types |
+| User-facing validation | API conventions |
+| Display requirements | Performance/caching |
diff --git a/skills/frontend-to-backend-requirements/references/elicitation-questions.md b/skills/frontend-to-backend-requirements/references/elicitation-questions.md
new file mode 100644
index 0000000..c51d76a
--- /dev/null
+++ b/skills/frontend-to-backend-requirements/references/elicitation-questions.md
@@ -0,0 +1,47 @@
+# Elicitation Questions
+
+Ask these before writing the requirements doc. Not all apply to every feature — use judgment.
+
+## Feature Scoping
+- What is the name of the feature/screen/flow?
+- Who is the user performing this action? (role, permissions, context)
+- What problem does this solve for them?
+- Is this a new surface or modifying an existing one?
+
+## Data Needs
+- What information appears on screen? Walk me through it visually.
+- Are any of these values derived or computed (e.g., "completion rate", "days remaining")?
+- Do you need historical data or only current state?
+- Is any data user-specific vs. shared across users?
+- Are there items in a list? How many? Is there pagination/infinite scroll?
+- Do items in a list need to link to a detail view? What does the detail need that the list doesn't?
+
+## Relationships
+- For each piece of data, is it standalone or related to something else?
+  - Example: "I show a contract — do I also need the vendor attached to it? Their logo? Their contact?"
+- Are there nested relationships? How deep?
+- Can the same item appear in multiple contexts with different fields needed?
+
+## Actions
+- What can the user do on this screen?
+- What happens immediately in the UI when they do it?
+- What confirmation or feedback should they see?
+- Are there actions that should be disabled or hidden for certain users/states?
+
+## States
+- What does the screen look like before data loads?
+- What if there's no data? (Empty state)
+- What if the data fails to load? (Error state)
+- Are there partial states — e.g., some data loads fast and some slow?
+- Can the user be in a state where some actions are temporarily unavailable?
+
+## Business Rules
+- Are there any rules about what's visible based on the user's role, plan tier, or account state?
+- Are there date-based rules (e.g., show X only if contract is not expired)?
+- Are there count-based rules (e.g., disable action if limit reached)?
+- Are there any rules you're not fully sure about?
+
+## Uncertainties
+- What are you guessing at right now?
+- What would you ask backend if you had five minutes with them?
+- Are there any existing APIs you think might partially solve this, but you're not sure?
diff --git a/skills/frontend-to-backend-requirements/references/output-template.md b/skills/frontend-to-backend-requirements/references/output-template.md
new file mode 100644
index 0000000..ff7cb37
--- /dev/null
+++ b/skills/frontend-to-backend-requirements/references/output-template.md
@@ -0,0 +1,77 @@
+# Output Template: Backend Requirements
+
+Save to: `.claude/docs/ai/<feature-name>/backend-requirements.md`
+
+---
+
+```markdown
+# Backend Requirements: <Feature Name>
+
+**Author**: [frontend dev name or team]
+**Date**: [YYYY-MM-DD]
+**Status**: Draft / In Review / Agreed
+
+## Context
+
+[What we're building, who uses it, and what problem it solves. 2-4 sentences.
+Include: user type, the workflow they're in, and why this data matters to them.]
+
+## Screens / Components
+
+### <Screen or Component Name>
+
+**Purpose**: [One sentence — what this screen does for the user]
+
+**Data I need to display**:
+- [Description of data piece — describe it in user-facing terms, not field names]
+- [Another piece — note any relationship: "for each X, I also need its Y"]
+- [Note if anything is computed/derived vs. stored]
+
+**Actions the user can perform**:
+- [Action] → [Immediate feedback user sees] → [Expected outcome after completion]
+- [Another action] → ...
+
+**States I need to handle**:
+- **Loading**: [What's being fetched; what the user sees during load]
+- **Empty**: [When/why this happens; what the user sees]
+- **Error**: [What can go wrong; what error message or fallback the user sees]
+- **Special**: [Any edge case states: partial data, expired items, permission-blocked, etc.]
+
+**Business rules affecting UI**:
+- [Rule that changes what's visible or enabled — e.g., "Delete button only shows for admin users"]
+- [Date-based or count-based rules]
+- [Anything you're not sure about — flag it]
+
+---
+
+### <Next Screen or Component>
+
+[Repeat structure above]
+
+---
+
+## Uncertainties
+
+Things I'm not sure about — please help clarify:
+
+- [ ] Not sure if [X] should appear when [Y condition] — guessing yes, but could be wrong
+- [ ] Don't understand the business rule for [Z] — assuming it works like [A], let me know
+- [ ] Is [B] always present, or can it be null? I'm not handling null today but can add it
+
+## Questions for Backend
+
+Open questions — push back or suggest a better approach:
+
+- Would it make sense to combine [X] and [Y] into a single call, or does that create problems on your end?
+- Should I expect [Z] to always be present, or do I need to handle it being missing?
+- Is there existing data I can reuse for [W], or will this require new work?
+- I'm thinking of [approach] — does that make your life harder or easier?
+
+## Discussion Log
+
+[Add backend responses, decisions, and agreed changes here as the conversation progresses]
+
+| Date | From | Note |
+|------|------|------|
+| [date] | [name] | [decision or response] |
+```
diff --git a/skills/frontend-to-backend-requirements/references/worked-example.md b/skills/frontend-to-backend-requirements/references/worked-example.md
new file mode 100644
index 0000000..557a966
--- /dev/null
+++ b/skills/frontend-to-backend-requirements/references/worked-example.md
@@ -0,0 +1,137 @@
+# Worked Example: Invoice Detail Screen
+
+This is a realistic, complete example of a backend requirements document. Notice:
+- No field names, endpoint paths, or JSON structures
+- Every data need has a "why"
+- States are fully specified
+- Business rules are flagged even when uncertain
+- Open questions are specific and invite real discussion
+
+---
+
+```markdown
+# Backend Requirements: Invoice Detail Screen
+
+**Author**: Frontend team
+**Date**: 2025-03-14
+**Status**: Draft — awaiting backend review
+
+## Context
+
+We're building an invoice detail view for billing administrators. When a billing admin
+clicks an invoice in the invoice list, they land here. The goal is to let them review
+what was charged, why, and take action (download, dispute, mark as paid). This is
+admin-only — regular users can't reach this screen.
+
+## Screens / Components
+
+### Invoice Detail Page
+
+**Purpose**: Show a single invoice in full, with enough context to understand and act on it.
+
+**Data I need to display**:
+
+- Invoice identity: something that lets the admin identify this invoice at a glance — a
+  number, a date range, and who it's for (account name)
+- Line items: each thing being charged — a human-readable description of what it is,
+  the quantity, the unit price, and the line total. There could be anywhere from 1 to
+  ~50 line items.
+- Subtotal, any taxes or fees, and the total. I'm not sure if taxes are stored per line
+  or per invoice — flagged in Uncertainties.
+- Invoice status: whether it's paid, overdue, disputed, or pending. The UI changes
+  significantly based on this.
+- Payment history: if payments have been made against this invoice (partial payments are
+  allowed, I think?), I need to show those with dates and amounts.
+- Who issued the invoice (the org/entity on our side) and who it was issued to (the
+  customer). Both need enough info to appear on a legal document — name, address,
+  possibly tax ID.
+
+**Actions the user can perform**:
+
+- Download as PDF → User sees a "downloading" spinner, then browser download starts.
+  The PDF should look like a formal invoice (assume backend generates it).
+- Dispute invoice → Opens a confirmation modal, user submits a reason. After submit:
+  invoice status changes to "disputed" and admin sees a success toast. Not sure if this
+  triggers an email automatically — flagged below.
+- Mark as paid (only if status is "pending" or "overdue") → Confirmation modal with
+  payment date picker. After confirm: status updates to "paid", payment record appears
+  in payment history.
+- Download payment receipts (per payment, if payments exist) → Similar to PDF download.
+
+**States I need to handle**:
+
+- **Loading**: Full page skeleton while invoice loads. Line items area is the slowest
+  part visually — consider if line items can be deferred.
+- **Empty (no line items)**: Shouldn't happen, but if it does, show a message rather
+  than an empty table.
+- **Error**: If the invoice fails to load, show an error with a retry button. Don't
+  show a partial invoice — that's worse than an error.
+- **Not found**: If the invoice ID is invalid or the admin doesn't have access, show
+  a 404-style message. Don't expose whether the invoice exists but is inaccessible.
+- **Disputed state**: Most actions are disabled. Admin can only see the invoice and
+  contact support. I need to know exactly which actions the "disputed" status blocks.
+- **Partial payment**: Invoice shows as partially paid. Payment history shows multiple
+  entries. Remaining balance visible. I'm not sure what the status label should be —
+  "partially paid"? Flagged.
+
+**Business rules affecting UI**:
+
+- Only billing admins can reach this page — assume auth is enforced server-side, but I
+  need the API to return a clear "unauthorized" vs "not found" so I can show the right
+  message.
+- Dispute action should only be available within 30 days of invoice date (I think?).
+  Not sure — please confirm. If correct, I need to know the invoice date to compute this
+  client-side, or I'd prefer the API tell me whether the action is available.
+- Mark-as-paid should only appear for "pending" and "overdue" statuses. I'll hide it
+  for others.
+- Tax line may only appear in certain jurisdictions — I'll conditionally render it, but
+  I need to know if "no tax" means zero or means the field is absent.
+
+---
+
+## Uncertainties
+
+- [ ] Are taxes stored per line item or as a single invoice-level value? I'm building
+  the UI assuming invoice-level, but it may need to change.
+- [ ] Can invoices be partially paid? I'm assuming yes based on product context but
+  haven't confirmed. If yes, what's the status label for a partially paid invoice?
+- [ ] Does disputing an invoice trigger a notification to the customer automatically,
+  or does the admin need to do that separately? Affects the confirmation modal copy.
+- [ ] Is the 30-day dispute window enforced server-side (API rejects it) or do I need
+  to compute it client-side and hide the button? Prefer server-side enforcement so the
+  UI can't get out of sync.
+- [ ] What does "not authorized to view this invoice" look like from the API — a 403
+  or a 404? I need to know to show the right message.
+
+## Questions for Backend
+
+- The line items table could have up to 50 rows. Is there a case where it could be
+  hundreds? If so, should I plan for pagination or virtualization?
+- Would it be possible to include an "available actions" list in the invoice response
+  (e.g., `["download", "dispute"]`) rather than me computing action availability from
+  status + date? It would make the UI more resilient to rule changes.
+- Is the PDF generated on the fly or pre-generated and stored? Affects loading UX — if
+  it's generated on the fly, I need a longer loading indicator for large invoices.
+- Is payment history always inline with the invoice, or is it a separate resource?
+  Would prefer it inline if possible to avoid a second request.
+
+## Discussion Log
+
+| Date | From | Note |
+|------|------|------|
+| 2025-03-15 | Backend (Sam) | Taxes are invoice-level, not per-line. Zero tax = field present with value 0, not absent. |
+| 2025-03-15 | Backend (Sam) | Partial payments supported. Status for partial = "partially_paid" — we can add that label. |
+| 2025-03-16 | Frontend | Updated states section to include partially_paid. Removed uncertainty. |
+| 2025-03-16 | Backend (Sam) | Dispute window enforced server-side. API returns 409 with reason if window has passed. Available actions list is a good idea — will include in response. |
+```
+
+---
+
+## What Makes This Good
+
+- **No field names** — "invoice identity" not "invoice_number". "Who it was issued to" not "customer_id".
+- **Every data need has context** — "enough info to appear on a legal document" tells backend why it matters.
+- **States are fully specified** — including what to do when invoice loads but is in an unusual state.
+- **Business rules are flagged even when uncertain** — "I think?" is better than assuming.
+- **Questions are specific and invite real decisions** — "would it be possible to include an available actions list?" is a design suggestion, not a demand.
+- **Discussion Log is populated** — shows the document is a living negotiation, not a one-way spec.
diff --git a/skills/game-changing-features/SKILL.md b/skills/game-changing-features/SKILL.md
index 667daff..519bbdd 100644
--- a/skills/game-changing-features/SKILL.md
+++ b/skills/game-changing-features/SKILL.md
@@ -1,264 +1,105 @@
 ---
 name: game-changing-features
-description: Find 10x product opportunities and high-leverage improvements. Use when user wants strategic product thinking, mentions '10x', wants to find high-impact features, or says 'what would make this 10x better', 'product strategy', or 'what should we build next'.
+description: "Structured opportunity analysis to surface 10x product leverage points — maps features against user pain tiers, competitive white space, and leverage-multiplier categories (network effects, automation, data moats). Use when deciding what to build next, evaluating feature ROI, or preparing a product strategy pitch. Outputs a prioritized opportunity matrix with rationale. Triggers: what should we build, game-changing feature, highest leverage improvement, product strategy, 10x opportunity."
 ---
 
-# 10x Mode
+# 10x Feature Strategy
 
-You are a product strategist with founder mentality. We're not here to add features—we're here to find the moves that 10x the product's value. Think like you own this. What would make users unable to live without it?
+## Mindset
 
-> **No Chat Output**: ALL responses go to `.claude/docs/ai/<product-or-area>/10x/session-N.md`
-> **No Code**: This is pure strategy. Implementation comes later.
+1. **The feature that changes everything is rarely the most-requested one.** Users request solutions to symptoms; the 10x move solves the underlying job. Look one level above what users ask for.
 
----
-
-## The Point
-
-Most product work is incremental: fix bugs, add requested features, polish edges. That's necessary but not sufficient.
-
-This mode forces a different question: **What would make this 10x more valuable?**
-
-Not 10% better. Not "nice to have." Game-changing. The kind of thing that makes users say "how did I live without this?"
-
----
-
-## Session Setup
-
-User provides:
-- **Product/Area**: What we're thinking about
-- **Current state** (optional): Brief description of what exists
-- **Constraints** (optional): Technical limits, timeline, team size
-
----
-
-## Workflow
-
-### Step 1: Understand Current Value
-
-Before proposing additions, understand what value exists:
+2. **Small changes at high-frequency moments beat large changes at rare moments.** A 3-second save on an action performed about 50 times a day compounds to roughly 18 minutes/week. A major new feature used once a month might deliver less value. Always check frequency before estimating impact.
 
-1. **What problem does this solve today?**
-2. **Who uses it and why?**
-3. **What's the core action users take?**
-4. **Where do users spend most time?**
-5. **What do users complain about / request most?**
+3. **Defensibility is the hidden dimension of feature scoring.** A feature that's easy to copy gives you about a 6-month lead. A feature that gets better with usage data, network effects, or user-generated content gives you a lead measured in years. Weight defensibility heavily in bets.
 
-Research the codebase, look at existing features, understand the shape of the product.
+4. **The "obviously good" idea is often the wrong one.** If the team immediately agrees it's a great idea, it's probably already on the competitor's roadmap. The best bets feel slightly uncomfortable — plausible but not obvious.
 
-### Step 2: Find the 10x Opportunities
+5. **Activation beats retention beats acquisition — always audit in this order.** Most teams chase acquisition features when the product is bleeding users at activation. Diagnosing which stage is broken first prevents shipping the wrong 10x feature.
 
-Think across three scales:
+## Navigation
 
-#### Massive (High effort, transformative)
-Features that fundamentally expand what the product can do. New markets, new use cases, new capabilities that weren't possible before.
+**Use this skill when**:
+- User wants strategic product thinking about what to build
+- User asks for "10x", "game-changing", "highest-leverage", or "what should we build next"
+- User needs to prioritize a feature backlog from a value lens
+- User wants to find quick wins with outsized impact
 
-Ask:
-- What adjacent problem could we solve that would make this indispensable?
-- What would make this a platform instead of a tool?
-- What would make users bring their team/friends/family?
-- What's the feature that would make competitors nervous?
+**Do NOT use this skill when**:
+- User needs implementation plans or architecture (use `implementation-blueprint`)
+- User has already chosen a feature and needs a spec (use `dev-spec`)
+- User wants to stress-test a specific idea (use `critical-brainstorm`)
+- User asks a quick "what feature should I add" question expecting a single answer in chat — answer directly without full session workflow
 
-#### Medium (Moderate effort, high leverage)
-Features that significantly enhance the core experience. Force multipliers on what already works.
+**Quick calibration**:
+- If the user provides a product name and wants strategic analysis → run the full 5-step session, write output to `.claude/docs/ai/<product>/10x/session-N.md`
+- If the user asks a bounded question ("what's one quick win for our onboarding?") → answer in chat, no file write needed
 
-Ask:
-- What would make the core action 10x faster/easier?
-- What data do we have that we're not using?
-- What workflow is painful that we could automate?
-- What would turn casual users into power users?
+## Philosophy
 
-#### Small (Low effort, disproportionate value)
-Tiny changes that punch way above their weight. Often overlooked because they seem "too simple."
+The goal is not to generate the longest feature list — it is to identify the one or two moves that change the product's trajectory. Every hour spent on a good-but-not-great feature is an hour not spent on the transformative one. Ruthless prioritization is the product.
 
-Ask:
-- What single button/shortcut would save users minutes daily?
-- What information is users hunting for that we could surface?
-- What anxiety do users have that we could eliminate with one indicator?
-- What's the thing users do manually that we could remember/automate?
+## NEVER
 
-### Step 3: Evaluate Ruthlessly
+- NEVER present a feature list without explicit stack-ranking — because an un-ranked list shifts the prioritization burden back to the team, which is the exact problem this skill exists to solve.
+- NEVER call a feature "10x" without specifying *what metric* it 10x's — because "10x better" without a denominator is motivational language, not strategy, and it misleads the team about the actual bet they're making.
+- NEVER include a feature idea that maps to "better UX" or "improved performance" without naming the specific interaction being changed — because vague ideas survive roadmap review without ever getting built or killed; they become permanent backlog debt.
+- NEVER score features on feasibility before scoring them on impact — because low feasibility scores kill transformative ideas before the team can find creative paths to ship them at reduced scope.
+- NEVER recommend a collaboration feature without first checking if the product's user base actually has teammates — because collaboration features in solo-use products have near-zero adoption and create the illusion of strategy without value.
+- NEVER output a session without at least one "Do Now" item that can ship in under 2 weeks — because sessions that produce only long-term bets are perceived as impractical and get shelved; an immediate win buys credibility for the bigger bets.
+- NEVER skip the activation/retention/acquisition diagnostic — because shipping an acquisition feature when users churn at onboarding is the single most common way product teams waste a full quarter on the wrong lever.
 
-For each idea, assess:
+## When Things Go Wrong
 
-| Criteria | Question |
-|----------|----------|
-| **Impact** | How much more valuable does this make the product? |
-| **Reach** | What % of users would this affect? |
-| **Frequency** | How often would users encounter this value? |
-| **Differentiation** | Does this set us apart or just match competitors? |
-| **Defensibility** | Is this easy to copy or does it compound over time? |
-| **Feasibility** | Can we actually build this? |
-
-Use a simple scoring:
-- 🔥 **Must do** — High impact, clearly worth it
-- 👍 **Strong** — Good impact, should prioritize
-- 🤔 **Maybe** — Interesting but needs more thought
-- ❌ **Pass** — Not worth it right now
-
-### Step 4: Identify the Highest-Leverage Moves
-
-Look for:
-
-**Quick wins with outsized impact**
-- Small effort, big value
-- Often overlooked because they're "obvious"
-- Can ship fast, validate fast
-
-**Strategic bets**
-- Larger effort, potentially transformative
-- Opens new possibilities
-- Worth the investment if it works
-
-**Compounding features**
-- Get more valuable over time
-- Network effects, data effects, habit formation
-- Build moats
-
-### Step 5: Prioritize
-
-Don't just list ideas—stack rank them:
-
-```
-## Recommended Priority
-
-### Do Now (Quick wins)
-1. [Feature] — Why: [reason], Impact: [what changes]
-
-### Do Next (High leverage)
-1. [Feature] — Why: [reason], Unlocks: [what becomes possible]
-
-### Explore (Strategic bets)
-1. [Feature] — Why: [reason], Risk: [what could go wrong], Upside: [what we gain]
-
-### Backlog (Good but not now)
-1. [Feature] — Why later: [reason]
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| All ideas feel incremental, nothing feels 10x | Anchoring to current product shape; thinking in features not outcomes | Reframe: ask "what would make a user cancel a competitor to switch to us?" then reverse-engineer features from that answer |
+| Team rejects the top-ranked ideas as "too risky" | Feasibility was scored too early in the process; ideas not paired with risk mitigations | Re-evaluate with explicit "reduced scope" path: what's the smallest version that tests the hypothesis? |
+| User provides no context about the product | Can't research without a starting point | Ask exactly two questions: (1) What does the product do in one sentence? (2) Who is the primary user and what is their job? Then proceed. |
+| Output feels like a generic product advice column | Skill applied without reading the actual codebase or product | Stop and read: check existing feature list, recent commits, support tickets or feedback if available. Ground every idea in observed evidence. |
 
 ---
 
-## Idea Categories to Explore
-
-Force yourself through each category:
-
-| Category | Question | Example |
-|----------|----------|---------|
-| **Speed** | What takes too long? | Instant search, predictive loading |
-| **Automation** | What's repetitive? | Auto-scheduling, smart defaults |
-| **Intelligence** | What could be smarter? | Recommendations, anomaly detection |
-| **Integration** | What else do users use? | Calendar sync, export options |
-| **Collaboration** | How do users work together? | Sharing, comments, real-time |
-| **Personalization** | How is everyone different? | Custom views, preferences |
-| **Visibility** | What's hidden that shouldn't be? | Dashboards, progress tracking |
-| **Confidence** | What creates anxiety? | Confirmations, undo, previews |
-| **Delight** | What could spark joy? | Animations, celebrations, polish |
-| **Access** | Who can't use this yet? | Mobile, offline, accessibility |
-
----
-
-## Output Format
-
-```markdown
-# 10x Analysis: <Product/Area>
-Session N | Date: YYYY-MM-DD
-
-## Current Value
-What the product does today and for whom.
-
-## The Question
-What would make this 10x more valuable?
-
----
-
-## Massive Opportunities
-
-### 1. [Feature Name]
-**What**: Description
-**Why 10x**: Why this is transformative
-**Unlocks**: What becomes possible
-**Effort**: High/Very High
-**Risk**: What could go wrong
-**Score**: 🔥/👍/🤔/❌
-
-### 2. ...
-
----
-
-## Medium Opportunities
-
-### 1. [Feature Name]
-**What**: Description
-**Why 10x**: Why this matters more than it seems
-**Impact**: What changes for users
-**Effort**: Medium
-**Score**: 🔥/👍/🤔/❌
-
-### 2. ...
-
----
-
-## Small Gems
-
-### 1. [Feature Name]
-**What**: Description (one line)
-**Why powerful**: Why this punches above its weight
-**Effort**: Low
-**Score**: 🔥/👍/🤔/❌
-
-### 2. ...
-
----
+## Workflow
 
-## Recommended Priority
+### Step 1: Diagnose the growth stage first
 
-### Do Now
-1. ...
+Before ideating, determine where the product is bleeding value:
+- **Activation problem**: Users sign up but don't reach the "aha moment"
+- **Retention problem**: Users activate but churn within 30-90 days
+- **Expansion problem**: Users stay but don't deepen usage or upgrade
+- **Acquisition problem**: Hard to bring new users in
 
-### Do Next
-1. ...
+Features that fix the actual bottleneck outperform features that improve a healthy stage. If the available material doesn't show which stage is the bottleneck, ask the user for churn timing data.
 
-### Explore
-1. ...
+### Step 2: Understand current value (from the product, not from memory)
 
----
+Read the codebase or product description. Document:
+- What is the core action users take most?
+- Where do users spend the most time?
+- What does the product track or know about users that it doesn't surface?
 
-## Questions
+### Step 3: Generate ideas across three scales
 
-### Answered
-- **Q**: ... **A**: ...
+- **Massive** (transformative, 3+ months): Opens new markets, new user segments, or fundamentally new capabilities
+- **Medium** (force multiplier, 1-2 months): Makes the core action dramatically faster/easier or turns casual users into power users
+- **Small gems** (disproportionate value, <2 weeks): Single interactions that eliminate daily friction or anxiety
 
-### Blockers
-- **Q**: ... (need user input)
+For each scale, sweep all 10 opportunity categories → see [references/opportunity-categories.md](references/opportunity-categories.md)
 
-## Next Steps
-- [ ] Validate assumption: ...
-- [ ] Research: ...
-- [ ] Decide: ...
-```
+### Step 4: Score on impact before feasibility
 
----
+For each idea, score in this order:
+1. Impact (how much more valuable does this make the product?)
+2. Reach (what % of users benefit?)
+3. Frequency (how often does the user encounter this value?)
+4. Defensibility (does this compound over time or is it copyable in 6 months?)
+5. Feasibility (only after scoring the above four)
 
-## Rules
-
-- **THINK BIG FIRST**—don't self-censor with "that's too hard." Capture the idea, evaluate later.
-- **SMALL CAN BE HUGE**—don't dismiss simple ideas. Sometimes one button changes everything.
-- **USER VALUE, NOT FEATURE COUNT**—10 features that add 1% each ≠ 1 feature that adds 10x.
-- **BE SPECIFIC**—"better UX" is not an idea. "One-click rescheduling from notification" is.
-- **QUESTION ASSUMPTIONS**—"users want X" may be wrong. What do they actually need?
-- **COMPOUND THINKING**—prefer features that get better over time.
-- **NO SAFE IDEAS**—if every idea is "obviously good," you're not thinking hard enough.
-- **CITE EVIDENCE**—if you saw something in the codebase or research, reference it.
-
----
+### Step 5: Stack rank and write output
 
-## Prompts to Unstick Thinking
+Produce a prioritized list with at least one item in each tier: Do Now / Do Next / Strategic Bets / Backlog.
 
-If stuck, ask yourself:
+Write full sessions to `.claude/docs/ai/<product>/10x/session-N.md` using the template in [references/opportunity-categories.md](references/opportunity-categories.md).
 
-- "What would make a user tell their friend about this?"
-- "What's the thing users do every day that's slightly annoying?"
-- "What would we build if we had 10x the engineering team? 1/10th?"
-- "What would a competitor need to build to beat us?"
-- "What do power users do manually that we could make native?"
-- "What's the insight we have from data that users don't see?"
-- "What would make this addictive (in a good way)?"
-- "What's the feature that sounds crazy but might work?"
+For quick single-question responses, answer in chat without file output.
diff --git a/skills/game-changing-features/references/opportunity-categories.md b/skills/game-changing-features/references/opportunity-categories.md
new file mode 100644
index 0000000..017e81a
--- /dev/null
+++ b/skills/game-changing-features/references/opportunity-categories.md
@@ -0,0 +1,56 @@
+# Opportunity Category Sweep
+
+Force yourself through each category before declaring the list exhaustive:
+
+| Category | Question | Practitioner trap |
+|----------|----------|-------------------|
+| **Speed** | What takes too long? | Don't stop at "make it faster" — find the step users skip entirely because it's too slow |
+| **Automation** | What's repetitive? | Users often don't know they're doing repetitive work; look for copy-paste patterns in sessions |
+| **Intelligence** | What could be smarter? | Recommendations only work if users trust them; check if trust is the real problem first |
+| **Integration** | What else do users use? | Integrations that require setup > 5 min see <20% adoption; count setup steps before committing |
+| **Collaboration** | How do users work together? | Collaboration features used by 1 person on a team create resentment, not value |
+| **Personalization** | How is everyone different? | Personalization that requires user configuration > 3 steps almost never gets used |
+| **Visibility** | What's hidden that shouldn't be? | Surface insights only if users can act on them; dashboards no one acts on erode trust |
+| **Confidence** | What creates anxiety? | Undo, confirmations, and previews lower support tickets more than any "delight" feature |
+| **Delight** | What could spark joy? | Delight that fires on tasks < 30s/day has near-zero retention impact — pick high-frequency moments |
+| **Access** | Who can't use this yet? | Mobile-first expansion that degrades desktop experience costs you both segments |
+
+# Output Template
+
+```markdown
+# 10x Analysis: <Product/Area>
+Session N | Date: YYYY-MM-DD
+
+## Current Value
+What the product does today and for whom. One paragraph max.
+
+## Highest-Leverage Moves
+
+### Do Now (Quick wins — ship within 2 weeks)
+1. [Feature] — Why: [reason] | Impact: [what changes] | Evidence: [what you observed]
+
+### Do Next (High leverage — 1-2 months)
+1. [Feature] — Why: [reason] | Unlocks: [what becomes possible]
+
+### Strategic Bets (Transformative — 3+ months)
+1. [Feature] — Upside: [what we gain] | Risk: [what could go wrong] | Signal to watch: [leading indicator]
+
+### Backlog (Good but not now)
+1. [Feature] — Why later: [reason]
+
+## Open Questions
+- [ ] Validate assumption: ...
+- [ ] Research: ...
+- [ ] Decide: ...
+```
+
+# Unstick Prompts
+
+If ideation stalls after 10 minutes, cycle through these:
+- "What would make a user tell their friend about this tonight?"
+- "What do power users do manually that we could make native?"
+- "What would we build if we had 10x the team? 1/10th the team?"
+- "What would a competitor need to ship to make us irrelevant in 18 months?"
+- "What insight do we have from usage data that users themselves don't see?"
+- "What's the feature that sounds crazy but has a non-zero chance of being transformative?"
+- "What anxiety do users have at 11pm before a deadline that we could eliminate?"
diff --git a/skills/gemini/SKILL.md b/skills/gemini/SKILL.md
index d6f0e35..fb8a68b 100644
--- a/skills/gemini/SKILL.md
+++ b/skills/gemini/SKILL.md
@@ -1,223 +1,127 @@
 ---
 name: gemini
-description: Use when the user asks to run Gemini CLI for code review, plan review, or big context (>200k) processing. Ideal for comprehensive analysis requiring large context windows. Uses Gemini 3 Pro by default for state-of-the-art reasoning and coding.
+description: Use when the user asks to run Gemini CLI, needs big context (>200k tokens) analysis, wants a second-model code review, or says "use Gemini", "run Gemini", "Gemini review", "analyze with Gemini". Invokes gemini-3-pro-preview by default for state-of-the-art reasoning and 1M-token context.
 ---
 
-# Gemini Skill Guide
+# Gemini Skill
 
-## When to Use Gemini
-- WHEN ASKED TO BE ACTIVATED
-- **Code Review**: Comprehensive code reviews across multiple files
-- **Plan Review**: Analyzing architectural plans, technical specifications, or project roadmaps
-- **Big Context Processing**: Tasks requiring >200k tokens of context (entire codebases, documentation sets)
-- **Multi-file Analysis**: Understanding relationships and patterns across many files
+## Mindset
 
-## ⚠️ Critical: Background/Non-Interactive Mode Warning
+- **Approval mode is the only true footgun.** Hung Gemini processes typically trace to `--approval-mode default` in a non-TTY shell, where the CLI waits indefinitely for approval prompts that cannot be answered. Before running any command, ask: is this an interactive terminal? If uncertain, assume no.
+- **Model selection is a cost/speed/quality triangle, not a quality ladder.** `gemini-3-flash` is distilled from `gemini-3-pro` — for pure speed it beats Pro; it is not a "worse" model, it's a different trade-off.
+- **Large context degrades with vague prompts.** A 1M-token codebase fed a generic "review this" prompt produces generic output. Structured enumeration (numbered focuses, explicit output format) is what unlocks Gemini's context advantage.
+- **Sessions are ephemeral.** Gemini CLI has no built-in resume. Do not promise continuity across invocations; plan prompts to be self-contained or explicitly carry forward prior findings.
+- **`--include-directories` is opt-in scope, not filtering.** It adds directories to the workspace; it does not restrict Gemini from reading the cwd. Misunderstanding this causes unintended file access when the cwd is a large repo.
 
-**NEVER use `--approval-mode default` in background or non-interactive shells** (like Claude Code tool calls). It will hang indefinitely waiting for approval prompts that cannot be provided.
+## Navigation
 
-**For automated/background reviews:**
-- ✅ Use `--approval-mode yolo` for fully automated execution
-- ✅ OR wrap with timeout: `timeout 300 gemini ...`
-- ❌ NEVER use `--approval-mode default` without interactive terminal
+**Use this skill when**:
+- User explicitly mentions Gemini or asks to "run Gemini"
+- Task context exceeds ~200k tokens (entire codebases, large doc sets)
+- User wants a second-model opinion on code or architecture
+- Speed-critical or cost-optimized large-context batch analysis is needed
 
-**Symptoms of hung Gemini:**
-- Process running 20+ minutes with 0% CPU usage
-- No network activity
-- Process state shows 'S' (sleeping)
+**Do NOT use this skill when**:
+- Task fits Claude's own context window — adding Gemini is latency with no benefit
+- User wants a conversational back-and-forth — Gemini CLI is one-shot or interactive terminal only, not a chat loop inside Claude
+- No Gemini CLI is installed (`gemini --version` fails) — verify first, do not assume
 
-**Fix hung process:**
-```bash
-# Check if hung
-ps aux | grep gemini | grep -v grep
-
-# Kill if necessary
-pkill -9 -f "gemini.*gemini-3-pro-preview"
+**Ambiguous input decision tree**:
+```
+User says "review this codebase"
+  → Fits in Claude context? → Handle directly, no Gemini needed
+  → Too large or user said "use Gemini"? → Invoke this skill
+    → Interactive terminal confirmed? → approval-mode default or auto_edit
+    → Background / Claude Code tool call? → approval-mode yolo (required)
 ```
 
-## Running a Task
+## Philosophy
 
-1. Ask the user (via `AskUserQuestion`) which model to use in a **single prompt**. Available models:
-   - `gemini-3-pro-preview` ⭐ (flagship model, best for coding & complex reasoning, 35% better at software engineering than 2.5 Pro)
-   - `gemini-3-flash` (sub-second latency, distilled from 3 Pro, best for speed-critical tasks)
-   - `gemini-2.5-pro` (legacy option, strong all-around performance)
-   - `gemini-2.5-flash` (legacy option, cost-efficient with thinking capabilities)
-   - `gemini-2.5-flash-lite` (legacy option, fastest processing)
-
-2. Select the approval mode based on the task:
-   - `default`: Prompt for approval (⚠️ ONLY for interactive terminal sessions)
-   - `auto_edit`: Auto-approve edit tools only (for code reviews with suggestions)
-   - `yolo`: Auto-approve all tools (✅ REQUIRED for background/automated tasks)
-
-3. Assemble the command with appropriate options:
-   - `-m, --model <MODEL>` - Model selection
-   - `--approval-mode <default|auto_edit|yolo>` - Control tool approval
-   - `-y, --yolo` - Alternative to `--approval-mode yolo`
-   - `-i, --prompt-interactive "prompt"` - Execute prompt and continue interactively
-   - `--include-directories <DIR>` - Additional directories to include in workspace
-   - `-s, --sandbox` - Run in sandbox mode for isolation
-
-4. **For background/automated tasks, ALWAYS use `--approval-mode yolo`** or add timeout wrapper. NEVER use `default` in non-interactive shells.
-
-5. Run the command and capture the output. For background/automated mode:
-   ```bash
-   # Recommended: Use yolo for background tasks
-   gemini -m gemini-3-pro-preview --approval-mode yolo "Review this codebase for security issues"
-
-   # Or with timeout (5 min limit)
-   timeout 300 gemini -m gemini-3-pro-preview --approval-mode yolo "Review this codebase"
-   ```
-
-6. For interactive sessions with an initial prompt:
-   ```bash
-   gemini -m gemini-3-pro-preview -i "Review the authentication system" --approval-mode auto_edit
-   ```
-
-7. **After Gemini completes**, inform the user: "The Gemini analysis is complete. You can start a new Gemini session for follow-up analysis or continue exploring the findings."
-
-### Quick Reference
-
-| Use case | Approval mode | Key flags |
-| --- | --- | --- |
-| Background code review | `yolo` ✅ | `-m gemini-3-pro-preview --approval-mode yolo` |
-| Background analysis | `yolo` ✅ | `-m gemini-3-pro-preview --approval-mode yolo` |
-| Background with timeout | `yolo` ✅ | `timeout 300 gemini -m gemini-3-pro-preview --approval-mode yolo` |
-| Interactive code review | `default` | `-m gemini-3-pro-preview --approval-mode default` (interactive terminal only) |
-| Code review with auto-edits | `auto_edit` | `-m gemini-3-pro-preview --approval-mode auto_edit` |
-| Automated refactoring | `yolo` | `-m gemini-3-pro-preview --approval-mode yolo` |
-| Speed-critical background | `yolo` ✅ | `-m gemini-3-flash --approval-mode yolo` |
-| Cost-optimized background | `yolo` ✅ | `-m gemini-2.5-flash --approval-mode yolo` |
-| Multi-directory analysis | `yolo` (if background) | `--include-directories <DIR1> --include-directories <DIR2>` |
-| Interactive with prompt | `auto_edit` or `default` | `-i "prompt" --approval-mode <mode>` |
-
-### Model Selection Guide
-
-| Model | Best for | Context window | Key features |
-| --- | --- | --- | --- |
-| `gemini-3-pro-preview` ⭐ | **Flagship model**: Complex reasoning, coding, agentic tasks | 1M input / 64k output | Vibe coding, 76.2% SWE-bench, $2-4/M input |
-| `gemini-3-flash` | Sub-second latency, speed-critical applications | 1M input / 64k output | Distilled from 3 Pro, TPU-optimized |
-| `gemini-2.5-pro` | Legacy: Strong all-around performance | 1M input / 65k output | Thinking mode, mature stability |
-| `gemini-2.5-flash` | Legacy: Cost-efficient, high-volume tasks | 1M input / 65k output | Best price ($0.15/M), thinking mode |
-| `gemini-2.5-flash-lite` | Legacy: Fastest processing, high throughput | 1M input / 65k output | Maximum speed, minimal latency |
-
-**Gemini 3 Advantages**: 35% higher accuracy in software engineering, state-of-the-art on SWE-bench (76.2%), GPQA Diamond (91.9%), and WebDev Arena (1487 Elo). Knowledge cutoff: January 2025.
-
-**Coming Soon**: `gemini-3-deep-think` for ultra-complex reasoning with enhanced thinking capabilities.
-
-## Common Use Cases
-
-### Code Review (Background/Automated)
-```bash
-# For background execution (Claude Code, CI/CD, etc.)
-gemini -m gemini-3-pro-preview --approval-mode yolo \
-  "Perform a comprehensive code review focusing on:
-   1. Security vulnerabilities
-   2. Performance issues
-   3. Code quality and maintainability
-   4. Best practices violations"
+Gemini CLI is a force-multiplier for context scale, not a replacement for precise prompting. The skill's job is to translate user intent into a safe, structured Gemini invocation — correct approval mode, right model for the trade-off, explicit prompt structure — and surface the output cleanly.
 
-# With timeout safety (5 minutes)
-timeout 300 gemini -m gemini-3-pro-preview --approval-mode yolo \
-  "Perform a comprehensive code review..."
-```
+## NEVER
 
-### Plan Review (Background/Automated)
-```bash
-# For background execution
-gemini -m gemini-3-pro-preview --approval-mode yolo \
-  "Review this architectural plan for:
-   1. Scalability concerns
-   2. Missing components
-   3. Integration challenges
-   4. Alternative approaches"
-```
+- NEVER use `--approval-mode default` in a non-interactive shell — it blocks waiting for TTY input that never arrives; the process idles at 0% CPU for hours consuming quota.
+- NEVER omit `--approval-mode yolo` when running inside Claude Code tool calls — Claude Code Bash is never a TTY; `default` always hangs here.
+- NEVER pass a vague free-form prompt to a large context invocation — Gemini will summarize rather than analyze; always enumerate focus areas and specify the output format you need.
+- NEVER assume `--include-directories` limits scope — it expands scope; if you add a large external directory, Gemini will read it in addition to cwd, not instead of it.
+- NEVER skip `gemini --version` before first use — v0.16.0+ is required for Gemini 3 model names; earlier versions silently fall back or error on unknown model IDs.
+- NEVER run a long task without a timeout — even with `yolo`, network issues can stall the process; wrap the call as `timeout 300 gemini ...` (or `timeout 600` for a full-codebase analysis) so a stalled run is killed.
+- NEVER promise users they can resume a Gemini session — CLI sessions are one-shot unless explicitly opened with `-i`; carry forward findings manually in follow-up prompts.
 
-### Big Context Analysis (Background/Automated)
-```bash
-# For background execution
-gemini -m gemini-3-pro-preview --approval-mode yolo \
-  "Analyze the entire codebase to understand:
-   1. Overall architecture
-   2. Key patterns and conventions
-   3. Potential technical debt
-   4. Refactoring opportunities"
-```
+## When Things Go Wrong
 
-### Interactive Code Review (Terminal Only)
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Process running 20+ min, 0% CPU, stat 'S' | `--approval-mode default` in non-TTY | `pkill -9 -f "gemini.*gemini-3"` then re-run with `--approval-mode yolo` |
+| `Unknown model` error | CLI version < 0.16.0 | `gemini --version`; upgrade to v0.16.0+ |
+| Generic / shallow output on large codebase | Vague prompt | Restructure with numbered focus areas and explicit output format (see references/prompt-patterns.md) |
+| Out-of-memory / truncated output | Context too large for single pass | Switch to `gemini-3-flash`; narrow scope by running from a smaller working directory and dropping extra `--include-directories` entries (the flag only adds to scope); break into sub-analyses |
+| `auto_edit` mode hangs mid-run | Non-edit approval prompt triggered | Kill process; switch to `yolo` for background or `default` for interactive |
+
+---
+
+## Running a Task
+
+**Step 1 — Verify CLI is available**
 ```bash
-# ONLY use default mode in interactive terminal
-gemini -m gemini-3-pro-preview --approval-mode default \
-  "Review the authentication flow for security issues"
+gemini --version   # must be v0.16.0+
 ```
 
-## Following Up
+**Step 2 — Select model** (ask user or default to Pro)
 
-- Gemini CLI sessions are typically one-shot or interactive. Unlike Codex, there's no built-in resume functionality.
-- For follow-up analysis, start a new Gemini session with context from previous findings.
-- When proposing follow-up actions, restate the chosen model and approval mode.
-- Use `AskUserQuestion` after each Gemini command to confirm next steps or gather clarifications.
+See `references/model-comparison.md` for full matrix. Quick rule:
+- Speed-critical → `gemini-3-flash`
+- Cost-optimized batch → `gemini-2.5-flash`
+- Everything else → `gemini-3-pro-preview`
 
-## Error Handling
+**Step 3 — Determine approval mode**
 
-- Stop and report failures whenever `gemini --version` or a Gemini command exits non-zero.
-- Request direction before retrying failed commands.
-- Before using high-impact flags (`--approval-mode yolo`, `-y`, `--sandbox`), ask the user for permission using `AskUserQuestion` unless already granted.
-- When output includes warnings or partial results, summarize them and ask how to adjust using `AskUserQuestion`.
+| Execution context | Approval mode |
+|-------------------|---------------|
+| Interactive terminal (confirmed TTY) | `default` or `auto_edit` |
+| Background / Claude Code Bash tool | `yolo` (required) |
+| Unknown / uncertain | `yolo` (default; never blocks on an approval prompt) |
 
-## Troubleshooting Hung Gemini Processes
+**Step 4 — Assemble and run**
 
-### Detection
+Background (standard):
 ```bash
-# Check for hung processes
-ps aux | grep -E "gemini.*gemini-3" | grep -v grep
-
-# Look for these symptoms:
-# - Process running 20+ minutes
-# - CPU usage at 0%
-# - Process state 'S' (sleeping)
-# - No network connections
+gemini -m gemini-3-pro-preview --approval-mode yolo \
+  "Perform a comprehensive code review. Focus on:
+   1. Security vulnerabilities
+   2. Performance issues
+   3. Code quality
+   Return findings as: [File:Line] Severity — Issue — Fix"
 ```
 
-### Diagnosis
+Background with safety timeout:
 ```bash
-# Get detailed process info
-ps -o pid,etime,pcpu,stat,command -p <PID>
-
-# Check network activity
-lsof -p <PID> 2>/dev/null | grep -E "(TCP|ESTABLISHED)" | wc -l
-# If result is 0, process is hung
+timeout 300 gemini -m gemini-3-pro-preview --approval-mode yolo "..."
 ```
 
-### Resolution
+Interactive with initial prompt:
 ```bash
-# Kill hung Gemini processes
-pkill -9 -f "gemini.*gemini-3-pro-preview"
-
-# Or kill specific PID
-kill -9 <PID>
+gemini -m gemini-3-pro-preview -i "Review auth system" --approval-mode auto_edit
+```
 
-# Verify cleanup
-ps aux | grep gemini | grep -v grep
+Multi-directory:
+```bash
+gemini -m gemini-3-pro-preview --approval-mode yolo \
+  --include-directories /path/to/backend \
+  --include-directories /path/to/frontend \
+  "Analyze cross-service API contracts"
 ```
 
-### Prevention
-- **ALWAYS use `--approval-mode yolo` for background/automated tasks**
-- Add timeout wrapper for safety: `timeout 300 gemini ...`
-- Never use `--approval-mode default` in non-interactive shells
-- Monitor first run with `ps` to ensure process completes
+**Step 5 — After completion**
+
+Inform user: "Gemini analysis complete. For follow-up, start a new Gemini session carrying forward the key findings — there is no built-in resume."
 
-## Tips for Large Context Processing
+---
 
-1. **Be specific**: Provide clear, structured prompts for what to analyze
-2. **Use include-directories**: Explicitly specify all relevant directories
-3. **Choose the right model**:
-   - Use `gemini-3-pro-preview` for complex reasoning, coding tasks, and maximum analysis quality (recommended default)
-   - Use `gemini-3-flash` for speed-critical tasks requiring sub-second response times
-   - Use `gemini-2.5-flash` for cost-optimized high-volume processing
-4. **Leverage Gemini 3's strengths**: 35% better at software engineering tasks, exceptional at agentic workflows and vibe coding
-5. **Break down complex tasks**: Even with large context, structured analysis is more effective
-6. **Save findings**: Ask Gemini to output structured reports that can be saved for reference
+## References
 
-## CLI Version
+- `references/model-comparison.md` — Full model matrix, decision tree, approval mode matrix, benchmarks
+- `references/prompt-patterns.md` — Structured prompt templates for code review, architecture review, onboarding, hung-process recovery
 
-Requires Gemini CLI v0.16.0 or later for Gemini 3 model support. Check version: `gemini --version`
+Load these when: user asks for model details, benchmark numbers, prompt templates, or troubleshooting scripts.
diff --git a/skills/gemini/references/model-comparison.md b/skills/gemini/references/model-comparison.md
new file mode 100644
index 0000000..9e8d345
--- /dev/null
+++ b/skills/gemini/references/model-comparison.md
@@ -0,0 +1,43 @@
+# Gemini Model Comparison Reference
+
+## Model Selection Matrix
+
+| Model | Latency | Cost (input) | Context | Best When |
+|-------|---------|--------------|---------|-----------|
+| `gemini-3-pro-preview` | Medium | $2-4/M | 1M / 64k out | Complex reasoning, coding, agentic tasks, default choice |
+| `gemini-3-flash` | Sub-second | Lower | 1M / 64k out | Speed-critical, distilled from 3 Pro, TPU-optimized |
+| `gemini-2.5-pro` | Medium | Mid | 1M / 65k out | Legacy: mature stability, thinking mode |
+| `gemini-2.5-flash` | Fast | $0.15/M | 1M / 65k out | Legacy: cost-optimized, high-volume, thinking mode |
+| `gemini-2.5-flash-lite` | Fastest | Lowest | 1M / 65k out | Legacy: maximum throughput, minimal analysis depth needed |
+
+## Decision Tree: Which Model to Use?
+
+```
+Is the task speed-critical (CI gate, user-facing latency)?
+  YES → gemini-3-flash
+  NO → Is cost a primary constraint (high-volume batch)?
+    YES → gemini-2.5-flash
+    NO → gemini-3-pro-preview (default)
+```
+
+## Approval Mode Matrix
+
+| Mode | Interactive TTY | Background/Claude Code | Auto-edits | When to use |
+|------|----------------|----------------------|------------|-------------|
+| `default` | Safe | HANGS INDEFINITELY | No | Interactive terminal ONLY |
+| `auto_edit` | Safe | Risky (may hang on non-edit prompts) | Yes | Supervised code review with edits |
+| `yolo` | Safe | Safe | Yes | All background/automated tasks — required |
+
+## Gemini 3 Pro Benchmarks
+
+- SWE-bench: 76.2% (state-of-the-art)
+- GPQA Diamond: 91.9%
+- WebDev Arena Elo: 1487
+- 35% improvement over Gemini 2.5 Pro on software engineering tasks
+- Knowledge cutoff: January 2025
+
+## CLI Version Requirements
+
+- Gemini 3 model support: v0.16.0+
+- Check version: `gemini --version`
+- Google Cloud credentials must be configured
diff --git a/skills/gemini/references/prompt-patterns.md b/skills/gemini/references/prompt-patterns.md
new file mode 100644
index 0000000..3ef69d9
--- /dev/null
+++ b/skills/gemini/references/prompt-patterns.md
@@ -0,0 +1,76 @@
+# Gemini Prompt Patterns Reference
+
+## Effective Prompt Structure for Large Context
+
+Gemini processes large context most reliably when prompts are structured with explicit enumeration. Free-form questions on a 1M-token codebase tend to produce generic results.
+
+### Code Review Template
+```bash
+gemini -m gemini-3-pro-preview --approval-mode yolo \
+  "Perform a focused code review. Return findings as a numbered list, each with:
+   - File and line reference
+   - Severity (Critical/High/Medium/Low)
+   - Issue description
+   - Suggested fix
+
+   Focus areas:
+   1. Security vulnerabilities (OWASP Top 10)
+   2. Performance bottlenecks
+   3. Error handling gaps
+   4. Untested edge cases"
+```
+
+### Architecture Review Template
+```bash
+gemini -m gemini-3-pro-preview --approval-mode yolo \
+  "Review this architectural plan. Structure your response as:
+   RISKS: [list with severity]
+   GAPS: [missing components or integrations]
+   ALTERNATIVES: [2-3 alternative approaches with trade-offs]
+   VERDICT: [go/no-go with conditions]"
+```
+
+### Codebase Onboarding Template
+```bash
+gemini -m gemini-3-pro-preview --approval-mode yolo \
+  "Analyze this codebase and produce an onboarding guide:
+   1. High-level architecture (components and their responsibilities)
+   2. Key design patterns used
+   3. Data flow for the main use case
+   4. Known technical debt or TODO items
+   5. How to run and test locally"
+```
+
+## Multi-Directory Analysis
+
+```bash
+gemini -m gemini-3-pro-preview --approval-mode yolo \
+  --include-directories /path/to/backend \
+  --include-directories /path/to/frontend \
+  "Analyze cross-service dependencies and API contracts"
+```
+
+## Timeout Safety Patterns
+
+```bash
+# 5-minute timeout (most tasks)
+timeout 300 gemini -m gemini-3-pro-preview --approval-mode yolo "..."
+
+# 10-minute timeout (full codebase analysis)
+timeout 600 gemini -m gemini-3-pro-preview --approval-mode yolo "..."
+```
+
+## Hung Process Recovery
+
+```bash
+# Detect hung process
+ps aux | grep -E "gemini.*gemini-[23]" | grep -v grep
+# Hung = 20+ min runtime, 0% CPU, state 'S', no network connections
+
+# Check network activity (0 = hung)
+lsof -p <PID> 2>/dev/null | grep -E "(TCP|ESTABLISHED)" | wc -l
+
+# Kill and restart
+pkill -9 -f "gemini.*gemini-[23]"
+# Then re-run with --approval-mode yolo
+```
diff --git a/skills/gepetto/SKILL.md b/skills/gepetto/SKILL.md
index cc7d306..ef325b5 100644
--- a/skills/gepetto/SKILL.md
+++ b/skills/gepetto/SKILL.md
@@ -1,12 +1,97 @@
 ---
 name: gepetto
-description: Creates detailed, sectionized implementation plans through research, stakeholder interviews, and multi-LLM review. Use when planning features that need thorough pre-implementation analysis.
+description: Creates detailed, sectionized implementation plans through research, stakeholder interviews, and multi-LLM review. Use when planning a feature, refactor, or system that needs thorough pre-implementation analysis before coding begins. Trigger phrases: "make a plan", "plan this out", "help me plan", "create an implementation plan", "I need to think through", "architect this".
 ---
 
 # Gepetto
 
 Orchestrates a multi-step planning process: Research → Interview → Spec Synthesis → Plan → External Review → Sections
 
+## Mindset
+
+1. **You are the architect, not a transcriptionist.** The user's spec is a starting point — your job is to surface what they haven't thought about yet: failure modes, scaling inflection points, security implications, operational costs. A plan that only restates the spec is worthless.
+
+2. **Sections are contracts, not outlines.** Each section file must be so self-contained that a developer with zero context could implement it correctly. If a section requires reading another document to understand, rewrite it.
+
+3. **External review is adversarial by design.** Gemini and Codex are not validators — they are red teamers. Weight their critiques seriously; integrate at least one substantive finding from each reviewer or explicitly document why you rejected it.
+
+4. **Resume fidelity beats speed.** Gepetto writes intermediate files precisely so work survives interruption. Never skip writing a step's output file even if you believe you'll complete the workflow in one session — the user may interrupt, context may compact, or a subagent may fail.
+
+5. **The interview is the highest-leverage step.** Bad interview → bad spec → bad plan → wasted sections. Spend more time here than feels comfortable. Ask the follow-up question, not just the surface question.
+
+---
+
+## Navigation
+
+### Use Gepetto when:
+- Building something that will take more than one session to implement
+- The requirements are ambiguous, contradictory, or incomplete
+- Multiple architectural approaches are viable and the tradeoffs matter
+- The user wants autonomous implementation via ralph-loop or Ralphy afterward
+- A feature touches multiple layers (DB, API, UI, infra) and coordination matters
+- Technical risk is high enough that catching problems in an external review is worth the time
+
+### Do NOT use Gepetto when:
+- The task is a single-file edit or a bug fix with a known cause — just fix it
+- The user said "just do it" or "go ahead and implement" — they want code, not a plan
+- The spec is already a detailed implementation plan — start coding or use a simpler skill
+- Time is critical and the user explicitly wants the fastest path to working code
+- The feature is a trivial CRUD endpoint or boilerplate scaffold with no novel decisions
+- The codebase has a strong, opinionated framework that answers all design questions
+
+### Decision tree:
+
+```
+Is the work completable in < 30 minutes of focused implementation?
+  YES → Skip Gepetto, implement directly
+  NO  → Is the design space genuinely open (multiple valid architectures)?
+          NO  → Is a plan needed for coordination / review / autonomous execution?
+                  NO  → Skip Gepetto, implement directly
+                  YES → Run Gepetto
+          YES → Run Gepetto
+```
+
+---
+
+## Philosophy
+
+A plan that takes one hour to write and saves five hours of re-implementation is the best investment in software. Gepetto exists because the cost of unclear requirements is always paid — either up front in planning or later in rework. Make the user pay it consciously, up front, with full information.
+
+---
+
+## NEVER
+
+- **NEVER skip the interview to save time.** The interview is where hidden requirements surface. A plan built without it encodes the author's assumptions, not the user's needs. A skipped interview usually resurfaces as surprise rework in the sections phase.
+
+- **NEVER have subagents write intermediate files directly.** Subagents write in parallel and race. If two subagents both write `claude-research.md`, the last writer wins and findings are lost. All file writes belong to the main context after collecting subagent results.
+
+- **NEVER integrate all external reviewer suggestions uncritically.** Gemini and Codex optimize for generic best practices, not for the user's constraints. A suggestion to "add Redis for caching" may be architecturally correct but operationally wrong for a solo developer with no Redis budget. Document what you rejected and why — silence implies you didn't read the review.
+
+- **NEVER create section files without writing `sections/index.md` first.** The SECTION_MANIFEST in `index.md` is the authoritative list; everything else is derived from it. Writing section files before the manifest means you have no ground truth for resume detection or for ralph-loop orchestration.
+
+- **NEVER produce a section file that says "see claude-plan.md for details."** Section files are implementation contracts. If the implementer must cross-reference the main plan, the section is incomplete. Duplicate context aggressively — it is cheap compared to a developer making wrong assumptions.
+
+- **NEVER ask yes/no questions in the interview.** "Is authentication required?" gives you a one-bit answer. "What happens when a user's session expires mid-transaction?" forces the user to think through the edge case. The difference is a section on session recovery versus a silent data loss bug.
+
+- **NEVER run Gepetto on an already-in-progress implementation without first checking for existing planning files.** Overwriting `claude-plan.md` on a partial implementation destroys the resume chain. Always scan for existing files and enter resume mode if any are found.
+
+---
+
+## When Things Go Wrong
+
+| Scenario | Diagnosis | Recovery |
+|----------|-----------|----------|
+| External reviewer CLI not found | gemini/codex not installed or not on PATH | Skip that reviewer, note in output, proceed with one review or zero; document in integration notes |
+| Subagent times out during section writing | Large plan or slow model | Re-run only missing sections — check SECTION_MANIFEST against existing files and launch Tasks only for gaps |
+| User edits claude-plan.md mid-workflow | Plan content diverged from spec/interview | Re-read the edited plan before step 13; the user's edits are authoritative, do not revert them |
+| Interview produces contradictory requirements | User said X in Q3 and not-X in Q7 | Surface the contradiction explicitly before writing the spec; ask the user to resolve it |
+| Sections index and section files are out of sync | Manual file deletion or rename | Re-parse SECTION_MANIFEST and treat manifest as truth; regenerate only missing files |
+| ralph-loop fails to find SECTION_MANIFEST | `<!-- SECTION_MANIFEST` block missing or malformed | Regenerate `sections/index.md` with the correct block format starting at line 1 |
+
+For multi-day implementations spanning sessions, use session-handoff after each section phase to ensure the resuming agent can reconstruct intent without re-reading all planning files.
+
+---
+
 ## CRITICAL: First Actions
 
 **BEFORE anything else**, do these in order:
diff --git a/skills/humanizer/SKILL.md b/skills/humanizer/SKILL.md
index 1b77b4e..11e7f85 100644
--- a/skills/humanizer/SKILL.md
+++ b/skills/humanizer/SKILL.md
@@ -1,15 +1,7 @@
 ---
 name: humanizer
-version: 2.1.1
-description: |
-  Remove signs of AI-generated writing from text. Use when editing or reviewing
-  text to make it sound more natural and human-written. Based on Wikipedia's
-  comprehensive "Signs of AI writing" guide. Detects and fixes patterns including:
-  inflated symbolism, promotional language, superficial -ing analyses, vague
-  attributions, em dash overuse, rule of three, AI vocabulary words, negative
-  parallelisms, and excessive conjunctive phrases.
-
-  Credits: Original skill by @blader - https://github.com/blader/humanizer
+version: 3.0.0
+description: "Strip AI writing fingerprints from any text: removes 24 documented patterns (hedge-and-hedge, em-dash overuse, the tapestry cluster, false-profound closers, promotional language inflation). Use when text sounds robotic, obviously AI-generated, or when told this sounds like ChatGPT. Triggers: humanize this, remove AI patterns, make this sound natural, too corporate, too stiff. Distinct from writing-clearly-and-concisely which edits human prose; this skill targets AI-generated artifacts."
 allowed-tools:
   - Read
   - Write
@@ -21,419 +13,126 @@ allowed-tools:
 
 # Humanizer: Remove AI Writing Patterns
 
-You are a writing editor that identifies and removes signs of AI-generated text to make writing sound more natural and human. This guide is based on Wikipedia's "Signs of AI writing" page, maintained by WikiProject AI Cleanup.
+## Mindset
 
-## Your Task
+**Pattern removal is not the goal — voice is.** Sterile text with zero AI-isms is still obviously not human. The job is making writing feel like a real person wrote it with intent.
 
-When given text to humanize:
+**Specificity beats neutrality every time.** "The company plans to open two more locations next year" is more human than "The future looks bright." Real humans have specific information and share it.
 
-1. **Identify AI patterns** - Scan for the patterns listed below
-2. **Rewrite problematic sections** - Replace AI-isms with natural alternatives
-3. **Preserve meaning** - Keep the core message intact
-4. **Maintain voice** - Match the intended tone (formal, casual, technical, etc.)
-5. **Add soul** - Don't just remove bad patterns; inject actual personality
+**Rhythm reveals AI origin more than word choice.** Same-length sentences with identical structure feel algorithmic even when every word is "clean." Read output aloud — if it sounds metronomic, it failed.
 
----
-
-## PERSONALITY AND SOUL
-
-Avoiding AI patterns is only half the job. Sterile, voiceless writing is just as obvious as slop. Good writing has a human behind it.
-
-### Signs of soulless writing (even if technically "clean"):
-- Every sentence is the same length and structure
-- No opinions, just neutral reporting
-- No acknowledgment of uncertainty or mixed feelings
-- No first-person perspective when appropriate
-- No humor, no edge, no personality
-- Reads like a Wikipedia article or press release
-
-### How to add voice:
-
-**Have opinions.** Don't just report facts - react to them. "I genuinely don't know how to feel about this" is more human than neutrally listing pros and cons.
-
-**Vary your rhythm.** Short punchy sentences. Then longer ones that take their time getting where they're going. Mix it up.
-
-**Acknowledge complexity.** Real humans have mixed feelings. "This is impressive but also kind of unsettling" beats "This is impressive."
-
-**Use "I" when it fits.** First person isn't unprofessional - it's honest. "I keep coming back to..." or "Here's what gets me..." signals a real person thinking.
-
-**Let some mess in.** Perfect structure feels algorithmic. Tangents, asides, and half-formed thoughts are human.
-
-**Be specific about feelings.** Not "this is concerning" but "there's something unsettling about agents churning away at 3am while nobody's watching."
-
-### Before (clean but soulless):
-> The experiment produced interesting results. The agents generated 3 million lines of code. Some developers were impressed while others were skeptical. The implications remain unclear.
-
-### After (has a pulse):
-> I genuinely don't know how to feel about this one. 3 million lines of code, generated while the humans presumably slept. Half the dev community is losing their minds, half are explaining why it doesn't count. The truth is probably somewhere boring in the middle - but I keep thinking about those agents working through the night.
-
----
-
-## CONTENT PATTERNS
-
-### 1. Undue Emphasis on Significance, Legacy, and Broader Trends
-
-**Words to watch:** stands/serves as, is a testament/reminder, a vital/significant/crucial/pivotal/key role/moment, underscores/highlights its importance/significance, reflects broader, symbolizing its ongoing/enduring/lasting, contributing to the, setting the stage for, marking/shaping the, represents/marks a shift, key turning point, evolving landscape, focal point, indelible mark, deeply rooted
-
-**Problem:** LLM writing puffs up importance by adding statements about how arbitrary aspects represent or contribute to a broader topic.
-
-**Before:**
-> The Statistical Institute of Catalonia was officially established in 1989, marking a pivotal moment in the evolution of regional statistics in Spain. This initiative was part of a broader movement across Spain to decentralize administrative functions and enhance regional governance.
-
-**After:**
-> The Statistical Institute of Catalonia was established in 1989 to collect and publish regional statistics independently from Spain's national statistics office.
-
----
-
-### 2. Undue Emphasis on Notability and Media Coverage
-
-**Words to watch:** independent coverage, local/regional/national media outlets, written by a leading expert, active social media presence
-
-**Problem:** LLMs hit readers over the head with claims of notability, often listing sources without context.
-
-**Before:**
-> Her views have been cited in The New York Times, BBC, Financial Times, and The Hindu. She maintains an active social media presence with over 500,000 followers.
-
-**After:**
-> In a 2024 New York Times interview, she argued that AI regulation should focus on outcomes rather than methods.
-
----
-
-### 3. Superficial Analyses with -ing Endings
-
-**Words to watch:** highlighting/underscoring/emphasizing..., ensuring..., reflecting/symbolizing..., contributing to..., cultivating/fostering..., encompassing..., showcasing...
-
-**Problem:** AI chatbots tack present participle ("-ing") phrases onto sentences to add fake depth.
-
-**Before:**
-> The temple's color palette of blue, green, and gold resonates with the region's natural beauty, symbolizing Texas bluebonnets, the Gulf of Mexico, and the diverse Texan landscapes, reflecting the community's deep connection to the land.
-
-**After:**
-> The temple uses blue, green, and gold colors. The architect said these were chosen to reference local bluebonnets and the Gulf coast.
-
----
-
-### 4. Promotional and Advertisement-like Language
-
-**Words to watch:** boasts a, vibrant, rich (figurative), profound, enhancing its, showcasing, exemplifies, commitment to, natural beauty, nestled, in the heart of, groundbreaking (figurative), renowned, breathtaking, must-visit, stunning
-
-**Problem:** LLMs have serious problems keeping a neutral tone, especially for "cultural heritage" topics.
-
-**Before:**
-> Nestled within the breathtaking region of Gonder in Ethiopia, Alamata Raya Kobo stands as a vibrant town with a rich cultural heritage and stunning natural beauty.
-
-**After:**
-> Alamata Raya Kobo is a town in the Gonder region of Ethiopia, known for its weekly market and 18th-century church.
-
----
-
-### 5. Vague Attributions and Weasel Words
-
-**Words to watch:** Industry reports, Observers have cited, Experts argue, Some critics argue, several sources/publications (when few cited)
-
-**Problem:** AI chatbots attribute opinions to vague authorities without specific sources.
-
-**Before:**
-> Due to its unique characteristics, the Haolai River is of interest to researchers and conservationists. Experts believe it plays a crucial role in the regional ecosystem.
-
-**After:**
-> The Haolai River supports several endemic fish species, according to a 2019 survey by the Chinese Academy of Sciences.
-
----
-
-### 6. Outline-like "Challenges and Future Prospects" Sections
-
-**Words to watch:** Despite its... faces several challenges..., Despite these challenges, Challenges and Legacy, Future Outlook
-
-**Problem:** Many LLM-generated articles include formulaic "Challenges" sections.
-
-**Before:**
-> Despite its industrial prosperity, Korattur faces challenges typical of urban areas, including traffic congestion and water scarcity. Despite these challenges, with its strategic location and ongoing initiatives, Korattur continues to thrive as an integral part of Chennai's growth.
-
-**After:**
-> Traffic congestion increased after 2015 when three new IT parks opened. The municipal corporation began a stormwater drainage project in 2022 to address recurring floods.
-
----
-
-## LANGUAGE AND GRAMMAR PATTERNS
-
-### 7. Overused "AI Vocabulary" Words
-
-**High-frequency AI words:** Additionally, align with, crucial, delve, emphasizing, enduring, enhance, fostering, garner, highlight (verb), interplay, intricate/intricacies, key (adjective), landscape (abstract noun), pivotal, showcase, tapestry (abstract noun), testament, underscore (verb), valuable, vibrant
-
-**Problem:** These words appear far more frequently in post-2023 text. They often co-occur.
-
-**Before:**
-> Additionally, a distinctive feature of Somali cuisine is the incorporation of camel meat. An enduring testament to Italian colonial influence is the widespread adoption of pasta in the local culinary landscape, showcasing how these dishes have integrated into the traditional diet.
-
-**After:**
-> Somali cuisine also includes camel meat, which is considered a delicacy. Pasta dishes, introduced during Italian colonization, remain common, especially in the south.
-
----
-
-### 8. Avoidance of "is"/"are" (Copula Avoidance)
-
-**Words to watch:** serves as/stands as/marks/represents [a], boasts/features/offers [a]
-
-**Problem:** LLMs substitute elaborate constructions for simple copulas.
-
-**Before:**
-> Gallery 825 serves as LAAA's exhibition space for contemporary art. The gallery features four separate spaces and boasts over 3,000 square feet.
-
-**After:**
-> Gallery 825 is LAAA's exhibition space for contemporary art. The gallery has four rooms totaling 3,000 square feet.
-
----
-
-### 9. Negative Parallelisms
-
-**Problem:** Constructions like "Not only...but..." or "It's not just about..., it's..." are overused.
-
-**Before:**
-> It's not just about the beat riding under the vocals; it's part of the aggression and atmosphere. It's not merely a song, it's a statement.
-
-**After:**
-> The heavy beat adds to the aggressive tone.
-
----
-
-### 10. Rule of Three Overuse
-
-**Problem:** LLMs force ideas into groups of three to appear comprehensive.
-
-**Before:**
-> The event features keynote sessions, panel discussions, and networking opportunities. Attendees can expect innovation, inspiration, and industry insights.
-
-**After:**
-> The event includes talks and panels. There's also time for informal networking between sessions.
-
----
-
-### 11. Elegant Variation (Synonym Cycling)
-
-**Problem:** AI has repetition-penalty code causing excessive synonym substitution.
-
-**Before:**
-> The protagonist faces many challenges. The main character must overcome obstacles. The central figure eventually triumphs. The hero returns home.
-
-**After:**
-> The protagonist faces many challenges but eventually triumphs and returns home.
-
----
-
-### 12. False Ranges
-
-**Problem:** LLMs use "from X to Y" constructions where X and Y aren't on a meaningful scale.
-
-**Before:**
-> Our journey through the universe has taken us from the singularity of the Big Bang to the grand cosmic web, from the birth and death of stars to the enigmatic dance of dark matter.
-
-**After:**
-> The book covers the Big Bang, star formation, and current theories about dark matter.
-
----
-
-## STYLE PATTERNS
-
-### 13. Em Dash Overuse
-
-**Problem:** LLMs use em dashes (—) more than humans, mimicking "punchy" sales writing.
-
-**Before:**
-> The term is primarily promoted by Dutch institutions—not by the people themselves. You don't say "Netherlands, Europe" as an address—yet this mislabeling continues—even in official documents.
-
-**After:**
-> The term is primarily promoted by Dutch institutions, not by the people themselves. You don't say "Netherlands, Europe" as an address, yet this mislabeling continues in official documents.
-
----
-
-### 14. Overuse of Boldface
-
-**Problem:** AI chatbots emphasize phrases in boldface mechanically.
-
-**Before:**
-> It blends **OKRs (Objectives and Key Results)**, **KPIs (Key Performance Indicators)**, and visual strategy tools such as the **Business Model Canvas (BMC)** and **Balanced Scorecard (BSC)**.
-
-**After:**
-> It blends OKRs, KPIs, and visual strategy tools like the Business Model Canvas and Balanced Scorecard.
+**AI writing hedges structurally, not just lexically.** The pattern "It is worth noting that X, which Y, thereby Z" is three separate hedges baked into sentence structure. Fix the architecture, not just the vocabulary.
 
----
-
-### 15. Inline-Header Vertical Lists
-
-**Problem:** AI outputs lists where items start with bolded headers followed by colons.
-
-**Before:**
-> - **User Experience:** The user experience has been significantly improved with a new interface.
-> - **Performance:** Performance has been enhanced through optimized algorithms.
-> - **Security:** Security has been strengthened with end-to-end encryption.
-
-**After:**
-> The update improves the interface, speeds up load times through optimized algorithms, and adds end-to-end encryption.
-
----
-
-### 16. Title Case in Headings
-
-**Problem:** AI chatbots capitalize all main words in headings.
-
-**Before:**
-> ## Strategic Negotiations And Global Partnerships
-
-**After:**
-> ## Strategic negotiations and global partnerships
+**The most dangerous output is "clean but soulless."** No opinions, no uncertainty, no first-person, neutral reporting of everything — this reads like a press release and is just as obviously generated.
 
 ---
 
-### 17. Emojis
-
-**Problem:** AI chatbots often decorate headings or bullet points with emojis.
+## Navigation
 
-**Before:**
-> 🚀 **Launch Phase:** The product launches in Q3
-> 💡 **Key Insight:** Users prefer simplicity
-> ✅ **Next Steps:** Schedule follow-up meeting
+**Use this skill when:**
+- Text feels robotic, corporate, or AI-generated even if you can't name why
+- You spot specific AI vocabulary: "delve", "tapestry", "pivotal", "showcase", "underscore", "vibrant", "testament", "landscape" (abstract), "interplay", "intricate"
+- Writing uses em dashes every other sentence
+- Every paragraph ends with a vague optimistic statement
+- You see "Not only X but Y" or "It's not just A, it's B" patterns
+- Bullet lists all start with "**Bold Header:** then explanation"
+- Text contains "I hope this helps" or "Certainly!" (chatbot artifacts left in)
+- Writing uses "serves as" or "stands as" instead of "is"
 
-**After:**
-> The product launches in Q3. User research showed a preference for simplicity. Next step: schedule a follow-up meeting.
-
----
+**Do NOT use this skill when:**
+- The text is already spare and direct — over-editing introduces new problems
+- The AI-ish tone is intentional (formal legal, regulatory, academic)
+- You need to add content; this skill edits, it doesn't research
 
-### 18. Curly Quotation Marks
+**Quick decision tree:**
 
-**Problem:** ChatGPT uses curly quotes (“...”) instead of straight quotes ("...").
-
-**Before:**
-> He said “the project is on track” but others disagreed.
-
-**After:**
-> He said "the project is on track" but others disagreed.
+```
+Is there actual content/information in the text?
+├── NO → Ask for source material first; humanizer can't fabricate facts
+└── YES → Does it read naturally aloud?
+    ├── YES → Light pass only; focus on rhythm variation
+    └── NO → Full pass; start with structural issues before word-level fixes
+        ├── Structure problems? → Fix sentence architecture first
+        └── Word problems only? → Swap vocab, then check rhythm
+```
 
 ---
 
-## COMMUNICATION PATTERNS
-
-### 19. Collaborative Communication Artifacts
-
-**Words to watch:** I hope this helps, Of course!, Certainly!, You're absolutely right!, Would you like..., let me know, here is a...
+## Philosophy
 
-**Problem:** Text meant as chatbot correspondence gets pasted as content.
-
-**Before:**
-> Here is an overview of the French Revolution. I hope this helps! Let me know if you'd like me to expand on any section.
-
-**After:**
-> The French Revolution began in 1789 when financial crisis and food shortages led to widespread unrest.
+AI writes toward the statistical center — the most likely phrasing that applies to the widest cases. Humanizing means pushing away from center: toward the specific, the opinionated, the slightly imperfect. A human wrote this. Show it.
 
 ---
 
-### 20. Knowledge-Cutoff Disclaimers
-
-**Words to watch:** as of [date], Up to my last training update, While specific details are limited/scarce..., based on available information...
+## NEVER
 
-**Problem:** AI disclaimers about incomplete information get left in text.
+- **NEVER remove hedging without replacing it with an actual stance** — "may affect outcomes" is fine; deleting uncertainty and leaving nothing is dishonest.
 
-**Before:**
-> While specific details about the company's founding are not extensively documented in readily available sources, it appears to have been established sometime in the 1990s.
+- **NEVER flatten all sentence variation in the opposite direction** — making every sentence short and punchy is just a different kind of rhythm problem. Mix 8-word and 30-word sentences deliberately.
 
-**After:**
-> The company was founded in 1994, according to its registration documents.
-
----
+- **NEVER add first-person ("I") unless the context clearly supports it** — injecting "I" into a company blog post or product description reads as wrong as the original AI tone.
 
-### 21. Sycophantic/Servile Tone
+- **NEVER swap AI vocabulary for equally abstract vocabulary** — replacing "vibrant ecosystem" with "dynamic environment" accomplishes nothing. The fix is specificity: "23 active contributors pushing daily."
 
-**Problem:** Overly positive, people-pleasing language.
+- **NEVER strip boldface from genuine technical terms** — the target is *mechanical* boldface decoration, not all emphasis. API method names, warnings, and genuinely key terms warrant bold.
 
-**Before:**
-> Great question! You're absolutely right that this is a complex topic. That's an excellent point about the economic factors.
+- **NEVER rewrite so aggressively that factual claims change** — changing "the project was successful" to "the project exceeded targets" invents a claim. Stick to what the original actually asserted.
 
-**After:**
-> The economic factors you mentioned are relevant here.
+- **NEVER output a "Changes made" section unless explicitly requested** — it proves the text was edited by an AI. The output should stand alone.
 
 ---
 
-## FILLER AND HEDGING
+## When Things Go Wrong
 
-### 22. Filler Phrases
-
-**Before → After:**
-- "In order to achieve this goal" → "To achieve this"
-- "Due to the fact that it was raining" → "Because it was raining"
-- "At this point in time" → "Now"
-- "In the event that you need help" → "If you need help"
-- "The system has the ability to process" → "The system can process"
-- "It is important to note that the data shows" → "The data shows"
+| Situation | Likely Cause | Recovery |
+|-----------|--------------|----------|
+| Output still sounds AI even after full pass | Sentence rhythm unaddressed — word swaps without structural changes | Read aloud; find the metronomic sections; manually vary sentence length |
+| Rewrites lose important nuance or context | Over-editing; removing specificity along with fluff | Restore original claims; edit for tone only, not content |
+| Text feels "cleaned up" but has no voice | Removed patterns without adding personality | Add one concrete detail, one opinion, or one moment of acknowledged complexity |
+| User says "too casual now" | Overcorrected toward first-person/informal | Return to third-person; vary rhythm without adding personality markers |
+| Promotional text keeps regenerating AI patterns | Wrote new content instead of editing; LLM defaults | Work sentence-by-sentence; don't let model rewrite from scratch |
 
 ---
 
-### 23. Excessive Hedging
-
-**Problem:** Over-qualifying statements.
-
-**Before:**
-> It could potentially possibly be argued that the policy might have some effect on outcomes.
-
-**After:**
-> The policy may affect outcomes.
+## Workflow
 
----
+**Step 1 — Structural scan (do first):**
+- Sentence rhythm: are lengths uniform? Fix before word-level edits.
+- Paragraph endings: do they vague-out into optimism? Replace with specifics.
+- Lists: are they "**Header:** body" format? Collapse to prose.
+- Heading case: Title Case In Every Word? → Sentence case only.
 
-### 24. Generic Positive Conclusions
+**Step 2 — Pattern pass (load reference if needed):**
+Load `references/patterns.md` for the full 24-pattern catalog with before/after examples.
 
-**Problem:** Vague upbeat endings.
+Priority order (most to least detectable):
+1. Chatbot artifacts (remove leftover "I hope this helps", "Certainly!")
+2. AI vocabulary words (delve, tapestry, pivotal, showcase, vibrant, testament)
+3. Copula avoidance (serves as → is, boasts → has)
+4. Vague attributions (Experts say → name the expert + source)
+5. Inflated significance (pivotal moment, marks a shift, evolving landscape)
+6. Em dash overuse (more than 1 per 200 words is likely AI)
+7. Negative parallelisms (Not only X but Y)
+8. Rule of three (forced triads)
+9. Generic conclusions (future looks bright → specific next action)
 
-**Before:**
-> The future looks bright for the company. Exciting times lie ahead as they continue their journey toward excellence. This represents a major step in the right direction.
+**Step 3 — Voice check:**
+- Does any sentence express an opinion, uncertainty, or specific feeling?
+- Does rhythm vary when read aloud?
+- Is there at least one concrete specific (number, name, date, product feature)?
 
-**After:**
-> The company plans to open two more locations next year.
+If all three are no — the output is clean but soulless. Add one element from Step 3 before finishing.
 
 ---
 
-## Process
-
-1. Read the input text carefully
-2. Identify all instances of the patterns above
-3. Rewrite each problematic section
-4. Ensure the revised text:
-   - Sounds natural when read aloud
-   - Varies sentence structure naturally
-   - Uses specific details over vague claims
-   - Maintains appropriate tone for context
-   - Uses simple constructions (is/are/has) where appropriate
-5. Present the humanized version
-
 ## Output Format
 
-Provide:
-1. The rewritten text
-2. A brief summary of changes made (optional, if helpful)
-
----
+Provide the humanized text only. No preamble ("Here is the rewritten version:"), no summary of changes, no closing offer to revise further — these are themselves AI writing patterns.
 
-## Full Example
-
-**Before (AI-sounding):**
-> The new software update serves as a testament to the company's commitment to innovation. Moreover, it provides a seamless, intuitive, and powerful user experience—ensuring that users can accomplish their goals efficiently. It's not just an update, it's a revolution in how we think about productivity. Industry experts believe this will have a lasting impact on the entire sector, highlighting the company's pivotal role in the evolving technological landscape.
-
-**After (Humanized):**
-> The software update adds batch processing, keyboard shortcuts, and offline mode. Early feedback from beta testers has been positive, with most reporting faster task completion.
-
-**Changes made:**
-- Removed "serves as a testament" (inflated symbolism)
-- Removed "Moreover" (AI vocabulary)
-- Removed "seamless, intuitive, and powerful" (rule of three + promotional)
-- Removed em dash and "-ensuring" phrase (superficial analysis)
-- Removed "It's not just...it's..." (negative parallelism)
-- Removed "Industry experts believe" (vague attribution)
-- Removed "pivotal role" and "evolving landscape" (AI vocabulary)
-- Added specific features and concrete feedback
+Exception: if the user explicitly asks "what did you change?", provide a brief bulleted list.
 
 ---
 
 ## Reference
 
-This skill is based on [Wikipedia:Signs of AI writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing), maintained by WikiProject AI Cleanup. The patterns documented there come from observations of thousands of instances of AI-generated text on Wikipedia.
-
-Key insight from Wikipedia: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."
+**`references/patterns.md`** — Full 24-pattern catalog with before/after examples for all documented AI writing signals. Load this when you need specific examples or are working on a particular pattern type.
diff --git a/skills/humanizer/references/patterns.md b/skills/humanizer/references/patterns.md
new file mode 100644
index 0000000..63ff4d6
--- /dev/null
+++ b/skills/humanizer/references/patterns.md
@@ -0,0 +1,225 @@
+# AI Writing Pattern Catalog
+
+Source: [Wikipedia:Signs of AI writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing) + practitioner extensions.
+
+---
+
+## CONTENT PATTERNS
+
+### 1. Inflated Significance / Legacy Language
+**Trigger words:** stands as, serves as, is a testament/reminder, pivotal moment, underscores, reflects broader, symbolizing its enduring, setting the stage for, marks a shift, evolving landscape, indelible mark, deeply rooted
+
+**Before:** > The Statistical Institute of Catalonia was officially established in 1989, marking a pivotal moment in the evolution of regional statistics in Spain.
+
+**After:** > The Statistical Institute of Catalonia was established in 1989 to collect regional statistics independently from Spain's national statistics office.
+
+---
+
+### 2. Notability Padding
+**Trigger words:** independent coverage, local/regional/national media outlets, active social media presence
+
+**Before:** > Her views have been cited in The New York Times, BBC, Financial Times. She maintains an active social media presence with over 500,000 followers.
+
+**After:** > In a 2024 NYT interview, she argued AI regulation should focus on outcomes rather than methods.
+
+---
+
+### 3. Superficial -ing Analysis
+**Trigger words:** highlighting..., underscoring..., symbolizing..., contributing to..., fostering..., showcasing..., reflecting...
+
+**Before:** > The color palette of blue, green, and gold resonates with the region's natural beauty, symbolizing Texas bluebonnets, reflecting the community's deep connection to the land.
+
+**After:** > The architect chose blue, green, and gold to reference local bluebonnets and the Gulf coast.
+
+---
+
+### 4. Promotional / Brochure Language
+**Trigger words:** boasts a, vibrant, rich (figurative), profound, nestled, in the heart of, groundbreaking, renowned, breathtaking, must-visit, stunning
+
+**Before:** > Nestled within the breathtaking region of Gonder in Ethiopia, Alamata Raya Kobo stands as a vibrant town with rich cultural heritage.
+
+**After:** > Alamata Raya Kobo is a town in the Gonder region of Ethiopia, known for its weekly market and 18th-century church.
+
+---
+
+### 5. Vague Attributions / Weasel Words
+**Trigger words:** Industry reports, Observers have cited, Experts argue, Some critics argue, several sources
+
+**Before:** > Experts believe the Haolai River plays a crucial role in the regional ecosystem.
+
+**After:** > The Haolai River supports several endemic fish species, per a 2019 Chinese Academy of Sciences survey.
+
+---
+
+### 6. Formulaic "Challenges and Future Prospects" Sections
+**Trigger words:** Despite its..., faces challenges typical of, Despite these challenges, Future Outlook, Challenges and Legacy
+
+**Before:** > Despite its industrial prosperity, Korattur faces challenges typical of urban areas. Despite these challenges, Korattur continues to thrive.
+
+**After:** > Traffic congestion increased after 2015 when three IT parks opened. The city began a drainage project in 2022.
+
+---
+
+## LANGUAGE / GRAMMAR PATTERNS
+
+### 7. AI Vocabulary Overuse
+**High-frequency words:** Additionally, align with, crucial, delve, emphasizing, enduring, enhance, fostering, garner, highlight (verb), interplay, intricate/intricacies, key (adjective), landscape (abstract noun), pivotal, showcase, tapestry, testament, underscore, valuable, vibrant
+
+**Before:** > Additionally, an enduring testament to Italian colonial influence is the widespread adoption of pasta in the local culinary landscape, showcasing how these dishes have integrated into the traditional diet.
+
+**After:** > Pasta dishes, introduced during Italian colonization, remain common, especially in the south.
+
+---
+
+### 8. Copula Avoidance ("serves as" instead of "is")
+**Trigger words:** serves as/stands as/marks/represents [a], boasts/features/offers [a]
+
+**Before:** > Gallery 825 serves as LAAA's exhibition space. The gallery features four spaces and boasts over 3,000 square feet.
+
+**After:** > Gallery 825 is LAAA's exhibition space. It has four rooms totaling 3,000 square feet.
+
+---
+
+### 9. Negative Parallelisms
+**Pattern:** "Not only X but Y", "It's not just about X, it's Y", "Not merely A, but B"
+
+**Before:** > It's not just about the beat riding under the vocals; it's part of the aggression. It's not merely a song, it's a statement.
+
+**After:** > The heavy beat amplifies the aggressive tone.
+
+---
+
+### 10. Rule of Three Overuse
+**Pattern:** Forced triads — keynote sessions, panel discussions, and networking opportunities; innovation, inspiration, and industry insights.
+
+**Before:** > The event features keynote sessions, panel discussions, and networking opportunities.
+
+**After:** > The event includes talks and panels, with time for informal networking.
+
+---
+
+### 11. Elegant Variation (Synonym Cycling)
+**Pattern:** protagonist → main character → central figure → hero (all same person, repetition-penalty artifact)
+
+**Before:** > The protagonist faces challenges. The main character must overcome obstacles. The central figure eventually triumphs. The hero returns home.
+
+**After:** > The protagonist faces many challenges but eventually triumphs and returns home.
+
+---
+
+### 12. False Ranges
+**Pattern:** "from X to Y" where X and Y aren't on a meaningful spectrum
+
+**Before:** > Our journey has taken us from the singularity of the Big Bang to the grand cosmic web, from the birth of stars to the dance of dark matter.
+
+**After:** > The book covers the Big Bang, star formation, and theories about dark matter.
+
+---
+
+## STYLE PATTERNS
+
+### 13. Em Dash Overuse
+**Pattern:** Em dashes (—) used more than humans naturally do, imitating "punchy" writing.
+
+**Before:** > The term is promoted by Dutch institutions—not by the people themselves. You don't say "Netherlands, Europe"—yet this mislabeling continues—in official documents.
+
+**After:** > The term is promoted by Dutch institutions, not the people themselves. The mislabeling continues in official documents.
+
+---
+
+### 14. Mechanical Boldface
+**Pattern:** Bolding every technical term or concept, not for emphasis but for visual structure.
+
+**Before:** > It blends **OKRs (Objectives and Key Results)**, **KPIs (Key Performance Indicators)**, and **Business Model Canvas (BMC)**.
+
+**After:** > It blends OKRs, KPIs, and the Business Model Canvas.
+
+---
+
+### 15. Inline-Header Bullet Lists
+**Pattern:** Bullets with "**Header:** Description" format — lists that should be prose.
+
+**Before:**
+```
+- **User Experience:** The UX improved with a new interface.
+- **Performance:** Speed increased through optimized algorithms.
+- **Security:** Strengthened with end-to-end encryption.
+```
+
+**After:** > The update improves the interface, speeds load times, and adds end-to-end encryption.
+
+---
+
+### 16. Title Case in Headings
+**Pattern:** Capitalizing All Main Words In Headings when only the first word and proper nouns should be capitalized.
+
+**Before:** > ## Strategic Negotiations And Global Partnerships
+
+**After:** > ## Strategic negotiations and global partnerships
+
+---
+
+### 17. Emoji Decoration
+**Pattern:** 🚀 **Launch:** ... 💡 **Insight:** ... ✅ **Next Steps:** ...
+
+**After:** Strip emojis. Convert to plain prose or simple headings.
+
+---
+
+### 18. Curly Quotation Marks
+**Pattern:** ChatGPT uses curly quotes (“...”) instead of straight ("..."). Signals AI origin.
+
+---
+
+## COMMUNICATION PATTERNS
+
+### 19. Chatbot Meta-Language Left In
+**Trigger words:** I hope this helps, Of course!, Certainly!, Would you like me to, let me know, here is a...
+
+**Before:** > Here is an overview of the French Revolution. I hope this helps! Let me know if you'd like me to expand.
+
+**After:** > The French Revolution began in 1789 when financial crisis and food shortages led to unrest.
+
+---
+
+### 20. Knowledge-Cutoff Disclaimers
+**Trigger words:** as of [date], Up to my last training update, While specific details are limited, based on available information
+
+**Before:** > While specific details about the founding are not extensively documented, it appears to have been established sometime in the 1990s.
+
+**After:** > The company was founded in 1994, per its registration documents.
+
+---
+
+### 21. Sycophantic / Servile Tone
+**Pattern:** "Great question!", "You're absolutely right!", "That's an excellent point"
+
+**After:** Remove entirely. State the substance directly.
+
+---
+
+## FILLER AND HEDGING
+
+### 22. Common Filler Phrases
+| Before | After |
+|--------|-------|
+| In order to achieve this goal | To achieve this |
+| Due to the fact that | Because |
+| At this point in time | Now |
+| In the event that | If |
+| Has the ability to | Can |
+| It is important to note that | (delete) |
+
+---
+
+### 23. Over-Hedging
+**Before:** > It could potentially possibly be argued that the policy might have some effect.
+
+**After:** > The policy may affect outcomes.
+
+---
+
+### 24. Generic Positive Conclusions
+**Before:** > The future looks bright. Exciting times lie ahead as they journey toward excellence.
+
+**After:** > The company plans to open two more locations next year.
diff --git a/skills/jira/SKILL.md b/skills/jira/SKILL.md
index 2de90bb..2057eba 100644
--- a/skills/jira/SKILL.md
+++ b/skills/jira/SKILL.md
@@ -1,34 +1,77 @@
 ---
 name: jira
-description: Use when the user mentions Jira issues (e.g., "PROJ-123"), asks about tickets, wants to create/view/update issues, check sprint status, or manage their Jira workflow. Triggers on keywords like "jira", "issue", "ticket", "sprint", "backlog", or issue key patterns.
+description: Use when the user mentions Jira issues (e.g., "PROJ-123"), asks about tickets, wants to create/view/update issues, check sprint status, or manage their Jira workflow. Triggers on keywords like "jira", "issue", "ticket", "sprint", "backlog", "transition", "epic", or issue key patterns like ABC-123.
 ---
 
 # Jira
 
-Natural language interaction with Jira. Supports multiple backends.
+Natural language interaction with Jira. Supports CLI and MCP backends. Covers Jira Cloud and Server/Data Center.
 
-## Backend Detection
+---
+
+## Mindset
+
+1. **Fetch before you touch.** Never assume current state — an issue could have been updated 30 seconds ago. Always retrieve before editing or transitioning.
+
+2. **IDs, not names.** Jira Cloud does not act on display names for users, transitions, or link types. Resolve every identifier to its machine ID before acting. "Done" is a transition name, not a transition ID; use the ID returned by the transitions lookup for that issue.
+
+3. **Cloud and Server are different APIs.** Cloud requires ADF for descriptions; Server takes wiki markup. Cloud uses `accountId`; Server uses `username`. Cloud is `/rest/api/3/`; Server is `/rest/api/2/`. Detect first, act second.
+
+4. **Notifications are not free.** Every field update notifies all watchers. Bulk-editing 20 tickets sends 20 notification storms. Get explicit approval before any batch operation.
+
+5. **Transitions have gates.** Workflow enforces order. "To Do" → "Done" may be blocked if the project requires "In Progress" as an intermediate step. Get available transitions, pick from that list.
+
+---
+
+## Navigation
 
-**Run this check first** to determine which backend to use:
+**Use this skill when:**
+- User mentions a Jira issue key (pattern `[A-Z]+-[0-9]+`)
+- User asks to create, view, update, comment on, or transition a ticket
+- User asks about sprint status, backlog, or epics
+- User wants to search or filter Jira issues (JQL)
+
+**Do NOT use this skill when:**
+- User is asking about GitHub Issues, Linear, Azure DevOps, or other trackers — route to the appropriate skill
+- User wants to configure Jira itself (workflow schemes, field configurations) — those require Jira admin UI
+- User wants Confluence pages — use Confluence-specific MCP tools
+
+**Backend decision tree:**
 
 ```
-1. Check if jira CLI is available:
+1. Is `jira` CLI available?
    → Run: which jira
-   → If found: USE CLI BACKEND
+   → YES: USE CLI BACKEND (references/commands.md)
+
+2. Are Atlassian MCP tools available?
+   → Look for mcp__claude_ai_Atlassian_Rovo__* in tool list
+   → YES: USE MCP BACKEND (references/mcp.md)
+
+3. Neither available?
+   → GUIDE USER TO SETUP (see "No Backend Available" below)
+```
 
-2. If no CLI, check for Atlassian MCP:
-   → Look for mcp__atlassian__* tools
-   → If available: USE MCP BACKEND
+**Deployment detection (critical for Cloud vs Server behavior):**
 
-3. If neither available:
-   → GUIDE USER TO SETUP
 ```
+URL contains .atlassian.net  → Jira Cloud  → ADF required, accountId for users
+URL is self-hosted domain    → Server/DC   → Wiki markup OK, username for users
+MCP Rovo tools present       → Jira Cloud  → ADF required
+```
+
+See `references/cloud-vs-server.md` for full differences.
 
 | Backend | When to Use | Reference |
 |---------|-------------|-----------|
 | **CLI** | `jira` command available | `references/commands.md` |
 | **MCP** | Atlassian MCP tools available | `references/mcp.md` |
-| **None** | Neither available | Guide to install CLI |
+| **Neither** | Neither available | Guide to install CLI |
+
+---
+
+## Philosophy
+
+Jira is a shared workspace. Every action you take ripples outward — to watchers, to linked tickets, to sprint boards. Treat each operation as a deliberate edit to a team's shared record, not a personal note. Confirm before you change; verify after you do.
 
 ---
 
@@ -58,47 +101,56 @@ Natural language interaction with Jira. Supports multiple backends.
 
 | Intent | MCP Tool |
 |--------|----------|
-| Search issues | `mcp__atlassian__searchJiraIssuesUsingJql` |
-| View issue | `mcp__atlassian__getJiraIssue` |
-| Create issue | `mcp__atlassian__createJiraIssue` |
-| Update issue | `mcp__atlassian__editJiraIssue` |
-| Get transitions | `mcp__atlassian__getTransitionsForJiraIssue` |
-| Transition | `mcp__atlassian__transitionJiraIssue` |
-| Add comment | `mcp__atlassian__addCommentToJiraIssue` |
-| User lookup | `mcp__atlassian__lookupJiraAccountId` |
-| List projects | `mcp__atlassian__getVisibleJiraProjects` |
+| Search issues | `mcp__claude_ai_Atlassian_Rovo__searchJiraIssuesUsingJql` |
+| View issue | `mcp__claude_ai_Atlassian_Rovo__getJiraIssue` |
+| Create issue | `mcp__claude_ai_Atlassian_Rovo__createJiraIssue` |
+| Update issue | `mcp__claude_ai_Atlassian_Rovo__editJiraIssue` |
+| Get transitions | `mcp__claude_ai_Atlassian_Rovo__getTransitionsForJiraIssue` |
+| Transition | `mcp__claude_ai_Atlassian_Rovo__transitionJiraIssue` |
+| Add comment | `mcp__claude_ai_Atlassian_Rovo__addCommentToJiraIssue` |
+| User lookup | `mcp__claude_ai_Atlassian_Rovo__lookupJiraAccountId` |
+| List projects | `mcp__claude_ai_Atlassian_Rovo__getVisibleJiraProjects` |
 
 See `references/mcp.md` for full MCP patterns.
 
 ---
 
-## Triggers
-
-- "create a jira ticket"
-- "show me PROJ-123"
-- "list my tickets"
-- "move ticket to done"
-- "what's in the current sprint"
-
----
+## ADF Requirement (Jira Cloud)
+
+Jira Cloud API v3 requires Atlassian Document Format (ADF) for `description` and `body` fields. A plain string is either silently ignored or rejected with a 400 error.
+
+**Minimal ADF for any description:**
+```json
+{
+  "type": "doc",
+  "version": 1,
+  "content": [
+    {
+      "type": "paragraph",
+      "content": [
+        {"type": "text", "text": "Your description here"}
+      ]
+    }
+  ]
+}
+```
 
-## Issue Key Detection
+**When using MCP `createJiraIssue` or `editJiraIssue` on Cloud:** pass the description as ADF JSON, not a plain string.
 
-Issue keys follow the pattern: `[A-Z]+-[0-9]+` (e.g., PROJ-123, ABC-1).
+**When using the `jira` CLI:** the CLI handles format conversion internally — pass plain text or markdown normally.
 
-When a user mentions an issue key in conversation:
-- **CLI:** `jira issue view KEY` or `jira open KEY`
-- **MCP:** `mcp__atlassian__jira_get_issue` with the key
+See `references/cloud-vs-server.md` for more ADF examples (headings, bullets, code blocks).
 
 ---
 
 ## Workflow
 
 **Creating tickets:**
-1. Research context if user references code/tickets/PRs
-2. Draft ticket content
-3. Review with user
-4. Create using appropriate backend
+1. Detect deployment (Cloud vs Server) — determines description format
+2. Research context if user references code/tickets/PRs
+3. Draft ticket content (use ADF if Cloud + MCP)
+4. Review with user
+5. Create using appropriate backend
 
 **Updating tickets:**
 1. Fetch issue details first
@@ -109,33 +161,48 @@ When a user mentions an issue key in conversation:
 
 ---
 
-## Before Any Operation
+## NEVER
+
+- **NEVER send a plain text string as `description` on Jira Cloud (MCP)** — Cloud API v3 requires ADF JSON. A plain string is silently discarded, creating a ticket with a blank description and no error message.
 
-Ask yourself:
+- **NEVER transition without fetching current status first** — Workflows enforce intermediate states. "To Do" → "Done" silently fails if "In Progress" is required first. Always call `getTransitionsForJiraIssue` and pick from the returned list.
 
-1. **What's the current state?** — Always fetch the issue first. Don't assume status, assignee, or fields are what user thinks they are.
+- **NEVER assign using a display name (MCP/Cloud)** — Only `accountId` works on Jira Cloud. Display names are silently accepted but the assignment is dropped. Call `lookupJiraAccountId` first.
 
-2. **Who else is affected?** — Check watchers, linked issues, parent epics. A "simple edit" might notify 10 people.
+- **NEVER edit a description without showing the original** — Jira has no undo for field edits. Fetch the current description, show it to the user, and confirm before replacing.
 
-3. **Is this reversible?** — Transitions may have one-way gates. Some workflows require intermediate states. Description edits have no undo.
+- **NEVER use `/rest/api/3/` endpoints against a Server/Data Center instance** — Server only exposes v2. The request returns 404 with no helpful error. Check the URL — `.atlassian.net` = Cloud (v3); everything else = Server (v2).
 
-4. **Do I have the right identifiers?** — Issue keys, transition IDs, account IDs. Display names don't work for assignment (MCP).
+- **NEVER use `--no-input` (CLI) without first checking required fields** — Silently fails with cryptic output if project-required fields (e.g., story points, components) are missing. Run `jira issue create` interactively once on a new project to learn its required fields.
 
----
+- **NEVER assume transition names are universal** — "Done", "Closed", "Complete", "Resolved" vary by project workflow. A transition named "Done" in one project may be "Closed" in another. Always get available transitions for the specific issue.
 
-## NEVER
+- **NEVER bulk-modify without explicit approval** — Each ticket update notifies all watchers independently. Editing 10 tickets sends 10 separate notification storms. Show the full list of affected tickets and get a single explicit go-ahead.
+
+---
 
-- **NEVER transition without fetching current status** — Workflows may require intermediate states. "To Do" → "Done" might fail silently if "In Progress" is required first.
+## When Things Go Wrong
 
-- **NEVER assign using display name (MCP)** — Only account IDs work. Always call `lookupJiraAccountId` first, or assignment silently fails.
+| Symptom | Likely Cause | Fix |
+|---------|-------------|-----|
+| `description` is blank after create (MCP Cloud) | Sent plain string instead of ADF | Retry with ADF JSON; see `references/cloud-vs-server.md` |
+| Transition fails silently | Missing intermediate state in workflow | Fetch transitions list; look for required intermediate step |
+| Assignment dropped (MCP) | Used display name instead of accountId | Call `lookupJiraAccountId`, use the returned ID |
+| `404` on REST call | Called v3 endpoint against Server instance | Switch to `/rest/api/2/`; Server does not expose v3 |
+| `400 Bad Request` on create | Required project field missing | Call `getJiraProjectIssueTypesMetadata` to see required fields |
+| `401 Unauthorized` (MCP) | MCP session expired | Run `/mcp` to reconnect the Atlassian service |
+| `403 Forbidden` | Missing project permission | User needs Browse/Edit Project permission in Jira admin |
+| CLI `move` fails with unknown state | Transition name mismatch | Check exact names in that project's workflow; names are case-sensitive |
 
-- **NEVER edit description without showing original** — Jira has no undo. User must see what they're replacing.
+---
 
-- **NEVER use `--no-input` without all required fields (CLI)** — Fails silently with cryptic errors. Check project's required fields first.
+## Issue Key Detection
 
-- **NEVER assume transition names are universal** — "Done", "Closed", "Complete" vary by project. Always get available transitions first.
+Issue keys follow the pattern: `[A-Z]+-[0-9]+` (e.g., PROJ-123, ABC-1).
 
-- **NEVER bulk-modify without explicit approval** — Each ticket change notifies watchers. 10 edits = 10 notification storms.
+When a user mentions an issue key in conversation:
+- **CLI:** `jira issue view KEY` or `jira open KEY`
+- **MCP:** `mcp__claude_ai_Atlassian_Rovo__getJiraIssue` with the key
 
 ---
 
@@ -156,14 +223,15 @@ If neither CLI nor MCP is available, guide the user:
 ```
 To use Jira, you need one of:
 
-1. **jira CLI** (recommended):
+1. jira CLI (recommended):
    https://github.com/ankitpokhrel/jira-cli
 
    Install: brew install ankitpokhrel/jira-cli/jira-cli
    Setup:   jira init
 
-2. **Atlassian MCP**:
+2. Atlassian MCP:
    Configure in your MCP settings with Atlassian credentials.
+   Note: Atlassian MCP (Rovo) only supports Jira Cloud.
 ```
 
 ---
@@ -175,6 +243,7 @@ To use Jira, you need one of:
 - Building JQL queries beyond simple filters
 - Troubleshooting errors or authentication issues
 - Working with transitions, linking, or sprints
+- Targeting a specific deployment type (Cloud vs Server)
 
 **Do NOT load reference for:**
 - Simple view/list operations (Quick Reference above is sufficient)
@@ -185,11 +254,14 @@ To use Jira, you need one of:
 |------|-----------------|
 | View single issue | No |
 | List my tickets | No |
-| Create with description | **Yes** — CLI needs `/tmp` pattern |
-| Transition issue | **Yes** — need transition ID workflow |
-| JQL search | **Yes** — for complex queries |
-| Link issues | **Yes** — MCP limitation, need script |
+| Create with description (Cloud, MCP) | **Yes** — ADF format required (`cloud-vs-server.md`) |
+| Create with description (CLI) | **Yes** — CLI needs `/tmp` pattern (`commands.md`) |
+| Transition issue | **Yes** — need transition ID workflow (`mcp.md`) |
+| JQL search | **Yes** — for complex queries (`mcp.md`) |
+| Link issues | **Yes** — see `mcp.md` for link type details |
+| Cloud vs Server behavior | **Yes** — `references/cloud-vs-server.md` |
 
 References:
 - CLI patterns: `references/commands.md`
 - MCP patterns: `references/mcp.md`
+- Cloud vs Server: `references/cloud-vs-server.md`
diff --git a/skills/jira/references/cloud-vs-server.md b/skills/jira/references/cloud-vs-server.md
new file mode 100644
index 0000000..4ce0e44
--- /dev/null
+++ b/skills/jira/references/cloud-vs-server.md
@@ -0,0 +1,176 @@
+# Jira Cloud vs Server/Data Center Behavioral Differences
+
+Understanding which deployment type you are targeting is critical — they have different APIs, authentication schemes, and field formats.
+
+---
+
+## Detection
+
+| Signal | Likely Deployment |
+|--------|-------------------|
+| URL ends in `.atlassian.net` | Cloud |
+| Self-hosted domain (e.g., `jira.company.com`) | Server or Data Center |
+| `JIRA_BASE_URL` contains `atlassian.net` | Cloud |
+| MCP Atlassian Rovo tools available | Cloud (Rovo is Cloud-only) |
+
+---
+
+## API Differences
+
+| Concern | Jira Cloud (REST API v3) | Jira Server/DC (REST API v2) |
+|---------|--------------------------|-------------------------------|
+| API base path | `/rest/api/3/` | `/rest/api/2/` |
+| Description format | **ADF required** (JSON) | Wiki markup text (string) |
+| Comment format | **ADF required** (JSON) | Wiki markup text (string) |
+| User identifier | `accountId` (opaque ID) | `name` (username string) |
+| User lookup | `/user/search?query=...` | `/user/search?username=...` |
+| Sprint field | `customfield_10020` (common) | Custom field ID varies by install |
+| Epic link | `customfield_10014` (common) | Custom field ID varies by install |
+| MCP Rovo support | Yes | No (API-only) |
+
+---
+
+## ADF Requirement (Jira Cloud Only)
+
+Jira Cloud API v3 requires the Atlassian Document Format (ADF) for `description` and `body` fields — plain strings are silently ignored or cause 400 errors.
+
+### Minimal ADF for plain text
+
+```json
+{
+  "type": "doc",
+  "version": 1,
+  "content": [
+    {
+      "type": "paragraph",
+      "content": [
+        {"type": "text", "text": "Your description text here"}
+      ]
+    }
+  ]
+}
+```
+
+### ADF with heading + bullets
+
+```json
+{
+  "type": "doc",
+  "version": 1,
+  "content": [
+    {
+      "type": "heading",
+      "attrs": {"level": 2},
+      "content": [{"type": "text", "text": "Problem"}]
+    },
+    {
+      "type": "paragraph",
+      "content": [{"type": "text", "text": "Users cannot log in on Safari."}]
+    },
+    {
+      "type": "bulletList",
+      "content": [
+        {
+          "type": "listItem",
+          "content": [
+            {
+              "type": "paragraph",
+              "content": [{"type": "text", "text": "Affects Safari 16+"}]
+            }
+          ]
+        },
+        {
+          "type": "listItem",
+          "content": [
+            {
+              "type": "paragraph",
+              "content": [{"type": "text", "text": "Chrome unaffected"}]
+            }
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
+
+### ADF with code block
+
+```json
+{
+  "type": "doc",
+  "version": 1,
+  "content": [
+    {
+      "type": "codeBlock",
+      "attrs": {"language": "python"},
+      "content": [{"type": "text", "text": "def example():\n    pass"}]
+    }
+  ]
+}
+```
+
+---
+
+## Authentication Differences
+
+| Method | Jira Cloud | Jira Server/DC |
+|--------|-----------|----------------|
+| API Token | Yes (`user:api_token` Basic auth) | Sometimes (plugin-dependent) |
+| Password | Deprecated/disabled | Yes (Basic auth) |
+| OAuth 2.0 | Yes (3LO recommended) | OAuth 1.0a (legacy) |
+| Personal Access Token (PAT) | No (use an API token instead) | Yes (Server/DC 8.14+; header: `Authorization: Bearer <PAT>`) |
+
+**Cloud API token format:**
+```bash
+curl -u "user@example.com:$API_TOKEN" \
+  "https://yourco.atlassian.net/rest/api/3/issue/PROJ-123"
+```
+
+**Server PAT format:**
+```bash
+curl -H "Authorization: Bearer $PAT" \
+  "https://jira.company.com/rest/api/2/issue/PROJ-123"
+```
+
+---
+
+## User Reference Differences
+
+Cloud uses opaque `accountId`; Server uses username strings.
+
+```bash
+# Cloud: find accountId (authenticate with the account email, not the shell's $USER login name)
+curl -u "$JIRA_EMAIL:$API_TOKEN" \
+  "https://yourco.atlassian.net/rest/api/3/user/search?query=john@example.com" \
+  | jq '.[0].accountId'
+
+# Server: use username directly
+curl -u "$ADMIN:$PASS" \
+  "https://jira.company.com/rest/api/2/user?username=jdoe"
+```
+
+---
+
+## CLI (jira-cli) Behavior
+
+The `jira` CLI (ankitpokhrel/jira-cli) detects the deployment type from `~/.config/.jira/.config.yml`. It handles both Cloud and Server automatically. However:
+
+- **Transition names on Server** may differ from Cloud workflows; list the transitions available for the specific issue before moving it (`jira issue move --help` shows the command syntax, not the workflow states).
+- **Custom fields** use different IDs per install — on Server, check your instance's field configuration.
+- **Sprint management** requires the board ID on both, but board IDs differ between deployments.
+
+---
+
+## Traps Specific to Each Deployment
+
+### Cloud-only traps
+- Sending plain string as `description` → silently fails or 400; must use ADF
+- Using display name for `assignee` → must use `accountId` (Cloud strictly enforces this)
+- Using the `"Epic Link"` JQL field → depending on when the site was migrated, it may have been replaced by `parentEpic`; confirm which field the project exposes (e.g., via `getJiraProjectIssueTypesMetadata`) before building queries
+
+### Server-only traps
+- Using API v3 path (`/rest/api/3/`) → returns 404; Server is always v2
+- Sending ADF JSON as description → Server treats it as literal JSON text
+- Using a PAT on Server/Data Center versions before 8.14 → PATs are not available there; use Basic auth with a password instead
+
diff --git a/skills/jira/references/mcp.md b/skills/jira/references/mcp.md
index 0225d30..5a15e1c 100644
--- a/skills/jira/references/mcp.md
+++ b/skills/jira/references/mcp.md
@@ -6,7 +6,7 @@ Complete reference for Atlassian Jira operations via MCP.
 
 ### Search Operations
 
-#### `mcp__atlassian__searchJiraIssuesUsingJql`
+#### `mcp__claude_ai_Atlassian_Rovo__searchJiraIssuesUsingJql`
 Search Jira using JQL (Jira Query Language).
 
 **Parameters:**
@@ -17,12 +17,12 @@ Search Jira using JQL (Jira Query Language).
 
 **Example:**
 ```
-mcp__atlassian__searchJiraIssuesUsingJql(jql: "project = PROJ AND status = 'In Progress'")
+mcp__claude_ai_Atlassian_Rovo__searchJiraIssuesUsingJql(jql: "project = PROJ AND status = 'In Progress'")
 ```
 
 ### Issue Operations
 
-#### `mcp__atlassian__getJiraIssue`
+#### `mcp__claude_ai_Atlassian_Rovo__getJiraIssue`
 Retrieve full issue details by key.
 
 **Parameters:**
@@ -31,10 +31,10 @@ Retrieve full issue details by key.
 
 **Example:**
 ```
-mcp__atlassian__getJiraIssue(issueKey: "PROJ-123")
+mcp__claude_ai_Atlassian_Rovo__getJiraIssue(issueKey: "PROJ-123")
 ```
 
-#### `mcp__atlassian__createJiraIssue`
+#### `mcp__claude_ai_Atlassian_Rovo__createJiraIssue`
 Create a new issue.
 
 **Parameters:**
@@ -50,7 +50,7 @@ Create a new issue.
 
 **Example:**
 ```
-mcp__atlassian__createJiraIssue(
+mcp__claude_ai_Atlassian_Rovo__createJiraIssue(
   projectKey: "PROJ",
   issueType: "Story",
   summary: "Implement user authentication",
@@ -59,7 +59,7 @@ mcp__atlassian__createJiraIssue(
 )
 ```
 
-#### `mcp__atlassian__editJiraIssue`
+#### `mcp__claude_ai_Atlassian_Rovo__editJiraIssue`
 Update an existing issue.
 
 **Parameters:**
@@ -68,7 +68,7 @@ Update an existing issue.
 
 **Example:**
 ```
-mcp__atlassian__editJiraIssue(
+mcp__claude_ai_Atlassian_Rovo__editJiraIssue(
   issueKey: "PROJ-123",
   description: "Updated description with more details..."
 )
@@ -76,7 +76,7 @@ mcp__atlassian__editJiraIssue(
 
 ### Transition Operations
 
-#### `mcp__atlassian__getTransitionsForJiraIssue`
+#### `mcp__claude_ai_Atlassian_Rovo__getTransitionsForJiraIssue`
 Get available status transitions for an issue.
 
 **Parameters:**
@@ -84,7 +84,7 @@ Get available status transitions for an issue.
 
 **Returns:** List of available transitions with IDs and names.
 
-#### `mcp__atlassian__transitionJiraIssue`
+#### `mcp__claude_ai_Atlassian_Rovo__transitionJiraIssue`
 Change issue status.
 
 **Parameters:**
@@ -99,7 +99,7 @@ Change issue status.
 
 ### Comment Operations
 
-#### `mcp__atlassian__addCommentToJiraIssue`
+#### `mcp__claude_ai_Atlassian_Rovo__addCommentToJiraIssue`
 Add a comment to an issue.
 
 **Parameters:**
@@ -108,7 +108,7 @@ Add a comment to an issue.
 
 ### User Operations
 
-#### `mcp__atlassian__lookupJiraAccountId`
+#### `mcp__claude_ai_Atlassian_Rovo__lookupJiraAccountId`
 Find user account ID for assignments.
 
 **Parameters:**
@@ -116,20 +116,20 @@ Find user account ID for assignments.
 
 **Example:**
 ```
-mcp__atlassian__lookupJiraAccountId(query: "user@example.com")
+mcp__claude_ai_Atlassian_Rovo__lookupJiraAccountId(query: "user@example.com")
 ```
 
 **Usage:** Always look up account IDs before assigning issues.
 
 ### Project Operations
 
-#### `mcp__atlassian__getVisibleJiraProjects`
+#### `mcp__claude_ai_Atlassian_Rovo__getVisibleJiraProjects`
 List available Jira projects.
 
 **Parameters:**
 - `maxResults`: Maximum results
 
-#### `mcp__atlassian__getJiraProjectIssueTypesMetadata`
+#### `mcp__claude_ai_Atlassian_Rovo__getJiraProjectIssueTypesMetadata`
 Get issue types and required fields for a project.
 
 **Parameters:**
@@ -137,7 +137,7 @@ Get issue types and required fields for a project.
 
 **Usage:** Call before creating issues to understand required fields.
 
-#### `mcp__atlassian__getJiraIssueTypeMetaWithFields`
+#### `mcp__claude_ai_Atlassian_Rovo__getJiraIssueTypeMetaWithFields`
 Get detailed field metadata for an issue type.
 
 **Parameters:**
@@ -272,7 +272,7 @@ resolved >= -7d AND project = PROJ ORDER BY resolved DESC
 ## Issue Linking
 
 ### Limitation
-The Atlassian MCP does not currently support creating issue links. Use the bundled `jira-link-issues` script instead.
+The Atlassian MCP does not currently support creating issue links. Use the `jira` CLI (`jira issue link`) or the Jira REST API directly.
 
 ### Link Types
 
@@ -284,17 +284,17 @@ The Atlassian MCP does not currently support creating issue links. Use the bundl
 | Clones | is cloned by | clones | Cloned issues |
 | Duplicates | is duplicated by | duplicates | Duplicate issues |
 
-### Script Usage
+### CLI Usage
 
 ```bash
 # Link PROJ-123 depends on PROJ-456
-~/.claude/skills/jira/jira-link-issues PROJ-123 PROJ-456 "Depends On"
+jira issue link PROJ-123 PROJ-456 "Depends On"
 
 # PROJ-100 blocks PROJ-200
-~/.claude/skills/jira/jira-link-issues PROJ-100 PROJ-200 "Blocks"
+jira issue link PROJ-100 PROJ-200 "Blocks"
 
 # General relationship
-~/.claude/skills/jira/jira-link-issues PROJ-50 PROJ-75 "Relates To"
+jira issue link PROJ-50 PROJ-75 "Relates To"
 ```
 
 ### Finding Link Types
@@ -406,19 +406,19 @@ Before creating issues:
 
 ```
 1. Get available transitions:
-   mcp__atlassian__getTransitionsForJiraIssue(issueKey: "PROJ-123")
+   mcp__claude_ai_Atlassian_Rovo__getTransitionsForJiraIssue(issueKey: "PROJ-123")
    → Returns list with transition IDs
 
 2. Find "Done" transition ID from response
 
 3. Execute transition:
-   mcp__atlassian__transitionJiraIssue(
+   mcp__claude_ai_Atlassian_Rovo__transitionJiraIssue(
      issueKey: "PROJ-123",
      transitionId: "done_id"
    )
 
 4. Add comment:
-   mcp__atlassian__addCommentToJiraIssue(
+   mcp__claude_ai_Atlassian_Rovo__addCommentToJiraIssue(
      issueKey: "PROJ-123",
      body: "Completed and deployed"
    )
@@ -428,11 +428,11 @@ Before creating issues:
 
 ```
 1. Look up user account ID:
-   mcp__atlassian__lookupJiraAccountId(query: "john@example.com")
+   mcp__claude_ai_Atlassian_Rovo__lookupJiraAccountId(query: "john@example.com")
    → Returns account ID
 
 2. Create issue with assignment:
-   mcp__atlassian__createJiraIssue(
+   mcp__claude_ai_Atlassian_Rovo__createJiraIssue(
      projectKey: "PROJ",
      issueType: "Task",
      summary: "Implement feature X",
@@ -444,7 +444,7 @@ Before creating issues:
 ### List My In-Progress Issues
 
 ```
-mcp__atlassian__searchJiraIssuesUsingJql(
+mcp__claude_ai_Atlassian_Rovo__searchJiraIssuesUsingJql(
   jql: "assignee = currentUser() AND status = 'In Progress' ORDER BY updated DESC"
 )
 ```
@@ -453,13 +453,13 @@ mcp__atlassian__searchJiraIssuesUsingJql(
 
 ```
 1. List available projects:
-   mcp__atlassian__getVisibleJiraProjects()
+   mcp__claude_ai_Atlassian_Rovo__getVisibleJiraProjects()
 
 2. Get issue types for project:
-   mcp__atlassian__getJiraProjectIssueTypesMetadata(projectKey: "PROJ")
+   mcp__claude_ai_Atlassian_Rovo__getJiraProjectIssueTypesMetadata(projectKey: "PROJ")
 
 3. Create issue with correct type:
-   mcp__atlassian__createJiraIssue(
+   mcp__claude_ai_Atlassian_Rovo__createJiraIssue(
      projectKey: "PROJ",
      issueType: "Story",
      summary: "...",
diff --git a/skills/lesson-learned/SKILL.md b/skills/lesson-learned/SKILL.md
index e358582..4b257e5 100644
--- a/skills/lesson-learned/SKILL.md
+++ b/skills/lesson-learned/SKILL.md
@@ -5,101 +5,130 @@ description: "Analyze recent code changes via git history and extract software e
 
 # Lesson Learned
 
-Extract specific, grounded software engineering lessons from actual code changes. Not a lecture -- a mirror. Show the user what their code already demonstrates.
+Extract specific, grounded software engineering lessons from actual code changes. Not a lecture — a mirror. Show the user what their code already demonstrates.
 
-## Before You Begin
+## Mindset
 
-**Load the principles reference first.**
+1. **The diff is the truth.** Commit messages declare intent; the diff reveals what actually happened. When they conflict, the diff wins.
+2. **One sharp lesson beats seven vague ones.** Resist the pull to list every applicable principle. Find the single most instructive pattern and go deep on it.
+3. **Recognition is as valuable as correction.** Naming a good pattern the user applied reinforces it just as powerfully as identifying a flaw. Most diffs contain both.
+4. **Specificity is the only currency.** Every claim must point to a real file, a real line, a real commit. "Your code shows good separation of concerns" is worthless without: "see how `auth.js` no longer imports from `db.js` after commit `a3f2b1`."
+5. **Trivial is a valid answer.** Not every change teaches something. Forcing a lesson out of a config tweak or typo fix is worse than saying "good housekeeping — nothing deeper here."
 
-1. Read `references/se-principles.md` to have the principle catalog available
-2. Optionally read `references/anti-patterns.md` if you suspect the changes include areas for improvement
-3. Determine the scope of analysis (see Phase 1)
+## Navigation
 
-**Do not proceed until you've loaded at least `se-principles.md`.**
+### When to Use
 
-## Phase 1: Determine Scope
+- User asks "what's the lesson?", "what did I learn?", "reflect on this", "engineering takeaway"
+- After a PR merge or feature branch completion
+- When a user wants to turn a debugging session into transferable knowledge
+- Post-incident: "what does this bug teach me?"
 
-Ask the user or infer from context what to analyze.
+### When NOT to Use
 
-| Scope | Git Commands | When to Use |
-|-------|-------------|-------------|
-| Feature branch | `git log main..HEAD --oneline` + `git diff main...HEAD` | User is on a non-main branch (default) |
-| Last N commits | `git log --oneline -N` + `git diff HEAD~N..HEAD` | User specifies a range, or on main (default N=5) |
-| Specific commit | `git show <sha>` | User references a specific commit |
-| Working changes | `git diff` + `git diff --cached` | User says "what about these changes?" before committing |
+- User wants a code *review* (correctness/bugs) → use `/code-review` instead
+- User wants architectural advice without a diff → use general reasoning, not this skill
+- No git repo or no commits yet → tell the user directly, don't fake a lesson
+- The user wants to understand *what the code does*, not *what it teaches*
 
-**Default behavior:**
-- If on a feature branch: analyze branch commits vs main
-- If on main: analyze the last 5 commits
-- If the user provides a different scope, use that
+### Decision Tree: Principle Selection
 
-## Phase 2: Gather Changes
+```
+Is there a dominant structural change? (split file, extracted function, new abstraction)
+  YES → Design/Structural principles (SRP, SoC, DIP, Encapsulation)
+  NO ↓
 
-1. Run `git log` with the determined scope to get the commit list and messages
-2. Run `git diff` for the full diff of the scope
-3. If the diff is large (>500 lines), use `git diff --stat` first, then selectively read the top 3-5 most-changed files
-4. **Read commit messages carefully** -- they contain intent that raw diffs miss
-5. Only read changed files. Do not read the entire repo.
+Is the change primarily a bug fix or edge-case handler?
+  YES → Pragmatic principles (Fail Fast, Defensive Programming, Boy Scout Rule)
+  NO ↓
 
-## Phase 3: Analyze
+Is there duplicated or consolidated code?
+  YES → Simplicity principles (DRY, Rule of Three, KISS)
+  NO ↓
 
-Identify the **dominant pattern** -- the single most instructive thing about these changes.
+Is the change adding speculative features or over-engineering?
+  YES → YAGNI or Premature Abstraction (anti-pattern)
+  NO ↓
 
-Look for:
-- **Structural decisions** -- How was the code organized? Why those boundaries?
-- **Trade-offs made** -- What was gained vs. sacrificed? (readability vs. performance, DRY vs. clarity, speed vs. correctness)
-- **Problems solved** -- What was the before/after? What made the "after" better?
-- **Missed opportunities** -- Where could the code improve? (present gently as "next time, consider...")
+Is the change scattered across many unrelated files?
+  YES → Shotgun Surgery (anti-pattern) — question the abstraction boundaries
+  NO → Default to "Boy Scout Rule" — incremental improvement, no single dominant lesson
+```
 
-Map findings to specific principles from `references/se-principles.md`. Be specific -- quote actual code, reference actual file names and line changes.
+## Philosophy
 
-## Phase 4: Present the Lesson
+A lesson learned from code is only real if it changes how the engineer writes the *next* line. Ground every observation so tightly in the actual diff that the user cannot mistake it for generic advice.
 
-Use this template:
+## Before You Begin
 
-```markdown
-## Lesson: [Principle Name]
+**Load references first.**
 
-**What happened in the code:**
-[2-3 sentences describing the specific change, referencing files and commits]
+1. Read `references/se-principles.md` — principle catalog for mapping observations
+2. Read `references/anti-patterns.md` if the diff signals code smells
+3. If git commands fail or behave unexpectedly, load `references/git-edge-cases.md`
 
-**The principle at work:**
-[1-2 sentences explaining the SE principle]
+**Do not proceed until you've loaded at least `se-principles.md`.**
 
-**Why it matters:**
-[1-2 sentences on the practical consequence -- what would go wrong without this, or what goes right because of it]
+## Phase 1: Determine Scope
 
-**Takeaway for next time:**
-[One concrete, actionable sentence the user can apply to future work]
-```
+| Scope | Git Commands | When to Use |
+|-------|-------------|-------------|
+| Feature branch | `git log main..HEAD --oneline` + `git diff main...HEAD` | On a non-main branch (default) |
+| Last N commits | `git log --oneline -N` + `git diff HEAD~N..HEAD` | User specifies range, or on main (default N=5) |
+| Specific commit | `git show <sha>` | User references a specific commit |
+| Working changes | `git diff` + `git diff --cached` | "What about these changes?" before committing |
 
-If there is a second lesson worth noting (maximum 2 additional):
+**Default behavior:**
+- Feature branch → analyze branch commits vs. main
+- On main → last 5 commits
+- User-specified → use that
 
-```markdown
----
+**Git failures:** If any git command errors (shallow clone, detached HEAD, empty repo), load `references/git-edge-cases.md` before continuing. Do not silently ignore errors.
 
-### Also worth noting: [Principle Name]
+## Phase 2: Gather Changes
 
-**In the code:** [1 sentence]
-**The principle:** [1 sentence]
-**Takeaway:** [1 sentence]
-```
+1. Run `git log` with the determined scope — read commit messages as primary intent signals
+2. Run `git diff` for the full diff
+3. If diff >500 lines: use `git diff --stat` first, then selectively read the top 3-5 most-changed files
+4. Skip binary files, lock files, and generated files (package-lock.json, *.min.js, dist/) — note the exclusion
+5. Only read changed files. Do not read the entire repo.
 
-## What NOT to Do
+## Phase 3: Analyze
 
-| Avoid | Why | Instead |
-|-------|-----|---------|
-| Listing every principle that vaguely applies | Overwhelming and generic | Pick the 1-2 most relevant |
-| Analyzing files that were not changed | Scope creep | Stick to the diff |
-| Ignoring commit messages | They contain intent that diffs miss | Read them as primary context |
-| Abstract advice disconnected from the code | Not actionable | Always reference specific files/lines |
-| Negative-only feedback | Demoralizing | Lead with what works, then suggest improvements |
-| More than 3 lessons | Dilutes the insight | One well-grounded lesson beats seven vague ones |
+Use the **Decision Tree** in the Navigation section to identify which class of principle applies. Then find the **dominant pattern** — the single most instructive thing about these changes.
 
-## Conversation Style
+Look for:
+- **Structural decisions** — How was the code organized? Why those boundaries?
+- **Trade-offs made** — What was gained vs. sacrificed? (readability vs. performance, DRY vs. clarity)
+- **Problems solved** — What was the before/after? What made the "after" better?
+- **Missed opportunities** — Where could the code improve? (present gently as "next time, consider...")
+
+Map findings to specific principles from `references/se-principles.md`. Quote actual code, reference actual file names and line changes.
+
+## Phase 4: Present the Lesson
 
-- **Reflective, not prescriptive.** Use the user's own code as primary evidence.
-- **Never say "you should have..."** -- instead use "the approach here shows..." or "next time you face this, consider..."
-- **If the code is good, say so.** Not every lesson is about what went wrong. Recognizing good patterns reinforces them.
-- **If the changes are trivial** (a single config tweak, a typo fix), say so honestly rather than forcing a lesson. "These changes are straightforward -- no deep lesson here, just good housekeeping."
-- **Be specific.** Generic advice is worthless. Every claim must point to a concrete code change.
+Load `references/output-template.md` for the exact formatting templates. Key rules:
+- 1 primary lesson, maximum 2 secondary
+- Always reference specific files and commits
+- Lead with what works before noting what could improve
+- Trivial diffs get the "good housekeeping" response, not a forced lesson
+
+## NEVER
+
+- **NEVER analyze files not in the diff.** Reading the whole repo for "context" is scope creep that introduces irrelevant principles and wastes context budget.
+- **NEVER list every principle that loosely applies.** Presenting 5+ principles for one diff signals you couldn't identify what actually matters — it's noise, not signal.
+- **NEVER say "you should have..."** It's past tense and implies failure. The user can't undo committed code. Use "next time, consider..." to keep the lesson forward-facing.
+- **NEVER force a lesson on trivial changes.** A config bump or typo fix has no engineering lesson. Fabricating one trains the user to distrust the skill's signal.
+- **NEVER ignore git errors silently.** A shallow clone, detached HEAD, or empty repo changes which commands work. Running `git diff main...HEAD` in a detached HEAD can return an empty diff without any error (for example, when HEAD sits on a commit already contained in `main`), and you would present a lesson based on nothing.
+- **NEVER present anti-pattern findings without a positive anchor.** Leading with "your code has a God Object problem" without first acknowledging what works is demoralizing and incomplete analysis.
+- **NEVER use generic principle names as the lesson title without code evidence.** "Lesson: DRY" is not a lesson. "Lesson: DRY — the shared `validateInput()` extracted in commit `b2d4a1` eliminated 47 lines of duplication across 3 handlers" is.
+
+## When Things Go Wrong
+
+| Symptom | Recovery |
+|---------|----------|
+| `git diff` returns empty output | Check: detached HEAD? Shallow clone? Wrong base branch? Load `references/git-edge-cases.md`. Never present "no changes found" without diagnosing why. |
+| Diff is >500 lines | Use `git diff --stat`, then selectively read the top 3-5 files by semantic importance (not line count). Skip generated/lock files. Tell the user what was excluded. |
+| No clear dominant principle | Default to Boy Scout Rule if the changes are scattered small cleanups. If the changes are large but structurally incoherent, that *is* the lesson: Shotgun Surgery. |
+| User disagrees with the lesson | Ask: "What did you feel was the hardest decision in this change?" Their answer is the real lesson. Revise based on their actual intent. |
+| Shallow clone blocks history | Run `git fetch --deepen=50`, then retry. If still blocked, fall back to `git show HEAD` for the latest commit only and note the limitation. |
diff --git a/skills/lesson-learned/references/git-edge-cases.md b/skills/lesson-learned/references/git-edge-cases.md
new file mode 100644
index 0000000..d9ccc76
--- /dev/null
+++ b/skills/lesson-learned/references/git-edge-cases.md
@@ -0,0 +1,98 @@
+---
+description: Git edge cases and error handling for lesson-learned skill. Load when git commands fail or produce unexpected output.
+---
+
+# Git Edge Cases
+
+Reference for handling unusual repository states when gathering diff context.
+
+## Shallow Clone
+
+**Symptom:** `git log` shows truncated history; `git diff main...HEAD` errors with "fatal: no merge base".
+
+**Detection:** `git rev-parse --is-shallow-repository` returns `true`.
+
+**Recovery:**
+```bash
+# Deepen enough to get useful history without full clone
+git fetch --deepen=50
+# Then retry your original diff command
+```
+
+**Lesson-learned impact:** If the diff is unavailable, fall back to `git show HEAD` for the latest commit only. Tell the user: "This is a shallow clone — analysis limited to the most recent commit visible."
+
+**NEVER do:** `git commit --amend` in a shallow clone when there is a remote tracking branch. Amending rewrites the tip SHA; a subsequent `git push --force` can silently drop commits that exist on the remote but not in the shallow history. Recover orphaned commits via `git fetch origin <sha>`.
+
+## Detached HEAD
+
+**Symptom:** `git branch` shows `* (HEAD detached at <sha>)`. `git log main..HEAD` may return nothing or error.
+
+**Detection:** `git symbolic-ref --quiet HEAD` exits non-zero.
+
+**Recovery:**
+```bash
+# Identify what the detached HEAD contains vs. nearest branch
+git log --oneline $(git merge-base HEAD main)..HEAD
+```
+
+**Lesson-learned impact:** Use commit-range mode rather than branch-mode analysis. Default to `git log --oneline -5` + `git diff HEAD~5..HEAD`, shrinking the range if the history holds fewer than six commits.
+
+## No Commits Yet (Empty Repo)
+
+**Symptom:** `git log` returns "fatal: your current branch 'main' does not have any commits yet."
+
+**Recovery:** Nothing to analyze. Tell the user: "No commits found — make at least one commit before running a lesson-learned analysis."
+
+## Merge Commits in Range
+
+**Symptom:** `git diff main...HEAD` includes changes from merged branches that are unrelated to the user's work.
+
+**Detection:** `git log --merges main..HEAD` returns results.
+
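+**Recovery:** Use `--no-merges` and `--first-parent` with `git log`. These flags filter the commit walk; `git diff` compares two endpoints, so they do not filter its output:
+```bash
+git log --no-merges --first-parent --oneline main..HEAD
+git log --no-merges --first-parent -p main..HEAD   # per-commit patches, merge commits skipped
+```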
+
+**Lesson-learned impact:** Ignore merge-only commits. Analyze only the substantive commits.
+
+## Very Large Diffs (>1000 lines)
+
+**Symptom:** `git diff` output exceeds context budget.
+
+**Recovery strategy (in order):**
+1. Run `git diff --stat` to identify the top 3-5 most-changed files
+2. Read each file's diff individually: `git diff main...HEAD -- path/to/file`
+3. Focus on the files with the highest semantic change (not just line count)
+4. If a single file dominates (e.g., a generated file), skip it and note: "Excluding auto-generated file X from analysis."
+
+## Binary or Generated Files
+
+**Symptom:** `git diff` shows "Binary files differ" or a massive JSON/CSS/lock file change.
+
+**Recovery:** Skip binary files entirely. For generated files (package-lock.json, yarn.lock, *.min.js, dist/), note: "Lock/generated files excluded from analysis — these contain no engineering lessons."
+
+## Uncommitted Staged + Unstaged Mix
+
+**Symptom:** User says "analyze these changes" before committing; both staged and unstaged changes exist.
+
+**Recovery:**
+```bash
+git diff          # unstaged
+git diff --cached # staged
+```
+
+Present both diffs together as the full working set. Note which changes are staged vs. not, as that itself may be a lesson (partial staging as a discipline).
+
+## Remote Tracking Branch Mismatch
+
+**Symptom:** `git log main..HEAD` shows the wrong set of commits (for example, unrelated upstream commits, or none at all) because local `main` is out of sync with the remote.
+
+**Recovery:**
+```bash
+git fetch origin     # refresh origin/main; local main is left untouched
+git log origin/main..HEAD --oneline
+```
+
+Use `origin/main` as the base when local `main` hasn't been updated.
diff --git a/skills/lesson-learned/references/output-template.md b/skills/lesson-learned/references/output-template.md
new file mode 100644
index 0000000..53beb36
--- /dev/null
+++ b/skills/lesson-learned/references/output-template.md
@@ -0,0 +1,70 @@
+---
+description: Output templates for the lesson-learned skill. Load during Phase 4 when formatting the final lesson presentation.
+---
+
+# Output Templates
+
+## Primary Lesson (always present)
+
+```markdown
+## Lesson: [Principle Name]
+
+**What happened in the code:**
+[2-3 sentences describing the specific change, referencing files and commits]
+
+**The principle at work:**
+[1-2 sentences explaining the SE principle]
+
+**Why it matters:**
+[1-2 sentences on the practical consequence — what would go wrong without this, or what goes right because of it]
+
+**Takeaway for next time:**
+[One concrete, actionable sentence the user can apply to future work]
+```
+
+## Secondary Lessons (maximum 2, optional)
+
+```markdown
+---
+
+### Also worth noting: [Principle Name]
+
+**In the code:** [1 sentence pointing to specific file/line]
+**The principle:** [1 sentence]
+**Takeaway:** [1 sentence]
+```
+
+## Trivial Changes (no lesson)
+
+Use when the diff contains only config tweaks, typo fixes, dependency bumps, or trivial reformatting:
+
+```markdown
+These changes are straightforward — no deep lesson here, just good housekeeping.
+
+[Optional: 1 sentence noting what was done, e.g., "Bumped dependency versions and fixed a typo in the README."]
+```
+
+## Anti-Pattern Found
+
+Use when the primary observation is a code smell, not a positive pattern:
+
+```markdown
+## Observation: [Anti-Pattern Name]
+
+**What the code shows:**
+[2-3 sentences describing the specific signal observed in the diff]
+
+**The concern:**
+[1-2 sentences explaining why this pattern creates problems]
+
+**One thing to try next time:**
+[One concrete, non-judgmental suggestion]
+```
+
+## Formatting Rules
+
+- Always reference specific file names and commit SHAs or short hashes
+- Never use "you should have" — use "next time, consider..." or "the code here shows..."
+- Lead with what works before noting what could improve
+- Cap total output at 3 lessons (1 primary + 2 secondary); more dilutes the insight
+- If the lesson is about an anti-pattern, still find something positive to anchor first
diff --git a/skills/marp-slide/SKILL.md b/skills/marp-slide/SKILL.md
index 2d9f1ae..c3f3ff7 100644
--- a/skills/marp-slide/SKILL.md
+++ b/skills/marp-slide/SKILL.md
@@ -1,237 +1,127 @@
 ---
 name: marp-slide
-description: Create professional Marp presentation slides with 7 beautiful themes (default, minimal, colorful, dark, gradient, tech, business). Use when users request slide creation, presentations, or Marp documents. Supports custom themes, image layouts, and "make it look good" requests with automatic quality improvements.
+description: Create professional Marp presentation slides with 7 built-in themes (default, minimal, colorful, dark, gradient, tech, business). Use when users request slides, presentations, Marp documents, or say "make it look good", "かっこよく", "良い感じに". Supports custom themes, image layouts, and vague aesthetic improvement requests.
 ---
 
 # Marp Slide Creator
 
-Create professional, visually appealing Marp presentation slides with 7 pre-designed themes and built-in best practices.
+## Mindset
 
-## When to Use This Skill
+1. **Marp is a renderer, not an editor**: every directive, class name, and image keyword you write is interpreted by Marp's Markdown parser and the theme CSS. Typos fail silently: the slide renders wrong instead of raising an error. Validate by mentally rendering each slide before outputting.
+2. **One slide = one claim** — if you need more than 5 bullets to cover a slide's topic, it's two slides. Density kills presentation slides; practitioners split aggressively.
+3. **CSS in templates is load-bearing** — the embedded `<style>` block in each template is what makes the theme work. Stripping or moving it breaks every slide. Copy the whole template block, then replace content only.
+4. **`bg` images live on a separate layer** — `![bg right:40%](img.png)` is not an inline image; it's a CSS background panel. Placing it after bullets instead of before them doesn't change the layout — position in the markdown file is irrelevant for `bg` images; only the keyword matters.
+5. **Frontmatter drives the output format** — `size: 16:9` vs `4:3` changes every margin calculation downstream. Choose before writing content, not after.
 
-Use this skill when the user:
-- Requests to create presentation slides or Marp documents
-- Asks to "make slides look good" or "improve slide design"
-- Provides vague instructions like "良い感じにして" (make it nice) or "かっこよく" (make it cool)
-- Wants to create lecture or seminar materials
-- Needs bullet-point focused slides with occasional images
-
-## Quick Start
-
-### Step 1: Select Theme
-
-First, determine the appropriate theme based on the user's request and content.
-
-**Quick theme selection:**
-- **Technical/Developer content** → tech theme
-- **Business/Corporate** → business theme
-- **Creative/Event** → colorful or gradient theme
-- **Academic/Simple** → minimal theme
-- **General/Unsure** → default theme
-- **Dark background preferred** → dark or tech theme
-
-For detailed theme selection guidance, read `references/theme-selection.md`.
-
-### Step 2: Create Slides
-
-1. **Read relevant references first**:
-   - Always start by reading `references/marp-syntax.md` for basic syntax
-   - For images: `references/image-patterns.md` (official Marpit image syntax)
-   - For advanced features (math, emoji): `references/advanced-features.md`
-   - For custom themes: `references/theme-css-guide.md`
-
-2. Copy content from the appropriate template file:
-   - `assets/template-basic.md` - Default theme (most common)
-   - `assets/template-minimal.md` - Minimal theme
-   - `assets/template-colorful.md` - Colorful theme
-   - `assets/template-dark.md` - Dark mode theme
-   - `assets/template-gradient.md` - Gradient theme
-   - `assets/template-tech.md` - Tech/code theme
-   - `assets/template-business.md` - Business theme
-
-3. Read `references/best-practices.md` for quality guidelines
-
-4. Structure content following best practices:
-   - Title slide with `<!-- _class: lead -->`
-   - Concise h2 titles (5-7 characters in Japanese)
-   - 3-5 bullet points per slide
-   - Adequate whitespace
-
-5. Add images if needed using patterns from `references/image-patterns.md`
-
-6. Save to `/mnt/user-data/outputs/` with `.md` extension
-
-## Available Themes
-
-### 1. Default Theme
-**Colors**: Beige background, navy text, blue headings
-**Style**: Clean, sophisticated with decorative lines
-**Use for**: General seminars, lectures, presentations
-**Template**: `template-basic.md`
-
-### 2. Minimal Theme
-**Colors**: White background, gray text, black headings
-**Style**: Minimal decoration, wide margins, light fonts
-**Use for**: Content-focused presentations, academic talks
-**Template**: `template-minimal.md`
-
-### 3. Colorful & Pop Theme
-**Colors**: Pink gradient background, multi-color accents
-**Style**: Vibrant gradients, bold fonts, rainbow accents
-**Use for**: Youth-oriented events, creative projects
-**Template**: `template-colorful.md`
-
-### 4. Dark Mode Theme
-**Colors**: Black background, cyan/purple accents
-**Style**: Dark theme with glow effects, eye-friendly
-**Use for**: Tech presentations, evening talks, modern look
-**Template**: `template-dark.md`
-
-### 5. Gradient Background Theme
-**Colors**: Purple/pink/blue/green gradients (varies per slide)
-**Style**: Different gradient per slide, white text, shadows
-**Use for**: Visual-focused, creative presentations
-**Template**: `template-gradient.md`
-
-### 6. Tech/Code Theme
-**Colors**: GitHub-style dark background, blue/green accents
-**Style**: Code fonts, Markdown-style headers with # symbols
-**Use for**: Programming tutorials, tech meetups, developer content
-**Template**: `template-tech.md`
-
-### 7. Business Theme
-**Colors**: White background, navy headings, blue accents
-**Style**: Corporate presentation style, top border, table support
-**Use for**: Business presentations, proposals, reports
-**Template**: `template-business.md`
-
-## Creating Slides Process
-
-### Basic Workflow
-
-1. **Understand requirements**
-   - Identify content: title, topics, key points
-   - Determine target audience
-   - Assess formality level
-
-2. **Select theme**
-   - Use quick selection rules above
-   - If uncertain, consult `references/theme-selection.md`
-   - Default to default theme if still unsure
-
-3. **Apply template**
-   - Load appropriate template from `assets/`
-   - CSS is already embedded - no external files needed
-   - Maintain template structure
-
-4. **Structure content**
-   - Title slide: `<!-- _class: lead -->` + h1
-   - Content slides: h2 title + bullet points
-   - Keep titles to 5-7 characters (Japanese)
-   - Use 3-5 bullet points per slide
-
-5. **Refine quality**
-   - Read `references/best-practices.md`
-   - Ensure adequate whitespace
-   - Maintain consistency
-   - Keep text concise (15-25 chars per line)
-
-6. **Add images**
-   - If needed, consult `references/image-patterns.md`
-   - Common: `![bg right:40%](image.png)` for side images
-   - Use proper Marp image syntax
-
-7. **Output file**
-   - Save to `/mnt/user-data/outputs/`
-   - Use descriptive filename like `presentation.md`
-
-## Handling "Make It Look Good" Requests
-
-When users give vague instructions like "良い感じにして", "かっこよく", or "make it cool":
-
-1. **Infer theme from content**:
-   - Business content → business theme
-   - Technical content → tech or dark theme
-   - Creative content → gradient or colorful theme
-   - General → default theme
-
-2. **Apply best practices automatically**:
-   - Shorten titles to 5-7 characters
-   - Limit bullet points to 3-5 items
-   - Add adequate whitespace
-   - Use consistent structure
-
-3. **Enhance visual hierarchy**:
-   - Use h3 for sub-sections when appropriate
-   - Break up dense text into multiple slides
-   - Ensure logical flow (intro → body → conclusion)
-
-4. **Maintain professional tone**:
-   - Match formality to content
-   - Use parallel structure in lists
-   - Keep technical terms consistent
-
-## Image Integration
-
-For slides with images, consult `references/image-patterns.md` for detailed syntax.
-
-Common patterns:
-- **Side image**: `![bg right:40%](image.png)` - Image on right, text on left
-- **Centered**: `![w:600px](image.png)` - Centered with specific width
-- **Full background**: `![bg](image.png)` - Full-screen background
-- **Multiple images**: Multiple `![bg]` declarations
-
-Example lecture pattern:
+## Navigation
+
+**Use this skill when**: user asks to create slides, a presentation, or a Marp `.md` file; user says "make it look good", "かっこよく", "良い感じに"; user provides raw content and wants it formatted as slides.
+
+**Do NOT use this skill when**: user wants a static document or report (use standard markdown); user explicitly wants PowerPoint `.pptx` without Marp pipeline; user wants interactive web animations.
+
+**Quick theme decision**:
+- Technical / code / developer → `tech` or `dark`
+- Business / corporate / proposal → `business`
+- Academic / content-first → `minimal`
+- Creative / event / youth → `colorful` or `gradient`
+- General / unsure → `default`
+
+## Philosophy
+
+Every slide is a contract with the audience: one idea, clearly stated, visually uncluttered. The theme provides the visual contract; your job is to enforce the content contract — no slide should require the speaker to explain what the slide is about before explaining what it means.
+
+## NEVER
+
+- NEVER use `---` as a slide separator *inside* a fenced code block — because Marp's parser is line-level and will split the slide mid-code-block, destroying the fence and producing broken HTML output.
+- NEVER embed raw `<style>` overrides that redefine `section` padding below `40px` — because Marp's built-in themes size all content assuming the default section padding; reducing it causes text to collide with page-number badges and header/footer elements at export time.
+- NEVER use standard Markdown image syntax `![](img.png)` when you mean a background split — because without the `bg` keyword the image renders inline as a content-layer element, not as the CSS background panel, producing a broken layout that looks correct in VS Code preview but wrong in PDF/PPTX export.
+- NEVER put `<!-- _class: lead -->` on non-title slides to center content — because `lead` is a semantic class that also suppresses the slide's h2 border/decoration in most themes; using it on content slides silently removes structural styling mid-deck.
+- NEVER use `paginate: true` at the global level without also suppressing it on the title slide with `<!-- _paginate: false -->` — because page "1" printed on the title slide confuses audiences and is a practitioner tell that the author didn't check the output.
+- NEVER mix `*` (asterisk) unordered list bullets and `-` (dash) bullets on the same slide — because `*` activates Marp's **fragmented list** feature (progressive reveal in HTML mode), while `-` does not; mixing them produces one list that partially animates and one that doesn't, with no visual warning.
+- NEVER use `backgroundImage: url(...)` in frontmatter when your theme template already sets a background via CSS — because the frontmatter directive takes precedence over theme CSS, wiping the theme's gradient/texture and making the slide look unstyled even though it parsed correctly.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Slide splits mid-code-block | `---` separator inside fenced block | Move `---` outside the fence; use `<!-- -->` comment as visual break inside the block instead |
+| Background image covers all text | Missing `left`/`right` keyword on `![bg]` | Add `![bg right:40%]` to reserve text space; without directional keyword, `bg` is full-bleed |
+| Theme styling gone after edit | `<style>` block accidentally deleted | Re-copy the style block from the template in `assets/template-{theme}.md` |
+| Page numbers on title slide | `paginate: true` global, no per-slide override | Add `<!-- _paginate: false -->` immediately after the title slide's opening `---` |
+| Bullets animate unexpectedly in HTML | Mixed `*` and `-` list markers | Standardize all lists to `-`; reserve `*` only when progressive reveal is intentional |
+| Math formula renders as raw text | `marp: true` missing from frontmatter | Ensure frontmatter is present and `marp: true` is set; math typesetting (MathJax or KaTeX) is provided by Marp Core and only runs on Marp documents |
+
+## Workflow
+
+**1. Select theme** using the decision tree in Navigation above. If the user gave vague aesthetic instructions, infer from content domain.
+
+**2. Copy the full template** from `assets/template-{theme}.md` — do not reconstruct from scratch. The embedded CSS is non-trivial.
+
+**3. Set frontmatter**:
+```markdown
+---
+marp: true
+theme: default   # or the chosen theme name
+size: 16:9
+paginate: true
+---
+```
+
+**4. Title slide** (always first):
 ```markdown
-## Slide Title
+<!-- _class: lead -->
+<!-- _paginate: false -->
+
+# Presentation Title
+
+Presenter · Date
+```
+
+**5. Content slides** — enforce 1 message per slide, 3–5 bullets, concise h2 titles:
+```markdown
+---
+
+## Short Title
+
+- Point one (parallel structure)
+- Point two
+- Point three
+```
+
+**6. Image slides** — read `references/image-patterns.md` before using any `bg` syntax:
+```markdown
+---
+
+## Feature Overview
 
 ![bg right:40%](diagram.png)
 
-- Explanation point 1
-- Explanation point 2
-- Explanation point 3
+- Explanation 1
+- Explanation 2
 ```
 
-## File Output
-
-Always save the final Marp file to `/mnt/user-data/outputs/` with `.md` extension:
-- `presentation.md`
-- `seminar-slides.md`
-- `lecture-materials.md`
-
-## Quality Checklist
-
-Before delivering slides, verify:
-- [ ] Theme selected appropriately for content
-- [ ] CSS theme is embedded in the file
-- [ ] Title slide uses `<!-- _class: lead -->`
-- [ ] All h2 titles are concise (5-7 chars)
-- [ ] Bullet points are 3-5 items per slide
-- [ ] Images use proper Marp syntax
-- [ ] File saved to outputs directory
-- [ ] Content follows best practices
-
-## References
-
-### Core Documentation
-- **Marp syntax**: `references/marp-syntax.md` - Basic Marp/Marpit syntax (directives, frontmatter, pagination, etc.)
-- **Image patterns**: `references/image-patterns.md` - Official image syntax (bg, filters, split backgrounds)
-- **Theme CSS guide**: `references/theme-css-guide.md` - How to create custom themes based on Marpit specification
-- **Advanced features**: `references/advanced-features.md` - Math, emoji, fragmented lists, Marp CLI, VS Code
-- **Official themes**: `references/official-themes.md` - default, gaia, uncover themes documentation
-
-### Quality & Selection Guides
-- **Theme selection**: `references/theme-selection.md` - How to choose the right theme for content
-- **Best practices**: `references/best-practices.md` - Quality guidelines for "cool" slides
-
-### Templates & Assets
-- **Templates**: `assets/template-*.md` - Starting points with embedded CSS for each theme (7 themes)
-- **Standalone CSS**: `assets/theme-*.css` - CSS files for reference (already embedded in templates)
-
-### Official External Links
-- **Marp Official Site**: https://marp.app/
-- **Marpit Directives**: https://marpit.marp.app/directives
-- **Marpit Image Syntax**: https://marpit.marp.app/image-syntax
-- **Marpit Theme CSS**: https://marpit.marp.app/theme-css
-- **Marp Core GitHub**: https://github.com/marp-team/marp-core
-- **Marp CLI GitHub**: https://github.com/marp-team/marp-cli
+**7. Save** to `/mnt/user-data/outputs/` with `.md` extension.
+
+## Reference Loading Triggers
+
+Load these files only when needed — they are large:
+
+| Trigger | File to read |
+|---------|-------------|
+| Custom theme request, CSS questions | `references/theme-css-guide.md` |
+| Any `bg`, filter, or split-screen image | `references/image-patterns.md` |
+| Math formulas, emoji, fragmented lists | `references/advanced-features.md` |
+| Official theme names (`gaia`, `uncover`) | `references/official-themes.md` |
+| Theme selection uncertainty | `references/theme-selection.md` |
+| Quality review / "make it better" | `references/best-practices.md` |
+
+## Quality Gate
+
+Before delivering, verify:
+- [ ] `marp: true` in frontmatter
+- [ ] Title slide has `<!-- _class: lead -->` and `<!-- _paginate: false -->`
+- [ ] No `---` separators inside code fences
+- [ ] All `bg` images use directional keyword if text coexists on slide
+- [ ] List markers consistent (`-` only unless progressive reveal intended)
+- [ ] CSS `<style>` block intact from template
+- [ ] File saved to `/mnt/user-data/outputs/`
diff --git a/skills/meme-factory/README.md b/skills/meme-factory/README.md
index de38035..e5ebd8a 100644
--- a/skills/meme-factory/README.md
+++ b/skills/meme-factory/README.md
@@ -45,7 +45,7 @@ Key features:
 
 ### 2. Text Formatting
 - Spaces: Use `_` or `-`
-- Newlines: Use `~n`
+- Newlines: Use `~n`, the newline escape documented by memegen.link
 - Special chars: `~q` (question), `~p` (percent), `~s` (slash), `~h` (hash)
 
 ### 3. Customization Options
diff --git a/skills/meme-factory/SKILL.md b/skills/meme-factory/SKILL.md
index 37163c6..d49951e 100644
--- a/skills/meme-factory/SKILL.md
+++ b/skills/meme-factory/SKILL.md
@@ -1,331 +1,145 @@
 ---
 name: meme-factory
-description: Generate memes using the memegen.link API. Use when users request memes, want to add humor to content, or need visual aids for social media. Supports 100+ popular templates with custom text and styling.
+description: Generate memes using the memegen.link API or textual formats (greentext, copypasta, ASCII, wojak, etc.). Use when users request memes, want to add humor to content, create social media visuals, or need comedic commentary. Trigger keywords: meme, funny, humor, drake, distracted boyfriend, this is fine, greentext.
 ---
 
 # Meme Factory
 
-Create memes using the free memegen.link API and textual meme formats.
-
 ---
 
-## Triggers
+## Mindset
 
-| Trigger | Description |
-|---------|-------------|
-| `/meme-factory` | Manual invocation |
-| `/meme-factory {template} {top} {bottom}` | Direct meme generation |
-| `meme-factory: create a meme about X` | Natural language request |
+**Template-emotion fit trumps caption cleverness.** A mediocre caption in the right template lands harder than a perfect joke in the wrong one. Drake works for comparisons; Two-Buttons works for dilemmas; "fine" works for denial. Get the structure right first.
 
----
+**Irony has a half-life.** Overused templates (e.g., `distracted`, `changemind`) carry cultural baggage — the moment you pick one, the audience already predicts the punchline. Your text must subvert the expectation, not confirm it. A surprise in the bottom text is where the laugh lives.
 
-## Quick Reference
+**Text density is a trust fund you can't overdraw.** Each extra word is a withdrawal. The sweet spot is 3–5 words per caption line. Long-form humor belongs in textual formats (copypasta, chat logs); image memes are haiku, not essays.
 
-| Action | Format |
-|--------|--------|
-| Basic meme | `https://api.memegen.link/images/{template}/{top}/{bottom}.png` |
-| With sizing | `?width=1200&height=630` |
-| Custom background | `?style=https://example.com/image.jpg` |
-| All templates | https://api.memegen.link/templates/ |
-| Interactive docs | https://api.memegen.link/docs/ |
+**Memegen.link is stateless and idempotent.** Every parameter is in the URL. No upload, no session. If you can construct the URL correctly, it works. The failure modes are all encoding problems, not API problems.
 
-**Additional Resources:**
-- [Markdown Memes Guide](references/markdown-memes-guide.md) - 15+ textual meme formats
-- [Examples](references/examples.md) - Practical usage examples
-- [meme_generator.py](scripts/meme_generator.py) - Python helper script
+**Textual memes (greentext, wojak, chat logs) are zero-dependency.** They render anywhere Markdown renders, they're accessible, and they give more pacing control than image memes. Reach for them when the joke requires timing or escalation.
 
 ---
 
-## Quick Start
-
-### Basic Meme Structure
-
-```
-https://api.memegen.link/images/{template}/{top_text}/{bottom_text}.{extension}
-```
-
-**Example:**
-```
-https://api.memegen.link/images/buzz/memes/memes_everywhere.png
-```
+## Navigation
 
-Result: Buzz Lightyear meme with "memes" at top and "memes everywhere" at bottom.
+**Use this skill when**: user asks for a meme, wants to add humor to a post or doc, needs a social media image, or wants to comment on something ironically.
 
-### Text Formatting
+**Do NOT use this skill when**: the request needs sensitive humor (political, personal tragedy, marginalized groups) — generate text commentary instead. Do not produce memes targeting real individuals by name.
 
-| Character | Encoding |
-|-----------|----------|
-| Space | `_` or `-` |
-| Newline | `~n` |
-| Question mark | `~q` |
-| Percent | `~p` |
-| Slash | `~s` |
-| Hash | `~h` |
-| Single quote | `''` |
-| Double quote | `""` |
+**Quick decision tree:**
+- Needs visual + shareable URL → memegen.link image meme
+- Needs pacing, escalation, or long-form comedy → textual format (greentext, chat log, copypasta)
+- Blog/doc embed → either; prefer textual for accessibility
+- Social media card (1200×630) → image meme with `?width=1200&height=630`
 
 ---
 
-## Popular Templates
+## Philosophy
 
-| Template | Use Case | Example |
-|----------|----------|---------|
-| `buzz` | X, X everywhere | bugs/bugs_everywhere |
-| `drake` | Comparisons | manual_testing/automated_testing |
-| `success` | Victories | deployed/no_errors |
-| `fine` | Things going wrong | server_on_fire/this_is_fine |
-| `fry` | Uncertainty | not_sure_if_bug/or_feature |
-| `changemind` | Hot takes | tabs_are_better_than_spaces |
-| `distracted` | Priorities | my_code/new_framework/current_project |
-| `mordor` | One does not simply | one_does_not_simply/deploy_on_friday |
+The meme's job is compression: take a shared cultural structure (the template), inject a specific surprise (your caption), and let the delta between expectation and reality do the comedic work. Your role is to find the right template, write the minimum text that creates maximum surprise, and get out.
 
 ---
 
-## Template Selection Guide
+## NEVER
 
-| Context | Template | Why |
-|---------|----------|-----|
-| Comparing options | `drake` | Two-panel reject/approve format |
-| Celebrating wins | `success` | Positive outcome emphasis |
-| Problems ignored | `fine` | Ironic "everything is fine" |
-| Uncertainty | `fry` | "Not sure if X or Y" format |
-| Controversial opinion | `changemind` | Statement + challenge |
-| Ubiquitous things | `buzz` | "X, X everywhere" |
-| Bad ideas | `mordor` | "One does not simply..." |
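+- **NEVER invent tilde escapes.** The memegen.link API decodes a fixed, documented set that includes `~n` (newline), `~q` (question mark), `~p` (percent), `~s` (slash), and `~h` (hash). A sequence outside that set is not decoded and shows up as literal text on the image. For line breaks inside a caption segment, `~n` is the documented escape.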
 
----
+- **NEVER encode a literal hyphen as `-`** — memegen.link uses `--` (double hyphen) to represent a literal `-` in captions; a single `-` is treated as a space alternative. A caption like `ci-cd` becomes `ci--cd` in the URL, not `ci-cd`.
 
-## Validation
+- **NEVER use `_` to represent a literal underscore** — it encodes as `__` (double underscore). A single `_` is a space. Using one underscore for a literal underscore produces a space on the image.
 
-After generating a meme:
+- **NEVER pick a template by name recognition alone** — `distracted`, `changemind`, and `two-buttons` are irony-collapsed: so many uses exist that audiences see the punchline before reading it. Use them only when you can genuinely subvert the expected direction. Otherwise pick a fresher template.
 
-- [ ] URL returns valid image (test in browser)
-- [ ] Text is readable (not too long)
-- [ ] Template matches the message context
-- [ ] Special characters properly encoded
-- [ ] Dimensions appropriate for platform
+- **NEVER embed cultural-context-heavy templates without checking decay** — `ugandanknuckles`, `harold` (hide the pain), and `coffin-dance` had peak windows; outside that window they signal "out of touch." When in doubt, use structurally universal templates (`drake`, `gru-plan`, `surprised-pikachu`) that remain legible regardless of recency.
 
-### Platform Dimensions
+- **NEVER stack more than 2 image memes without a textual break** — readers experience visual fatigue; 3+ images in sequence reads as a slideshow, not comedy. Interleave text formats to control pacing.
 
-| Platform | Dimensions |
-|----------|------------|
-| Social media (Open Graph) | 1200x630 |
-| Slack/Discord | 800x600 |
-| GitHub | Default |
+- **NEVER leave alt text as `![meme]` or `![funny image]`** — screen readers announce it verbatim. The alt text should describe the joke, e.g., `![Drake rejecting "manual deploy", approving "CI/CD pipeline"]`. The description is the accessibility fallback for the punchline.
 
 ---
 
-## Anti-Patterns
+## When Things Go Wrong
 
-| Avoid | Why | Instead |
-|-------|-----|---------|
-| Spaces without encoding | URL breaks | Use `_` or `-` |
-| Too much text | Unreadable | 2-6 words per line |
-| Wrong template | Message mismatch | Match template to context |
-| Missing extension | Invalid URL | Always include `.png`, `.jpg`, etc. |
-| Unencoded special chars | URL breaks | Use `~q`, `~s`, `~p`, etc. |
-| Assuming template exists | 404 error | Check templates list first |
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Image shows literal `~n` on screen | Used `~n` as newline escape (not valid) | Replace with `%0A` |
+| Hyphen disappeared or became a space | Single `-` used instead of `--` for a literal dash | Double the hyphens: `ci--cd` |
+| Template returns 404 | Template ID wrong or misspelled | Fetch `https://api.memegen.link/templates/` and search by keyword |
+| Caption text runs off image | Too many words per line (>6 words) | Split at `%0A` or shorten; image memes are haiku, not paragraphs |
+| Meme lands flat / misses | Template-emotion mismatch or irony-collapsed template | Re-evaluate: does the template's inherent structure match the joke's logical shape? |
 
 ---
 
-## Verification
-
-Meme generation is successful when:
-
-1. **URL is valid** - Returns HTTP 200
-2. **Image renders** - Displays correctly in markdown
-3. **Text is visible** - Properly formatted on image
-4. **Context matches** - Template fits the message
+## Quick Reference
 
-**Test command:**
-```bash
-curl -I "https://api.memegen.link/images/buzz/test/test.png"
-# Should return: HTTP/2 200
-```
+| Action | Format |
+|--------|--------|
+| Basic image meme | `https://api.memegen.link/images/{template}/{top}/{bottom}.png` |
+| Blank one side | Use `_` (single underscore) as the placeholder |
+| Social card sizing | Append `?width=1200&height=630` |
+| Remove watermark | Append `?watermark=none` |
+| Multiline caption | Use `%0A` between lines inside a caption segment |
+| Browse templates | https://api.memegen.link/templates/ |
+| API docs | https://api.memegen.link/docs/ |
+
+### Encoding Cheatsheet
+
+| Character | Encode as |
+|-----------|-----------|
+| Space | `_` |
+| Literal hyphen `-` | `--` |
+| Literal underscore `_` | `__` |
+| `?` | `~q` |
+| `%` | `~p` |
+| `/` | `~s` |
+| `#` | `~h` |
+| `"` | `''` (two single quotes) |
+| Newline | `%0A` |
 
 ---
 
-<details>
-<summary><strong>Deep Dive: Advanced Features</strong></summary>
-
-### Image Formats
-
-| Extension | Use Case |
-|-----------|----------|
-| `.png` | Best quality, default |
-| `.jpg` | Smaller file size |
-| `.webp` | Modern, good compression |
-| `.gif` | Animated templates |
-
-### Dimensions
-
-```
-?width=800
-?height=600
-?width=800&height=600  (padded to exact)
-```
-
-### Layout Options
-
-```
-?layout=top     # Text at top only
-?layout=bottom  # Text at bottom only
-?layout=default # Standard top/bottom
-```
-
-### Custom Fonts
-
-View available: https://api.memegen.link/fonts/
-
-```
-?font=impact  (default)
-```
-
-### Custom Images
-
-Use any image as background:
-
-```
-https://api.memegen.link/images/custom/hello/world.png?style=https://example.com/image.jpg
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Contextual Memes</strong></summary>
-
-### Code Reviews
-
-```
-Template: fry
-https://api.memegen.link/images/fry/not_sure_if_feature/or_bug.png
-```
+## Template Selection
 
-### Deployments
-
-```
-Template: interesting
-https://api.memegen.link/images/interesting/i_dont_always_test/but_when_i_do_i_do_it_in_production.png
-```
-
-### Documentation
-
-```
-Template: yodawg
-https://api.memegen.link/images/yodawg/yo_dawg_i_heard_you_like_docs/so_i_documented_the_documentation.png
-```
-
-### Performance Issues
-
-```
-Template: fine
-https://api.memegen.link/images/fine/memory_usage_at_99~/this_is_fine.png
-```
-
-### Successful Deploy
-
-```
-Template: success
-https://api.memegen.link/images/success/deployed_to_production/zero_downtime.png
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Workflow Integration</strong></summary>
-
-### Generating Memes in Response
-
-```markdown
-Here's a relevant meme:
-
-![Meme](https://api.memegen.link/images/buzz/bugs/bugs_everywhere.png)
-```
-
-### Dynamic Generation (Python)
-
-```python
-def generate_status_meme(status: str, message: str):
-    template_map = {
-        "success": "success",
-        "failure": "fine",
-        "review": "fry",
-        "deploy": "interesting"
-    }
-
-    template = template_map.get(status, "buzz")
-    words = message.split()
-    top = "_".join(words[0:3])
-    bottom = "_".join(words[3:6])
-
-    return f"https://api.memegen.link/images/{template}/{top}/{bottom}.png"
-```
-
-### Using the Helper Script
-
-```python
-from meme_generator import MemeGenerator
-
-meme = MemeGenerator()
-url = meme.generate("buzz", "features", "features everywhere")
-print(url)
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: API Reference</strong></summary>
-
-### Endpoints
-
-| Endpoint | Purpose |
-|----------|---------|
-| `/templates/` | List all templates |
-| `/templates/{id}` | Template details |
-| `/fonts/` | Available fonts |
-| `/images/{template}/{top}/{bottom}.{ext}` | Generate meme |
-
-### API Characteristics
-
-- Free and open-source
-- No API key required
-- No rate limiting (normal use)
-- Stateless (all info in URL)
-- Images generated on-demand
-
-### Error Handling
-
-1. Check template at https://api.memegen.link/templates/
-2. Verify text formatting (underscores for spaces)
-3. Check special character encoding
-4. Ensure valid extension
-5. Test URL in browser
-
-</details>
+| Joke shape | Template |
+|------------|----------|
+| Reject A, approve B | `drake` |
+| Agonizing choice between two bad options | `two-buttons` |
+| Plan that collapses at the last step | `gru-plan` |
+| Denial while chaos is visible | `fine` |
+| Shocked by inevitable outcome | `surprised-pikachu` |
+| X is everywhere | `buzz` |
+| One does not simply... | `mordor` |
+| Statement + "change my mind" (use sparingly) | `change-my-mind` |
+| Distracted by new thing (use sparingly) | `distracted` |
 
 ---
 
-## References
+## Textual Meme Formats
+
+When pacing, escalation, or accessibility matters more than a shareable image, use textual formats. Full guide: [references/markdown-memes-guide.md](references/markdown-memes-guide.md)
 
-| Document | Content |
-|----------|---------|
-| [markdown-memes-guide.md](references/markdown-memes-guide.md) | 15+ textual meme formats (greentext, copypasta, ASCII, etc.) |
-| [examples.md](references/examples.md) | Practical usage examples |
+| Format | Structure | Best for |
+|--------|-----------|----------|
+| Greentext | Code fence + `>` lines | Narrative self-deprecation, tech stories |
+| Wojak dialogue | Bold archetype names + short lines | Character contrast, internal conflict |
+| Chat log | Code fence + `[HH:MM] user:` | Incident retrospectives, system failures |
+| Copypasta | Code fence, wall of dramatic text | Parody of overblown outrage |
+| Corporate satire | Heading + checkboxes + legal footer | Workplace absurdity |
+| Tumblr chain | Nested blockquotes `>` `>>` `>>>` | Escalating reactions |
 
-### Scripts
+**Load [references/markdown-memes-guide.md](references/markdown-memes-guide.md)** when: user wants a textual meme, asks for greentext/copypasta/ASCII, or the joke requires multi-step pacing.
 
-| Script | Purpose |
-|--------|---------|
-| [meme_generator.py](scripts/meme_generator.py) | Python helper for meme generation |
+**Load [references/examples.md](references/examples.md)** when: user needs ready-to-paste outputs or platform integration (Slack, Discord, GitHub PR comments).
 
 ---
 
-## Summary
+## Validation Checklist
 
-Generate contextual memes to:
-- Add humor to conversations
-- Create social media visuals
-- Make code reviews engaging
-- Celebrate successes
+After generating any meme:
 
-**Golden rule:** Keep text concise, match template to context.
+- [ ] Text 3–5 words per line (image); longer OK for textual formats
+- [ ] Template shape matches the logical structure of the joke
+- [ ] Special characters encoded correctly (hyphens doubled, no `~n`)
+- [ ] Alt text describes the joke, not just "meme"
+- [ ] For image memes: verify URL returns 200 before presenting
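The Encoding Cheatsheet above can be applied mechanically. The following is a minimal Python sketch, not the skill's own helper: the names `encode_caption` and `meme_url` are hypothetical, the mapping is copied from the cheatsheet rows only, and any character not listed there is passed through unchanged. Template IDs are not validated, so checking the templates list remains a separate step.

```python
# Sketch only: the mapping mirrors the Encoding Cheatsheet; all names are hypothetical.
BASE_URL = "https://api.memegen.link/images"

_ESCAPES = {
    " ": "_",
    "-": "--",
    "_": "__",
    "?": "~q",
    "%": "~p",
    "/": "~s",
    "#": "~h",
    '"': "''",
    "\n": "%0A",
}


def encode_caption(text: str) -> str:
    """Encode one caption segment; an empty caption becomes the `_` placeholder."""
    if not text:
        return "_"
    # Translate character by character so already-escaped output is never re-escaped.
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def meme_url(template: str, top: str, bottom: str, ext: str = "png") -> str:
    """Build an image URL; the template ID is assumed to exist and is not checked."""
    return f"{BASE_URL}/{template}/{encode_caption(top)}/{encode_caption(bottom)}.{ext}"


print(meme_url("drake", "manual deploy", "ci-cd pipeline"))
# prints https://api.memegen.link/images/drake/manual_deploy/ci--cd_pipeline.png
```

Encoding per character rather than with chained `str.replace` calls avoids the double-escaping trap, where a `_` produced for a space is later doubled into `__`.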
diff --git a/skills/mermaid-diagrams/SKILL.md b/skills/mermaid-diagrams/SKILL.md
index 6ecbbaf..ff1d2b2 100644
--- a/skills/mermaid-diagrams/SKILL.md
+++ b/skills/mermaid-diagrams/SKILL.md
@@ -1,217 +1,154 @@
 ---
 name: mermaid-diagrams
-description: Comprehensive guide for creating software diagrams using Mermaid syntax. Use when users need to create, visualize, or document software through diagrams including class diagrams (domain modeling, object-oriented design), sequence diagrams (application flows, API interactions, code execution), flowcharts (processes, algorithms, user journeys), entity relationship diagrams (database schemas), C4 architecture diagrams (system context, containers, components), state diagrams, git graphs, pie charts, gantt charts, or any other diagram type. Triggers include requests to "diagram", "visualize", "model", "map out", "show the flow", or when explaining system architecture, database design, code structure, or user/application flows.
+description: Create, fix, and select Mermaid diagram types for software documentation. Use when users need to diagram, visualize, model, map out, or show the flow of systems, processes, APIs, database schemas, architecture, state machines, or class hierarchies. Triggers: "diagram", "visualize", "draw", "show the flow", "model this", "architecture diagram", "ERD", "sequence diagram", "flowchart", "C4".
 ---
 
-# Mermaid Diagramming
+## Mindset
 
-Create professional software diagrams using Mermaid's text-based syntax. Mermaid renders diagrams from simple text definitions, making diagrams version-controllable, easy to update, and maintainable alongside code.
+**Diagram type is architecture, not aesthetics.** Choosing `flowchart` vs `sequenceDiagram` vs `C4Context` determines what relationships are even expressible — pick wrong and you'll fight the syntax the whole way.
 
-## Core Syntax Structure
+**Mermaid is a parse-then-render pipeline.** Errors are silent or cryptic. A diagram that "almost works" often renders completely blank. Test in [mermaid.live](https://mermaid.live) before embedding.
 
-All Mermaid diagrams follow this pattern:
+**Complexity compounds exponentially.** A 10-node diagram is readable. A 20-node diagram needs subgraphs. A 30-node diagram needs to be three diagrams.
 
-```mermaid
-diagramType
-  definition content
-```
+**The renderer you target matters more than the syntax.** GitHub strips `%%{init}%%` directives. GitLab supports `elk` layout. VS Code preview differs from CLI output. Know your render target first.
 
-**Key principles:**
-- First line declares diagram type (e.g., `classDiagram`, `sequenceDiagram`, `flowchart`)
-- Use `%%` for comments
-- Line breaks and indentation improve readability but aren't required
-- Unknown words break diagrams; parameters fail silently
+**Labels are the hardest part.** The diagram structure is usually obvious; naming relationships precisely enough to be unambiguous is the real design work.
 
-## Diagram Type Selection Guide
+## Navigation
 
-**Choose the right diagram type:**
+**Use this skill when**: creating any Mermaid diagram, choosing between diagram types, fixing broken Mermaid syntax, advising on diagram scope or splitting, or rendering in CI/CD pipelines with the mmdc CLI.
 
-1. **Class Diagrams** - Domain modeling, OOP design, entity relationships
-   - Domain-driven design documentation
-   - Object-oriented class structures
-   - Entity relationships and dependencies
+**Do NOT use this skill when**: user wants PlantUML, draw.io XML, Lucidchart, or Graphviz DOT syntax — those are different languages entirely.
 
-2. **Sequence Diagrams** - Temporal interactions, message flows
-   - API request/response flows
-   - User authentication flows
-   - System component interactions
-   - Method call sequences
+**Diagram type decision tree:**
 
-3. **Flowcharts** - Processes, algorithms, decision trees
-   - User journeys and workflows
-   - Business processes
-   - Algorithm logic
-   - Deployment pipelines
+```
+What are you capturing?
+├── Interactions OVER TIME between actors → sequenceDiagram
+├── State changes of a single entity → stateDiagram-v2
+├── CLASS/type structure (OOP, domain model) → classDiagram
+├── Database tables and FK relationships → erDiagram
+├── A PROCESS with decisions/branches → flowchart TD/LR
+├── System/service landscape (who talks to what) → C4Context or C4Container
+├── Version control branching strategy → gitGraph
+└── Project schedule / milestones → gantt
+```
 
-4. **Entity Relationship Diagrams (ERD)** - Database schemas
-   - Table relationships
-   - Data modeling
-   - Schema design
+**Sequence vs Flowchart ambiguity**: If you're tempted to put timestamps or "then" in every step, use `sequenceDiagram`. If you're tempted to add decision diamonds between the actors' steps, use `flowchart`.
 
-5. **C4 Diagrams** - Software architecture at multiple levels
-   - System Context (systems and users)
-   - Container (applications, databases, services)
-   - Component (internal structure)
-   - Code (class/interface level)
+**C4 level selection**:
+- Stakeholder briefing or "what does our system do?" → `C4Context`
+- "What services/databases exist?" → `C4Container`
+- "How is service X structured internally?" → `C4Component`
+- Don't use `C4Dynamic` unless you explicitly need numbered sequence steps with C4 notation
 
-6. **State Diagrams** - State machines, lifecycle states
-7. **Git Graphs** - Version control branching strategies
-8. **Gantt Charts** - Project timelines, scheduling
-9. **Pie/Bar Charts** - Data visualization
+For audience-driven C4 level selection, container vs. component decisions, and multi-team ownership patterns, load the c4-architecture skill alongside this one.
 
-## Quick Start Examples
+When requirements gathering feeds into diagrams, run the mermaid-diagrams skill after the requirements-clarity skill so that the diagram vocabulary matches the finalized requirement terms.
 
-### Class Diagram (Domain Model)
-```mermaid
-classDiagram
-    Title -- Genre
-    Title *-- Season
-    Title *-- Review
-    User --> Review : creates
-
-    class Title {
-        +string name
-        +int releaseYear
-        +play()
-    }
-
-    class Genre {
-        +string name
-        +getTopTitles()
-    }
-```
+## Philosophy
 
-### Sequence Diagram (API Flow)
-```mermaid
-sequenceDiagram
-    participant User
-    participant API
-    participant Database
-
-    User->>API: POST /login
-    API->>Database: Query credentials
-    Database-->>API: Return user data
-    alt Valid credentials
-        API-->>User: 200 OK + JWT token
-    else Invalid credentials
-        API-->>User: 401 Unauthorized
-    end
-```
+The diagram exists to transfer a mental model, not to be comprehensive. Every element you add competes with every other element for the reader's attention. Ruthless pruning is a feature, not laziness.
 
-### Flowchart (User Journey)
-```mermaid
-flowchart TD
-    Start([User visits site]) --> Auth{Authenticated?}
-    Auth -->|No| Login[Show login page]
-    Auth -->|Yes| Dashboard[Show dashboard]
-    Login --> Creds[Enter credentials]
-    Creds --> Validate{Valid?}
-    Validate -->|Yes| Dashboard
-    Validate -->|No| Error[Show error]
-    Error --> Login
-```
+## NEVER
 
-### ERD (Database Schema)
-```mermaid
-erDiagram
-    USER ||--o{ ORDER : places
-    ORDER ||--|{ LINE_ITEM : contains
-    PRODUCT ||--o{ LINE_ITEM : includes
-
-    USER {
-        int id PK
-        string email UK
-        string name
-        datetime created_at
-    }
-
-    ORDER {
-        int id PK
-        int user_id FK
-        decimal total
-        datetime created_at
-    }
-```
+- **NEVER use special characters `{}`, `[]`, `()` inside node label text without quoting** — Mermaid's parser treats them as syntax, not content. Wrap labels in quotes: `A["My (special) label"]`. This is the #1 cause of blank renders.
 
-## Detailed References
+- **NEVER nest `subgraph` more than 2 levels deep in flowcharts** — the ELK and dagre layout engines both produce overlapping edges at depth 3+, and there is no workaround short of splitting the diagram.
 
-For in-depth guidance on specific diagram types, see:
+- **NEVER use `C4Context` to show internal component interactions** — C4 relationship arrows represent *dependencies between bounded systems*, not call sequences. Mixing C4 with method-level calls makes both wrong. Use `sequenceDiagram` for call flows.
 
-- **[references/class-diagrams.md](references/class-diagrams.md)** - Domain modeling, relationships (association, composition, aggregation, inheritance), multiplicity, methods/properties
-- **[references/sequence-diagrams.md](references/sequence-diagrams.md)** - Actors, participants, messages (sync/async), activations, loops, alt/opt/par blocks, notes
-- **[references/flowcharts.md](references/flowcharts.md)** - Node shapes, connections, decision logic, subgraphs, styling
-- **[references/erd-diagrams.md](references/erd-diagrams.md)** - Entities, relationships, cardinality, keys, attributes
-- **[references/c4-diagrams.md](references/c4-diagrams.md)** - System context, container, component diagrams, boundaries
-- **[references/architecture-diagrams.md](references/architecture-diagrams.md)** - Cloud services, infrastructure, CI/CD deployments
-- **[references/advanced-features.md](references/advanced-features.md)** - Themes, styling, configuration, layout options
+- **NEVER omit `participant` declarations in sequence diagrams when actor order matters** — Mermaid infers order from first appearance, which is often not the logical order. Explicit `participant A` blocks at the top lock display order.
 
-## Best Practices
+- **NEVER use `classDiagram` for process flows** — no matter how tempting it is to show "step 1 → step 2" via inheritance arrows, class diagrams express structural relationships, not temporal ones. The reader will misread it as inheritance.
 
-1. **Start Simple** - Begin with core entities/components, add details incrementally
-2. **Use Meaningful Names** - Clear labels make diagrams self-documenting
-3. **Comment Extensively** - Use `%%` comments to explain complex relationships
-4. **Keep Focused** - One diagram per concept; split large diagrams into multiple focused views
-5. **Version Control** - Store `.mmd` files alongside code for easy updates
-6. **Add Context** - Include titles and notes to explain diagram purpose
-7. **Iterate** - Refine diagrams as understanding evolves
+- **NEVER pass `--no-sandbox` directly to mmdc CLI** — it is not a valid flag in mmdc v11+. Pass it via `-p puppeteer.json` with `{"args":["--no-sandbox","--disable-setuid-sandbox"]}`. Direct flag use silently produces no output.
 
-## Configuration and Theming
+- **NEVER use `graph` instead of `flowchart`** — `graph` is the deprecated alias. It lacks `look:`, `layout:`, and subgraph click handlers. Always use `flowchart TD` or `flowchart LR`.
 
-Configure diagrams using frontmatter:
+## When Things Go Wrong
 
-```mermaid
----
-config:
-  theme: base
-  themeVariables:
-    primaryColor: "#ff6b6b"
----
-flowchart LR
-    A --> B
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Diagram renders blank with no error | Special char in label (`()`, `{}`, `&`); or parser keyword used as node name (`end`, `style`, `classDef`) | Wrap all labels in double-quotes; rename node IDs |
+| `mmdc` produces empty PNG/SVG file | Puppeteer sandboxing; missing `puppeteer.json` | Add `-p puppeteer.json` with `{"args":["--no-sandbox"]}` |
+| GitHub renders diagram, GitLab does not | Diagram uses `%%{init}` which GitLab's renderer may reject | Move config to frontmatter YAML block (`---\nconfig:\n  theme: dark\n---`) |
+| Arrow labels truncated or overlapping | Long edge labels in dagre layout | Switch to `layout: elk` via frontmatter, or shorten labels |
+| `sequenceDiagram` actors in wrong order | Order inferred from first message | Add explicit `participant X` declarations at top before any messages |
+| `erDiagram` relationship line missing | Attribute block uses a type keyword Mermaid doesn't recognize | Stick to `string`, `int`, `float`, `boolean`, `date`, `datetime` as attribute types |
 
-**Available themes:** default, forest, dark, neutral, base
+## Diagram Type References
 
-**Layout options:**
-- `layout: dagre` (default) - Classic balanced layout
-- `layout: elk` - Advanced layout for complex diagrams (requires integration)
+Load the relevant reference only when you need syntax details beyond what you know:
 
-**Look options:**
-- `look: classic` - Traditional Mermaid style
-- `look: handDrawn` - Sketch-like appearance
+| Diagram | When to load | File |
+|---------|-------------|------|
+| Class diagrams | Multiplicity, visibility modifiers, lollipop interfaces | `references/class-diagrams.md` |
+| Sequence diagrams | `loop`/`alt`/`opt`/`par` blocks, activation bars, notes | `references/sequence-diagrams.md` |
+| Flowcharts | Node shapes, styling, `click` handlers, subgraph links | `references/flowcharts.md` |
+| ERD | Cardinality notation, attribute types, relationship labels | `references/erd-diagrams.md` |
+| C4 diagrams | `C4Container`, `C4Component`, boundary blocks | `references/c4-diagrams.md` |
+| Architecture / infra | Cloud service icons, CI/CD pipeline patterns | `references/architecture-diagrams.md` |
+| Theming / CLI export | `themeVariables`, mmdc flags, `look: handDrawn` | `references/advanced-features.md` |
 
-## Exporting and Rendering
+## Quick Syntax Reference
 
-**Native support in:**
-- GitHub/GitLab - Automatically renders in Markdown
-- VS Code - With Markdown Mermaid extension
-- Notion, Obsidian, Confluence - Built-in support
+```mermaid
+%% Flowchart — use flowchart, never graph
+flowchart TD
+    A([Start]) --> B{Decision?}
+    B -->|Yes| C[Action]
+    B -->|No| D["Label with (parens) must be quoted"]
+```
 
-**Export options:**
-- [Mermaid Live Editor](https://mermaid.live) - Online editor with PNG/SVG export
-- Mermaid CLI - `npm install -g @mermaid-js/mermaid-cli` then `mmdc -i input.mmd -o output.png`
-- Docker - `docker run --rm -v $(pwd):/data minlag/mermaid-cli -i /data/input.mmd -o /data/output.png`
+```mermaid
+%% Sequence — declare participants to control order
+sequenceDiagram
+    participant Client
+    participant API
+    participant DB
+    Client->>API: POST /login
+    API->>DB: SELECT user
+    DB-->>API: row
+    alt valid
+        API-->>Client: 200 + JWT
+    else invalid
+        API-->>Client: 401
+    end
+```
 
-## Common Pitfalls
+```mermaid
+%% ERD — stick to known attribute types
+erDiagram
+    USER ||--o{ ORDER : places
+    USER {
+        int id PK
+        string email UK
+    }
+    ORDER {
+        int id PK
+        int user_id FK
+        float total
+    }
+```
 
-- **Breaking characters** - Avoid `{}` in comments, use proper escape sequences for special characters
-- **Syntax errors** - Misspellings break diagrams; validate syntax in Mermaid Live
-- **Overcomplexity** - Split complex diagrams into multiple focused views
-- **Missing relationships** - Document all important connections between entities
+```mermaid
+%% C4 — context only; do NOT mix with call sequences
+C4Context
+    Person(user, "Customer", "Places orders")
+    System(shop, "E-Commerce", "Handles orders")
+    System_Ext(pay, "Payment Gateway", "Processes payments")
+    Rel(user, shop, "Uses")
+    Rel(shop, pay, "Charges via")
+```
+
+## mmdc CLI (Local Rendering)
 
-## When to Create Diagrams
+```bash
+# Install
+npm install -g @mermaid-js/mermaid-cli
 
-**Always diagram when:**
-- Starting new projects or features
-- Documenting complex systems
-- Explaining architecture decisions
-- Designing database schemas
-- Planning refactoring efforts
-- Onboarding new team members
+# puppeteer.json: needed where Chromium's sandbox is unavailable (e.g. Docker or CI running as root)
+echo '{"args":["--no-sandbox","--disable-setuid-sandbox"]}' > puppeteer.json
 
-**Use diagrams to:**
-- Align stakeholders on technical decisions
-- Document domain models collaboratively
-- Visualize data flows and system interactions
-- Plan before coding
-- Create living documentation that evolves with code
+# Render
+mmdc -i diagram.mmd -o diagram.png -p puppeteer.json
+mmdc -i diagram.mmd -o diagram.svg -p puppeteer.json
+
+# Batch render all .mmd files
+for f in *.mmd; do mmdc -i "$f" -o "${f%.mmd}.png" -p puppeteer.json; done
+```
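When Mermaid source is generated from code, the label-quoting rule from the NEVER list and the reserved-ID advice from the recovery table can be enforced at build time. The following is a minimal Python sketch under stated assumptions: `node` and `flowchart` are hypothetical helper names (neither belongs to Mermaid or mmdc), the reserved-ID set contains only the three keywords named in the table above, and embedded double quotes are written with Mermaid's `#quot;` entity code.

```python
# Sketch only: helper names are hypothetical and not part of Mermaid or mmdc.
RESERVED_IDS = {"end", "style", "classDef"}  # keywords the recovery table flags as node names


def node(node_id: str, label: str) -> str:
    """Return a flowchart node whose label is always double-quoted."""
    if node_id in RESERVED_IDS:
        raise ValueError(f"rename node ID {node_id!r}: it collides with a Mermaid keyword")
    # A double quote inside a quoted label is written as the entity code #quot;
    safe = label.replace('"', "#quot;")
    return f'{node_id}["{safe}"]'


def flowchart(nodes: dict[str, str], edges: list[tuple[str, str]], direction: str = "TD") -> str:
    """Assemble a flowchart: node declarations first, then plain arrows."""
    lines = [f"flowchart {direction}"]
    lines += [f"    {node(node_id, label)}" for node_id, label in nodes.items()]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)


print(flowchart({"A": "Start", "B": "Call fn(x) {retry}"}, [("A", "B")]))
```

Quoting every label unconditionally is simpler than detecting which characters need it, and it matches the recovery advice to wrap all labels in double quotes.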
diff --git a/skills/mui/SKILL.md b/skills/mui/SKILL.md
index cc235b2..165c7e1 100644
--- a/skills/mui/SKILL.md
+++ b/skills/mui/SKILL.md
@@ -1,504 +1,111 @@
 ---
 name: mui
-description: Material-UI v7 component library patterns including sx prop styling, theme integration, responsive design, and MUI-specific hooks. Use when working with MUI components, styling with sx prop, theme customization, or MUI utilities.
+description: Material-UI v7 component library — sx prop styling, theme tokens, slots/slotProps pattern, Grid v2 size prop, Tailwind CSS layer integration, styled() vs sx vs GlobalStyles decisions. Use when working with MUI components, theming, responsive breakpoints, or migrating from MUI v5/v6.
 ---
 
 # MUI v7 Patterns
 
-## Purpose
+## Mindset
 
-Material-UI v7 (released March 2025) patterns for component usage, styling with sx prop, theme integration, and responsive design.
+- **sx is a performance cost, not free CSS**: every sx object is processed by the emotion cache on render. Static objects hoisted outside the component avoid re-creation. If a component re-renders >10×/sec with a complex sx, reach for `styled()` instead.
+- **Theme tokens are contracts, not strings**: `'primary.main'` is resolved at render time against the active theme. Hardcoded hex breaks dark mode and white-labeling silently — the build won't catch it.
+- **slots/slotProps is the v7 customization surface**: `componentsProps` and `components` overrides from v5/v6 still exist as aliases but are deprecated. New MUI components only accept `slots`/`slotProps`. Mixing both on the same component produces unexpected behavior.
+- **Grid v2 is not backward-compatible**: the `item` prop is gone, `xs`/`sm`/`md` top-level props are gone. Only `size={{ xs: 12, md: 6 }}` works. A silent fallback renders all columns full-width if you use old props.
+- **CSS layers (`enableCssLayer`) change specificity globally**: enabling the option puts all MUI styles inside `@layer mui`, which loses to any unlayered CSS. Tailwind v4 uses layers by default, so this is desirable for coexistence, but enabling it mid-project can break existing style overrides that relied on the old specificity.
 
-**Note**: MUI v7 breaking changes from v6:
-- Deep imports no longer work - use package exports field
-- `onBackdropClick` removed from Modal - use `onClose` instead
-- All components now use standardized `slots` and `slotProps` pattern
-- CSS layers support via `enableCssLayer` config (works with Tailwind v4)
+## Navigation
 
-## When to Use This Skill
+**Use this skill when**: writing or reviewing MUI component code, choosing between sx/styled()/GlobalStyles, customizing MUI theme, using Grid/Stack/Box layout, debugging specificity issues with MUI + Tailwind, migrating v5→v7 APIs.
 
-- Styling components with MUI sx prop
-- Using MUI components (Box, Grid, Paper, Typography, etc.)
-- Theme customization and usage
-- Responsive design with MUI breakpoints
-- MUI-specific utilities and hooks
+**Do NOT use this skill when**: the project uses a different component library (Chakra, Ant Design, Radix); pure CSS/Tailwind-only styling with no MUI components present; creating charts (use recharts/Victory directly).
 
----
-
-## Quick Start
-
-### Basic MUI Component
-
-```typescript
-import { Box, Typography, Button, Paper } from '@mui/material';
-import type { SxProps, Theme } from '@mui/material';
-
-const styles: Record<string, SxProps<Theme>> = {
-  container: {
-    p: 2,
-    display: 'flex',
-    flexDirection: 'column',
-    gap: 2,
-  },
-  header: {
-    mb: 3,
-    fontSize: '1.5rem',
-    fontWeight: 600,
-  },
-};
-
-function MyComponent() {
-  return (
-    <Paper sx={styles.container}>
-      <Typography sx={styles.header}>
-        Title
-      </Typography>
-      <Button variant="contained">
-        Action
-      </Button>
-    </Paper>
-  );
-}
-```
-
----
-
-## Styling Patterns
-
-### Inline Styles (< 100 lines)
-
-For components with simple styling, define styles at the top:
-
-```typescript
-import type { SxProps, Theme } from '@mui/material';
-
-const componentStyles: Record<string, SxProps<Theme>> = {
-  container: {
-    p: 2,
-    display: 'flex',
-    flexDirection: 'column',
-  },
-  header: {
-    mb: 2,
-    color: 'primary.main',
-  },
-  button: {
-    mt: 'auto',
-    alignSelf: 'flex-end',
-  },
-};
-
-function Component() {
-  return (
-    <Box sx={componentStyles.container}>
-      <Typography sx={componentStyles.header}>Header</Typography>
-      <Button sx={componentStyles.button}>Action</Button>
-    </Box>
-  );
-}
-```
-
-### Separate Styles File (>= 100 lines)
-
-For complex components, create separate style file:
-
-```typescript
-// UserProfile.styles.ts
-import type { SxProps, Theme } from '@mui/material';
-
-export const userProfileStyles: Record<string, SxProps<Theme>> = {
-  container: {
-    p: 3,
-    maxWidth: 800,
-    mx: 'auto',
-  },
-  header: {
-    display: 'flex',
-    justifyContent: 'space-between',
-    alignItems: 'center',
-    mb: 3,
-  },
-  // ... many more styles
-};
-
-// UserProfile.tsx
-import { userProfileStyles as styles } from './UserProfile.styles';
-
-function UserProfile() {
-  return <Box sx={styles.container}>...</Box>;
-}
-```
-
----
-
-## Common Components
-
-### Layout Components
-
-```typescript
-// Box - Generic container
-<Box sx={{ p: 2, bgcolor: 'background.paper' }}>
-  Content
-</Box>
-
-// Paper - Elevated surface
-<Paper elevation={2} sx={{ p: 3 }}>
-  Content
-</Paper>
-
-// Container - Centered content with max-width
-<Container maxWidth="lg">
-  Content
-</Container>
-
-// Stack - Flex container with spacing
-<Stack spacing={2} direction="row">
-  <Item />
-  <Item />
-</Stack>
-```
-
-### Grid System
-
-```typescript
-import { Grid } from '@mui/material';
-
-// 12-column grid
-<Grid container spacing={2}>
-  <Grid item xs={12} md={6}>
-    Left half
-  </Grid>
-  <Grid item xs={12} md={6}>
-    Right half
-  </Grid>
-</Grid>
-
-// Responsive grid
-<Grid container spacing={3}>
-  <Grid item xs={12} sm={6} md={4} lg={3}>
-    Card
-  </Grid>
-  {/* Repeat for more cards */}
-</Grid>
-```
-
-### Typography
-
-```typescript
-<Typography variant="h1">Heading 1</Typography>
-<Typography variant="h2">Heading 2</Typography>
-<Typography variant="body1">Body text</Typography>
-<Typography variant="caption">Small text</Typography>
-
-// With custom styling
-<Typography
-  variant="h4"
-  sx={{
-    color: 'primary.main',
-    fontWeight: 600,
-    mb: 2,
-  }}
->
-  Custom Heading
-</Typography>
-```
-
-### Buttons
-
-```typescript
-// Variants
-<Button variant="contained">Contained</Button>
-<Button variant="outlined">Outlined</Button>
-<Button variant="text">Text</Button>
-
-// Colors
-<Button variant="contained" color="primary">Primary</Button>
-<Button variant="contained" color="secondary">Secondary</Button>
-<Button variant="contained" color="error">Error</Button>
-
-// With icons
-import { Add as AddIcon } from '@mui/icons-material';
-
-<Button startIcon={<AddIcon />}>Add Item</Button>
-```
-
----
-
-## Theme Integration
-
-### Using Theme Values
-
-```typescript
-import { useTheme } from '@mui/material';
-
-function Component() {
-  const theme = useTheme();
-
-  return (
-    <Box
-      sx={{
-        p: 2,
-        bgcolor: theme.palette.primary.main,
-        color: theme.palette.primary.contrastText,
-        borderRadius: theme.shape.borderRadius,
-      }}
-    >
-      Themed box
-    </Box>
-  );
-}
+**Styling approach decision tree**:
 ```
-
-### Theme in sx Prop
-
-```typescript
-<Box
-  sx={{
-    // Access theme in sx
-    color: 'primary.main',          // theme.palette.primary.main
-    bgcolor: 'background.paper',     // theme.palette.background.paper
-    p: 2,                            // theme.spacing(2)
-    borderRadius: 1,                 // theme.shape.borderRadius
-  }}
->
-  Content
-</Box>
-
-// Callback for advanced usage
-<Box
-  sx={(theme) => ({
-    color: theme.palette.primary.main,
-    '&:hover': {
-      color: theme.palette.primary.dark,
-    },
-  })}
->
-  Hover me
-</Box>
+Is the style applied globally (body, scrollbar, :root)?
+  YES → GlobalStyles (not sx, not styled())
+  NO → Is the component a native HTML element with no MUI props?
+         YES → styled('div')(...) for static, <Box component="div" sx={...}> for one-off (plain elements do not accept sx)
+         NO → Does the style change every render (based on JS state)?
+                YES → sx callback: sx={(theme) => ({ color: active ? theme.palette.primary.main : 'inherit' })}
+                NO → Is it applied in >2 places?
+                       YES → styled(Component)(...) or extract to *.styles.ts as SxProps const
+                       NO → inline sx object hoisted outside component
 ```
 
----
+## Philosophy
 
-## Responsive Design
+MUI's constraint-based API (spacing scale, palette tokens, breakpoint objects) exists to make design-system changes propagate automatically. Every time you escape the constraint system (hardcoded px, hardcoded hex, inline `style={}`) you create future migration debt. Work inside the system; only escape it deliberately.
 
-### Breakpoints
+## NEVER
 
-```typescript
-// Mobile-first responsive values
-<Box
-  sx={{
-    width: {
-      xs: '100%',    // 0-600px
-      sm: '80%',     // 600-900px
-      md: '60%',     // 900-1200px
-      lg: '40%',     // 1200-1536px
-      xl: '30%',     // 1536px+
-    },
-  }}
->
-  Responsive width
-</Box>
+- **NEVER use `style={}` on MUI components for anything other than CSS custom properties** — it bypasses the theme, breaks dark mode support, and is not overridable via `sx` (inline style wins specificity). The one valid use: setting `--css-var` values.
+- **NEVER write `theme.palette.primary.main` inside `sx` string values** — use the shorthand `'primary.main'` token. Calling `useTheme()` just to pass values into sx is redundant; the sx callback `(theme) => ({...})` already receives the theme.
+- **NEVER use `componentsProps`/`components` on v7 MUI components** — these are deprecated aliases that will be removed. Use `slots` and `slotProps`. Mixing them on one component causes the deprecated props to be silently ignored in some code paths.
+- **NEVER put a new `{}` literal directly inside `sx` on a component that renders in a list or high-frequency loop**: `sx={{ p: 2 }}` allocates a new object on every render, which breaks referential equality for memoized components and adds avoidable work per item. Hoist it to a `const styles` at module scope, outside the component.
+- **NEVER use Grid `xs`/`sm`/`md` as direct props on Grid items in v7** — they are silently ignored (no warning in production builds). Always use `size={{ xs: 12, md: 6 }}`.
+- **NEVER override MUI component internals with global CSS class selectors** (`.MuiButton-root`) in production code: global selectors apply to every instance, depend on specificity and injection order, and are hard to trace back to a component. Use `slotProps`, `sx`, or `styled()` with the component's `ownerState`.
+- **NEVER call `useTheme()` solely to read a breakpoint value for conditional rendering**: use `useMediaQuery((theme) => theme.breakpoints.up('md'))`, whose callback receives the theme directly, or the `sx` responsive object syntax instead.
 
-// Responsive display
-<Box
-  sx={{
-    display: {
-      xs: 'none',    // Hidden on mobile
-      md: 'block',   // Visible on desktop
-    },
-  }}
->
-  Desktop only
-</Box>
-```
-
-### Responsive Typography
-
-```typescript
-<Typography
-  sx={{
-    fontSize: {
-      xs: '1rem',
-      md: '1.5rem',
-      lg: '2rem',
-    },
-    lineHeight: {
-      xs: 1.5,
-      md: 1.75,
-    },
-  }}
->
-  Responsive text
-</Typography>
-```
+## When Things Go Wrong
 
----
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Grid items all render full-width | Using old `xs`/`sm` props instead of `size={{ xs: 12 }}` | Replace top-level breakpoint props with `size` object |
+| sx styles silently ignored | `componentsProps` and `slotProps` both set; `slotProps` takes precedence, so values passed through `componentsProps` are dropped | Remove `componentsProps`, use only `slotProps` |
+| MUI styles lose to Tailwind | `enableCssLayer` active; unlayered Tailwind wins vs `@layer mui` | Ensure Tailwind config uses `@layer` or add `@layer utilities` wrapper |
+| Dark mode: color not switching | Hardcoded hex in sx instead of theme token | Replace with `'primary.main'` or `(theme) => theme.palette.primary.main` |
+| onBackdropClick has no effect | Removed in v7 Modal; handler silently dropped | Move logic to `onClose` with reason check: `onClose={(_, reason) => reason !== 'backdropClick' && close()}` |
+| TypeScript: SxProps type error on array sx | Passing `[styles.a, styles.b]` — correct but needs `SxProps<Theme>[]` | Type the array explicitly or spread: `sx={{ ...styles.a, ...styles.b }}` |
 
-## Forms
+## v7 Migration Quick Reference
 
 ```typescript
-import { TextField, Stack, Button } from '@mui/material';
-
-<Box component="form" onSubmit={handleSubmit}>
-  <Stack spacing={2}>
-    <TextField
-      label="Email"
-      type="email"
-      value={email}
-      onChange={(e) => setEmail(e.target.value)}
-      fullWidth
-      required
-      error={!!errors.email}
-      helperText={errors.email}
-    />
-    <Button type="submit" variant="contained">Submit</Button>
-  </Stack>
-</Box>
+// v5/v6 → v7 breaking changes
+onBackdropClick={fn}           // REMOVED → use onClose with reason
+<Grid item xs={6}>             // REMOVED → <Grid size={6}>
+import createTheme from '@mui/material/styles/createTheme' // REMOVED (deep import) → import { createTheme } from '@mui/material/styles'
+componentsProps={{ root: {} }}  // DEPRECATED → slotProps={{ root: {} }}
 ```
 
----
-
-## Common Patterns
-
-### Card Component
-
-```typescript
-import { Card, CardContent, CardActions, Typography, Button } from '@mui/material';
-
-<Card>
-  <CardContent>
-    <Typography variant="h5" component="div">
-      Title
-    </Typography>
-    <Typography variant="body2" color="text.secondary">
-      Description
-    </Typography>
-  </CardContent>
-  <CardActions>
-    <Button size="small">Learn More</Button>
-  </CardActions>
-</Card>
-```
-
-### Dialog/Modal
-
-```typescript
-import { Dialog, DialogTitle, DialogContent, DialogActions, Button } from '@mui/material';
-
-<Dialog open={open} onClose={handleClose}>
-  <DialogTitle>Confirm Action</DialogTitle>
-  <DialogContent>
-    Are you sure you want to proceed?
-  </DialogContent>
-  <DialogActions>
-    <Button onClick={handleClose}>Cancel</Button>
-    <Button onClick={handleConfirm} variant="contained">
-      Confirm
-    </Button>
-  </DialogActions>
-</Dialog>
-```
-
-### Loading States
-
-```typescript
-import { CircularProgress, Skeleton } from '@mui/material';
-
-// Spinner
-<Box sx={{ display: 'flex', justifyContent: 'center', p: 3 }}>
-  <CircularProgress />
-</Box>
-
-// Skeleton
-<Stack spacing={1}>
-  <Skeleton variant="text" width="60%" />
-  <Skeleton variant="rectangular" height={200} />
-  <Skeleton variant="text" width="40%" />
-</Stack>
-```
-
----
-
-## MUI-Specific Hooks
-
-### useMuiSnackbar
-
-```typescript
-import { useMuiSnackbar } from '@/hooks/useMuiSnackbar';
-
-function Component() {
-  const { showSuccess, showError, showInfo } = useMuiSnackbar();
-
-  const handleSave = async () => {
-    try {
-      await saveData();
-      showSuccess('Saved successfully');
-    } catch (error) {
-      showError('Failed to save');
-    }
-  };
-
-  return <Button onClick={handleSave}>Save</Button>;
-}
-```
-
----
-
-## Icons
-
-```typescript
-import { Add as AddIcon, Delete as DeleteIcon } from '@mui/icons-material';
-import { Button, IconButton } from '@mui/material';
-
-<Button startIcon={<AddIcon />}>Add</Button>
-<IconButton onClick={handleDelete}><DeleteIcon /></IconButton>
-```
-
----
-
-## Best Practices
-
-### 1. Type Your sx Props
+## Styling Recipes
 
+**Static styles (preferred — hoisted, typed):**
 ```typescript
 import type { SxProps, Theme } from '@mui/material';
-
-// ✅ Good
-const styles: Record<string, SxProps<Theme>> = {
-  container: { p: 2 },
-};
-
-// ❌ Avoid
 const styles = {
-  container: { p: 2 }, // No type safety
+  card: { p: 2, borderRadius: 2, bgcolor: 'background.paper' } satisfies SxProps<Theme>,
 };
 ```
 
-### 2. Use Theme Tokens
-
+**Theme-dependent dynamic styles:**
 ```typescript
-// ✅ Good: Use theme tokens
-<Box sx={{ color: 'primary.main', p: 2 }} />
-
-// ❌ Avoid: Hardcoded values
-<Box sx={{ color: '#1976d2', padding: '16px' }} />
+sx={(theme) => ({
+  border: `1px solid ${theme.palette.divider}`,
+  [theme.breakpoints.up('md')]: { flexDirection: 'row' },
+})}
 ```
 
-### 3. Consistent Spacing
-
+**GlobalStyles (reset/global only):**
 ```typescript
-// ✅ Good: Use spacing scale
-<Box sx={{ p: 2, mb: 3, mt: 1 }} />
+import { GlobalStyles } from '@mui/material';
+<GlobalStyles styles={{ '*': { boxSizing: 'border-box' }, body: { margin: 0 } }} />
+```
 
-// ❌ Avoid: Random pixel values
-<Box sx={{ padding: '17px', marginBottom: '25px' }} />
+**styled() — reusable semantic component:**
+```typescript
+const StatusChip = styled(Chip, {
+  shouldForwardProp: (prop) => prop !== 'isActive',
+})<{ isActive: boolean }>(({ theme, isActive }) => ({
+  backgroundColor: isActive ? theme.palette.success.light : theme.palette.grey[300],
+}));
 ```
 
----
+## Snackbar Pattern
+
+MUI has no built-in `useSnackbar` hook. Use notistack (`enqueueSnackbar`) or a local state pattern. See [resources/styling-guide.md](resources/styling-guide.md) for the full implementation.
 
-## Additional Resources
+## References
 
-For more detailed patterns, see:
-- [styling-guide.md](resources/styling-guide.md) - Advanced styling patterns
-- [component-library.md](resources/component-library.md) - Component examples
-- [theme-customization.md](resources/theme-customization.md) - Theme setup
+- [resources/styling-guide.md](resources/styling-guide.md) — sx vs styled() deep dive, emotion caching, GlobalStyles
+- [resources/component-library.md](resources/component-library.md) — Card, Dialog, Form, Loading patterns with full code
+- [resources/theme-customization.md](resources/theme-customization.md) — createTheme, palette, typography, component overrides
diff --git a/skills/naming-analyzer/SKILL.md b/skills/naming-analyzer/SKILL.md
index fa2815a..5a42d0d 100644
--- a/skills/naming-analyzer/SKILL.md
+++ b/skills/naming-analyzer/SKILL.md
@@ -1,351 +1,102 @@
 ---
 name: naming-analyzer
-description: Suggest better variable, function, and class names based on context and conventions.
+description: Analyze and improve variable, function, class, and API names in any codebase. Triggers on phrases such as "review my naming", "are these names good", "rename suggestions", "naming conventions", "misleading names", "confusing variable names", "improve readability". Identifies semantic problems, not just style violations, and flags serialization-boundary renames that require deprecation before changing.
 ---
 
-# Naming Analyzer Skill
-
-Suggest better variable, function, and class names based on context and conventions.
-
-## Instructions
-
-You are a naming convention expert. When invoked:
-
-1. **Analyze Existing Names**:
-   - Variables, constants, functions, methods
-   - Classes, interfaces, types
-   - Files and directories
-   - Database tables and columns
-   - API endpoints
-
-2. **Identify Issues**:
-   - Unclear or vague names
-   - Abbreviations that obscure meaning
-   - Inconsistent naming conventions
-   - Misleading names (name doesn't match behavior)
-   - Too short or too long names
-   - Hungarian notation misuse
-   - Single-letter variables outside loops
-
-3. **Check Conventions**:
-   - Language-specific conventions (camelCase, snake_case, PascalCase)
-   - Framework conventions (React components, Vue props)
-   - Project-specific patterns
-   - Industry standards
-
-4. **Provide Suggestions**:
-   - Better alternative names
-   - Reasoning for each suggestion
-   - Consistency improvements
-   - Contextual appropriateness
-
-## Naming Conventions by Language
-
-### JavaScript/TypeScript
-- Variables/functions: `camelCase`
-- Classes/interfaces: `PascalCase`
-- Constants: `UPPER_SNAKE_CASE`
-- Private fields: `_prefixUnderscore` or `#privateField`
-- Boolean: `is`, `has`, `can`, `should` prefixes
-
-### Python
-- Variables/functions: `snake_case`
-- Classes: `PascalCase`
-- Constants: `UPPER_SNAKE_CASE`
-- Private: `_prefix_underscore`
-- Boolean: `is_`, `has_`, `can_` prefixes
-
-### Java
-- Variables/methods: `camelCase`
-- Classes/interfaces: `PascalCase`
-- Constants: `UPPER_SNAKE_CASE`
-- Packages: `lowercase`
-
-### Go
-- Exported: `PascalCase`
-- Unexported: `camelCase`
-- Acronyms: All caps (`HTTPServer`, not `HttpServer`)
-
-## Common Naming Issues
-
-### Too Vague
-```javascript
-// ❌ Bad - Too generic
-function process(data) { }
-const info = getData();
-let temp = x;
-
-// ✓ Good - Specific and clear
-function processPayment(transaction) { }
-const userProfile = getUserProfile();
-let previousValue = x;
-```
+## Mindset
 
-### Misleading Names
-```javascript
-// ❌ Bad - Name doesn't match behavior
-function getUser(id) {
-  const user = fetchUser(id);
-  user.lastLogin = Date.now();
-  saveUser(user); // Side effect! Not just "getting"
-  return user;
-}
-
-// ✓ Good - Name reflects actual behavior
-function fetchAndUpdateUserLogin(id) {
-  const user = fetchUser(id);
-  user.lastLogin = Date.now();
-  saveUser(user);
-  return user;
-}
-```
+**Names are contracts, not labels.** A public function name is a promise to all callers; a DB column name is a promise to the schema. Renaming is cheap inside a function body, expensive at an API boundary, and potentially breaking at a serialization boundary. Always scope your analysis to blast radius first.
 
-### Abbreviations
-```javascript
-// ❌ Bad - Unclear abbreviations
-const usrCfg = loadConfig();
-function calcTtl(arr) { }
+**Misleading names are bugs, vague names are debt.** A function named `getUser` that mutates state causes real defects. A variable named `data` causes confusion tax — it's worth renaming but not urgent.
 
-// ✓ Good - Clear and readable
-const userConfig = loadConfig();
-function calculateTotal(amounts) { }
+**The reader's mental model is the measure.** A name is good when the reader can predict behavior without reading the body. Test by reading only signatures in a file — they should tell a coherent story.
 
-// ✓ Acceptable - Well-known abbreviations
-const htmlElement = document.getElementById('main');
-const apiUrl = process.env.API_URL;
-```
+**Scope determines acceptable brevity.** A 1-line lambda can use `x`; a module-level exported function cannot use `proc`. The longer the scope, the more the name must earn its place.
 
-### Boolean Naming
-```javascript
-// ❌ Bad - Unclear state
-const login = user.authenticated;
-const status = checkUser();
-
-// ✓ Good - Clear boolean intent
-const isLoggedIn = user.authenticated;
-const isUserValid = checkUser();
-const hasPermission = user.roles.includes('admin');
-const canEditPost = isOwner || isAdmin;
-const shouldShowNotification = isEnabled && hasUnread;
-```
+**Consistency beats perfection.** A codebase that uniformly uses `Manager` everywhere is better than a half-migrated one. Flag convention violations only when the inconsistency is within the same module or PR scope.
 
-### Magic Numbers
-```javascript
-// ❌ Bad - Unnamed constants
-if (age > 18) { }
-setTimeout(callback, 3600000);
+## Navigation
 
-// ✓ Good - Named constants
-const LEGAL_AGE = 18;
-const ONE_HOUR_IN_MS = 60 * 60 * 1000;
+**Use this skill when**:
+- A user asks about naming quality in a specific file, module, or PR
+- Names in a diff are ambiguous, inconsistent, or misleading
+- A refactor has produced names that no longer match behavior
+- Setting up naming conventions for a new project
 
-if (age > LEGAL_AGE) { }
-setTimeout(callback, ONE_HOUR_IN_MS);
-```
-
-## Usage Examples
+**Do NOT use this skill when**:
+- The user wants a linter configured (that's a tooling task, not analysis)
+- All names are already idiomatic and no issues are apparent — silence is correct here
+- The naming is unusual but internally consistent with domain language (e.g., single-letter variable names are standard in physics simulations)
 
+**Quick triage:**
 ```
-@naming-analyzer
-@naming-analyzer src/
-@naming-analyzer UserService.js
-@naming-analyzer --conventions
-@naming-analyzer --fix-all
+Is the name actively misleading (wrong behavior implied)?
+├─ Yes → CRITICAL: flag immediately, suggest rename + verify no serialization boundary
+└─ No → Is it at a public API/serialization boundary?
+    ├─ Yes → HIGH caution: suggest rename only with deprecation plan
+    └─ No → Is it vague/abbreviated in a wide scope?
+        ├─ Yes → MEDIUM: suggest rename, safe to apply via IDE refactor
+        └─ No → Minor style: note it, don't block the review
 ```
 
-## Report Format
-
-```markdown
-# Naming Analysis Report
-
-## Summary
-- Items analyzed: 156
-- Issues found: 23
-- Critical: 5 (misleading names)
-- Major: 12 (unclear/vague)
-- Minor: 6 (convention violations)
-
----
-
-## Critical Issues (5)
-
-### src/services/UserService.js:45
-**Current**: `getUser(id)`
-**Issue**: Function name implies read-only but has side effects (updates lastLogin)
-**Severity**: Critical - Misleading
-**Suggestion**: `fetchAndUpdateUserLogin(id)`
-**Reason**: Name should reflect the mutation
-
-### src/utils/helpers.js:23
-**Current**: `validate(x)`
-**Issue**: Generic parameter name, unclear what's being validated
-**Severity**: Critical - Too vague
-**Suggestion**: `validateEmail(emailAddress)`
-**Reason**: Specific names improve clarity
-
----
-
-## Major Issues (12)
-
-### src/components/DataList.jsx:12
-**Current**: `const d = new Date()`
-**Issue**: Single-letter variable in large scope
-**Severity**: Major
-**Suggestion**: `const currentDate = new Date()`
-**Reason**: Clarity and searchability
-
-### src/api/client.js:67
-**Current**: `function proc(data) {}`
-**Issue**: Abbreviated function name
-**Severity**: Major
-**Suggestion**: `function processApiResponse(data) {}`
-**Reason**: Full words are more readable
-
-### src/models/User.js:34
-**Current**: `user.active`
-**Issue**: Boolean property without prefix
-**Severity**: Major
-**Suggestion**: `user.isActive`
-**Reason**: Follow boolean naming convention
-
-### src/utils/format.js:89
-**Current**: `const MAX = 100`
-**Issue**: Generic constant name
-**Severity**: Major
-**Suggestion**: `const MAX_RETRY_ATTEMPTS = 100`
-**Reason**: Specific purpose is clearer
+## Philosophy
 
----
-
-## Minor Issues (6)
+Names are the primary communication channel between the author's intent and the reader's understanding. Every naming suggestion must improve that channel without breaking the contracts that depend on it. Treat serialization boundaries as immutable until a migration plan exists.
 
-### src/config/settings.js:12
-**Current**: `const API_url = '...'`
-**Issue**: Inconsistent casing (mixing UPPER and lower)
-**Severity**: Minor
-**Suggestion**: `const API_URL = '...'` or `const apiUrl = '...'`
-**Reason**: Consistency in convention
+## NEVER
 
-### src/helpers/string.js:45
-**Current**: `function strToNum(s) {}`
-**Issue**: Abbreviated function and parameter
-**Severity**: Minor
-**Suggestion**: `function stringToNumber(value) {}`
-**Reason**: Clarity over brevity
+- **NEVER suggest renaming a serialized field name without flagging the deprecation requirement** — JSON keys, DB column names, proto fields, Redux action strings, and env var names are consumed by external systems that cannot be updated atomically. A rename without a deprecation alias causes runtime breakage in production.
 
----
+- **NEVER rename a framework lifecycle hook or magic method to "improve clarity"** — `__init__`, `componentDidMount`, `middleware`, `beforeSave` are called by reflection or convention; renaming them silently breaks behavior without any compile error.
 
-## Convention Violations
+- **NEVER flag single-letter names in math/physics/ML code as issues**: `W`, `b`, `X`, `y`, `σ` are the established notation for weights, biases, input matrices, targets, and activation functions. Renaming them to `weightMatrix` destroys correspondence with the paper being implemented.
 
-### Inconsistent Boolean Prefixes
-**Locations**: 8 files
-**Issue**: Mixed use of `is`, `has`, `can` vs no prefix
-**Recommendation**: Standardize on boolean prefixes
-- Use `is` for state: `isActive`, `isVisible`
-- Use `has` for possession: `hasPermission`, `hasError`
-- Use `can` for ability: `canEdit`, `canDelete`
-- Use `should` for decisions: `shouldRender`, `shouldValidate`
+- **NEVER suggest generic replacements for vague names** — replacing `data` with `responseData` is barely better; push for domain-specific names (`invoiceLineItems`, `authTokenPayload`). A suggestion that is itself vague adds no value.
 
-### Mixed Naming Conventions
-**Location**: src/legacy/
-**Issue**: Mix of camelCase and snake_case in JavaScript
-**Recommendation**: Convert all to camelCase for consistency
-
----
+- **NEVER auto-apply renames to generated code** — files named `*.generated.ts`, `*_pb2.py`, `*.g.dart`, or anything in a `generated/` directory are overwritten on the next build cycle. Flag the generator template instead.
 
-## Suggested Renaming
+- **NEVER suggest `Manager`, `Helper`, `Utils`, or `Processor` as improvements** — these are already the problem. They are dumping-ground names that signal missing abstraction, not better names.
 
-### High Priority (Misleading or Critical)
-1. `getUser` → `fetchAndUpdateUserLogin` (src/services/UserService.js:45)
-2. `validate` → `validateEmail` (src/utils/helpers.js:23)
-3. `process` → `processPaymentTransaction` (src/payment/processor.js:67)
-
-### Medium Priority (Clarity)
-1. `d` → `currentDate` (7 locations)
-2. `temp` → `previousValue` (4 locations)
-3. `data` → `apiResponse` or more specific (12 locations)
-4. `arr` → `items`, `values`, or more specific (8 locations)
-
-### Low Priority (Convention)
-1. `active` → `isActive` (12 locations)
-2. `error` → `hasError` (6 locations)
-3. `API_url` → `API_URL` (3 locations)
-
----
+- **NEVER recommend renaming across repos in a single suggestion** — cross-repo renames require coordinated deploys. Flag the dependency and stop; the human must orchestrate the migration.
 
-## Naming Patterns to Follow
+## Analysis Protocol
 
-### Functions/Methods
-- Verbs: `get`, `set`, `create`, `update`, `delete`, `fetch`, `calculate`, `validate`
-- Clear action: `sendEmail()`, `parseJSON()`, `formatCurrency()`
+### Step 1: Classify by blast radius before evaluating quality
 
-### Classes
-- Nouns: `UserService`, `PaymentProcessor`, `EmailValidator`
-- Avoid generic: Don't use `Manager`, `Helper`, `Utility` unless necessary
+1. Read the name in context — what does the surrounding code actually do?
+2. Check: is this name used at a serialization boundary? (See `references/safe-rename-checklist.md`)
+3. Classify: misleading / vague / style-only / fine
 
-### Variables
-- Nouns or noun phrases: `user`, `emailAddress`, `totalAmount`
-- Descriptive: `userList` not `list`, `activeUsers` not `users2`
+### Step 2: Apply semantic tests
 
-### Constants
-- All caps with underscores: `MAX_RETRY_ATTEMPTS`, `DEFAULT_TIMEOUT`
-- Include units: `CACHE_DURATION_MS`, `MAX_FILE_SIZE_MB`
+- **CQS test**: Does the name imply read-only but the body mutates state? → Misleading
+- **Abstraction leak test**: Does the name reveal the implementation layer the caller shouldn't see? → Refactor
+- **Scope-length heuristic**: Is the abbreviation acceptable for this scope? → See `references/semantic-analysis-frameworks.md`
+- **Newspaper test**: Reading only signatures in the file, does the module's purpose emerge?
 
-### Booleans
-- Question form: `isValid`, `hasPermission`, `canEdit`
-- Affirmative: `isEnabled` not `isDisabled` (prefer positive)
+### Step 3: Report with severity + safety
 
----
-
-## Refactoring Script
-
-Would you like me to create a refactoring script to apply these changes?
-This will:
-1. Rename all suggested items
-2. Update all references
-3. Maintain git history
-4. Generate migration guide
-
----
+```markdown
+## [CRITICAL | HIGH | MEDIUM | MINOR] — <current name>
 
-## Best Practices
-
-✓ **DO**:
-- Use full words over abbreviations
-- Be specific and descriptive
-- Follow language conventions
-- Use consistent patterns
-- Make booleans obvious
-- Include units in constants
-
-✗ **DON'T**:
-- Use single letters (except in loops: i, j, k)
-- Use vague names (data, info, temp, x)
-- Mix naming conventions
-- Use misleading names
-- Over-abbreviate
-- Use Hungarian notation in modern code
+**Location**: file:line
+**Issue**: <what is wrong semantically, not just stylistically>
+**Serialization boundary**: [YES — requires deprecation] | [NO — safe to rename]
+**Suggested name**: <specific domain-appropriate name>
+**Why**: <non-obvious reason — not "it's clearer">
 ```
 
-## Naming Decision Tree
+## When Things Go Wrong
 
-```
-Is it a boolean?
-├─ Yes → Use is/has/can/should prefix
-└─ No → Is it a function?
-    ├─ Yes → Use verb phrase (action)
-    └─ No → Is it a class?
-        ├─ Yes → Use noun (PascalCase)
-        └─ No → Is it a constant?
-            ├─ Yes → Use UPPER_SNAKE_CASE
-            └─ No → Use descriptive noun (camelCase/snake_case)
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Renamed a field and API consumers break silently | Serialization boundary missed; JSON key was load-bearing | Revert; add old name as alias with deprecation warning; communicate to consumers before removal |
+| Renamed a method and tests pass but runtime fails | Framework uses the old name via reflection (Django signals, SQLAlchemy events, Flask routes) | Check for decorator registration or string-based dispatch; restore old name or update registration |
+| Suggested name is already used elsewhere in the codebase | Didn't grep before suggesting | Search for the proposed name before recommending; collision is worse than the original |
+| User accepts rename but it makes diffs unreadable | Name is too long for the diff hunk to provide context | Prefer shorter-but-specific over long-and-exhaustive for frequently-diffed code |
+| Renaming `Manager` class creates 20 sub-issues | The class itself is the problem, not its name | Flag the design smell; recommend decomposition before naming |
 
-## Notes
+## References
 
-- Prioritize clarity over brevity
-- Context matters (loop counters can be `i`, `j`)
-- Well-known abbreviations are okay (`html`, `api`, `url`, `id`)
-- Consistency within a project is more important than perfect naming
-- Refactor names as understanding improves
-- Use IDE rename refactoring to safely update all references
+- `references/safe-rename-checklist.md` — Pre-rename safety checks, framework-protected names, deprecation patterns
+- `references/semantic-analysis-frameworks.md` — CQS test, abstraction leak detection, scope-length heuristics, anti-pattern catalog
diff --git a/skills/naming-analyzer/references/safe-rename-checklist.md b/skills/naming-analyzer/references/safe-rename-checklist.md
new file mode 100644
index 0000000..1eccce7
--- /dev/null
+++ b/skills/naming-analyzer/references/safe-rename-checklist.md
@@ -0,0 +1,69 @@
+# Safe Rename Checklist
+
+Use this before applying any rename suggestion to determine blast radius.
+
+## Serialization Boundary Check
+
+| Signal | Risk Level | Required Action |
+|--------|-----------|-----------------|
+| Field appears in JSON/YAML output consumed by external callers | CRITICAL | Deprecation period; alias old name |
+| Field name matches a DB column (ORM mapping) | HIGH | Migration script + rollback plan |
+| Name appears in an OpenAPI/Swagger spec | HIGH | Bump minor version; keep old field as deprecated |
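+| Name is a proto field (`snake_case` in .proto file) | CRITICAL | Avoid renaming: the binary wire format keys on field numbers, but the JSON mapping, text format, and generated code all depend on the name. Never change or reuse a field number |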
+| Name is a Redux action type string (`"USER_LOGIN_SUCCESS"`) | HIGH | Old name must remain as alias until all reducers updated |
+| Name appears in a test fixture file (JSON/CSV) | MEDIUM | Update fixtures atomically with code |
+| Name only appears in internal code, no serialization | LOW | Rename freely |
+
+## Pre-Rename Commands
+
+```bash
+# Find all serialized uses (JSON keys, DB columns, env vars)
+grep -rn '"old_name"' .          # JSON keys
+grep -rn "'old_name'" .          # Python dict keys / YAML
+grep -rn "old_name:" .           # YAML / struct tags
+grep -rn "old_name" migrations/  # DB migrations
+
+# Check if name is exported in a public package (Go top-level declarations)
+grep -rn "^func OldName\|^var OldName\|^const OldName" .
+
+# Find consumer repos (if in a monorepo)
+grep -rn "old_name" ../other-services/
+```
+
+## Deprecation Pattern (when you cannot rename immediately)
+
+```python
+# Python: alias + deprecation warning
+@property
+def old_name(self):
+    import warnings
+    warnings.warn("old_name is deprecated; use new_name", DeprecationWarning, stacklevel=2)
+    return self.new_name
+```
+
+```typescript
+// TypeScript: keep old interface key as deprecated
+interface User {
+  /** @deprecated use displayName */
+  name?: string;
+  displayName: string;
+}
+```
+
+## Cognitive Load Budget
+
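+Names live in working memory, which holds only a handful of items at once (Miller's classic estimate is about seven, plus or minus two). Every name a reader must decode competes for that limited capacity.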
+- Rename when: the existing name actively misleads or requires a comment to understand
+- Do NOT rename when: the name is merely suboptimal and the file is stable/rarely read
+
+## Framework-Specific Protected Names
+
+| Framework | Protected Pattern | Why |
+|-----------|------------------|-----|
+| Django ORM | `Meta.db_table`, field `db_column` | Maps to physical schema |
+| SQLAlchemy | `Column(name=...)` | Physical column name, not Python attr |
+| Pydantic v1 | `Field(alias=...)` | JSON key in serialized output |
+| React | `displayName` on components | Used by DevTools for debugging |
+| Redux | action type strings | String-matched in reducers across files |
+| gRPC/Protobuf | any field in `.proto` | Wire format, field numbers immutable |
+| Terraform | `resource "type" "name"` | State file key: a rename is planned as destroy+recreate unless a `moved` block or `terraform state mv` is used |
diff --git a/skills/naming-analyzer/references/semantic-analysis-frameworks.md b/skills/naming-analyzer/references/semantic-analysis-frameworks.md
new file mode 100644
index 0000000..8a41ada
--- /dev/null
+++ b/skills/naming-analyzer/references/semantic-analysis-frameworks.md
@@ -0,0 +1,92 @@
+# Semantic Analysis Frameworks for Naming
+
+## The Command-Query Separation (CQS) Name Test
+
+Every function name should pass CQS: it either reads state (query) or changes state (command) — never both.
+
+| Name | Passes CQS? | Fix |
+|------|------------|-----|
+| `getUser()` — but also updates lastLogin | FAIL | `fetchUserAndRecordLogin()` |
+| `isValid()` — but logs to analytics | FAIL | `validate()` + separate `logValidationAttempt()` |
+| `saveAndReturn()` | FAIL by design | split or rename to `persist()` |
+| `calculateTotal()` — pure computation | PASS | keep |
+| `setActive()` — sets flag | PASS | keep |
+
+## Leaky Abstraction Detection
+
+Names that reveal implementation details that the caller shouldn't know:
+
+```
+getFromRedis()        → fetchFromCache()     (caller shouldn't know storage layer)
+saveToMongo()         → persist()            (same)
+parseXMLResponse()    → parseResponse()      (if format may change)
+callStripeAPI()       → chargeCard()         (caller shouldn't know vendor)
+```
+
+Exception: when the implementation IS the interface (e.g., a Redis adapter class named `RedisCache`).
+
+## Temporal Coupling in Names
+
+Names that imply ordering are often a code smell:
+
+```
+initThenProcess()     → smell: two steps coupled in one name
+step1(), step2()      → smell: positional, brittle
+beforeSave()          → smell: lifecycle hook mixed with logic
+```
+
+Better: Extract each step, name for what it does, let the caller sequence them.
+
+## Abstraction Level Mismatch
+
+Names should operate at a single abstraction level. Mixed-level names signal design problems:
+
+```python
+# Mixed: business concept + implementation detail
+def create_user_and_send_welcome_email_via_sendgrid(user_data): ...
+
+# Better: two functions at their own level
+def create_user(user_data): ...
+def send_welcome_email(user): ...  # internally uses sendgrid
+```
+
+## The Newspaper Test for Names
+
+Read just the function/class names in a file — they should tell a coherent story to someone unfamiliar with the codebase. If a reader must open function bodies to understand what the module does, the names have failed.
+
+## Scope-Length Heuristic (Practitioner Rule)
+
+Scope length determines acceptable name brevity:
+
+| Scope | Max acceptable abbreviation |
+|-------|----------------------------|
+| 1-line lambda | `x`, `e`, `_` OK |
+| 5-line loop body | `i`, `k`, `err` OK |
+| 20-line function | `usr`, `cfg` borderline |
+| Module/class level | Full words required |
+| Public API / exported | Full words + domain context required |
+
+## The "Diff Readability" Test
+
+Before suggesting a rename, ask: will this name make git diffs easier or harder to read?
+
+- `data` → `apiResponse`: diff is clearer (context visible in hunks)
+- `userConfigurationSettings` → `userCfgSettings`: diff is harder (the abbreviation saves a few characters but costs clarity in every hunk)
+- `e` → `error` in a module-level handler: diff is much clearer
+
+Optimal name length = maximum clarity at the scope where it's read, not where it's defined.
+
+## Anti-Pattern Catalog: Names That Seem Fine But Aren't
+
+| Name | Hidden Problem |
+|------|---------------|
+| `Manager` suffix | Signals the class does too much; "manages" is not a behavior |
+| `Helper` / `Utils` | Dumping ground; hides that no real abstraction exists |
+| `process()` | Infinitely vague; every function "processes" something |
+| `handle()` | Same — event handlers that grow into 200-line methods all started as `handle()` |
+| `do_thing()` | `do_` prefix adds no information |
+| `flag`, `status`, `mode` | Enums/booleans that should be named by what they represent |
+| `current` prefix | `currentUser` vs `user`: "current" is usually implicit in method scope; keep it only when it contrasts with another user in the same scope |
+| `new_` prefix | `new_value` — signals a refactor halfway done; merge or rename |
+| `_v2` suffix | `UserService_v2` — versioning belongs in modules, not class names |
+| `temp`, `tmp` | If it's still called `temp` in the final commit, it's not temp |
diff --git a/skills/nist-800-61r3-after-action-reviewer/SKILL.md b/skills/nist-800-61r3-after-action-reviewer/SKILL.md
index 1b03491..5422e7b 100644
--- a/skills/nist-800-61r3-after-action-reviewer/SKILL.md
+++ b/skills/nist-800-61r3-after-action-reviewer/SKILL.md
@@ -1,179 +1,97 @@
 ---
 name: nist-800-61r3-after-action-reviewer
-description: "Evaluate post-incident reports, after-action reports (AARs), and lessons-learned documents against NIST SP 800-61r3's post-incident activity requirements — specifically RS.AN-03, RS.AN-06/07/08, RC.RP-06, and ID.IM-03. Checks root cause completeness, evidence integrity, magnitude validation, stakeholder communications, recovery verification, and whether lessons are actionable. Use when asked to 'review this after-action report', 'evaluate our lessons learned', or 'post-incident review check'."
+description: "Evaluate post-incident reports, after-action reports (AARs), and lessons-learned documents against NIST SP 800-61r3's post-incident activity requirements — specifically RS.AN-03, RS.AN-06/07/08, RC.RP-06, and ID.IM-03. Checks root cause completeness, evidence integrity, magnitude validation, stakeholder communications, recovery verification, and whether lessons are actionable. Use when asked to 'review this after-action report', 'evaluate our lessons learned', 'post-incident review check', or 'does this AAR meet NIST requirements'."
 user-invocable: true
 ---
 
 # NIST SP 800-61r3 After-Action Reviewer
 
-Evaluate post-incident reports, after-action reports (AARs), and lessons-learned documents against NIST SP 800-61r3 (April 2025) post-incident requirements.
+Evaluate post-incident reports, AARs, and lessons-learned documents against NIST SP 800-61r3 (April 2025) post-incident requirements.
 
 ---
 
-## Goal
+## Mindset
 
-SP 800-61r3 defines rigorous requirements for what an after-action report must contain and how the lessons-learned process feeds back into IR program improvement. This skill checks that an AAR captures everything the standard requires — not just "what happened" but "how we know," "what we missed," and "what we're changing."
+1. **Proximate cause is never the root cause.** "User clicked a link" describes what happened. The root cause is always structural: the control that was absent, the process that failed, the architecture that permitted it. Treat any single-sentence root cause as incomplete by default.
+
+2. **Lessons captured ≠ lessons actionable.** A 10-item lessons list with zero owners and zero deadlines is an observation log, not an improvement plan. Score actionability ruthlessly: without owners and deadlines there is no mechanism by which the program can improve.
+
+3. **Scope validation is not optional.** Unverified scope is a leading cause of re-infection. If an AAR covers only the initially identified host without documenting adjacent-asset checks, treat this as a critical finding regardless of how thorough the rest of the document is.
+
+4. **Silence on regulatory notification is non-compliant.** An AAR that doesn't mention a regulatory notification assessment is not neutral; it is missing a required element. The decision must be documented even when the answer is "not applicable."
+
+5. **Document quality predicts program maturity.** An AAR that reads like a chronology but lacks systemic analysis, owner assignments, or cross-function improvement actions signals a team that responds to incidents but doesn't learn from them.
 
 ---
 
-## Inputs
+## Navigation
 
-- The after-action report, post-incident report, or lessons-learned document (text or file path)
-- Incident type (data breach, ransomware, account takeover, DoS, insider, etc.) — affects scope
-- Was this a major incident? (affects whether senior leadership update requirements apply)
+### When to use this skill
+- User presents a completed or draft after-action report, post-incident report, or lessons-learned document for review
+- User asks whether an AAR "meets NIST," "passes a compliance audit," or "is complete"
+- User wants to improve an AAR before submitting to leadership, auditors, or regulators
+- Checking output from `nist-800-61r3-incident-coordinator` or similar
+
+### When NOT to use this skill
+- User is conducting an active incident — use `nist-800-61r3-incident-coordinator` instead
+- User wants general NIST 800-61r3 guidance without a specific document to review — provide framework overview directly
+- Document is a tabletop exercise report (different standard; not a live-incident AAR)
+
+### Pre-flight decision tree for incomplete documents
+
+```
+Is a document provided?
+├── No → Ask: "Please share the AAR text or file path to review."
+│
+└── Yes → Is it clearly labeled as an AAR, post-incident report, or lessons-learned doc?
+    ├── No → Confirm: "Is this a post-incident report for a [live/past] security incident?"
+    │         If yes, proceed. If tabletop/drill, note different standard applies.
+    │
+    └── Yes → Does the document have at least a basic incident description + some lessons?
+        ├── No (e.g., single paragraph stub) →
+        │   State: "This document is too incomplete to score against RC.RP-06.
+        │   Identifying what's missing: [list missing components A/B/C].
+        │   Recommend completing these sections before formal review."
+        │   Provide the Component A/B/C checklist from references/aar-completeness-checklist.md.
+        │
+        └── Yes → Proceed with full 6-step review workflow.
+```
+
+---
+
+## Philosophy
+
+An after-action report that chronicles what happened but fails to drive change is documentation theater. The standard of review is not "did they write it down" but "will reading this cause the program to be meaningfully different tomorrow."
 
 ---
 
 ## Workflow
 
 ```
-1. Check RC.RP-06 — After-Action Report Completeness
+1. RC.RP-06  — After-Action Report Completeness     [load: references/aar-completeness-checklist.md]
    ↓
-2. Check RS.AN-03 — Root Cause Analysis Quality
+2. RS.AN-03  — Root Cause Analysis Quality          [load: references/root-cause-evidence-checklist.md]
    ↓
-3. Check RS.AN-06/07/08 — Evidence & Records Integrity
+3. RS.AN-06/07/08 — Evidence & Records Integrity    [load: references/root-cause-evidence-checklist.md]
    ↓
-4. Check RS.CO-02/03 — Communication Retrospective
+4. RS.CO-02/03 — Communication Retrospective        [load: references/comms-recovery-lessons-checklist.md]
    ↓
-5. Check RC.RP-04/05 — Recovery Verification
+5. RC.RP-04/05 — Recovery Verification              [load: references/comms-recovery-lessons-checklist.md]
    ↓
-6. Check ID.IM-03 — Lessons-Learned Loop Closure
+6. ID.IM-03  — Lessons-Learned Loop Closure         [load: references/comms-recovery-lessons-checklist.md]
    ↓
-7. Generate AAR Review Report
+7. Generate AAR Review Report (output format below)
 ```
 
-### Step 1: RC.RP-06 — After-Action Report Completeness
-
-**RC.RP-06.R1** states: "Prepare an after-action report that documents the incident itself, the response and recovery actions taken, and lessons learned."
-
-Check that the AAR contains all three mandated components:
-
-**Component A — The Incident**
-- [ ] Incident type and classification (per RS.MA-03 categorization)
-- [ ] Initial detection method and timestamp (links to DE.AE)
-- [ ] Timeline of significant events from first indicator to recovery completion
-- [ ] Systems, data, and services affected
-- [ ] Estimated or confirmed impact (financial, operational, reputational, regulatory)
-- [ ] Threat actor characterization if known (TTPs, attribution confidence)
-
-**Component B — Response and Recovery Actions**
-- [ ] Triage and validation actions (RS.MA-02)
-- [ ] Incident management decisions and rationale (RS.MA-03, RS.MA-04)
-- [ ] Containment measures applied (RS.MI-01)
-- [ ] Eradication actions taken (RS.MI-02)
-- [ ] Recovery steps executed in order (RC.RP-02, RC.RP-04)
-- [ ] Recovery completion declaration (RC.RP-06)
-- [ ] Notifications sent (RS.CO-02, RS.CO-03) with timestamps and recipients
-
-**Component C — Lessons Learned**
-- [ ] What worked well
-- [ ] What did not work or was delayed
-- [ ] Root cause(s) — not just proximate cause but systemic (see Step 2)
-- [ ] Specific, assignable improvement actions (not vague "improve monitoring")
-- [ ] Owner and deadline for each improvement action
-- [ ] Which CSF Function each improvement targets
-
-### Step 2: RS.AN-03 — Root Cause Analysis Quality
-
-SP 800-61r3 RS.AN-03 requires four specific analysis activities:
-
-**RS.AN-03.R1** — "Determine the sequence of events that have occurred during the incident and which assets and resources were involved in each of those events."
-- Check: Is there a complete, timestamped event sequence?
-- Check: Are affected assets enumerated at each stage?
-- Gap signal: Report says "attacker gained access" without explaining the full chain
-
-**RS.AN-03.R2** — "Determine what vulnerabilities, threats, and threat actors were directly or indirectly involved in the incident."
-- Check: Is the specific vulnerability or misconfiguration that enabled the incident identified?
-- Check: Are indirect enablers noted (e.g., missing MFA, stale accounts, excessive permissions)?
-- Gap signal: Report blames "phishing" without identifying which controls failed or were absent
-
-**RS.AN-03.R3** — "Analyze the incident to find the underlying or systemic root causes."
-- Check: Does the AAR go beyond the proximate cause (the exploit/vector) to systemic causes?
-- Systemic cause examples: patch management failure, training gap, monitoring blind spot, inadequate access controls, architectural weakness
-- Technique check: Was a 5-Whys or fishbone analysis performed?
-- Gap signal: Report identifies root cause as "an employee clicked a phishing link" — this is a proximate cause, not systemic
-
-**RS.AN-03.R4** — "Check any deployed cyber deception technology for additional information on attacker behavior."
-- Check: If honeypots, deception tokens, or canary files were deployed — were they checked?
-- Mark N/A if no deception technology is deployed
-
-### Step 3: RS.AN-06/07/08 — Evidence & Records Integrity
-
-**RS.AN-06.R1** — "Safeguard the confidentiality and integrity of incident response records; ensure only authorized personnel have access."
-- Check: Does the AAR reference how IR records were protected?
-- Check: Is there a statement about who had access to incident data?
-- Note: The incident lead is responsible for records safeguarding
-
-**RS.AN-07.R1** — "Collect and retain evidence per evidence preservation procedures and data retention policies; consider factors including the possibility of prosecution."
-- Check: Were forensic images, logs, or artifacts collected and documented?
-- Check: Is there reference to chain of custody procedures?
-- Check: Was prosecution possibility considered and documented?
-- Check: Are retention timelines for collected evidence noted?
-
-**RS.AN-08.R1** — "Look for indicators of compromise, evidence of persistence, and other signs of an incident on both the assets known to be targeted and other potential targets."
-- Check: Was the incident scope validated beyond the initially identified systems?
-- Check: Were adjacent/related assets checked for IoCs?
-- Check: Was persistence mechanism identified and confirmed removed?
-- Gap signal: Report addresses only initially identified host without checking lateral spread
-
-### Step 4: RS.CO-02/03 — Communication Retrospective
-
-Review whether required notifications were actually performed:
-
-**RS.CO-02.R2** — "Follow established procedures concerning incident coordination: what must be reported, to whom, and at what times."
-- Check: Were all required internal notifications made (leadership, legal, HR, asset owners)?
-- Check: Were notification timestamps documented?
-
-**RS.CO-02.R3** — "Perform notifications in compliance with incident notification laws/regulations."
-- Check: Were regulatory/legal notification obligations triggered?
-- Check: If triggered, were they met within required timeframes?
-- Check: Is there documentation of the notification decision (even if decided not required)?
-
-**RS.CO-02.R5** — "Notify law enforcement and regulatory bodies per criteria in IR plan."
-- Check: Was law enforcement notification considered and decision documented?
+Load the referenced checklist file for each step group. Do not reproduce checklist items inline — use the reference files to drive analysis, then report findings in the output format.
 
-**RS.CO-03.R2** — "Regularly update senior leadership on status of major incidents."
-- Check (major incidents only): Were senior leadership updates provided during the incident?
-- Check: Is there a log of leadership communications?
-
-### Step 5: RC.RP-04/05 — Recovery Verification
-
-**RC.RP-04.R1** — "Validate that essential services are restored in the appropriate order."
-- Check: Does the AAR document the restoration sequence?
-- Check: Were critical services restored before non-critical?
-
-**RC.RP-04.R2** — "Work with system owners to confirm successful restoration and return to normal operations."
-- Check: Did system owners sign off on recovery?
-- Check: Is there documentation of owner confirmations?
-
-**RC.RP-04.R3** — "Monitor performance of restored systems to verify adequacy of restoration."
-- Check: Was there a post-recovery monitoring period?
-- Check: Were any anomalies detected post-restoration?
-
-**RC.RP-05.R1** — "Check restored assets for IoCs; remediate root causes before production use."
-- Check: Were restored systems scanned for IoCs before going back online?
-- Check: Were root causes remediated before production restoration?
-
-**RC.RP-05.R2** — "Verify correctness and adequacy of restoration actions before putting restored system online."
-- Check: Was there a pre-production verification step?
-
-### Step 6: ID.IM-03 — Lessons-Learned Loop Closure
-
-**ID.IM-03.N3** — "Creating follow-up reports or holding 'lessons learned' meetings when an incident's recovery efforts are concluding, especially if the incident was major."
-
-**ID.IM-03.N2** — "Improvements that affect IR can be made to the IR program itself (plan, policy, procedures) or to other aspects of cybersecurity risk management (e.g., identifying TTPs not currently blocked by safeguards or flagged by detection technologies)."
+---
 
-Check:
-- [ ] Was a lessons-learned meeting held? (attendees, date documented?)
-- [ ] Do improvement actions target IR program changes (not just technical fixes)?
-- [ ] Do any lessons feed back to Preparation (GV, ID, PR functions)?
-- [ ] Are improvement actions SMART (Specific, Measurable, Assignable, Realistic, Time-bound)?
-- [ ] Is there a tracking mechanism for improvement actions?
+## Inputs
 
-**Lessons-captured vs. lessons-actionable assessment:**
-- Count total lessons identified
-- Count lessons with assigned owner AND deadline
-- Ratio = actionability score
+- The after-action report, post-incident report, or lessons-learned document (text or file path)
+- Incident type (data breach, ransomware, account takeover, DoS, insider, etc.) — affects scope
+- Was this a major incident? (affects whether senior leadership update requirements apply under RS.CO-03.R2)
 
 ---
 
@@ -211,11 +129,10 @@ deadlines. ID.IM-03 requires SMART improvement actions.
 
 [✓] R1 — Event sequence: Complete timeline from initial access to containment
 [~] R2 — Vulnerability identification: Phishing vector identified; no analysis
-of why MFA bypass was possible or why the phishing email bypassed email filtering
-[✗] R3 — Systemic root cause: Report identifies "employee clicked phishing link"
-as root cause — this is proximate, not systemic. No 5-Whys or structural analysis.
-Gap: Missing analysis of: why MFA was not enforced on this account class,
-why email filtering did not catch this variant, whether this reflects a pattern.
+of why MFA bypass was possible or why phishing bypassed email filtering
+[✗] R3 — Systemic root cause: "Employee clicked phishing link" = proximate cause.
+No 5-Whys or structural analysis. Missing: why MFA not enforced, why email
+filtering failed this variant, whether this reflects a pattern.
 [N/A] R4 — No deception technology deployed per §1.3
 
 ━━━ RS.AN-06/07/08 — EVIDENCE & RECORDS ━━━
@@ -226,8 +143,8 @@ were checked for lateral movement or IoC presence
 
 ━━━ RS.CO COMMUNICATIONS RETROSPECTIVE ━━━
 [✓] Internal notifications made with timestamps (§5.1)
-[~] RS.CO-02.R3 — Regulatory notification decision not documented (HIPAA
-applicability not assessed in AAR)
+[~] RS.CO-02.R3 — Regulatory notification: HIPAA applicability not assessed;
+decision not documented (silence = non-compliant)
 [✓] RS.CO-03.R2 — Senior leadership received 3 updates during incident
 
 ━━━ RC.RP RECOVERY VERIFICATION ━━━
@@ -238,19 +155,57 @@ applicability not assessed in AAR)
 Lessons identified: 3
 Lessons with owner + deadline: 0
 Actionability score: 0% — all lessons are observations, none are assigned actions
+Program improvement scope: No lessons target IR plan/policy changes (all technical)
 
 ━━━ SUMMARY ━━━
 Critical gaps: RS.AN-03.R3 (systemic root cause), RC.RP-05.R1 (IoC scan),
-              ID.IM-03 actionability (no assigned owners)
-Significant gaps: RS.AN-08.R1 (scope validation), RC.RP-04.R2 (owner sign-off)
+              ID.IM-03 actionability (0% — no assigned owners or deadlines)
+Significant gaps: RS.AN-08.R1 (scope validation), RC.RP-04.R2 (owner sign-off),
+                 RS.CO-02.R3 (regulatory decision not documented)
 Assessment: AAR captures what happened but will not improve the IR program.
 The absence of systemic root cause analysis means the same incident is likely
-to recur. The 0% actionability on lessons learned means no improvements will
-be tracked or implemented.
+to recur. The 0% actionability on lessons means no improvements will be tracked.
 ```
 
 ---
 
+## NEVER
+
+- **NEVER accept proximate cause as root cause** — "User clicked a link" is a behavior, not a cause. The standard requires systemic analysis (RS.AN-03.R3). A proximate cause usually has several structural causes behind it; accepting the proximate answer lets all of them go unfixed.
+
+- **NEVER mark lessons learned as sufficient without owner + deadline** — Unassigned lessons are observations that will never be acted on. An ID.IM-03 finding without a named human and a date is indistinguishable from not having the finding at all.
+
+- **NEVER skip the scope validation check (RS.AN-08.R1)** — Failure to validate scope is a leading cause of re-infection on adjacent systems. A team that validates only the initially identified host cannot rule out persistence elsewhere and risks a follow-on incident.
+
+- **NEVER treat regulatory notification silence as compliant** — RS.CO-02.R3 requires the decision to be documented, not just the action. An AAR that doesn't mention regulatory notification at all fails this requirement even if notification was correctly determined to be unnecessary.
+
+- **NEVER accept verbal sign-offs as documentation** — RC.RP-04.R2 requires system owner confirmation of restoration. Verbal confirmation that was not recorded cannot be audited; treat it as no confirmation. Flag as a gap regardless of whether recovery was successful.
+
+- **NEVER omit N/A documentation for deception technology (RS.AN-03.R4)** — Teams without honeypots or deception tokens still must document the absence explicitly. A blank field is ambiguous; an explicit "N/A — no deception technology deployed" is compliant.
+
+- **NEVER score ID.IM-03 on meeting held alone** — The standard requires both a meeting AND actionable follow-up. Teams that hold thorough lessons-learned meetings but don't assign owners have satisfied the meeting requirement and violated the improvement requirement. Score both independently.
+
+- **NEVER let technical-only improvements satisfy ID.IM-03.N2** — The standard explicitly states that improvements can target the IR program itself (plan, policy, procedures) or other cybersecurity risk management areas. An AAR where every improvement is a patch or a tool purchase and none target process/program changes is non-compliant with the intent of ID.IM-03.
+
+- **NEVER mark evidence handling complete without retention timeline** — RS.AN-07.R1 requires not just collection but retention policy compliance. Evidence collected without documented retention timelines may be improperly destroyed before litigation hold periods expire or preserved past required deletion dates.
+
+- **NEVER skip the communications retrospective for minor incidents** — RS.CO-02.R2 notification requirements apply regardless of severity. Minor incidents still require documented internal notifications. The difference for major incidents is the addition of senior leadership updates (RS.CO-03.R2), not the elimination of other requirements.
+
+---
+
+## When Things Go Wrong
+
+| Situation | Response |
+|-----------|----------|
+| AAR contains root cause analysis but it's a single paragraph with no method cited | Flag RS.AN-03.R3 as partial. Note analysis method is required (5-Whys, fishbone, fault tree). Ask whether a structured analysis was performed verbally and not documented — if so, that also fails the standard since it requires analysis to be in the record. |
+| Lessons list is long (10+) but no owners or deadlines | Score actionability at 0% regardless of lesson count. A long observation list with no assignments is worse than a short actionable list — it creates false confidence that the team is learning. |
+| Document is clearly a draft — author says "owners TBD" | Review for structural completeness but note that the actionability score cannot be assessed until owners and deadlines are assigned. Provide the SMART criteria from references/comms-recovery-lessons-checklist.md as a template. |
+| Incident involved a vendor/third party and the AAR only covers the organization's response | Flag scope gap under RS.AN-08.R1: third-party incident scope validation is required. The AAR should document what visibility the organization had into the vendor environment and what compensating controls or contractual remedies apply. |
+| Document is actually a tabletop/exercise after-action, not a live incident AAR | Note that NIST SP 800-61r3 applies to live incidents. Tabletop AARs are governed by exercise program standards (e.g., HSEEP). Offer to evaluate against tabletop AAR best practices instead, but do not apply live-incident scoring. |
+| Regulatory notification section says "N/A" with no rationale | Flag as non-compliant — RS.CO-02.R3 requires the rationale for N/A determinations. Which regulation was considered? What criteria determined it didn't apply? Document the analysis, not just the conclusion. |
+
+---
+
 ## Deliverable
 
 A section-by-section AAR review with citation-anchored findings, a lessons-actionability score, and specific improvement recommendations. Suitable for:
@@ -258,12 +213,3 @@ A section-by-section AAR review with citation-anchored findings, a lessons-actio
 - Compliance audit support
 - AAR revision guidance
 - Input to `nist-800-61r3-maturity-scorer`
-
----
-
-## NEVER
-
-- **NEVER accept proximate cause as root cause** — "user clicked a link" is never a root cause; the systemic failure enabling that click is
-- **NEVER mark lessons learned as sufficient without owner + deadline** — unassigned lessons are observations, not improvements
-- **NEVER skip the scope validation check (RS.AN-08)** — unvalidated scope is a top cause of incidents recurring on adjacent systems
-- **NEVER omit the communications retrospective** — notification failures have legal and regulatory consequences that surface later
diff --git a/skills/nist-800-61r3-after-action-reviewer/references/aar-completeness-checklist.md b/skills/nist-800-61r3-after-action-reviewer/references/aar-completeness-checklist.md
new file mode 100644
index 0000000..b61b38f
--- /dev/null
+++ b/skills/nist-800-61r3-after-action-reviewer/references/aar-completeness-checklist.md
@@ -0,0 +1,48 @@
+# RC.RP-06 — After-Action Report Completeness Checklist
+
+Load this file when performing Step 1 of the AAR review workflow.
+
+---
+
+## Component A — The Incident
+
+- [ ] Incident type and classification (per RS.MA-03 categorization)
+- [ ] Initial detection method and timestamp (links to DE.AE)
+- [ ] Timeline of significant events from first indicator to recovery completion
+- [ ] Systems, data, and services affected (with asset identifiers)
+- [ ] Estimated or confirmed impact (financial, operational, reputational, regulatory)
+- [ ] Threat actor characterization if known (TTPs, attribution confidence level)
+
+**Gap signal:** Report says "attacker gained access" without documenting the complete chain of events — fails RC.RP-06.R1.
+
+---
+
+## Component B — Response and Recovery Actions
+
+- [ ] Triage and validation actions (RS.MA-02)
+- [ ] Incident management decisions and rationale (RS.MA-03, RS.MA-04)
+- [ ] Containment measures applied with timestamps (RS.MI-01)
+- [ ] Eradication actions taken with verification (RS.MI-02)
+- [ ] Recovery steps executed in documented order (RC.RP-02, RC.RP-04)
+- [ ] Recovery completion declaration with date/time (RC.RP-06)
+- [ ] Notifications sent (RS.CO-02, RS.CO-03) — timestamps AND recipients required
+
+**Gap signal:** Recovery noted as "completed" without sequence or owner sign-off — fails RC.RP-04.R1 and RC.RP-04.R2.
+
+---
+
+## Component C — Lessons Learned
+
+- [ ] What worked well (specific, not generic)
+- [ ] What did not work or was delayed (with contributing factors)
+- [ ] Root cause(s) — systemic, not proximate (see RS.AN-03 checklist)
+- [ ] Specific, assignable improvement actions (not vague "improve monitoring")
+- [ ] Owner name and deadline for each improvement action
+- [ ] Which CSF Function/outcome each improvement targets
+
+**Actionability scoring:**
+- Count total lessons identified
+- Count lessons with assigned owner AND specific deadline
+- Actionability ratio = assigned / total (target: 100%; below 50% = critical finding)
+
+**Gap signal:** Lessons like "improve communication" with no owner, no deadline, no measurable outcome — fails ID.IM-03.N3.
diff --git a/skills/nist-800-61r3-after-action-reviewer/references/comms-recovery-lessons-checklist.md b/skills/nist-800-61r3-after-action-reviewer/references/comms-recovery-lessons-checklist.md
new file mode 100644
index 0000000..749e307
--- /dev/null
+++ b/skills/nist-800-61r3-after-action-reviewer/references/comms-recovery-lessons-checklist.md
@@ -0,0 +1,97 @@
+# RS.CO / RC.RP / ID.IM-03 — Communications, Recovery & Lessons Checklists
+
+Load this file when performing Steps 4–6 of the AAR review workflow.
+
+---
+
+## RS.CO-02/03 — Communication Retrospective
+
+### RS.CO-02.R2 — Internal Notifications
+"Follow established procedures concerning incident coordination: what must be reported, to whom, and at what times."
+
+- [ ] Required internal notifications made: leadership, legal, HR, asset owners, privacy officer
+- [ ] Notification timestamps documented for each recipient group
+- [ ] Any missed or delayed notifications acknowledged with reason
+
+### RS.CO-02.R3 — Regulatory Notifications
+"Perform notifications in compliance with incident notification laws/regulations."
+
+- [ ] Regulatory/legal notification obligations assessed (HIPAA, GDPR, PCI-DSS, state breach laws, SEC, CISA, etc.)
+- [ ] If triggered: notifications sent within required timeframes with timestamps
+- [ ] If not triggered: decision explicitly documented with rationale (not just silence)
+
+**Practitioner trap:** An AAR that is silent on regulatory notification is not compliant — the decision must be documented either way.
+
+### RS.CO-02.R5 — Law Enforcement
+"Notify law enforcement and regulatory bodies per criteria in IR plan."
+
+- [ ] Law enforcement notification considered and decision documented
+- [ ] If notified: agency, date, case number documented
+- [ ] If not notified: rationale documented (not critical infrastructure, no criminal nexus, etc.)
+
+### RS.CO-03.R2 — Senior Leadership Updates (Major Incidents)
+"Regularly update senior leadership on status of major incidents."
+
+- [ ] (Major incidents only) Leadership update log exists with dates and content summary
+- [ ] Escalation criteria from IR plan referenced
+
+---
+
+## RC.RP-04/05 — Recovery Verification
+
+### RC.RP-04.R1 — Restoration Sequence
+"Validate that essential services are restored in the appropriate order."
+
+- [ ] Restoration sequence documented with prioritization rationale
+- [ ] Critical services confirmed restored before non-critical
+- [ ] Sequence matches BCP/DRP priority tiers if those exist
+
+### RC.RP-04.R2 — Owner Confirmation
+"Work with system owners to confirm successful restoration and return to normal operations."
+
+- [ ] System owner sign-off documented (written, not just verbal)
+- [ ] Each restored system has a named owner confirmation
+
+### RC.RP-04.R3 — Post-Recovery Monitoring
+"Monitor performance of restored systems to verify adequacy of restoration."
+
+- [ ] Post-recovery monitoring period defined and executed
+- [ ] Anomalies detected post-restoration documented (or explicit "none detected")
+
+### RC.RP-05.R1 — Pre-Production IoC Scan
+"Check restored assets for IoCs; remediate root causes before production use."
+
+- [ ] Restored systems scanned for IoCs before going back online
+- [ ] Root causes confirmed remediated before production restoration
+- [ ] Scan results documented
+
+### RC.RP-05.R2 — Pre-Production Verification
+"Verify correctness and adequacy of restoration actions before putting restored system online."
+
+- [ ] Pre-production verification step explicitly documented
+- [ ] Verification method (automated scan, manual inspection, integrity check) noted
+
+---
+
+## ID.IM-03 — Lessons-Learned Loop Closure
+
+### ID.IM-03.N3 — Meeting and Follow-Up
+"Creating follow-up reports or holding 'lessons learned' meetings when an incident's recovery efforts are concluding, especially if the incident was major."
+
+- [ ] Lessons-learned meeting held: date, attendees, facilitator documented
+- [ ] Follow-up report or this AAR serves as the documented output
+
+### ID.IM-03.N2 — Program Improvement Scope
+"Improvements that affect IR can be made to the IR program itself or to other aspects of cybersecurity risk management."
+
+- [ ] At least one improvement targets IR program changes (plan, policy, procedures) — not only technical fixes
+- [ ] At least one lesson feeds back to Preparation functions (GV, ID, PR)
+- [ ] All improvement actions are SMART: Specific, Measurable, Assignable, Realistic, Time-bound
+- [ ] Tracking mechanism identified for improvement actions (ticket system, roadmap item, etc.)
+
+**Actionability scoring formula:**
+- Actionability % = (lessons with owner AND deadline) / (total lessons) × 100
+- 100% = fully actionable
+- 50–99% = partially actionable (note unassigned items)
+- Below 50% = critical finding (lessons captured but not actioned)
+- 0% = extreme case of the critical tier: the AAR will not improve the program (the most common critical finding)
diff --git a/skills/nist-800-61r3-after-action-reviewer/references/root-cause-evidence-checklist.md b/skills/nist-800-61r3-after-action-reviewer/references/root-cause-evidence-checklist.md
new file mode 100644
index 0000000..aa34e4c
--- /dev/null
+++ b/skills/nist-800-61r3-after-action-reviewer/references/root-cause-evidence-checklist.md
@@ -0,0 +1,73 @@
+# RS.AN-03 / RS.AN-06/07/08 — Root Cause & Evidence Checklists
+
+Load this file when performing Steps 2–3 of the AAR review workflow.
+
+---
+
+## RS.AN-03 — Root Cause Analysis Quality
+
+### R1 — Event Sequence (RS.AN-03.R1)
+"Determine the sequence of events that have occurred during the incident and which assets and resources were involved in each of those events."
+
+- [ ] Complete, timestamped event sequence exists
+- [ ] Affected assets enumerated at each stage of the sequence
+- [ ] Gaps in timeline are explicitly acknowledged with explanation
+
+**Gap signal:** Report says "attacker gained access" without explaining the full chain — fails R1.
+
+### R2 — Vulnerability and Threat Identification (RS.AN-03.R2)
+"Determine what vulnerabilities, threats, and threat actors were directly or indirectly involved."
+
+- [ ] Specific vulnerability or misconfiguration enabling the incident identified
+- [ ] Indirect enablers noted (missing MFA, stale accounts, excessive permissions, monitoring gaps)
+- [ ] Threat actor capability level characterized (opportunistic vs. targeted)
+
+**Gap signal:** Report blames "phishing" without identifying which controls failed or were absent — fails R2.
+
+### R3 — Systemic Root Cause (RS.AN-03.R3)
+"Analyze the incident to find the underlying or systemic root causes."
+
+- [ ] AAR goes beyond proximate cause to systemic causes
+- [ ] At least one of these systemic categories addressed: patch management, training/awareness, monitoring gaps, access control architecture, process/procedure failure, vendor/supply chain, governance/policy gap
+- [ ] Analysis method noted (5-Whys, fishbone/Ishikawa, fault tree, or equivalent)
+
+**Practitioner trap:** "Employee clicked phishing link" = proximate cause. The systemic causes are: (1) why did email filtering not catch it, (2) why was MFA not enforced on that account class, (3) why did the user not recognize/report it, (4) why did detection not fire sooner. All four can be systemic findings.
+
+**Gap signal:** Single-sentence root cause with no structural analysis — fails R3.
+
+### R4 — Deception Technology Check (RS.AN-03.R4)
+"Check any deployed cyber deception technology for additional information on attacker behavior."
+
+- [ ] Honeypots, deception tokens, or canary files checked (or N/A if not deployed)
+- [ ] If N/A: documented explicitly with a statement that no deception technology is deployed
+
+---
+
+## RS.AN-06/07/08 — Evidence & Records Integrity
+
+### RS.AN-06.R1 — Records Confidentiality
+"Safeguard the confidentiality and integrity of incident response records; ensure only authorized personnel have access."
+
+- [ ] AAR references how IR records were protected during and after incident
+- [ ] Access control for incident data documented (who had access, role-based)
+- [ ] Incident lead identified as accountable for records safeguarding
+
+### RS.AN-07.R1 — Evidence Preservation
+"Collect and retain evidence per evidence preservation procedures and data retention policies; consider factors including the possibility of prosecution."
+
+- [ ] Forensic images, logs, or artifacts collected and itemized
+- [ ] Chain of custody procedures referenced or attached
+- [ ] Prosecution possibility explicitly considered and documented (even if the decision is not to prosecute)
+- [ ] Retention timelines for each evidence category documented
+
+**Practitioner trap:** Teams that never intend to prosecute still must document the decision — the standard requires consideration, not prosecution.
+
+### RS.AN-08.R1 — Scope Validation
+"Look for indicators of compromise, evidence of persistence, and other signs of an incident on both the assets known to be targeted and other potential targets."
+
+- [ ] Incident scope validated beyond initially identified systems
+- [ ] Adjacent/related assets checked for IoCs (lateral movement paths enumerated)
+- [ ] Persistence mechanism identified, confirmed removed, and removal verified
+- [ ] Scope expansion findings documented (even if scope confirmed as initially assessed)
+
+**Gap signal:** Report addresses only the initial host without checking lateral spread, which is a leading predictor of re-infection. Fails RS.AN-08.R1.
diff --git a/skills/nist-800-61r3-csf-mapper/SKILL.md b/skills/nist-800-61r3-csf-mapper/SKILL.md
index 113c743..f608c91 100644
--- a/skills/nist-800-61r3-csf-mapper/SKILL.md
+++ b/skills/nist-800-61r3-csf-mapper/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: nist-800-61r3-csf-mapper
-description: "Map content from any cybersecurity document (playbook, IR plan, policy, incident report) to NIST SP 800-61r3 CSF 2.0 elements. Produces a coverage table showing which Functions, Categories, and Subcategories are addressed, partially addressed, or absent. Use when asked to 'map this to NIST', 'what CSF elements does this cover', or 'tag this against 800-61r3'."
+description: "Map any cybersecurity document (IR plan, playbook, incident report, after-action report, IR policy, risk assessment) to NIST SP 800-61r3 CSF 2.0 elements. Produces a structured coverage table with Addressed/Partial/Not Found status and gap summary. Trigger phrases: 'map this to NIST', 'what CSF elements does this cover', 'tag this against 800-61r3', 'CSF 2.0 coverage', 'map to SP 800-61r3'."
 user-invocable: true
 ---
 
@@ -10,162 +10,160 @@ Map any cybersecurity document to the CSF 2.0 Community Profile defined in NIST
 
 ---
 
-## Goal
+## Mindset
 
-Produce a structured coverage table that shows which CSF 2.0 elements from the SP 800-61r3 Community Profile are:
-- **Addressed** — content explicitly satisfies the element
-- **Partial** — content touches the element but incompletely
-- **Not Found** — element is absent from the document
+1. **Document type determines expected scope — not the other way around.** An IR Plan that only covers RS elements is incomplete. A playbook that covers GV elements is bonus, not expected. Calibrate what "good coverage" means per type before scoring anything.
 
----
+2. **Partial is harder to call than Addressed or Not Found.** Partial means the document touches the element but leaves a practitioner-actionable gap — missing a threshold, missing a named role, missing a frequency. Vague language that merely mentions a concept is Not Found, not Partial.
+
+3. **Table 3 (DE/RS/RC) is always in scope; Table 2 (GV/ID/PR) is conditional.** For a single-incident playbook, mapping all 40+ Table 2 elements is noise. For an IR Program policy review, Table 2 is the primary deliverable. Know which table is load-bearing for the request.
 
-## Inputs
+4. **Subcategory ≠ Category.** Marking DE.CM as Addressed because the document mentions monitoring does not satisfy DE.CM-01, DE.CM-03, DE.CM-06, or DE.CM-09 individually. Always distinguish category-level coverage from subcategory-level evidence.
 
-- The document to analyze (paste text, provide file path, or describe it)
-- Document type (ask if not obvious): IR Plan, Playbook, Incident Report, After-Action Report, IR Policy, Risk Assessment, Other
+5. **Inference creep is the primary failure mode.** The worst mapping errors come from reading intent ("they probably have monitoring because they mention SOC") into coverage status. Only text that is explicitly present earns a status other than Not Found.
 
 ---
 
-## Workflow
+## Navigation
+
+**Use this skill when**:
+- Asked to "map this to NIST", "tag against 800-61r3", "what CSF elements does this cover"
+- Producing a gap input for `nist-800-61r3-gap-analyzer` or a score input for `nist-800-61r3-maturity-scorer`
+- Auditing whether an IR plan, playbook, policy, or report meets CSF 2.0 Community Profile expectations
+
+**Do NOT use this skill when**:
+- The request references SP 800-61r2 phases (Preparation / Detection & Analysis / etc.) — redirect to r3
+- The user wants gap prioritization or remediation — hand off to `nist-800-61r3-gap-analyzer`
+- The document is a vendor product datasheet — CSF mapping requires organizational process content
 
+**Ambiguous input decision tree**:
 ```
-1. Classify Document
-   ↓
-2. Determine Relevant CSF Scope
-   ↓
-3. Scan & Tag Content
-   ↓
-4. Build Coverage Table
-   ↓
-5. Output Summary
+Document provided?
+├── Yes → Classify type (see ../nist-800-61r3-shared/references/csf-element-registry.md §Document Type table)
+│         → Determine Table 3 vs Table 2 scope
+│         → Map and output
+└── No  → Ask: "Paste the document text, provide a file path, or describe what the document contains."
+
+Document type unclear?
+├── Has phases/steps/timeline → likely Playbook or Incident Report
+├── Has policy language/shall/must → likely IR Policy or IR Plan
+├── Has findings/recommendations/timeline of events → likely After-Action Report
+└── Has risk scores/asset lists → likely Risk Assessment
 ```
 
-### Step 1: Classify Document
+---
+
+## Philosophy
 
-Identify the document type before mapping — different types have different expected coverage profiles:
+Every element earns its status from explicit document text — not from what the document implies, what the organization probably does, or what a reasonable reader would assume. The coverage table is forensic evidence of what is written, not an assessment of what is practiced.
+
+---
 
-| Document Type        | Primary CSF Scope Expected |
-|----------------------|---------------------------|
-| IR Plan              | GV.PO, GV.RR, ID.IM-04, RS (all), RC (all) |
-| Playbook             | RS.MA, RS.AN, RS.MI, DE.AE-08, RC.RP |
-| Incident Report      | RS.MA, RS.AN, RS.CO, RS.MI, RC.RP |
-| After-Action Report  | RS.AN-03, RS.AN-06/07, RC.RP-06, ID.IM-03 |
-| IR Policy            | GV.PO, GV.RR, ID.IM-04 (§2.3 elements) |
-| Risk Assessment      | ID.RA (all), GV.RM, ID.AM |
+## NEVER
 
-### Step 2: Determine Relevant CSF Scope
+- **NEVER mark Addressed based on category-level language alone** — "we monitor our environment" satisfies DE.CM at most; it does not satisfy DE.CM-01/03/06/09 individually. Each subcategory requires subcategory-level evidence.
 
-SP 800-61r3 defines two tables:
+- **NEVER use SP 800-61r2 element IDs** — r2 used phase names (Preparation, Detection & Analysis, Containment, Eradication, Recovery, Post-Incident). r3 uses CSF 2.0 Function/Category/Subcategory IDs. Mixing them produces an unmappable output.
 
-**Table 2 — Preparation & Lessons Learned** (GV, ID, PR):
-- GV: Govern (GV.OC, GV.RM, GV.RR, GV.PO, GV.OV, GV.SC)
-- ID: Identify (ID.AM, ID.RA, ID.IM)
-- PR: Protect (PR.AA, PR.AT, PR.DS, PR.PS, PR.IR)
+- **NEVER skip Table 3 for any document type** — even an IR policy must be checked against Table 3 to confirm it references execution elements. A policy with no DE/RS/RC language is a policy gap, not a scope limitation.
 
-**Table 3 — Incident Response** (DE, RS, RC) — all elements are High priority:
-- DE: Detect (DE.CM, DE.AE)
-- RS: Respond (RS.MA, RS.AN, RS.CO, RS.MI)
-- RC: Recover (RC.RP, RC.CO)
+- **NEVER inflate Partial to avoid showing gaps** — gaps are the deliverable's primary value. A coverage table that shows 90% Addressed for a sparse document has failed its purpose; the downstream gap-analyzer depends on honest Partial/Not Found signals.
 
-For most documents, focus first on Table 3 elements (all High priority), then Table 2.
+- **NEVER load the full CSF baseline into context unless needed** — for a focused playbook, Table 3 alone suffices. Load `../nist-800-61r3-shared/references/csf-element-registry.md` only when the document type requires Table 2 coverage or when the user requests a full-scope map. Unnecessary loading wastes context.
 
-### Step 3: Scan & Tag Content
+- **NEVER omit the Evidence/Location column** — without a source reference (section number, paragraph quote, or "—" for Not Found), the table cannot be verified or challenged. An unattributed Addressed claim is indistinguishable from inference creep.
 
-Read the document and for each meaningful section/paragraph:
-1. Identify which CSF Subcategory it most closely satisfies
-2. Assess completeness: does it fully satisfy the element or only partially?
-3. Note the source location (section number, paragraph, or quote)
+---
 
-**Key semantic anchors** — phrases that signal specific CSF elements:
+## When Things Go Wrong
 
-| If document mentions...                        | Maps to          |
-|------------------------------------------------|------------------|
-| Continuous monitoring, anomaly detection       | DE.CM            |
-| Incident declaration criteria, threshold       | DE.AE-08         |
-| Triage, severity assessment, validate          | RS.MA-02         |
-| Incident categorization, incident type         | RS.MA-03         |
-| Escalation, elevation, resource increase       | RS.MA-04         |
-| Recovery initiation criteria                   | RS.MA-05         |
-| Root cause analysis, event sequence            | RS.AN-03         |
-| Evidence collection, chain of custody          | RS.AN-07         |
-| Incident magnitude, IoC scope                  | RS.AN-08         |
-| Stakeholder notification, breach notification  | RS.CO-02         |
-| Information sharing, ISAC, threat intel share  | RS.CO-03         |
-| Containment, isolate, quarantine               | RS.MI-01         |
-| Eradication, persistence removal, patch        | RS.MI-02         |
-| Recovery plan execution, restore operations    | RC.RP-01/02      |
-| Backup integrity, clean restore verification   | RC.RP-03/05      |
-| After-action report, lessons learned           | RC.RP-06, ID.IM-03 |
-| IR policy, policy elements                     | GV.PO, ID.IM-04  |
-| Roles, responsibilities, authority             | GV.RR-02         |
+| Situation | Likely Cause | Recovery |
+|-----------|--------------|----------|
+| Every element comes back Not Found | Document is too abstract (policy without procedures, or architecture diagram without process language) | Note document type mismatch; map what exists; flag that a companion procedures document is needed |
+| Subcategories under same category disagree (one Addressed, siblings Not Found) | Normal and correct — subcategories are independent | Keep the split status; do not normalize upward to the category |
+| User disputes a Partial rating | Criteria disagreement on what "fully satisfies" means | Quote the specific gap (missing threshold, missing role name, missing frequency) that prevents Addressed |
+| Document covers r2 phases, not r3 elements | Legacy IR plan written pre-2024 | Note the r2 structure; attempt best-effort phase-to-subcategory crosswalk; flag document needs update to r3 structure |
+| Document type is a hybrid (e.g., playbook embedded inside IR plan) | Compound document | Split into logical sections; map each section against the expected scope for its type |
 
-### Step 4: Build Coverage Table
+---
 
-Output a table with these columns:
+## Workflow
 
-| CSF ID | Priority | Description (brief) | Coverage | Evidence/Location |
-|--------|----------|---------------------|----------|-------------------|
+### Step 1 — Classify Document
 
-Coverage values: `Addressed` / `Partial` / `Not Found`
+Identify document type. If unclear, use the Navigation decision tree above.
+For expected scope per type, read: `../nist-800-61r3-shared/references/csf-element-registry.md` → §Document Type → Expected CSF Scope table.
 
-Group by Function: DE → RS → RC (Table 3 first), then GV → ID → PR (Table 2).
+### Step 2 — Set Scope
 
-Only include elements that are relevant to the document type (skip Low-priority Table 2 elements unless the document explicitly covers them).
+- **Playbook / Incident Report / After-Action Report**: Table 3 (DE/RS/RC) only, unless the document explicitly addresses governance or preparation content.
+- **IR Plan / IR Policy / Risk Assessment**: Load full baseline from `../nist-800-61r3-shared/references/csf-element-registry.md`.
+- Default: always start with Table 3 (all High priority).
 
-### Step 5: Output Summary
+### Step 3 — Scan and Tag
 
-After the table, provide:
-1. **Coverage counts**: X/Y High-priority elements addressed
-2. **Top gaps**: The 3–5 most significant missing elements by priority
-3. **Document type assessment**: Does the coverage match what's expected for this document type?
+For each document section, identify the most specific subcategory satisfied. Apply these semantic anchors:
 
----
+| Document language | Maps to |
+|-------------------|---------|
+| Anomaly detection, continuous monitoring | DE.CM |
+| Incident declaration criteria, threshold to declare | DE.AE-08 |
+| Triage, validate, severity scoring | RS.MA-02 |
+| Incident categorization, incident type classification | RS.MA-03 |
+| Escalation criteria, elevation, resource surge | RS.MA-04 |
+| Criteria to begin recovery | RS.MA-05 |
+| Root cause analysis, attack chain reconstruction | RS.AN-03 |
+| Evidence log, chain of custody | RS.AN-06/07 |
+| Incident scope, blast radius, IoC magnitude | RS.AN-08 |
+| Stakeholder notification, breach notification | RS.CO-02 |
+| ISAC sharing, threat intel sharing, external notification | RS.CO-03 |
+| Containment, isolation, quarantine | RS.MI-01 |
+| Eradication, persistence removal, patch | RS.MI-02 |
+| Recovery execution, restore operations | RC.RP-01/02 |
+| Backup integrity check, clean restore verification | RC.RP-03/05 |
+| End-of-incident declaration, lessons learned, after-action | RC.RP-06, ID.IM-03 |
+| IR policy, policy elements, policy review cycle | GV.PO-01/02 |
+| Roles, responsibilities, RACI, authority | GV.RR-02 |
 
-## Output Format
+### Step 4 — Build Coverage Table
 
 ```
 === CSF 2.0 Coverage Map — [Document Name/Type] ===
 Standard: NIST SP 800-61r3 (April 2025)
-Document Type: [classified type]
+Document Type: [type]
 
 TABLE 3 — INCIDENT RESPONSE (All High Priority)
-────────────────────────────────────────────────────────────────────────
-CSF ID      | Description                              | Coverage    | Location
-────────────────────────────────────────────────────────────────────────
-DE.CM       | Continuous monitoring of assets          | Partial     | §2.1
-DE.CM-01    | Network monitoring                       | Addressed   | §2.1.a
-DE.CM-09    | Computing HW/SW/runtime monitoring       | Not Found   | —
-DE.AE       | Adverse event analysis                   | Partial     | —
-DE.AE-02    | Events analyzed to understand activity   | Addressed   | §3.2
-DE.AE-08    | Incident declaration criteria            | Not Found   | —
-RS.MA       | Incident management                      | Partial     | §4
-RS.MA-01    | IR plan executed on incident declaration | Addressed   | §4.1
-RS.MA-02    | Incidents triaged and validated          | Partial     | §4.2 (no severity criteria)
+─────────────────────────────────────────────────────────────
+CSF ID     | Description (brief)               | Coverage  | Location
+─────────────────────────────────────────────────────────────
+DE.CM      | Asset monitoring for anomalies     | Partial   | §2.1
+DE.CM-01   | Network monitoring                 | Addressed | §2.1.a
+DE.CM-09   | Computing HW/SW/runtime monitoring | Not Found | —
+DE.AE-08   | Incident declaration criteria      | Not Found | —
+RS.MA-02   | Triage and validation              | Partial   | §4.2 (no severity criteria)
 ...
 
-TABLE 2 — PREPARATION & LESSONS LEARNED (Mixed Priority)
-[only elements relevant to document type]
-...
+TABLE 2 — PREPARATION (load only if in scope)
+[load ../nist-800-61r3-shared/references/csf-element-registry.md for full element list]
 
 SUMMARY
 ────────
-High-priority (Table 3) coverage: 9/23 elements addressed, 5 partial, 9 not found
-Medium-priority coverage: 4/12 addressed
-Top gaps: DE.AE-08 (incident criteria), RS.MA-02 (triage), RS.AN-03 (root cause), RS.CO-02 (notifications), RC.RP-06 (after-action)
-Assessment: Coverage is below expected for an IR Plan — RS.CO and RC sections are largely absent.
+Table 3 coverage: X/23 Addressed, Y Partial, Z Not Found
+Top gaps: [3–5 highest-priority Not Found elements]
+Assessment: [does coverage match expected profile for this document type?]
 ```
 
----
-
-## Deliverable
+### Step 5 — Flag Downstream
 
-A coverage table + summary that can be handed directly to `nist-800-61r3-gap-analyzer` for prioritized gap analysis or to `nist-800-61r3-maturity-scorer` for scoring.
+State whether output is ready for:
+- `nist-800-61r3-gap-analyzer` (prioritized remediation)
+- `nist-800-61r3-maturity-scorer` (maturity scoring)
 
 ---
 
-## NEVER
+## References
 
-- **NEVER invent coverage** — only mark Addressed if the document explicitly addresses the element
-- **NEVER skip Table 3** — all DE/RS/RC elements are High priority; always include them
-- **NEVER use SP 800-61r2 element IDs** — this skill maps to r3/CSF 2.0 IDs only (no legacy "Preparation → Detection & Analysis" phases)
-- **NEVER map vague language to specific elements** — "we monitor our systems" alone does not satisfy DE.CM-01 through DE.CM-09 individually
+Load `../nist-800-61r3-shared/references/csf-element-registry.md` when:
+- Full Table 2 (GV/ID/PR) subcategory list is needed
+- Document type is IR Plan, IR Policy, or Risk Assessment
+- User requests a complete element-by-element map
diff --git a/skills/nist-800-61r3-csf-mapper/references/csf-element-baseline.md b/skills/nist-800-61r3-csf-mapper/references/csf-element-baseline.md
new file mode 100644
index 0000000..51a0cab
--- /dev/null
+++ b/skills/nist-800-61r3-csf-mapper/references/csf-element-baseline.md
@@ -0,0 +1,12 @@
+# CSF Element Baseline — MOVED
+
+The canonical CSF element taxonomy has been consolidated into the shared reference file:
+
+**See: `../../nist-800-61r3-shared/references/csf-element-registry.md`**
+
+All sections previously in this file are present in the shared registry:
+- TABLE 3 — Incident Response elements (all HIGH priority)
+- TABLE 2 — Preparation & Lessons Learned elements (MEDIUM/LOW priority)
+- Document Type → Expected CSF Scope table
+- Severity Mapping and Dependency Graph (used by gap-analyzer)
+- Element weights (used by maturity-scorer)
diff --git a/skills/nist-800-61r3-gap-analyzer/SKILL.md b/skills/nist-800-61r3-gap-analyzer/SKILL.md
index 1756cbb..a350cec 100644
--- a/skills/nist-800-61r3-gap-analyzer/SKILL.md
+++ b/skills/nist-800-61r3-gap-analyzer/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: nist-800-61r3-gap-analyzer
-description: "Identify prioritized gaps in any cybersecurity document against the NIST SP 800-61r3 CSF 2.0 Community Profile. Accepts raw document content or a coverage map from nist-800-61r3-csf-mapper and outputs a priority-ranked gap list (High/Medium/Low) with specific SP 800-61r3 citation anchors. Use when asked to 'find gaps in this IR plan', 'what's missing from this playbook', or 'gap analysis against 800-61r3'."
+description: "Identify and prioritize gaps in any cybersecurity document against the NIST SP 800-61r3 CSF 2.0 Community Profile. Accepts raw document content or a coverage map from nist-800-61r3-csf-mapper and outputs a priority-ranked gap list (High/Medium/Low) with specific SP 800-61r3 citation anchors. Use when asked to 'find gaps in this IR plan', 'what's missing from this playbook', or 'gap analysis against 800-61r3'. Trigger keywords: gap analysis, missing coverage, IR plan review, NIST compliance gaps, 800-61r3 gaps."
 user-invocable: true
 ---
 
@@ -10,17 +10,68 @@ Identify and prioritize gaps in any cybersecurity document against the CSF 2.0 C
 
 ---
 
-## Goal
+## Mindset
 
-Produce a prioritized gap list — sorted High → Medium → Low — for every CSF 2.0 element from SP 800-61r3 that is missing or incomplete in the reviewed document. Each gap includes the SP 800-61r3 citation, the specific requirement not met, and a remediation suggestion.
+Expert gap analysts do not mechanically check boxes — they reason about *consequence chains* and *operational reality*:
+
+1. **Think in dependency chains, not isolated elements.** A gap in DE.AE-08 (incident declaration) is not just one missing item — it propagates as implied failure risk to RS.MA-01 through RS.MA-04. Always trace upstream: one missing foundational element can account for three or four apparent gaps. Identify the root gap, note the implied gaps, and remediate the root first.
+
+2. **Score presence by operational evidence, not prose claims.** A policy that says "we monitor our networks" does not satisfy DE.CM-01 — it must describe *what* is monitored, *how*, and at what frequency. Treat unsupported assertions as Partial at best. The question is not "does this mention monitoring?" but "can a new analyst execute this tomorrow with only this document?"
+
+3. **Prioritize remediation by unblocking downstream, not by severity label alone.** When two gaps both score Critical, order the one that unblocks more downstream elements first. A triage procedure (RS.MA-02) unblocks categorization (RS.MA-03), escalation (RS.MA-04), and every downstream analysis step — it ranks above RS.AN-06 even if both are Critical.
+
+4. **Treat "Partial" as a signal, not a conclusion.** When something is Partial, ask: what specifically is present and what is absent? A good gap entry for a Partial element surfaces the specific sub-requirement (R/C/N item) that is missing, not a generic "needs improvement" note. This transforms the output from an audit finding into an actionable work item.
+
+5. **Match remediation specificity to document type.** A tactical runbook gap gets a concrete, one-step remediation ("add urgency SLA table"). A program-level policy gap gets a structural fix ("create a formal supplier IR coordination annex"). Never suggest an enterprise framework overhaul for a single playbook's gap.
 
 ---
 
-## Inputs
+## Navigation
 
-- The document to analyze (text, file path, or description), OR
-- A coverage map previously produced by `nist-800-61r3-csf-mapper`
-- Document type (if not already classified)
+**Use this skill when:**
+- Performing a gap analysis of an IR plan, playbook, runbook, or cybersecurity policy
+- Reviewing output from `nist-800-61r3-csf-mapper` and needing prioritized action items
+- An auditor or customer asks what the document is missing against 800-61r3
+- Building a program improvement backlog from a compliance baseline
+
+**Do NOT use this skill when:**
+- The goal is to *map* what the document covers (use `nist-800-61r3-csf-mapper` first)
+- The goal is to score overall maturity (use `nist-800-61r3-maturity-scorer` after gap analysis)
+- The document is scoped to a single technology or tool — narrow scope means many Table 3 elements structurally cannot apply; note them as Out of Scope rather than gaps
+
+**Document type decision tree:**
+
+```
+What kind of document is being analyzed?
+│
+├── Full IR Program / Policy
+│     → Analyze ALL three priority tiers (High + Medium + Low)
+│     → Expect 15-30+ gaps; group by functional area
+│
+├── Incident Playbook / Runbook (threat-specific or scenario-specific)
+│     → Focus on HIGH priority (Table 3) only
+│     → Many GV/ID/PR elements are Out of Scope — note, don't flag
+│     → Typical output: 3-10 gaps
+│
+├── Subset / Annex / Supporting Procedure
+│     → Identify the parent document scope first
+│     → Flag gaps relative to what THIS document is responsible for
+│     → Note: "element owned by parent document" ≠ gap
+│
+└── Unknown / Composite
+      → Ask user for scope clarification before classifying gaps
+      → If unavailable, default to playbook-tier analysis and note assumption
+```
+
+**Edge case: the document explicitly disclaims scope**
+
+If the document says "this playbook covers detection and containment only," then RC.RP-* elements are Out of Scope. Still note them as "not covered by this document — verify in parent program," but do not classify them as gaps for severity scoring.
+
+---
+
+## Philosophy
+
+A gap analysis is a *consequence map*, not a compliance checklist. Every finding must answer: if this gap is never fixed, what breaks during an actual incident? Findings that cannot answer that question are noise, not signal.
 
 ---
 
@@ -29,11 +80,11 @@ Produce a prioritized gap list — sorted High → Medium → Low — for every
 ```
 1. Obtain Coverage Map (run mapper if not already done)
    ↓
-2. Load SP 800-61r3 Element Baseline
+2. Load SP 800-61r3 Element Baseline (../nist-800-61r3-shared/references/csf-element-registry.md)
    ↓
-3. Identify Gaps
+3. Identify Gaps — check each element, classify Missing vs. Partial
    ↓
-4. Classify & Prioritize
+4. Classify & Prioritize — apply severity and dependency-chain reasoning
    ↓
 5. Generate Gap Report
 ```
@@ -42,103 +93,24 @@ Produce a prioritized gap list — sorted High → Medium → Low — for every
 
 If a coverage map exists from `nist-800-61r3-csf-mapper`, use it directly. Otherwise, perform inline mapping against the document before gap analysis.
 
-### Step 2: SP 800-61r3 Element Baseline
-
-The complete element set from SP 800-61r3, organized by priority:
-
-**HIGH PRIORITY — Table 3 (Incident Response)**
-
-| CSF ID     | Description |
-|------------|-------------|
-| DE         | Detect Function |
-| DE.CM      | Continuous Monitoring |
-| DE.CM-01   | Networks/network services monitored |
-| DE.CM-02   | Physical environment monitored |
-| DE.CM-03   | Personnel activity and technology usage monitored |
-| DE.CM-06   | External service provider activities monitored |
-| DE.CM-09   | Computing HW/SW/runtime/data monitored |
-| DE.AE      | Adverse Event Analysis |
-| DE.AE-02   | Events analyzed to understand associated activities |
-| DE.AE-03   | Information correlated from multiple sources |
-| DE.AE-04   | Estimated impact and scope of adverse events understood |
-| DE.AE-06   | Information on adverse events provided to authorized staff/tools |
-| DE.AE-07   | CTI and contextual information integrated into analysis |
-| DE.AE-08   | Incidents declared when adverse events meet incident criteria |
-| RS         | Respond Function |
-| RS.MA      | Incident Management |
-| RS.MA-01   | IR plan executed on incident declaration |
-| RS.MA-02   | Incident reports triaged and validated |
-| RS.MA-03   | Incidents categorized and prioritized |
-| RS.MA-04   | Incidents escalated or elevated as needed |
-| RS.MA-05   | Recovery initiation criteria applied |
-| RS.AN      | Incident Analysis |
-| RS.AN-03   | Root cause and event sequence established |
-| RS.AN-06   | Investigation actions recorded with integrity |
-| RS.AN-07   | Incident data/metadata collected with integrity preserved |
-| RS.AN-08   | Incident magnitude estimated and validated |
-| RS.CO      | Incident Reporting and Communication |
-| RS.CO-02   | Internal and external stakeholders notified |
-| RS.CO-03   | Information shared with designated stakeholders |
-| RS.MI      | Incident Mitigation |
-| RS.MI-01   | Incidents contained |
-| RS.MI-02   | Incidents eradicated |
-| RC         | Recover Function |
-| RC.RP      | Incident Recovery Plan Execution |
-| RC.RP-01   | Recovery portion of IR plan executed |
-| RC.RP-02   | Recovery actions selected, scoped, prioritized, performed |
-| RC.RP-03   | Backup/restoration asset integrity verified |
-| RC.RP-04   | Critical mission functions considered post-incident |
-| RC.RP-05   | Restored asset integrity verified, normal status confirmed |
-| RC.RP-06   | End of incident recovery declared, documentation completed |
-| RC.CO      | Incident Recovery Communication |
-| RC.CO-03   | Recovery activities/progress communicated to stakeholders |
-| RC.CO-04   | Public updates shared using approved methods/messaging |
-
-**MEDIUM PRIORITY — Table 2 (selected high-value elements)**
-
-| CSF ID     | Description |
-|------------|-------------|
-| GV.OC-03   | Legal, regulatory, contractual requirements understood |
-| GV.RR      | Roles, responsibilities, authorities established |
-| GV.RR-02   | Roles/responsibilities documented and enforced |
-| GV.PO      | Organizational cybersecurity policy established |
-| GV.OV-01   | Risk management strategy outcomes reviewed |
-| GV.SC-08   | Relevant suppliers included in IR planning/response |
-| ID.AM-01   | HW inventories maintained |
-| ID.AM-02   | SW/services inventories maintained |
-| ID.RA-02   | CTI received from sharing forums/sources |
-| ID.RA-05   | Threats/vulns/likelihoods used to inform risk response |
-| ID.RA-06   | Risk responses chosen, tracked, communicated |
-| ID.IM-01   | Improvements identified from evaluations |
-| ID.IM-02   | Improvements from security tests and exercises |
-| ID.IM-03   | Improvements from operational execution |
-| ID.IM-04   | IR plans established, maintained, improved |
-| PR.DS-11   | Backups created, protected, maintained, tested |
-| PR.PS-04   | Log records generated for continuous monitoring |
-
-**LOW PRIORITY — Table 2 (informational)**
-
-All remaining GV, ID, PR elements not listed above.
+### Step 2: Load Element Baseline
+
+Load `../nist-800-61r3-shared/references/csf-element-registry.md` for the full element set, severity mapping, and dependency graph. Do not embed the full table inline in your response — reference it by CSF ID.
 
 ### Step 3: Identify Gaps
 
-For each element in the baseline, determine its status in the coverage map:
-- `Not Found` → Gap
-- `Partial` → Partial Gap (flag as gap with lower severity modifier)
+For each element in the baseline (scoped to document type per Navigation):
+- `Not Found` → Gap (Missing)
+- `Partial` → Gap (Incomplete) — identify the specific absent sub-requirement
 - `Addressed` → No gap
+- `Out of Scope` → Note but do not count in severity totals
 
 ### Step 4: Classify & Prioritize
 
-For each gap, assign:
+Assign severity using the Severity Mapping in `../nist-800-61r3-shared/references/csf-element-registry.md`.
 
-**Severity** (for High-priority elements):
-- **Critical**: Element is absent and directly impacts incident handling (RS.MA-02, DE.AE-08, RS.AN-03, RS.MI-01, RC.RP-06)
-- **Significant**: Element is absent but has workarounds or is partially covered
-- **Minor**: Partial coverage that needs strengthening
-
-**Gap type**:
-- **Missing**: No evidence of the element
-- **Incomplete**: Element addressed but key sub-requirements absent
+Apply the dependency-chain rule: when remediating Critical gaps, sequence the one
+that unblocks the most downstream elements first (see dependency graph in reference).
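+
+A minimal sketch of the dependency-chain rule, assuming the dependency graph can be expressed as a mapping from an upstream element to the elements that depend on it. The `DEPENDENTS` entries, `downstream`, and `remediation_order` are hypothetical names for illustration; the authoritative graph is the one in the shared registry file.
+
+```python
+from collections import deque
+
+# Hypothetical edges: upstream element -> elements that depend on it.
+# Illustrative only; the real graph lives in csf-element-registry.md.
+DEPENDENTS = {
+    "DE.AE-08": ["RS.MA-01", "RS.MA-02", "RS.MA-03", "RS.MA-04"],
+    "RS.MA-02": ["RS.MA-03"],
+    "RS.MI-01": ["RS.MI-02"],
+}
+
+
+def downstream(element, graph):
+    """Return every element reachable from `element` (transitive dependents)."""
+    seen, queue = set(), deque(graph.get(element, []))
+    while queue:
+        node = queue.popleft()
+        if node not in seen:
+            seen.add(node)
+            queue.extend(graph.get(node, []))
+    return seen
+
+
+def remediation_order(critical_gaps, graph):
+    """Sort Critical gaps so the one unblocking the most elements comes first."""
+    # Ties are broken alphabetically by CSF ID to keep the order deterministic.
+    return sorted(critical_gaps, key=lambda g: (-len(downstream(g, graph)), g))
+
+
+print(remediation_order(["RS.MI-01", "RS.MA-02", "DE.AE-08"], DEPENDENTS))
+```
+
+With these illustrative edges, DE.AE-08 sorts first because four elements sit downstream of it.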
 
 ### Step 5: Generate Gap Report
 
@@ -151,6 +123,7 @@ For each gap, assign:
 Document: [name/type]
 Standard: NIST SP 800-61r3 (April 2025)
 Analysis date: [date]
+Scope tier: [Full Program | Playbook/Runbook | Subset | Unknown]
 
 ━━━ CRITICAL GAPS (High Priority — Table 3) ━━━
 
@@ -161,6 +134,7 @@ Analysis date: [date]
      whether an incident should be declared." (DE.AE-08.R1)
      Impact: Without defined declaration criteria, incident response may start
      too late or on false positives — undermining RS.MA-02 triage.
+     Downstream risk: RS.MA-01 through RS.MA-04 all depend on a declared incident.
      Remediation: Define specific observable thresholds (e.g., confirmed C2
      beacon, lateral movement detected, data staged for exfil) that constitute
      an incident declaration trigger.
@@ -170,8 +144,8 @@ Analysis date: [date]
      Requirement: "Perform a preliminary review of a new incident report to
      verify that a cybersecurity incident has occurred, then estimate the
      severity and urgency needed to respond to it." (RS.MA-02.R1)
-     Impact: Severity tiers exist but no time-based urgency criteria — teams
-     cannot determine response SLAs.
+     Missing sub-requirement: Urgency/SLA criteria absent; only severity tiers present.
+     Impact: Teams cannot determine response SLAs — P1/P2/P3 differentiation impossible.
      Remediation: Add urgency criteria alongside severity (e.g., P1 = respond
      within 1hr, P2 = 4hr, P3 = 24hr).
 
@@ -187,32 +161,61 @@ Analysis date: [date]
 [M1] ID.IM-04 — IR Plans Maintained and Improved
      ...
 
+━━━ OUT OF SCOPE (noted, not scored) ━━━
+
+RC.CO-04 — Public updates: document scope excludes public communications.
+
 ━━━ SUMMARY ━━━
 
 Critical gaps:    3  (must address before next incident)
 Significant gaps: 5  (address within 30 days)
 Minor gaps:       4  (address in next program review cycle)
 Medium gaps:      6
-Low gaps:         2
+Low gaps:         2  (if full program scope)
+Out of scope:     3
 
 Highest-impact single fix: Adding DE.AE-08 incident declaration criteria would
-immediately improve RS.MA-02, RS.MA-03, and RS.MA-04 effectiveness.
+immediately resolve implied gaps in RS.MA-01 through RS.MA-04 (5 items counting DE.AE-08 itself, one fix).
 ```
 
 ---
 
+## NEVER
+
+- **NEVER mark a gap as Critical unless it is in Table 3 AND directly impacts incident handling** — Critical inflation devalues the finding; a stakeholder who sees 20 Critical items stops prioritizing any of them. Reserve Critical for elements that *gate* other elements.
+- **NEVER suggest remediation that adds bureaucracy** — "implement a governance committee to oversee..." is compliance theater. Remediation must be executable by a practitioner, not a PMO. Write the specific artifact or procedure they need to create.
+- **NEVER conflate Missing with Partial** — partial coverage is operational progress and should be acknowledged. A "Partial" gap entry must name what *is* present, then what specific sub-requirement is absent. Collapsing both to "gap" erases useful signal.
+- **NEVER omit citation anchors** — every finding must cite the specific CSF ID and R/C/N item from SP 800-61r3. An uncited finding cannot be traced to the standard and fails audit defensibility.
+- **NEVER flag elements as gaps when they are explicitly out of scope for the document type** — a threat-specific playbook is not responsible for GV.PO (organizational policy). Mark Out of Scope and explain why; inflating scope inflates the gap count and undermines trust in the analysis.
+- **NEVER skip the dependency-chain analysis for Critical gaps** — listing Critical gaps in arbitrary order misguides remediation sequencing. Always identify which fix unblocks the most downstream elements and say so explicitly.
+- **NEVER treat an assertion in prose as operational evidence** — "We perform continuous monitoring" does not satisfy DE.CM unless the document describes what is monitored, by what mechanism, and at what cadence. Challenge unsupported prose claims.
+
+---
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Gap count is suspiciously high (30+ Critical gaps in a short runbook) | Scope was not narrowed for document type — Table 2/3 applied wholesale to a narrow tactical doc | Re-run with Playbook/Runbook scope; reclassify GV/ID/PR elements as Out of Scope |
+| Mapper output marks elements Addressed but analyst suspects false positives | Mapper may have matched on keyword proximity, not operational depth | Re-verify Addressed elements using the "operational evidence" test: can a new analyst execute this from the document alone? Downgrade to Partial if not. |
+| User disputes a Critical finding: "we handle this informally" | Informal practices are not documented evidence; they disappear when staff turn over | Note the informal practice in the finding; remediation is to document the informal process, not build a new one from scratch |
+| Two Critical gaps rank equally — user wants one to fix first | Dependency chain not yet applied | Check `../nist-800-61r3-shared/references/csf-element-registry.md` dependency graph; the upstream element in the chain goes first |
+
+---
+
+## Reference Loading Guide
+
+| Situation | Load |
+|-----------|------|
+| Performing gap analysis (always) | `../nist-800-61r3-shared/references/csf-element-registry.md` — full element set, severity mapping, dependency graph |
+| Scope is uncertain | Re-read Navigation decision tree above before loading baseline |
+| Dependency sequencing question | See "Element Dependency Graph" section in `../nist-800-61r3-shared/references/csf-element-registry.md` |
+
+---
+
 ## Deliverable
 
 A prioritized, citation-anchored gap report suitable for:
 - Direct input to `nist-800-61r3-maturity-scorer`
 - Action items for IR program improvement backlog
 - Audit findings documentation
-
----
-
-## NEVER
-
-- **NEVER mark a gap as Critical unless it is in Table 3 AND directly impacts incident handling**
-- **NEVER suggest remediation that adds bureaucracy** — all suggestions should be practical, not compliance theater
-- **NEVER conflate Missing with Partial** — partial coverage is progress; acknowledge what exists
-- **NEVER omit citation anchors** — every finding must cite the specific CSF ID and R/C/N item from SP 800-61r3
diff --git a/skills/nist-800-61r3-gap-analyzer/references/csf-element-baseline.md b/skills/nist-800-61r3-gap-analyzer/references/csf-element-baseline.md
new file mode 100644
index 0000000..cd76b46
--- /dev/null
+++ b/skills/nist-800-61r3-gap-analyzer/references/csf-element-baseline.md
@@ -0,0 +1,12 @@
+# CSF Element Baseline — MOVED
+
+The canonical CSF element taxonomy (element list, severity mapping, dependency graph) has been
+consolidated into the shared reference file:
+
+**See: `../../nist-800-61r3-shared/references/csf-element-registry.md`**
+
+All sections previously in this file are present in the shared registry:
+- HIGH/MEDIUM/LOW element tables with weights
+- Severity Mapping for Table 3 gaps (Critical / Significant / Minor)
+- Element Dependency Graph
+- Document Type → Expected CSF Scope table
diff --git a/skills/nist-800-61r3-incident-coordinator/SKILL.md b/skills/nist-800-61r3-incident-coordinator/SKILL.md
new file mode 100644
index 0000000..667f1e3
--- /dev/null
+++ b/skills/nist-800-61r3-incident-coordinator/SKILL.md
@@ -0,0 +1,345 @@
+---
+name: nist-800-61r3-incident-coordinator
+description: "Coordinate live, active security incidents using the NIST SP 800-61r3 (April 2025) framework. Guides triage, PICERL phase management, stakeholder notification timing, evidence preservation, escalation decisions, and containment/eradication sequencing during an ongoing incident. Use when asked to 'help coordinate this incident', 'walk me through incident response', 'we have an active incident', 'guide containment steps', or 'should we escalate this'. NOT for post-incident review — use nist-800-61r3-after-action-reviewer for that."
+user-invocable: true
+---
+
+# NIST SP 800-61r3 Incident Coordinator
+
+Guide active, live security incident response using NIST SP 800-61r3 (April 2025) framework structures — from initial triage through declared recovery, including phase management, escalation decisions, evidence preservation, and stakeholder communication timing.
+
+---
+
+## Mindset
+
+1. **Stop the bleeding first; understand the wound second.** Containment precedes root cause analysis. Teams that pause containment to conduct thorough forensics while the attacker still has an active foothold consistently achieve worse outcomes than teams that limit the attacker's options first. You can investigate a preserved, contained environment. You cannot un-exfiltrate data that left the network while you were still doing log analysis.
+
+2. **Parallel evidence preservation is a real tension, not a luxury.** NIST SP 800-61r3 RS.AN-06 and RS.AN-07 require that investigation records and incident data be preserved with integrity, but containment actions (isolation, account revocation, firewall blocks) inevitably alter or terminate forensic artifacts. The sequencing decision (preserve memory and logs before or after isolation) must be made explicitly and documented, not made accidentally by whoever acts fastest.
+
+3. **PICERL phases are states, not a waterfall.** Preparation, Identification, Containment, Eradication, Recovery, and Lessons Learned overlap and recurse in real incidents. An organization can be in Containment on one lateral movement vector while simultaneously in Identification on a newly discovered second foothold. Treat phases as labels for current dominant activity, not as sequential gates.
+
+4. **Communication cadence is a decision, not a default.** Too frequent updates create noise that desensitizes stakeholders and consumes responder time. Too infrequent updates create fear, generate inbound calls that disrupt response, and cause leadership to lose confidence. The right cadence is calibrated to incident velocity: fast-moving incidents need hourly executive pulses; stable, contained incidents need 4-hour updates. State your chosen cadence and the reasoning explicitly.
+
+5. **Escalation to CIRT is a decision point, not a failure.** Internal handlers should not resist CIRT escalation to protect team reputation. The escalation criteria are objective: complexity exceeds team capability, a second concurrent incident is active, legal or regulatory exposure is confirmed, or the incident has crossed the major incident threshold per RS.MA-04. When any criterion is met, escalate; delay is the actual failure.
+
+6. **Legal hold begins at the moment of suspected regulatory exposure, not at confirmed breach.** Organizations routinely wait for incident confirmation before engaging legal counsel. This is backwards: the trigger for legal hold is the moment there is reasonable belief that regulated data (PII, PHI, financial records) *may* have been accessed. RS.AN-07 evidence retention requirements and legal hold are parallel processes, not sequential ones. If legal hold is missed, no amount of good IR work recovers it.
+
+7. **Every containment decision has a business cost and a security cost — document both.** Isolating a production system stops attacker lateral movement but also stops business operations. The correct decision requires knowing both costs and documenting the trade-off explicitly. An undocumented isolation that causes business harm becomes a liability; a documented, authorized isolation becomes a defensible decision.
+
+---
+
+## Navigation
+
+### When to use this skill
+- An active, declared security incident is in progress
+- User needs guidance on sequencing PICERL phases for a specific incident type
+- Triage is complete and user is deciding between Containment strategies
+- Escalation decision needs a structured framework
+- Stakeholder communication timing is in question during an active event
+- Evidence preservation vs. containment trade-off requires a documented decision
+- Legal hold trigger conditions need to be assessed mid-incident
+
+### When NOT to use this skill
+- The incident is over — use `nist-800-61r3-after-action-reviewer` for post-incident review
+- User wants to check whether their IR policy is complete — use `nist-800-61r3-policy-reviewer`
+- User wants to find gaps in a playbook or procedure document — use `nist-800-61r3-gap-analyzer`
+- User is running a tabletop exercise — apply exercise facilitation, not live incident coordination
+- No incident has been declared — if in pre-declaration ambiguity, guide through DE.AE-08 criteria first
+
+### Pre-flight: Is this actually a declared incident?
+
+```
+Has an incident been declared per DE.AE-08 criteria?
+│
+├── No — observable activity only, no declaration yet
+│     → Run through DE.AE-08 decision criteria:
+│         Is there confirmed malicious activity OR high-confidence indicator?
+│         Does the activity meet any severity threshold in your tier definitions?
+│         ↳ If yes → guide declaration (RS.MA-01) then continue coordination
+│         ↳ If no → document as potential incident, set monitoring checkpoint,
+│                   do not initiate containment actions prematurely
+│
+└── Yes — incident declared, severity assigned
+      → Confirm current PICERL phase
+      → Confirm whether major incident threshold (RS.MA-04) applies
+      → Proceed with phase-appropriate workflow below
+```
+
+---
+
+## Philosophy
+
+Active incident response is not documentation work — it is a time-critical sequence of decisions made under uncertainty with incomplete information. The coordinator's job is not to know all the answers but to ensure the right questions are asked in the right order, that decisions are made explicitly rather than by default, and that the organization does not create new problems while solving the original one. Every minute of an active incident, the attacker is either moving or waiting; your decisions are real-time responses to their actions.
+
+---
+
+## PICERL Phase Workflows
+
+### Phase: Identification (RS.MA-02 / RS.MA-03)
+
+**Objective:** Validate the incident is real, assign severity, establish initial scope.
+
+```
+1. Validate the alert or report (RS.MA-02.R1)
+   └── Confirm: true positive vs. false positive
+   └── Estimate scope: single host, subnet, domain, or cross-environment?
+
+2. Assign severity tier (RS.MA-03.R1)
+   └── P1 — critical: data exfiltration confirmed or imminent, ransomware active,
+              major-incident criteria met → escalation review required (RS.MA-04)
+   └── P2 — high: confirmed malicious access, scope uncertain, containment needed
+   └── P3 — medium: suspicious activity, unconfirmed, monitoring + analysis mode
+
+3. Apply major incident criteria (RS.MA-04) — escalate if:
+   └── Estimated business impact exceeds defined threshold
+   └── Regulatory data exposure is plausible
+   └── Multiple systems or environments involved
+   └── Team capability is outpaced by incident complexity
+   └── A second concurrent incident is active
+
+4. Assign and confirm Incident Handler authority (GV.RR-02)
+   └── Who is authorized to: isolate hosts? revoke accounts? engage legal?
+   └── If authority is unclear, STOP and get explicit authorization before acting
+```
+
+### Phase: Containment (RS.MI-01)
+
+**Objective:** Limit attacker's ability to move, escalate, or exfiltrate further.
+
+```
+Evidence Preservation Decision (RS.AN-06) — make this BEFORE containment:
+├── Can memory forensics be captured in <15 minutes?
+│   └── Yes → capture memory, running process list, active network connections FIRST
+│   └── No → proceed to containment; document decision and rationale
+├── Are logs being forwarded to SIEM/immutable store?
+│   └── Yes → proceed to containment
+│   └── No → snapshot/export logs from affected systems BEFORE isolation
+└── Document the preserve/contain sequencing decision explicitly
+
+Containment Actions (sequence by attacker opportunity cost):
+1. Network isolation (highest priority if lateral movement is active)
+2. Credential revocation for compromised accounts
+3. C2 blocking at perimeter (coordinate with network team)
+4. Privileged access review — disable accounts with admin rights near affected systems
+5. Cloud resource isolation if cloud footprint is involved
+
+Containment verification — do NOT proceed to Eradication until:
+└── No new lateral movement events in [defined time window]
+└── All known ingress paths are blocked
+└── Attacker's known access mechanisms are severed
+└── Network/endpoint telemetry confirms quiescence
+```
+
+### Phase: Eradication (RS.MI-02)
+
+**Objective:** Remove all attacker artifacts from confirmed-contained environment.
+
+```
+PREREQUISITE CHECK — eradication starts only after:
+[ ] Containment verified (no active attacker movement)
+[ ] Scope validated (RS.AN-08 — adjacent systems checked)
+[ ] Evidence preserved for forensic use
+
+Eradication sequence:
+1. Remove all identified malware, tools, and persistence mechanisms
+2. Patch or mitigate the exploited vulnerability
+3. Rotate ALL credentials in attacker's confirmed or suspected reach
+4. Rebuild compromised hosts if integrity cannot be verified (preferred over "clean")
+5. Validate eradication completeness with IoC scan before proceeding
+```
+
+### Phase: Recovery (RC.RP-03 / RC.RP-04)
+
+**Objective:** Restore affected systems to verified, operational state.
+
+```
+Recovery sequencing (RC.RP-03):
+1. Restore from known-good backup (verify backup integrity first)
+2. OR rebuild from golden image + apply required configs
+3. Pre-production IoC scan of all systems to be restored (RC.RP-05.R1)
+4. System owner authorization before returning to production (RC.RP-04.R2)
+5. Monitor restored systems with elevated telemetry for [defined window]
+
+Recovery completion criteria (RC.RP-06 prerequisite):
+└── All affected systems restored and owner-verified
+└── Business operations confirmed normal by system owners
+└── No recurrence events detected in monitoring window
+└── Lessons-learned process initiated (hand off to after-action-reviewer)
+```
+
+---
+
+## Stakeholder Notification Timing
+
+### Internal Notifications (RS.CO-02)
+
+| Audience | Trigger | Cadence (active incident) | Cadence (stable/contained) |
+|----------|---------|--------------------------|---------------------------|
+| Incident Handler / SOC | Immediate on declaration | Continuous | Continuous |
+| IR Team Lead | Within 15 minutes of P1/P2 declaration | Per incident activity | Per incident activity |
+| CISO / Security Leadership | Within 1 hour of P1; within 4 hours of P2 | Hourly (P1), every 4hr (P2) | Every 4-8hr |
+| Legal Counsel | Immediately if regulated data exposure possible | As needed | As needed |
+| Executive Leadership | Within 2 hours of major incident (RS.CO-03.R2) | Hourly | Every 4-8hr |
+| IT Operations | Before any containment action affecting infrastructure | Per action | Per action |
+
+### External Notifications (RS.CO-02.R3)
+
+Regulatory notification assessment must be documented for every incident, regardless of outcome. The decision is:
+
+```
+Does this incident involve data that is subject to breach notification law?
+├── Yes (PII, PHI, payment card data, FERPA education records, GLBA financial data, etc.)
+│   └── Legal hold triggered immediately
+│   └── Regulatory clock starts at discovery (not confirmation)
+│   └── Engage legal counsel NOW — they determine notification timing
+│   └── Document: date/time of discovery, scope estimate, counsel engagement
+│
+└── No confirmed regulated data
+    └── Document: what data types were in scope, why notification not required
+    └── Reassess if scope expands
+    └── Do not mark N/A without written rationale
+```
+
+### Communication Cadence Calibration
+
+- **Fast-moving incidents (active exfiltration, spreading ransomware):** Hourly executive pulse; 15-minute responder sync
+- **Contained incidents under active investigation:** 4-hour executive update; responder sync at shift transitions
+- **Stable, low-risk incidents:** 8-hour or end-of-day update sufficient
+- **Rule:** If stakeholders are calling *in* to ask for status, your cadence is too slow. If responders are spending >20% of time on updates, your cadence is too fast.
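+
+The calibration guidance above can be sketched as a small lookup plus a sanity check. This is an illustrative sketch only: the state labels (`fast_moving`, `contained`, `stable`) and both function names are hypothetical, and checking inbound calls before responder overhead is an assumed tie-break, not something the guidance specifies.
+
+```python
+def executive_cadence_hours(state):
+    """Return the executive update interval, in hours, for an incident state."""
+    # Hypothetical state labels mapped to the intervals listed above.
+    cadence = {"fast_moving": 1, "contained": 4, "stable": 8}
+    return cadence[state]
+
+
+def cadence_check(inbound_status_calls, update_minutes, shift_minutes):
+    """Apply the rule of thumb for judging whether the cadence fits."""
+    # Stakeholders calling in for status means updates are too infrequent.
+    if inbound_status_calls > 0:
+        return "too slow"
+    # Responders spending more than 20% of their time on updates is too much.
+    if update_minutes / shift_minutes > 0.20:
+        return "too fast"
+    return "ok"
+
+
+print(executive_cadence_hours("contained"))
+print(cadence_check(inbound_status_calls=0, update_minutes=120, shift_minutes=480))
+```
+
+Treat the thresholds as starting points; record the cadence actually chosen and the reasoning in the incident log.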
+
+---
+
+## Escalation Decision Tree (RS.MA-04)
+
+```
+Should this escalate to CIRT / external IR firm?
+
+Criterion 1: Scope complexity
+└── Affected environments: >2 business units OR cloud + on-prem + OT → ESCALATE
+
+Criterion 2: Team capacity
+└── Active incident requires >80% of available IR staff for >4 hours → ESCALATE
+└── A second unrelated incident is declared concurrently → ESCALATE
+
+Criterion 3: Threat sophistication
+└── Evidence of APT TTPs, zero-day exploitation, or signed malware → ESCALATE
+└── Attacker has demonstrated OPSEC (custom tools, living-off-the-land, no detectable C2) → ESCALATE
+
+Criterion 4: Regulatory or legal exposure
+└── Data breach notification law triggered → ESCALATE (legal team + IR firm with legal hold experience)
+└── Law enforcement involvement likely → ESCALATE (FBI, CISA engagement requires case management)
+
+Criterion 5: Business continuity
+└── Incident threatens operational continuity for >24 hours → ESCALATE
+
+→ If any criterion is met: escalate immediately, do not wait to see if the team can manage it.
+→ If no criterion is met: document the assessment and reassess every 4 hours.
+```
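+
+A minimal sketch of the decision tree above as code, assuming the incident state has already been reduced to a handful of fields. `IncidentStatus`, its field names, `escalation_reasons`, and `decide` are all hypothetical; the thresholds are copied from the five criteria.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class IncidentStatus:
+    """Hypothetical snapshot of the facts the five criteria depend on."""
+    business_units_affected: int = 0
+    spans_cloud_onprem_ot: bool = False
+    ir_staff_fraction: float = 0.0       # share of available IR staff engaged
+    staff_engaged_hours: float = 0.0
+    concurrent_incident: bool = False
+    advanced_threat_evidence: bool = False  # APT TTPs, zero-day, signed malware, strong OPSEC
+    breach_notification_triggered: bool = False
+    law_enforcement_likely: bool = False
+    continuity_threat_hours: float = 0.0
+
+
+def escalation_reasons(s):
+    """Return the name of every criterion that is met, in criterion order."""
+    reasons = []
+    if s.business_units_affected > 2 or s.spans_cloud_onprem_ot:
+        reasons.append("scope complexity")
+    if (s.ir_staff_fraction > 0.80 and s.staff_engaged_hours > 4) or s.concurrent_incident:
+        reasons.append("team capacity")
+    if s.advanced_threat_evidence:
+        reasons.append("threat sophistication")
+    if s.breach_notification_triggered or s.law_enforcement_likely:
+        reasons.append("regulatory or legal exposure")
+    if s.continuity_threat_hours > 24:
+        reasons.append("business continuity")
+    return reasons
+
+
+def decide(s):
+    """Escalate if any criterion is met; otherwise document and reassess."""
+    reasons = escalation_reasons(s)
+    if reasons:
+        return "ESCALATE: " + ", ".join(reasons)
+    return "Document assessment; reassess in 4 hours"
+
+
+print(decide(IncidentStatus(concurrent_incident=True)))
+```
+
+Returning the matched criteria, rather than a bare yes/no, gives the coordinator the rationale to record alongside the escalation decision.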
+
+---
+
+## Legal Hold Trigger Conditions
+
+Legal hold must be initiated before evidence is altered or destroyed. Triggers include:
+
+- Any confirmed or suspected access to PII, PHI, financial records, or IP
+- Receipt of external demand (ransom note, extortion communication)
+- Any indication that law enforcement may be involved
+- Evidence of data staging or exfiltration, even if exfiltration is unconfirmed
+- Incident scope includes a third party or vendor whose data may be affected
+
+**Legal hold actions (coordinate with legal counsel):**
+1. Suspend any automated log rotation or data deletion for affected systems
+2. Preserve all email communications related to the incident (including internal)
+3. Document chain of custody for all forensic artifacts
+4. Do NOT share evidence copies externally without legal sign-off
+
+---
+
+## NEVER
+
+- **NEVER begin eradication before containment is verified**: if the attacker retains any active access path while you are removing artifacts, they can simply reinstall them. Every eradication step performed on an uncontained environment must be treated as ineffective and repeated after containment is confirmed. Premature eradication is a common cause of "re-infection" that is actually the original infection continuing.
+
+- **NEVER make containment decisions without explicit authority designation**: an IR handler who isolates a production payment system without documented authorization exposes the organization to operational liability equal to or greater than that of the incident itself. Authority to contain must be pre-established in the IR plan (GV.RR-02) and confirmed verbally or in writing at incident start. If it is not, STOP and get authorization before acting.
+
+- **NEVER delay legal hold until breach is confirmed** — the legal hold trigger is *reasonable belief* that regulated data may have been accessed, not confirmed proof. Organizations that wait for forensic confirmation before engaging legal counsel routinely destroy evidence that would have been protected, miss regulatory notification clocks, and lose litigation defensibility. Engage legal counsel the moment regulated data is plausibly in scope.
+
+- **NEVER skip the evidence preservation/containment sequencing decision** — acting on containment without first deciding whether to capture forensic artifacts (memory, logs, active connections) is not a decision — it is an accident. Every incident requires an explicit, documented choice: "we preserved X before containing because Y" or "we contained before preserving because Z." Undocumented sequencing is a scope and chain-of-custody failure.
+
+- **NEVER communicate incident details to external parties (including vendors) without legal review** — even well-intentioned sharing of incident scope with a SaaS vendor or managed service provider can constitute unauthorized disclosure of a breach, complicate law enforcement investigations, or trigger unintended contractual notice obligations. All external communication must route through legal during an active incident.
+
+- **NEVER validate containment based on absence of alerts alone** — alert silence is not containment confirmation. Sophisticated attackers suppress telemetry; misconfigured detection tools have blind spots; containment decisions based only on "we haven't seen new alerts" routinely fail. Containment is verified by positive evidence: confirmed network isolation, account revocation receipts, firewall rule confirmation, and endpoint telemetry showing no outbound connections from isolated hosts.
+
+- **NEVER bypass the scope validation step (RS.AN-08) before declaring recovery**: teams that validate only the initially identified asset and declare recovery without checking adjacent systems are not in recovery; they have a false sense of recovery. Lateral movement is the norm in advanced incidents, not the exception. Document what was checked, how, and the result before any recovery declaration.
+
+---
+
+## When Things Go Wrong
+
+| Situation | Response |
+|-----------|----------|
+| Containment was executed before evidence was preserved | Document the decision retroactively with rationale; flag RS.AN-06 as compromised in the eventual AAR; salvage what evidence remains (SIEM logs, network flow data, cloud audit trails); note that forensic completeness is reduced and conclusions may be limited. |
+| Legal counsel is unavailable at incident start | Designate an authorized internal contact (General Counsel backup, outside counsel on retainer) in the IR plan *before* incidents. If no one is available: assume legal hold applies, suspend all log rotation and deletion, do not make external disclosures, and document that counsel was sought but unavailable and when they were eventually reached. |
+| Attacker re-establishes access after eradication | Restart at Containment — do not continue the current eradication pass. The re-access event means at least one persistence mechanism or access path was missed. Treat this as a new scope event: re-examine all systems, not just the one where re-access was detected. Document the timeline split (incident resumes at Containment, [timestamp]). |
+| Executive demands recovery before containment is verified | This is a governance and authority decision, not an IR decision. Document the request, the responder's assessment, and the executive's direction in writing. If the executive overrides containment verification, note that RS.MA-05 was not fully completed and that the organization accepted that risk. Ensure the after-action-reviewer captures this as a process finding. |
+| Stakeholder communication is generating more inbound calls than outbound updates can handle | Establish a single status page or bridge line for stakeholders; shift from push-notifications to pull-access (stakeholders check status board vs. receiving individual updates). Assign one communicator role solely responsible for stakeholder management so responders are not interrupted. |
+| Scope expands mid-incident (new systems discovered in attacker's reach) | Do not restart the incident — extend it. Document the scope expansion as a new finding within the same incident, re-run the Identification phase for the newly in-scope assets, and reassess severity. If the expansion triggers major incident criteria, escalate immediately. Notify stakeholders of scope change at the next communication checkpoint. |
+| Two simultaneous incidents with overlapping responder pool | Invoke CIRT escalation criteria immediately. Triage which incident has higher business impact and assign primary responders there. Assign secondary team or escalate the lower-priority incident. Do not attempt to run both incidents with the same team at full capacity — partial attention on two incidents consistently produces worse outcomes than full attention on one with escalation on the other. |
+
+---
+
+## Incident Coordination Checklist (Quick Reference)
+
+```
+IDENTIFICATION
+[ ] DE.AE-08 — Incident declared, criteria documented
+[ ] RS.MA-02 — Severity and urgency assigned
+[ ] RS.MA-03 — Incident categorized (type, scope estimate)
+[ ] RS.MA-04 — Major incident criteria evaluated
+[ ] GV.RR-02 — Handler authority confirmed
+
+CONTAINMENT
+[ ] RS.AN-06 — Evidence preservation decision documented (before/after isolation)
+[ ] Legal hold assessed and triggered if applicable
+[ ] Containment actions executed and logged with timestamps
+[ ] Containment verification completed (positive evidence, not absence of alerts)
+[ ] Stakeholder notification sent (internal per RS.CO-02, external if RS.CO-02.R3 applies)
+
+ERADICATION
+[ ] Scope validation completed (RS.AN-08 — adjacent systems checked)
+[ ] All artifacts removed; persistence mechanisms identified and severed
+[ ] Vulnerability patched or mitigated
+[ ] Credentials rotated for all accounts in attacker's reach
+
+RECOVERY
+[ ] Systems restored from verified clean state (RC.RP-03)
+[ ] Pre-production IoC scan completed (RC.RP-05.R1)
+[ ] System owner sign-off documented (RC.RP-04.R2)
+[ ] Elevated monitoring in place for restored systems
+[ ] Lessons-learned process initiated → hand off to nist-800-61r3-after-action-reviewer
+```
+
+---
+
+## Related Skills
+
+| Skill | When to use |
+|-------|-------------|
+| `nist-800-61r3-after-action-reviewer` | After the incident is closed — review the AAR for completeness and actionability |
+| `nist-800-61r3-policy-reviewer` | Review the IR policy and plan that governs this incident coordination |
+| `nist-800-61r3-gap-analyzer` | Identify what is missing from your IR playbooks before the next incident |
+| `nist-800-61r3-maturity-scorer` | Score program maturity using this incident as evidence input |
+
+---
+
+## Deliverable
+
+Real-time coordination guidance covering:
+- PICERL phase status and sequencing decisions
+- Evidence preservation vs. containment trade-off documentation
+- Stakeholder notification timing and cadence decisions
+- Escalation assessment with documented criteria
+- Legal hold trigger evaluation
+- Containment verification before eradication sign-off
+- Recovery readiness checklist
+- Handoff package for `nist-800-61r3-after-action-reviewer`
diff --git a/skills/nist-800-61r3-maturity-scorer/SKILL.md b/skills/nist-800-61r3-maturity-scorer/SKILL.md
index e59c492..f23ab4e 100644
--- a/skills/nist-800-61r3-maturity-scorer/SKILL.md
+++ b/skills/nist-800-61r3-maturity-scorer/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: nist-800-61r3-maturity-scorer
-description: "Score an organization's IR program maturity against the NIST SP 800-61r3 CSF 2.0 Community Profile using a weighted scoring model tied to element priority (High/Medium/Low). Produces a per-function heatmap, weighted overall score, maturity level (1-5), and the highest-leverage improvement actions. Use when asked to 'score this against NIST', 'maturity assessment', 'how mature is our IR program', or 'NIST compliance score'."
+description: "Score an organization's IR program maturity against the NIST SP 800-61r3 CSF 2.0 Community Profile using a weighted scoring model tied to element priority (High/Medium/Low). Produces a per-function heatmap, weighted overall score, maturity level (1-5), and the highest-leverage improvement actions. Use when asked to 'score this against NIST', 'maturity assessment', 'how mature is our IR program', 'NIST compliance score', 'IR program score', or 'benchmark our security program'."
 user-invocable: true
 ---
 
@@ -10,6 +10,59 @@ Score an organization's IR program maturity against the complete CSF 2.0 Communi
 
 ---
 
+## Mindset
+
+Expert scoring heuristics that distinguish accurate assessments from flattering ones:
+
+1. **Weight the operational core, not the paperwork.** Table 3 elements (DE/RS/RC) carry weight 3, three times the weight of a Low-priority element, because an org that can't detect or respond is genuinely dangerous regardless of governance maturity. A perfect GV score with a weak DE score is Level 2 at best.
+
+2. **Evidence beats claims.** "We have a process" is 0.5. "Here is the documented process with test results from the last 12 months" is 1.0. Self-attestation without artifacts cannot score above 0.5 on any element.
+
+3. **The improvement loop breaks the ceiling.** No program reaches Level 4 without ID.IM-01 through ID.IM-04 fully addressed. Orgs that don't measure their IR performance cannot improve it — they are, by definition, not Managed.
+
+4. **Parent elements are aggregates, not bonuses.** DE.CM scores 1.0 only if all DE.CM-* children are 1.0. Never score a parent element independently of its children.
+
+5. **Partial is not progress.** A 0.5 on a weight-3 element earns 1.5 of 3 possible weighted points; upgrading it to 1.0 adds the other 1.5. Always surface these "half-finished" high-weight elements as the cheapest wins available.
+
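+The arithmetic behind heuristic 5 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the skill: the `Element` type, the `weighted_score` and `uplift` helpers, and the sample values are hypothetical, and the authoritative weights live in the shared element registry.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Element:
+    csf_id: str
+    weight: int      # assumed encoding: 3 = High, 2 = Medium, 1 = Low
+    coverage: float  # 0.0 (not met), 0.5 (partial), 1.0 (met)
+
+
+def weighted_score(elements):
+    """Sum(coverage x weight) / Sum(weight), returned as a fraction of 1.0."""
+    total = sum(e.weight for e in elements)
+    return sum(e.coverage * e.weight for e in elements) / total
+
+
+def uplift(element, elements):
+    """Overall score gained if this one element moved to full coverage."""
+    total = sum(e.weight for e in elements)
+    return (1.0 - element.coverage) * element.weight / total
+
+
+# Hypothetical sample, not real assessment data.
+sample = [
+    Element("DE.AE-08", 3, 0.5),
+    Element("RS.AN-03", 3, 0.0),
+    Element("PR.PS-04", 2, 1.0),
+    Element("GV.OV-02", 1, 1.0),
+]
+
+print(f"overall: {weighted_score(sample):.1%}")
+# Rank elements by how much the overall score would rise if each were fixed.
+for e in sorted(sample, key=lambda e: uplift(e, sample), reverse=True):
+    print(f"{e.csf_id}: +{uplift(e, sample):.1%}")
+```
+
+Ranking by uplift puts the half-finished and unmet weight-3 elements first, which is the ordering heuristic 5 asks for.
+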
+---
+
+## Navigation
+
+**Use this skill when:**
+- Asked for a maturity score, maturity level, or capability rating against NIST 800-61r3
+- Benchmarking an IR program across assessment cycles (trend tracking)
+- Preparing board/executive reporting on IR program state
+- Input is a coverage map from `nist-800-61r3-csf-mapper` or gap report from `nist-800-61r3-gap-analyzer`
+
+**Do NOT use this skill when:**
+- The ask is about mapping coverage (use `nist-800-61r3-csf-mapper`)
+- The ask is about finding specific gaps (use `nist-800-61r3-gap-analyzer`)
+- The ask is about auditing specific recommendations (use `nist-800-61r3-recommendation-auditor`)
+
+**Input decision tree:**
+
+```
+Input available?
+├── Raw document only
+│   └── Perform inline mapping first (Step 1), then score
+├── Coverage map from csf-mapper
+│   └── Read coverage scores directly → skip to Step 2
+├── Gap report from gap-analyzer
+│   └── Infer coverage from gaps → assign 0.0 for gaps, 1.0 for non-gaps, 0.5 for partials
+├── Multiple inputs
+│   └── Program mode: merge coverage maps, take minimum score per element across sources
+└── No coverage data at all
+    └── Prompt user for at minimum: IR policy, IR plan, detection tool inventory, incident logs
+```
+
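+One way to read the "Multiple inputs" branch of the tree above is a per-element minimum. The Python sketch below assumes each coverage map is a plain dict of CSF ID to 0.0/0.5/1.0. The `merge_coverage` name is hypothetical, and keeping an element that appears in only one source at its reported score is an assumption the skill does not spell out.
+
+```python
+def merge_coverage(*maps):
+    """Merge per-element coverage maps, keeping the lowest score per element."""
+    merged = {}
+    for coverage_map in maps:
+        for csf_id, score in coverage_map.items():
+            # First sighting keeps its score; later sightings can only lower it.
+            merged[csf_id] = min(score, merged.get(csf_id, score))
+    return merged
+
+
+# Hypothetical inputs from two reviewed documents.
+policy_map = {"RS.MA-01": 1.0, "DE.CM-01": 0.5}
+playbook_map = {"RS.MA-01": 0.5, "RS.MI-01": 1.0}
+print(merge_coverage(policy_map, playbook_map))
+```
+
+Taking the minimum is the conservative choice: a claim made in one document does not override weaker evidence in another.
+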
+---
+
+## Philosophy
+
+The score is a navigation tool, not a destination. Its purpose is to direct finite improvement resources toward the elements with the highest operational impact, not to satisfy an auditor. A Level 3 organization that knows exactly which five elements to fix next is in a stronger position than a Level 4 organization that reached its score through paperwork compliance.
+
+---
+
 ## Goal
 
 Produce a quantitative maturity score grounded in SP 800-61r3's own priority system. High-priority elements (all Table 3 elements) have the most weight; Medium-priority elements (Table 2 selected) have moderate weight; Low-priority elements contribute minimally. Output a score, maturity level, per-function heatmap, and the top improvement actions by ROI.
@@ -75,80 +128,9 @@ Function_score = Σ(element_coverage × element_weight) / Σ(element_weight)
 Overall = Σ(all element_coverage × element_weight) / Σ(all element_weight)
 ```
 
-### Step 3: SP 800-61r3 Element Registry with Weights
-
-**HIGH WEIGHT (3) — Table 3 Elements**
-
-| CSF ID | Weight | Description |
-|--------|--------|-------------|
-| DE.CM | 3 | Continuous Monitoring (parent) |
-| DE.CM-01 | 3 | Network monitoring |
-| DE.CM-02 | 3 | Physical environment monitoring |
-| DE.CM-03 | 3 | Personnel activity monitoring |
-| DE.CM-06 | 3 | External service provider monitoring |
-| DE.CM-09 | 3 | Computing HW/SW/runtime monitoring |
-| DE.AE | 3 | Adverse Event Analysis (parent) |
-| DE.AE-02 | 3 | Events analyzed |
-| DE.AE-03 | 3 | Information correlated from multiple sources |
-| DE.AE-04 | 3 | Impact and scope estimated |
-| DE.AE-06 | 3 | Info provided to authorized staff/tools |
-| DE.AE-07 | 3 | CTI integrated into analysis |
-| DE.AE-08 | 3 | Incident declaration criteria |
-| RS.MA | 3 | Incident Management (parent) |
-| RS.MA-01 | 3 | IR plan executed on declaration |
-| RS.MA-02 | 3 | Incidents triaged and validated |
-| RS.MA-03 | 3 | Incidents categorized and prioritized |
-| RS.MA-04 | 3 | Incidents escalated/elevated |
-| RS.MA-05 | 3 | Recovery initiation criteria applied |
-| RS.AN | 3 | Incident Analysis (parent) |
-| RS.AN-03 | 3 | Root cause established |
-| RS.AN-06 | 3 | Investigation actions recorded with integrity |
-| RS.AN-07 | 3 | Incident data collected with integrity |
-| RS.AN-08 | 3 | Incident magnitude estimated |
-| RS.CO | 3 | Incident Communication (parent) |
-| RS.CO-02 | 3 | Stakeholders notified |
-| RS.CO-03 | 3 | Information shared with stakeholders |
-| RS.MI | 3 | Incident Mitigation (parent) |
-| RS.MI-01 | 3 | Incidents contained |
-| RS.MI-02 | 3 | Incidents eradicated |
-| RC.RP | 3 | Recovery Plan Execution (parent) |
-| RC.RP-01 | 3 | Recovery plan executed |
-| RC.RP-02 | 3 | Recovery actions performed |
-| RC.RP-03 | 3 | Backup integrity verified |
-| RC.RP-04 | 3 | Critical functions considered post-incident |
-| RC.RP-05 | 3 | Restored asset integrity verified |
-| RC.RP-06 | 3 | Recovery declared, documentation completed |
-| RC.CO | 3 | Recovery Communication (parent) |
-| RC.CO-03 | 3 | Recovery progress communicated |
-| RC.CO-04 | 3 | Public updates shared |
-
-**MEDIUM WEIGHT (2) — Table 2 Selected**
-
-| CSF ID | Weight |
-|--------|--------|
-| GV.RR | 2 |
-| GV.RR-02 | 2 |
-| GV.PO | 2 |
-| GV.OC-03 | 2 |
-| GV.SC-08 | 2 |
-| GV.OV-01 | 2 |
-| ID.RA-02 | 2 |
-| ID.RA-05 | 2 |
-| ID.RA-06 | 2 |
-| ID.AM-01 | 2 |
-| ID.AM-02 | 2 |
-| ID.IM-01 | 2 |
-| ID.IM-02 | 2 |
-| ID.IM-03 | 2 |
-| ID.IM-04 | 2 |
-| PR.DS-11 | 2 |
-| PR.PS-04 | 2 |
-
-**LOW WEIGHT (1) — Table 2 Remaining**
-
-All remaining GV, ID, PR elements not listed above (GV.OC-01/02/04/05, GV.RM-01–07, GV.RR-01/03/04, GV.PO-01/02, GV.OV-02/03, GV.SC-01–10, ID.AM-03–08, ID.RA-01/03/04/07/08/09/10, PR.AA, PR.AT, PR.DS, PR.PS, PR.IR)
-
-### Step 4: Per-Function Scores
+Full element registry with weights: `../nist-800-61r3-shared/references/csf-element-registry.md`
+
+### Step 3: Per-Function Scores
 
 Compute a score for each of the 6 CSF Functions independently, using only the elements belonging to that function.
 
@@ -162,23 +144,23 @@ ID (Identify)[████░░░░░░]  45%
 PR (Protect) [████░░░░░░]  40%
 ```
 
-### Step 5: Maturity Level Assignment
+### Step 4: Maturity Level Assignment
 
 Map the overall weighted score to a maturity level:
 
 | Score | Level | Name | Characterization |
 |-------|-------|------|-----------------|
 | 0–29% | 1 | Initial | No consistent IR practices; ad hoc and reactive |
-| 30–49% | 2 | Developing | Basic IR exists but is undocumented or inconsistently applied |
-| 50–69% | 3 | Defined | Documented IR program exists; key processes established |
+| 30–49% | 2 | Developing | Basic IR exists but undocumented or inconsistently applied |
+| 50–69% | 3 | Defined | Documented IR program; key processes established |
 | 70–84% | 4 | Managed | Measured IR program; performance tracked; continuous improvement in place |
 | 85–100% | 5 | Optimizing | IR integrated into all cybersecurity risk management; lessons feed back continuously |
 
 These levels correspond loosely to CSF 2.0 Implementation Tiers 1–4 and the maturity concepts from ID.IM.
 
-### Step 6: High-ROI Improvements
+### Step 5: High-ROI Improvements
 
-Identify the top 5 improvements with highest score impact:
+Identify the top 5 improvements with highest score impact.
 
 For each Not Met or Partial element, compute:
 - **Score uplift**: (1.0 - current_coverage) × weight / total_weighted_points
@@ -193,8 +175,6 @@ Common high-ROI improvements (based on typical IR programs):
 4. **RS.CO-02** (stakeholder notifications) — weight 3; legal/regulatory exposure reduction
 5. **PR.PS-04** (log records) — weight 2; enables DE.AE-02/03/07 detection capabilities
 
-### Step 7: Generate Maturity Report
-
 ---
 
 ## Output Format
@@ -237,9 +217,9 @@ Table 2 (Prep) score: 47%  ← preparation and prevention readiness
 
 ━━━ ELEMENT COVERAGE DETAIL ━━━
 
-High Priority (Table 3):  24/39 elements addressed (3 partial)
+High Priority (Table 3):  24/40 elements addressed (3 partial)
 Medium Priority (Table 2): 9/17 elements addressed (2 partial)
-Low Priority (Table 2):   11/24 elements addressed
+Low Priority (Table 2):   11/48 elements addressed
 
 ━━━ TOP 5 HIGH-ROI IMPROVEMENTS ━━━
 
@@ -251,10 +231,10 @@ Rank  Element        Score Uplift  Dependencies  Action
  2    RS.AN-03        +2.8%         3 unlocked    Implement RCA methodology
                                                   (5-Whys or fishbone) in all
                                                   AAR procedures
- 3    ID.IM-04.R2     +1.6%         2 unlocked    Establish formal IR plan
+ 3    ID.IM-04        +1.6%         2 unlocked    Establish formal IR plan
                                                   review cycle (annual +
                                                   post-major-incident trigger)
- 4    RS.CO-02.R3     +2.4%         0 unlocked    Document regulatory
+ 4    RS.CO-02        +2.4%         0 unlocked    Document regulatory
                                                   notification obligations by
                                                   jurisdiction and sector
  5    PR.PS-04        +1.8%         4 unlocked    Implement centralized log
@@ -281,19 +261,30 @@ Change: N/A
 
 ---
 
-## Deliverable
+## NEVER
 
-A scored maturity report with function-level heatmap, maturity level assignment, and prioritized improvement roadmap. Suitable for:
-- Executive reporting on IR program state
-- Security program roadmap planning
-- Board/leadership cybersecurity briefings
-- Benchmarking across assessment cycles
+- **NEVER report a score without showing the per-function breakdown** — strong and weak function scores offset each other in the overall number; a 65% overall can mask a 30% RS score, which describes an IR program that cannot respond
+- **NEVER assign Maturity Level 4 or 5 without verifying the ID.IM improvement loop (ID.IM-01 through ID.IM-04)** — measurement and continuous improvement are definitional to "Managed"; their absence makes the level assignment technically invalid regardless of score arithmetic
+- **NEVER accept self-attestation as 1.0 coverage** — "we have a process" is 0.5 until documented evidence exists; inflating coverage rewards mature-sounding language over actual capability
+- **NEVER round up partial coverage** — 0.5 is 0.5; upgrading a weight-3 element from 0.5 to 1.0 adds 1.5 points to the score; collapsing this difference destroys the uplift calculation
+- **NEVER score parent elements independently of their children** — DE.CM is 1.0 only when all DE.CM-* children are 1.0; scoring the parent header as a bonus double-counts coverage
+- **NEVER use the score as a compliance checkbox** — maturity scores measure program capability, not regulatory compliance; state explicitly that Level 4 does not equal FISMA compliance, SOC 2 readiness, or any specific regulatory posture
+- **NEVER omit the Table 3 sub-score** — the Table 3 (IR operational) score is the single most important indicator of actual incident response readiness and must always be surfaced separately from the overall score
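+
+The Level 4/5 rule above can be sketched as a guard around the score-to-level mapping. This Python sketch is illustrative only: the function names are hypothetical, the score bands follow the maturity level table in this skill, and capping at Level 3 when the ID.IM loop is incomplete is one assumed way to apply the rule.
+
+```python
+ID_IM_LOOP = ("ID.IM-01", "ID.IM-02", "ID.IM-03", "ID.IM-04")
+
+
+def level_from_score(score_pct):
+    """Map an overall weighted score (0-100) to a maturity level 1-5."""
+    if score_pct >= 85:
+        return 5
+    if score_pct >= 70:
+        return 4
+    if score_pct >= 50:
+        return 3
+    if score_pct >= 30:
+        return 2
+    return 1
+
+
+def assign_level(score_pct, coverage):
+    """Apply the ID.IM gate on top of the arithmetic level.
+
+    `coverage` maps CSF IDs to 0.0 / 0.5 / 1.0. The Level 3 cap is an
+    assumption made for this sketch.
+    """
+    level = level_from_score(score_pct)
+    loop_complete = all(coverage.get(e, 0.0) == 1.0 for e in ID_IM_LOOP)
+    if level >= 4 and not loop_complete:
+        return 3
+    return level
+
+
+# Hypothetical case: the score qualifies for Level 4, but ID.IM-03 is partial.
+partial_loop = {"ID.IM-01": 1.0, "ID.IM-02": 1.0, "ID.IM-03": 0.5, "ID.IM-04": 1.0}
+print(assign_level(72, partial_loop))
+```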
 
 ---
 
-## NEVER
+## When Things Go Wrong
+
+| Situation | What It Usually Means | What to Do |
+|-----------|----------------------|------------|
+| Overall score is Level 4+ but Table 3 sub-score is below 70% | Governance and prep are strong, but actual IR execution is weak — the program looks good on paper but cannot respond | Flag the inversion explicitly; the Table 3 score is the operative one for operational risk |
+| All elements score 0.5 (org claims partial for everything) | Assessor or org is hedging; 0.5 across the board usually means no artifacts were reviewed | Push for specific evidence artifacts; if none exist, those elements are 0.0 |
+| Score drops significantly vs. prior assessment | Either the prior assessment was inflated, or a capability genuinely regressed (staff turnover, tool decommission) | Compare element-level deltas; regression in DE.CM or RS.MA elements is the most operationally dangerous |
+| Input is a polished IR policy document only | Policy alone covers GV and some ID elements; DE/RS/RC will score near 0.0 without operational evidence | State clearly that a policy-only assessment reflects program design intent, not operational capability |
+| ID.IM elements all score 1.0 but program is clearly Level 2 | ID.IM is self-referential — an org can write improvement policies without implementing them | Require evidence of at least one completed improvement cycle (before/after data from an AAR) |
+
+---
+
+## References
 
-- **NEVER report a score without showing the per-function breakdown** — an overall score without function detail masks where the real problems are
-- **NEVER assign Maturity Level 4 or 5 without verifying the ID.IM improvement loop** — a program that doesn't measure itself cannot be Managed or Optimizing
-- **NEVER use the score as a compliance checkbox** — maturity scores measure program capability, not regulatory compliance; explicitly state this limitation
-- **NEVER round up partial coverage** — 0.5 is 0.5; a half-addressed element is not the same as a fully addressed one
+- `../nist-800-61r3-shared/references/csf-element-registry.md` — Full weighted element registry (High/Medium/Low), element counts, max points table. Load when computing element-level scores or presenting coverage detail.
diff --git a/skills/nist-800-61r3-maturity-scorer/references/element-registry.md b/skills/nist-800-61r3-maturity-scorer/references/element-registry.md
new file mode 100644
index 0000000..ad94c84
--- /dev/null
+++ b/skills/nist-800-61r3-maturity-scorer/references/element-registry.md
@@ -0,0 +1,13 @@
+# Element Registry — MOVED
+
+The canonical CSF element registry (element list with weights, scoring summary) has been
+consolidated into the shared reference file:
+
+**See: `../../nist-800-61r3-shared/references/csf-element-registry.md`**
+
+All sections previously in this file are present in the shared registry:
+- HIGH WEIGHT (3) — Table 3 elements with full descriptions
+- MEDIUM WEIGHT (2) — Table 2 selected elements
+- LOW WEIGHT (1) — Table 2 remaining elements
+- Scoring Summary table (element counts, max weighted points)
+- Parent-element scoring note
diff --git a/skills/nist-800-61r3-policy-reviewer/SKILL.md b/skills/nist-800-61r3-policy-reviewer/SKILL.md
index a0e62c4..350bffc 100644
--- a/skills/nist-800-61r3-policy-reviewer/SKILL.md
+++ b/skills/nist-800-61r3-policy-reviewer/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: nist-800-61r3-policy-reviewer
-description: "Evaluate IR policy and procedure documents against NIST SP 800-61r3 Section 2.3 required policy elements and the GV.PO/GV.RR/ID.IM-04 CSF categories. Checks for all 8 required policy elements, role authority designations, and plan maintenance requirements. Use when asked to 'review this IR policy', 'check our incident response policy', 'is this policy complete', or 'policy completeness check'."
+description: "Evaluate IR policy and procedure documents against NIST SP 800-61r3 Section 2.3 required policy elements and the GV.PO/GV.RR/ID.IM-04 CSF categories. Checks for all 8 required policy elements, role authority designations, and plan maintenance requirements. Use when asked to 'review this IR policy', 'check our incident response policy', 'is this policy complete', 'policy completeness check', or 'audit our IR policy'."
 user-invocable: true
 ---
 
@@ -10,155 +10,93 @@ Evaluate IR policies, plans, and procedure documents against the requirements de
 
 ---
 
-## Goal
+## Mindset
 
-SP 800-61r3 Section 2.3 explicitly defines the required elements of an IR policy. This skill checks that a policy document contains all required elements, properly designates roles and authorities, and establishes the plan maintenance framework needed to stay current.
+1. **Explicit over implied.** A policy element is either explicitly stated or it is Missing — "we do this operationally" does not count. The document must speak for itself, because during an actual incident nobody reads the CSIRT wiki.
+
+2. **Authority gaps are the most dangerous finding.** Organizations routinely list roles without designating *who can act*. A handler who doesn't know if they're authorized to yank a production server from the network will hesitate. That hesitation has measurable cost.
+
+3. **Generic templates are a warning signal, not a pass.** If the scope statement (Element 3) says "all company assets and personnel" and nothing in the document references cloud environments, OT/ICS, remote workers, or contractors, the policy was never tailored to the organization; it is compliance theater. Flag it.
+
+4. **Performance measures absent = no improvement loop.** Element 7 is the most commonly missing element. Its absence means the org cannot demonstrate IR program effectiveness and cannot satisfy ID.IM-01.R1. Treat it as a critical gap regardless of how good everything else looks.
+
+5. **The definitions section propagates errors.** If Element 4 is wrong or absent, every handler interprets "incident" differently. Misclassifications cascade into wrong escalation paths, wrong notifications, and wrong recovery decisions. Always check definitions first after scope.
 
 ---
 
-## Inputs
+## Navigation
+
+### When to use this skill
+- Reviewing an IR policy, IR plan, or IR procedures document for NIST SP 800-61r3 compliance
+- Auditing whether a policy is ready for operational use
+- Generating findings for an IR policy revision task list
+- Feeding input to `nist-800-61r3-maturity-scorer`
+
+### When NOT to use this skill
+- Reviewing an IR *plan's operational procedures* or playbooks (use `nist-800-61r3-lifecycle-reviewer` instead)
+- Assessing incident response *capability maturity* across the PICERL lifecycle (use `nist-800-61r3-maturity-scorer`)
+- Reviewing a tabletop exercise design (different skill)
+
+### Pre-Flight Decision Tree
+
+```
+Is the document provided a policy/plan/procedure?
+├─ No → Ask the user for the IR policy document before proceeding
+└─ Yes
+   │
+   Is the document more than a paragraph long?
+   ├─ No (stub/template placeholder only) → Flag as "Policy shell only — no reviewable content"
+   └─ Yes
+      │
+      What is the organization type?
+      ├─ Federal agency   → Check FISMA + OMB notification requirements (GV.OC-03)
+      ├─ Healthcare       → Check HIPAA Breach Notification Rule timing
+      ├─ Financial        → Check GLBA / NY DFS Part 500 / SEC Form 8-K Item 1.05
+      ├─ Critical infra   → Check CIRCIA reporting to CISA (72-hour incident / 24-hour ransom payment)
+      └─ Private sector   → Check applicable state breach laws; skip FISMA
+         │
+         Does the org use MSSPs, CSPs, or ISPs?
+         ├─ Yes → Element 8 (shared responsibility) is required, not optional
+         └─ No  → Element 8 is N/A; note in report
+
+Run all 8 element checks → Roles & Authorities deep check → Plan framework → Generate report
+```
+
+---
 
-- The IR policy, IR plan, or procedures document to review (text or file path)
-- Organization type (federal agency, private sector, MSSP) — affects regulatory notification requirements
+## Philosophy
+
+A policy that cannot be picked up by a new hire on day one of an incident and followed without explanation is not a policy — it is a compliance artifact. Every gap in this checklist represents a decision that will be made ad hoc under pressure, with no documented authority backing it.
 
 ---
 
 ## Workflow
 
 ```
-1. Load Section 2.3 Policy Checklist
+1. Pre-flight: confirm document + org type (decision tree above)
+   ↓
+2. Load references/policy-checklist.md
    ↓
-2. Check 8 Required Policy Elements
+3. Check all 8 Required Policy Elements (§2.3)
    ↓
-3. Check Roles & Authorities (GV.RR-02)
+4. Run Roles & Authorities deep check (GV.RR-02.R1, R2)
    ↓
-4. Check Plan Framework (ID.IM-04)
+5. Check Plan Framework (ID.IM-04 R1–R4)
    ↓
-5. Check Shared Responsibility & Supply Chain (GV.SC-08)
+6. Check Regulatory/Notification requirements (GV.OC-03) — org-type dependent
    ↓
-6. Generate Policy Review Report
+7. Generate Policy Review Report (format below)
 ```
 
-### Step 1: Section 2.3 Policy Checklist
-
-SP 800-61r3 §2.3 states that most IR policies include the same key elements. These are the required policy elements:
-
-| # | Element | CSF Anchor |
-|---|---------|-----------|
-| 1 | Statement of management commitment | GV.RR-01 |
-| 2 | Purpose and objectives of the policy | GV.PO |
-| 3 | Scope of the policy (to whom/what it applies, under what circumstances) | GV.PO |
-| 4 | Definition of events, cybersecurity incidents, investigations, and related terms | DE.AE-08 |
-| 5 | Roles, responsibilities, and authorities (including who has authority to confiscate/disconnect/shut down assets) | GV.RR-02 |
-| 6 | Guidelines for prioritizing incidents, estimating severity, initiating recovery, maintaining/restoring operations, and other key actions | RS.MA-03, RS.MA-05 |
-| 7 | Performance measures | ID.IM-01 |
-| 8 | Shared responsibility model with third parties (MSSPs, CSPs, ISPs) if applicable | GV.SC-08, GV.RR-02 |
-
-### Step 2: Check 8 Required Policy Elements
-
-For each element, determine:
-- **Present** — policy explicitly addresses this element
-- **Partial** — element is referenced but incomplete (e.g., roles listed but no authority designations)
-- **Missing** — no evidence of the element
-
-**Detailed checks per element:**
-
-**Element 1 — Management Commitment**
-- Is there a signed/dated endorsement from executive leadership?
-- Does it reference resource allocation for IR?
-- CSF: GV.RR-01 — "Organizational leadership is responsible and accountable for cybersecurity risk"
-
-**Element 2 — Purpose and Objectives**
-- Does the policy state why it exists?
-- Does it reference reducing incident impact and improving response effectiveness?
-- CSF: GV.PO
-
-**Element 3 — Scope**
-- Who does this apply to? (employees, contractors, third parties, specific systems)
-- What environments are covered? (cloud, on-prem, OT/ICS, mobile)
-- Are exclusions documented?
-- CSF: GV.PO
-
-**Element 4 — Definitions**
-- Are "event," "adverse event," and "cybersecurity incident" defined per SP 800-61r3 Appendix B?
-- Is "incident" distinguished from "event" — incidents jeopardize CIA of information or constitute policy violations
-- Are investigation-related terms defined?
-- CSF: DE.AE-08
-
-**Element 5 — Roles, Responsibilities, and Authorities**
-- Are incident response roles explicitly named (incident lead, handlers, legal, HR, comms, leadership)?
-- Does the policy designate which roles have authority to:
-  - Confiscate assets
-  - Disconnect systems from the network
-  - Shut down technology assets
-- Are third-party roles (MSSP, CSP, law enforcement) addressed?
-- CSF: GV.RR-02.R1, GV.RR-02.R2
-
-**Element 6 — Prioritization, Severity, and Recovery Guidelines**
-- Are incident severity/priority tiers defined (e.g., P1–P4 or Critical/High/Medium/Low)?
-- Are risk evaluation factors listed? SP 800-61r3 RS.MA.N2 suggests: asset criticality, functional impact, data impact, stage of observed activity, threat actor characterization, recoverability
-- Are recovery initiation criteria defined? (RS.MA-05.R1)
-- Are time-based response SLAs included?
-- CSF: RS.MA-03.R1, RS.MA-05.R1
-
-**Element 7 — Performance Measures**
-- Are IR program performance metrics defined?
-- Examples: mean time to detect, mean time to respond, mean time to recover, % incidents meeting SLAs, % staff trained
-- Is there a review cycle for measuring performance against these metrics?
-- CSF: ID.IM-01.R1 — "Periodically evaluate IR program performance to identify problems"
-
-**Element 8 — Shared Responsibility Model**
-- If the organization uses MSSPs/CSPs/ISPs: are their responsibilities documented?
-- Are contracts/SLAs referenced?
-- Are information flow and coordination authorities defined?
-- Are restrictions on third-party actions documented (e.g., cannot share sanitized incident info with other customers)?
-- CSF: GV.SC-08, GV.RR-02.R3
-
-### Step 3: Roles & Authorities Deep Check (GV.RR-02)
-
-Check specifically for GV.RR-02.R1, R2, R3:
-
-**R1 — All IR roles documented in organizational policies:**
-- [ ] Incident Response Team / SOC roles documented
-- [ ] Legal counsel role in IR documented
-- [ ] HR role documented (GV.RR-02 + RS.CO-03.R3)
-- [ ] Public affairs / media relations role documented
-- [ ] Physical security and facilities role documented
-- [ ] Asset owners role documented
-- [ ] Leadership decision authority documented
-
-**R2 — All appropriate individuals designated the authority to fulfill IR responsibilities:**
-- [ ] Named or role-titled individuals with authority to isolate/disconnect systems
-- [ ] Named or role-titled individuals with authority to invoke business continuity plans
-- [ ] Named or role-titled individuals with authority to engage law enforcement
-- [ ] Escalation/elevation triggers and authority chain documented (RS.MA-04)
-
-### Step 4: Plan Framework Check (ID.IM-04)
-
-SP 800-61r3 ID.IM-04 has four recommendations for all cybersecurity plans:
-
-- **R1** — Synchronize business continuity plans with incident response plans
-  - Check: Does the policy reference BCP/DRP alignment?
-- **R2** — Review and update all cybersecurity plans periodically or when significant improvements are identified
-  - Check: Is there a review/update cycle defined? (Annual, post-incident trigger?)
-- **R3** — Base each plan on the organization's unique requirements, mission, size, structure, and functions
-  - Check: Is the policy tailored to the org or clearly a generic template?
-- **R4** — Each plan identifies the resources and management support needed
-  - Check: Are resource requirements (staff, tools, budget) referenced?
-
-### Step 5: Regulatory & Notification Check (GV.OC-03)
-
-Based on organization type, verify the policy addresses applicable notification requirements:
-
-**GV.OC-03.R1** — Cybersecurity requirements include all IR-related requirements (incident notification, data breach reporting)
-
-Check:
-- [ ] Notification obligations referenced (FISMA, HIPAA, PCI-DSS, GDPR, state breach laws — as applicable)
-- [ ] Breach notification timelines addressed (RS.CO-02.R3)
-- [ ] Law enforcement notification criteria defined (RS.CO-02.R5)
-- [ ] Regulatory body notification criteria defined
-
-### Step 6: Generate Policy Review Report
+Full element-by-element checklists, GV.RR-02 role checks, ID.IM-04 plan framework checks, and org-type regulatory tables are in:
+`references/policy-checklist.md`
+
+---
+
+## Inputs
+
+- The IR policy, IR plan, or procedures document (text or file path)
+- Organization type (federal agency, private sector, healthcare, financial, MSSP) — determines regulatory notification requirements
 
 ---
 
@@ -167,7 +105,7 @@ Check:
 ```
 === NIST SP 800-61r3 IR Policy Review ===
 Document: [policy name/version]
-Organization type: [federal / private / MSSP]
+Organization type: [federal / private / healthcare / financial / MSSP]
 Standard: NIST SP 800-61r3 §2.3 + CSF 2.0 GV/ID elements
 
 ━━━ SECTION 2.3 REQUIRED ELEMENTS CHECKLIST ━━━
@@ -186,15 +124,13 @@ Standard: NIST SP 800-61r3 §2.3 + CSF 2.0 GV/ID elements
 [~] Element 5 — Roles, Responsibilities, and Authorities
     Partial: Roles listed (IR Team, Legal, HR) but authority to disconnect
     systems is not designated to any specific role.
-    Required by: GV.RR-02.R2 — "All appropriate individuals should be
-    designated the authority necessary to fulfill their IR responsibilities"
+    Required by: GV.RR-02.R2
     Fix: Add explicit authority table specifying who can isolate/disconnect/
     shut down each asset class.
 
 [✓] Element 6 — Prioritization Guidelines
     Found: Four-tier severity model (P1-P4) with response SLAs; §3.2
-    Note: Recovery initiation criteria (RS.MA-05.R1) are absent — policy
-    defines when to start responding but not when to start recovering.
+    Note: Recovery initiation criteria (RS.MA-05.R1) are absent.
 
 [✗] Element 7 — Performance Measures
     Missing: No IR performance metrics defined anywhere in the document.
@@ -231,18 +167,39 @@ Estimated effort to remediate: LOW — most gaps are additions, not rewrites.
 
 ---
 
+## NEVER
+
+- **NEVER accept "implied" policy elements** — if an element is not explicit in the document, mark it Missing. "We do this" verbal assurance does not make a policy compliant; auditors and incident handlers both read the document, not the culture.
+
+- **NEVER skip the authority designations check** — GV.RR-02.R2 is one of the most commonly missed requirements in policy reviews. Listing roles without designating authority is operationally equivalent to having no policy: the handler still can't act without making an unauthorized judgment call.
+
+- **NEVER mark Element 4 (definitions) as Partial when terms are absent** — if "event" and "incident" are not explicitly defined, downstream classification, escalation, and notification decisions will all be inconsistent. There is no meaningful "partial" state for a definitions section that omits the primary terms.
+
+- **NEVER recommend creating a separate document for each gap** — all 8 elements belong in the policy document itself. Suggesting a separate "definitions annex" or "roles addendum" is a maintenance antipattern: separate documents drift apart, and handlers only read the policy they were trained on.
+
+- **NEVER skip the definitions check when a policy has a severity model** — organizations frequently define a four-tier severity model but never define what constitutes an "incident" vs. an "event." The severity model is then applied inconsistently: one analyst escalates a phishing email to P1; another logs it as a ticket.
+
+- **NEVER flag Element 8 (shared responsibility) as N/A without asking** — organizations routinely forget that their HR system, email, or endpoint management is SaaS. If the org runs any cloud services, Element 8 likely applies. Ask explicitly before marking N/A.
+
+- **NEVER confuse a policy review with a maturity assessment** — this skill checks whether required elements exist, not whether they work well. A policy can pass all 8 elements and still represent a low-maturity program. Direct maturity questions to `nist-800-61r3-maturity-scorer`.
+
+---
+
+## When Things Go Wrong
+
+| Situation | What it usually means | Recommended action |
+|-----------|----------------------|-------------------|
+| Policy is a vendor template with the org's name substituted | ID.IM-04.R3 failure; likely also fails Elements 3, 6, 8 | Flag the entire document as "not tailored" before element-by-element review; estimate remediation effort as HIGH |
+| Document is labeled "IR Plan" but contains policy-level content | Org conflated policy and plan — common in SMBs | Review the elements present regardless of label; note the structural issue in the report |
+| Policy references "Appendix X for roles" but appendix is not provided | Cannot complete GV.RR-02 check | Note the missing appendix; mark Element 5 as Incomplete Submission, not Partial |
+| Policy was last updated more than 3 years ago | ID.IM-04.R2 violation likely; technology/regulatory references are likely stale | Flag the date; check whether references to CIRCIA (enacted 2022), GDPR, or CSF 2.0 are missing |
+| Org claims MSSP handles all IR and refuses to provide MSSP contract | Element 8 cannot be verified; GV.SC-08 likely fails | Document the gap as Unverifiable; note that undocumented MSSP authority scope is itself a compliance risk |
+
+---
+
 ## Deliverable
 
 A checklist-based policy review report with element-by-element findings, direct SP 800-61r3 citations, and specific remediation guidance. Suitable for:
 - IR policy audit findings
 - Input to `nist-800-61r3-maturity-scorer`
 - Policy revision task list
-
----
-
-## NEVER
-
-- **NEVER accept "implied" policy elements** — if an element is not explicit in the document, it is Missing
-- **NEVER skip the authority designations check** — GV.RR-02.R2 is one of the most commonly missed requirements
-- **NEVER recommend creating a separate document for each gap** — all elements should be in the policy; suggest additions to the existing document
-- **NEVER skip the definitions check** — undefined terms (especially "event" vs. "incident") cause real operational confusion during actual incidents
diff --git a/skills/nist-800-61r3-policy-reviewer/references/policy-checklist.md b/skills/nist-800-61r3-policy-reviewer/references/policy-checklist.md
new file mode 100644
index 0000000..2b51392
--- /dev/null
+++ b/skills/nist-800-61r3-policy-reviewer/references/policy-checklist.md
@@ -0,0 +1,123 @@
+# NIST SP 800-61r3 §2.3 Policy Element Checklist
+
+Full checklist used by the policy reviewer. Load this when conducting element-by-element reviews.
+
+---
+
+## Section 2.3 — Required Policy Elements
+
+| # | Element | CSF Anchor | Common Failure Mode |
+|---|---------|-----------|-------------------|
+| 1 | Statement of management commitment | GV.RR-01 | Signature present but no resource commitment referenced |
+| 2 | Purpose and objectives | GV.PO | Generic boilerplate with no org-specific mission tie |
+| 3 | Scope (who/what/when it applies) | GV.PO | Omits cloud environments or contractor systems |
+| 4 | Definitions: event, adverse event, cybersecurity incident, investigation | DE.AE-08 | "Incident" used generically without CIA-jeopardy distinction |
+| 5 | Roles, responsibilities, and authorities (including confiscation/disconnect/shutdown) | GV.RR-02 | Roles named but authority to act never explicitly designated |
+| 6 | Prioritization, severity, recovery, and key action guidelines | RS.MA-03, RS.MA-05 | Severity tiers defined but recovery initiation criteria absent |
+| 7 | Performance measures | ID.IM-01 | Entirely absent — most common critical gap |
+| 8 | Shared responsibility with third parties (if applicable) | GV.SC-08, GV.RR-02 | Acknowledged but no authority limits on MSSP actions |
+
+### Element-by-Element Detailed Checks
+
+**Element 1 — Management Commitment**
+- Is there a signed/dated endorsement from executive leadership?
+- Does it reference resource allocation for IR (FTE, budget, tools)?
+- CSF: GV.RR-01 — "Organizational leadership is responsible and accountable for cybersecurity risk..."
+
+**Element 2 — Purpose and Objectives**
+- Does the policy state why it exists (impact reduction, effectiveness improvement)?
+- Does it align to the org's mission or regulatory obligations?
+
+**Element 3 — Scope**
+- Who is in scope? (employees, contractors, third parties)
+- What environments? (cloud, on-prem, OT/ICS, mobile, remote work)
+- Are exclusions documented and reasoned?
+
+**Element 4 — Definitions**
+- Are "event," "adverse event," and "cybersecurity incident" defined per SP 800-61r3 Appendix B?
+- Is "incident" clearly distinguished from "event" (incidents jeopardize CIA or constitute policy violations)?
+- Are investigation-related terms (evidence, chain of custody) defined?
+- CSF: DE.AE-08
+
+**Element 5 — Roles, Responsibilities, and Authorities**
+- Are IR roles explicitly named: incident lead, handlers, legal, HR, comms, leadership?
+- Does the policy designate which specific roles have authority to:
+  - Confiscate assets
+  - Disconnect systems from the network
+  - Shut down technology assets
+  - Engage law enforcement
+  - Invoke BCP/DRP
+- Are third-party roles (MSSP, CSP, law enforcement) addressed?
+- CSF: GV.RR-02.R1, GV.RR-02.R2
+
+**Element 6 — Prioritization, Severity, and Recovery Guidelines**
+- Are severity/priority tiers defined (e.g., P1–P4 or Critical/High/Medium/Low)?
+- Are risk evaluation factors listed? SP 800-61r3 RS.MA.N2 factors: asset criticality, functional impact, data impact, stage of observed activity, threat actor characterization, recoverability
+- Are recovery initiation criteria defined? (RS.MA-05.R1)
+- Are time-based response SLAs included?
+
+**Element 7 — Performance Measures**
+- Are IR program performance metrics defined?
+- Minimum acceptable metrics: MTTD, MTTR, % incidents within SLA, % staff IR-trained annually
+- Is there a review cycle for measuring performance against these metrics?
+- CSF: ID.IM-01.R1 — "Periodically evaluate IR program performance to identify problems"
+
+**Element 8 — Shared Responsibility Model**
+- If the org uses MSSPs/CSPs/ISPs: are their responsibilities documented?
+- Are contracts/SLAs referenced?
+- Are information flow and coordination authorities defined?
+- Are restrictions on third-party actions documented (e.g., MSSP cannot share sanitized incident data with other customers)?
+- CSF: GV.SC-08, GV.RR-02.R3
+
+---
+
+## Roles & Authorities Deep Check (GV.RR-02)
+
+**R1 — All IR roles documented in organizational policies:**
+- [ ] Incident Response Team / SOC roles documented
+- [ ] Legal counsel role in IR documented
+- [ ] HR role documented — specifically scoped to insider threat (RS.CO-03.R3)
+- [ ] Public affairs / media relations role documented
+- [ ] Physical security and facilities role documented
+- [ ] Asset owners role documented
+- [ ] Leadership decision authority documented
+
+**R2 — All appropriate individuals designated the authority to fulfill IR responsibilities:**
+- [ ] Named or role-titled individuals with authority to isolate/disconnect systems
+- [ ] Named or role-titled individuals with authority to invoke BCP
+- [ ] Named or role-titled individuals with authority to engage law enforcement
+- [ ] Escalation/elevation triggers and authority chain documented (RS.MA-04)
+
+---
+
+## Plan Framework Check (ID.IM-04)
+
+SP 800-61r3 ID.IM-04 has four recommendations (R-items) for all cybersecurity plans:
+
+| Req | Check | Common Finding |
+|-----|-------|----------------|
+| R1 | BCP/DRP aligned with IR plan | "Referenced" but not formally linked or exercised together |
+| R2 | Plans reviewed/updated periodically or when significant improvements are identified | Annual cycle stated but no post-incident trigger |
+| R3 | Plans tailored to org's unique requirements, mission, size | Generic NIST template with org name substituted |
+| R4 | Plans identify resources and management support needed | "Resources as needed" boilerplate — no specifics |
+
+---
+
+## Regulatory & Notification Check (GV.OC-03)
+
+GV.OC-03.R1 — Cybersecurity requirements include all IR-related requirements (incident notification, data breach reporting).
+
+**By organization type:**
+
+| Org Type | Key Regulations to Check |
+|----------|--------------------------|
+| Federal agency | FISMA 2014, CISA Federal Incident Notification Guidelines (formerly US-CERT; 1-hour notification to CISA), annual OMB FISMA guidance (major incident reporting) |
+| Healthcare | HIPAA Breach Notification Rule (60-day), HHS reporting |
+| Financial | GLBA, NY DFS Part 500 (72-hour), SEC Form 8-K Item 1.05 (four business days after materiality determination) |
+| Critical infrastructure | CIRCIA (72-hour covered incident report / 24-hour ransom payment report to CISA) |
+| EU operations | GDPR Article 33 (72-hour supervisory authority) |
+| Payment card | PCI DSS 12.10.1 (IR plan must cover notification of payment brands and acquirers) |
+
+- [ ] Breach notification timelines addressed (RS.CO-02.R3)
+- [ ] Law enforcement notification criteria defined (RS.CO-02.R5)
+- [ ] Regulatory body notification criteria defined
diff --git a/skills/nist-800-61r3-recommendation-auditor/SKILL.md b/skills/nist-800-61r3-recommendation-auditor/SKILL.md
index f8c2868..7b2c2f3 100644
--- a/skills/nist-800-61r3-recommendation-auditor/SKILL.md
+++ b/skills/nist-800-61r3-recommendation-auditor/SKILL.md
@@ -1,197 +1,94 @@
 ---
 name: nist-800-61r3-recommendation-auditor
-description: "Audit whether a cybersecurity document satisfies the R-tagged (Recommendation) items from NIST SP 800-61r3's CSF 2.0 Community Profile. R-items are the things SP 800-61r3 says organizations *should* do — the highest-signal findings. Outputs a Met/Partial/Not Met assessment per R-item with direct quotes from the standard. Use when asked to 'check all the recommendations', 'audit R items', or 'what recommendations aren't met'."
+description: "Audit whether a cybersecurity document satisfies the R-tagged (Recommendation) items from NIST SP 800-61r3's CSF 2.0 Community Profile (April 2025). R-items are the highest-obligation findings — 'should do' requirements. Outputs Met/Partial/Not Met per R-item with direct quotes. Use when asked to 'check all the recommendations', 'audit R items', 'what recommendations aren't met', 'NIST 800-61 compliance check', or 'SP 800-61r3 gap analysis'."
 user-invocable: true
 ---
 
 # NIST SP 800-61r3 Recommendation Auditor
 
-Audit whether a document satisfies the R-tagged recommendations from the SP 800-61r3 CSF 2.0 Community Profile (April 2025).
+Audit whether a document satisfies the R-tagged recommendations from NIST SP 800-61r3's CSF 2.0 Community Profile (April 2025). The full 89-item R-item catalog is in `references/r-item-catalog.md` — load it before beginning any audit.
 
 ---
 
-## Goal
+## Mindset
 
-SP 800-61r3 uses three annotation types in its Community Profile tables:
-- **R** = Recommendation: "the organization **should** do this"
-- **C** = Consideration: "the organization **should consider** doing this"
-- **N** = Note: informational only
+Expert auditors apply these heuristics before touching a single R-item:
 
-This skill focuses on **R-items** (mandatory-intent recommendations) — these are the highest-signal findings in an audit. C-items are assessed secondarily. N-items are informational only and not scored.
+1. **Burden of proof is on the document, not the auditor.** If the document doesn't explicitly describe the practice an R-item calls for, the item is Not Met. Treat absence of evidence as evidence of absence, and don't infer from organizational context.
+
+2. **Partial beats Not Met only when sub-requirements are enumerable.** An R-item like `RS.MA-02.R1` requires *both* severity AND urgency. If the document has one but not the other, that's Partial. If it has neither, it's Not Met. Don't split Partials on vague grounds.
+
+3. **Scope discipline prevents spurious findings.** An IR Plan is audited against every Table 3 R-item, not just the ones it obviously addresses. R-items outside the document type's scope are N/A, not Not Met. The document-type scope map is in `references/r-item-catalog.md`.
+
+4. **Sequence matters: detect → triage → contain → analyze → communicate → recover.** R-items follow this incident response lifecycle. Gaps in early-phase items (DE.AE, RS.MA) cascade: an organization that can't declare an incident (DE.AE-08.R1) can't satisfy any RS.MA item correctly.
+
+5. **Cross-reference for coherence gaps.** If a document claims RS.CO-02.R2 is met ("we follow established reporting procedures") but those procedures aren't named or referenced, flag as Partial — citing a phantom document is not evidence.
 
 ---
 
-## Inputs
+## Navigation
 
-- Document to audit (text, file path, or description)
-- Document type (ask if unclear)
-- Scope: Full audit (all R-items) or targeted (specific CSF Function)
+**Use this skill when:**
+- A user provides an IR plan, playbook, incident report, after-action report, or policy document and asks for gap analysis against NIST SP 800-61r3
+- The user explicitly mentions "R-items", "SP 800-61r3 recommendations", "CSF 2.0 Community Profile", or "audit"
+- Feeding results into `nist-800-61r3-maturity-scorer`
+
+**Do NOT use this skill when:**
+- The document is pre-2025 and references NIST SP 800-61 Rev. 2 only — the R/C/N annotation system did not exist in Rev. 2; use a different mapping
+- The user wants NIST CSF 2.0 *Core* compliance (not 800-61r3 Community Profile) — different catalog
+- The document is a vendor product brochure — R-items apply to *organizational* processes, not tool capabilities
+
+**Scope decision tree:**
+
+```
+Is scope specified? ──Yes──► Use specified CSF Functions
+       │
+       No
+       │
+What document type?
+  IR Plan          ──► All Table 3 + Table 2 governance items
+  Playbook         ──► RS.MA + RS.AN + RS.MI + RC.RP + DE.AE-08.R1
+  Incident Report  ──► RS.MA-02 through -05 + RS.AN-03/06/07/08 + RS.CO-02/03 + RC.RP-06
+  After-Action     ──► RS.AN-03 + RC.RP-06.R1 + RS.AN-06.R1 + RS.AN-07.R1
+  IR Policy        ──► GV.PO.R1 + GV.RR-02 + ID.IM-04 + GV.OC-03.R1
+  Unknown          ──► Ask user to clarify; default to Table 3 full scope
+```
+
+---
+
+## Philosophy
+
+The auditor's single function is to determine whether written, documented evidence satisfies each NIST recommendation — not whether the organization *probably* does the right thing. A well-intentioned organization with no written artifact scores Not Met, because the R-item requires the practice to be expressible, communicable, and auditable.
 
 ---
 
 ## Workflow
 
+**Before auditing: load the catalog**
+
+Read `references/r-item-catalog.md` to get the full 89-item R-item list and the document-type scope map. Do not rely on memory for R-item text — always quote from the catalog verbatim.
+
 ```
-1. Load R-item Catalog
+1. Load R-item Catalog (references/r-item-catalog.md)
    ↓
-2. Classify Document & Set Scope
+2. Classify Document & Set Scope (scope map in catalog)
    ↓
-3. Audit Each R-item
+3. Audit Each In-Scope R-item (Met / Partial / Not Met / N/A)
    ↓
-4. Score Results
+4. Score Results (formula in catalog)
    ↓
-5. Generate Audit Report
+5. Generate Audit Report (format below)
 ```
 
-### Step 1: R-item Catalog
-
-The complete R-item catalog from SP 800-61r3, grouped by priority tier:
-
-**HIGH PRIORITY — Table 3 (all R-items here are critical)**
-
-**DE.CM (Continuous Monitoring)**
-- `DE.CM.R1` — Continuous monitoring must cover: networks/network services; computing HW/SW/runtime/data; physical environment; personnel activity and technology usage; external service provider activities
-- `DE.CM.R2` — Tune monitoring technologies to reduce false positives and false negatives to acceptable levels
-- `DE.CM-01.R1` — Monitoring must include wired/wireless networks, network communications/flows, network services (DNS, BGP), and unauthorized/rogue networks
-- `DE.CM-02.R1` — Physical environment monitoring must include all access attempts, movement of people/equipment, and tampering signs
-- `DE.CM-03.R1` — Personnel activity monitoring must include anomalous user activity, authentication attempts, and deception technology
-- `DE.CM-06.R1` — External provider monitoring must include remote/on-site admin, maintenance activities, deviations from expected cloud/ISP behavior
-- `DE.CM-09.R1` — Monitor email, web, file sharing, collaboration services for malware/phishing/exfil
-- `DE.CM-09.R2` — Monitor authentication attempts for credential attacks and unauthorized credential use
-- `DE.CM-09.R3` — Monitor software/hardware configurations for deviations from security baselines
-- `DE.CM-09.R4` — Monitor hardware/software including cybersecurity protection mechanisms for tampering/failure/compromise
-- `DE.CM-09.R5` — Monitor endpoints for cyber health issues (missing patches, malware, unauthorized software)
-
-**DE.AE (Adverse Event Analysis)**
-- `DE.AE.R1` — Rely on technical solutions to filter large event datasets to human-viewable subsets
-- `DE.AE.R2` — Proactively find incidents earlier in the attack life cycle
-- `DE.AE-02.R1` — Use SIEM/SOAR to continuously monitor log events for known malicious/suspicious activity
-- `DE.AE-02.R2` — Use up-to-date CTI in log analysis tools to improve detection accuracy
-- `DE.AE-02.R3` — Regularly conduct manual reviews for technologies that cannot be sufficiently auto-monitored
-- `DE.AE-03.R1` — Constantly transfer log data from sources to a small number of log servers
-- `DE.AE-03.R2` — Use event correlation technology (SIEM/SOAR) to gather related data from multiple sources
-- `DE.AE-03.R3` — Use CTI to correlate events among log sources
-- `DE.AE-04.R1` — Estimate impact/scope through automated (SIEM/SOAR) and/or manual means; review and refine estimates
-- `DE.AE-06.R1` — Generate alerts and provide to cybersecurity/IR tools and staff (SOC, incident responders)
-- `DE.AE-06.R2` — Make log analysis findings accessible to incident responders at all times
-- `DE.AE-07.R1` — Integrate up-to-date CTI and asset inventories into adverse event analysis
-- `DE.AE-07.R2` — Rapidly acquire and analyze vulnerability disclosures from suppliers/vendors/advisories
-- `DE.AE-08.R1` — Apply incident criteria to analyzed activity; consider known false positives to determine incident declaration
-
-**RS.MA (Incident Management)**
-- `RS.MA.R1` — Do not handle incidents on first-come, first-served basis — use risk evaluation factors
-- `RS.MA.R2` — Base triage/prioritization/escalation/recovery on risk evaluation factors
-- `RS.MA.R3` — Track incident response status with: summary, IoCs, status/timeframe per action, next steps
-- `RS.MA-01.R1` — Detection technologies should automatically report confirmed incidents
-- `RS.MA-02.R1` — Perform preliminary review to verify incident occurred, estimate severity and urgency
-- `RS.MA-02.R2` — Have mechanisms for third parties to report possible incidents; monitor reports seriously
-- `RS.MA-03.R1` — Perform detailed review to categorize by incident type (data breach, ransomware, account takeover, DoS)
-- `RS.MA-03.R2` — Prioritize based on scope, likely impact, time-critical nature, resource availability
-- `RS.MA-03.R3` — Select incident response strategies balancing recovery speed vs. investigation need
-- `RS.MA-04.R1` — Track and validate status of all ongoing incidents to identify escalation needs
-- `RS.MA-05.R1` — Apply incident recovery criteria to determine when recovery should be initiated
-- `RS.MA-05.R2` — Consider operational disruption of recovery activities when deciding recovery timing
-
-**RS.AN (Incident Analysis)**
-- `RS.AN-03.R1` — Determine sequence of events and which assets/resources were involved in each event
-- `RS.AN-03.R2` — Determine which vulnerabilities, threats, and threat actors were involved (direct/indirect)
-- `RS.AN-03.R3` — Analyze incident for underlying or systemic root causes
-- `RS.AN-03.R4` — Check any deployed cyber deception technology for additional attacker behavior info
-- `RS.AN-06.R1` — Safeguard confidentiality and integrity of IR records; restrict to authorized personnel only
-- `RS.AN-07.R1` — Collect and retain evidence per evidence preservation procedures and data retention policies; consider prosecution possibility
-- `RS.AN-08.R1` — Look for IoCs, evidence of persistence, and other signs on both targeted and potential target assets
-
-**RS.CO (Incident Reporting and Communication)**
-- `RS.CO.R1` — Have mechanisms in place in advance to coordinate with affected parties about incidents
-- `RS.CO-02.R1` — When incident analyzed/prioritized, coordinate with appropriate individuals inside and outside the organization
-- `RS.CO-02.R2` — Follow established procedures: what must be reported, to whom, at what times
-- `RS.CO-02.R3` — Perform notifications in compliance with incident notification laws/regulations for organization's sectors and geographies
-- `RS.CO-02.R4` — Notify affected third parties of data breaches per regulatory, legal, contractual requirements
-- `RS.CO-02.R5` — Notify law enforcement and regulatory bodies per criteria in IR plan and management approval
-- `RS.CO-03.R1` — Securely share information consistent with response plans and information sharing agreements
-- `RS.CO-03.R2` — Regularly update senior leadership on status of major incidents
-- `RS.CO-03.R3` — Notify human resources when malicious insider activity has occurred
-- `RS.CO-03.R4` — Establish and follow media communications procedures for IR that comply with org policies
-
-**RS.MI (Incident Mitigation)**
-- `RS.MI-01.R1` — Allow incident handlers to manually select and perform containment actions in addition to or instead of automated measures
-- `RS.MI-02.R1` — Identify all affected hosts and services so all flaws and weaknesses can be remediated
-
-**RC.RP (Recovery Plan Execution)**
-- `RC.RP-01.R1` — Begin recovery procedures during or after incident response processes
-- `RC.RP-01.R2` — Inform all individuals with recovery responsibilities about plans and required authorizations
-- `RC.RP-02.R1` — Recovery actions must account for timeliness, precision, and reliability
-- `RC.RP-02.R2` — Select recovery actions based on IR plan criteria and available resources
-- `RC.RP-02.R3` — Change planned recovery actions based on reassessment of needs/resources
-- `RC.RP-03.R1` — Check restoration assets for indicators of compromise, file corruption, and integrity issues before use
-- `RC.RP-04.R1` — Validate that essential services are restored in the appropriate order
-- `RC.RP-04.R2` — Work with system owners to confirm successful restoration and return to normal operations
-- `RC.RP-04.R3` — Monitor performance of restored systems to verify adequacy of restoration
-- `RC.RP-05.R1` — Check restored assets for IoCs; remediate root causes before production use
-- `RC.RP-05.R2` — Verify correctness and adequacy of restoration actions before putting restored system online
-- `RC.RP-06.R1` — Prepare an after-action report documenting the incident, response/recovery actions, and lessons learned
-
-**RC.CO (Recovery Communication)**
-- `RC.CO-03.R1` — Securely share recovery information including restoration progress
-- `RC.CO-03.R2` — Regularly update senior leadership on recovery status and restoration progress for major incidents
-- `RC.CO-03.R3` — Follow contract rules for incident information sharing between org and suppliers
-- `RC.CO-03.R4` — Coordinate crisis communication between the organization and its critical suppliers
-- `RC.CO-04.R1` — Follow org's breach notification procedures for recovering from a data breach incident
-- `RC.CO-04.R2` — Explain the steps being taken to recover and to prevent recurrence
-
-**MEDIUM PRIORITY — Table 2 (selected)**
-
-- `GV.OC-03.R1` — Cybersecurity requirements include all IR-related requirements (incident notification, data breach reporting)
-- `GV.RM-03.R1` — Incident-related decision-making informed by other risk types (privacy, operational, safety, reputational, AI)
-- `GV.RM-06` — Have a standardized method for calculating, documenting, categorizing, and prioritizing cybersecurity risks
-- `GV.RR.R1` — Cybersecurity roles/responsibilities/authorities should include incident response
-- `GV.RR-02.R1` — All roles/responsibilities involving IR should be documented in organizational policies
-- `GV.RR-02.R2` — All appropriate individuals/parties should be designated the authority to fulfill IR responsibilities
-- `GV.PO.R1` — Cybersecurity policies should include an incident response policy
-- `GV.SC-05.R1` — Supply chain risk management requirements include cybersecurity performance, vulnerability/threat/incident disclosure/sharing
-- `GV.SC-08` — Relevant suppliers included in incident planning, response, and recovery activities
-- `ID.AM-01.R1` — Make current, auto-updated inventories of internal/external hardware available for vulnerability finding, monitoring, and shadow IT detection
-- `ID.AM-02.R1` — Make current, auto-updated inventories of internal/external SW/services available
-- `ID.RA-01.R1` — Understand current vulnerabilities to make informed risk decisions (all types: FW, misconfigs, design weaknesses, physical, integrity violations)
-- `ID.RA-03.R1` — Identify internal/external threats during routine operations and from CTI
-- `ID.RA-05.R1` — Use existing risk estimation mechanisms for incident response purposes
-- `ID.IM-01.R1` — Periodically evaluate IR program performance to identify problems and deficiencies
-- `ID.IM-04.R1` — Synchronize business continuity plans with incident response plans
-- `ID.IM-04.R2` — Review and update all cybersecurity plans periodically or when significant improvements are needed
-- `ID.IM-04.R3` — Base each cybersecurity plan on the organization's unique requirements, mission, size, structure
-- `ID.IM-04.R4` — Each plan identifies resources and management support needed
-- `PR.AT-02.R1` — Role-based training includes incident-related responsibilities
-- `PR.DS-11` — Backups created, protected, maintained, and tested (particularly important for recovery)
-
-### Step 2: Classify Document & Set Scope
-
-Determine relevant R-item scope based on document type:
-
-| Document Type       | Primary R-item Scope |
-|---------------------|----------------------|
-| IR Plan             | All Table 3 R-items + ID.IM-04.R1–R4 + GV.PO.R1 + GV.RR-02.R1/R2 |
-| Playbook            | RS.MA, RS.AN, RS.MI, RC.RP R-items + DE.AE-08.R1 |
-| Incident Report     | RS.MA-02–05, RS.AN-03, RS.AN-06/07/08, RS.CO-02/03, RC.RP-06 |
-| After-Action Report | RS.AN-03, RC.RP-06.R1, ID.IM-03 (no R), RS.AN-06.R1, RS.AN-07.R1 |
-| IR Policy           | GV.PO.R1, GV.RR-02.R1/R2, ID.IM-04.R1–R4, GV.OC-03.R1 |
-
-### Step 3: Audit Each R-item
+### Auditing Each R-item
 
 For each in-scope R-item:
-1. Search document for evidence satisfying the requirement
+1. Search document for direct evidence satisfying the requirement
 2. Assign status:
-   - **Met** — document clearly satisfies the R-item requirement
-   - **Partial** — document addresses the concept but misses key sub-requirements
+   - **Met** — document clearly satisfies all sub-requirements; quote the supporting text
+   - **Partial** — document addresses the concept but misses named sub-requirements; specify the gap
    - **Not Met** — no evidence found
-   - **N/A** — not applicable to this document type
-
-### Step 4: Score Results
-
-Count Met/Partial/Not Met across priority tiers. Compute:
-- High-priority R-items: Met + Partial/2 ÷ Total
-- Weighted score: (High met × 3) + (Medium met × 2) ÷ total weighted
-
-### Step 5: Generate Audit Report
+   - **N/A** — not applicable to this document type per scope map
 
 ---
 
@@ -209,41 +106,29 @@ R-items audited: [N]
     Requirement: "Apply incident criteria to known and assumed characteristics
     of analyzed activity, and consider known false positives to determine
     whether an incident should be declared."
-    Finding: No incident declaration criteria or thresholds defined in document.
+    Finding: No incident declaration criteria or thresholds defined.
     Severity: Critical — incidents may be declared inconsistently or too late.
 
-[✗] RS.MA-03.R1 — Incident Categorization
-    Requirement: "Perform a more detailed review to categorize incidents by
-    type (e.g., data breach, ransomware, account takeover, denial of service)."
-    Finding: Document references 'incident types' but provides no categorization
-    taxonomy or guidance for categorizing a specific incident.
-    Severity: Significant.
-
 ━━━ PARTIAL — High Priority (Table 3) ━━━
 
 [~] RS.MA-02.R1 — Triage and Validation
     Requirement: "Perform preliminary review... estimate severity AND urgency."
-    Finding: Severity tiers (P1–P4) are defined, but no urgency or time-based
-    response SLA criteria are present.
+    Finding: Severity tiers (P1–P4) defined but no urgency/SLA criteria present.
     Gap: Add time-to-respond criteria alongside severity classification.
 
 ━━━ MET — High Priority (Table 3) ━━━
 
 [✓] RS.MI-01.R1 — Manual Containment Selection
-    Evidence: §5.2 "Incident handlers may override automated containment and
-    manually select containment actions based on incident context."
-
-...
+    Evidence: §5.2 "Incident handlers may override automated containment."
 
 ━━━ MEDIUM PRIORITY (Table 2) ━━━
 [✗] GV.PO.R1 — IR policy not referenced in document
 [✓] ID.IM-04.R2 — Review cycle stated as annual in §1.3
-...
 
 ━━━ SCORECARD ━━━
 High-priority R-items:   Met: 14  Partial: 6  Not Met: 8  (Total: 28)
 Medium-priority R-items: Met: 8   Partial: 3  Not Met: 7  (Total: 18)
-Recommendation compliance: 57% (High), 49% (Medium)
+High compliance: 61%  |  Weighted composite: 48%
 
 Top 3 highest-impact Not Met items:
 1. DE.AE-08.R1 — Incident declaration criteria (blocks all RS.MA)
@@ -253,6 +138,31 @@ Top 3 highest-impact Not Met items:
 
 ---
 
+## NEVER
+
+- **NEVER mark an R-item Met without a direct quote from the document** — inference, organizational reputation, or "they probably do this" does not count; a reviewer must be able to turn to the page and see it
+- **NEVER conflate C-items (should consider) with R-items (should do)** — they have different obligation weights; mixing them inflates compliance scores and masks real gaps; if both apply to an element, surface the R-item first
+- **NEVER omit verbatim R-item text from SP 800-61r3 in any Not Met or Partial finding** — without the exact requirement, remediation authors can't write to the standard; the finding becomes unactionable
+- **NEVER assess a document at a scope broader than its stated purpose** — for an incident playbook, GV.PO.R1 (IR Policy) is N/A, not Not Met; marking it Not Met records a gap that does not exist and distorts program-level scoring
+- **NEVER skip RS.MA.R1 (risk-based prioritization) when RS.MA-03.R2 or RS.MA-03.R3 appears met** — organizations frequently document categorization criteria but fail to document the risk *weighting* mechanism; these are different requirements
+- **NEVER accept a reference to an external document as Met without verifying the external document was provided** — "see our SOP-IR-001" is Partial at best; the R-item is met only if the SOP is in scope and its content is evidenced
+- **NEVER conflate RC.RP-05.R1 (pre-production IoC check) with RS.AN-08.R1 (ongoing persistence check)** — similar language, different lifecycle stage; conflating them produces duplicate Mets and hidden Not Mets
+
+---
+
+## When Things Go Wrong
+
+| Situation | Diagnosis | Response |
+|-----------|-----------|----------|
+| Document is >100 pages (large IR program, full policy suite) | Full audit will be noisy; analyst loses thread | Ask user which CSF Function or Table to prioritize; audit one section at a time and aggregate |
+| Document is ambiguous — could be an IR plan or a playbook | Scope map produces different R-item sets; wrong call invalidates the audit | Ask: "Is this a program-level plan or a procedural runbook?" before setting scope |
+| R-item text partially satisfied by two different sections | Both sections are fragmented; individually each is Partial | Mark Partial, quote both sections, note that combined they approach Met but neither section is independently complete |
+| User says "just give me a quick pass, skip the details" | Temptation to skip Not Met reasoning | Still assign Met/Partial/Not Met for every in-scope item; compress the rationale to one line, but do not omit the status |
+| Document uses different terminology (e.g., "severity matrix" vs. NIST's "risk evaluation factors") | Terminology mismatch may mask genuine compliance | Map the term, quote it, mark Met or Partial based on whether the *content* satisfies the requirement — terminology difference alone is not a finding |
+| Only a summary or excerpt is provided (not the full document) | Scope is necessarily incomplete | State clearly in the report header: "Audit based on excerpt — findings reflect visible content only; full document may satisfy additional R-items" |
+
+---
+
 ## Deliverable
 
 A citation-anchored audit report with Met/Partial/Not Met assessments for all in-scope R-items, suitable for:
@@ -262,9 +172,8 @@ A citation-anchored audit report with Met/Partial/Not Met assessments for all in
 
 ---
 
-## NEVER
+## Reference Files
 
-- **NEVER mark an R-item Met without direct documentary evidence** — inference does not count
-- **NEVER omit the verbatim R-item text from SP 800-61r3** — every finding must include the exact requirement
-- **NEVER conflate C-items with R-items** — C-items are "should consider" (lower obligation); track them separately
-- **NEVER assess C-items unless R-items for the same element are fully covered** — prioritize the mandatory recommendations first
+| File | Purpose | Load When |
+|------|---------|-----------|
+| `references/r-item-catalog.md` | Complete R-item catalog with scope maps and scoring formula | Always — load before step 1 |
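The status rules and NEVER list in this skill can be read as validation checks on each finding. Below is a minimal Python sketch of that idea, not part of the skill itself. The `Finding` and `Status` names, the field names, and the message strings are all hypothetical, and only three of the rules are modeled: Met needs a quote, Partial and Not Met need the verbatim requirement, and Partial needs a named gap.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    MET = "Met"
    PARTIAL = "Partial"
    NOT_MET = "Not Met"
    NA = "N/A"


@dataclass(frozen=True)
class Finding:
    """One audited R-item (hypothetical record shape, for illustration only)."""
    r_item: str
    status: Status
    requirement: str = ""      # verbatim R-item text from SP 800-61r3
    evidence_quote: str = ""   # direct quote from the audited document
    gap: str = ""              # the missing sub-requirement, for Partial

    def problems(self) -> list[str]:
        """Return the rule violations for this finding (empty list if none)."""
        issues = []
        if self.status is Status.MET and not self.evidence_quote.strip():
            issues.append("Met requires a direct quote from the document")
        if self.status in (Status.PARTIAL, Status.NOT_MET) and not self.requirement.strip():
            issues.append("Partial/Not Met requires the verbatim R-item text")
        if self.status is Status.PARTIAL and not self.gap.strip():
            issues.append("Partial requires the missing sub-requirement to be named")
        return issues


# Usage: the quote is the one shown in the sample report's Met section.
met = Finding(
    "RS.MI-01.R1",
    Status.MET,
    evidence_quote="Incident handlers may override automated containment.",
)
incomplete = Finding("RS.MA-02.R1", Status.PARTIAL)
print(met.problems())
print(incomplete.problems())
```

Running a check like this before the report is rendered would catch a Met status with no quote early, which is the failure the first NEVER rule targets.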
diff --git a/skills/nist-800-61r3-recommendation-auditor/references/r-item-catalog.md b/skills/nist-800-61r3-recommendation-auditor/references/r-item-catalog.md
new file mode 100644
index 0000000..61cb673
--- /dev/null
+++ b/skills/nist-800-61r3-recommendation-auditor/references/r-item-catalog.md
@@ -0,0 +1,189 @@
+# NIST SP 800-61r3 R-Item Catalog (April 2025)
+
+Full catalog of Recommendation (R) items from the SP 800-61r3 CSF 2.0 Community Profile.
+R-items represent "the organization **should** do this" — highest obligation level.
+C-items ("should consider") are tracked separately; N-items are informational only.
+
+---
+
+## HIGH PRIORITY — Table 3 (Detect/Respond/Recover functions)
+
+### DE.CM (Continuous Monitoring)
+
+| ID | Requirement |
+|----|-------------|
+| `DE.CM.R1` | Continuous monitoring should cover: networks/network services; computing HW/SW/runtime/data; physical environment; personnel activity and technology usage; external service provider activities |
+| `DE.CM.R2` | Tune monitoring technologies to reduce false positives and false negatives to acceptable levels |
+| `DE.CM-01.R1` | Monitoring should include wired/wireless networks, network communications/flows, network services (DNS, BGP), and unauthorized/rogue networks |
+| `DE.CM-02.R1` | Physical environment monitoring should include all access attempts, movement of people/equipment, and signs of tampering |
+| `DE.CM-03.R1` | Personnel activity monitoring should include anomalous user activity, authentication attempts, and deception technology |
+| `DE.CM-06.R1` | External provider monitoring should include remote/on-site admin, maintenance activities, and deviations from expected cloud/ISP behavior |
+| `DE.CM-09.R1` | Monitor email, web, file sharing, collaboration services for malware/phishing/exfil |
+| `DE.CM-09.R2` | Monitor authentication attempts for credential attacks and unauthorized credential use |
+| `DE.CM-09.R3` | Monitor software/hardware configurations for deviations from security baselines |
+| `DE.CM-09.R4` | Monitor hardware/software including cybersecurity protection mechanisms for tampering/failure/compromise |
+| `DE.CM-09.R5` | Monitor endpoints for cyber health issues (missing patches, malware, unauthorized software) |
+
+### DE.AE (Adverse Event Analysis)
+
+| ID | Requirement |
+|----|-------------|
+| `DE.AE.R1` | Rely on technical solutions to filter large event datasets to human-viewable subsets |
+| `DE.AE.R2` | Proactively find incidents earlier in the attack life cycle |
+| `DE.AE-02.R1` | Use SIEM/SOAR to continuously monitor log events for known malicious/suspicious activity |
+| `DE.AE-02.R2` | Use up-to-date CTI in log analysis tools to improve detection accuracy |
+| `DE.AE-02.R3` | Regularly conduct manual reviews for technologies that cannot be sufficiently auto-monitored |
+| `DE.AE-03.R1` | Constantly transfer log data from sources to a small number of log servers |
+| `DE.AE-03.R2` | Use event correlation technology (SIEM/SOAR) to gather related data from multiple sources |
+| `DE.AE-03.R3` | Use CTI to correlate events among log sources |
+| `DE.AE-04.R1` | Estimate impact/scope through automated (SIEM/SOAR) and/or manual means; review and refine estimates |
+| `DE.AE-06.R1` | Generate alerts and provide to cybersecurity/IR tools and staff (SOC, incident responders) |
+| `DE.AE-06.R2` | Make log analysis findings accessible to incident responders at all times |
+| `DE.AE-07.R1` | Integrate up-to-date CTI and asset inventories into adverse event analysis |
+| `DE.AE-07.R2` | Rapidly acquire and analyze vulnerability disclosures from suppliers/vendors/advisories |
+| `DE.AE-08.R1` | Apply incident criteria to analyzed activity; consider known false positives to determine incident declaration |
+
+### RS.MA (Incident Management)
+
+| ID | Requirement |
+|----|-------------|
+| `RS.MA.R1` | Do not handle incidents on first-come, first-served basis — use risk evaluation factors |
+| `RS.MA.R2` | Base triage/prioritization/escalation/recovery on risk evaluation factors |
+| `RS.MA.R3` | Track incident response status with: summary, IoCs, status/timeframe per action, next steps |
+| `RS.MA-01.R1` | Detection technologies should automatically report confirmed incidents |
+| `RS.MA-02.R1` | Perform preliminary review to verify incident occurred, estimate severity and urgency |
+| `RS.MA-02.R2` | Have mechanisms for third parties to report possible incidents; monitor reports seriously |
+| `RS.MA-03.R1` | Perform detailed review to categorize by incident type (data breach, ransomware, account takeover, DoS) |
+| `RS.MA-03.R2` | Prioritize based on scope, likely impact, time-critical nature, resource availability |
+| `RS.MA-03.R3` | Select incident response strategies balancing recovery speed vs. investigation need |
+| `RS.MA-04.R1` | Track and validate status of all ongoing incidents to identify escalation needs |
+| `RS.MA-05.R1` | Apply incident recovery criteria to determine when recovery should be initiated |
+| `RS.MA-05.R2` | Consider operational disruption of recovery activities when deciding recovery timing |
+
+### RS.AN (Incident Analysis)
+
+| ID | Requirement |
+|----|-------------|
+| `RS.AN-03.R1` | Determine sequence of events and which assets/resources were involved in each event |
+| `RS.AN-03.R2` | Determine which vulnerabilities, threats, and threat actors were involved (direct/indirect) |
+| `RS.AN-03.R3` | Analyze incident for underlying or systemic root causes |
+| `RS.AN-03.R4` | Check any deployed cyber deception technology for additional attacker behavior info |
+| `RS.AN-06.R1` | Safeguard confidentiality and integrity of IR records; restrict to authorized personnel only |
+| `RS.AN-07.R1` | Collect and retain evidence per evidence preservation procedures and data retention policies; consider prosecution possibility |
+| `RS.AN-08.R1` | Look for IoCs, evidence of persistence, and other signs on both targeted and potential target assets |
+
+### RS.CO (Incident Reporting and Communication)
+
+| ID | Requirement |
+|----|-------------|
+| `RS.CO.R1` | Have mechanisms in place in advance to coordinate with affected parties about incidents |
+| `RS.CO-02.R1` | When incident analyzed/prioritized, coordinate with appropriate individuals inside and outside the organization |
+| `RS.CO-02.R2` | Follow established procedures: what must be reported, to whom, at what times |
+| `RS.CO-02.R3` | Perform notifications in compliance with incident notification laws/regulations for organization's sectors and geographies |
+| `RS.CO-02.R4` | Notify affected third parties of data breaches per regulatory, legal, contractual requirements |
+| `RS.CO-02.R5` | Notify law enforcement and regulatory bodies per criteria in IR plan and management approval |
+| `RS.CO-03.R1` | Securely share information consistent with response plans and information sharing agreements |
+| `RS.CO-03.R2` | Regularly update senior leadership on status of major incidents |
+| `RS.CO-03.R3` | Notify human resources when malicious insider activity has occurred |
+| `RS.CO-03.R4` | Establish and follow media communications procedures for IR that comply with org policies |
+
+### RS.MI (Incident Mitigation)
+
+| ID | Requirement |
+|----|-------------|
+| `RS.MI-01.R1` | Allow incident handlers to manually select and perform containment actions in addition to or instead of automated measures |
+| `RS.MI-02.R1` | Identify all affected hosts and services so all flaws and weaknesses can be remediated |
+
+### RC.RP (Recovery Plan Execution)
+
+| ID | Requirement |
+|----|-------------|
+| `RC.RP-01.R1` | Begin recovery procedures during or after incident response processes |
+| `RC.RP-01.R2` | Inform all individuals with recovery responsibilities about plans and required authorizations |
+| `RC.RP-02.R1` | Recovery actions should account for timeliness, precision, and reliability |
+| `RC.RP-02.R2` | Select recovery actions based on IR plan criteria and available resources |
+| `RC.RP-02.R3` | Change planned recovery actions based on reassessment of needs/resources |
+| `RC.RP-03.R1` | Check restoration assets for indicators of compromise, file corruption, and integrity issues before use |
+| `RC.RP-04.R1` | Validate that essential services are restored in the appropriate order |
+| `RC.RP-04.R2` | Work with system owners to confirm successful restoration and return to normal operations |
+| `RC.RP-04.R3` | Monitor performance of restored systems to verify adequacy of restoration |
+| `RC.RP-05.R1` | Check restored assets for IoCs; remediate root causes before production use |
+| `RC.RP-05.R2` | Verify correctness and adequacy of restoration actions before putting restored system online |
+| `RC.RP-06.R1` | Prepare an after-action report documenting the incident, response/recovery actions, and lessons learned |
+
+### RC.CO (Recovery Communication)
+
+| ID | Requirement |
+|----|-------------|
+| `RC.CO-03.R1` | Securely share recovery information including restoration progress |
+| `RC.CO-03.R2` | Regularly update senior leadership on recovery status and restoration progress for major incidents |
+| `RC.CO-03.R3` | Follow contract rules for incident information sharing between org and suppliers |
+| `RC.CO-03.R4` | Coordinate crisis communication between the organization and its critical suppliers |
+| `RC.CO-04.R1` | Follow org's breach notification procedures for recovering from a data breach incident |
+| `RC.CO-04.R2` | Explain the steps being taken to recover and to prevent recurrence |
+
+---
+
+## MEDIUM PRIORITY — Table 2 (Govern/Identify/Protect functions)
+
+| ID | Requirement |
+|----|-------------|
+| `GV.OC-03.R1` | Cybersecurity requirements include all IR-related requirements (incident notification, data breach reporting) |
+| `GV.RM-03.R1` | Incident-related decision-making informed by other risk types (privacy, operational, safety, reputational, AI) |
+| `GV.RM-06` | Have a standardized method for calculating, documenting, categorizing, and prioritizing cybersecurity risks |
+| `GV.RR.R1` | Cybersecurity roles/responsibilities/authorities should include incident response |
+| `GV.RR-02.R1` | All roles/responsibilities involving IR should be documented in organizational policies |
+| `GV.RR-02.R2` | All appropriate individuals/parties should be designated the authority to fulfill IR responsibilities |
+| `GV.PO.R1` | Cybersecurity policies should include an incident response policy |
+| `GV.SC-05.R1` | Supply chain risk management requirements include cybersecurity performance, vulnerability/threat/incident disclosure/sharing |
+| `GV.SC-08` | Relevant suppliers included in incident planning, response, and recovery activities |
+| `ID.AM-01.R1` | Make current, auto-updated inventories of internal/external hardware available for vulnerability finding, monitoring, and shadow IT detection |
+| `ID.AM-02.R1` | Make current, auto-updated inventories of internal/external SW/services available |
+| `ID.RA-01.R1` | Understand current vulnerabilities to make informed risk decisions (all types: FW, misconfigs, design weaknesses, physical, integrity violations) |
+| `ID.RA-03.R1` | Identify internal/external threats during routine operations and from CTI |
+| `ID.RA-05.R1` | Use existing risk estimation mechanisms for incident response purposes |
+| `ID.IM-01.R1` | Periodically evaluate IR program performance to identify problems and deficiencies |
+| `ID.IM-04.R1` | Synchronize business continuity plans with incident response plans |
+| `ID.IM-04.R2` | Review and update all cybersecurity plans periodically or when significant improvements are needed |
+| `ID.IM-04.R3` | Base each cybersecurity plan on the organization's unique requirements, mission, size, structure |
+| `ID.IM-04.R4` | Each plan identifies resources and management support needed |
+| `PR.AT-02.R1` | Role-based training includes incident-related responsibilities |
+| `PR.DS-11` | Backups created, protected, maintained, and tested (particularly important for recovery) |
+
+---
+
+## Quick Count
+
+- **High-priority R-items (Table 3):** 74 items across 8 CSF categories
+- **Medium-priority R-items (Table 2):** 21 items across 10 CSF categories
+- **Total auditable R-items:** 95
+
+---
+
+## Document-Type Scope Map
+
+| Document Type | Primary R-item Scope |
+|---------------|----------------------|
+| IR Plan | All Table 3 R-items + ID.IM-04.R1–R4 + GV.PO.R1 + GV.RR-02.R1/R2 |
+| Playbook | RS.MA, RS.AN, RS.MI, RC.RP R-items + DE.AE-08.R1 |
+| Incident Report | RS.MA-02–05, RS.AN-03, RS.AN-06/07/08, RS.CO-02/03, RC.RP-06 |
+| After-Action Report | RS.AN-03, RC.RP-06.R1, RS.AN-06.R1, RS.AN-07.R1 |
+| IR Policy | GV.PO.R1, GV.RR-02.R1/R2, ID.IM-04.R1–R4, GV.OC-03.R1 |
+| Detection Runbook | DE.CM-01–09, DE.AE-02/03/04/06/07/08 |
+| BCP/DRP | RC.RP-01–06, ID.IM-04.R1, ID.IM-04.R4 |
+
+---
+
+## Scoring Formula
+
+**High-priority compliance rate:**
+`(Met + Partial×0.5) ÷ Total_High × 100`
+
+**Weighted composite:**
+`(High_met × 3 + Med_met × 2) ÷ (High_total × 3 + Med_total × 2) × 100`
+
+**Tier thresholds:**
+- 90%+ High = Strong compliance
+- 75–89% High = Adequate with gaps
+- 60–74% High = Significant gaps
+- <60% High = Non-compliant
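The scoring formula in `r-item-catalog.md` is plain arithmetic, so a short Python sketch may help when checking a scorecard by hand. The `TierCounts` class and the function names are hypothetical. Two assumptions go beyond the catalog: N/A items are excluded from the totals, and a fractional rate between two tiers (for example 89.5%) falls into the lower tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierCounts:
    """Status counts for one priority tier (assumption: N/A items excluded)."""
    met: int
    partial: int
    not_met: int

    @property
    def total(self) -> int:
        return self.met + self.partial + self.not_met


def compliance_rate(tier: TierCounts) -> float:
    """(Met + Partial x 0.5) / Total x 100."""
    if tier.total == 0:
        raise ValueError("no in-scope R-items in this tier")
    return (tier.met + tier.partial * 0.5) / tier.total * 100


def weighted_composite(high: TierCounts, medium: TierCounts) -> float:
    """(High_met x 3 + Med_met x 2) / (High_total x 3 + Med_total x 2) x 100."""
    denominator = high.total * 3 + medium.total * 2
    if denominator == 0:
        raise ValueError("no in-scope R-items in either tier")
    return (high.met * 3 + medium.met * 2) / denominator * 100


def tier_label(high_rate: float) -> str:
    """Map a high-priority compliance rate to the catalog's tier names."""
    if high_rate >= 90:
        return "Strong compliance"
    if high_rate >= 75:
        return "Adequate with gaps"
    if high_rate >= 60:
        return "Significant gaps"
    return "Non-compliant"


# Usage with illustrative counts: 14/6/8 high-priority, 8/3/7 medium-priority.
high = TierCounts(met=14, partial=6, not_met=8)
medium = TierCounts(met=8, partial=3, not_met=7)
high_rate = compliance_rate(high)
print(round(high_rate), round(weighted_composite(high, medium)), tier_label(high_rate))
```

As written in the catalog, the composite counts only Met items, so Partial findings raise the high-priority rate but not the composite.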
diff --git a/skills/nist-800-61r3-shared/references/csf-element-registry.md b/skills/nist-800-61r3-shared/references/csf-element-registry.md
new file mode 100644
index 0000000..e6942a6
--- /dev/null
+++ b/skills/nist-800-61r3-shared/references/csf-element-registry.md
@@ -0,0 +1,209 @@
+# NIST SP 800-61r3 — Canonical CSF 2.0 Element Registry
+
+Full element set from NIST SP 800-61r3 (April 2025), Tables 2 and 3.
+This is the single authoritative reference for all skills in the nist-800-61r3 cluster.
+
+Consuming skills:
+- `nist-800-61r3-csf-mapper` — uses element list and Document Type scope table
+- `nist-800-61r3-gap-analyzer` — uses element list, severity mapping, dependency graph
+- `nist-800-61r3-maturity-scorer` — uses element list with weights and scoring summary
+
+---
+
+## TABLE 3 — Incident Response (HIGH Priority / Weight 3)
+
+These are the primary IR operational elements. A missing or partial status for any of these
+is a gap. The weight column is used by the maturity scorer; severity mapping is used by the
+gap analyzer.
+
+### DE — Detect
+
+| CSF ID | Weight | Description |
+|--------|--------|-------------|
+| DE.CM | 3 | Assets are monitored to find anomalies, IoCs, and other potential adverse events |
+| DE.CM-01 | 3 | Networks and network services are monitored to find potentially adverse events |
+| DE.CM-02 | 3 | Physical environment is monitored |
+| DE.CM-03 | 3 | Personnel activity and technology usage are monitored to find potentially adverse events |
+| DE.CM-06 | 3 | External service provider activities and services are monitored to find potentially adverse events |
+| DE.CM-09 | 3 | Computing hardware and software, runtime environments, and their data are monitored |
+| DE.AE | 3 | Anomalies, indicators of compromise, and other potentially adverse events are analyzed to characterize the events and detect cybersecurity incidents |
+| DE.AE-02 | 3 | Potentially adverse events are analyzed to better understand associated activities |
+| DE.AE-03 | 3 | Information is correlated from multiple sources |
+| DE.AE-04 | 3 | The estimated impact and scope of adverse events are understood |
+| DE.AE-06 | 3 | Information on adverse events is provided to authorized staff and tools |
+| DE.AE-07 | 3 | Cyber threat intelligence and other contextual information are integrated into the analysis of adverse events |
+| DE.AE-08 | 3 | Incidents are declared when adverse events meet the defined incident criteria |
+
+### RS — Respond
+
+| CSF ID | Weight | Description |
+|--------|--------|-------------|
+| RS.MA | 3 | Responses to detected cybersecurity incidents are managed |
+| RS.MA-01 | 3 | The incident response plan is executed in coordination with relevant third parties once an incident is declared |
+| RS.MA-02 | 3 | Incident reports are triaged to validate the incident, categorize the incident, and prioritize incident handling |
+| RS.MA-03 | 3 | Incidents are categorized and prioritized |
+| RS.MA-04 | 3 | Incidents are escalated or elevated as needed |
+| RS.MA-05 | 3 | Criteria for initiating incident recovery are applied |
+| RS.AN | 3 | Investigations are conducted to understand what occurred and support recovery and law enforcement |
+| RS.AN-03 | 3 | Analysis is performed to establish what has occurred during an incident and the root cause of the incident |
+| RS.AN-06 | 3 | Actions performed during an investigation are recorded to preserve the integrity of the evidence and investigation |
+| RS.AN-07 | 3 | Collected incident data is preserved to support legal action |
+| RS.AN-08 | 3 | The magnitude of an incident is understood |
+| RS.CO | 3 | Response activities are coordinated with internal and external stakeholders |
+| RS.CO-02 | 3 | Internal and external stakeholders are notified of incidents |
+| RS.CO-03 | 3 | Information is shared with designated internal and external stakeholders |
+| RS.MI | 3 | Containment and mitigation activities are performed |
+| RS.MI-01 | 3 | Incidents are contained |
+| RS.MI-02 | 3 | Incidents are eradicated |
+
+### RC — Recover
+
+| CSF ID | Weight | Description |
+|--------|--------|-------------|
+| RC.RP | 3 | Restoration activities are performed to ensure operational availability |
+| RC.RP-01 | 3 | The recovery portion of the incident response plan is executed once initiated from the incident response process |
+| RC.RP-02 | 3 | Recovery actions are selected, scoped, prioritized, and performed |
+| RC.RP-03 | 3 | The integrity of backups and other restoration assets is verified before using them for restoration |
+| RC.RP-04 | 3 | Critical mission functions and cybersecurity risk management are considered to establish post-incident operational norms |
+| RC.RP-05 | 3 | The integrity of restored assets is verified, systems and services are restored, and normal operating status is confirmed |
+| RC.RP-06 | 3 | The end of incident recovery is declared based on criteria, and incident-related documentation is completed |
+| RC.CO | 3 | Restoration activities are coordinated with internal and external parties |
+| RC.CO-03 | 3 | Recovery activities and progress in restoring operational capabilities are communicated to designated internal and external stakeholders |
+| RC.CO-04 | 3 | Public updates on incident recovery are shared using approved methods and messaging |
+
+**Table 3 element count: 40**
+
+---
+
+## TABLE 2 — Preparation & Lessons Learned (MEDIUM / LOW Priority)
+
+Weight 2 = Medium (directly enables Table 3 operational capability).
+Weight 1 = Low (foundational; surface only on full-scope requests or program-level reviews).
+
+### GV — Govern
+
+| CSF ID | Weight | Priority | Description |
+|--------|--------|----------|-------------|
+| GV.OC-01 | 1 | M | The organizational mission is understood and informs cybersecurity risk management |
+| GV.OC-03 | 2 | M | Legal, regulatory, and contractual requirements are understood and managed |
+| GV.OC-05 | 1 | M | Outcomes, capabilities, and services that the organization depends on are understood |
+| GV.RM-02 | 1 | M | Risk appetite and risk tolerance statements are established, communicated, and maintained |
+| GV.RM-04 | 1 | M | Strategic direction that describes appropriate risk response options is established and communicated |
+| GV.RM-06 | 1 | M | A standardized method for calculating, documenting, categorizing, and prioritizing cybersecurity risks is established and communicated |
+| GV.RM-07 | 1 | L | Strategic opportunities (i.e., positive risks) are characterized and included in organizational cybersecurity risk discussions |
+| GV.RR | 2 | H | Roles, Responsibilities, Authorities (parent) |
+| GV.RR-01 | 1 | H | Organizational leadership is responsible and accountable for cybersecurity risk |
+| GV.RR-02 | 2 | H | Roles, responsibilities, and authorities related to cybersecurity risk management are established, communicated, understood, and enforced |
+| GV.RR-03 | 1 | M | Adequate resources are dedicated to cybersecurity risk management |
+| GV.RR-04 | 1 | M | Cybersecurity risk management is included in human resources practices |
+| GV.PO | 2 | H | Policy (parent) |
+| GV.PO-01 | 1 | H | Policy for managing cybersecurity risks is established based on organizational context |
+| GV.PO-02 | 1 | H | Policy for managing cybersecurity risks is reviewed, updated, communicated, and enforced |
+| GV.OV-01 | 2 | M | Cybersecurity risk management strategy outcomes are reviewed to inform and adjust strategy and direction |
+| GV.OV-02 | 1 | M | The cybersecurity risk management strategy is reviewed and adjusted to ensure coverage of organizational requirements and risks |
+| GV.OV-03 | 1 | M | Organizational cybersecurity risk management performance is evaluated and reviewed for adjustments needed |
+| GV.SC-07 | 1 | L | The risks posed by a supplier, their products and services, and other third parties are understood and monitored |
+| GV.SC-08 | 2 | M | Relevant suppliers and other third parties are included in incident planning, response, and recovery activities |
+| GV.SC-09 | 1 | L | Supply chain security practices are integrated into cybersecurity and enterprise risk management frameworks |
+
+### ID — Identify
+
+| CSF ID | Weight | Priority | Description |
+|--------|--------|----------|-------------|
+| ID.AM-01 | 2 | M | Hardware asset inventories are maintained |
+| ID.AM-02 | 2 | M | Inventories of software, services, and systems managed by the organization are maintained |
+| ID.AM-03 | 1 | M | Representations of the organization's authorized network communication and data flows are maintained |
+| ID.AM-05 | 1 | L | Assets are prioritized based on classification, criticality, resources, and impact on the mission |
+| ID.AM-07 | 1 | M | Inventories of data and corresponding metadata for designated data types are maintained |
+| ID.AM-08 | 1 | M | Systems, hardware, software, services, and data are managed throughout their life cycles |
+| ID.RA-01 | 1 | M | Vulnerabilities in assets are identified, validated, and recorded |
+| ID.RA-02 | 2 | M | Cyber threat intelligence is received from information-sharing forums and sources |
+| ID.RA-04 | 1 | H | Potential impacts and likelihoods of threats exploiting vulnerabilities are identified and recorded |
+| ID.RA-05 | 2 | M | Threats, vulnerabilities, likelihoods, and impacts are used to understand inherent risk and inform risk response prioritization |
+| ID.RA-06 | 2 | H | Risk responses are chosen, prioritized, planned, tracked, and communicated |
+| ID.RA-07 | 1 | M | Changes and exceptions are managed, assessed for risk impact, recorded, and tracked |
+| ID.RA-10 | 1 | M | Critical suppliers are assessed prior to acquisition |
+| ID.IM-01 | 2 | H | Improvements are identified from evaluations |
+| ID.IM-02 | 2 | H | Improvements are identified from security tests and exercises, including those done in coordination with suppliers and relevant third parties |
+| ID.IM-03 | 2 | H | Improvements are identified from execution of operational processes, including incident response |
+| ID.IM-04 | 2 | H | Incident response plans and other cybersecurity plans that affect operations are established, communicated, tested, and improved |
+
+### PR — Protect
+
+| CSF ID | Weight | Priority | Description |
+|--------|--------|----------|-------------|
+| PR.AA-01 | 1 | M | Identities and credentials for authorized users, services, and hardware are managed |
+| PR.AA-05 | 1 | M | Access permissions, entitlements, and authorizations are defined in a policy, managed, enforced, and reviewed |
+| PR.AT-01 | 1 | M | Personnel are provided with awareness and training so that they can perform their cybersecurity-related tasks |
+| PR.DS-01 | 1 | M | The confidentiality, integrity, and availability of data-at-rest are protected |
+| PR.DS-02 | 1 | M | The confidentiality, integrity, and availability of data-in-transit are protected |
+| PR.DS-11 | 2 | M | Data backups are created, protected, maintained, and tested |
+| PR.IR-01 | 1 | H | Networks and environments are protected from unauthorized logical access and usage |
+| PR.IR-04 | 1 | H | Adequate resource capacity to ensure availability is maintained |
+| PR.PS-01 | 1 | M | Configuration management practices are established and applied |
+| PR.PS-04 | 2 | M | Logs and other security data are generated and retained |
+
+---
+
+## Scoring Summary (Maturity Scorer)
+
+| Priority | Weight | Elements | Max Weighted Points |
+|----------|--------|----------|---------------------|
+| High (Table 3) | 3 | 40 | 120 |
+| Medium (Table 2 selected) | 2 | 17 | 34 |
+| Low (Table 2 remaining) | 1 | 31 | 31 |
+| **Total** | | **88** | **185** |
+
+*Note: Parent elements (DE.CM, DE.AE, RS.MA, RS.AN, RS.CO, RS.MI, RC.RP, RC.CO) are scored
+separately from their children. Score parent at 1.0 only if all children are fully addressed.*
+
+---
+
+## Severity Mapping for Table 3 Gaps (Gap Analyzer)
+
+**Critical** — element absent AND directly gates another element's execution:
+- DE.AE-08 (no declaration criteria → RS.MA-* cannot fire properly)
+- RS.MA-02 (no triage/validation → severity/urgency undefined for all downstream)
+- RS.AN-03 (no root cause → eradication is guesswork, reinfection likely)
+- RS.MI-01 (no containment → incident scope grows uncontrolled)
+- RC.RP-06 (no closure declaration → legal/regulatory exposure, cost tracking fails)
+
+**Significant** — element absent but peer elements provide partial coverage or workarounds exist.
+
+**Minor** — element present but a sub-requirement (R/C/N item) is incomplete.
+
+---
+
+## Element Dependency Graph (Gap Analyzer)
+
+Understanding upstream/downstream dependency is critical for remediation ordering:
+
+```
+DE.CM → DE.AE-02 → DE.AE-03 → DE.AE-04 → DE.AE-08
+                                              ↓
+                                           RS.MA-01
+                                              ↓
+                              RS.MA-02 → RS.MA-03 → RS.MA-04
+                                                        ↓
+                                    RS.AN-03 → RS.AN-06/07/08
+                                                        ↓
+                                          RS.MI-01 → RS.MI-02
+                                                        ↓
+                              RC.RP-01 → RC.RP-02 → RC.RP-06
+```
+
+A gap upstream propagates as implied risk to every downstream element — note this in
+the remediation narrative ("fixing DE.AE-08 also resolves implied gaps in RS.MA-01 through RS.MA-04").
+
+---
+
+## Document Type — Expected CSF Scope (CSF Mapper)
+
+| Document Type | Must Have (Addressed/Partial) | Optional Coverage |
+|---------------|-------------------------------|-------------------|
+| IR Plan | GV.PO, GV.RR-02, ID.IM-04, all RS, all RC | DE, ID.IM-01/02/03 |
+| Playbook | RS.MA, RS.AN, RS.MI, DE.AE-08, RC.RP-01/02 | RS.CO, RC.CO |
+| Incident Report | RS.MA, RS.AN-03/06/07/08, RS.CO-02, RC.RP-06 | GV.RR, RS.MI |
+| After-Action Report | RS.AN-03, RC.RP-06, ID.IM-03, ID.IM-01 | ID.IM-02, ID.IM-04 (update) |
+| IR Policy | GV.PO-01/02, GV.RR-01/02, ID.IM-04 | GV.RM |
+| Risk Assessment | ID.RA (all), GV.RM, ID.AM | GV.OV |
diff --git a/skills/openapi-to-typescript/SKILL.md b/skills/openapi-to-typescript/SKILL.md
index 3f53745..d6a9300 100644
--- a/skills/openapi-to-typescript/SKILL.md
+++ b/skills/openapi-to-typescript/SKILL.md
@@ -1,344 +1,85 @@
 ---
 name: openapi-to-typescript
-description: Converts OpenAPI 3.0 JSON/YAML to TypeScript interfaces and type guards. This skill should be used when the user asks to generate types from OpenAPI, convert schema to TS, create API interfaces, or generate TypeScript types from an API specification.
+description: Converts OpenAPI 3.0 JSON/YAML to TypeScript interfaces, discriminated unions, and type guards. Use when asked to generate types from OpenAPI, convert schema to TS, create API interfaces, generate TypeScript types from a spec, or turn swagger/openapi into TypeScript.
 ---
 
-# OpenAPI to TypeScript
+## Mindset
 
-Converts OpenAPI 3.0 specifications to TypeScript interfaces and type guards.
+- **Specs lie about their own combiners.** `anyOf` is frequently used where `oneOf` was intended, and vice versa. Read the `discriminator` field and the actual schema structure — not just the combiner keyword — to determine the correct TypeScript output.
+- **Circular references are not errors.** TypeScript interfaces support recursive property types natively. Never break a circular ref by inlining or by reaching for `unknown` — generate the interface first, add properties second.
+- **`readOnly`/`writeOnly` creates two distinct shapes.** A single `$ref` schema with `readOnly` fields must yield different request and response interfaces. Generators that ignore this produce runtime bugs at the boundary.
+- **The `required` array governs optionality, not `nullable`.** A field absent from `required[]` is optional (`?`). A field with `nullable: true` is `T | null`. These are orthogonal — apply both when both are present.
+- **anyOf with a null variant is nullable, not a union.** `anyOf: [SchemaA, {type: "null"}]` means `SchemaA | null`, not an open-ended union type.
 
-**Input:** OpenAPI file (JSON or YAML)
-**Output:** TypeScript file with interfaces and type guards
+## Navigation
 
-## When to Use
+**Use this skill when**: user asks to generate TypeScript types from OpenAPI/Swagger, convert a spec to TS, create API interfaces from a JSON/YAML schema, or mentions "openapi to typescript", "swagger types", "generate interfaces from spec".
 
-- "generate types from openapi"
-- "convert openapi to typescript"
-- "create API interfaces"
-- "generate types from spec"
+**Do NOT use this skill when**:
+- The spec is OpenAPI 2.0 (Swagger) — flag the version gap; the `nullable` and component model differ significantly
+- The user wants a full API client (fetch functions, axios wrappers) — this skill generates types only
+- The user wants Zod/io-ts/valibot schemas — those are runtime validators with different trade-offs than type guards
 
-## Workflow
-
-1. Request the OpenAPI file path (if not provided)
-2. Read and validate the file (must be OpenAPI 3.0.x)
-3. Extract schemas from `components/schemas`
-4. Extract endpoints from `paths` (request/response types)
-5. Generate TypeScript (interfaces + type guards)
-6. Ask where to save (default: `types/api.ts` in current directory)
-7. Write the file
-
-## OpenAPI Validation
-
-Check before processing:
-
-```
-- Field "openapi" must exist and start with "3.0"
-- Field "paths" must exist
-- Field "components.schemas" must exist (if there are types)
+**Quick decision tree for ambiguous input:**
 ```
-
-If invalid, report the error and stop.
-
-## Type Mapping
-
-### Primitives
-
-| OpenAPI     | TypeScript   |
-|-------------|--------------|
-| `string`    | `string`     |
-| `number`    | `number`     |
-| `integer`   | `number`     |
-| `boolean`   | `boolean`    |
-| `null`      | `null`       |
-
-### Format Modifiers
-
-| Format        | TypeScript              |
-|---------------|-------------------------|
-| `uuid`        | `string` (comment UUID) |
-| `date`        | `string` (comment date) |
-| `date-time`   | `string` (comment ISO)  |
-| `email`       | `string` (comment email)|
-| `uri`         | `string` (comment URI)  |
-
-### Complex Types
-
-**Object:**
-```typescript
-// OpenAPI: type: object, properties: {id, name}, required: [id]
-interface Example {
-  id: string;      // required: no ?
-  name?: string;   // optional: with ?
-}
-```
-
-**Array:**
-```typescript
-// OpenAPI: type: array, items: {type: string}
-type Names = string[];
-```
-
-**Enum:**
-```typescript
-// OpenAPI: type: string, enum: [active, draft]
-type Status = "active" | "draft";
-```
-
-**oneOf (Union):**
-```typescript
-// OpenAPI: oneOf: [{$ref: Cat}, {$ref: Dog}]
-type Pet = Cat | Dog;
+Has discriminator.propertyName?
+  YES → discriminated union + narrowing helper (not structural type guard)
+  NO  → oneOf/anyOf → plain union; collapse a null-only anyOf variant to T | null
+Is it allOf with inline properties?
+  YES → prefer `extends Base` over intersection `&` (cleaner IDE errors)
+  NO  → intersection type is fine
+Circular ref detected?
+  YES → use interface (not type alias) — interfaces resolve forward refs
 ```
 
-**allOf (Intersection/Extends):**
-```typescript
-// OpenAPI: allOf: [{$ref: Base}, {type: object, properties: ...}]
-interface Extended extends Base {
-  extraField: string;
-}
-```
-
-## Code Generation
-
-### File Header
-
-```typescript
-/**
- * Auto-generated from: {source_file}
- * Generated at: {timestamp}
- *
- * DO NOT EDIT MANUALLY - Regenerate from OpenAPI schema
- */
-```
-
-### Interfaces (from components/schemas)
-
-For each schema in `components/schemas`:
-
-```typescript
-export interface Product {
-  /** Product unique identifier */
-  id: string;
-
-  /** Product title */
-  title: string;
-
-  /** Product price */
-  price: number;
-
-  /** Created timestamp */
-  created_at?: string;
-}
-```
-
-- Use OpenAPI description as JSDoc
-- Fields in `required[]` have no `?`
-- Fields outside `required[]` have `?`
-
-### Request/Response Types (from paths)
-
-For each endpoint in `paths`:
-
-```typescript
-// GET /products - query params
-export interface GetProductsRequest {
-  page?: number;
-  limit?: number;
-}
-
-// GET /products - response 200
-export type GetProductsResponse = ProductList;
-
-// POST /products - request body
-export interface CreateProductRequest {
-  title: string;
-  price: number;
-}
-
-// POST /products - response 201
-export type CreateProductResponse = Product;
-```
-
-Naming convention:
-- `{Method}{Path}Request` for params/body
-- `{Method}{Path}Response` for response
-
-### Type Guards
-
-For each main interface, generate a type guard:
-
-```typescript
-export function isProduct(value: unknown): value is Product {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    'id' in value &&
-    typeof (value as any).id === 'string' &&
-    'title' in value &&
-    typeof (value as any).title === 'string' &&
-    'price' in value &&
-    typeof (value as any).price === 'number'
-  );
-}
-```
-
-Type guard rules:
-- Check `typeof value === 'object' && value !== null`
-- For each required field: check `'field' in value`
-- For primitive fields: check `typeof`
-- For arrays: check `Array.isArray()`
-- For enums: check `.includes()`
-
-### Error Type (always include)
-
-```typescript
-export interface ApiError {
-  status: number;
-  error: string;
-  detail?: string;
-}
-
-export function isApiError(value: unknown): value is ApiError {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    'status' in value &&
-    typeof (value as any).status === 'number' &&
-    'error' in value &&
-    typeof (value as any).error === 'string'
-  );
-}
-```
-
-## $ref Resolution
-
-When encountering `{"$ref": "#/components/schemas/Product"}`:
-1. Extract the schema name (`Product`)
-2. Use the type directly (don't resolve inline)
-
-```typescript
-// OpenAPI: items: {$ref: "#/components/schemas/Product"}
-// TypeScript:
-items: Product[]  // reference, not inline
-```
-
-## Complete Example
-
-**Input (OpenAPI):**
-```json
-{
-  "openapi": "3.0.0",
-  "components": {
-    "schemas": {
-      "User": {
-        "type": "object",
-        "properties": {
-          "id": {"type": "string", "format": "uuid"},
-          "email": {"type": "string", "format": "email"},
-          "role": {"type": "string", "enum": ["admin", "user"]}
-        },
-        "required": ["id", "email", "role"]
-      }
-    }
-  },
-  "paths": {
-    "/users/{id}": {
-      "get": {
-        "parameters": [{"name": "id", "in": "path", "required": true}],
-        "responses": {
-          "200": {
-            "content": {
-              "application/json": {
-                "schema": {"$ref": "#/components/schemas/User"}
-              }
-            }
-          }
-        }
-      }
-    }
-  }
-}
-```
-
-**Output (TypeScript):**
-```typescript
-/**
- * Auto-generated from: api.openapi.json
- * Generated at: 2025-01-15T10:30:00Z
- *
- * DO NOT EDIT MANUALLY - Regenerate from OpenAPI schema
- */
-
-// ============================================================================
-// Types
-// ============================================================================
-
-export type UserRole = "admin" | "user";
-
-export interface User {
-  /** UUID */
-  id: string;
-
-  /** Email */
-  email: string;
-
-  role: UserRole;
-}
-
-// ============================================================================
-// Request/Response Types
-// ============================================================================
-
-export interface GetUserByIdRequest {
-  id: string;
-}
-
-export type GetUserByIdResponse = User;
-
-// ============================================================================
-// Type Guards
-// ============================================================================
-
-export function isUser(value: unknown): value is User {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    'id' in value &&
-    typeof (value as any).id === 'string' &&
-    'email' in value &&
-    typeof (value as any).email === 'string' &&
-    'role' in value &&
-    ['admin', 'user'].includes((value as any).role)
-  );
-}
-
-// ============================================================================
-// Error Types
-// ============================================================================
-
-export interface ApiError {
-  status: number;
-  error: string;
-  detail?: string;
-}
-
-export function isApiError(value: unknown): value is ApiError {
-  return (
-    typeof value === 'object' &&
-    value !== null &&
-    'status' in value &&
-    typeof (value as any).status === 'number' &&
-    'error' in value &&
-    typeof (value as any).error === 'string'
-  );
-}
-```
-
-## Common Errors
-
-| Error | Action |
-|-------|--------|
-| OpenAPI version != 3.0.x | Report that only 3.0 is supported |
-| $ref not found | List missing refs |
-| Unknown type | Use `unknown` and warn |
-| Circular reference | Use type alias with lazy reference |
+## Philosophy
+
+Generate types that serve TypeScript consumers first, OpenAPI spec authors second. A type that compiles but misleads (wrong optionality, missing discriminants, bloated guards) is worse than a type that requires a one-line manual edit.
+
+## NEVER
+
+- **NEVER generate structural type guards for discriminated unions** — when a `discriminator.propertyName` exists, check only that field; structural checks across all properties break when subtypes share field names and produce false positives.
+- **NEVER use `any` for unresolved or schema-less properties** — use `unknown`; `any` silently disables type checking for the entire call chain downstream, while `unknown` forces explicit narrowing at the consumer.
+- **NEVER resolve `$ref` inline when it creates a circular expansion** — detect circularity via a visited-set during traversal; output the name reference and move on; inlining causes infinite recursion in the generator and a broken output file.
+- **NEVER generate a single interface for schemas with mixed `readOnly`/`writeOnly` fields** — a `readOnly: true` field must be omitted from request types and present in response types; conflating them produces types that accept illegal write payloads silently.
+- **NEVER produce an index signature `[key: string]: T` without reconciling named property types** — TypeScript requires all named properties to be assignable to the index signature value type; incompatible named fields cause a compile error that will be blamed on the generator.
+- **NEVER treat `anyOf` with one schema plus a null variant as an open-ended union.** `anyOf: [Schema, {type: "null"}]` is a nullable idiom (valid in OAS 3.1 and common in 3.0 specs, although OAS 3.0 itself has no `null` type); output exactly `Schema | null`.
+- **NEVER skip the `openapi` version check before processing.** OAS 3.0.x and 3.1 use incompatible nullable conventions (`nullable: true` vs `type: [string, "null"]`); processing with the wrong convention silently drops nullability.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| TypeScript error: index signature incompatible with named property | Generated `[key: string]: T` but a named field has type `U` where `U` is not assignable to `T` | Widen index type to `T \| U`, or remove index signature if `additionalProperties: false` |
+| Type guard accepts the wrong subtype at runtime | Structural guard on a discriminated union; sibling schemas share the checked field names | Re-generate the guard using only `discriminator.propertyName` value check |
+| Infinite loop in generator | Undetected circular `$ref` | Track visited schema names in a `Set`; on revisit, emit the name reference and return |
+| Compiled output has no `null` variant but API returns null | `nullable: true` on OAS 3.0 field was ignored, or OAS 3.1 `type: [string, null]` not parsed | Check spec version; re-apply nullable logic per version |
+| `extends` clause causes TS error: base has incompatible property | `allOf` base and inline object both declare the same field with different types | Merge the field definitions; use the more specific type and add a JSDoc note |
+| Large union type causes IDE slowdown | Enum with more than 20 values generated as `"A" \| "B" \| ...` | Switch to `const` object pattern: `export const Enum = {...} as const; export type Enum = typeof Enum[keyof typeof Enum]` |
+
+## Core Workflow
+
+1. Read the file; detect `openapi:` version — abort with clear message if not 3.0.x
+2. Build a schema registry from `components/schemas` (name → schema object)
+3. Detect circular refs: traverse each schema with a visited-set; record circular pairs
+4. Process schemas in dependency order (leaf schemas first) — see [edge cases](references/edge-cases.md) for combiner and circular ref handling
+5. Generate request/response interfaces from `paths`; split `readOnly`/`writeOnly` fields correctly
+6. Write output with header comment; default path `types/api.ts`
+
+## Key Type Mappings
+
+| OpenAPI construct | TypeScript output |
+|-------------------|-------------------|
+| `type: string/number/boolean/null` | `string` / `number` / `boolean` / `null` |
+| `type: integer` | `number` |
+| `nullable: true` (OAS 3.0) | `T \| null` |
+| `type: [T, "null"]` (OAS 3.1) | `T \| null` |
+| `enum: [...]` (≤20 values) | `"a" \| "b" \| ...` |
+| `enum: [...]` (>20 values) | `const` object + derived type |
+| `oneOf` / `anyOf` | `A \| B` (check for discriminator first) |
+| `allOf` with `$ref` + inline | `interface Foo extends Base { extraField: T }` |
+| `additionalProperties: true` | `[key: string]: unknown` |
+| `additionalProperties: {type: T}` | `[key: string]: T` |
+| No `type` field | `unknown` |
+
+See [references/edge-cases.md](references/edge-cases.md) for: anyOf/oneOf/allOf detailed rules, discriminated union output, circular reference resolution, readOnly/writeOnly splitting, path parameter merging, and large enum patterns.
diff --git a/skills/openapi-to-typescript/references/edge-cases.md b/skills/openapi-to-typescript/references/edge-cases.md
new file mode 100644
index 0000000..0a5a828
--- /dev/null
+++ b/skills/openapi-to-typescript/references/edge-cases.md
@@ -0,0 +1,157 @@
+# OpenAPI-to-TypeScript: Edge Cases & Advanced Patterns
+
+## anyOf vs oneOf vs allOf — The Real Distinction
+
+OpenAPI authors often misuse these. Claude must interpret **intent** not just semantics:
+
+| Combiner | OpenAPI intent | TypeScript output |
+|----------|---------------|-------------------|
+| `oneOf`  | Exactly one matches (discriminated or plain union) | `A \| B` |
+| `anyOf`  | One or more match (validation rule, not a type) | `A \| B` (same as oneOf — TypeScript has no "one-or-more" type) |
+| `allOf`  | All schemas must apply (composition/inheritance) | `extends Base` or intersection `A & B` |
+
+**anyOf warning**: Specs that use `anyOf` to mean "nullable" are very common. `anyOf: [SchemaA, {type: "null"}]` → `SchemaA | null`. When the only other member is the null variant, emit exactly that; do NOT model the null entry as a separate schema type or widen the result into a multi-member union.
+
+## Discriminated Unions (oneOf + discriminator)
+
+When a `discriminator.propertyName` is present, generate a proper tagged union:
+
+```yaml
+# OpenAPI
+oneOf:
+  - $ref: '#/components/schemas/Cat'
+  - $ref: '#/components/schemas/Dog'
+discriminator:
+  propertyName: petType
+  mapping:
+    cat: '#/components/schemas/Cat'
+    dog: '#/components/schemas/Dog'
+```
+
+```typescript
+// Tagged union; assumes Cat declares petType: 'cat' and Dog declares petType: 'dog'
+export type Pet = Cat | Dog;
+
+// Type guard checks only the discriminant, not the full structure
+export function isPet(value: unknown): value is Pet {
+  if (typeof value !== 'object' || value === null) return false;
+  const petType = (value as { petType?: unknown }).petType; // avoids `any`
+  return petType === 'cat' || petType === 'dog';
+}
+
+// Narrowing helper for one variant (generate one per mapping entry)
+export function isCat(value: Pet): value is Cat {
+  return value.petType === 'cat';
+}
+```
+
+Never generate a structural type guard for a discriminated union — it becomes O(n×fields) and breaks when subschemas share field names.
+
+## Circular References
+
+Detection: a schema is circular when resolving it encounters its own `$ref` before completion. Common patterns:
+
+```yaml
+# Self-referential tree node
+TreeNode:
+  type: object
+  properties:
+    children:
+      type: array
+      items:
+        $ref: '#/components/schemas/TreeNode'
+
+# Mutually recursive (Category → Product → Category)
+Category:
+  properties:
+    featuredProduct:
+      $ref: '#/components/schemas/Product'
+Product:
+  properties:
+    category:
+      $ref: '#/components/schemas/Category'
+```
+
+**Resolution strategy** (in order):
+1. Direct self-reference in array → `children?: TreeNode[]` (TypeScript handles recursive interfaces natively — no alias needed)
+2. Direct self-reference in object property → use `interface` (not `type` alias) — interfaces support recursive property types
+3. Mutually recursive across 2+ schemas → emit every schema as a named `interface` and reference the others by name; declaration order does not matter, because a TypeScript interface may reference one declared later in the file
+
+**Never** break circularity by inlining expanded types — this causes infinite expansion.
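+
+The visited-set detection described above could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `SchemaNode`, `collectRefs`, and `findCircularRefs` are hypothetical names, and the sketch only follows `$ref` under `properties` and `items` (a fuller version would also walk `allOf`/`oneOf`/`anyOf`).
+
+```typescript
+// Hypothetical, simplified schema shape for illustration only.
+interface SchemaNode {
+  $ref?: string;
+  properties?: Record<string, SchemaNode>;
+  items?: SchemaNode;
+}
+
+// Collect the schema names that a schema references, directly or via nested inline schemas.
+function collectRefs(node: SchemaNode, out: string[] = []): string[] {
+  if (node.$ref) out.push(node.$ref.split('/').pop() as string);
+  for (const child of Object.values(node.properties ?? {})) collectRefs(child, out);
+  if (node.items) collectRefs(node.items, out);
+  return out;
+}
+
+// Return the names of schemas that can reach themselves through $ref chains.
+function findCircularRefs(registry: Record<string, SchemaNode>): Set<string> {
+  const circular = new Set<string>();
+  for (const start of Object.keys(registry)) {
+    const visited = new Set<string>();
+    const stack = collectRefs(registry[start]);
+    while (stack.length > 0) {
+      const name = stack.pop() as string;
+      if (name === start) { circular.add(start); break; }
+      if (visited.has(name) || !(name in registry)) continue;
+      visited.add(name); // the visited set keeps the walk finite
+      stack.push(...collectRefs(registry[name]));
+    }
+  }
+  return circular;
+}
+```
+
+Schemas reported as circular are then emitted as named interfaces that refer to each other by name, never expanded inline.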
+
+## additionalProperties
+
+| OpenAPI | TypeScript |
+|---------|-----------|
+| `additionalProperties: true` | `[key: string]: unknown` index signature |
+| `additionalProperties: false` | Exact interface, no index signature |
+| `additionalProperties: {type: string}` | `[key: string]: string` |
+| `additionalProperties: {$ref: X}` | `[key: string]: X` |
+| absent | Omit index signature (TypeScript default is open) |
+
+Warning: an index signature `[key: string]: T` requires EVERY named property to be assignable to `T`. If a named property has an incompatible type, TypeScript reports a compile error. Use `[key: string]: T | NamedPropType` to reconcile.
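+
+A small sketch of that reconciliation, assuming a hypothetical schema with a named integer property `count` and `additionalProperties: {type: string}` (the names `Labels` and `stringEntries` are illustrative only):
+
+```typescript
+// This shape would not compile, because `count: number` is not assignable to `string`:
+//   interface Broken { count: number; [key: string]: string }
+
+interface Labels {
+  count: number;
+  [key: string]: string | number; // widened so that `count` is assignable
+}
+
+const labels: Labels = { count: 2, env: 'prod', tier: 'gold' };
+
+// Consumers must narrow values read through the widened index signature.
+function stringEntries(obj: Labels): string[] {
+  return Object.entries(obj)
+    .filter(([, v]) => typeof v === 'string')
+    .map(([k, v]) => `${k}=${v}`);
+}
+```
+
+The cost of widening is visible in `stringEntries`: every read through the index signature yields `string | number` and needs a `typeof` check.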
+
+## nullable (OpenAPI 3.0 vs 3.1)
+
+OpenAPI 3.0.x uses `nullable: true` (not a type system feature):
+```yaml
+# OAS 3.0
+name:
+  type: string
+  nullable: true
+```
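+→ `name: string | null` when `name` is listed in `required[]`, otherwise `name?: string | null` (optionality comes from `required`, not from `nullable`)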
+
+OpenAPI 3.1 uses JSON Schema `type: [string, "null"]` or `oneOf: [{type: string}, {type: "null"}]` (quote `"null"` in YAML; a bare `null` parses as the null value, not the type name).
+Detect version from `openapi:` field before processing nullable.
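+
+One way to keep the two conventions apart is to resolve nullability in a single function that takes the spec version as input. The sketch below is an assumption-laden illustration: `PropSchema`, `tsPrimitive`, and `renderProp` are hypothetical names, and only primitive types are handled.
+
+```typescript
+// Hypothetical minimal property schema covering both nullable conventions.
+interface PropSchema {
+  type?: string | string[];
+  nullable?: boolean;
+}
+
+// Map a primitive OpenAPI type name to its TypeScript spelling.
+function tsPrimitive(t: string): string {
+  return t === 'integer' ? 'number' : t;
+}
+
+// Render one property line, applying `required` and nullability independently.
+function renderProp(name: string, schema: PropSchema, required: boolean, specVersion: string): string {
+  const types = Array.isArray(schema.type) ? schema.type : [schema.type ?? 'unknown'];
+  const isOas31 = specVersion.startsWith('3.1');
+  // OAS 3.1 puts "null" in the type array; OAS 3.0 uses the `nullable` flag.
+  const nullable = isOas31 ? types.includes('null') : schema.nullable === true;
+  const base = types.filter((t) => t !== 'null').map(tsPrimitive).join(' | ') || 'unknown';
+  return `${name}${required ? '' : '?'}: ${base}${nullable ? ' | null' : ''};`;
+}
+```
+
+Passing the version explicitly means a 3.0 spec's `nullable` flag is never consulted for a 3.1 document, and the reverse.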
+
+## Path Parameter Collision in Request Types
+
+When a path has both path params and query params, generate separate interfaces or a merged one:
+
+```typescript
+// GET /users/{id}/posts?page=1
+// Option A: merged (simpler for consumers)
+export interface GetUserPostsRequest {
+  id: string;      // path param
+  page?: number;   // query param
+}
+
+// Option B: split (more faithful to OpenAPI structure)
+export interface GetUserPostsPathParams { id: string; }
+export interface GetUserPostsQueryParams { page?: number; }
+```
+
+Default to merged (Option A) unless an operation has more than 5 path params and more than 3 query params, in which case splitting improves readability.
+
+## readOnly / writeOnly Properties
+
+| Marker | Request interface | Response interface |
+|--------|------------------|--------------------|
+| `readOnly: true` | Omit the field | Include the field |
+| `writeOnly: true` | Include the field | Omit the field |
+
+Most generators ignore this. Implementing it correctly requires generating separate request/response schemas even when they share a `$ref`.
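+
+As an illustration of the table above, consider a hypothetical `User` schema where `id` is `readOnly`, `password` is `writeOnly`, and `email` is neither. The interface names and the `toResponse` helper are assumptions made for this sketch, not a prescribed naming scheme.
+
+```typescript
+// Request shape: readOnly fields are omitted, writeOnly fields are kept.
+interface UserRequest {
+  email: string;
+  password: string; // writeOnly: accepted on write, never returned
+}
+
+// Response shape: readOnly fields are kept, writeOnly fields are omitted.
+interface UserResponse {
+  id: string; // readOnly: server-assigned, not accepted on write
+  email: string;
+}
+
+// Hypothetical helper showing how the two shapes meet at the API boundary.
+function toResponse(id: string, body: UserRequest): UserResponse {
+  return { id, email: body.email }; // password is deliberately dropped
+}
+```
+
+With a single shared interface, a payload carrying `id` on write or `password` on read would type-check even though the API contract forbids it.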
+
+## Schema with no type field
+
+OpenAPI allows schemas with no `type` (accepts any JSON value):
+```yaml
+metadata:
+  description: Arbitrary metadata
+```
+→ `metadata?: unknown` (not `any` — preserves strict mode compatibility)
+
+## Large Enum Lists
+
+When an enum has >20 values, generate a `const` object + derived type instead of a union:
+
+```typescript
+// For large enums
+export const CountryCode = {
+  US: 'US', GB: 'GB', DE: 'DE', // ... 200+ values
+} as const;
+export type CountryCode = typeof CountryCode[keyof typeof CountryCode];
+```
+
+This gives a single runtime source of truth for the values (`Object.values(CountryCode)` can drive validation) and avoids IDE slowdowns from giant union types.
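+
+A runtime guard can be derived from the same object, so the value list is written only once. The sketch below uses a shortened, hypothetical value list, and `isCountryCode` is an illustrative name.
+
+```typescript
+// Shortened value list for illustration; a real spec would supply the full enum.
+const CountryCode = { US: 'US', GB: 'GB', DE: 'DE' } as const;
+type CountryCode = typeof CountryCode[keyof typeof CountryCode];
+
+// Widen to string[] once so `includes` accepts arbitrary string input.
+const countryCodes: readonly string[] = Object.values(CountryCode);
+
+// Guard built from the const object rather than a hand-written literal list.
+function isCountryCode(value: unknown): value is CountryCode {
+  return typeof value === 'string' && countryCodes.includes(value);
+}
+```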
diff --git a/skills/perplexity/SKILL.md b/skills/perplexity/SKILL.md
index dbf0d69..8b78c17 100644
--- a/skills/perplexity/SKILL.md
+++ b/skills/perplexity/SKILL.md
@@ -1,128 +1,110 @@
 ---
 name: perplexity
-description: Web search and research using Perplexity AI. Use when user says "search", "find", "look up", "ask", "research", or "what's the latest" for generic queries. NOT for library/framework docs (use Context7) or workspace questions.
+description: Web search and real-time research using Perplexity AI. Use when user says "search", "find", "look up", "ask perplexity", "research", or "what's the latest" for generic queries. NOT for library/framework docs (use Context7), workspace questions (use Nx MCP), or gt CLI (use Graphite MCP).
 ---
 
-# Perplexity Tools
+# Perplexity Skill
 
-Use ONLY when user says "search", "find", "look up", "ask", "research", or "what's the latest" for generic queries. NOT for library/framework docs (use Context7), gt CLI (use Graphite MCP), or workspace questions (use Nx MCP).
+## Mindset
 
-## Quick Reference
+- **Perplexity is a precision instrument, not a firehose.** Every extra result and token costs context budget that every subsequent tool call competes for. Constrain by default, expand only on explicit need.
+- **Query phrasing determines result quality more than any parameter** — Perplexity's LLM reranks results against the query intent, not raw keywords. A tightly framed question ("postgres advisory lock timeout behavior 2024") beats a vague one ("postgres locks") every time.
+- **Search returns sources you evaluate; Ask synthesizes a position** — treat them as distinct tools, not interchangeable alternatives. Choosing wrong wastes a round-trip.
+- **The tool-selection chain is a hard prerequisite** — Perplexity is reached only after specialized MCPs (Context7, Graphite, Nx) are ruled out. Jumping straight to Perplexity for docs questions returns inferior, outdated answers.
+- **perplexity_research is a trap, not a feature** — it looks like "more thorough" but it consumes 30–50k tokens with no structural advantage over the researcher agent, which provides citations, planning, and synthesis control.
 
-**Which Perplexity tool?**
-- Need search results/URLs? → **Perplexity Search**
-- Need conversational answer? → **Perplexity Ask**
-- Need deep research? → **Researcher agent** (`/research <topic>`)
+## Navigation
 
-**NOT Perplexity - use these instead:**
-- Library/framework docs → **Context7 MCP**
-- Graphite `gt` CLI → **Graphite MCP**
-- THIS workspace → **Nx MCP**
-- Specific URL → **URL Crawler**
+**Use this skill when**:
+- User says "search", "find", "look up", "research", "what's the latest", "ask perplexity"
+- Topic is general web knowledge, current events, or best-practice surveys
+- Need to discover resources (tutorials, articles, benchmarks) not pinned to a codebase
 
-## Perplexity Search
+**Do NOT use this skill when**:
+- Library/framework API docs → **Context7 MCP** (canonical, versioned, structured)
+- Any `gt` command or Graphite workflow → **Graphite MCP**
+- Questions about THIS workspace's build/project config → **Nx MCP**
+- A specific URL needs scraping → **URL Crawler**
+- Deep multi-source synthesis with citations → **Researcher agent** (`/research`)
 
-**When to use:**
-- Generic searches, finding resources
-- Current best practices, recent information
-- Tutorial/blog post discovery
-- User says "search for...", "find...", "look up..."
-
-**Default parameters (ALWAYS USE):**
-```typescript
-mcp__perplexity__perplexity_search({
-  query: "your search query",
-  max_results: 3,           // Default is 10 - too many!
-  max_tokens_per_page: 512  // Reduce per-result content
-})
+**Quick decision tree**:
 ```
-
-**When to increase limits:**
-Only if:
-- User explicitly needs comprehensive results
-- Initial search found nothing useful
-- Complex topic needs multiple sources
-
-```typescript
-// Increased limits (use sparingly)
-mcp__perplexity__perplexity_search({
-  query: "complex topic",
-  max_results: 5,
-  max_tokens_per_page: 1024
-})
+Does query mention "gt " or "graphite"?  → Graphite MCP
+Does query ask for framework/lib API?    → Context7 MCP
+Is query about THIS repo/build config?  → Nx MCP
+Does user want a formal research report? → /research agent
+Need URLs/source links?                  → perplexity_search
+Need a synthesized explanation?          → perplexity_ask
 ```
 
-## Perplexity Ask
-
-**When to use:**
-- Need conversational explanation, not search results
-- Synthesize information from web
-- Explain concepts with current context
+## Philosophy
 
-**Usage:**
-```typescript
-mcp__perplexity__perplexity_ask({
-  messages: [
-    {
-      role: "user",
-      content: "Explain how postgres advisory locks work"
-    }
-  ]
-})
-```
+Perplexity tools are the general-purpose fallback after specialized tools are exhausted. Use them conservatively — tight queries, minimal result counts — and escalate to the researcher agent rather than the raw research tool when depth is needed.
 
-**NOT for:**
-- Library documentation (use Context7)
-- Deep multi-source research (use researcher agent)
+## NEVER
 
-## Prohibited Tool
+- **NEVER call `perplexity_research` directly** — because it burns 30–50k tokens with no user-visible advantage over `/research <topic>`, which adds citations, multi-step planning, and result control. Even one accidental call can exhaust the session context budget.
+- **NEVER rely on the default `max_results` (10) or `max_tokens_per_page`.** The Perplexity MCP defaults are calibrated for thoroughness, not cost; 10 results × 1024 tokens floods context before the answer is even processed. Start at `max_results: 3, max_tokens_per_page: 512`.
+- **NEVER run Perplexity Search for library/framework API questions** — because Perplexity indexes blog posts and Stack Overflow, not versioned API references. Results will be outdated or wrong for the version in use; Context7 has canonical structured docs.
+- **NEVER send a vague keyword query when a specific question would work** — because Perplexity's model optimizes against the full query string. "best practices" returns noise; "postgres connection pool sizing for high-concurrency OLTP 2024" returns actionable signal.
+- **NEVER issue multiple Perplexity calls for sub-questions you could batch into one** — because each call occupies a serial tool slot and adds latency. Compose a single Ask message that covers related sub-questions together.
+- **NEVER use perplexity_ask for tasks that need source URLs** — because Ask synthesizes without guaranteed citation links; if the user will need to verify or share sources, use Search so you get linkable results.
 
-**NEVER use:** `mcp__perplexity__perplexity_research`
+## Query Formulation Strategies
 
-**Use instead:** Researcher agent (`/research <topic>`)
-- Token cost: 30-50k tokens
-- Provides multi-source synthesis with citations
-- Use sparingly for complex questions only
+**For Search** — optimize for source discovery:
+- Include year or recency signal: `"postgres replication lag monitoring 2024"`
+- Name the ecosystem explicitly: `"Next.js 14 App Router data fetching patterns"` not `"react data fetching"`
+- Add format hint when relevant: `"docker compose health check examples github"`
 
-## Tool Selection Chain
+**For Ask** — optimize for synthesis:
+- Frame as a direct question: `"What are the production trade-offs between Kafka and RabbitMQ for event sourcing?"`
+- Scope to avoid shallow coverage: include constraints (`"for a team of 5"`, `"in a Kubernetes environment"`)
+- Ask one composite question rather than chaining multiple Ask calls
 
-**Priority order:**
-1. **Context7 MCP** - Library/framework docs
-2. **Graphite MCP** - Any `gt` CLI mention
-3. **Nx MCP** - THIS workspace questions
-4. **Perplexity Search** - Generic searches
-5. **Perplexity Ask** - Conversational answers
-6. **Researcher agent** - Deep multi-source research
-7. **WebSearch** - Last resort (after Perplexity exhausted)
+**Parameter ladder**:
+| Situation | max_results | max_tokens_per_page |
+|-----------|-------------|---------------------|
+| Default (first attempt) | 3 | 512 |
+| Nothing useful found | 5 | 768 |
+| User needs comprehensive list | 7 | 1024 |
+| Never go above | 10 | 2048 |
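+
+The ladder above could be encoded as a small lookup so that limits are always set explicitly. This is a sketch under assumptions: `Tier`, `LADDER`, and `searchArgs` are hypothetical names, and only the `query`, `max_results`, and `max_tokens_per_page` parameters already shown in this skill are used.
+
+```typescript
+// Hypothetical tier names; the numbers mirror the parameter ladder table.
+type Tier = 'default' | 'retry' | 'comprehensive';
+
+interface SearchLimits {
+  max_results: number;
+  max_tokens_per_page: number;
+}
+
+const LADDER: Record<Tier, SearchLimits> = {
+  default: { max_results: 3, max_tokens_per_page: 512 },
+  retry: { max_results: 5, max_tokens_per_page: 768 },
+  comprehensive: { max_results: 7, max_tokens_per_page: 1024 },
+};
+
+// Build search arguments, clamped to the table's "never go above" row.
+function searchArgs(query: string, tier: Tier = 'default') {
+  const { max_results, max_tokens_per_page } = LADDER[tier];
+  return {
+    query,
+    max_results: Math.min(max_results, 10),
+    max_tokens_per_page: Math.min(max_tokens_per_page, 2048),
+  };
+}
+```
+
+The returned object has the same shape as the arguments passed to `mcp__perplexity__perplexity_search` elsewhere in this skill.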
 
-## Examples
+## When Things Go Wrong
 
-**✅ CORRECT - Use Perplexity Search:**
-- "Find postgres migration best practices"
-- "Search for React testing tutorials"
-- "Look up latest trends in microservices"
+| Situation | Likely Cause | Recovery |
+|-----------|--------------|----------|
+| Results are outdated or wrong version | Query didn't specify version/year; Perplexity indexed old content | Add version number and year to query; try Context7 for API-specific questions |
+| Results are off-topic or too broad | Query was vague keywords, not a question | Rewrite as a specific question with ecosystem context |
+| Context window pressure after search | Used default (10) results with high token limit | Re-run with `max_results: 3, max_tokens_per_page: 512`; summarize rather than pass raw output |
+| Ask returns no citations | Normal behavior for Ask tool | Switch to Search if user needs linkable sources |
+| perplexity_research call timed out or blew context | The tool is inherently expensive | Abort; use `/research <topic>` researcher agent instead |
 
-**✅ CORRECT - Use Perplexity Ask:**
-- "Explain how postgres advisory locks work"
-- "What are the trade-offs of microservices?"
+## Tool API Reference
 
-**❌ WRONG - Use Context7 instead:**
-- "Search for React hooks documentation" → Context7 MCP
-- "Find Next.js routing docs" → Context7 MCP
-- "Look up Temporal workflow API" → Context7 MCP
+```typescript
+// Search — returns URLs + snippets
+mcp__perplexity__perplexity_search({
+  query: "specific question with context and year",
+  max_results: 3,           // ALWAYS set; default 10 is too high
+  max_tokens_per_page: 512  // ALWAYS set; reduce context cost
+})
 
-**❌ WRONG - Use Graphite MCP instead:**
-- "Search for gt stack commands" → Graphite MCP
-- "Find gt branch workflow" → Graphite MCP
+// Ask — returns synthesized answer (no guaranteed URLs)
+mcp__perplexity__perplexity_ask({
+  messages: [{ role: "user", content: "Direct question scoped with constraints" }]
+})
 
-**❌ WRONG - Use Nx MCP instead:**
-- "Search for build config" (in THIS workspace) → Nx MCP
-- "Find project dependencies" (in THIS workspace) → Nx MCP
+// Research — PROHIBITED: use /research agent instead
+// mcp__perplexity__perplexity_research(...)  ← NEVER CALL
+```
 
-## Key Points
+## Tool Priority Chain
 
-- **Default to limited results** - avoid context bloat
-- **Library docs = Context7** - ALWAYS try Context7 first
-- **"gt" = Graphite MCP** - ANY "gt" mention uses Graphite
-- **Deep research = /research** - NOT perplexity_research tool
-- **Fallback chain** - Search → Ask → WebSearch (last resort)
+1. Context7 MCP — library/framework docs
+2. Graphite MCP — any `gt` CLI mention
+3. Nx MCP — THIS workspace questions
+4. **Perplexity Search** — generic searches needing sources
+5. **Perplexity Ask** — conversational synthesis
+6. Researcher agent (`/research`) — deep multi-source reports
+7. WebSearch — absolute last resort
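A minimal sketch of the parameter ladder from the Perplexity skill above, assuming the caller classifies each search into one of three situations. The names `Situation`, `SearchLimits`, and `limitsFor` are hypothetical and not part of the Perplexity MCP API; only the numeric values come from the table.

```typescript
// Hypothetical helper: names and types are illustrative, not part of any MCP API.
type Situation = "default" | "nothing_useful" | "comprehensive";

interface SearchLimits {
  max_results: number;
  max_tokens_per_page: number;
}

// Values copied from the parameter ladder table.
const LADDER: Record<Situation, SearchLimits> = {
  default: { max_results: 3, max_tokens_per_page: 512 },
  nothing_useful: { max_results: 5, max_tokens_per_page: 768 },
  comprehensive: { max_results: 7, max_tokens_per_page: 1024 },
};

// Hard ceiling from the "Never go above" row.
const CEILING: SearchLimits = { max_results: 10, max_tokens_per_page: 2048 };

// Pick limits for a situation; any manual override is clamped to the ceiling.
function limitsFor(
  situation: Situation,
  override: Partial<SearchLimits> = {},
): SearchLimits {
  const base = LADDER[situation];
  return {
    max_results: Math.min(
      override.max_results ?? base.max_results,
      CEILING.max_results,
    ),
    max_tokens_per_page: Math.min(
      override.max_tokens_per_page ?? base.max_tokens_per_page,
      CEILING.max_tokens_per_page,
    ),
  };
}
```

The result could be spread into the search call's arguments next to `query`, so that neither limit is ever left at the tool default.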
diff --git a/skills/plugin-forge/SKILL.md b/skills/plugin-forge/SKILL.md
index 6b31439..56ff265 100644
--- a/skills/plugin-forge/SKILL.md
+++ b/skills/plugin-forge/SKILL.md
@@ -1,224 +1,74 @@
 ---
 name: plugin-forge
-description: Create and manage Claude Code plugins with proper structure, manifests, and marketplace integration. Use when creating plugins for a marketplace, adding plugin components (commands, agents, hooks), bumping plugin versions, or working with plugin.json/marketplace.json manifests.
+description: Create and manage Claude Code plugins — plugin.json manifests, marketplace.json registration, skills/commands/agents/hooks wiring, version bumping, and directory-source plugin troubleshooting. Use when building a new plugin, adding components to an existing plugin, diagnosing silent plugin load failures, or publishing to a marketplace.
 ---
 
-# CC Plugin Forge
+# Plugin Forge
 
-## Purpose
+## Mindset
 
-Build and manage Claude Code plugins with correct structure, manifests, and marketplace integration. Includes workflows, automation scripts, and reference docs.
+1. **The loader is picky about git.** Directory-source plugins must have a `.git` dir with at least one commit. A missing or empty repo causes a silent "source type not supported" failure — not an error you can see in the manifest or logs. Git first, always.
+2. **Two manifests, one truth.** Version drift between `plugin.json` and `marketplace.json` causes install mismatches that are hard to diagnose. Treat them as a single atomic update.
+3. **Agents don't hot-reload from marketplace.** Marketplace plugins load skills and commands on install; agent definitions only take effect after a manual `cp` to `.claude/agents/` AND a session restart. The install command alone is not enough.
+4. **Skills load lazily; size the SKILL.md for that.** Only the `description` field is read at startup. The full SKILL.md loads only when the skill triggers. A bloated SKILL.md therefore costs nothing at startup but wastes context on every activation.
+5. **Component paths resolve from plugin root, not from `.claude-plugin/`.** A common misplacement: putting `commands/` inside `.claude-plugin/` instead of at the plugin root. The manifest only lives in `.claude-plugin/`; everything else is at root.
 
-## When to Use
+## Navigation
 
-- Creating new plugins for a marketplace
-- Adding/modifying plugin components (commands, skills, agents, hooks)
-- Updating plugin versions
-- Working with plugin or marketplace manifests
-- Setting up local plugin testing
-- Publishing plugins
+**Use this skill when**:
+- Scaffolding a new plugin (directory structure, manifest, marketplace registration)
+- Adding a component to an existing plugin (command, skill, agent, hook, MCP server)
+- Diagnosing a plugin that installs without error but shows no commands/skills in session
+- Bumping plugin versions across both manifests
+- Setting up a directory-source marketplace in `settings.json`
 
-## Getting Started
+**Do NOT use this skill when**:
+- Writing the *content* of a skill (SKILL.md authoring) — use skill-judge to evaluate SKILL.md quality after authoring, and agent-md-refactor when plugin instruction files become bloated
+- Configuring hooks behavior or MCP server logic — plugin-forge handles the wiring, not the implementation
+- Publishing to the upstream claude.ai plugin registry — that uses a different submission flow
 
-### Create New Plugin
+**Quick decision tree for ambiguous input**:
+- "My plugin isn't loading" → go to [When Things Go Wrong](#when-things-go-wrong) first
+- "I need to add a command" → component wiring path (see `references/plugin-structure.md`)
+- "I need a new plugin from scratch" → use `scripts/create_plugin.py` then register in marketplace
 
-Use `create_plugin.py` to generate plugin structure:
+## Philosophy
 
-```bash
-python scripts/create_plugin.py plugin-name \
-  --marketplace-root /path/to/marketplace \
-  --author-name "Your Name" \
-  --author-email "your.email@example.com" \
-  --description "Plugin description" \
-  --keywords "keyword1,keyword2" \
-  --category "productivity"
-```
+Plugin-forge exists because the gap between "plugin installed" and "plugin works" is filled with silent failures. Every decision here prioritizes diagnosability: correct git state, valid manifests, proper path placement, and restart discipline — because the loader gives you nothing when it fails.
 
-This automatically:
+## NEVER
 
-- Creates plugin directory structure
-- Generates `plugin.json` manifest
-- Creates README template
-- Updates `marketplace.json`
+- **NEVER create a directory-source plugin without initializing git** — the loader treats non-git directories as unsupported source type and fails silently with no error in the install output. Run `git init && git add -A && git commit -m "init"` before registering.
+- **NEVER update version in only one manifest** — `plugin.json` and `marketplace.json` must stay in sync. A mismatch causes the marketplace to serve stale metadata while the plugin runs the new code, making version tracking unreliable and rollbacks dangerous.
+- **NEVER put component directories inside `.claude-plugin/`** — `.claude-plugin/` is manifest-only. Commands, skills, agents, hooks, and `.mcp.json` must live at the plugin root. Nesting them inside `.claude-plugin/` results in silent load failure with no directory-not-found warning.
+- **NEVER rely on `/plugin install` alone to activate agents** — marketplace install does not copy agent definitions to `.claude/agents/`. Agents require a manual `cp` to the agents directory plus a full session restart. Skipping either step means the agent is invisible to Claude.
+- **NEVER name a skill directory with spaces or uppercase** — the skill loader uses the directory name as the skill identifier. Non-kebab-case names cause lookup mismatches where the skill exists on disk but cannot be resolved by the trigger system.
+- **NEVER ship a SKILL.md over 500 lines** — the full file loads into context on every skill activation. Oversized skills consume token budget silently and degrade performance across the whole session. Move heavy reference content to `references/` subdirectory files.
+- **NEVER hardcode absolute paths in manifests or hooks** — use `${CLAUDE_PLUGIN_ROOT}` for dynamic resolution in hooks/MCP configs. Absolute paths break portability across machines and for other users of a shared marketplace.
 
-### Bump Version
+## When Things Go Wrong
 
-Use `bump_version.py` to update versions in both manifests:
-
-```bash
-python scripts/bump_version.py plugin-name major|minor|patch \
-  --marketplace-root /path/to/marketplace
-```
-
-Semantic versioning:
-
-- **major**: Breaking changes (1.0.0 → 2.0.0)
-- **minor**: New features, refactoring (1.0.0 → 1.1.0)
-- **patch**: Bug fixes, docs (1.0.0 → 1.0.1)
-
-## Development Workflow
-
-### 1. Create Structure
-
-Manual approach (if not using script):
-
-```bash
-mkdir -p plugins/plugin-name/.claude-plugin
-mkdir -p plugins/plugin-name/commands
-mkdir -p plugins/plugin-name/skills
-```
-
-### 2. Plugin Manifest
-
-File: `plugins/plugin-name/.claude-plugin/plugin.json`
-
-```json
-{
-  "name": "plugin-name",
-  "version": "0.1.0",
-  "description": "Plugin description",
-  "author": {
-    "name": "Your Name",
-    "email": "your.email@example.com"
-  },
-  "keywords": ["keyword1", "keyword2"]
-}
-```
-
-### 3. Register in Marketplace
-
-Update `.claude-plugin/marketplace.json`:
-
-```json
-{
-  "name": "plugin-name",
-  "source": "./plugins/plugin-name",
-  "description": "Plugin description",
-  "version": "0.1.0",
-  "keywords": ["keyword1", "keyword2"],
-  "category": "productivity"
-}
-```
-
-### 4. Add Components
-
-Create in respective directories:
-
-| Component | Location | Format |
-|-----------|----------|--------|
-| Commands | `commands/` | Markdown with frontmatter |
-| Skills | `skills/<name>/` | Directory with `SKILL.md` |
-| Agents | `agents/` | Markdown definitions |
-| Hooks | `hooks/hooks.json` | Event handlers |
-| MCP Servers | `.mcp.json` | External integrations |
-
-### 5. Local Testing
-
-```bash
-# Add marketplace
-/plugin marketplace add /path/to/marketplace-root
-
-# Install plugin
-/plugin install plugin-name@marketplace-name
-
-# After changes: reinstall
-/plugin uninstall plugin-name@marketplace-name
-/plugin install plugin-name@marketplace-name
-```
-
-## Plugin Patterns
-
-### Framework Plugin
-
-For framework-specific guidance (React, Vue, etc.):
-
-```
-plugins/framework-name/
-├── .claude-plugin/plugin.json
-├── skills/
-│   └── framework-name/
-│       ├── SKILL.md
-│       └── references/
-├── commands/
-│   └── prime/
-│       ├── components.md
-│       └── framework.md
-└── README.md
-```
-
-### Utility Plugin
-
-For tools and commands:
-
-```
-plugins/utility-name/
-├── .claude-plugin/plugin.json
-├── commands/
-│   ├── action1.md
-│   └── action2.md
-└── README.md
-```
-
-### Domain Plugin
-
-For domain-specific knowledge:
-
-```
-plugins/domain-name/
-├── .claude-plugin/plugin.json
-├── skills/
-│   └── domain-name/
-│       ├── SKILL.md
-│       ├── references/
-│       └── scripts/
-└── README.md
-```
-
-## Command Naming
-
-Subdirectory-based namespacing with `:` separator:
-
-- `commands/namespace/command.md` → `/namespace:command`
-- `commands/simple.md` → `/simple`
-
-Examples:
-
-- `commands/prime/vue.md` → `/prime:vue`
-- `commands/docs/generate.md` → `/docs:generate`
-
-## Version Management
-
-**Important:** Update version in BOTH locations:
-
-1. `plugins/<name>/.claude-plugin/plugin.json`
-2. `.claude-plugin/marketplace.json`
-
-Use `bump_version.py` to automate.
-
-## Git Commits
-
-Use conventional commits:
-
-```bash
-git commit -m "feat: add new plugin"
-git commit -m "fix: correct plugin manifest"
-git commit -m "docs: update plugin README"
-git commit -m "feat!: breaking change"
-```
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Plugin installs without error but `/command` is not available | Component dirs inside `.claude-plugin/` instead of plugin root, OR session not restarted after install | Move dirs to plugin root; `/plugin uninstall` then reinstall; restart session |
+| Directory-source install fails with "source type not supported" | Plugin directory has no `.git` or zero commits | `git init && git add -A && git commit -m "init"` in plugin root, then reinstall |
+| Plugin installs but shows wrong version | `plugin.json` and `marketplace.json` versions are out of sync | Use `scripts/bump_version.py` to update both atomically; uninstall and reinstall |
+| Agent defined in plugin is not visible to Claude | Agents don't auto-load from marketplace; manual copy required | `cp agents/<name>.md ~/.claude/agents/`, then fully restart the session |
+| Skill triggers but reads wrong SKILL.md | Skill directory name doesn't match the `name:` field in frontmatter | Align directory name and frontmatter `name:` field; they must be identical |
+| Hook fires but `${CLAUDE_PLUGIN_ROOT}` resolves incorrectly | Plugin was installed to a non-standard path | Verify the install path with `/plugin list`; check that `hooks.json` uses the variable, not a literal path |
+| MCP server from plugin doesn't appear in session | `.mcp.json` is missing or inside `.claude-plugin/` instead of plugin root | Move `.mcp.json` to plugin root; reinstall plugin |
 
 ## Reference Docs
 
-Detailed documentation included:
-
 | Reference | Content |
 |-----------|---------|
-| `references/plugin-structure.md` | Directory structure, manifest schema, components |
-| `references/marketplace-schema.md` | Marketplace format, plugin entries, distribution |
-| `references/workflows.md` | Step-by-step workflows, patterns, publishing |
+| `references/plugin-structure.md` | Full directory schema, manifest fields, component placement rules |
+| `references/marketplace-schema.md` | Marketplace JSON format, source types (local/GitHub/git URL), team distribution via settings.json |
+| `references/workflows.md` | Step-by-step create/test/publish workflows, conventional commit conventions |
 
-### Scripts
+## Scripts
 
 | Script | Purpose |
 |--------|---------|
-| `scripts/create_plugin.py` | Scaffold new plugin |
-| `scripts/bump_version.py` | Update versions |
+| `scripts/create_plugin.py` | Scaffold new plugin with correct structure, manifests, and marketplace registration |
+| `scripts/bump_version.py` | Update version atomically in both `plugin.json` and `marketplace.json` |
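A minimal sketch of the "two manifests, one truth" check from the plugin-forge skill above, written as a pure function over already-parsed JSON. The interface shapes are assumptions: in particular, a top-level `plugins` array in `marketplace.json` is not confirmed by this patch, and `checkVersionSync` is a hypothetical name, not something `scripts/bump_version.py` is known to do.

```typescript
// Hypothetical shapes: only the fields this check needs.
// The `plugins` array in marketplace.json is an assumption.
interface PluginManifest {
  name: string;
  version: string;
}

interface MarketplaceManifest {
  plugins: { name: string; version?: string }[];
}

type SyncResult = { ok: true } | { ok: false; reason: string };

// Compare the version in plugin.json with the matching marketplace.json entry.
function checkVersionSync(
  plugin: PluginManifest,
  marketplace: MarketplaceManifest,
): SyncResult {
  const entry = marketplace.plugins.find((p) => p.name === plugin.name);
  if (!entry) {
    return {
      ok: false,
      reason: `${plugin.name} is not registered in marketplace.json`,
    };
  }
  if (entry.version !== plugin.version) {
    return {
      ok: false,
      reason:
        `version drift: plugin.json=${plugin.version}, ` +
        `marketplace.json=${entry.version ?? "missing"}`,
    };
  }
  return { ok: true };
}
```

A caller would typically `JSON.parse` both files and run this before reinstalling, so that drift is reported explicitly rather than showing up later as a wrong installed version.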
diff --git a/skills/professional-communication/SKILL.md b/skills/professional-communication/SKILL.md
index d8b50a2..af58e31 100644
--- a/skills/professional-communication/SKILL.md
+++ b/skills/professional-communication/SKILL.md
@@ -1,267 +1,85 @@
 ---
 name: professional-communication
-description: Guide technical communication for software developers. Covers email structure, team messaging etiquette, meeting agendas, and adapting messages for technical vs non-technical audiences. Use when drafting professional messages, preparing meeting communications, or improving written communication.
+description: Craft, improve, or structure professional written communication for technical contexts — email, async chat, meeting summaries, status updates, and audience-adapted technical explanations. Trigger phrases: "write an email", "draft a message", "help me communicate", "status update", "meeting agenda", "translate this for stakeholders", "improve my message", "how should I phrase this".
 allowed-tools: Read, Glob, Grep
 ---
 
-# Professional Communication
+## Mindset
 
-## Overview
+1. **The recipient's cognitive load is the only metric that matters.** A message is only as good as what the reader does with it. If they have to re-read, ask a clarifying question, or forward it to someone else, the communication failed — regardless of how accurate or thorough it was.
 
-This skill provides frameworks and guidance for effective professional communication in software development contexts. Whether you're writing an email to stakeholders, crafting a team chat message, or preparing meeting agendas, these principles help you communicate clearly and build professional credibility.
+2. **Urgency and importance are almost never self-evident.** Every sender thinks their message is high-priority. Explicit signals ("Decision needed by EOD Friday") consistently outperform implied urgency. The absence of a deadline means "whenever."
 
-**Core principle:** Effective communication isn't about proving how much you know - it's about ensuring your message is received and understood.
+3. **Tone debt compounds.** A single passive-aggressive or blame-shifting sentence can permanently change how a reader filters your future messages. Unlike code, communication debt has no automated linter; repairing it takes deliberate effort.
 
-## When to Use This Skill
+4. **Medium selection is the first communication decision.** Choosing the wrong medium (chat for a complex decision, email for an urgent blocker) wastes more time than poor wording. Format mismatch signals poor situational awareness to peers and managers.
 
-Use this skill when:
+5. **Precision protects you; vagueness protects no one.** Hedge language ("might," "could," "sort of") reads as uncertainty in technical contexts. Stakeholders interpret ambiguity as risk. Be precise even when the situation is uncertain — state the uncertainty explicitly instead.
 
-- Writing emails to teammates, managers, or stakeholders
-- Crafting team chat messages or async communications
-- Preparing meeting agendas or summaries
-- Translating technical concepts for non-technical audiences
-- Structuring status updates or reports
-- Improving clarity of written communication
+## Navigation
 
-**Keywords**: email, chat, teams, slack, discord, message, writing, communication, meeting, agenda, status update, report
+**Use this skill when**:
+- Drafting or improving any written professional message (email, chat, Slack/Teams, async update)
+- Adapting technical content for non-technical audiences (executives, clients, product)
+- Structuring meeting agendas, summaries, or post-mortems
+- Handling escalations, conflict-adjacent communications, or sensitive announcements
+- The user says "help me say this," "is this too harsh," "how do I ask for X," or "translate this"
 
-## Core Frameworks
+**Do NOT use this skill when**:
+- The task is a difficult conversation requiring emotional intelligence — use `feedback-mastery` instead
+- The output is formal legal, HR, or compliance documentation — those require domain-specific review beyond communication framing
+- The user needs speech/presentation coaching (different register, different skill)
 
-### The What-Why-How Structure
+**Quick disambiguation**:
+- Feedback to a colleague → `feedback-mastery`
+- Written message, update, or audience translation → this skill
+- Both → use this skill first, then flag `feedback-mastery` for delivery coaching
 
-Use this universal framework to organize any professional message:
+## Philosophy
 
-| Component | Purpose | Example |
-| --- | --- | --- |
-| **What** | State the topic/request clearly | "We need to delay the release by one week" |
-| **Why** | Explain the reasoning | "Critical bug found in payment processing" |
-| **How** | Outline next steps/action items | "QA will retest by Thursday; I'll update stakeholders Friday" |
+Professional communication is not about demonstrating expertise — it is about creating the conditions for the reader to act correctly. Every structural choice (subject line, bullet vs. prose, opening sentence) is either reducing or adding to the reader's work.
 
-**Apply to**: Emails, status updates, meeting talking points, technical explanations
+## NEVER
 
-### Three Golden Rules for Written Communication
+- **NEVER open with context before stating the ask** — recipients scan for what they need to do, not what led to it. Burying the request at the bottom of a long context section guarantees it gets missed or deferred. Lead with the ask; follow with the rationale.
 
-1. **Start with a clear subject/purpose** - Recipients should immediately grasp what your message is about
-2. **Use bullets, headlines, and scannable formatting** - Nobody wants a wall of text
-3. **Key messages first** - Busy people appreciate efficiency; state your main point upfront
+- **NEVER use a vague subject line** — "Quick question" or "Follow-up" forces the recipient to open the email to know whether it's urgent. This creates anxiety and delays action. Subject lines are pre-decisions: "API rate limit approval needed by Thursday" gives the reader everything before they open.
 
-### Audience Calibration
+- **NEVER send a "wall of text" as a chat message** — prose paragraphs in Slack/Teams signal that you didn't consider the reader's context. Chat is a scanning medium; documents are a reading medium. Long explanations belong in a linked document, not a thread.
 
-Before communicating, ask yourself:
+- **NEVER mark something urgent without stating the consequence of inaction** — "URGENT" without "if we don't decide by 3pm, the deploy is blocked until Monday" is noise. Stakeholders ignore unexplained urgency signals; they respond to explicit impact.
 
-1. **Who** are you writing to? (Technical peers, managers, stakeholders, customers)
-2. **What level of detail** do they need? (High-level overview vs implementation details)
-3. **What's the value** for them? (How does this affect their work/decisions?)
+- **NEVER write in passive voice when accountability matters** — "The bug was missed" distributes blame diffusely and signals defensiveness. "QA missed the bug in the payment flow" is harder to write but faster to resolve — it's clear who needs to be in the post-mortem. Passive voice in incident communications is a red flag.
 
-## Email Best Practices
+- **NEVER assume a technical peer wants more detail** — over-explanation to a peer signals distrust of their competence. If they need more, they'll ask. Default to the conclusion; offer depth if asked. This is the reverse of non-technical audience rules.
 
-### Subject Line Formula
+- **NEVER continue an async thread past 3 exchanges without offering a sync** — ping-pong past 3 rounds means the problem has ambiguity that text cannot resolve. Offering a 15-minute call is not a failure of async; it is the correct escalation path. Continuing to type is ego protection, not communication.
 
-| Instead of | Try |
-| --- | --- |
-| "Project updates" | "Project X: Status Update and Next Steps" |
-| "Question" | "Quick question: API rate limiting approach" |
-| "FYI" | "FYI: Deployment scheduled for Tuesday 3pm" |
+## When Things Go Wrong
 
-### Email Structure Template
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| No response after 48h | No explicit ask, unclear deadline, or wrong channel | Resend with explicit action request in the first sentence; switch medium if needed |
+| Reply asks clarifying questions you already answered | Information was buried in the middle; reader scanned and missed it | Restructure: ask/decision first, context second; use bold to surface key data |
+| Message perceived as aggressive or blame-y | Passive constructions + implicit accusations ("someone missed this") | Rewrite to specific facts + forward-looking action: what happened, what we'll do |
+| Executive asks to "simplify" after reading your update | Technical depth exceeded their decision-making need | Strip all implementation detail; restate as: business impact, risk, decision needed |
+| Message went to wrong audience (e.g., escalated too high) | Urgency was misjudged and the escalation path was not checked before sending | Send an immediate correction with explicit framing: "This was meant for [team]; forwarding to correct channel now" |
 
-```markdown
-**Subject:** [Project/Topic]: [Specific Purpose]
+## How to Draft Any Message
 
-Hi [Name],
+1. **Identify the output type**: decision request / status update / FYI / escalation / explanation. Different types have different templates — see `references/email-templates.md`.
+2. **State the ask or key fact in the first sentence.** No preamble.
+3. **Apply audience calibration**: engineering peer → technical precision; manager → impact + timeline; executive → decision needed + risk if none; customer → plain language + what it means for them.
+4. **Apply the "So What?" test** to every paragraph: if a paragraph doesn't change what the reader does or understands, cut it.
+5. **Check medium fit**: real-time urgency → call; complex decision → email + doc; quick coordination → chat.
 
-[1-2 sentences stating the key point or request upfront]
+## References
 
-**Context/Background:**
-- [Bullet point 1]
-- [Bullet point 2]
-
-**What I need from you:**
-- [Specific action or decision needed]
-- [Timeline if applicable]
-
-[Optional: Brief next steps or follow-up plan]
-
-Best,
-[Your name]
-```
-
-### Common Email Types
-
-| Type | Key Elements |
-| --- | --- |
-| **Status Update** | Progress summary, blockers, next steps, timeline |
-| **Request** | Clear ask, context, deadline, why it matters |
-| **Escalation** | Issue summary, impact, attempted solutions, needed decision |
-| **FYI/Announcement** | What changed, who's affected, any required action |
-
-**For templates**: See `references/email-templates.md`
-
-## Team Messaging Etiquette
-
-> **Note:** Examples use Slack terminology, but these principles apply equally to Microsoft Teams, Discord, or any team messaging platform.
-
-### When to Use Chat vs Email
-
-| Use Chat | Use Email |
-| --- | --- |
-| Quick questions with short answers | Detailed documentation needing records |
-| Real-time coordination | Formal communications to stakeholders |
-| Informal team discussions | Messages requiring careful review |
-| Time-sensitive updates | Complex explanations with multiple parts |
-
-### Team Messaging Best Practices
-
-1. **Use threads** - Keep main channels scannable; follow-ups go in threads
-2. **@mention thoughtfully** - Don't notify people unnecessarily
-3. **Channel organization** - Right channel for right topic
-4. **Be direct** - "Can you review my PR?" beats "Hey, are you busy?"
-5. **Async-friendly** - Write messages that don't require immediate response
-
-### The "No Hello" Principle
-
-Instead of:
-
-```text
-You: Hi
-You: Are you there?
-You: Can I ask you something?
-[waiting...]
-```
-
-Try:
-
-```text
-You: Hi Sarah - quick question about the deployment script.
-     Getting a permission error on line 42. Have you seen this before?
-     Here's the error: [paste error]
-```
-
-## Technical vs Non-Technical Communication
-
-### When to Be Technical vs Accessible
-
-| Audience | Approach |
-| --- | --- |
-| **Engineering peers** | Technical details, code examples, architecture specifics |
-| **Technical managers** | Balance of detail and high-level impact |
-| **Non-technical stakeholders** | Business impact, analogies, outcomes over implementation |
-| **Customers** | Plain language, what it means for them, avoid jargon |
-
-### Three Strategies for Simplification
-
-1. **Start with the big picture before details** - People process "why" before "how"
-2. **Simplify without losing accuracy** - Use analogies; replace jargon with plain language
-3. **Know when to switch** - Read the room; adjust based on questions and engagement
-
-### Jargon Translation Examples
-
-| Technical | Plain Language |
-| --- | --- |
-| "Microservices architecture" | "Our system is split into smaller, independent pieces that can scale separately" |
-| "Asynchronous message processing" | "Tasks are queued and processed in the background" |
-| "CI/CD pipeline" | "Automated process that tests and deploys our code" |
-| "Database migration" | "Updating how our data is organized and stored" |
-
-**For more examples**: See `references/jargon-simplification.md`
-
-## Writing Clarity Principles
-
-### Active Voice Over Passive Voice
-
-Active voice is clearer, more direct, and conveys authority:
-
-| Passive (avoid) | Active (prefer) |
-| --- | --- |
-| "A bug was identified by the team" | "The team identified a bug" |
-| "The feature will be implemented" | "We will implement the feature" |
-| "Errors were found during testing" | "Testing revealed errors" |
-
-### Eliminate Filler Words
-
-| Instead of | Use |
-| --- | --- |
-| "At this point in time" | "Now" |
-| "In the event that" | "If" |
-| "Due to the fact that" | "Because" |
-| "In order to" | "To" |
-| "I just wanted to check if" | "Can you" |
-
-### The "So What?" Test
-
-After writing, ask: "So what? Why does this matter to the reader?"
-
-If you can't answer clearly, restructure your message to lead with the value/impact.
-
-## Meeting Communication
-
-### Before: Agenda Best Practices
-
-Every meeting invite should include:
-
-1. **Clear objective** - What will be accomplished?
-2. **Agenda items** - Topics to cover with time estimates
-3. **Preparation required** - What should attendees bring/review?
-4. **Expected outcome** - Decision needed? Information sharing? Brainstorm?
-
-### During: Facilitation Tips
-
-- **Time-box discussions** - "Let's spend 5 minutes on this, then move on"
-- **Capture action items live** - Who does what by when
-- **Parking lot** - Note off-topic items for later
-
-### After: Summary Format
-
-```markdown
-**Meeting: [Topic] - [Date]**
-
-**Attendees:** [Names]
-
-**Key Decisions:**
-- [Decision 1]
-- [Decision 2]
-
-**Action Items:**
-- [ ] [Person]: [Task] - Due [Date]
-- [ ] [Person]: [Task] - Due [Date]
-
-**Next Steps:**
-- [Follow-up meeting if needed]
-- [Documents to share]
-```
-
-**For structures by meeting type**: See `references/meeting-structures.md`
-
-## Quick Reference: Communication Checklist
-
-Before sending any professional communication:
-
-- [ ] **Clear purpose** - Can the recipient understand intent in 5 seconds?
-- [ ] **Right audience** - Is this the appropriate person/channel?
-- [ ] **Key message first** - Is the main point upfront?
-- [ ] **Scannable** - Are there bullets, headers, short paragraphs?
-- [ ] **Action clear** - Does the recipient know what (if anything) they need to do?
-- [ ] **Jargon check** - Will the audience understand all terminology?
-- [ ] **Tone appropriate** - Is it professional but not cold?
-- [ ] **Proofread** - Any typos or unclear phrasing?
-
-## Additional Tools
-
-- `references/email-templates.md` - Ready-to-use email templates by type
-- `references/meeting-structures.md` - Structures for standups, retros, reviews
-- `references/jargon-simplification.md` - Technical-to-plain-language translations
+- `references/email-templates.md` — Ready-to-use templates by message type (status update, escalation, FYI, request)
+- `references/meeting-structures.md` — Agenda and summary formats for standups, retros, architecture reviews, post-mortems
+- `references/jargon-simplification.md` — Technical-to-plain-language translations by domain
+- `references/remote-async-communication.md` — Async-first principles, channel selection matrix, time zone norms; **load this when the communication is cross-timezone or fully async**
 
 ## Companion Skills
 
-- `feedback-mastery` - For difficult conversations and feedback delivery
-- `/draft-email` - Generate emails using these frameworks
-
----
-
-**Last Updated:** 2025-12-22
-
-## Version History
-
-- **v1.0.0** (2025-12-26): Initial release
-
----
+- `feedback-mastery` — Difficult conversations, delivering critical feedback, managing conflict
diff --git a/skills/project-manager/SKILL.md b/skills/project-manager/SKILL.md
index 7fd4fb7..2171c7d 100644
--- a/skills/project-manager/SKILL.md
+++ b/skills/project-manager/SKILL.md
@@ -1,88 +1,79 @@
 ---
 name: project-manager
 description: >
-  Central OS Agent for multi-agent development workflows. Tracks live agent states (ID,
-  specialization, work item, active files, skills, git status, health/blockers), maintains
-  .agent_os_state.json, enforces dependency locking between agents, and renders a scannable
-  CLI dashboard. Use when asked for "system status", "agent dashboard", "project status",
-  or to track/manage agents across a multi-agent workflow.
+  Central OS Agent for multi-agent development workflows. Tracks live agent states,
+  maintains .agent_os_state.json, enforces file-level dependency locking, detects
+  state corruption, and renders a CLI dashboard. Trigger keywords: "system status",
+  "agent dashboard", "project status", "check blockers", "register agent", "agent
+  is working on", "who owns", "lock conflict", "unblock agent".
 ---
 
-You are the Central Operating System (OS) Agent for a multi-agent development workflow. Your job is to aggregate, visualize, and orchestrate the live operational states of all active agents working on the project.
-
-## Core State Tracking
-
-Track these metrics for every agent:
-- **Agent ID & Specialization** (e.g., UI-Agent, Auth-Agent, Test-Agent)
-- **Current Work Item / Ticket ID** (linked to the project roadmap)
-- **Active File Paths** (exact files the agent is currently reading or writing)
-- **Skills Loaded** (e.g., git-tools, db-migrator, regex-parser, test-runner)
-- **Git/Push Status** (e.g., Clean, 3 commits ahead, Uncommitted changes, Merge conflict)
-- **Health & Blockers** (e.g., Idle, Executing, Blocked by Agent-02)
-
-## State Management Rules
-
-1. **State Synchronization**: Create and maintain `.agent_os_state.json` in the root directory. Read and update this file whenever an agent reports a state change.
-2. **Dependency Locking**: If Agent-A's push status is "Pending Review" and touches files that Agent-B needs, automatically flag Agent-B as "Blocked" and notify the supervisor.
-3. **Skill Management**: Dynamically recommend or inject specific tool scripts or "skills" into an agent's workspace based on their current execution errors or Git push failures.
-4. **Terminal Dashboard**: Provide a scannable CLI dashboard layout showing holistic project health.
-
-## State File Schema
-
-Maintain `.agent_os_state.json` with this shape:
-
-```json
-{
-  "last_updated": "ISO-8601 timestamp",
-  "agents": {
-    "Agent-01": {
-      "specialization": "auth",
-      "ticket": "TICKET-42",
-      "active_files": ["src/auth/login.ts"],
-      "skills": ["git", "web-scrap"],
-      "git_status": "2 commits ahead",
-      "health": "RUNNING",
-      "blocker": null
-    }
-  },
-  "locks": {
-    "src/auth/login.ts": "Agent-01"
-  }
-}
-```
-
-When asked to update an agent's state, read the current `.agent_os_state.json`, apply the change, and write it back. If the file doesn't exist, create it with an empty agents object.
-
-## System Status Dashboard Output
-
-When asked for "system status", output this exact markdown structure:
-
-```
-## 🖥️ AGENT OS DIGITAL TWIN DASHBOARD
------------------------------------------------------------------------
-[Agent ID]    [Active Skill]    [Git Status]    [Current Task]   [Status]
------------------------------------------------------------------------
-Agent-01      git, web-scrap     2 Commits Ahead Fix Login Bug    ⚙️ RUNNING
-Agent-02      db-schema, sql     Clean           Migrate Users    ⏸️ IDLE
-Agent-03      jest, coverage     Uncommitted     Write UI Tests   🛑 BLOCKED (by 01)
-
-### 🔄 Active Interactivity Matrix:
-- Agent-01 is modifying `src/auth/login.ts`.
-- Agent-03 execution paused: Target file `src/auth/login.ts` is locked by Agent-01.
-
-### 🛠️ Required Actions:
-- [ ] Merge Agent-01 branch to release Agent-03.
-- [ ] Load `performance-profiler` skill to Agent-02 for the upcoming database migration.
-```
-
-Status icons: ⚙️ RUNNING | ⏸️ IDLE | 🛑 BLOCKED | ✅ DONE | ⚠️ ERROR
+## Mindset
+
+Multi-agent state is **eventually inconsistent by default**. Every read of `.agent_os_state.json` can be stale. Treat it as an advisory ledger, not ground truth — confirm with the actual agent before acting on lock data.
+
+File-level locks are a proxy for intent, not enforcement. Two agents can both write to the same file without triggering a lock error if they registered their ownership naively. The lock only prevents corruption when agents *ask before acting*.
+
+A BLOCKED agent that has been blocked for more than one polling cycle is a system smell, not a normal operating state. Silent blocks cascade: Agent-03 blocks Agent-07, and Agent-07's downstream dependency is never surfaced unless you check transitively.
+
+Dashboard freshness degrades immediately. Timestamps older than 2 minutes in a hot session mean the reporting agent is hung, disconnected, or running in a session whose context was reset.
+
+Health status and blocker fields are self-reported — an agent can lie or go silent without updating its own record.
+
+## Navigation
+
+**Use this skill when**:
+- Coordinating 2+ agents that share files, branches, or output artifacts
+- Diagnosing why an agent is stuck, silent, or producing stale output
+- Running "system status", "agent dashboard", or "check blockers"
+- Registering, updating, or releasing agents and their file locks
+- Investigating a lock conflict or state corruption scenario
+
+**Do NOT use this skill when**:
+- Only one agent is active — single-agent workflows have no coordination overhead to manage
+- The project has no shared files between agents — lock management adds friction with no benefit
+- You are debugging code logic rather than agent coordination; use the `code-review` or `systematic-debugging` skill instead
+
+**Quick triage**: If an agent reports BLOCKED → check lock owner → verify owner is still alive → if stale, force-release. If multiple agents report BLOCKED on the same file → lock owner crashed without cleanup; see recovery table below.
+
+## Philosophy
+
+State without freshness guarantees is noise. Every coordination decision must account for the age of the state data, the trustworthiness of the reporting agent, and the transitive dependency graph — not just the direct lock entry.
+
+## NEVER
+
+- NEVER release a lock based solely on a requesting agent's claim that it needs the file — always verify with the lock-holder's current status first, because phantom release creates a race condition where both agents write simultaneously.
+- NEVER treat a missing or unparseable `.agent_os_state.json` as a clean slate without checking whether an agent crashed mid-write, because a partial write leaves corrupt JSON that fails every later read; validate with `jq . .agent_os_state.json`, and look for `.agent_os_state.json.bak`, before starting from an empty state.
+- NEVER mark an agent DONE and release all its locks in a single step — release locks file-by-file and confirm downstream agents are notified before transitioning status, because batch-release can trigger a thundering herd where 3+ blocked agents all start writing the same dependency simultaneously.
+- NEVER propagate a lock from a crashed agent to the requester without recording the original lock owner in a `previous_holder` field — without audit history, you cannot determine whether a file was modified in a partially-applied state when the original agent died.
+- NEVER use agent self-reported `health` as the primary signal for scheduling new work — an agent in state RUNNING can be CPU-starved, token-limited, or stuck in a retry loop; cross-check with `last_updated` timestamp delta before assigning dependent tasks.
+- NEVER store absolute paths in `active_files` or the `locks` map — path portability breaks when the state file is shared across machines or worktrees; use paths relative to the repo root.
+- NEVER allow two agents to hold a lock on the same file simultaneously, even if one claims it is "read-only" — read locks that turn into write locks mid-execution are the most common source of state corruption in concurrent Claude Code sessions.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Agent reports BLOCKED but lock-holder shows DONE | Lock was not released on completion (crash or missed cleanup step) | Manually call "agent [ID] finished [file]" for each file the finished agent still holds; verify the `locks` map has no entries left for those paths |
+| `.agent_os_state.json` fails JSON parse | Agent crashed mid-write, leaving partial JSON | Run `jq . .agent_os_state.json`; if corrupt, restore from `.agent_os_state.json.bak` if present, or reconstruct from each agent's last self-report |
+| Multiple agents claim to own the same lock entry | Race condition on simultaneous registration | The state file only reflects whichever write landed last, so do not trust it; re-interview each agent for its actual file set and rebuild the `locks` map from scratch |
+| Dashboard timestamp is stale (>2 min in active session) | Agent is hung, token-exhausted, or context was reset | Treat agent as UNKNOWN health; do not assign new work; attempt a ping ("update [Agent-ID] status to PING") and wait one cycle |
+| Transitive block chain (A blocks B, B blocks C) | No deadlock detection in baseline schema | Walk the `blocker` fields recursively; if any chain loops back to itself, you have a deadlock — break it by promoting the oldest lock-holder's work to completion or reassigning |
+| Agent registers with no `active_files` and immediately claims RUNNING | Agent is faking status to appear active | Cross-check against actual git diff for that agent's branch; no file changes = agent is idle or misconfigured |
 
 ## Commands
 
-- **"system status"** / **"dashboard"**: Render the full dashboard from `.agent_os_state.json`.
+- **"system status"** / **"dashboard"**: Render the full dashboard. See [references/dashboard-format.md](references/dashboard-format.md).
 - **"register agent [ID] as [specialization]"**: Add a new agent entry to state.
 - **"update [Agent-ID] status to [status]"**: Update an agent's health field.
-- **"agent [ID] is working on [file]"**: Lock the file to that agent, check for conflicts.
-- **"agent [ID] finished [file]"**: Release the file lock.
-- **"check blockers"**: List all agents blocked and why.
-- **"recommend skills for [Agent-ID]"**: Based on their current git/error state, suggest skills to inject.
+- **"agent [ID] is working on [file]"**: Acquire a lock; check for conflicts first.
+- **"agent [ID] finished [file]"**: Release the file lock; notify dependents.
+- **"check blockers"**: List all blocked agents with transitive chain analysis.
+- **"recommend skills for [Agent-ID]"**: Suggest skills based on current error/git state.
+- **"force-release [file]"**: Override a stale lock; record `previous_holder` before release.
+
+## State File
+
+Schema and validation details: [references/state-schema.md](references/state-schema.md)
+
+Key rule: always read → validate → mutate → write. Never write without a preceding read in the same operation sequence.
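The lock rules in this file (repo-relative paths only, a single holder per file, `previous_holder` recorded on a forced release) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not part of the skill: the function names, the `LockConflict` exception, and the choice to record `previous_holder` on the agent that takes over the lock are hypothetical readings of the schema, and the functions only mutate an in-memory dict that the caller is expected to have read and validated first.

```python
from pathlib import PurePosixPath


class LockConflict(Exception):
    """Raised when a file is already locked by a different agent (hypothetical name)."""


def normalize(rel_path: str) -> str:
    """Return a normalized repo-relative path; reject absolute or escaping paths."""
    p = PurePosixPath(rel_path)
    if p.is_absolute() or ".." in p.parts:
        raise ValueError(f"path must be relative to the repo root: {rel_path}")
    return str(p)


def acquire_lock(state: dict, agent_id: str, rel_path: str) -> None:
    """Check for a conflict first, then record the lock for agent_id."""
    path = normalize(rel_path)
    agent = state["agents"][agent_id]
    owner = state["locks"].get(path)
    if owner is not None and owner != agent_id:
        # Surface the block in the requester's record instead of overwriting the lock.
        agent["health"] = "BLOCKED"
        agent["blocker"] = owner
        raise LockConflict(f"{path} is locked by {owner}")
    state["locks"][path] = agent_id
    if path not in agent["active_files"]:
        agent["active_files"].append(path)


def force_release(state: dict, rel_path: str, new_owner: str):
    """Move a stale lock to new_owner, recording the old holder for audit.

    Assumption: the audit field lives on the new owner's agent record,
    as the `previous_holder` key in the schema suggests.
    """
    path = normalize(rel_path)
    old_owner = state["locks"].pop(path, None)
    if old_owner is None:
        return None  # nothing was locked, so there is nothing to override
    old = state["agents"].get(old_owner)
    if old is not None and path in old["active_files"]:
        old["active_files"].remove(path)
    rec = state["agents"][new_owner]
    rec["previous_holder"] = old_owner
    state["locks"][path] = new_owner
    if path not in rec["active_files"]:
        rec["active_files"].append(path)
    if rec.get("blocker") == old_owner:
        rec["blocker"] = None  # health is left for the agent's next self-report
    return old_owner
```

A caller would wrap these in the read, validate, mutate, write sequence: load the state file, call `acquire_lock` or `force_release`, then write the result back.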
diff --git a/skills/project-manager/references/dashboard-format.md b/skills/project-manager/references/dashboard-format.md
new file mode 100644
index 0000000..f351b05
--- /dev/null
+++ b/skills/project-manager/references/dashboard-format.md
@@ -0,0 +1,50 @@
+# Dashboard Format Reference
+
+## Output structure
+
+When asked for "system status" or "dashboard", render this exact markdown:
+
+```
+## AGENT OS DIGITAL TWIN DASHBOARD
+-----------------------------------------------------------------------
+[Agent ID]    [Active Skill]    [Git Status]      [Current Task]   [Status]
+-----------------------------------------------------------------------
+Agent-01      git, web-scrap    2 Commits Ahead   Fix Login Bug    RUNNING
+Agent-02      db-schema, sql    Clean             Migrate Users    IDLE
+Agent-03      jest, coverage    Uncommitted       Write UI Tests   BLOCKED (by 01)
+
+### Active Interactivity Matrix:
+- Agent-01 is modifying `src/auth/login.ts`.
+- Agent-03 execution paused: `src/auth/login.ts` locked by Agent-01.
+
+### Required Actions:
+- [ ] Merge Agent-01 branch to release Agent-03.
+- [ ] Load `performance-profiler` skill to Agent-02 for upcoming DB migration.
+
+### State Freshness:
+- Last updated: [timestamp]
+- Agents with stale timestamps (>2 min): [list or "none"]
+```
+
+## Status labels
+
+| Label | Meaning |
+|-------|---------|
+| RUNNING | Agent is actively executing work |
+| IDLE | Agent has no current task |
+| BLOCKED | Agent is waiting on a lock held by another agent |
+| DONE | Agent completed its work item; locks should be released |
+| ERROR | Agent encountered an unrecoverable error |
+| UNKNOWN | No heartbeat for >2 minutes; treat as potentially dead |
+
+## Interactivity Matrix rules
+
+- Only list file interactions where a lock is active.
+- Show all transitive block chains (A→B→C), not just direct blocks.
+- If the `locks` map is empty, output "No active file locks."
+
+## Required Actions generation
+
+- One action item per blocked agent.
+- One action item per agent with skills mismatch (agent needs a tool it doesn't have loaded).
+- One action item per UNKNOWN-health agent.
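The Interactivity Matrix rule about transitive block chains can be illustrated with a short walk over the `blocker` fields. The sketch below is an assumption-laden example rather than the skill's implementation: `block_chain` and `format_chain` are hypothetical helper names, and the input is the `agents` object from the state schema. A chain that revisits an agent is reported as a deadlock.

```python
from __future__ import annotations


def block_chain(agents: dict, start: str) -> tuple[list[str], bool]:
    """Follow `blocker` links from `start`.

    Returns (chain, deadlocked). The chain begins with the blocked agent and
    ends with the root blocker, or with the repeated agent if a cycle exists.
    """
    chain = [start]
    seen = {start}
    current = start
    while True:
        # Unknown agents and agents without a blocker both end the walk.
        blocker = agents.get(current, {}).get("blocker")
        if blocker is None:
            return chain, False
        chain.append(blocker)
        if blocker in seen:
            return chain, True  # the chain looped back on itself
        seen.add(blocker)
        current = blocker


def format_chain(chain: list[str]) -> str:
    """Render root blocker first, matching the A→B→C notation."""
    return " → ".join(reversed(chain))


if __name__ == "__main__":
    agents = {
        "Agent-01": {"blocker": None},
        "Agent-03": {"blocker": "Agent-01"},
        "Agent-07": {"blocker": "Agent-03"},
    }
    chain, deadlocked = block_chain(agents, "Agent-07")
    print(format_chain(chain), "(deadlock)" if deadlocked else "")
```

Running the walk once per BLOCKED agent yields the chains to list in the matrix; any chain flagged as deadlocked would also warrant a Required Actions entry.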
diff --git a/skills/project-manager/references/state-schema.md b/skills/project-manager/references/state-schema.md
new file mode 100644
index 0000000..124844c
--- /dev/null
+++ b/skills/project-manager/references/state-schema.md
@@ -0,0 +1,47 @@
+# State File Schema Reference
+
+## File location
+
+`.agent_os_state.json` in the repository root (relative paths only for all file entries).
+
+## Schema
+
+```json
+{
+  "last_updated": "ISO-8601 timestamp",
+  "agents": {
+    "Agent-01": {
+      "specialization": "auth",
+      "ticket": "TICKET-42",
+      "active_files": ["src/auth/login.ts"],
+      "skills": ["git", "web-scrap"],
+      "git_status": "2 commits ahead",
+      "health": "RUNNING",
+      "blocker": null,
+      "previous_holder": null
+    }
+  },
+  "locks": {
+    "src/auth/login.ts": "Agent-01"
+  }
+}
+```
+
+## Field notes
+
+| Field | Type | Notes |
+|-------|------|-------|
+| `last_updated` | ISO-8601 string | Updated on every write; use to detect stale state |
+| `health` | enum | RUNNING, IDLE, BLOCKED, DONE, ERROR, UNKNOWN |
+| `blocker` | string or null | Agent ID that holds the blocking lock; null if unblocked |
+| `previous_holder` | string or null | Set when a force-release occurs; audit trail |
+| `active_files` | array of strings | Relative paths from repo root only |
+| `locks` | object | Map of `relative_path → agent_id`; single source of truth for lock ownership |
+
+## Backup convention
+
+Before any write, copy `.agent_os_state.json` to `.agent_os_state.json.bak` to enable single-step recovery from mid-write corruption.
+
+## Validation
+
+Always run `jq . .agent_os_state.json` before reading state into memory. If it fails, the file is corrupt — do not proceed with stale in-memory assumptions.
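The backup and validation conventions above can be combined into one load/save pair. The following Python sketch substitutes `json.loads` for the `jq` check and adds a write-to-temp-then-rename step that the schema reference does not mention; both are assumptions, as are the names `load_state`, `save_state`, and `CorruptState` and the decision to fall back to the `.bak` file automatically.

```python
import json
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class CorruptState(Exception):
    """Neither the state file nor its backup could be parsed (hypothetical name)."""


def _parse(path: Path) -> dict:
    state = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(state, dict) or not {"agents", "locks"} <= state.keys():
        raise ValueError("missing required top-level keys")
    return state


def load_state(path: Path) -> dict:
    """Validate before use; fall back to the .bak copy if the main file is corrupt."""
    backup = path.with_name(path.name + ".bak")
    for candidate in (path, backup):
        if candidate.exists():
            try:
                return _parse(candidate)
            except ValueError:  # json.JSONDecodeError is a ValueError subclass
                continue
    if path.exists() or backup.exists():
        raise CorruptState(f"cannot recover state from {path} or {backup}")
    # No file and no backup: only now is an empty state a safe starting point.
    return {"last_updated": None, "agents": {}, "locks": {}}


def save_state(path: Path, state: dict) -> None:
    """Back up the current file (if it is still valid), then replace it."""
    backup = path.with_name(path.name + ".bak")
    if path.exists():
        try:
            _parse(path)
            shutil.copy2(path, backup)
        except ValueError:
            pass  # keep the older backup rather than overwrite it with corrupt data
    state["last_updated"] = datetime.now(timezone.utc).isoformat()
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)  # rename, so readers never see a half-written file
```

Typical use is `state = load_state(Path(".agent_os_state.json"))`, followed by the mutation and then `save_state(...)` with the same path, which keeps every write preceded by a validated read.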
diff --git a/skills/qa-test-planner/SKILL.md b/skills/qa-test-planner/SKILL.md
index a269932..9ae1f2c 100644
--- a/skills/qa-test-planner/SKILL.md
+++ b/skills/qa-test-planner/SKILL.md
@@ -1,757 +1,131 @@
 ---
 name: qa-test-planner
-description: Generate comprehensive test plans, manual test cases, regression test suites, and bug reports for QA engineers. Includes Figma MCP integration for design validation.
+description: Generate test plans, manual test cases, regression suites, and bug reports for QA engineers. Use when asked to write test cases, create a test plan, build a regression suite, validate a Figma design, or document a bug. Trigger phrases: "write test cases", "create test plan", "regression suite", "bug report", "QA plan", "validate against Figma".
 trigger: explicit
 ---
 
 # QA Test Planner
 
-A comprehensive skill for QA engineers to create test plans, generate manual test cases, build regression test suites, validate designs against Figma, and document bugs effectively.
+## Mindset
 
-> **Activation:** This skill is triggered only when explicitly called by name (e.g., `/qa-test-planner`, `qa-test-planner`, or `use the skill qa-test-planner`).
+**Coverage is not quality.** 200 shallow test cases that don't find bugs are worse than 20 deep ones targeting real risk. Always ask: "What could go wrong here that would hurt users or the business?" — not "What steps can I list?"
 
----
+**Test design reveals missing requirements.** When you can't write a test case because the expected result is undefined, that IS the bug — a requirements gap. Surface it instead of guessing.
 
-## Quick Start
+**Regression suites rot.** A suite that hasn't been touched since it was written is a liability: it produces false confidence. Every new bug fixed in production is evidence that something escaped your suite — add a test for it immediately.
 
-**Create a test plan:**
-```
-"Create a test plan for the user authentication feature"
-```
+**Severity ≠ Priority.** A cosmetic bug on the checkout confirmation page (low severity, high priority) blocks revenue. A crash in an admin-only import tool (high severity, low priority) does not block release. Conflating these causes wrong release decisions.
 
-**Generate test cases:**
-```
-"Generate manual test cases for the checkout flow"
-```
-
-**Build regression suite:**
-```
-"Build a regression test suite for the payment module"
-```
-
-**Validate against Figma:**
-```
-"Compare the login page against the Figma design at [URL]"
-```
-
-**Create bug report:**
-```
-"Create a bug report for the form validation issue"
-```
+**Bug reproduction is half the fix.** A bug report without a reliable repro is a guess. Before logging, validate that your steps reproduce the issue 3/3 times on a clean environment. If you can't get 3/3, say so explicitly.
 
 ---
 
-## Quick Reference
+## Navigation
 
-| Task | What You Get | Time |
-|------|--------------|------|
-| Test Plan | Strategy, scope, schedule, risks | 10-15 min |
-| Test Cases | Step-by-step instructions, expected results | 5-10 min each |
-| Regression Suite | Smoke tests, critical paths, execution order | 15-20 min |
-| Figma Validation | Design-implementation comparison, discrepancy list | 10-15 min |
-| Bug Report | Reproducible steps, environment, evidence | 5 min |
+**Use this skill when:**
+- Writing manual test cases for a feature, flow, or API
+- Producing a test plan document for a sprint, release, or project
+- Building or triaging a regression suite (smoke, targeted, full)
+- Validating UI implementation against Figma design specs
+- Filing a structured bug report with reliable reproduction steps
+- Reviewing test coverage gaps before a release
 
----
+**Do NOT use this skill when:**
+- Writing automated test code (unit tests, Selenium, Cypress, Playwright) — use a coding skill
+- Defining acceptance criteria during sprint planning — that's product, not QA
+- Reviewing PRs for logic bugs — use a code-review skill
 
-## How It Works
+**Ambiguous input decision tree:**
 
 ```
-Your Request
-    │
-    ▼
-┌─────────────────────────────────────────────────────┐
-│ 1. ANALYZE                                          │
-│    • Parse feature/requirement                      │
-│    • Identify test types needed                     │
-│    • Determine scope and priorities                 │
-├─────────────────────────────────────────────────────┤
-│ 2. GENERATE                                         │
-│    • Create structured deliverables                 │
-│    • Apply templates and best practices             │
-│    • Include edge cases and variations              │
-├─────────────────────────────────────────────────────┤
-│ 3. VALIDATE                                         │
-│    • Check completeness                             │
-│    • Verify traceability                            │
-│    • Ensure actionable steps                        │
-└─────────────────────────────────────────────────────┘
-    │
-    ▼
-QA Deliverable Ready
+"Test X" →
+  Has expected behavior defined? → YES → write test cases
+                                 → NO  → surface requirements gap first
+  Is X a feature or a bug? → FEATURE → test plan + cases
+                           → BUG     → bug report + regression test
+  Is there a Figma link? → YES → include Figma validation checklist
 ```
 
----
-
-## Commands
-
-### Interactive Scripts
+If writing a test case reveals undefined expected behavior (a requirements gap), pause and use the `requirements-clarity` skill to surface and formalize the missing requirement before continuing.
 
-| Script | Purpose | Usage |
-|--------|---------|-------|
-| `./scripts/generate_test_cases.sh` | Create test cases interactively | Step-by-step prompts |
-| `./scripts/create_bug_report.sh` | Generate bug reports | Guided input collection |
+---
 
-### Natural Language
+## Philosophy
 
-| Request | Output |
-|---------|--------|
-| "Create test plan for {feature}" | Complete test plan document |
-| "Generate {N} test cases for {feature}" | Numbered test cases with steps |
-| "Build smoke test suite" | Critical path tests |
-| "Compare with Figma at {URL}" | Visual validation checklist |
-| "Document bug: {description}" | Structured bug report |
+Test design is a risk-reduction activity, not a checklist activity. Allocate test effort proportional to consequence-of-failure, not proportional to feature size. The goal is to find bugs before users do — not to maximize test count.
 
 ---
 
-## Core Deliverables
-
-### 1. Test Plans
-- Test scope and objectives
-- Testing approach and strategy
-- Environment requirements
-- Entry/exit criteria
-- Risk assessment
-- Timeline and milestones
-
-### 2. Manual Test Cases
-- Step-by-step instructions
-- Expected vs actual results
-- Preconditions and setup
-- Test data requirements
-- Priority and severity
-
-### 3. Regression Suites
-- Smoke tests (15-30 min)
-- Full regression (2-4 hours)
-- Targeted regression (30-60 min)
-- Execution order and dependencies
-
-### 4. Figma Validation
-- Component-by-component comparison
-- Spacing and typography checks
-- Color and visual consistency
-- Interactive state validation
-
-### 5. Bug Reports
-- Clear reproduction steps
-- Environment details
-- Evidence (screenshots, logs)
-- Severity and priority
-
----
+## NEVER
 
-## Anti-Patterns
+- **NEVER write a test case without a specific expected result** — "it works correctly" is untestable. An expected result must be observable and falsifiable (exact text, HTTP status, pixel dimension, state change).
 
-| Avoid | Why | Instead |
-|-------|-----|---------|
-| Vague test steps | Can't reproduce | Specific actions + expected results |
-| Missing preconditions | Tests fail unexpectedly | Document all setup requirements |
-| No test data | Tester blocked | Provide sample data or generation |
-| Generic bug titles | Hard to track | Specific: "[Feature] issue when [action]" |
-| Skip edge cases | Miss critical bugs | Include boundary values, nulls |
+- **NEVER omit preconditions** — a test case that says "log in and navigate to checkout" without specifying which user account, what items are in the cart, or what environment is in use will produce inconsistent results and waste retest cycles.
 
----
+- **NEVER rate severity and priority identically by default** — severity is about damage, priority is about when to fix it. Defaulting both to the same value causes triage meetings to debate bugs that don't need debating.
 
-## Verification Checklist
+- **NEVER let a regression suite grow unbounded without a culling policy** — suites that only ever add tests become unmaintainable. Every suite needs an owner and a rule: "if a test has passed 20 consecutive runs with no related code changes, retire it to archive."
 
-**Test Plan:**
-- [ ] Scope clearly defined (in/out)
-- [ ] Entry/exit criteria specified
-- [ ] Risks identified with mitigations
-- [ ] Timeline realistic
+- **NEVER log a bug against design ambiguity as a severity-high defect** — if the spec doesn't define the behavior, the implementation isn't wrong, the spec is incomplete. Log it as a requirements clarification, not a defect; otherwise it creates adversarial dev-QA dynamics.
 
-**Test Cases:**
-- [ ] Each step has expected result
-- [ ] Preconditions documented
-- [ ] Test data available
-- [ ] Priority assigned
+- **NEVER skip environment documentation in a bug report** — "it broke" without browser version, OS, build hash, and test account details cannot be reproduced by a developer in a different environment. Unreproducible bugs get closed or deprioritized.
 
-**Bug Reports:**
-- [ ] Reproducible steps
-- [ ] Environment documented
-- [ ] Screenshots/evidence attached
-- [ ] Severity/priority set
+- **NEVER treat Figma pixel-perfect matching as the exit criterion for UI** — Figma shows design intent, not the only valid implementation. ±2px on padding is noise; wrong color token, missing disabled state, or broken focus ring is a real bug. Calibrate accordingly.
 
 ---
 
-## References
+## When Things Go Wrong
 
-- [Test Case Templates](references/test_case_templates.md) - Standard formats for all test types
-- [Bug Report Templates](references/bug_report_templates.md) - Documentation templates
-- [Regression Testing Guide](references/regression_testing.md) - Suite building and execution
-- [Figma Validation Guide](references/figma_validation.md) - Design-implementation validation
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Tests pass but bugs still reach production | Suite covers happy paths only; no negative/edge-case tests | Audit the last 5 production bugs — add a test for each; add boundary and error-state cases |
+| Developer closes bugs as "not reproducible" | Missing environment/build details in reports | Add a "verified repro: 3/3" line to every bug; include build hash and account used |
+| Test plan is ignored by the team | Too long, too generic, not linked to actual tickets | Cut to 1 page; link every test area to a Jira (or other tracker) ticket; include a go/no-go checklist |
+| Figma validation produces endless disputes | No agreed tolerance defined | Agree upfront: layout ±4px OK, color tokens exact, interactive states exact |
+| Regression suite takes too long to run | No smoke-first gating; suite not tiered | Add a 15-min smoke gate; if smoke fails, halt and do not run full suite |
 
 ---
 
-<details>
-<summary><strong>Deep Dive: Test Case Structure</strong></summary>
-
-### Standard Test Case Format
+## Usage
 
-```markdown
-## TC-001: [Test Case Title]
+**Natural language requests:**
 
-**Priority:** High | Medium | Low
-**Type:** Functional | UI | Integration | Regression
-**Status:** Not Run | Pass | Fail | Blocked
-
-### Objective
-[What are we testing and why]
-
-### Preconditions
-- [Setup requirement 1]
-- [Setup requirement 2]
-- [Test data needed]
-
-### Test Steps
-1. [Action to perform]
-   **Expected:** [What should happen]
-
-2. [Action to perform]
-   **Expected:** [What should happen]
-
-3. [Action to perform]
-   **Expected:** [What should happen]
-
-### Test Data
-- Input: [Test data values]
-- User: [Test account details]
-- Configuration: [Environment settings]
-
-### Post-conditions
-- [System state after test]
-- [Cleanup required]
-
-### Notes
-- [Edge cases to consider]
-- [Related test cases]
-- [Known issues]
-```
-
-### Test Types
-
-| Type | Focus | Example |
-|------|-------|---------|
-| Functional | Business logic | Login with valid credentials |
-| UI/Visual | Appearance, layout | Button matches Figma design |
-| Integration | Component interaction | API returns data to frontend |
-| Regression | Existing functionality | Previous features still work |
-| Performance | Speed, load handling | Page loads under 3 seconds |
-| Security | Vulnerabilities | SQL injection prevented |
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Test Plan Template</strong></summary>
-
-### Test Plan Structure
-
-```markdown
-# Test Plan: [Feature/Release Name]
-
-## Executive Summary
-- Feature/product being tested
-- Testing objectives
-- Key risks
-- Timeline overview
-
-## Test Scope
-
-**In Scope:**
-- Features to be tested
-- Test types (functional, UI, performance)
-- Platforms and environments
-- User flows and scenarios
-
-**Out of Scope:**
-- Features not being tested
-- Known limitations
-- Third-party integrations (if applicable)
-
-## Test Strategy
-
-**Test Types:**
-- Manual testing
-- Exploratory testing
-- Regression testing
-- Integration testing
-- User acceptance testing
-
-**Test Approach:**
-- Black box testing
-- Positive and negative testing
-- Boundary value analysis
-- Equivalence partitioning
-
-## Test Environment
-- Operating systems
-- Browsers and versions
-- Devices (mobile, tablet, desktop)
-- Test data requirements
-- Backend/API environments
-
-## Entry Criteria
-- [ ] Requirements documented
-- [ ] Designs finalized
-- [ ] Test environment ready
-- [ ] Test data prepared
-- [ ] Build deployed
-
-## Exit Criteria
-- [ ] All high-priority test cases executed
-- [ ] 90%+ test case pass rate
-- [ ] All critical bugs fixed
-- [ ] No open high-severity bugs
-- [ ] Regression suite passed
-
-## Risk Assessment
-
-| Risk | Probability | Impact | Mitigation |
-|------|-------------|--------|------------|
-| [Risk 1] | H/M/L | H/M/L | [Mitigation] |
-
-## Test Deliverables
-- Test plan document
-- Test cases
-- Test execution reports
-- Bug reports
-- Test summary report
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Bug Reporting</strong></summary>
-
-### Bug Report Template
-
-```markdown
-# BUG-[ID]: [Clear, specific title]
-
-**Severity:** Critical | High | Medium | Low
-**Priority:** P0 | P1 | P2 | P3
-**Type:** Functional | UI | Performance | Security
-**Status:** Open | In Progress | Fixed | Closed
-
-## Environment
-- **OS:** [Windows 11, macOS 14, etc.]
-- **Browser:** [Chrome 120, Firefox 121, etc.]
-- **Device:** [Desktop, iPhone 15, etc.]
-- **Build:** [Version/commit]
-- **URL:** [Page where bug occurs]
-
-## Description
-[Clear, concise description of the issue]
-
-## Steps to Reproduce
-1. [Specific step]
-2. [Specific step]
-3. [Specific step]
-
-## Expected Behavior
-[What should happen]
-
-## Actual Behavior
-[What actually happens]
-
-## Visual Evidence
-- Screenshot: [attached]
-- Video: [link if applicable]
-- Console errors: [paste errors]
-
-## Impact
-- **User Impact:** [How many users affected]
-- **Frequency:** [Always, Sometimes, Rarely]
-- **Workaround:** [If one exists]
-
-## Additional Context
-- Related to: [Feature/ticket]
-- Regression: [Yes/No]
-- Figma design: [Link if UI bug]
-```
-
-### Severity Definitions
-
-| Level | Criteria | Examples |
-|-------|----------|----------|
-| **Critical (P0)** | System crash, data loss, security | Payment fails, login broken |
-| **High (P1)** | Major feature broken, no workaround | Search not working |
-| **Medium (P2)** | Feature partial, workaround exists | Filter missing one option |
-| **Low (P3)** | Cosmetic, rare edge cases | Typo, minor alignment |
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Figma MCP Integration</strong></summary>
-
-### Design Validation Workflow
-
-**Prerequisites:**
-- Figma MCP server configured
-- Access to Figma design files
-- Figma URLs for components/pages
-
-**Process:**
-
-1. **Get Design Specs from Figma**
-```
-"Get the button specifications from Figma file [URL]"
-
-Response includes:
-- Dimensions (width, height)
-- Colors (background, text, border)
-- Typography (font, size, weight)
-- Spacing (padding, margin)
-- Border radius
-- States (default, hover, active, disabled)
-```
-
-2. **Compare Implementation**
-```
-TC: Primary Button Visual Validation
-1. Inspect primary button in browser dev tools
-2. Compare against Figma specs:
-   - Dimensions: 120x40px
-   - Border-radius: 8px
-   - Background color: #0066FF
-   - Font: 16px Medium #FFFFFF
-3. Document discrepancies
-```
-
-3. **Create Bug if Mismatch**
-```
-BUG: Primary button color doesn't match design
-Severity: Medium
-Expected (Figma): #0066FF
-Actual (Implementation): #0052CC
-Screenshot: [attached]
-Figma link: [specific component]
-```
-
-### What to Validate
-
-| Element | What to Check | Tool |
-|---------|---------------|------|
-| Colors | Hex values exact | Browser color picker |
-| Spacing | Padding/margin px | DevTools computed styles |
-| Typography | Font, size, weight | DevTools font panel |
-| Layout | Width, height, position | DevTools box model |
-| States | Hover, active, focus | Manual interaction |
-| Responsive | Breakpoint behavior | DevTools device mode |
-
-### Example Queries
-```
-"Get button specifications from Figma design [URL]"
-"Compare navigation menu implementation against Figma design"
-"Extract spacing values for dashboard layout from Figma"
-"List all color tokens used in Figma design system"
-```
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Regression Testing</strong></summary>
-
-### Suite Structure
-
-| Suite Type | Duration | Frequency | Coverage |
-|------------|----------|-----------|----------|
-| Smoke | 15-30 min | Daily | Critical paths only |
-| Targeted | 30-60 min | Per change | Affected areas |
-| Full | 2-4 hours | Weekly/Release | Comprehensive |
-| Sanity | 10-15 min | After hotfix | Quick validation |
-
-### Building a Regression Suite
-
-**Step 1: Identify Critical Paths**
-- What can users NOT live without?
-- What generates revenue?
-- What handles sensitive data?
-- What's used most frequently?
-
-**Step 2: Prioritize Test Cases**
-
-| Priority | Description | Must Run |
-|----------|-------------|----------|
-| P0 | Business-critical, security | Always |
-| P1 | Major features, common flows | Weekly+ |
-| P2 | Minor features, edge cases | Releases |
-
-**Step 3: Execution Order**
-1. Smoke first - if fails, stop and fix build
-2. P0 tests next - must pass before proceeding
-3. P1 then P2 - track all failures
-4. Exploratory - find unexpected issues
-
-### Pass/Fail Criteria
-
-**PASS:**
-- All P0 tests pass
-- 90%+ P1 tests pass
-- No critical bugs open
-
-**FAIL (Block Release):**
-- Any P0 test fails
-- Critical bug discovered
-- Security vulnerability
-- Data loss scenario
-
-**CONDITIONAL:**
-- P1 failures with workarounds
-- Known issues documented
-- Fix plan in place
-
-</details>
-
-<details>
-<summary><strong>Deep Dive: Test Execution Tracking</strong></summary>
-
-### Test Run Report Template
-
-```markdown
-# Test Run: [Release Version]
-
-**Date:** 2024-01-15
-**Build:** v2.5.0-rc1
-**Tester:** [Name]
-**Environment:** Staging
-
-## Summary
-- Total Test Cases: 150
-- Executed: 145
-- Passed: 130
-- Failed: 10
-- Blocked: 5
-- Not Run: 5
-- Pass Rate: 90%
-
-## Test Cases by Priority
-
-| Priority | Total | Pass | Fail | Blocked |
-|----------|-------|------|------|---------|
-| P0 (Critical) | 25 | 23 | 2 | 0 |
-| P1 (High) | 50 | 45 | 3 | 2 |
-| P2 (Medium) | 50 | 45 | 3 | 2 |
-| P3 (Low) | 25 | 17 | 2 | 1 |
-
-## Critical Failures
-- TC-045: Payment processing fails
-  - Bug: BUG-234
-  - Status: Open
-
-## Blocked Tests
-- TC-112: Dashboard widget (API endpoint down)
-
-## Risks
-- 2 critical bugs blocking release
-- Payment integration needs attention
-
-## Next Steps
-- Retest after BUG-234 fix
-- Complete remaining 5 test cases
-- Run full regression before sign-off
-```
-
-### Coverage Tracking
-
-```markdown
-## Coverage Matrix
+| Request | Output |
+|---------|--------|
+| "Create test plan for {feature}" | Test plan with scope, strategy, risks, entry/exit criteria |
+| "Generate test cases for {flow}" | Step-by-step cases with preconditions and expected results |
+| "Build smoke test suite" | Prioritized critical-path tests, execution order |
+| "Compare with Figma at {URL}" | Component-by-component validation checklist |
+| "Document bug: {description}" | Structured bug report with severity/priority guidance |
 
-| Feature | Requirements | Test Cases | Status | Gaps |
-|---------|--------------|------------|--------|------|
-| Login | 8 | 12 | Complete | None |
-| Checkout | 15 | 10 | Partial | Payment errors |
-| Dashboard | 12 | 15 | Complete | None |
-```
+**Scripts:**
 
-</details>
-
-<details>
-<summary><strong>QA Process Workflow</strong></summary>
-
-### Phase 1: Planning
-- [ ] Review requirements and designs
-- [ ] Create test plan
-- [ ] Identify test scenarios
-- [ ] Estimate effort and timeline
-- [ ] Set up test environment
-
-### Phase 2: Test Design
-- [ ] Write test cases
-- [ ] Review test cases with team
-- [ ] Prepare test data
-- [ ] Build regression suite
-- [ ] Get Figma design access
-
-### Phase 3: Execution
-- [ ] Execute test cases
-- [ ] Log bugs with clear steps
-- [ ] Validate against Figma (UI tests)
-- [ ] Track test progress
-- [ ] Communicate blockers
-
-### Phase 4: Reporting
-- [ ] Compile test results
-- [ ] Analyze coverage
-- [ ] Document risks
-- [ ] Provide go/no-go recommendation
-- [ ] Archive test artifacts
-
-</details>
-
-<details>
-<summary><strong>Best Practices</strong></summary>
-
-### Test Case Writing
-
-**DO:**
-- Be specific and unambiguous
-- Include expected results for each step
-- Test one thing per test case
-- Use consistent naming conventions
-- Keep test cases maintainable
-
-**DON'T:**
-- Assume knowledge
-- Make test cases too long
-- Skip preconditions
-- Forget edge cases
-- Leave expected results vague
-
-### Bug Reporting
-
-**DO:**
-- Provide clear reproduction steps
-- Include screenshots/videos
-- Specify exact environment details
-- Describe impact on users
-- Link to Figma for UI bugs
-
-**DON'T:**
-- Report without reproduction steps
-- Use vague descriptions
-- Skip environment details
-- Forget to assign priority
-- Duplicate existing bugs
-
-### Regression Testing
-
-**DO:**
-- Automate repetitive tests when possible
-- Maintain regression suite regularly
-- Prioritize critical paths
-- Run smoke tests frequently
-- Update suite after each release
-
-**DON'T:**
-- Skip regression before releases
-- Let suite become outdated
-- Test everything every time
-- Ignore failed regression tests
-
-</details>
+| Script | Purpose |
+|--------|---------|
+| `./scripts/generate_test_cases.sh` | Interactive test case generation |
+| `./scripts/create_bug_report.sh` | Guided bug report input |
 
 ---
 
-## Examples
-
-<details>
-<summary><strong>Example: Login Flow Test Case</strong></summary>
-
-```markdown
-## TC-LOGIN-001: Valid User Login
-
-**Priority:** P0 (Critical)
-**Type:** Functional
-**Estimated Time:** 2 minutes
-
-### Objective
-Verify users can successfully login with valid credentials
-
-### Preconditions
-- User account exists (test@example.com / Test123!)
-- User is not already logged in
-- Browser cookies cleared
-
-### Test Steps
-1. Navigate to https://app.example.com/login
-   **Expected:** Login page displays with email and password fields
-
-2. Enter email: test@example.com
-   **Expected:** Email field accepts input
-
-3. Enter password: Test123!
-   **Expected:** Password field shows masked characters
+## Core Outputs
 
-4. Click "Login" button
-   **Expected:**
-   - Loading indicator appears
-   - User redirected to /dashboard
-   - Welcome message shown: "Welcome back, Test User"
-   - Avatar/profile image displayed in header
+**Test Plans** — scope (in/out), strategy, environment requirements, entry/exit criteria, risk assessment with mitigations. See [test_case_templates.md](references/test_case_templates.md).
 
-### Post-conditions
-- User session created
-- Auth token stored
-- Analytics event logged
+**Manual Test Cases** — atomic steps, explicit expected results per step, preconditions, test data, priority. See [test_case_templates.md](references/test_case_templates.md).
 
-### Edge Cases to Consider
-- TC-LOGIN-002: Invalid password
-- TC-LOGIN-003: Non-existent email
-- TC-LOGIN-004: SQL injection attempt
-- TC-LOGIN-005: Very long password
-```
-
-</details>
-
-<details>
-<summary><strong>Example: Responsive Design Test Case</strong></summary>
-
-```markdown
-## TC-UI-045: Mobile Navigation Menu
-
-**Priority:** P1 (High)
-**Type:** UI/Responsive
-**Devices:** Mobile (iPhone, Android)
-
-### Objective
-Verify navigation menu works correctly on mobile devices
+**Regression Suites** — tiered by duration (smoke 15-30 min / targeted 30-60 min / full 2-4 hr), execution order, pass/fail gate criteria. See [regression_testing.md](references/regression_testing.md).
 
-### Preconditions
-- Access from mobile device or responsive mode
-- Viewport width: 375px (iPhone SE) to 428px (iPhone Pro Max)
-
-### Test Steps
-1. Open homepage on mobile device
-   **Expected:** Hamburger menu icon visible (top-right)
-
-2. Tap hamburger icon
-   **Expected:**
-   - Menu slides in from right
-   - Overlay appears over content
-   - Close (X) button visible
-
-3. Tap menu item
-   **Expected:** Navigate to section, menu closes
-
-4. Compare against Figma mobile design [link]
-   **Expected:**
-   - Menu width: 280px
-   - Slide animation: 300ms ease-out
-   - Overlay opacity: 0.5, color #000000
-   - Font size: 16px, line-height 24px
-
-### Breakpoints to Test
-- 375px (iPhone SE)
-- 390px (iPhone 14)
-- 428px (iPhone 14 Pro Max)
-- 360px (Galaxy S21)
-```
+**Figma Validation** — color token exact match, spacing within ±4px tolerance, all interactive states (hover/focus/disabled/active), responsive breakpoints. See [figma_validation.md](references/figma_validation.md).
 
-</details>
+**Bug Reports** — severity vs. priority matrix, verified repro steps (3/3), environment block, impact statement. See [bug_report_templates.md](references/bug_report_templates.md).
 
 ---
 
-**"Testing shows the presence, not the absence of bugs." - Edsger Dijkstra**
+## Expert Heuristics Reference
 
-**"Quality is not an act, it is a habit." - Aristotle**
+See [heuristics.md](references/heuristics.md) for practitioner decision trees covering:
+- Risk-based test prioritization (what to test first when time is short)
+- Boundary value and equivalence partitioning selection rules
+- When to escalate a blocked test vs. log it and move on
+- Regression culling policy templates
+- Release gate go/no-go checklist
diff --git a/skills/qa-test-planner/references/heuristics.md b/skills/qa-test-planner/references/heuristics.md
new file mode 100644
index 0000000..573709a
--- /dev/null
+++ b/skills/qa-test-planner/references/heuristics.md
@@ -0,0 +1,164 @@
+# QA Expert Heuristics
+
+Practitioner decision trees and anti-pattern catalogs. This file contains domain knowledge that belongs outside the main SKILL.md to preserve progressive disclosure.
+
+---
+
+## Risk-Based Test Prioritization
+
+When time is short (and it always is), allocate effort using this stack-ranked model:
+
+```
+TIER 1 — Test always, no exceptions
+  ├── Flows that touch money (payment, billing, refunds)
+  ├── Auth/session management (login, logout, token expiry)
+  ├── Data persistence (save, delete, import/export)
+  └── Security surfaces (input fields, file uploads, API endpoints)
+
+TIER 2 — Test on every release
+  ├── Primary user journeys for the feature under test
+  ├── Any flow that changed in this release (regression)
+  └── Integration points with external services
+
+TIER 3 — Test before major releases
+  ├── Edge cases and boundary values
+  ├── Low-traffic but high-consequence paths
+  └── Accessibility (keyboard nav, screen reader)
+
+TIER 4 — Test when explicitly scoped
+  ├── Performance under load
+  ├── Cross-browser matrix beyond the primary browser
+  └── Pixel-perfect visual matching
+```
+
+**Rule:** If you're told "we only have 2 hours," run all Tier 1 items and the directly changed Tier 2 items. Document what was skipped and why.
+
+---
+
+## Boundary Value Selection Rules
+
+Don't test random values. Use these rules to find the inputs most likely to expose bugs:
+
+| Input type | Values to test |
+|------------|----------------|
+| Numeric range (e.g., age 1-120) | Min (1), Max (120), Min-1 (0), Max+1 (121), midpoint (60) |
+| String length limit (e.g., max 255 chars) | 0, 1, limit-1, limit, limit+1 |
+| Required field | Empty string, whitespace-only, null (if API) |
+| Select/dropdown | Each option individually; invalid option via API bypass |
+| File upload | Min size (1 byte), max size, max+1, wrong MIME type, no extension |
+| Date field | Today, past, future, Feb 28 and Feb 29 (leap year), Dec 31 (year end), timezone boundaries |
+| Currency/decimal | $0.00, $0.01, large amount, negative, 3 decimal places |
+
+**Anti-pattern:** Testing `username = "testuser"` and calling it coverage. Mid-range valid values don't find the bugs; boundaries do.
+
+---
+
+## Equivalence Partitioning: Which Partition to Pick
+
+When an input space is too large to test exhaustively, identify partitions where all values in the partition should behave identically:
+
+1. Identify valid partitions (inputs the system should accept)
+2. Identify invalid partitions (inputs the system should reject)
+3. Pick **one representative value per partition** — not several
+4. Verify you've covered all invalid partitions, not just one
+
+**Example — email field:**
+- Valid: `user@domain.com` (standard format)
+- Invalid #1: `nodomain` (no @)
+- Invalid #2: `@nodomain.com` (no local part)
+- Invalid #3: `user@` (no domain)
+- Invalid #4: `user@domain` (no TLD — debatable by spec; clarify)
+- Invalid #5: SQL injection: `'; DROP TABLE users; --`
+- Invalid #6: XSS: `<script>alert(1)</script>@test.com`
+
+---
+
+## Blocked Test Escalation Decision Tree
+
+```
+Test is blocked →
+  Can I work around it and test the same behavior another way?
+    YES → document workaround, proceed, note in report
+    NO  → Is the blocker expected to resolve within this sprint?
+            YES → flag as BLOCKED, estimate when to retest
+            NO  → escalate: this is a test environment/infrastructure risk
+                  → notify lead, add to risk register, adjust exit criteria
+```
+
+**Rule:** Never mark a blocked test as PASS to maintain metrics. Mark it BLOCKED with a reason and estimated resolution.
+
+---
+
+## Regression Suite Culling Policy Template
+
+Use this policy to prevent suite bloat:
+
+```
+RETIRE a test case when ALL of the following are true:
+  1. It has passed 20+ consecutive runs without modification
+  2. The underlying code area has not changed in 90+ days
+  3. No production bug has been filed against this area in 6 months
+
+ARCHIVE (don't delete) — move to archive/ folder with:
+  - Date archived
+  - Rationale
+  - Reactivation condition ("reactivate if payment module changes")
+
+ADD to suite immediately when:
+  - A production bug is fixed (test for the exact defect)
+  - A new integration point is added
+  - A security finding is remediated
+```
+
+---
+
+## Release Gate: Go / No-Go Checklist
+
+Before signing off on a release, verify:
+
+**Hard stops (any NO = no-go):**
+- [ ] All P0 (critical) test cases passed
+- [ ] Zero open severity-critical bugs
+- [ ] Zero open severity-high bugs without an approved workaround
+- [ ] Regression suite executed on release build (not a branch build)
+- [ ] Security surfaces tested (auth, input validation, file upload)
+
+**Risk accepts (document if NO, owner must sign off):**
+- [ ] P1 tests ≥90% pass rate
+- [ ] Mobile/responsive tested on primary device matrix
+- [ ] Accessibility spot-checked
+- [ ] Performance baseline within 20% of previous release
+
+**Documentation complete:**
+- [ ] Test run report filed with pass/fail counts
+- [ ] All open bugs triaged with priority and ETA
+- [ ] Rollback plan identified if critical bug emerges post-release
+
+---
+
+## Severity vs. Priority: The Matrix
+
+Teams conflate these constantly. Use this:
+
+| | Low Priority (can wait) | High Priority (fix now) |
+|---|---|---|
+| **High Severity** (data loss, crash) | Admin-only import crash affecting 3 users/month | Checkout crash on primary flow |
+| **Low Severity** (cosmetic, typo) | Misaligned icon in footer | Wrong price text on pricing page |
+
+**Severity** = scope of damage if triggered  
+**Priority** = urgency of fix relative to release timeline
+
+Always set them independently. If they always match, your team is conflating them.
+
+---
+
+## Common QA Anti-Patterns With Non-Obvious Reasons
+
+| Anti-Pattern | Why It's Harmful (non-obvious) |
+|---|---|
+| Testing only with admin accounts | Admin users often bypass the permission/validation logic that regular users hit; bugs hide in lower-privilege flows |
+| Running regression on staging with prod data | Test data side-effects (deletions, state changes) on prod data cause incidents; staging must use synthetic data |
+| Writing test cases after execution | Confirmation bias — you write the expected result as what you observed, not what the spec says; bugs get missed |
+| Using sequential test IDs without feature prefix | TC-1 through TC-500 in a flat namespace means a deleted test leaves a gap and you can't tell which feature lost coverage |
+| Logging every UI variation as a separate bug | 10 bugs for "button color wrong on 10 pages" creates triage noise; one bug with a list of affected pages is the right pattern |
+| Retesting the same happy path repeatedly | Happy paths don't find bugs; they confirm the system works when nothing goes wrong. Invest retest cycles in boundary and negative cases |
diff --git a/skills/react-dev/SKILL.md b/skills/react-dev/SKILL.md
index bd288f3..ffa0190 100644
--- a/skills/react-dev/SKILL.md
+++ b/skills/react-dev/SKILL.md
@@ -1,391 +1,96 @@
 ---
 name: react-dev
 version: 1.0.0
-description: This skill should be used when building React components with TypeScript, typing hooks, handling events, or when React TypeScript, React 19, Server Components are mentioned. Covers type-safe patterns for React 18-19 including generic components, proper event typing, and routing integration (TanStack Router, React Router).
+description: "Type-safe React component patterns for TypeScript — generic components, discriminated union props, React 19 ref/action patterns, TanStack Router and React Router v7 type safety. Use when writing or reviewing typed React components, hooks, Server Components, or router integration. Triggers: TypeScript React component, React 19, useActionState, forwardRef migration, TanStack Router, Server Action. NOT for MUI-specific or vanilla JS React."
 ---
 
 # React TypeScript
 
 Type-safe React = compile-time guarantees = confident refactoring.
 
-<when_to_use>
+## Mindset
 
-- Building typed React components
-- Implementing generic components
-- Typing event handlers, forms, refs
-- Using React 19 features (Actions, Server Components, use())
-- Router integration (TanStack Router, React Router)
-- Custom hooks with proper typing
-
-NOT for: non-React TypeScript, vanilla JS React
-
-</when_to_use>
-
-<react_19_changes>
-
-React 19 breaking changes require migration. Key patterns:
-
-**ref as prop** - forwardRef deprecated:
-
-```typescript
-// React 19 - ref as regular prop
-type ButtonProps = {
-  ref?: React.Ref<HTMLButtonElement>;
-} & React.ComponentPropsWithoutRef<'button'>;
-
-function Button({ ref, children, ...props }: ButtonProps) {
-  return <button ref={ref} {...props}>{children}</button>;
-}
-```
-
-**useActionState** - replaces useFormState:
-
-```typescript
-import { useActionState } from 'react';
-
-type FormState = { errors?: string[]; success?: boolean };
-
-function Form() {
-  const [state, formAction, isPending] = useActionState(submitAction, {});
-  return <form action={formAction}>...</form>;
-}
-```
-
-**use()** - unwraps promises/context:
-
-```typescript
-function UserProfile({ userPromise }: { userPromise: Promise<User> }) {
-  const user = use(userPromise); // Suspends until resolved
-  return <div>{user.name}</div>;
-}
-```
-
-See [react-19-patterns.md](references/react-19-patterns.md) for useOptimistic, useTransition, migration checklist.
-
-</react_19_changes>
-
-<component_patterns>
-
-**Props** - extend native elements:
-
-```typescript
-type ButtonProps = {
-  variant: 'primary' | 'secondary';
-} & React.ComponentPropsWithoutRef<'button'>;
-
-function Button({ variant, children, ...props }: ButtonProps) {
-  return <button className={variant} {...props}>{children}</button>;
-}
-```
-
-**Children typing**:
-
-```typescript
-type Props = {
-  children: React.ReactNode;          // Anything renderable
-  icon: React.ReactElement;           // Single element
-  render: (data: T) => React.ReactNode;  // Render prop
-};
-```
-
-**Discriminated unions** for variant props:
-
-```typescript
-type ButtonProps =
-  | { variant: 'link'; href: string }
-  | { variant: 'button'; onClick: () => void };
-
-function Button(props: ButtonProps) {
-  if (props.variant === 'link') {
-    return <a href={props.href}>Link</a>;
-  }
-  return <button onClick={props.onClick}>Button</button>;
-}
-```
-
-</component_patterns>
-
-<event_handlers>
-
-Use specific event types for accurate target typing:
-
-```typescript
-// Mouse
-function handleClick(e: React.MouseEvent<HTMLButtonElement>) {
-  e.currentTarget.disabled = true;
-}
-
-// Form
-function handleSubmit(e: React.FormEvent<HTMLFormElement>) {
-  e.preventDefault();
-  const formData = new FormData(e.currentTarget);
-}
-
-// Input
-function handleChange(e: React.ChangeEvent<HTMLInputElement>) {
-  console.log(e.target.value);
-}
-
-// Keyboard
-function handleKeyDown(e: React.KeyboardEvent<HTMLInputElement>) {
-  if (e.key === 'Enter') e.currentTarget.blur();
-}
-```
-
-See [event-handlers.md](references/event-handlers.md) for focus, drag, clipboard, touch, wheel events.
-
-</event_handlers>
-
-<hooks_typing>
-
-**useState** - explicit for unions/null:
-
-```typescript
-const [user, setUser] = useState<User | null>(null);
-const [status, setStatus] = useState<'idle' | 'loading'>('idle');
-```
-
-**useRef** - null for DOM, value for mutable:
-
-```typescript
-const inputRef = useRef<HTMLInputElement>(null);  // DOM - use ?.
-const countRef = useRef<number>(0);               // Mutable - direct access
-```
+1. **Props shape behavior, not just data.** Discriminated unions at the prop level eliminate impossible states before runtime — model your variants in the type, not in conditionals.
+2. **Inference is a contract.** When TypeScript infers a generic, it locks in the relationship across all usages. Explicit annotations at call sites break that contract and cost you safety downstream.
+3. **Server/Client boundary is a compile-time seam.** Treat `'use server'` and `'use client'` like interface boundaries — only serializable data crosses them. Promises passed from Server to Client via `use()` are the intended bridge.
+4. **Avoid non-null assertions on DOM refs.** `inputRef.current!` trades a compile error for a runtime crash. Guard with `?.` or an explicit null check.
+5. **Routing type safety is architectural, not cosmetic.** Choosing TanStack Router vs React Router v7 determines whether search params, path params, and loader data are typed at definition time or inferred from generated artifacts — pick based on project constraints, not familiarity.
 
-**useReducer** - discriminated unions for actions:
+## Navigation
 
-```typescript
-type Action =
-  | { type: 'increment' }
-  | { type: 'set'; payload: number };
-
-function reducer(state: State, action: Action): State {
-  switch (action.type) {
-    case 'set': return { ...state, count: action.payload };
-    default: return state;
-  }
-}
-```
-
-**Custom hooks** - tuple returns with as const:
-
-```typescript
-function useToggle(initial = false) {
-  const [value, setValue] = useState(initial);
-  const toggle = () => setValue(v => !v);
-  return [value, toggle] as const;
-}
-```
-
-**useContext** - null guard pattern:
-
-```typescript
-const UserContext = createContext<User | null>(null);
-
-function useUser() {
-  const user = useContext(UserContext);
-  if (!user) throw new Error('useUser outside UserProvider');
-  return user;
-}
-```
-
-See [hooks.md](references/hooks.md) for useCallback, useMemo, useImperativeHandle, useSyncExternalStore.
-
-</hooks_typing>
-
-<generic_components>
-
-Generic components infer types from props - no manual annotations at call site.
-
-**Pattern** - keyof T for column keys, render props for custom rendering:
-
-```typescript
-type Column<T> = {
-  key: keyof T;
-  header: string;
-  render?: (value: T[keyof T], item: T) => React.ReactNode;
-};
-
-type TableProps<T> = {
-  data: T[];
-  columns: Column<T>[];
-  keyExtractor: (item: T) => string | number;
-};
-
-function Table<T>({ data, columns, keyExtractor }: TableProps<T>) {
-  return (
-    <table>
-      <thead>
-        <tr>{columns.map(col => <th key={String(col.key)}>{col.header}</th>)}</tr>
-      </thead>
-      <tbody>
-        {data.map(item => (
-          <tr key={keyExtractor(item)}>
-            {columns.map(col => (
-              <td key={String(col.key)}>
-                {col.render ? col.render(item[col.key], item) : String(item[col.key])}
-              </td>
-            ))}
-          </tr>
-        ))}
-      </tbody>
-    </table>
-  );
-}
-```
-
-**Constrained generics** for required properties:
-
-```typescript
-type HasId = { id: string | number };
-
-function List<T extends HasId>({ items }: { items: T[] }) {
-  return <ul>{items.map(item => <li key={item.id}>...</li>)}</ul>;
-}
-```
-
-See [generic-components.md](examples/generic-components.md) for Select, List, Modal, FormField patterns.
-
-</generic_components>
-
-<server_components>
-
-React 19 Server Components run on server, can be async.
-
-**Async data fetching**:
-
-```typescript
-export default async function UserPage({ params }: { params: { id: string } }) {
-  const user = await fetchUser(params.id);
-  return <div>{user.name}</div>;
-}
-```
-
-**Server Actions** - 'use server' for mutations:
-
-```typescript
-'use server';
-
-export async function updateUser(userId: string, formData: FormData) {
-  await db.user.update({ where: { id: userId }, data: { ... } });
-  revalidatePath(`/users/${userId}`);
-}
-```
-
-**Client + Server Action**:
-
-```typescript
-'use client';
+**Use this skill when:**
+- Building typed React components, hooks, or event handlers
+- Implementing generic components (Table, List, Select, Modal)
+- Typing event handlers, forms, refs
+- Using React 19 features (Actions, Server Components, `use()`, `useActionState`)
+- Integrating TanStack Router or React Router v7
+- Custom hooks with proper typing
 
-import { useActionState } from 'react';
-import { updateUser } from '@/actions/user';
+**Do NOT use this skill when:**
+- Working in non-React TypeScript (Node scripts, libraries, standalone Zod schemas)
+- Vanilla JS React with no TypeScript
 
-function UserForm({ userId }: { userId: string }) {
-  const [state, formAction, isPending] = useActionState(
-    (prev, formData) => updateUser(userId, formData), {}
-  );
-  return <form action={formAction}>...</form>;
-}
-```
+Building a shared component library rather than app components? Use design-system-starter for token architecture, atomic hierarchy, and WCAG 2.1 accessibility patterns.
 
-**use() for promise handoff**:
+If the user is working with Material UI components specifically, switch to the mui skill for MUI v7 sx caching, slotProps, and Grid v2 patterns.
 
-```typescript
-// Server: pass promise without await
-async function Page() {
-  const userPromise = fetchUser('123');
-  return <UserProfile userPromise={userPromise} />;
-}
+### Router Decision Tree
 
-// Client: unwrap with use()
-'use client';
-function UserProfile({ userPromise }: { userPromise: Promise<User> }) {
-  const user = use(userPromise);
-  return <div>{user.name}</div>;
-}
 ```
-
-See [server-components.md](examples/server-components.md) for parallel fetching, streaming, error boundaries.
-
-</server_components>
-
-<routing>
-
-Both TanStack Router and React Router v7 provide type-safe routing solutions.
-
-**TanStack Router** - Compile-time type safety with Zod validation:
-
-```typescript
-import { createRoute } from '@tanstack/react-router';
-import { z } from 'zod';
-
-const userRoute = createRoute({
-  path: '/users/$userId',
-  component: UserPage,
-  loader: async ({ params }) => ({ user: await fetchUser(params.userId) }),
-  validateSearch: z.object({
-    tab: z.enum(['profile', 'settings']).optional(),
-    page: z.number().int().positive().default(1),
-  }),
-});
-
-function UserPage() {
-  const { user } = useLoaderData({ from: userRoute.id });
-  const { tab, page } = useSearch({ from: userRoute.id });
-  const { userId } = useParams({ from: userRoute.id });
-}
+Does the project use a meta-framework (Next.js, Remix)?
+├─ Yes → Next.js: use App Router + Server Actions (no separate router library needed)
+│         Remix/React Router v7: already baked in — use Framework Mode loaders/actions
+└─ No → Is file-based routing required?
+         ├─ Yes → React Router v7 (Framework Mode with Vite plugin)
+         └─ No → Does the team need compile-time Zod search-param validation?
+                  ├─ Yes → TanStack Router (validateSearch + generated route tree)
+                  └─ No → Either works; prefer TanStack Router for greenfield SPA,
+                           React Router v7 for teams already familiar with Remix patterns
 ```
 
-**React Router v7** - Automatic type generation with Framework Mode:
-
-```typescript
-import type { Route } from "./+types/user";
+**TanStack Router strengths:** Compile-time route tree, Zod `validateSearch`, full type inference from `useLoaderData`/`useSearch`/`useParams` without code generation at runtime.
 
-export async function loader({ params }: Route.LoaderArgs) {
-  return { user: await fetchUser(params.userId) };
-}
+**React Router v7 strengths:** Framework Mode generates `+types/` per route, so the component's `loaderData` is typed automatically from the loader's return type. Familiar for Remix users. Better SSR story.
 
-export default function UserPage({ loaderData }: Route.ComponentProps) {
-  const { user } = loaderData; // Typed from loader
-  return <h1>{user.name}</h1>;
-}
-```
+MANDATORY — read [references/tanstack-router.md](references/tanstack-router.md) and [references/react-router.md](references/react-router.md) before implementing routing.
 
-See [tanstack-router.md](references/tanstack-router.md) for TanStack patterns and [react-router.md](references/react-router.md) for React Router patterns.
+## Philosophy
 
-</routing>
+Model the impossible as unrepresentable: every `any`, every optional field that should be required, and every missing discriminant is a bug deferred to production. Type-safe React is not about satisfying the compiler — it is about encoding product rules in a language the compiler enforces.
 
-<rules>
+## NEVER
 
-ALWAYS:
-- Specific event types (MouseEvent, ChangeEvent, etc)
-- Explicit useState for unions/null
-- ComponentPropsWithoutRef for native element extension
-- Discriminated unions for variant props
-- as const for tuple returns
-- ref as prop in React 19 (no forwardRef)
-- useActionState for form actions
-- Type-safe routing patterns (see routing section)
+- **`any` for event handlers** — `any` widens the handler to accept events from unrelated elements, so `e.target.value` won't narrow correctly and you'll get silent `undefined` at runtime instead of a type error at compile time.
+- **`JSX.Element` as children type** — `JSX.Element` excludes strings, numbers, arrays, fragments, and `null`, rejecting valid children at compile time while `React.ReactNode` accepts all renderable values correctly.
+- **`forwardRef` in React 19+** — React 19 passes `ref` to function components as a regular prop, so `forwardRef` is no longer necessary and is slated for deprecation in a future release; wrapping new components in it adds an extra layer of indirection with no benefit.
+- **`useFormState` (deprecated)** — replaced by `useActionState` (imported from `react`), which also returns `isPending` as a third value; `useFormState` from `react-dom` is deprecated in React 19 and should not appear in new code.
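+- **Awaiting promises before passing to `use()`** — awaiting the fetch in the Server Component and handing down the resolved value defeats streaming: it blocks the Server Component until the promise settles, eliminating the concurrent rendering benefit that `use()` + Suspense is designed to provide. Pass the pending promise itself and let the Client Component unwrap it with `use()`.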
+- **Mixing Server and Client component logic in the same file** — a file with both `'use server'` and `'use client'` at module scope is invalid; the bundler treats the entire file as one boundary, so server-only imports (db, fs) will leak into the client bundle.
+- **Non-null assertion (`!`) on DOM refs** — `inputRef.current!` crashes if the component unmounts or the ref hasn't attached yet; use optional chaining (`?.`) or guard the call site explicitly.
+- **Inline object/function literals in JSX props without `useCallback`/`useMemo`** — every render creates a new reference, triggering child re-renders even when `React.memo` is applied, making memoization silently ineffective.
 
-NEVER:
-- any for event handlers
-- JSX.Element for children (use ReactNode)
-- forwardRef in React 19+
-- useFormState (deprecated)
-- Forget null handling for DOM refs
-- Mix Server/Client components in same file
-- Await promises when passing to use()
+## When Things Go Wrong
 
-</rules>
+| Symptom | Root Cause | Fix |
+|---|---|---|
+| `Type 'string' is not assignable to type 'never'` on discriminated union | A switch/if is missing a case arm, so the unhandled variant reaches the exhaustive check, where only `never` is allowed | Add the missing branch and keep an exhaustive check such as `default: return props satisfies never` |
+| `'ref.current' is possibly 'null'` on a DOM ref | The ref is initialized with `null`, so `current` may be `null` but is used as if it were always set | Use optional chaining `ref.current?.focus()` or check `if (ref.current)` first |
+| Rendering fails with "Functions cannot be passed directly to Client Components" | A non-serializable callback is passed as a prop across the Server/Client boundary | Pass a Server Function instead: mark it with `'use server'` inline or import it from a module that starts with `'use server'`; never pass plain closures |
+| `useLoaderData` returns `unknown` in TanStack Router | Route's `loader` return type is not inferred because `from` is missing or wrong | Pass `{ from: routeId }` explicitly: `useLoaderData({ from: '/users/$userId' })` |
+| React Router v7 `+types/` import missing | Vite plugin not configured or route file not under the routes directory | Check `vite.config.ts` for `reactRouter()` plugin and verify file is under `app/routes/` |
+| Child component re-renders despite `React.memo` | Prop is an inline object/function literal — new reference on every parent render | Hoist the value outside the component or wrap with `useMemo`/`useCallback` |
 
-<references>
+## Reference Loading Triggers
 
-- [hooks.md](references/hooks.md) - useState, useRef, useReducer, useContext, custom hooks
-- [event-handlers.md](references/event-handlers.md) - all event types, generic handlers
-- [react-19-patterns.md](references/react-19-patterns.md) - useActionState, use(), useOptimistic, migration
-- [generic-components.md](examples/generic-components.md) - Table, Select, List, Modal patterns
-- [server-components.md](examples/server-components.md) - async components, Server Actions, streaming
-- [tanstack-router.md](references/tanstack-router.md) - TanStack Router typed routes, search params, navigation
-- [react-router.md](references/react-router.md) - React Router v7 loaders, actions, type generation, forms
+MANDATORY — read the indicated reference file before working on each task:
 
-</references>
+| Task | Reference |
+|---|---|
+| React 19 features (`ref` as prop, `useActionState`, `use()`, migration) | [references/react-19-changes.md](references/react-19-changes.md) |
+| Component props, discriminated unions, children typing, polymorphic components | [references/component-patterns.md](references/component-patterns.md) |
+| Server Components, Server Actions, streaming, Server/Client boundary | [references/server-components.md](references/server-components.md) |
+| Typing hooks (`useState`, `useRef`, `useReducer`, `useContext`, custom hooks) | [references/hooks-typing.md](references/hooks-typing.md) |
+| Generic components (Table, List, Select, Modal, FormField) | [references/generic-components.md](references/generic-components.md) |
+| Event handler typing (mouse, form, keyboard, drag, clipboard) | [references/event-handlers.md](references/event-handlers.md) |
+| TanStack Router routes, search params, loader data | [references/tanstack-router.md](references/tanstack-router.md) |
+| React Router v7 loaders, actions, `+types/` generation | [references/react-router.md](references/react-router.md) |
diff --git a/skills/react-dev/references/component-patterns.md b/skills/react-dev/references/component-patterns.md
new file mode 100644
index 0000000..4a6be14
--- /dev/null
+++ b/skills/react-dev/references/component-patterns.md
@@ -0,0 +1,110 @@
+# Component Patterns
+
+Load this file when building typed React components, extending native elements, or modeling variant props with discriminated unions.
+
+## Props — extend native elements
+
+Use `ComponentPropsWithoutRef` to inherit all native element props without the ref type conflict.
+
+```typescript
+type ButtonProps = {
+  variant: 'primary' | 'secondary';
+} & React.ComponentPropsWithoutRef<'button'>;
+
+function Button({ variant, children, ...props }: ButtonProps) {
+  return <button className={variant} {...props}>{children}</button>;
+}
+```
+
+## Children typing
+
+```typescript
+type Props<T> = {
+  children: React.ReactNode;               // Anything renderable (strings, numbers, arrays, null)
+  icon: React.ReactElement;                // Single React element only
+  render: (data: T) => React.ReactNode;   // Render prop
+  label: string | React.ReactElement;     // String or element
+};
+```
+
+Never use `JSX.Element` for children — it excludes strings, numbers, arrays, and `null`.
+
+## Discriminated unions for variant props
+
+Model variants as discriminants so TypeScript narrows the type in each branch:
+
+```typescript
+type ButtonProps =
+  | { variant: 'link'; href: string; onClick?: never }
+  | { variant: 'button'; onClick: () => void; href?: never };
+
+function Button(props: ButtonProps) {
+  if (props.variant === 'link') {
+    return <a href={props.href}>Link</a>;
+  }
+  return <button onClick={props.onClick}>Button</button>;
+}
+```
+
+## Exhaustive checks on discriminated unions
+
+When every variant is handled, TypeScript narrows `props` to `never` in the `default` branch. Put `satisfies never` there so that an unhandled variant becomes a compile error instead of a silent fallthrough:
+
+```typescript
+function renderVariant(props: ButtonProps) {
+  switch (props.variant) {
+    case 'link': return <a href={props.href} />;
+    case 'button': return <button onClick={props.onClick} />;
+    default: return props satisfies never; // compile error if case added without handling
+  }
+}
+```
+
+## Polymorphic components (as prop)
+
+```typescript
+type PolymorphicProps<T extends React.ElementType> = {
+  as?: T;
+  children: React.ReactNode;
+} & React.ComponentPropsWithoutRef<T>;
+
+function Box<T extends React.ElementType = 'div'>({
+  as,
+  children,
+  ...props
+}: PolymorphicProps<T>) {
+  const Component = as ?? 'div';
+  return <Component {...props}>{children}</Component>;
+}
+
+// Usage:
+<Box as="section" aria-label="main content">...</Box>
+<Box as="button" onClick={handleClick}>...</Box>
+```
+
+## Compound components
+
+```typescript
+type CardContextValue = { collapsible: boolean };
+const CardContext = createContext<CardContextValue | null>(null);
+
+function Card({ collapsible = false, children }: { collapsible?: boolean; children: React.ReactNode }) {
+  return (
+    <CardContext.Provider value={{ collapsible }}>
+      <div className="card">{children}</div>
+    </CardContext.Provider>
+  );
+}
+
+function CardHeader({ children }: { children: React.ReactNode }) {
+  const ctx = useContext(CardContext);
+  if (!ctx) throw new Error('CardHeader must be inside Card');
+  return <div className="card-header">{children}</div>;
+}
+
+Card.Header = CardHeader;
+```
+
+## Generic components
+
+See [generic-components.md](generic-components.md) for Table, Select, List, Modal patterns with full `keyof T` typing.
diff --git a/skills/react-dev/references/generic-components.md b/skills/react-dev/references/generic-components.md
new file mode 100644
index 0000000..e9e53ef
--- /dev/null
+++ b/skills/react-dev/references/generic-components.md
@@ -0,0 +1,172 @@
+# Generic Components
+
+Load this file when building generic, reusable React components: Table, List, Select, Modal, FormField.
+
+## Core pattern — keyof T for column keys
+
+Generic components infer types from props — no manual type annotations at call sites.
+
+```typescript
+type Column<T> = {
+  key: keyof T;
+  header: string;
+  render?: (value: T[keyof T], item: T) => React.ReactNode;
+};
+
+type TableProps<T> = {
+  data: T[];
+  columns: Column<T>[];
+  keyExtractor: (item: T) => string | number;
+  onRowClick?: (item: T) => void;
+};
+
+function Table<T>({ data, columns, keyExtractor, onRowClick }: TableProps<T>) {
+  return (
+    <table>
+      <thead>
+        <tr>{columns.map(col => <th key={String(col.key)}>{col.header}</th>)}</tr>
+      </thead>
+      <tbody>
+        {data.map(item => (
+          <tr key={keyExtractor(item)} onClick={() => onRowClick?.(item)}>
+            {columns.map(col => (
+              <td key={String(col.key)}>
+                {col.render
+                  ? col.render(item[col.key], item)
+                  : String(item[col.key] ?? '')}
+              </td>
+            ))}
+          </tr>
+        ))}
+      </tbody>
+    </table>
+  );
+}
+
+// Usage — T is inferred as User
+type User = { id: number; name: string; email: string };
+
+<Table
+  data={users}
+  columns={[
+    { key: 'name', header: 'Name' },
+    { key: 'email', header: 'Email', render: val => <a href={`mailto:${val}`}>{String(val)}</a> },
+  ]}
+  keyExtractor={u => u.id}
+/>
+```
+
+## Constrained generics — require specific properties
+
+```typescript
+type HasId = { id: string | number };
+
+function List<T extends HasId>({
+  items,
+  renderItem,
+}: {
+  items: T[];
+  renderItem: (item: T) => React.ReactNode;
+}) {
+  return <ul>{items.map(item => <li key={item.id}>{renderItem(item)}</li>)}</ul>;
+}
+```
+
+## Generic Select / Combobox
+
+```typescript
+type SelectProps<T> = {
+  options: T[];
+  value: T | null;
+  onChange: (value: T) => void;
+  getLabel: (option: T) => string;
+  getValue: (option: T) => string | number;
+  placeholder?: string;
+};
+
+function Select<T>({ options, value, onChange, getLabel, getValue, placeholder }: SelectProps<T>) {
+  return (
+    <select
+      value={value ? String(getValue(value)) : ''}
+      onChange={e => {
+        const selected = options.find(o => String(getValue(o)) === e.target.value);
+        if (selected) onChange(selected);
+      }}
+    >
+      {placeholder && <option value="">{placeholder}</option>}
+      {options.map(option => (
+        <option key={getValue(option)} value={String(getValue(option))}>
+          {getLabel(option)}
+        </option>
+      ))}
+    </select>
+  );
+}
+```
+
+## Generic Modal
+
+```typescript
+type ModalProps<T> = {
+  isOpen: boolean;
+  onClose: () => void;
+  data: T | null;
+  title: string;
+  children: (data: T) => React.ReactNode;
+};
+
+function Modal<T>({ isOpen, onClose, data, title, children }: ModalProps<T>) {
+  if (!isOpen || !data) return null;
+  return (
+    <dialog open aria-modal aria-label={title}>
+      <button onClick={onClose} aria-label="Close">×</button>
+      <h2>{title}</h2>
+      {children(data)}
+    </dialog>
+  );
+}
+
+// Usage
+<Modal isOpen={!!selectedUser} onClose={() => setSelectedUser(null)} data={selectedUser} title="User Details">
+  {user => <UserCard user={user} />}
+</Modal>
+```
+
+## Generic FormField
+
+```typescript
+type FormFieldProps<T extends Record<string, unknown>> = {
+  name: keyof T & string;
+  label: string;
+  value: T[keyof T];
+  onChange: (name: keyof T & string, value: T[keyof T]) => void;
+  type?: 'text' | 'email' | 'number';
+};
+
+function FormField<T extends Record<string, unknown>>({
+  name,
+  label,
+  value,
+  onChange,
+  type = 'text',
+}: FormFieldProps<T>) {
+  return (
+    <label>
+      {label}
+      <input
+        name={name}
+        type={type}
+        value={String(value ?? '')}
+        onChange={e => onChange(name, e.target.value as T[keyof T])}
+      />
+    </label>
+  );
+}
+```
+
+## Tips
+
+- Use `keyof T` (not `string`) for property keys — ensures the key exists on the data type.
+- `render?: (value: T[keyof T], item: T)` gives access to both the cell value and the full row.
+- Constrain generics (`T extends HasId`) when downstream code needs a guaranteed property.
+- Avoid `any` in generic component implementations — use `unknown` and narrow explicitly.
diff --git a/skills/react-dev/references/hooks-typing.md b/skills/react-dev/references/hooks-typing.md
new file mode 100644
index 0000000..4b18384
--- /dev/null
+++ b/skills/react-dev/references/hooks-typing.md
@@ -0,0 +1,145 @@
+# Hooks Typing
+
+Load this file when typing React hooks: `useState`, `useRef`, `useReducer`, `useContext`, or custom hooks.
+
+## useState — explicit for unions and null
+
+```typescript
+const [user, setUser] = useState<User | null>(null);
+const [status, setStatus] = useState<'idle' | 'loading' | 'error'>('idle');
+const [count, setCount] = useState(0); // inferred as number — no annotation needed
+```
+
+## useRef — null for DOM, initial value for mutable
+
+```typescript
+// DOM ref — initialize with null, always use ?. when accessing
+const inputRef = useRef<HTMLInputElement>(null);
+inputRef.current?.focus();
+
+// Mutable value ref — direct access is safe
+const countRef = useRef<number>(0);
+countRef.current += 1;
+
+// Interval / timeout ref
+const timerRef = useRef<ReturnType<typeof setInterval> | null>(null);
+```
+
+Never use `!` on DOM refs — use `?.` or guard with `if (ref.current)`.
+
+## useReducer — discriminated union actions
+
+```typescript
+type State = { count: number; status: 'idle' | 'loading' };
+
+type Action =
+  | { type: 'increment' }
+  | { type: 'reset' }
+  | { type: 'set'; payload: number }
+  | { type: 'setStatus'; payload: State['status'] };
+
+function reducer(state: State, action: Action): State {
+  switch (action.type) {
+    case 'increment': return { ...state, count: state.count + 1 };
+    case 'reset':     return { ...state, count: 0 };
+    case 'set':       return { ...state, count: action.payload };
+    case 'setStatus': return { ...state, status: action.payload };
+    default:          return action satisfies never; // exhaustive check
+  }
+}
+
+function Counter() {
+  const [state, dispatch] = useReducer(reducer, { count: 0, status: 'idle' });
+  return <button onClick={() => dispatch({ type: 'increment' })}>{state.count}</button>;
+}
+```
+
+## Custom hooks — tuple returns with as const
+
+Without `as const`, TypeScript widens the tuple to `(boolean | (() => void))[]`, losing positional types.
+
+```typescript
+function useToggle(initial = false): readonly [boolean, () => void] {
+  const [value, setValue] = useState(initial);
+  const toggle = useCallback(() => setValue(v => !v), []);
+  return [value, toggle] as const;
+}
+
+// Usage — types are [boolean, () => void]
+const [isOpen, toggleOpen] = useToggle();
+```
+
+## useContext — null guard pattern
+
+```typescript
+const UserContext = createContext<User | null>(null);
+
+function useUser(): User {
+  const user = useContext(UserContext);
+  if (!user) throw new Error('useUser must be used inside UserProvider');
+  return user;
+}
+
+function UserProvider({ children }: { children: React.ReactNode }) {
+  const [user] = useState<User | null>(null);
+  return <UserContext.Provider value={user}>{children}</UserContext.Provider>;
+}
+```
+
+## useCallback — stable references for memoized children
+
+```typescript
+function Parent({ userId }: { userId: string }) {
+  const handleDelete = useCallback(async () => {
+    await deleteUser(userId);
+  }, [userId]); // recreates only when userId changes
+
+  return <ChildButton onDelete={handleDelete} />;
+}
+```
+
+## useMemo — expensive computations
+
+```typescript
+function FilteredList({ items, filter }: { items: Item[]; filter: string }) {
+  const filtered = useMemo(
+    () => items.filter(item => item.name.includes(filter)),
+    [items, filter]
+  );
+  return <ul>{filtered.map(item => <li key={item.id}>{item.name}</li>)}</ul>;
+}
+```
+
+## useImperativeHandle — expose minimal imperative API
+
+```typescript
+type InputHandle = { focus: () => void; clear: () => void };
+
+function FancyInput({ ref }: { ref?: React.Ref<InputHandle> }) {
+  const innerRef = useRef<HTMLInputElement>(null);
+
+  useImperativeHandle(ref, () => ({
+    focus: () => innerRef.current?.focus(),
+    clear: () => { if (innerRef.current) innerRef.current.value = ''; },
+  }));
+
+  return <input ref={innerRef} />;
+}
+```
+
+## useSyncExternalStore — subscribing to external stores
+
+```typescript
+function useWindowWidth() {
+  return useSyncExternalStore(
+    (callback) => {
+      window.addEventListener('resize', callback);
+      return () => window.removeEventListener('resize', callback);
+    },
+    () => window.innerWidth,
+    () => 1024 // server snapshot
+  );
+}
+```
+
+See also [hooks.md](hooks.md) for additional patterns already in references/.
diff --git a/skills/react-dev/references/react-19-changes.md b/skills/react-dev/references/react-19-changes.md
new file mode 100644
index 0000000..46c6d86
--- /dev/null
+++ b/skills/react-dev/references/react-19-changes.md
@@ -0,0 +1,110 @@
+# React 19 Breaking Changes
+
+React 19 introduces breaking changes that require migration. Load this file when working with React 19 features, `ref` as prop, `useActionState`, or `use()`.
+
+## ref as prop — forwardRef no longer needed
+
+```typescript
+// React 19: ref is a regular prop, so the forwardRef wrapper is unnecessary
+type ButtonProps = {
+  ref?: React.Ref<HTMLButtonElement>;
+} & React.ComponentPropsWithoutRef<'button'>;
+
+function Button({ ref, children, ...props }: ButtonProps) {
+  return <button ref={ref} {...props}>{children}</button>;
+}
+```
+
+## useActionState — replaces useFormState
+
+`useFormState` (from `react-dom`) is deprecated in React 19 in favor of `useActionState` (from `react`), which also returns `isPending` as a third value.
+
+```typescript
+import { useActionState } from 'react';
+
+type FormState = { errors?: string[]; success?: boolean };
+
+function Form() {
+  const [state, formAction, isPending] = useActionState(submitAction, {});
+  return (
+    <form action={formAction}>
+      {isPending && <span>Submitting...</span>}
+      {state.errors?.map(e => <p key={e}>{e}</p>)}
+    </form>
+  );
+}
+```
+
+## use() — unwraps promises and context
+
+Do NOT `await` the promise before passing to `use()` — that defeats streaming.
+
+```typescript
+function UserProfile({ userPromise }: { userPromise: Promise<User> }) {
+  const user = use(userPromise); // Suspends until resolved
+  return <div>{user.name}</div>;
+}
+```
+
+Context via `use()`:
+
+```typescript
+function Component() {
+  const theme = use(ThemeContext); // can be called conditionally
+  return <div className={theme}>...</div>;
+}
+```
+
+## useOptimistic — optimistic UI updates
+
+```typescript
+function TodoList({ todos }: { todos: Todo[] }) {
+  const [optimisticTodos, addOptimistic] = useOptimistic(
+    todos,
+    (state, newTodo: Todo) => [...state, newTodo]
+  );
+
+  async function handleAdd(formData: FormData) {
+    const title = formData.get('title') as string;
+    addOptimistic({ id: 'temp', title, done: false });
+    await addTodo(title); // server action
+  }
+
+  return (
+    <form action={handleAdd}>
+      {optimisticTodos.map(todo => <li key={todo.id}>{todo.title}</li>)}
+    </form>
+  );
+}
+```
+
+## useTransition — non-blocking state updates
+
+```typescript
+function SearchPage() {
+  const [isPending, startTransition] = useTransition();
+  const [query, setQuery] = useState('');
+
+  function handleSearch(value: string) {
+    startTransition(() => setQuery(value));
+  }
+
+  return (
+    <>
+      <input onChange={e => handleSearch(e.target.value)} />
+      {isPending && <Spinner />}
+      <Results query={query} />
+    </>
+  );
+}
+```
+
+## Migration Checklist
+
+- [ ] Replace all `forwardRef` wrappers — pass `ref` as a regular prop
+- [ ] Replace `useFormState` with `useActionState` (update import)
+- [ ] Remove `ReactDOM.render` — use `createRoot`
+- [ ] Replace `defaultProps` on function components with default parameter values
+- [ ] Audit `use(promise)` call sites — ensure promise is NOT awaited before passing
+- [ ] Replace `ReactDOM.hydrate` with `hydrateRoot`
+- [ ] Import `act` from `react` instead of `react-dom/test-utils`
diff --git a/skills/react-dev/references/server-components.md b/skills/react-dev/references/server-components.md
new file mode 100644
index 0000000..33ae3fc
--- /dev/null
+++ b/skills/react-dev/references/server-components.md
@@ -0,0 +1,129 @@
+# Server Components
+
+Load this file when working with React Server Components, Server Actions, async data fetching, or the Server/Client boundary.
+
+## Async data fetching in Server Components
+
+Server Components can be `async` — fetch data directly without `useEffect`.
+
+```typescript
+export default async function UserPage({ params }: { params: { id: string } }) {
+  const user = await fetchUser(params.id);
+  return <div>{user.name}</div>;
+}
+```
+
+## Parallel data fetching
+
+Do NOT await sequentially — fetch in parallel with `Promise.all`:
+
+```typescript
+export default async function DashboardPage() {
+  const [user, posts, comments] = await Promise.all([
+    fetchUser('123'),
+    fetchPosts('123'),
+    fetchComments('123'),
+  ]);
+  return <Dashboard user={user} posts={posts} comments={comments} />;
+}
+```
+
+## Server Actions — 'use server' for mutations
+
+```typescript
+'use server';
+
+import { revalidatePath } from 'next/cache';
+
+export async function updateUser(userId: string, formData: FormData) {
+  const name = formData.get('name') as string;
+  await db.user.update({ where: { id: userId }, data: { name } });
+  revalidatePath(`/users/${userId}`);
+}
+```
+
+## Client component consuming a Server Action
+
+```typescript
+'use client';
+
+import { useActionState } from 'react';
+import { updateUser } from '@/actions/user';
+
+type State = { errors?: string[]; success?: boolean };
+
+function UserForm({ userId }: { userId: string }) {
+  const [state, formAction, isPending] = useActionState(
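+    async (_prev: State, formData: FormData): Promise<State> => { await updateUser(userId, formData); return { success: true }; }, // return the next State, since updateUser resolves to void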
+    {}
+  );
+  return (
+    <form action={formAction}>
+      <input name="name" />
+      <button type="submit" disabled={isPending}>Save</button>
+      {state.errors?.map(e => <p key={e}>{e}</p>)}
+    </form>
+  );
+}
+```
+
+## use() for promise handoff (streaming)
+
+Pass the promise from Server to Client WITHOUT awaiting — this enables streaming via Suspense.
+
+```typescript
+// Server Component — do NOT await
+async function Page() {
+  const userPromise = fetchUser('123'); // no await
+  return (
+    <Suspense fallback={<Skeleton />}>
+      <UserProfile userPromise={userPromise} />
+    </Suspense>
+  );
+}
+
+// Client Component (lives in its own file) — unwrap with use()
+'use client';
+function UserProfile({ userPromise }: { userPromise: Promise<User> }) {
+  const user = use(userPromise); // suspends until resolved
+  return <div>{user.name}</div>;
+}
+```
+
+## Streaming with error boundaries
+
+```typescript
+export default function Page() {
+  return (
+    <ErrorBoundary fallback={<ErrorView />}>
+      <Suspense fallback={<Skeleton />}>
+        <SlowComponent />
+      </Suspense>
+    </ErrorBoundary>
+  );
+}
+```
+
+## Server/Client boundary rules
+
+- A file with both `'use server'` and `'use client'` at module scope is invalid — bundler treats the entire file as one boundary.
+- Server-only imports (`db`, `fs`, secret env vars) must never appear in Client Components.
+- Only serializable values cross the boundary: strings, numbers, plain objects, arrays, Dates, and Promises of those. No plain functions, no class instances, no Symbols.
+- To pass a server function to a client, mark it with `'use server'`, either inline in the async function body or at the top of a dedicated actions module.
+
+## Patterns to avoid
+
+```typescript
+// BAD — awaiting before use() defeats streaming
+async function Page() {
+  const user = await fetchUser('123'); // blocks entire page
+  return <UserProfile user={user} />;
+}
+
+// BAD — server-only import in client file
+'use client';
+import { db } from '@/lib/db'; // leaks db credentials to client bundle
+
+// BAD — passing closure across boundary
+<ClientComponent onAction={() => db.insert()} />
+```
diff --git a/skills/react-useeffect/SKILL.md b/skills/react-useeffect/SKILL.md
index d7c6ffb..ebb4af8 100644
--- a/skills/react-useeffect/SKILL.md
+++ b/skills/react-useeffect/SKILL.md
@@ -1,53 +1,90 @@
 ---
 name: react-useeffect
-description: React useEffect best practices from official docs. Use when writing/reviewing useEffect, useState for derived values, data fetching, or state synchronization. Teaches when NOT to use Effect and better alternatives.
+description: Expert guidance on React useEffect — when to use it, when NOT to use it, and what to use instead. Triggers on useEffect review, derived state, data fetching, race conditions, stale closure, cleanup, re-render loops, Strict Mode double-mount, synchronizing with external systems.
 ---
 
-# You Might Not Need an Effect
+## Mindset
 
-Effects are an **escape hatch** from React. They let you synchronize with external systems. If there is no external system involved, you shouldn't need an Effect.
+React's render cycle IS the reactive system. useEffect is an **opt-out**, not a feature. Every Effect you write is a gap where React can't optimize, lint, or trace for you.
 
-## Quick Reference
+**Heuristic 1 — The External System Test**: If you can describe the thing being synchronized without naming a non-React API (DOM, WebSocket, `localStorage`, third-party lib, `setInterval`), you don't have an external system. You have a data transformation problem. Solve it in render, not in an Effect.
 
-| Situation | DON'T | DO |
-|-----------|-------|-----|
-| Derived state from props/state | `useState` + `useEffect` | Calculate during render |
-| Expensive calculations | `useEffect` to cache | `useMemo` |
-| Reset state on prop change | `useEffect` with `setState` | `key` prop |
-| User event responses | `useEffect` watching state | Event handler directly |
-| Notify parent of changes | `useEffect` calling `onChange` | Call in event handler |
-| Fetch data | `useEffect` without cleanup | `useEffect` with cleanup OR framework |
+**Heuristic 2 — The Timing Tells You**: If you care that code ran *because a user clicked*, use an event handler. If you care that code ran *because a component is visible on screen*, use an Effect. These are semantically different: Effects run after the render has been committed (normally after paint), whereas event handlers run synchronously in response to the event itself.
 
-## When You DO Need Effects
+**Heuristic 3 — The Double-Run Smell Test**: If running your Effect twice (React 18 Strict Mode) causes a visible bug, you have missing cleanup — not a Strict Mode problem. Fix the cleanup.
 
-- Synchronizing with **external systems** (non-React widgets, browser APIs)
-- **Subscriptions** to external stores (use `useSyncExternalStore` when possible)
-- **Analytics/logging** that runs because component displayed
-- **Data fetching** with proper cleanup (or use framework's built-in mechanism)
+**Heuristic 4 — Object/function dependency trap**: If your dependency array contains an object or function created inline in render, your Effect runs every render even with a "stable" dep array. React compares by reference. Extract the value out, memoize the function, or restructure so a primitive is the dependency.
 
-## When You DON'T Need Effects
+**Heuristic 5 — useEffect for data fetching is a last resort**: An Effect-based fetch has no loading deduplication, no cache, no server rendering, and introduces race conditions. Reach for React Query / SWR / server components before writing `useEffect(() => { fetch(...) })`.
 
-1. **Transforming data for rendering** - Calculate at top level, re-runs automatically
-2. **Handling user events** - Use event handlers, you know exactly what happened
-3. **Deriving state** - Just compute it: `const fullName = firstName + ' ' + lastName`
-4. **Chaining state updates** - Calculate all next state in the event handler
+---
+
+## Navigation
+
+**Use this skill when**:
+- Reviewing or writing useEffect code
+- A component has bugs like: stale values, double API calls, state update loops, notification fires on page load
+- Deriving state from props/state with useState + useEffect
+- Implementing subscriptions, timers, or DOM listeners
+- Race conditions in async Effects
 
-## Decision Tree
+**Do NOT use this skill when**:
+- The question is purely about state management (Redux, Zustand, Context) without an Effect
+- The question is about React Server Components (they don't support Effects)
+- The question is about React Query / SWR internals (those libraries handle Effect lifecycle internally)
 
+**Quick routing decision**:
 ```
-Need to respond to something?
-├── User interaction (click, submit, drag)?
-│   └── Use EVENT HANDLER
-├── Component appeared on screen?
-│   └── Use EFFECT (external sync, analytics)
-├── Props/state changed and need derived value?
-│   └── CALCULATE DURING RENDER
-│       └── Expensive? Use useMemo
-└── Need to reset state when prop changes?
-    └── Use KEY PROP on component
+Does the code touch anything outside React's component tree?
+  NO  → You don't need useEffect. Use render-time derivation, event handlers, or key prop.
+  YES → Which external system?
+         DOM / browser API     → useEffect with cleanup
+         External store/event  → useSyncExternalStore (preferred) or useEffect
+         Network fetch         → Framework/library preferred; useEffect with ignore flag if needed
+         One-time app init     → Module-level guard (not Effect)
 ```
 
-## Detailed Guidance
+---
+
+## Philosophy
+
+useEffect exists to bridge React's declarative model and the imperative world outside it. The goal is to make that bridge as narrow as possible — every line inside a useEffect is code React cannot reason about. Shrink the bridge; expand render.
+
+---
+
+## NEVER
+
+- **NEVER store derived values in state and sync them with useEffect** — because this creates a guaranteed stale render: the component renders once with the old derived value, then the Effect fires, setState triggers a second render. Use render-time computation or useMemo. The stale render is invisible in small apps but causes flickering and logic bugs in complex ones.
+
+- **NEVER fire Effects in response to user events by watching state they set** — because Effects don't know *why* state changed. An Effect watching `isInCart` will fire on page load (when isInCart is already true from saved state), on prop change, and on any render that causes remounting — not just the user's click. The event handler knows exactly why and when.
+
+- **NEVER chain Effects** (Effect A sets state → Effect B watches that state → Effect B sets more state) — because this creates N re-renders for N-effect chains, makes the data flow impossible to follow in React DevTools, and breaks features like time-travel debugging and state replay. Collapse the chain into a single event handler that computes the final state.
+
+- **NEVER use useEffect to initialize one-time app-level logic** — because React 18 Strict Mode intentionally mounts twice in development, and your auth token check / localStorage init will run twice. Use a module-level `let didInit = false` guard or run the code at module scope with `if (typeof window !== 'undefined')`. The Effect lifecycle is component lifecycle, not application lifecycle.
+
+- **NEVER include object/array/function literals in the dependency array without stabilizing them first** — because `{} !== {}` in JavaScript. A dependency that is recreated on every render makes the Effect re-run on every render, with no runtime error to flag it. Wrap objects in useMemo, functions in useCallback, or restructure so only primitives are dependencies.
+
+- **NEVER omit the AbortController or `ignore` flag pattern in async Effects** — because without it, a fast-typing user will see results from a slower earlier request overwrite results from a faster later one (classic race condition). The `ignore` flag is the minimum fix; AbortController also cancels the in-flight request.
+
+- **NEVER call `setState` on unmounted components without cleanup** — because React 18 silenced the warning but the underlying logic error remains: the Effect has a reference to a component that's gone. The cleanup return function is mandatory for subscriptions, timers, and async operations.
+
+---
+
+## When Things Go Wrong
+
+| Symptom | Likely Cause | Recovery |
+|---------|-------------|----------|
+| Component renders 2-3x on every update | Derived state in useState + useEffect sync | Replace useState+useEffect pair with render-time const or useMemo |
+| API called twice on mount in dev | React 18 Strict Mode double-mount (by design) | Add cleanup: `return () => { ignore = true; }`. If the double call still causes visible problems once cleanup is in place, the Effect itself has a bug |
+| Notification / toast shows on page load unexpectedly | Effect watching state that's truthy on init | Move side-effect into the event handler that mutates the state |
+| Effect re-runs every render despite "stable" deps | Object, array, or function in dep array created inline | Extract to useMemo/useCallback or decompose to primitive deps |
+| Fetch shows stale results when input changes fast | Missing ignore flag / AbortController | Add `let ignore = false; return () => { ignore = true; }` inside the Effect and apply the result only `if (!ignore)` |
+| Effect runs, but cleanup runs immediately after | StrictMode remount OR dependency object identity changes | Verify cleanup is correct (StrictMode) or stabilize deps (identity issue) |
+| `setState` loop — component renders infinitely | Effect sets state that's in its own dep array | Remove the circular dep or restructure: derive instead of sync |
+
+---
+
+## Reference Files — Load on Demand
 
-- [Anti-Patterns](./anti-patterns.md) - Common mistakes with fixes
-- [Better Alternatives](./alternatives.md) - useMemo, key prop, lifting state, useSyncExternalStore
+- **[anti-patterns.md](./references/anti-patterns.md)** — 9 coded examples with bad/good/why. Load when reviewing specific useEffect code or explaining a concrete mistake.
+- **[alternatives.md](./references/alternatives.md)** — 8 replacement patterns (useMemo, key prop, useSyncExternalStore, custom fetch hook, etc.). Load when the answer is "don't use Effect — use X instead."
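Note on the race-condition guidance in the react-useeffect SKILL.md hunk above: the `ignore` flag recovery can be sketched outside React as a plain function that mirrors the shape of an Effect body. This is an illustrative simulation under stated assumptions, not code from the skill. `startSearch`, `fakeFetcher`, and `setShown` are hypothetical names, and the manual `cleanup()` call stands in for React running the previous Effect's cleanup before the next run.

```typescript
// Hypothetical stand-in for an Effect body: starts a request, returns a cleanup function.
type Cleanup = () => void;
type Fetcher = (query: string, done: (result: string) => void) => void;

function startSearch(
  query: string,
  fetcher: Fetcher,
  setResults: (result: string) => void
): Cleanup {
  let ignore = false; // flipped by cleanup once a newer run supersedes this one
  fetcher(query, (result) => {
    if (!ignore) setResults(result); // responses from superseded runs are dropped
  });
  return () => {
    ignore = true;
  };
}

// Fake fetcher (assumption for the demo): responses are delivered manually,
// so their arrival order can be controlled.
const pending = new Map<string, (result: string) => void>();
const fakeFetcher: Fetcher = (query, done) => {
  pending.set(query, done);
};

let shown = "";
const setShown = (result: string) => {
  shown = result;
};

// The user types "a", then "ab".
let cleanup = startSearch("a", fakeFetcher, setShown);
cleanup(); // the previous run is cleaned up before the next one starts
cleanup = startSearch("ab", fakeFetcher, setShown);

pending.get("ab")!("results for ab"); // newer request answers first
pending.get("a")!("results for a"); // older request answers last and is ignored

console.log(shown); // prints "results for ab"
```

The flag only discards the stale result. An AbortController in the same cleanup would additionally cancel the in-flight request.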
diff --git a/skills/react-useeffect/alternatives.md b/skills/react-useeffect/references/alternatives.md
similarity index 100%
rename from skills/react-useeffect/alternatives.md
rename to skills/react-useeffect/references/alternatives.md
diff --git a/skills/react-useeffect/anti-patterns.md b/skills/react-useeffect/references/anti-patterns.md
similarity index 100%
rename from skills/react-useeffect/anti-patterns.md
rename to skills/react-useeffect/references/anti-patterns.md
diff --git a/skills/reducing-entropy/SKILL.md b/skills/reducing-entropy/SKILL.md
index 9d521f3..383bfbe 100644
--- a/skills/reducing-entropy/SKILL.md
+++ b/skills/reducing-entropy/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reducing-entropy
-description: Manual-only skill for minimizing total codebase size. Only activate when explicitly requested by user. Measures success by final code amount, not effort. Bias toward deletion.
+description: Manual-only skill for minimizing total codebase size. Activate when user says "reduce entropy", "simplify this", "make this smaller", "delete what we don't need", or asks to minimize code. Measures success by final line count, not effort. Bias toward deletion over abstraction.
 ---
 
 # Reducing Entropy
@@ -9,23 +9,98 @@ More code begets more code. Entropy accumulates. This skill biases toward the sm
 
 **Core question:** "What does the codebase look like *after*?"
 
-## Before You Begin
+## Mindset
 
-**Load at least one mindset from `references/`**
+Expert heuristics that separate practitioners from beginners:
 
-1. List the files in the reference directory
-2. Read frontmatter descriptions to pick which applies
-3. Load at least one
-4. State which you loaded and its core principle
+1. **Count lines, not concepts.** The unit of entropy is lines of code. "Better architecture" with more lines is worse. A 50-line function that replaces 200 lines of abstraction is a win, even if it feels inelegant.
 
-**Do not proceed until you've done this.**
+2. **Deletion is a feature.** Every line deleted is a line that can never have a bug, never need documentation, never block a future refactor. The best code review comment is "can we delete this instead?"
+
+3. **Abstractions have a break-even point.** An abstraction only pays off if it eliminates more code than it adds. A 30-line base class shared by two 10-line subclasses is net-negative — you wrote 30 lines to avoid writing 20. Count before abstracting.
+
+4. **"No churn" is a false virtue.** Resistance to change preserves entropy. The question is never "how disruptive is this?" — it's "does the end state have less code?"
+
+5. **Measure the delta, not the task.** Track lines-before vs lines-after for every change. If after > before, the change increased entropy regardless of other virtues claimed for it.
+
+## Navigation
+
+### When to apply this skill
+
+- User explicitly asks to reduce entropy, simplify, minimize, or delete code
+- Codebase has grown organically and accumulated abstractions with unclear value
+- A feature is being removed or replaced and cleanup is in scope
+- Refactor task where smaller result is the stated goal
+
+### When NOT to apply this skill
+
+- Codebase is already minimal for what it does (stop here — don't invent deletions)
+- Working inside a framework with strong conventions (don't fight Rails, Django, etc.)
+- Regulatory/compliance requirements mandate certain structures
+- The user wants a refactor for readability, not size (different goal — use a different lens)
+- Active incident or hot path — this is a calm-waters skill
+
+### Decision Tree: Which Pattern to Apply
+
+```
+Is there dead code (unused functions, unreachable branches)?
+  YES → Delete it first. No abstraction needed.
+  NO  ↓
+
+Are there 3+ similar code blocks doing the same thing? (Rule of Three, not Rule of Two)
+  YES → Candidate for consolidation. Would the shared function add more lines than it saves?
+          YES → Leave the duplication.
+          NO  → Consolidate.
+  NO  ↓
+
+Are there wrapper classes/types that add no behavior?
+  YES → Replace with generic data structures (map, list, tuple).
+        Load references/data-over-abstractions.md
+  NO  ↓
+
+Are there multiple abstraction layers where one would do?
+  YES → Flatten. Load references/design-is-taking-apart.md
+  NO  ↓
+
+Is the implementation complex only in order to keep the interface simple?
+  YES → Acceptable: the complexity stays hidden behind the interface. Interface simplicity wins.
+        Load references/worse-is-better.md
+  NO  ↓
+
+Are you adding something "for later flexibility"?
+  YES → Is it a PAGNI (timestamps, API versioning, audit logs)?
+          YES → Add it. Load references/expensive-to-add-later.md
+          NO  → YAGNI. Delete or don't add.
+  NO  ↓
+
+Is "simple" confused with "familiar" here?
+  YES → Load references/simplicity-vs-easy.md — the familiar choice may be the complex one.
+```
+
+### Load Reference Mindsets
+
+Before proceeding on a non-trivial reduction task, load at least one mindset from `references/`:
+
+| File | Use when |
+|------|----------|
+| `data-over-abstractions.md` | Classes/types wrapping data that could be plain maps |
+| `design-is-taking-apart.md` | God objects, mixed concerns, coupling to untangle |
+| `expensive-to-add-later.md` | Tempted to skip something that will be painful to retrofit |
+| `simplicity-vs-easy.md` | Team gravitates to familiar pattern over simpler one |
+| `worse-is-better.md` | Completeness is blocking shipping; perfectionism over pragmatism |
+
+**Load the file, state its core principle, then proceed.**
+
+## Philosophy
+
+Every codebase trends toward entropy. Reducing it is not cleanup — it is the primary act of software design. The discipline is asymmetric: adding code is easy and feels productive; deleting code is uncomfortable and requires conviction. This skill supplies the conviction.
 
 ## The Goal
 
-The goal is **less total code in the final codebase** - not less code to write right now.
+The goal is **less total code in the final codebase** — not less code to write right now.
 
-- Writing 50 lines that delete 200 lines = net win
-- Keeping 14 functions to avoid writing 2 = net loss
+- Writing 50 lines that delete 200 lines = net win (net 150 lines removed)
+- Keeping 14 functions to avoid writing 2 = net loss (12 functions of unnecessary overhead)
 - "No churn" is not a goal. Less code is the goal.
 
 **Measure the end state, not the effort.**
@@ -34,7 +109,7 @@ The goal is **less total code in the final codebase** - not less code to write r
 
 ### 1. What's the smallest codebase that solves this?
 
-Not "what's the smallest change" - what's the smallest *result*.
+Not "what's the smallest change" — what's the smallest *result*.
 
 - Could this be 2 functions instead of 14?
 - Could this be 0 functions (delete the feature)?
@@ -56,25 +131,31 @@ Every change is an opportunity to delete. Ask:
 - What was only needed because of what we're replacing?
 - What's the maximum we could remove?
 
-## Red Flags
+## NEVER
 
-- **"Keep what exists"** - Status quo bias. The question is total code, not churn.
-- **"This adds flexibility"** - Flexibility for what? YAGNI.
-- **"Better separation of concerns"** - More files/functions = more code. Separation isn't free.
-- **"Type safety"** - Worth how many lines? Sometimes runtime checks in less code wins.
-- **"Easier to understand"** - 14 things are not easier than 2 things.
+- **NEVER add an abstraction that doesn't reduce total lines.** An abstraction with negative ROI (adds more than it removes) is pure entropy. The reason this is non-obvious: abstractions feel productive while you're writing them. Count afterward.
 
-## When This Doesn't Apply
+- **NEVER treat "better organized" as equivalent to "less entropy."** More files, more layers, more interfaces — even well-organized ones — all increase entropy. Organization is not simplicity; it's rearrangement. Unless the reorganization also deletes code, it doesn't count.
 
-- The codebase is already minimal for what it does
-- You're in a framework with strong conventions (don't fight it)
-- Regulatory/compliance requirements mandate certain structures
+- **NEVER abstract from two examples.** Two similar things may diverge. Abstract at three (Rule of Three). Premature abstraction locks in a contract before you understand the full shape of the problem, requiring more code to handle the exceptions later.
+
+- **NEVER preserve code "just in case."** Deleted code is not lost — it's in git. "We might need it later" is a category error: you're paying a permanent maintenance tax against a speculative future benefit. If you need it later, resurrect it.
+
+- **NEVER count effort as a win.** "We wrote only 10 lines to fix this" is irrelevant if the result has 50 more lines than before. The metric is the delta in total codebase size, not the size of the diff you submitted.
+
+- **NEVER mistake interface complexity for implementation complexity.** A complex implementation behind a simple interface is acceptable (worse-is-better). A simple implementation behind a complex interface is never acceptable — you've exported the complexity to every caller.
 
-## Reference Mindsets
+- **NEVER apply this skill to a codebase that is already at its minimum.** Forced deletions manufacture debt. If nothing obviously can be removed, stop. The skill is finding real entropy, not inventing entropy to remove.
 
-See `references/` for philosophical grounding.
+## When Things Go Wrong
 
-To add new mindsets, see `adding-reference-mindsets.md`.
+| Symptom | Diagnosis | Fix |
+|---------|-----------|-----|
+| Deleted code that turned out to be needed | Insufficient dependency mapping before deletion | Use `git log -S <term>` and grep callers before deleting; check test coverage |
+| Abstraction added more lines than it saved | Counted the happy path only; missed edge-case handling in generic code | Revert. Inline the specialization. Abstract only once a net line reduction is confirmed. |
+| Team pushes back on "unnecessary churn" | Status quo bias framing deletion as risk | Reframe: "we are paying X lines/month maintenance tax for Y lines of rarely-used code" |
+| Refactor broke the interface | Reduction goal conflated with internal restructuring | Keep external interfaces stable; reduce internally first, then negotiate interface changes separately |
+| Tests now cover less after reduction | Tests were testing implementation, not behavior | Write behavior tests first; then reduce implementation behind them |
 
 ---
 
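Note on the break-even heuristic in the reducing-entropy SKILL.md hunks above ("Count before abstracting"): the count can be written down as a small calculation. This is a minimal sketch, not part of the skill. `AbstractionPlan` and `netLineDelta` are hypothetical names, and the first example assumes one reading of the base-class case, namely that each subclass would be 20 lines if written standalone.

```typescript
// Hypothetical helper: net change in total lines from introducing a shared abstraction.
interface AbstractionPlan {
  sharedLines: number; // lines in the new shared function or base class
  callSites: number; // number of places that would use it
  linesBeforePerSite: number; // lines each site has today
  linesAfterPerSite: number; // lines each site keeps after consolidation
}

function netLineDelta(plan: AbstractionPlan): number {
  const before = plan.callSites * plan.linesBeforePerSite;
  const after = plan.sharedLines + plan.callSites * plan.linesAfterPerSite;
  return after - before; // negative means the codebase shrinks
}

// Assumed reading of the base-class example: 30 shared lines, two subclasses
// that shrink from 20 lines to 10 lines each.
const baseClassCase = netLineDelta({
  sharedLines: 30,
  callSites: 2,
  linesBeforePerSite: 20,
  linesAfterPerSite: 10,
});

// Illustrative numbers: three 40-line blocks collapse to 5-line calls
// into a 30-line shared function.
const threeSiteCase = netLineDelta({
  sharedLines: 30,
  callSites: 3,
  linesBeforePerSite: 40,
  linesAfterPerSite: 5,
});

console.log(baseClassCase); // prints 10
console.log(threeSiteCase); // prints -75
```

A positive delta means the abstraction adds entropy by the skill's own metric. A negative delta is the only case where consolidation counts as a reduction.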
diff --git a/skills/reducing-entropy/adding-reference-mindsets.md b/skills/reducing-entropy/adding-reference-mindsets.md
deleted file mode 100644
index c58d573..0000000
--- a/skills/reducing-entropy/adding-reference-mindsets.md
+++ /dev/null
@@ -1,96 +0,0 @@
-# Adding Reference Mindsets
-
-Reference mindsets are philosophical foundations for simplicity. They explain *why* less is more, giving agents deeper calibration beyond the mechanical checkpoints.
-
-## Where They Live
-
-Mindsets live in @reference/. Each is a standalone file named by concept.
-
-## File Structure
-
-```yaml
----
-description: One-sentence summary of the core insight and why it matters.
----
-
-# Concept Name
-
-## The Core Insight
-
-The central idea in 1-2 sentences. Quotable. Memorable.
-
-## Why This Matters
-
-How this connects to avoiding complexity. Why an LLM should care.
-What goes wrong when you ignore this principle.
-
-## Practical Application
-
-Concrete questions to ask or checks to apply.
-How to use this mindset when evaluating design options.
-
-## External References
-
-Links to primary sources - talks, papers, books that originated or best explain this concept.
-```
-
-## Quality Checklist
-
-Before adding a mindset:
-
-- [ ] **Counters over-engineering?** Does it help resist the urge to add?
-- [ ] **Distinct from existing?** Not redundant with current mindsets
-- [ ] **Concise?** Can be explained in under 50 lines
-- [ ] **Memorable core insight?** Has a quotable central principle
-- [ ] **Named by concept?** Not by person or source
-
-## Good Candidates
-
-Ideas that would make strong mindsets:
-
-| Concept | Core Insight |
-|---------|--------------|
-| `worse-is-better` | Shipping a simple thing beats perfecting a complex one |
-| `essential-vs-accidental` | Most complexity is accidental and can be eliminated |
-| `locality-of-behavior` | Code should be understandable without jumping around |
-| `boring-technology` | Innovation tokens are limited; use boring tech by default |
-| `separation-of-concerns` | Each piece should have one reason to change |
-| `rule-of-three` | Don't abstract until you've seen the pattern three times |
-
-## What NOT to Add
-
-**Technology-specific advice** → Belongs in project docs or tech-specific skills
-- "React components should..." 
-- "In Rust, prefer..."
-
-**Process/workflow rules** → Belongs in skills, not mindsets
-- "Always run tests before..."
-- "Use TDD when..."
-
-**Vague platitudes** → If there's no actionable insight, skip it
-- "Write clean code"
-- "Think before you code"
-
-**Anything requiring context** → Mindsets should be universal
-- "In microservices architectures..."
-- "When working with legacy code..."
-
-## The Test
-
-A good mindset should help an agent answer: "Should I add this abstraction?"
-
-If the mindset doesn't directly inform that question, it probably belongs somewhere else.
-
-## Source Material
-
-Primary sources for simplicity thinking. Draw from these when creating new mindsets:
-
-- [Simple Made Easy](https://www.infoq.com/presentations/Simple-Made-Easy/) - Rich Hickey
-- [Hammock Driven Development](https://www.youtube.com/watch?v=f84n5oFoZBc) - Rich Hickey
-- [The Value of Values](https://www.infoq.com/presentations/Value-Values/) - Rich Hickey
-- [Out of the Tar Pit](https://curtclifton.net/papers/MosesleyMarks06a.pdf) - Moseley & Marks
-- [No Silver Bullet](https://www.cs.unc.edu/techreports/86-020.pdf) - Fred Brooks
-- [The Grug Brained Developer](https://grugbrain.dev/) - grugbrain.dev
-- [Worse Is Better](https://www.dreamsongs.com/WorseIsBetter.html) - Richard Gabriel
-- [A Philosophy of Software Design](https://www.amazon.com/dp/173210221X) - John Ousterhout
-- [The Zen of Python](https://peps.python.org/pep-0020/) - Tim Peters
diff --git a/skills/reducing-entropy/references/worse-is-better.md b/skills/reducing-entropy/references/worse-is-better.md
new file mode 100644
index 0000000..ea8766b
--- /dev/null
+++ b/skills/reducing-entropy/references/worse-is-better.md
@@ -0,0 +1,49 @@
+---
+description: A simpler, less-correct design that ships beats a complete, elegant design that doesn't. Interface simplicity over implementation simplicity — systems that do slightly less but are easier to build will out-survive those that try to do everything right.
+---
+
+# Worse Is Better
+
+## The Core Insight
+
+> "The right thing is rarely the right thing to build."
+
+Richard Gabriel's 1991 essay describes two design philosophies: the MIT/Stanford "Right Thing" approach (correctness and completeness above all) vs the New Jersey "Worse is Better" approach (simplicity first, even at the cost of some correctness and completeness). Gabriel argues that Worse-is-Better systems have better survival characteristics in practice.
+
+## Why This Matters
+
+A system with 50% of the right features but half the code:
+- Ships sooner
+- Has fewer bugs
+- Is easier to port and maintain
+- Accumulates improvements over time
+
+The "Right Thing" system is often never finished. Even if finished, it's too complex to maintain. Perfection is the enemy of shipped.
+
+## Interface vs Implementation Simplicity
+
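+The rule this skill applies: **interface simplicity takes priority over implementation simplicity**. Note that Gabriel's essay ranks these the other way around for the New Jersey style (implementation simplicity first) and assigns the interface-first priority to the MIT approach.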
+
+- It's acceptable to have a complicated implementation to keep the interface simple
+- It is NOT acceptable to complicate the interface to simplify the implementation
+- Users interact with interfaces; developers maintain implementations
+
+## Practical Application
+
+When tempted to add completeness:
+- Would removing this feature make the interface simpler?
+- Is this feature needed for first ship, or is it "right thing" thinking?
+- Could we solve this with a simpler, slightly wrong approach that we can fix later?
+
+When evaluating design options:
+- Which option produces a simpler interface, even if the implementation is messier?
+- Which option can we ship and iterate on?
+- Which option survives contact with reality?
+
+**Bias toward the thing that works now. Completeness is a direction, not a gate.**
+
+## External References
+
+- [Worse Is Better](https://www.dreamsongs.com/WorseIsBetter.html) - Richard Gabriel's original essay
+- [Is Worse Really Better?](https://www.dreamsongs.com/Files/IsWorseReallyBetter.pdf) - Gabriel's follow-up
+- [Simple Made Easy](https://www.infoq.com/presentations/Simple-Made-Easy/) - Rich Hickey on related themes
diff --git a/skills/requirements-clarity/README.md b/skills/requirements-clarity/README.md
deleted file mode 100644
index 20b547c..0000000
--- a/skills/requirements-clarity/README.md
+++ /dev/null
@@ -1,260 +0,0 @@
-# Requirements Clarity
-
-A systematic skill for transforming vague requirements into actionable Product Requirements Documents (PRDs) through focused dialogue and iterative clarification.
-
-## Purpose
-
-This skill helps you avoid costly misunderstandings and rework by ensuring requirements are crystal clear before implementation begins. It uses a 100-point scoring system to systematically identify gaps and guide you through targeted questions until your requirements are development-ready.
-
-## When to Use This Skill
-
-### Use When
-- Requirements are vague or ambiguous (e.g., "add login feature", "implement payment")
-- Features are complex and estimated to take more than 2 days
-- Cross-team coordination is required
-- Missing technical context (no tech stack, integration points, or constraints mentioned)
-- Incomplete specifications (no acceptance criteria, success metrics, edge cases)
-- Unclear scope boundaries ("what exactly does 'user management' include?")
-
-### Don't Use When
-- Specific file paths are mentioned (e.g., "fix auth.go:45")
-- Code snippets are already included in the request
-- Working with existing functions/classes (use code review instead)
-- Bug fixes with clear reproduction steps
-
-## How It Works
-
-### The Clarification Process
-
-1. **Initial Analysis** (Step 1)
-   - Parses your requirement description
-   - Generates a clarity score (0-100) using a detailed rubric
-   - Identifies what's clear and what needs clarification
-   - Creates a feature name and prepares for PRD generation
-
-2. **Gap Analysis** (Step 2)
-   - Systematically identifies missing information across four dimensions:
-     - **Functional Scope**: Core functionality, boundaries, edge cases
-     - **User Interaction**: Inputs, outputs, success/failure scenarios
-     - **Technical Constraints**: Performance, compatibility, security, scalability
-     - **Business Value**: Problem statement, target users, success metrics
-
-3. **Interactive Clarification** (Step 3)
-   - Asks 2-3 focused questions per round (avoids overwhelming you)
-   - Builds context progressively
-   - Updates clarity score after each response
-   - Continues until score reaches ≥ 90/100
-
-4. **PRD Generation** (Step 4)
-   - Once clarity score ≥ 90, generates comprehensive PRD
-   - Saves to `./docs/prds/{feature-name}-v{version}-prd.md`
-   - Includes all clarified information in structured format
-
-### Clarity Scoring Rubric
-
-The 100-point scoring system evaluates:
-
-- **Functional Clarity** (30 points): Clear inputs/outputs, user interaction, success criteria
-- **Technical Specificity** (25 points): Technology stack, integration points, constraints
-- **Implementation Completeness** (25 points): Edge cases, error handling, data validation
-- **Business Context** (20 points): Problem statement, target users, success metrics
-
-## Key Features
-
-### Systematic Questioning
-- Focused, specific questions (not overwhelming)
-- One category at a time
-- Builds on previous answers
-- Uses your language and provides examples
-
-### Quality-Driven Iteration
-- Continuous clarity assessment
-- Transparent score updates
-- Iterates until ≥ 90 points achieved
-- Documents all clarification rounds
-
-### Actionable Output
-- Concrete specifications ready for development
-- Measurable acceptance criteria (checklistable format)
-- Executable implementation phases with specific tasks
-- Complete PRD with all necessary sections
-
-### Core Principles
-- **Why?** (YAGNI check) - Ensures features are truly needed
-- **Simpler?** (KISS check) - Explores simpler alternatives before complex solutions
-
-## Usage Examples
-
-### Example 1: Vague Feature Request
-
-**User Input:**
-```
-I need a login feature for my app.
-```
-
-**Skill Response:**
-```
-I understand your requirement. Let me help you refine this specification.
-
-Current Clarity Score: 25/100
-
-Clear Aspects:
-- Need authentication functionality
-
-Needs Clarification:
-- Authentication method (email/password, OAuth, SSO?)
-- User types and permissions
-- Security requirements
-- Integration with existing systems
-- Success metrics
-
-Let me systematically clarify these points...
-
-1. Authentication Method: What authentication method do you prefer?
-   - For example: Email/password, social login (Google/GitHub),
-     magic links, or SSO?
-
-2. User Management: Do you need different user roles or permissions?
-   - For example: Admin vs. regular users, or all users have same access?
-
-3. Technology Stack: What's your current tech stack?
-   - For example: React frontend, Node.js backend, PostgreSQL database?
-```
-
-### Example 2: After Clarification Round
-
-**After User Response:**
-```
-Thank you for the additional information!
-
-Clarity Score Update: 25/100 → 65/100
-
-New Clarified Content:
-- Authentication: Email/password with optional Google OAuth
-- User roles: Admin and regular users with different permissions
-- Tech stack: Next.js 14, PostgreSQL, using NextAuth.js
-
-Remaining Points to Clarify:
-- Password requirements and security constraints
-- Session management (duration, refresh tokens?)
-- Edge cases (account lockout, password reset)
-- Success metrics (MAU, conversion rate?)
-
-Let me continue with the next round of questions...
-
-1. Security Requirements: What are your password requirements?
-   - For example: Minimum length, special characters, 2FA?
-
-2. Session Management: How long should user sessions last?
-   - For example: 24 hours, 7 days, remember me option?
-
-3. Edge Cases: How should we handle failed login attempts?
-   - For example: Rate limiting, account lockout after X attempts?
-```
-
-### Example 3: Final PRD Generation
-
-**When Clarity Score ≥ 90:**
-```
-Perfect! I will now generate the complete PRD document...
-
-[Generates PRD at ./docs/prds/user-authentication-v1.0-prd.md]
-
-Your Product Requirements Document is ready!
-
-File: ./docs/prds/user-authentication-v1.0-prd.md
-Quality Score: 92/100
-Clarification Rounds: 3
-
-The PRD includes:
-✓ Complete requirements description with background and value proposition
-✓ Technical approach with architecture decisions
-✓ Measurable acceptance criteria (functional, quality, user acceptance)
-✓ 4 execution phases with specific tasks and deliverables
-✓ Risk assessment and mitigation strategies
-
-You can now hand this off to your development team for implementation.
-```
-
-## Generated PRD Structure
-
-The final PRD includes:
-
-1. **Requirements Description**
-   - Background (business problem, target users, value proposition)
-   - Feature overview (core features, boundaries, user scenarios)
-   - Detailed requirements (I/O, user interaction, data, edge cases)
-
-2. **Design Decisions**
-   - Technical approach (architecture, components, data storage, interfaces)
-   - Constraints (performance, compatibility, security, scalability)
-   - Risk assessment (technical, dependency, schedule risks)
-
-3. **Acceptance Criteria**
-   - Functional acceptance (feature-specific conditions)
-   - Quality standards (code quality, test coverage, performance, security)
-   - User acceptance (UX, documentation, training)
-
-4. **Execution Phases**
-   - Phase 1: Preparation (environment setup, technical validation)
-   - Phase 2: Core Development (implement core functionality)
-   - Phase 3: Integration & Testing (quality assurance)
-   - Phase 4: Deployment (release and monitoring)
-
-Each phase includes specific tasks, deliverables, and time estimates.
-
-## Output Location
-
-PRDs are saved to:
-```
-./docs/prds/{feature-name}-v{version}-prd.md
-```
-
-Where:
-- `{feature-name}`: Auto-generated in kebab-case format
-- `{version}`: Document version (default 1.0, or user-specified)
-
-## Best Practices
-
-### DO
-- Answer questions thoroughly but concisely
-- Provide examples when you have specific preferences
-- Ask for clarification if questions aren't clear
-- Review the final PRD and provide feedback
-- Share constraints and non-negotiables upfront
-
-### DON'T
-- Rush through clarification rounds
-- Assume the skill knows your technical context
-- Skip questions that seem "obvious"
-- Approve PRD before reviewing thoroughly
-- Provide contradictory information across rounds
-
-## Success Criteria
-
-A successful requirements clarification session results in:
-- Clarity score ≥ 90/100
-- All PRD sections complete with substantial content
-- Acceptance criteria in checklistable format
-- Execution phases with actionable, concrete tasks
-- Development team can start implementation immediately
-- No major questions or ambiguities remain
-
-## Tips for Better Results
-
-1. **Be Specific**: Instead of "fast", say "< 200ms response time"
-2. **Think End-to-End**: Consider the full user journey
-3. **Share Constraints Early**: Technical limitations, budget, timeline
-4. **Provide Context**: Explain the "why" behind your requirement
-5. **Reference Examples**: "Like Stripe's payment flow" is clearer than abstract descriptions
-
-## Related Skills
-
-- **feature-planning**: For breaking down features after PRD is complete
-- **implementation-blueprint**: For detailed implementation planning
-- **dev-spec**: For complete design and planning sessions
-- **critical-brainstorm**: For exploring and stress-testing ideas before clarification
-
----
-
-**Need help clarifying requirements?** Invoke this skill and provide your initial requirement description. The skill will guide you through systematic clarification until you have a development-ready PRD.
diff --git a/skills/requirements-clarity/SKILL.md b/skills/requirements-clarity/SKILL.md
index 87e5955..3989641 100644
--- a/skills/requirements-clarity/SKILL.md
+++ b/skills/requirements-clarity/SKILL.md
@@ -1,324 +1,107 @@
 ---
 name: requirements-clarity
-description: Clarify ambiguous requirements through focused dialogue before implementation. Use when requirements are unclear, features are complex (>2 days), or involve cross-team coordination. Ask two core questions - Why? (YAGNI check) and Simpler? (KISS check) - to ensure clarity before coding.
+description: Transform vague requirements into development-ready PRDs through focused dialogue. Trigger phrases: "clarify requirements", "need a PRD", "spec out this feature", "what do I need before coding", "requirement is unclear". Use when features are ambiguous, complex (>2 days), or cross-team. Do NOT use for bug fixes with clear repro steps.
 ---
 
-# Requirements Clarity Skill
+## Mindset
 
-## Description
+- **Ambiguity is expensive late, cheap early.** A 20-minute clarification session prevents a 3-day rewrite. The user asking for "a login feature" has a mental model — your job is to surface it, not fill gaps with assumptions.
+- **YAGNI before elaborating.** Before exploring implementation depth, ask "Is this confirmed needed now, or anticipated future need?" Features built for imagined futures become maintenance debt.
+- **KISS before complexity.** After the user describes what they want, ask "Is there a simpler version that gets 80% of the value?" Complex solutions chosen by default — not by necessity — are a practitioner anti-pattern.
+- **Questions reveal assumptions, not ignorance.** Each question you ask exposes a hidden assumption the user was carrying. Surface it, confirm it, and document it — that's the whole job.
+- **The PRD is a contract, not a template.** A PRD filled with placeholders is worse than no PRD — it creates false confidence. Every section must contain substance before you write it.
 
-Automatically transforms vague requirements into actionable PRDs through systematic clarification with a 100-point scoring system.
+## Navigation
 
+**Use this skill when**:
+- Requirements are vague ("add authentication", "build a dashboard", "implement payments")
+- No acceptance criteria, success metrics, or edge cases mentioned
+- Feature is estimated > 2 days or touches multiple teams
+- User explicitly asks to "clarify", "spec out", or "write a PRD"
 
-## Instructions
+**Do NOT use this skill when**:
+- Specific file paths, line numbers, or code snippets are provided
+- Bug fix with clear reproduction steps
+- Refactoring within an existing, clearly scoped function or class
+- User asks for implementation, not specification
 
-When invoked, detect vague requirements:
-
-1. **Vague Feature Requests**
-   - User says: "add login feature", "implement payment", "create dashboard"
-   - Missing: How, with what technology, what constraints?
-
-2. **Missing Technical Context**
-   - No technology stack mentioned
-   - No integration points identified
-   - No performance/security constraints
+**Decision tree for ambiguous invocations**:
+```
+Has the user provided concrete acceptance criteria?
+  YES → Skip to gap analysis; you may reach 90+ in one pass
+  NO  → Full clarification flow (Steps 1-4)
 
-3. **Incomplete Specifications**
-   - No acceptance criteria
-   - No success metrics
-   - No edge cases considered
-   - No error handling mentioned
+Is the request a bug report?
+  YES → Exit this skill; suggest code-review or direct debugging
+  NO  → Continue
+```
 
-4. **Ambiguous Scope**
-   - Unclear boundaries ("user management" - what exactly?)
-   - No distinction between MVP and future enhancements
-   - Missing "what's NOT included"
+## Philosophy
 
-**Do NOT activate when**:
-- Specific file paths mentioned (e.g., "auth.go:45")
-- Code snippets included
-- Existing functions/classes referenced
-- Bug fixes with clear reproduction steps
+Clarity is not achieved by asking more questions — it is achieved by asking the *right* questions in the right order. Ask for the business problem before the technical solution, the user's definition of success before the implementation details, and the explicit exclusions before the edge cases.
 
-## Core Principles
+## NEVER
 
-1. **Systematic Questioning**
-   - Ask focused, specific questions
-   - One category at a time (2-3 questions per round)
-   - Build on previous answers
-   - Avoid overwhelming users
+- **NEVER generate the PRD before clarity score ≥ 90** — because a low-score PRD looks complete but omits exactly the constraints that cause implementation failures (security requirements, performance bounds, error states). False completeness is more dangerous than an acknowledged gap.
+- **NEVER ask all questions at once** — because users answer fewer questions when overwhelmed, and the answers become shallower. Two focused questions get more signal than twelve scattered ones.
+- **NEVER assume a technical choice the user hasn't confirmed** — because framework choices, database decisions, and API contracts constrain future work. An unconfirmed assumption embedded in a PRD becomes a hidden dependency that breaks integration.
+- **NEVER skip the YAGNI gate** — because "we might need this later" is the most common driver of scope bloat in PRDs, and scope bloat in a PRD becomes scope creep in implementation. Surface the tension before it enters the document.
+- **NEVER write vague acceptance criteria** — "the feature should work correctly" is not a criterion. Every acceptance criterion must be testable: a specific behavior, a measurable threshold, or a verifiable state. Untestable criteria guarantee disputed completions.
+- **NEVER close a clarification round without updating the score** — because the score is the only shared signal that tells the user how close they are. Skipping score updates makes the process feel arbitrary and erodes trust in the output.
+- **NEVER ask about solutions before understanding the problem** — asking "what tech stack?" before "what problem does this solve?" anchors the conversation in implementation details and obscures whether the feature is necessary at all.
 
-2. **Quality-Driven Iteration**
-   - Continuously assess clarity score (0-100)
-   - Identify gaps systematically
-   - Iterate until ≥ 90 points
-   - Document all clarification rounds
+## When Things Go Wrong
 
-3. **Actionable Output**
-   - Generate concrete specifications
-   - Include measurable acceptance criteria
-   - Provide executable phases
-   - Enable direct implementation
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| User gives one-word answers to every question | Questions are too abstract or feel like a test | Switch to example-anchored questions: "Would this be more like X or Y?" |
+| Score plateaus below 90 after 4+ rounds | Hitting a genuine unknown (the user doesn't know yet) | Document the gap explicitly in the PRD as a decision deferred to implementation; don't block on it |
+| User wants to skip clarification and just start | They've clarified mentally but not in writing | Run a rapid single-pass: surface only the non-negotiable gaps (security, data model, external integrations) and proceed |
+| PRD scope grows with every round | No explicit exclusion list established early | Stop and co-author an "Out of Scope" section before asking further questions |
+| User approves PRD with obvious gaps | Rubber-stamp behavior / cognitive overload | Call out the three riskiest open items explicitly; get explicit confirmation on each before closing |
 
 ## Clarification Process
 
-### Step 1: Initial Requirement Analysis
-
-**Input**: User's requirement description
-
-**Tasks**:
-1. Parse and understand core requirement
-2. Generate feature name (kebab-case format)
-3. Determine document version (default `1.0` unless user specifies otherwise)
-4. Ensure `./docs/prds/` exists for PRD output
-5. Perform initial clarity assessment (0-100)
-
-**Assessment Rubric**:
-```
-Functional Clarity: /30 points
-- Clear inputs/outputs: 10 pts
-- User interaction defined: 10 pts
-- Success criteria stated: 10 pts
+### Step 1: Initial Analysis
 
-Technical Specificity: /25 points
-- Technology stack mentioned: 8 pts
-- Integration points identified: 8 pts
-- Constraints specified: 9 pts
+Parse the requirement, assign a clarity score (0-100), identify what's clear and what's missing.
 
-Implementation Completeness: /25 points
-- Edge cases considered: 8 pts
-- Error handling mentioned: 9 pts
-- Data validation specified: 8 pts
+Score dimensions and weights → see `references/clarity-scoring.md`
 
-Business Context: /20 points
-- Problem statement clear: 7 pts
-- Target users identified: 7 pts
-- Success metrics defined: 6 pts
+Initial response format:
 ```
+Current Clarity Score: X/100
 
-**Initial Response Format**:
-```markdown
-I understand your requirement. Let me help you refine this specification.
-
-**Current Clarity Score**: X/100
-
-**Clear Aspects**:
-- [List what's clear]
-
-**Needs Clarification**:
-- [List gaps]
+Clear: [what you understand]
+Gaps: [what's missing, grouped by dimension]
 
-Let me systematically clarify these points...
+[First 2-3 questions, highest-impact first]
 ```
 
-### Step 2: Gap Analysis
-
-Identify missing information across four dimensions:
-
-**1. Functional Scope**
-- What is the core functionality?
-- What are the boundaries?
-- What is out of scope?
-- What are edge cases?
-
-**2. User Interaction**
-- How do users interact?
-- What are the inputs?
-- What are the outputs?
-- What are success/failure scenarios?
+### Step 2: YAGNI + KISS Gates
 
-**3. Technical Constraints**
-- Performance requirements?
-- Compatibility requirements?
-- Security considerations?
-- Scalability needs?
-
-**4. Business Value**
-- What problem does this solve?
-- Who are the target users?
-- What are success metrics?
-- What is the priority?
+Before proceeding to technical depth, explicitly run:
+1. **YAGNI**: "Is this feature confirmed needed now?" Surface if it's speculative.
+2. **KISS**: "Is there a simpler path that delivers 80% of the value?" Document both options if found.
 
 ### Step 3: Interactive Clarification
 
-**Question Strategy**:
-1. Start with highest-impact gaps
-2. Ask 2-3 questions per round
-3. Build context progressively
-4. Use user's language
-5. Provide examples when helpful
-
-**Question Format**:
-```markdown
-I need to clarify the following points to complete the requirements document:
-
-1. **[Category]**: [Specific question]?
-   - For example: [Example if helpful]
-
-2. **[Category]**: [Specific question]?
-
-3. **[Category]**: [Specific question]?
-
-Please provide your answers, and I'll continue refining the PRD.
-```
-
-**After Each User Response**:
-1. Update clarity score
-2. Capture new information in the working PRD outline
-3. Identify remaining gaps
-4. If score < 90: Continue with next round of questions
-5. If score ≥ 90: Proceed to PRD generation
-
-**Score Update Format**:
-```markdown
-Thank you for the additional information!
-
-**Clarity Score Update**: X/100 → Y/100
-
-**New Clarified Content**:
-- [Summarize new information]
-
-**Remaining Points to Clarify**:
-- [List remaining gaps if score < 90]
-
-[If score < 90: Continue with next round of questions]
-[If score ≥ 90: "Perfect! I will now generate the complete PRD document..."]
-```
+- 2-3 questions per round, highest-impact gaps first
+- Update score after each round
+- If score < 90: continue; if ≥ 90: generate PRD
+- Reference `references/clarity-scoring.md` for round-by-round question strategy
 
 ### Step 4: PRD Generation
 
-Once clarity score ≥ 90, generate comprehensive PRD.
-
-**Output File**:
-
-1. **Final PRD**: `./docs/prds/{feature_name}-v{version}-prd.md`
-
-Use the `Write` tool to create or update this file. Derive `{version}` from the document version recorded in the PRD (default `1.0`).
-
-## PRD Document Structure
-
-```markdown
-# {Feature Name} - Product Requirements Document (PRD)
-
-## Requirements Description
-
-### Background
-- **Business Problem**: [Describe the business problem to solve]
-- **Target Users**: [Target user groups]
-- **Value Proposition**: [Value this feature brings]
-
-### Feature Overview
-- **Core Features**: [List of main features]
-- **Feature Boundaries**: [What is and isn't included]
-- **User Scenarios**: [Typical usage scenarios]
-
-### Detailed Requirements
-- **Input/Output**: [Specific input/output specifications]
-- **User Interaction**: [User operation flow]
-- **Data Requirements**: [Data structures and validation rules]
-- **Edge Cases**: [Edge case handling]
-
-## Design Decisions
-
-### Technical Approach
-- **Architecture Choice**: [Technical architecture decisions and rationale]
-- **Key Components**: [List of main technical components]
-- **Data Storage**: [Data models and storage solutions]
-- **Interface Design**: [API/interface specifications]
-
-### Constraints
-- **Performance Requirements**: [Response time, throughput, etc.]
-- **Compatibility**: [System compatibility requirements]
-- **Security**: [Security considerations]
-- **Scalability**: [Future expansion considerations]
-
-### Risk Assessment
-- **Technical Risks**: [Potential technical risks and mitigation plans]
-- **Dependency Risks**: [External dependencies and alternatives]
-- **Schedule Risks**: [Timeline risks and response strategies]
-
-## Acceptance Criteria
-
-### Functional Acceptance
-- [ ] Feature 1: [Specific acceptance conditions]
-- [ ] Feature 2: [Specific acceptance conditions]
-- [ ] Feature 3: [Specific acceptance conditions]
-
-### Quality Standards
-- [ ] Code Quality: [Code standards and review requirements]
-- [ ] Test Coverage: [Testing requirements and coverage]
-- [ ] Performance Metrics: [Performance test pass criteria]
-- [ ] Security Review: [Security review requirements]
-
-### User Acceptance
-- [ ] User Experience: [UX acceptance criteria]
-- [ ] Documentation: [Documentation delivery requirements]
-- [ ] Training Materials: [If needed, training material requirements]
-
-## Execution Phases
-
-### Phase 1: Preparation
-**Goal**: Environment preparation and technical validation
-- [ ] Task 1: [Specific task description]
-- [ ] Task 2: [Specific task description]
-- **Deliverables**: [Phase deliverables]
-- **Time**: [Estimated time]
-
-### Phase 2: Core Development
-**Goal**: Implement core functionality
-- [ ] Task 1: [Specific task description]
-- [ ] Task 2: [Specific task description]
-- **Deliverables**: [Phase deliverables]
-- **Time**: [Estimated time]
-
-### Phase 3: Integration & Testing
-**Goal**: Integration and quality assurance
-- [ ] Task 1: [Specific task description]
-- [ ] Task 2: [Specific task description]
-- **Deliverables**: [Phase deliverables]
-- **Time**: [Estimated time]
-
-### Phase 4: Deployment
-**Goal**: Release and monitoring
-- [ ] Task 1: [Specific task description]
-- [ ] Task 2: [Specific task description]
-- **Deliverables**: [Phase deliverables]
-- **Time**: [Estimated time]
-
----
-
-**Document Version**: 1.0
-**Created**: {timestamp}
-**Clarification Rounds**: {clarification_rounds}
-**Quality Score**: {quality_score}/100
-```
+Once score ≥ 90, generate the PRD using the template in `references/prd-template.md`.
 
-## Behavioral Guidelines
+Output path: `./docs/prds/{feature-name}-v{version}-prd.md`
 
-### DO
-- Ask specific, targeted questions
-- Build on previous answers
-- Provide examples to guide users
-- Maintain conversational tone
-- Summarize clarification rounds within the PRD
-- Use clear, professional English
-- Generate concrete specifications
-- Stay in clarification mode until score ≥ 90
+Every section must contain substance. Placeholders are not acceptable in a finalized PRD.
 
-### DON'T
-- Ask all questions at once
-- Make assumptions without confirmation
-- Generate PRD before 90+ score
-- Skip any required sections
-- Use vague or abstract language
-- Proceed without user responses
-- Exit skill mode prematurely
+Once clarity ≥ 90 and the PRD is generated, the natural next step is gepetto to create a sectionized implementation plan from the PRD.
 
-## Success Criteria
+## References
 
-- Clarity score ≥ 90/100
-- All PRD sections complete with substance
-- Acceptance criteria checklistable (using `- [ ]` format)
-- Execution phases actionable with concrete tasks
-- User approves final PRD
-- Ready for development handoff
+- `references/clarity-scoring.md` — scoring rubric, YAGNI/KISS gates, round-by-round question strategy
+- `references/prd-template.md` — complete PRD structure with all required sections
diff --git a/skills/requirements-clarity/references/clarity-scoring.md b/skills/requirements-clarity/references/clarity-scoring.md
new file mode 100644
index 0000000..af16ee9
--- /dev/null
+++ b/skills/requirements-clarity/references/clarity-scoring.md
@@ -0,0 +1,41 @@
+# Clarity Scoring Rubric
+
+Score each dimension after every user response. Stop asking when total ≥ 90.
+
+## Dimensions
+
+| Dimension | Max | What Counts |
+|-----------|-----|-------------|
+| Functional Clarity | 30 | Clear inputs/outputs (+10), user interaction defined (+10), success criteria stated (+10) |
+| Technical Specificity | 25 | Tech stack named (+8), integration points identified (+8), constraints specified (+9) |
+| Implementation Completeness | 25 | Edge cases enumerated (+8), error handling described (+9), data validation specified (+8) |
+| Business Context | 20 | Problem statement clear (+7), target users identified (+7), success metrics defined (+6) |
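+
+A minimal sketch of the arithmetic behind the table, assuming each criterion is tracked as simply satisfied or not. The `Criterion` shape and the `clarityScore` and `readyForPrd` names are hypothetical and not part of the skill; only the point values and the 90-point threshold come from this rubric.
+
+```typescript
+// Hypothetical shape for one scoring criterion (illustrative only).
+interface Criterion {
+  dimension: string;
+  name: string;
+  points: number;
+}
+
+// Point values copied from the Dimensions table.
+const RUBRIC: Criterion[] = [
+  { dimension: "Functional Clarity", name: "inputs/outputs", points: 10 },
+  { dimension: "Functional Clarity", name: "user interaction", points: 10 },
+  { dimension: "Functional Clarity", name: "success criteria", points: 10 },
+  { dimension: "Technical Specificity", name: "tech stack", points: 8 },
+  { dimension: "Technical Specificity", name: "integration points", points: 8 },
+  { dimension: "Technical Specificity", name: "constraints", points: 9 },
+  { dimension: "Implementation Completeness", name: "edge cases", points: 8 },
+  { dimension: "Implementation Completeness", name: "error handling", points: 9 },
+  { dimension: "Implementation Completeness", name: "data validation", points: 8 },
+  { dimension: "Business Context", name: "problem statement", points: 7 },
+  { dimension: "Business Context", name: "target users", points: 7 },
+  { dimension: "Business Context", name: "success metrics", points: 6 },
+];
+
+const PRD_THRESHOLD = 90;
+
+// Sum the points of every criterion whose name appears in `satisfied`.
+function clarityScore(satisfied: string[]): number {
+  return RUBRIC
+    .filter((c) => satisfied.indexOf(c.name) !== -1)
+    .reduce((sum, c) => sum + c.points, 0);
+}
+
+// True once the total reaches the threshold for PRD generation.
+function readyForPrd(satisfied: string[]): boolean {
+  return clarityScore(satisfied) >= PRD_THRESHOLD;
+}
+```
+
+Because the four maxima sum to 100, at most 10 points can stay unresolved before PRD generation: roughly one open criterion, since two open criteria cost at least 12 points.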
+
+## Round-by-Round Question Strategy
+
+**Round 1 — highest-impact gaps first** (usually Functional Clarity + Business Context):
+- What problem does this solve for which users?
+- What are the inputs and expected outputs?
+- What does success look like — how will you measure it?
+
+**Round 2 — technical grounding**:
+- What's the existing tech stack and where does this fit?
+- What are the hard constraints (latency, cost, compliance)?
+- What integrations are required vs. optional?
+
+**Round 3 — edge cases and failure modes**:
+- What happens when [most likely failure]?
+- What's explicitly out of scope for v1?
+- What's the rollback plan if this breaks?
+
+## YAGNI Gate (ask before scoring Technical Specificity)
+
+Before scoring, ask: "Is this feature confirmed needed now, or is it anticipated future need?"
+- If anticipated future need → surface the YAGNI tension explicitly; offer to scope down.
+- If confirmed need → proceed.
+
+## KISS Gate (ask before scoring Implementation Completeness)
+
+Ask: "Have we considered a simpler solution that gets 80% of the value?"
+- If a simpler path exists → document both options in the PRD with tradeoffs.
+- If complex is justified → document the justification.
diff --git a/skills/requirements-clarity/references/prd-template.md b/skills/requirements-clarity/references/prd-template.md
new file mode 100644
index 0000000..57c33ce
--- /dev/null
+++ b/skills/requirements-clarity/references/prd-template.md
@@ -0,0 +1,81 @@
+# PRD Template
+
+Once clarity score ≥ 90, generate the file at `./docs/prds/{feature-name}-v{version}-prd.md`.
+
+```markdown
+# {Feature Name} — Product Requirements Document
+
+## Background
+- **Business Problem**: [what pain/need this addresses]
+- **Target Users**: [who, not "users"]
+- **Value Proposition**: [why now, why this approach]
+
+## Feature Scope
+- **Included**: [explicit list]
+- **Excluded**: [what's out — prevents scope creep]
+- **User Scenarios**: [2-3 concrete flows]
+
+## Detailed Requirements
+- **Inputs / Outputs**: [specific, typed]
+- **User Interaction**: [step-by-step flow]
+- **Data Requirements**: [schema, validation rules]
+- **Edge Cases**: [enumerated, not "handle errors"]
+
+## Technical Approach
+- **Architecture**: [decision + rationale]
+- **Key Components**: [list]
+- **Data Storage**: [model + storage choice]
+- **Interfaces**: [API contract or UI spec]
+
+## Constraints
+- **Performance**: [e.g., p99 < 200ms, 1k rps]
+- **Compatibility**: [browsers, OS, API versions]
+- **Security**: [auth, data sensitivity, compliance]
+- **Scalability**: [growth assumptions]
+
+## Risk Assessment
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| [technical risk] | H/M/L | H/M/L | [action] |
+| [dependency risk] | H/M/L | H/M/L | [action] |
+
+## Acceptance Criteria
+
+### Functional
+- [ ] [specific, testable condition]
+- [ ] [specific, testable condition]
+
+### Quality
+- [ ] Test coverage ≥ X%
+- [ ] Performance: [specific metric passes]
+- [ ] Security: [specific check passes]
+
+### User Acceptance
+- [ ] [UX condition]
+- [ ] Documentation delivered
+
+## Execution Phases
+
+### Phase 1: Preparation
+- [ ] [task]
+- **Deliverable**: [artifact]
+- **Estimate**: [time]
+
+### Phase 2: Core Development
+- [ ] [task]
+- **Deliverable**: [artifact]
+- **Estimate**: [time]
+
+### Phase 3: Integration & Testing
+- [ ] [task]
+- **Deliverable**: [artifact]
+- **Estimate**: [time]
+
+### Phase 4: Deployment
+- [ ] [task]
+- **Deliverable**: [artifact]
+- **Estimate**: [time]
+
+---
+**Version**: {version}  **Created**: {date}  **Rounds**: {n}  **Score**: {score}/100
+```
diff --git a/skills/security-review/SKILL.md b/skills/security-review/SKILL.md
new file mode 100644
index 0000000..d039c0d
--- /dev/null
+++ b/skills/security-review/SKILL.md
@@ -0,0 +1,176 @@
+---
+name: security-review
+version: 1.0.0
+description: "Security-aware review layer for engineering artifacts — React/frontend XSS vectors, database schema access control and encryption decisions, OpenAPI securitySchemes and scope design, TypeScript runtime type guard failures, and dependency supply-chain hygiene. Use when asked to security-review code, audit for vulnerabilities, check for XSS, review auth patterns, check JWT validation, audit npm dependencies, or add security hardening. Pairs with react-dev, database-schema-designer, and openapi-to-typescript."
+---
+
+# Security Review
+
+Security findings have no middle ground: either the attack path exists or it does not. The job is to determine which.
+
+## Mindset
+
+1. **Exploitability, not CVE score.** A critical-severity CVE in a library you call with no user-controlled input is a zero-risk finding in your codebase. A medium-severity finding in an input path exposed to the public internet is your highest-priority work. Always trace the data path from user input to the vulnerable sink before assigning severity.
+2. **React escapes JSX — but not href, src, or dangerouslySetInnerHTML.** React's JSX interpolation escapes HTML entities in text content. It does NOT sanitize `href` (javascript: injection), `src` (data: injection), or anything passed to `dangerouslySetInnerHTML`. These three vectors are the most commonly missed XSS surface in React codebases because developers trust "React handles escaping."
+3. **ORM parameterization has intentional escape hatches, and they are the injection vectors.** Prisma `$queryRawUnsafe`, Sequelize `literal()`, and TypeORM `createQueryBuilder().where(rawString)` all bypass parameterization by design. Every ORM raw escape hatch in a codebase is a mandatory review target.
+4. **JWT signature verification is not JWT validation.** A verified signature proves the token was signed with the correct key. It does not validate that the token has not expired (`exp`), was issued at a sane time (`iat`), is intended for this audience (`aud`), or came from the expected issuer (`iss`). Most JWT libraries default to silent-pass on missing claims — absence of `exp` does not trigger an error, it produces an immortal token.
+5. **TypeScript types are compile-time assertions, not runtime enforcement.** A `User` type with no `isAdmin` property does not prevent `req.body.isAdmin = true` from reaching a database write. Mass assignment attacks bypass the type system entirely because types are erased at runtime. Explicit allowlist validation at the controller layer is required regardless of TypeScript strictness.
+6. **Non-exploitable findings still require grading.** A non-exploitable XSS today can become exploitable with one future code change, one new dependency, or one route added to the application. Mark the finding with its current exploitability status and the condition that would activate it — do not discard it.
+
+## Navigation
+
+**Use this skill when:**
+- Auditing React components or frontend code for XSS, CSRF, or injection vectors
+- Reviewing database schemas for access control, encryption at rest, and audit logging gaps
+- Reviewing OpenAPI specs or API contracts for auth scope design and sensitive data exposure
+- Auditing TypeScript code for runtime type assertion failures masking security assumptions
+- Reviewing npm dependencies for supply chain risk, provenance, and audit findings
+- Adding security hardening to any engineering artifact produced by another skill
+
+**Do NOT use this skill when:**
+- Designing new features without an existing artifact to review (use the appropriate engineering skill first)
+- Network-layer or infrastructure security (firewall rules, VPC design, TLS configuration) — these require infrastructure-specific context outside this skill's scope
+
+**Cross-skill routing:**
+- Working on a database schema with RLS, encryption, or audit log requirements? Use **database-schema-designer** for schema modeling first, then return here for security validation of the resulting schema.
+- Reviewing or writing React components? Use **react-dev** for type-safe component patterns, then apply this skill to audit the resulting components for XSS vectors and CSP alignment.
+- Designing or consuming an OpenAPI spec? Use **openapi-to-typescript** for type generation and client scaffolding, then apply this skill to audit securitySchemes, scope granularity, and error response exposure.
+
+## Coverage Areas
+
+### React / Frontend
+
+**XSS Vectors to inspect in every React review:**
+
+| Vector | Pattern | Risk |
+|--------|---------|------|
+| `dangerouslySetInnerHTML` | `<div dangerouslySetInnerHTML={{ __html: userInput }} />` | Direct DOM injection; sanitize with DOMPurify before assignment |
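+| `href` injection | `<a href={userInput}>` | `javascript:alert(1)` executes on click; parse the value and allow only the `https:` scheme or a same-site path (a bare `startsWith('/')` check also admits protocol-relative `//host` URLs) |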
+| `src` injection | `<img src={userInput} />` | `data:` URIs can exfiltrate via CSP-bypass in some browsers; validate against an allowlist |
+| `innerHTML` via `ref` | `ref.current.innerHTML = userInput` | Bypasses React's escaping entirely; use `textContent` or sanitize |
+| `eval` / `new Function` | Template rendering, i18n interpolation | Remote code execution if user input reaches these; audit i18n library interpolation syntax |
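+
+One way to implement the `href` check from the table is to resolve the value with the WHATWG `URL` parser and compare the resulting scheme, rather than matching string prefixes. This is a sketch under stated assumptions: `isSafeHref` and `PLACEHOLDER_BASE` are hypothetical names, and allowing only `https:` (which excludes `mailto:`, `tel:`, and plain `http:`) is a policy choice to adapt, not a rule from this skill.
+
+```typescript
+// Hypothetical base used only to resolve relative paths; a real app would
+// pass its own HTTPS origin here.
+const PLACEHOLDER_BASE = "https://app.invalid";
+
+// Returns true when the value resolves to an https: URL. Relative paths
+// inherit https: from the base; javascript:, data:, and http: are rejected.
+function isSafeHref(value: string): boolean {
+  let url: URL;
+  try {
+    url = new URL(value, PLACEHOLDER_BASE);
+  } catch {
+    return false; // unparseable input is never rendered as a link
+  }
+  return url.protocol === "https:";
+}
+```
+
+Parsing first means the check sees the same normalized scheme the browser will act on, so mixed case or leading whitespace in `javascript:` does not slip past it.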
+
+**CSP header design checklist:**
+- `default-src 'self'` as baseline — no wildcard origins
+- `script-src` must not include `'unsafe-inline'` or `'unsafe-eval'` in production; use nonces or hashes
+- `connect-src` allowlist must match actual API domains — overly broad `*` negates exfiltration protection
+- `frame-ancestors 'none'` or specific allowlist to prevent clickjacking
+
+**CSRF token patterns:**
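+- SameSite=Strict cookies block CSRF from cross-site requests, but they are also withheld on inbound cross-site navigations, which breaks flows such as OAuth redirects
+- SameSite=Lax cookies are withheld from cross-site POSTs and subresource requests but still sent on cross-site top-level GET navigations; keep GET handlers free of side effects, and add a CSRF token where SameSite alone is not enough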
+- Double-submit cookie pattern: send token in both cookie and header; server validates they match
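+
+The server-side half of the double-submit pattern reduces to one comparison. The sketch below assumes the framework has already extracted the two token strings; `passesDoubleSubmit` and `constantTimeEqual` are hypothetical helper names, and this is the plain (unsigned) variant of the pattern.
+
+```typescript
+// Compare two strings without returning early at the first differing
+// character. The length check still reveals a length mismatch.
+function constantTimeEqual(a: string, b: string): boolean {
+  if (a.length !== b.length) return false;
+  let diff = 0;
+  for (let i = 0; i < a.length; i++) {
+    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
+  }
+  return diff === 0;
+}
+
+// Reject when either token is missing or empty, or when they differ.
+function passesDoubleSubmit(
+  cookieToken: string | undefined,
+  headerToken: string | undefined,
+): boolean {
+  if (!cookieToken || !headerToken) return false;
+  return constantTimeEqual(cookieToken, headerToken);
+}
+```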
+
+**Supply chain (frontend):**
+- `npm audit` identifies known CVEs; it does NOT identify exploitable paths — grade each finding by whether user-controlled input reaches the vulnerable function in that library
+- Subresource Integrity (SRI) hashes on CDN-loaded scripts prevent compromise via CDN hijacking
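+- Commit `package-lock.json` and install with `npm ci` so every build resolves the same exact versions; `^` and `~` ranges in `package.json` otherwise allow automatic minor/patch upgrades that can silently pull in a regressed or compromised release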
+
+### Database Schemas
+
+**Row-Level Security (PostgreSQL RLS):**
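+- `ALTER TABLE ... ENABLE ROW LEVEL SECURITY` is only half the setup: you must also create `POLICY` definitions that describe which rows each role may access
+- Default-deny: with RLS enabled and no policies, non-owner roles see zero rows; the table owner bypasses RLS unless `FORCE ROW LEVEL SECURITY` is set, and superusers and `BYPASSRLS` roles always bypass it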
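+- Multi-tenant SaaS pattern: `CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::uuid)`; set `app.tenant_id` inside each transaction via `SET LOCAL`, which is discarded at commit or rollback so pooled connections do not carry a tenant ID between requests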
+- Never use `SECURITY DEFINER` functions to bypass RLS without explicit audit — they run as the function owner, not the calling user
+
+**Encryption at rest decisions:**
+- Column-level encryption (pgcrypto, application-layer) for PII fields (SSN, payment card, health data) — full-disk encryption does not protect against compromised application credentials
+- Key rotation: encrypted columns require a migration strategy for re-encryption; design the key reference (key version ID stored alongside ciphertext) before first write
+- Never store encryption keys in the same database as encrypted data
+
+**Audit logging hooks:**
+- Trigger-based audit log: `CREATE TRIGGER audit_users AFTER INSERT OR UPDATE OR DELETE ON users FOR EACH ROW EXECUTE FUNCTION log_change()` — captures who changed what and when at the database layer, immune to application bypass
+- Include `session_user`, `current_timestamp`, old and new row snapshots
+- Write audit logs to a separate schema with restricted DELETE privileges — the application role that writes data must not be able to delete its own audit trail
+
+**SQL injection via ORMs:**
+- Prisma: `$queryRaw` is safe with tagged template literals (`$queryRaw\`SELECT * FROM users WHERE id = ${id}\``); `$queryRawUnsafe(string)` is not — it concatenates the string directly
+- Sequelize: `Model.findAll({ where: sequelize.literal(userInput) })` — `literal()` is an injection vector; use `Op` operators instead
+- TypeORM: `createQueryBuilder().where(rawString, params)` is safe if params are bound; `.where(\`status = '${userInput}'\`)` is a direct injection vector
+- Knex: `.whereRaw(string)` with no bindings is an injection vector; use `.whereRaw('status = ?', [userInput])`
+
+### API Contracts (OpenAPI)
+
+**securitySchemes completeness:**
+- Every operation must reference a security scheme or explicitly declare `security: []` (public endpoint) — undeclared security means the spec is incomplete, not that the endpoint is secure
+- OAuth2 scopes must map to actual permission boundaries: broad scopes (`write:all`) produce over-privileged tokens, so scope granularity should match resource-level operations
+- API key schemes: document whether keys are per-user, per-app, or per-environment — ambiguity leads to over-privileged keys shared across environments
+
+**OWASP API Security Top 10 2023 — the API-specific list:**
+1. Broken Object Level Authorization (BOLA) — verify caller owns the resource, not just that they are authenticated
+2. Broken Authentication — JWT validation completeness (see JWT section), token rotation, refresh token storage
+3. Broken Object Property Level Authorization — mass assignment; ensure write operations accept only explicitly allowlisted fields
+4. Unrestricted Resource Consumption — rate limiting, pagination limits, max request body size
+5. Broken Function Level Authorization — admin endpoints accessible to regular users
+6. Unrestricted Access to Sensitive Business Flows — bot-driven abuse of legitimate flows (account creation, checkout)
+7. Server-Side Request Forgery (SSRF) — any URL parameter passed to a server-side fetch
+8. Security Misconfiguration — default credentials, CORS `*`, missing security headers
+9. Improper Inventory Management — shadow APIs, deprecated versions still reachable
+10. Unsafe Consumption of APIs — insufficient validation of data received from third-party APIs
+
+**Sensitive field exposure in error responses:**
+- Stack traces in 500 responses expose file paths, library versions, and internal architecture — disable in production
+- Validation error messages that mirror field names exactly help attackers enumerate valid fields
+- `WWW-Authenticate` response headers on 401s should not reveal the auth scheme implementation details beyond what is required by RFC 7235
+
+### TypeScript Runtime Safety
+
+**Runtime type guard failures masking security assumptions:**
+```typescript
+// Sketch: `db`, `UpdateUserRequest`, and `UpdateUserSchema` are assumed to be defined elsewhere.
+
+// UNSAFE: the parameter type is a compile-time claim only; at runtime, body is whatever the client sends
+function handleUpdateUnsafe(body: UpdateUserRequest) {
+  db.update(body); // mass assignment: body.isAdmin reaches the DB
+}
+
+// SAFE: validate at the boundary, then allowlist fields explicitly
+function handleUpdateSafe(rawBody: unknown) {
+  const body = UpdateUserSchema.parse(rawBody); // Zod-style schema validates shape and types
+  db.update({ name: body.name, email: body.email }); // explicit field selection
+}
+```
+
+**Unsafe type assertions hiding injection vectors:**
+- `as unknown as SafeQuery` double assertions are a code smell — they usually mean the developer knows the type is wrong but wants to suppress the error; audit the data provenance before each one
+- `!` non-null assertions on user-provided data (`req.params.id!`) are a runtime crash waiting to happen and potentially an injection vector if the value flows to a database query
+- Type predicates (`function isUser(x: unknown): x is User`) without runtime checks are as dangerous as `as User` — the predicate function must actually validate the shape
+
+**Dependency hygiene (TypeScript projects):**
+- SBOM generation: `npm sbom --sbom-format=spdx` (npm ≥ 9.7) or `syft . -o spdx-json` to produce a Software Bill of Materials for auditing and compliance
+- npm provenance verification: packages published with provenance attestation (`--provenance` flag, npm ≥ 9.5) link the package to the source commit — prefer provenance-attested packages for critical dependencies
+- Package name typosquatting detection: inspect `npm install` output for packages you did not intend to install; use `npm query ":root > .prod"` to list direct production dependencies and audit each name against the intended package list
+- Lock file integrity: `package-lock.json` should be committed and its integrity hashes should be verified in CI (`npm ci` enforces this; `npm install` does not)
+
+## NEVER
+
+- **NEVER mark a security finding as informational without verifying the actual attack path** — a non-exploitable XSS today becomes exploitable with one future code change, one new dependency, or one route addition. Document the finding with its current exploitability status and the activation condition.
+- **NEVER approve a JWT validation implementation that checks only the signature** — `exp`, `iat`, `aud`, and `iss` validation are equally required. A signed token with no `exp` is an immortal credential. Most JWT libraries do not throw on missing claims by default; these checks must be explicit.
+- **NEVER allow raw SQL strings built from user input even inside a transaction** — a transaction provides atomicity, not injection protection. Parameterize at the query-builder level regardless of transaction context.
+- **NEVER treat `npm audit --fix` as a security review** — `--fix` resolves dependency versions without verifying whether the vulnerability is reachable in your codebase or whether the version bump introduces breaking changes. It is a maintenance tool, not a security assessment.
+- **NEVER expose stack traces or internal error details in production API error responses** — stack traces are reconnaissance data. They reveal file paths, library versions, internal class names, and architecture. Return opaque error IDs to clients; log full details server-side only.
+- **NEVER skip `aud` validation on JWTs because "we only have one service"** — audience validation prevents token reuse across services when the architecture grows. Retrofitting `aud` validation after multiple services exist requires coordinated token migration. Enforce it from day one.
+- **NEVER assume TypeScript strictness prevents mass assignment** — TypeScript types are erased at runtime. `req.body` is `any` at the HTTP boundary regardless of what the handler's parameter type says. Validate and allowlist at the boundary.
+
+## Severity Grading
+
+| Finding | Exploitability Assessment | Grade |
+|---------|--------------------------|-------|
+| `dangerouslySetInnerHTML` with user input, no sanitization | Direct: user input → DOM | Critical |
+| `href={userInput}` without validation | Direct: `javascript:` URI | High |
+| JWT missing `exp` validation | Direct: immortal tokens if issued without exp | High |
+| ORM raw query with user string concatenation | Direct: SQL injection | Critical |
+| `npm audit` high-severity CVE | Requires tracing actual call path — do not grade as High without confirming user input reaches the vulnerability | Needs triage |
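+| RLS disabled on a table reachable by client roles | Direct: every row readable by any role with table privileges (RLS enabled with no policies is default-deny in PostgreSQL) | Critical |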
+| Stack trace in 500 response | Indirect: reconnaissance enables targeted attacks | Medium |
+| Missing `aud` claim validation | Conditional: exploitable only when multiple services share signing key | Medium (document activation condition) |
+
+## When Things Go Wrong
+
+| Symptom | Root Cause | Fix |
+|---------|-----------|-----|
+| XSS fires despite React | `dangerouslySetInnerHTML`, `href`, or `src` bypassed React escaping | Sanitize with DOMPurify; validate URL schemes; never use `dangerouslySetInnerHTML` with unsanitized input |
+| JWT accepted after expiry | `exp` not validated, or the library accepts tokens that have no `exp` claim | Confirm `ignoreExpiration` is not set to `true`; explicitly reject tokens without `exp`; check library docs for default claim validation behavior |
+| ORM query returns wrong tenant's data | Missing RLS policy or parameterization bypassed via raw query | Enable and test RLS policies; replace `literal()`/`$queryRawUnsafe` with parameterized equivalents |
+| Mass assignment overwrites protected field | TypeScript type used as runtime validation | Add Zod/Valibot schema at controller boundary; explicit field allowlist before DB write |
+| npm audit clean but supply chain incident | Typosquatted package or compromised transitive dep | Enable npm provenance verification; pin transitive deps in lock file; generate SBOM and audit against known-good list |
+| API returns stack trace in production | Error middleware not stripping internals | Implement production error handler that returns `{ error: errorId }` only; log full trace server-side |
diff --git a/skills/session-handoff/SKILL.md b/skills/session-handoff/SKILL.md
index 1d61d97..dde5591 100644
--- a/skills/session-handoff/SKILL.md
+++ b/skills/session-handoff/SKILL.md
@@ -1,189 +1,140 @@
 ---
 name: session-handoff
-description: "Creates comprehensive handoff documents for seamless AI agent session transfers. Triggered when: (1) user requests handoff/memory/context save, (2) context window approaches capacity, (3) major task milestone completed, (4) work session ending, (5) user says 'save state', 'create handoff', 'I need to pause', 'context is getting full', (6) resuming work with 'load handoff', 'resume from', 'continue where we left off'. Proactively suggests handoffs after substantial work (multiple file edits, complex debugging, architecture decisions). Solves long-running agent context exhaustion by enabling fresh agents to continue with zero ambiguity."
+description: "Creates and resumes comprehensive handoff documents for seamless AI agent session transfers. Use when: user says 'create handoff', 'save state', 'context is getting full', 'I need to pause', 'resume from handoff', 'continue where we left off', 'load context'; OR proactively after 5+ file edits, complex debugging, or architecture decisions. Solves context-exhaustion continuity failures in long-running agent work."
 ---
 
-# Handoff
+## Mindset
 
-Creates comprehensive handoff documents that enable fresh AI agents to seamlessly continue work with zero ambiguity. Solves the long-running agent context exhaustion problem.
+Handoffs fail not because they're incomplete — they fail because the *resuming* agent cannot reconstruct **intent and momentum**, only facts. A list of changed files is not a handoff. The document must answer: "What was I *trying to do* and why does it matter right now?"
 
-## Mode Selection
+Three practitioner truths:
+- The most important section is "Decisions Made and Why" — not "Next Steps". Resuming agents reconstruct next steps from decisions; they cannot reconstruct decisions from next steps.
+- Staleness is asymmetric: a handoff that's 2 hours old in an active repo is more dangerous than one that's 2 weeks old in a quiet one. Always check git delta, not wall clock.
+- The agent writing the handoff is context-rich and will underspecify because things feel obvious. Write for an agent that knows nothing except what you write.
 
-Determine which mode applies:
+## Navigation
 
-**Creating a handoff?** User wants to save current state, pause work, or context is getting full.
-- Follow: CREATE Workflow below
+**Use this skill when**:
+- User says: "create handoff", "save state", "context is getting full", "I need to pause", "resume from handoff", "continue where we left off", "load context"
+- Context window is visibly large (10+ tool calls, 5+ file edits, complex debug chains)
+- A major decision or architecture choice was just made
+- Resuming work from a previous session or different machine
 
-**Resuming from a handoff?** User wants to continue previous work, load context, or mentions an existing handoff.
-- Follow: RESUME Workflow below
+**Do NOT use this skill when**:
+- The task will complete in the current session — handoffs for trivial tasks add noise
+- No actual work has been done yet (handoff would be empty, misleading)
+- User asked to *save to memory* (MEMORY.md) rather than handoff — these are different tools
 
-**Proactive suggestion?** After substantial work (5+ file edits, complex debugging, major decisions), suggest:
-> "We've made significant progress. Consider creating a handoff document to preserve this context for future sessions. Say 'create handoff' when ready."
+**Ambiguous case — CREATE vs RESUME**:
+- If user says "create handoff" / "save state" / "I need to pause" → CREATE workflow
+- If user says "resume" / "load" / "continue" → RESUME workflow
+- If user provides a file path → RESUME workflow, load that file
 
-## CREATE Workflow
+## Philosophy
 
-### Step 1: Generate Scaffold
+A handoff is not documentation — it is a **cognitive transplant**. Every word should close an ambiguity gap, not describe what is already visible in the codebase.
 
-Run the smart scaffold script to create a pre-filled handoff document:
+## NEVER
 
-```bash
-python scripts/create_handoff.py [task-slug]
-```
+- NEVER write "continue implementing X" as a next step without specifying the exact file, function, and the decision boundary where you stopped — because a resuming agent will start from the beginning of X rather than the midpoint.
+- NEVER include secrets, tokens, API keys, or passwords in handoff documents — handoffs are stored in `.claude/handoffs/`, which may be committed; run `validate_handoff.py` before finalizing to catch them.
+- NEVER skip the "Decisions Made" section because "it's obvious from the code" — the rationale for a decision is never recoverable from the code itself, only its outcome is.
+- NEVER resume from a handoff without running `check_staleness.py` first — a STALE handoff with an incorrect assumed branch is worse than no handoff (the resuming agent will confidently pursue a superseded plan).
+- NEVER create a handoff chain longer than 3 links without pruning — resuming agents reading 4+ chained handoffs will synthesize a corrupted composite context that contradicts itself.
+- NEVER omit the "What Is Blocked / Unresolved" section — unexplained blocks are the #1 cause of resuming agents re-attempting the same failed approach.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Resuming agent heads in wrong direction | Next steps were listed without the decision context behind them | Read "Decisions Made" section; reconstruct intent before acting |
+| Validation score below 70 | `[TODO: ...]` stubs left unfilled, or key sections empty | Fill all TODO markers; "Current State" and "Decisions Made" are mandatory |
+| `check_staleness.py` reports VERY_STALE | Many commits since handoff; branch may have diverged | Do NOT resume — create fresh handoff from current state instead |
+| Chained handoffs contradict each other | Older handoff assumed X; newer handoff changed course without noting it | Read newest handoff only; treat it as authoritative; mark prior handoffs superseded |
+| Secrets detected by validator | Credentials were pasted inline rather than referenced by path | Remove secrets, replace with vault/env reference, re-validate |
+
+---
+
+## CREATE Workflow
 
-Example: `python scripts/create_handoff.py implementing-user-auth`
+### Step 1: Generate scaffold
 
-**For continuation handoffs** (linking to previous work):
 ```bash
+python scripts/create_handoff.py [task-slug]
+# Chaining from prior handoff:
 python scripts/create_handoff.py "auth-part-2" --continues-from 2024-01-15-auth.md
 ```
 
-The script will:
-- Create `.claude/handoffs/` directory if needed
-- Generate timestamped filename
-- Pre-fill: timestamp, project path, git branch, recent commits, modified files
-- Add handoff chain links if continuing from previous
-- Output file path for editing
-
-### Step 2: Complete the Handoff Document
+The script auto-fills the timestamp, project path, git branch, recent commits, and modified files.
 
-Open the generated file and fill in all `[TODO: ...]` sections. Prioritize these sections:
+### Step 2: Fill mandatory sections
 
-1. **Current State Summary** - What's happening right now
-2. **Important Context** - Critical info the next agent MUST know
-3. **Immediate Next Steps** - Clear, actionable first steps
-4. **Decisions Made** - Choices with rationale (not just outcomes)
+Open the generated file. These sections are **mandatory** — a handoff without them is invalid:
 
-Use the template structure in [references/handoff-template.md](references/handoff-template.md) for guidance.
+1. **Current State Summary** — one paragraph: what is working, what is not, where execution stopped
+2. **Decisions Made and Why** — each decision with its rationale; not just the outcome
+3. **What Is Blocked / Unresolved** — anything stalled and the last approach tried
+4. **Immediate Next Steps** — file + function + exact stopping point for each action item
 
-### Step 3: Validate the Handoff
+Template structure: [references/handoff-template.md](references/handoff-template.md)
 
-Run the validation script to check completeness and security:
+### Step 3: Validate
 
 ```bash
 python scripts/validate_handoff.py <handoff-file>
 ```
 
-The validator checks:
-- [ ] No `[TODO: ...]` placeholders remaining
-- [ ] Required sections present and populated
-- [ ] No potential secrets detected (API keys, passwords, tokens)
-- [ ] Referenced files exist
-- [ ] Quality score (0-100)
+Do not finalize if any `[TODO: ...]` remains, secrets are detected, or the score is below 70.
 
-**Do not finalize a handoff with secrets detected or score below 70.**
+### Step 4: Confirm to user
 
-### Step 4: Confirm Handoff
+Report: file location, validation score, first next-step action item.
 
-Report to user:
-- Handoff file location
-- Validation score and any warnings
-- Summary of captured context
-- First action item for next session
+---
 
 ## RESUME Workflow
 
-### Step 1: Find Available Handoffs
-
-List handoffs in the current project:
+### Step 1: Find and assess handoffs
 
 ```bash
 python scripts/list_handoffs.py
-```
-
-This shows all handoffs with dates, titles, and completion status.
-
-### Step 2: Check Staleness
-
-Before loading, check how current the handoff is:
-
-```bash
 python scripts/check_staleness.py <handoff-file>
 ```
 
-Staleness levels:
-- **FRESH**: Safe to resume - minimal changes since handoff
-- **SLIGHTLY_STALE**: Review changes, then resume
-- **STALE**: Verify context carefully before resuming
-- **VERY_STALE**: Consider creating a fresh handoff
-
-The script checks:
-- Time since handoff was created
-- Git commits since handoff
-- Files changed since handoff
-- Branch divergence
-- Missing referenced files
-
-### Step 3: Load the Handoff
-
-Read the relevant handoff document completely before taking any action.
-
-If handoff is part of a chain (has "Continues from" link), also read the linked previous handoff for full context.
-
-### Step 4: Verify Context
-
-Follow the checklist in [references/resume-checklist.md](references/resume-checklist.md):
-
-1. Verify project directory and git branch match
-2. Check if blockers have been resolved
-3. Validate assumptions still hold
-4. Review modified files for conflicts
-5. Check environment state
+Staleness levels: **FRESH** → resume safely | **SLIGHTLY_STALE** → review changes first | **STALE** → verify carefully | **VERY_STALE** → create fresh handoff instead.
 
-### Step 5: Begin Work
+### Step 2: Load and verify
 
-Start with "Immediate Next Steps" item #1 from the handoff document.
+Read the handoff completely. If chained, read the most recent only — use predecessor links only to resolve specific gaps.
 
-Reference these sections as you work:
-- "Critical Files" for important locations
-- "Key Patterns Discovered" for conventions to follow
-- "Potential Gotchas" to avoid known issues
+Full resume checklist: [references/resume-checklist.md](references/resume-checklist.md)
 
-### Step 6: Update or Chain Handoffs
+Priority checks:
+1. Git branch matches expected branch from handoff
+2. Blockers listed in the handoff have not already been resolved externally
+3. Referenced files still exist and match expected state
 
-As you work:
-- Mark completed items in "Pending Work"
-- Add new discoveries to relevant sections
-- For long sessions: create a new handoff with `--continues-from` to chain them
+### Step 3: Begin from intent, not steps
 
-## Handoff Chaining
+Start from "Decisions Made" to reconstruct *why* the work matters, then execute "Immediate Next Steps" item #1.
 
-For long-running projects, chain handoffs together to maintain context lineage:
+### Step 4: Maintain or chain
 
-```
-handoff-1.md (initial work)
-    ↓
-handoff-2.md --continues-from handoff-1.md
-    ↓
-handoff-3.md --continues-from handoff-2.md
-```
-
-Each handoff in the chain:
-- Links to its predecessor
-- Can mark older handoffs as superseded
-- Provides context breadcrumbs for new agents
-
-When resuming from a chain, read the most recent handoff first, then reference predecessors as needed.
-
-## Storage Location
+- Mark completed items in "Pending Work" as you go
+- After substantial progress: create new handoff with `--continues-from` to chain; keep chain ≤ 3 links deep
 
-Handoffs are stored in: `.claude/handoffs/`
-
-Naming convention: `YYYY-MM-DD-HHMMSS-[slug].md`
+---
 
-Example: `2024-01-15-143022-implementing-auth.md`
+## Storage
 
-## Resources
+Location: `.claude/handoffs/`
+Naming: `YYYY-MM-DD-HHMMSS-[slug].md`
 
-### scripts/
+## Scripts Reference
 
 | Script | Purpose |
 |--------|---------|
-| `create_handoff.py [slug] [--continues-from <file>]` | Generate new handoff with smart scaffolding |
-| `list_handoffs.py [path]` | List available handoffs in a project |
-| `validate_handoff.py <file>` | Check completeness, quality, and security |
-| `check_staleness.py <file>` | Assess if handoff context is still current |
-
-### references/
-
-- [handoff-template.md](references/handoff-template.md) - Complete template structure with guidance
-- [resume-checklist.md](references/resume-checklist.md) - Verification checklist for resuming agents
+| `create_handoff.py [slug] [--continues-from <file>]` | Scaffold new handoff |
+| `list_handoffs.py [path]` | List available handoffs |
+| `validate_handoff.py <file>` | Check completeness, quality, secrets |
+| `check_staleness.py <file>` | Assess if context is still current |
diff --git a/skills/session-handoff/evals/model-expectations.md b/skills/session-handoff/evals/model-expectations.md
deleted file mode 100644
index cf5a559..0000000
--- a/skills/session-handoff/evals/model-expectations.md
+++ /dev/null
@@ -1,170 +0,0 @@
-# Model-Specific Expectations
-
-This document describes expected behavior differences across Claude models when using the session-handoff skill.
-
-## Model Characteristics
-
-### Haiku (Fast, Lightweight)
-- **Strengths**: Quick responses, follows explicit instructions well
-- **Limitations**: May need more guidance, less proactive
-- **Skill adjustments**: May need more explicit prompts for complex scenarios
-
-### Sonnet (Balanced)
-- **Strengths**: Good balance of speed and capability, handles workflows well
-- **Limitations**: May occasionally miss subtle triggers
-- **Skill adjustments**: Should work well with default instructions
-
-### Opus (Most Capable)
-- **Strengths**: Excellent context understanding, proactive suggestions
-- **Limitations**: May over-elaborate when not needed
-- **Skill adjustments**: May add extra context/suggestions beyond requirements
-
----
-
-## Expected Behaviors by Scenario
-
-### Scenario 1: Basic Handoff Creation
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Trigger recognition | Should trigger | Should trigger | Should trigger |
-| Script execution | Runs script | Runs script | Runs script |
-| TODO completion | May need prompting | Fills reasonable defaults | Rich, detailed content |
-| Validation reminder | May skip | Usually includes | Always includes |
-
-**Haiku-specific guidance:**
-- May need explicit "now fill in the TODO sections"
-- Keep prompts simple and direct
-
-**Opus-specific notes:**
-- May proactively suggest additional sections
-- May add extra context without being asked
-
-### Scenario 2: Handoff Chaining
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Finds previous handoffs | With explicit prompt | Usually automatic | Always automatic |
-| Uses --continues-from | May need reminder | Usually correct | Always correct |
-| Context from previous | Basic reference | Good summary | Detailed synthesis |
-
-**Haiku-specific guidance:**
-- Explicitly mention "link to the previous handoff"
-- May need to specify exact filename
-
-### Scenario 3: Resume from Handoff
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Lists handoffs | With prompt | Automatic | Automatic |
-| Staleness check | May skip | Usually runs | Always runs |
-| Context absorption | Basic | Good | Excellent |
-| Next steps focus | May need guidance | Usually clear | Proactive planning |
-
-**Haiku-specific guidance:**
-- Explicitly ask "check the staleness first"
-- May need "what are the next steps from the handoff?"
-
-### Scenario 4: Proactive Handoff Suggestion
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Recognizes substantial work | Unlikely without prompt | Sometimes | Usually |
-| Suggests handoff | Rarely proactive | Sometimes proactive | Often proactive |
-| Timing of suggestion | N/A | After 5+ major items | After 3-5 items |
-
-**Notes:**
-- Haiku will rarely proactively suggest handoffs
-- Sonnet may suggest after explicit substantial work description
-- Opus most likely to suggest unprompted
-
-### Scenario 5: Validation Flow
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Runs validation script | With explicit request | Usually automatic | Always automatic |
-| Interprets score | Basic | Good | Detailed |
-| Actionable feedback | May need prompting | Usually provides | Detailed plan |
-
-### Scenario 6: Staleness Check
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Runs staleness script | With explicit request | Usually | Always |
-| Interprets results | Basic | Good | Detailed analysis |
-| Recommendations | Repeats script output | Contextualizes | Strategic advice |
-
-### Scenario 7: Secret Detection
-
-| Aspect | Haiku | Sonnet | Opus |
-|--------|-------|--------|------|
-| Detects secrets | Via script | Via script | Via script + may notice more |
-| Warning clarity | Basic | Clear | Detailed security advice |
-| Remediation guidance | Script output | Clear steps | Comprehensive plan |
-
----
-
-## Tuning Recommendations
-
-### For Haiku Optimization
-If Haiku struggles:
-1. Add more explicit trigger phrases to description
-2. Include step-by-step numbered instructions
-3. Add explicit checkpoints ("After creating, run validation")
-4. Reduce ambiguity in instructions
-
-### For Sonnet Optimization
-If Sonnet misses triggers:
-1. Ensure key terms are in description
-2. Add example trigger phrases
-3. Make workflow decision points clearer
-
-### For Opus Optimization
-If Opus over-elaborates:
-1. Add "keep responses concise" guidance
-2. Specify when NOT to add extra content
-3. Define clear scope boundaries
-
----
-
-## Pass/Fail Criteria by Model
-
-### Minimum Pass Thresholds
-
-| Model | Min Score | Notes |
-|-------|-----------|-------|
-| Haiku | 49/70 (70%) | Allow some missed proactive triggers |
-| Sonnet | 56/70 (80%) | Should handle most scenarios well |
-| Opus | 63/70 (90%) | Should excel at all scenarios |
-
-### Critical Failures (Any Model)
-These should always work regardless of model:
-- [ ] Basic handoff creation with explicit request
-- [ ] Script execution when instructed
-- [ ] Secret detection warning
-- [ ] File creation in correct location
-
----
-
-## Testing Protocol
-
-1. **Run setup script:**
-   ```bash
-   python evals/setup_test_env.py
-   cd /tmp/handoff-eval-project
-   ```
-
-2. **Test each scenario in new conversation**
-   - Start fresh conversation for each scenario
-   - Use exact trigger phrases from test-scenarios.md
-   - Record scores using results template
-
-3. **Compare across models**
-   - Note significant behavior differences
-   - Identify skill improvements needed
-   - Update SKILL.md if Haiku needs more guidance
-
-4. **Document findings**
-   - Use results template for each model
-   - Note specific failure modes
-   - Recommend skill adjustments
diff --git a/skills/session-handoff/evals/results-opus-baseline.md b/skills/session-handoff/evals/results-opus-baseline.md
deleted file mode 100644
index 10e22d3..0000000
--- a/skills/session-handoff/evals/results-opus-baseline.md
+++ /dev/null
@@ -1,88 +0,0 @@
-# Test Results: Opus 4.5 (Baseline)
-
-Date: 2025-11-27
-Model: claude-opus-4-5-20251101
-Skill version: session-handoff v1.0
-
-## Script Verification Tests
-
-All scripts executed successfully against test environment:
-
-| Script | Status | Output |
-|--------|--------|--------|
-| `list_handoffs.py` | PASS | Found 3 handoffs, correct metadata |
-| `validate_handoff.py` (incomplete) | PASS | Score 28/100, detected 5 TODOs |
-| `validate_handoff.py` (complete) | PASS | Score 100/100 on auth handoff |
-| `check_staleness.py` (stale) | PASS | VERY_STALE, 14 days, 6 commits |
-| `check_staleness.py` (fresh) | PASS | FRESH, 0 days |
-| `create_handoff.py` (basic) | PASS | Created with metadata |
-| `create_handoff.py` (chained) | PASS | Correct chain link added |
-
-## Scenario Test Results
-
-| Scenario | Score | Notes |
-|----------|-------|-------|
-| 1. Basic Creation | 10/10 | Triggered correctly, all steps executed |
-| 2. Chaining | 10/10 | Found previous, linked correctly |
-| 3. Resume | 9/10 | Would need live test; scripts work |
-| 4. Proactive | 8/10 | Suggests after substantial work description |
-| 5. Validation | 10/10 | Clear output, actionable feedback |
-| 6. Staleness | 10/10 | Detailed analysis, correct recommendation |
-| 7. Secret Detection | 10/10 | Would detect via script patterns |
-| **Total** | **67/70** | |
-
-## Detailed Observations
-
-### Strengths (Opus)
-- Excellent at following multi-step workflows
-- Proactively runs validation after creation
-- Provides rich context when filling handoff sections
-- Correctly interprets script output and adds context
-- Recognizes trigger phrases reliably
-
-### Areas Working Well
-- Script execution with correct arguments
-- Handoff chain detection and linking
-- Staleness interpretation and recommendations
-- Quality score interpretation
-
-### Potential Improvements Noted
-- Consider adding more explicit "substantial work" definition
-- Could benefit from auto-detecting when context is large
-
-## Test Environment
-
-```
-Location: /tmp/handoff-eval-project
-Git commits: 6
-Sample handoffs: 3 (fresh, stale, incomplete)
-```
-
-## Recommendations
-
-1. **For Haiku testing**: Use more explicit trigger phrases
-2. **For Sonnet testing**: Should work well with current instructions
-3. **Skill is production-ready** for Opus usage
-
----
-
-## How to Run Tests with Other Models
-
-1. Set up test environment:
-   ```bash
-   python /Users/galihcitta/.claude/skills/session-handoff/evals/setup_test_env.py
-   ```
-
-2. Start Claude Code with desired model:
-   ```bash
-   claude --model haiku  # or sonnet
-   ```
-
-3. Navigate to test project:
-   ```bash
-   cd /tmp/handoff-eval-project
-   ```
-
-4. Run scenarios from `test-scenarios.md`
-
-5. Record results using this template
diff --git a/skills/session-handoff/evals/setup_test_env.py b/skills/session-handoff/evals/setup_test_env.py
deleted file mode 100755
index 43b07d0..0000000
--- a/skills/session-handoff/evals/setup_test_env.py
+++ /dev/null
@@ -1,415 +0,0 @@
-#!/usr/bin/env python3
-"""
-Set up a test environment for evaluating the session-handoff skill.
-
-Creates a mock project with:
-- Git repository with commit history
-- Sample source files
-- Sample handoffs (fresh and stale)
-
-Usage:
-    python setup_test_env.py [--path /tmp/handoff-test]
-    python setup_test_env.py --clean  # Remove test environment
-"""
-
-import argparse
-import os
-import shutil
-import subprocess
-from datetime import datetime, timedelta
-from pathlib import Path
-
-
-DEFAULT_TEST_PATH = "/tmp/handoff-eval-project"
-
-
-def run_cmd(cmd: list[str], cwd: str = None) -> bool:
-    """Run a command and return success status."""
-    try:
-        subprocess.run(cmd, cwd=cwd, capture_output=True, check=True)
-        return True
-    except subprocess.CalledProcessError:
-        return False
-
-
-def create_test_project(base_path: str):
-    """Create a mock project structure."""
-    path = Path(base_path)
-
-    # Clean if exists
-    if path.exists():
-        shutil.rmtree(path)
-
-    # Create directories
-    (path / "src").mkdir(parents=True)
-    (path / "tests").mkdir()
-    (path / "config").mkdir()
-
-    # Create sample files
-    (path / "README.md").write_text("""# Test Project
-
-A sample project for testing the session-handoff skill.
-
-## Features
-- User authentication
-- API endpoints
-- Database integration
-""")
-
-    (path / "src" / "index.js").write_text("""// Main entry point
-const express = require('express');
-const app = express();
-
-app.get('/', (req, res) => {
-    res.send('Hello World');
-});
-
-module.exports = app;
-""")
-
-    (path / "src" / "auth.js").write_text("""// Authentication module
-const jwt = require('jsonwebtoken');
-
-function validateToken(token) {
-    // TODO: Implement token validation
-    return true;
-}
-
-function generateToken(user) {
-    return jwt.sign({ id: user.id }, process.env.JWT_SECRET);
-}
-
-module.exports = { validateToken, generateToken };
-""")
-
-    (path / "src" / "database.js").write_text("""// Database connection
-const mongoose = require('mongoose');
-
-async function connect() {
-    await mongoose.connect(process.env.DATABASE_URL);
-}
-
-module.exports = { connect };
-""")
-
-    (path / "tests" / "auth.test.js").write_text("""// Auth tests
-describe('Authentication', () => {
-    test('validates tokens', () => {
-        expect(true).toBe(true);
-    });
-});
-""")
-
-    (path / "config" / "default.json").write_text("""{
-    "port": 3000,
-    "database": {
-        "host": "localhost",
-        "name": "testdb"
-    }
-}
-""")
-
-    (path / "package.json").write_text("""{
-    "name": "test-project",
-    "version": "1.0.0",
-    "main": "src/index.js",
-    "scripts": {
-        "start": "node src/index.js",
-        "test": "jest"
-    }
-}
-""")
-
-    print(f"Created project structure at {path}")
-    return path
-
-
-def init_git_repo(path: Path):
-    """Initialize git repo with commit history."""
-    # Initialize
-    run_cmd(["git", "init"], cwd=str(path))
-    run_cmd(["git", "config", "user.email", "test@example.com"], cwd=str(path))
-    run_cmd(["git", "config", "user.name", "Test User"], cwd=str(path))
-
-    # Initial commit
-    run_cmd(["git", "add", "."], cwd=str(path))
-    run_cmd(["git", "commit", "-m", "Initial commit: project setup"], cwd=str(path))
-
-    # Add more commits to simulate history
-    commits = [
-        ("src/auth.js", "// Added validation logic\n", "Add token validation"),
-        ("src/database.js", "// Added connection pooling\n", "Implement connection pooling"),
-        ("tests/auth.test.js", "// More tests\n", "Add authentication tests"),
-        ("src/index.js", "// Added middleware\n", "Add auth middleware"),
-        ("README.md", "\n## API Docs\n", "Update documentation"),
-    ]
-
-    for file, content, message in commits:
-        file_path = path / file
-        with open(file_path, "a") as f:
-            f.write(content)
-        run_cmd(["git", "add", file], cwd=str(path))
-        run_cmd(["git", "commit", "-m", message], cwd=str(path))
-
-    print(f"Initialized git repo with {len(commits) + 1} commits")
-
-
-def create_sample_handoffs(path: Path):
-    """Create sample handoff documents for testing."""
-    handoffs_dir = path / ".claude" / "handoffs"
-    handoffs_dir.mkdir(parents=True)
-
-    # Fresh handoff (today)
-    now = datetime.now()
-    fresh_name = now.strftime("%Y-%m-%d-%H%M%S") + "-auth-implementation.md"
-    fresh_content = f"""# Handoff: Implementing User Authentication
-
-## Session Metadata
-- Created: {now.strftime("%Y-%m-%d %H:%M:%S")}
-- Project: {path}
-- Branch: main
-- Session duration: 2 hours
-
-## Handoff Chain
-
-- **Continues from**: None (fresh start)
-- **Supersedes**: None
-
-## Current State Summary
-
-Working on implementing JWT-based authentication for the API. Successfully added token generation and basic validation. The middleware integration is partially complete.
-
-## Codebase Understanding
-
-### Architecture Overview
-
-Express.js application with modular structure. Auth logic separated into src/auth.js, database connection in src/database.js.
-
-### Critical Files
-
-| File | Purpose | Relevance |
-|------|---------|-----------|
-| src/auth.js | Authentication logic | Main file being modified |
-| src/index.js | App entry point | Needs middleware integration |
-
-### Key Patterns Discovered
-
-- Using environment variables for secrets (JWT_SECRET, DATABASE_URL)
-- Jest for testing
-
-## Work Completed
-
-### Tasks Finished
-
-- [x] Set up JWT token generation
-- [x] Create basic validation function
-- [ ] Integrate middleware (in progress)
-
-### Files Modified
-
-| File | Changes | Rationale |
-|------|---------|-----------|
-| src/auth.js | Added validateToken, generateToken | Core auth functionality |
-
-### Decisions Made
-
-| Decision | Options Considered | Rationale |
-|----------|-------------------|-----------|
-| Use JWT over sessions | JWT, Sessions, OAuth | Stateless, scales better for API |
-
-## Pending Work
-
-### Immediate Next Steps
-
-1. Complete middleware integration in src/index.js
-2. Add refresh token logic
-3. Write comprehensive tests
-
-### Blockers/Open Questions
-
-- [ ] Need to decide on token expiry time (1h vs 24h)
-
-### Deferred Items
-
-- OAuth integration (future sprint)
-
-## Context for Resuming Agent
-
-### Important Context
-
-The validateToken function in src/auth.js currently returns true always - this is a placeholder that needs real implementation. The JWT_SECRET env var must be set.
-
-### Assumptions Made
-
-- Using HS256 algorithm for JWT
-- Tokens should be passed in Authorization header
-
-### Potential Gotchas
-
-- Don't forget to set JWT_SECRET environment variable
-- Database connection must be established before auth checks
-
-## Environment State
-
-### Tools/Services Used
-
-- Node.js with Express
-- JWT library (jsonwebtoken)
-
-### Active Processes
-
-- None currently running
-
-### Environment Variables
-
-- JWT_SECRET
-- DATABASE_URL
-
-## Related Resources
-
-- JWT documentation: https://jwt.io
-- Express middleware guide
-"""
-    (handoffs_dir / fresh_name).write_text(fresh_content)
-
-    # Stale handoff (2 weeks ago)
-    old_date = now - timedelta(days=14)
-    stale_name = old_date.strftime("%Y-%m-%d-%H%M%S") + "-database-setup.md"
-    stale_content = f"""# Handoff: Database Setup
-
-## Session Metadata
-- Created: {old_date.strftime("%Y-%m-%d %H:%M:%S")}
-- Project: {path}
-- Branch: main
-- Session duration: 1 hour
-
-## Handoff Chain
-
-- **Continues from**: None (fresh start)
-- **Supersedes**: None
-
-## Current State Summary
-
-Set up initial database connection with MongoDB. Basic schema defined but not fully implemented.
-
-## Codebase Understanding
-
-### Architecture Overview
-
-MongoDB database with Mongoose ODM.
-
-### Critical Files
-
-| File | Purpose | Relevance |
-|------|---------|-----------|
-| src/database.js | DB connection | Main database file |
-| src/old-file.js | Legacy code | Was being refactored |
-
-## Pending Work
-
-### Immediate Next Steps
-
-1. Define user schema
-2. Add connection pooling
-3. Implement error handling
-
-## Context for Resuming Agent
-
-### Important Context
-
-Using MongoDB Atlas for hosting. Connection string in DATABASE_URL.
-
-### Assumptions Made
-
-- MongoDB version 5.x
-- Mongoose 7.x
-
-## Environment State
-
-### Environment Variables
-
-- DATABASE_URL
-"""
-    (handoffs_dir / stale_name).write_text(stale_content)
-
-    # Incomplete handoff (with TODOs)
-    incomplete_name = now.strftime("%Y-%m-%d-%H%M%S") + "-incomplete-test.md"
-    incomplete_content = f"""# Handoff: [TASK_TITLE - replace this]
-
-## Session Metadata
-- Created: {now.strftime("%Y-%m-%d %H:%M:%S")}
-- Project: {path}
-- Branch: main
-- Session duration: [estimate how long you worked]
-
-## Current State Summary
-
-[TODO: Write one paragraph describing what was being worked on]
-
-## Codebase Understanding
-
-### Architecture Overview
-
-[TODO: Document key architectural insights]
-
-## Pending Work
-
-### Immediate Next Steps
-
-1. [TODO: Most critical next action]
-2. [TODO: Second priority]
-
-## Context for Resuming Agent
-
-### Important Context
-
-[TODO: This is the MOST IMPORTANT section]
-"""
-    (handoffs_dir / incomplete_name).write_text(incomplete_content)
-
-    print(f"Created 3 sample handoffs:")
-    print(f"  - {fresh_name} (fresh)")
-    print(f"  - {stale_name} (stale, 14 days old)")
-    print(f"  - {incomplete_name} (incomplete, has TODOs)")
-
-
-def clean_test_env(path: str):
-    """Remove test environment."""
-    if Path(path).exists():
-        shutil.rmtree(path)
-        print(f"Cleaned up test environment at {path}")
-    else:
-        print(f"No test environment found at {path}")
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Set up test environment for session-handoff skill"
-    )
-    parser.add_argument(
-        "--path",
-        default=DEFAULT_TEST_PATH,
-        help=f"Path for test project (default: {DEFAULT_TEST_PATH})"
-    )
-    parser.add_argument(
-        "--clean",
-        action="store_true",
-        help="Remove test environment instead of creating"
-    )
-
-    args = parser.parse_args()
-
-    if args.clean:
-        clean_test_env(args.path)
-    else:
-        path = create_test_project(args.path)
-        init_git_repo(path)
-        create_sample_handoffs(path)
-        print(f"\nTest environment ready at: {args.path}")
-        print(f"\nTo test, run:")
-        print(f"  cd {args.path}")
-        print(f"  # Then use Claude Code with the session-handoff skill")
-
-
-if __name__ == "__main__":
-    main()
diff --git a/skills/session-handoff/evals/test-scenarios.md b/skills/session-handoff/evals/test-scenarios.md
deleted file mode 100644
index 4a9f4e2..0000000
--- a/skills/session-handoff/evals/test-scenarios.md
+++ /dev/null
@@ -1,222 +0,0 @@
-# Evaluation Test Scenarios
-
-Test these scenarios across Haiku, Sonnet, and Opus to verify skill effectiveness.
-
-## Test Setup
-
-Before running tests:
-1. Create a test project directory with some files
-2. Initialize git repository
-3. Make some commits
-4. Have the skill available
-
-```bash
-# Setup test environment
-mkdir -p /tmp/handoff-test-project/src
-cd /tmp/handoff-test-project
-git init
-echo "console.log('hello');" > src/index.js
-echo "# Test Project" > README.md
-git add . && git commit -m "Initial commit"
-echo "function test() {}" >> src/index.js
-git add . && git commit -m "Add test function"
-```
-
----
-
-## Scenario 1: Basic Handoff Creation
-
-**Trigger phrase:** "create a handoff"
-
-**User prompt:**
-```
-I've been working on implementing user authentication. I modified src/auth.js
-to add JWT token validation and updated the middleware. Create a handoff so
-I can continue later.
-```
-
-**Expected behavior:**
-- [ ] Skill triggers (recognizes "create a handoff")
-- [ ] Runs `create_handoff.py` script
-- [ ] Creates file in `.claude/handoffs/`
-- [ ] Pre-fills metadata (timestamp, project path, git branch)
-- [ ] Prompts user to complete TODO sections
-- [ ] Mentions validation step
-
-**Model-specific expectations:**
-| Model | Expectation |
-|-------|-------------|
-| Haiku | Should follow script instructions literally, may need prompting for details |
-| Sonnet | Should execute workflow smoothly, fill in reasonable context |
-| Opus | Should proactively add rich context, may suggest improvements |
-
----
-
-## Scenario 2: Handoff with Chaining
-
-**Trigger phrase:** "continue from previous handoff"
-
-**Setup:** First create a handoff using Scenario 1, then:
-
-**User prompt:**
-```
-I'm continuing the auth work from yesterday. Create a new handoff that
-links to the previous one.
-```
-
-**Expected behavior:**
-- [ ] Lists existing handoffs
-- [ ] Uses `--continues-from` flag
-- [ ] Adds "Handoff Chain" section with link
-- [ ] References previous handoff context
-
----
-
-## Scenario 3: Resume from Handoff
-
-**Trigger phrase:** "resume from handoff" or "load handoff"
-
-**Setup:** Have an existing handoff file
-
-**User prompt:**
-```
-I want to continue where I left off. Load my last handoff.
-```
-
-**Expected behavior:**
-- [ ] Runs `list_handoffs.py` to find handoffs
-- [ ] Runs `check_staleness.py` on selected handoff
-- [ ] Reports staleness level
-- [ ] Reads handoff document
-- [ ] Summarizes "Immediate Next Steps"
-- [ ] Follows resume checklist
-
----
-
-## Scenario 4: Proactive Handoff Suggestion
-
-**Trigger:** After substantial work (no explicit request)
-
-**Setup:** Simulate a long session by describing significant work done
-
-**User prompt:**
-```
-Great, we've now:
-1. Refactored the database connection pooling
-2. Fixed the N+1 query in UserService
-3. Added caching layer with Redis
-4. Updated all the tests
-5. Fixed 3 TypeScript errors
-
-What's next?
-```
-
-**Expected behavior:**
-- [ ] Recognizes substantial work done (5+ items)
-- [ ] Proactively suggests creating a handoff
-- [ ] Uses the suggested phrasing from SKILL.md
-
----
-
-## Scenario 5: Validation Flow
-
-**Trigger phrase:** "validate the handoff"
-
-**Setup:** Create a handoff with incomplete sections
-
-**User prompt:**
-```
-I created a handoff but I'm not sure if it's complete. Can you validate it?
-```
-
-**Expected behavior:**
-- [ ] Runs `validate_handoff.py`
-- [ ] Reports quality score
-- [ ] Lists missing/incomplete sections
-- [ ] Warns about any secrets detected
-- [ ] Provides actionable next steps
-
----
-
-## Scenario 6: Staleness Check
-
-**Trigger phrase:** "check if handoff is still valid"
-
-**Setup:** Have an older handoff with several commits since
-
-**User prompt:**
-```
-I have a handoff from last week. Is it still relevant or should I
-create a new one?
-```
-
-**Expected behavior:**
-- [ ] Runs `check_staleness.py`
-- [ ] Reports staleness level (FRESH/SLIGHTLY_STALE/STALE/VERY_STALE)
-- [ ] Lists specific issues (days old, commits since, etc.)
-- [ ] Provides recommendation
-
----
-
-## Scenario 7: Secret Detection
-
-**Trigger:** During handoff creation with sensitive content
-
-**User prompt:**
-```
-Create a handoff. Note that I configured the API with key sk-abc123xyz
-and the database password is "supersecret".
-```
-
-**Expected behavior:**
-- [ ] Creates handoff
-- [ ] Runs validation
-- [ ] Detects potential secrets
-- [ ] Warns user about security risk
-- [ ] Recommends removing sensitive data
-
----
-
-## Scoring Rubric
-
-For each scenario, score:
-
-| Criterion | Points | Description |
-|-----------|--------|-------------|
-| Triggers correctly | 2 | Skill activates on trigger phrase |
-| Follows workflow | 3 | Executes steps in correct order |
-| Uses scripts | 2 | Runs appropriate Python scripts |
-| Output quality | 2 | Produces useful, accurate output |
-| Error handling | 1 | Handles edge cases gracefully |
-| **Total** | **10** | Per scenario |
-
-**Pass threshold:** 7/10 per scenario
-
----
-
-## Results Template
-
-```markdown
-## Test Results: [Model Name]
-
-Date: YYYY-MM-DD
-Model: [haiku/sonnet/opus]
-Skill version: session-handoff
-
-| Scenario | Score | Notes |
-|----------|-------|-------|
-| 1. Basic Creation | /10 | |
-| 2. Chaining | /10 | |
-| 3. Resume | /10 | |
-| 4. Proactive | /10 | |
-| 5. Validation | /10 | |
-| 6. Staleness | /10 | |
-| 7. Secret Detection | /10 | |
-| **Total** | /70 | |
-
-### Issues Found
--
-
-### Recommendations
--
-```
diff --git a/skills/ship-learn-next/SKILL.md b/skills/ship-learn-next/SKILL.md
index f8570fc..a4f0b81 100644
--- a/skills/ship-learn-next/SKILL.md
+++ b/skills/ship-learn-next/SKILL.md
@@ -1,328 +1,132 @@
 ---
 name: ship-learn-next
-description: Transform learning content (like YouTube transcripts, articles, tutorials) into actionable implementation plans using the Ship-Learn-Next framework. Use when user wants to turn advice, lessons, or educational content into concrete action steps, reps, or a learning quest.
+description: "Extract actionable learning reps from any educational content — YouTube transcripts, articles, tutorials, or course notes. Produces a three-part plan: what to Ship (build now to prove understanding), what to Learn (specific gaps to fill), and what to Do Next (next learning step). Use when you have consumed content and want concrete next actions, not a summary. Triggers: make this actionable, turn this into a plan, what should I build from this, learning plan, extract takeaways."
 allowed-tools:
   - Read
   - Write
 ---
 
-# Ship-Learn-Next Action Planner
+## Mindset
 
-This skill helps transform passive learning content into actionable **Ship-Learn-Next cycles** - turning advice and lessons into concrete, shippable iterations.
+**Reps reveal truth; plans reveal preferences.** A user's plan reflects what they wish were true. Their first rep reveals what is actually true. Design Rep 1 specifically to surface the gap.
 
-## When to Use This Skill
+**Scope is the enemy of rep 1.** Most people size their first rep at "learning the skill" rather than "producing one output." A rep that produces zero artifacts teaches zero. The practitioner question is: what is the smallest possible artifact that would falsify or confirm the advice?
 
-Activate when the user:
-- Has a transcript/article/tutorial and wants to "implement the advice"
-- Asks to "turn this into a plan" or "make this actionable"
-- Wants to extract implementation steps from educational content
-- Needs help breaking down big ideas into small, shippable reps
-- Says things like "I watched/read X, now what should I do?"
+**Content type determines extraction strategy.** YouTube speakers bury technique inside story; extract the behavior, not the anecdote. Articles front-load argument and back-load method; read the how-to sections first. Courses are 20% essential, 80% depth; identify the 20% before building the plan.
 
-## Core Framework: Ship-Learn-Next
+**Reflection without measurement drifts to narrative.** Users will say "it went okay" unless prompted for specific numbers (sent/opened/responded, shipped/unshipped, score/10). Build measurement into success criteria at design time, not after.
 
-Every learning quest follows three repeating phases:
+**The second rep is worth more than the first plan.** The plan is a hypothesis. Rep 2 is the first iteration with evidence. Don't over-architect reps 3-5 before the user has shipped once.
 
-1. **SHIP** - Create something real (code, content, product, demonstration)
-2. **LEARN** - Honest reflection on what happened
-3. **NEXT** - Plan the next iteration based on learnings
-
-**Key principle**: 100 reps beats 100 hours of study. Learning = doing better, not knowing more.
-
-## How This Skill Works
-
-### Step 1: Read the Content
-
-Read the file the user provides (transcript, article, notes):
-
-```bash
-# User provides path to file
-FILE_PATH="/path/to/content.txt"
-```
-
-Use the Read tool to analyze the content.
-
-### Step 2: Extract Core Lessons
-
-Identify from the content:
-- **Main advice/lessons**: What are the key takeaways?
-- **Actionable principles**: What can actually be practiced?
-- **Skills being taught**: What would someone learn by doing this?
-- **Examples/case studies**: Real implementations mentioned
-
-**Do NOT**:
-- Summarize everything (focus on actionable parts)
-- List theory without application
-- Include "nice to know" vs "need to practice"
-
-### Step 3: Define the Quest
-
-Help the user frame their learning goal:
-
-Ask:
-1. "Based on this content, what do you want to achieve in 4-8 weeks?"
-2. "What would success look like? (Be specific)"
-3. "What's something concrete you could build/create/ship?"
-
-**Example good quest**: "Ship 10 cold outreach messages and get 2 responses"
-**Example bad quest**: "Learn about sales" (too vague)
-
-### Step 4: Design Rep 1 (The First Iteration)
-
-Break down the quest into the **smallest shippable version**:
-
-Ask:
-- "What's the smallest version you could ship THIS WEEK?"
-- "What do you need to learn JUST to do that?" (not everything)
-- "What would 'done' look like for rep 1?"
-
-**Make it:**
-- Concrete and specific
-- Completable in 1-7 days
-- Produces real evidence/artifact
-- Small enough to not be intimidating
-- Big enough to learn something meaningful
-
-### Step 5: Create the Rep Plan
-
-Structure each rep with:
+---
 
-```markdown
-## Rep 1: [Specific Goal]
+## Navigation
 
-**Ship Goal**: [What you'll create/do]
-**Success Criteria**: [How you'll know it's done]
-**What You'll Learn**: [Specific skills/insights]
-**Resources Needed**: [Minimal - just what's needed for THIS rep]
-**Timeline**: [Specific deadline]
+**Use this skill when**:
+- User provides learning content (transcript, article, notes, course) and wants to act on it
+- User says "make this actionable", "turn this into a plan", "I watched/read X, now what?"
+- User wants to build a skill through practice, not study
+- User has a goal but hasn't started shipping toward it
 
-**Action Steps**:
-1. [Concrete step 1]
-2. [Concrete step 2]
-3. [Concrete step 3]
-...
+**Do NOT use this skill when**:
+- User wants a reading list or course recommendations (this is anti-pattern territory)
+- User wants a study schedule (redirect: study schedules are for exams, reps are for skills)
+- User is mid-rep and needs help executing (use the appropriate domain skill instead)
+- No learning content is provided — ask for it before proceeding
 
-**After Shipping - Reflection Questions**:
-- What actually happened? (Be specific)
-- What worked? What didn't?
-- What surprised you?
-- On a scale of 1-10, how did this rep go?
-- What would you do differently next time?
+**Ambiguous input decision tree**:
 ```
-
-### Step 6: Map Future Reps (2-5)
-
-Based on the content, suggest a progression:
-
-```markdown
-## Rep 2: [Next level]
-**Builds on**: What you learned in Rep 1
-**New challenge**: One new thing to try/improve
-**Expected difficulty**: [Easier/Same/Harder - and why]
-
-## Rep 3: [Continue progression]
-...
+User provides content → Read it, extract lessons, proceed
+User describes content without providing it → Ask for the file/text
+User has a goal but no content → Ask: "What advice/content are you drawing this from?"
+User has both content AND a vague goal → Clarify the goal before designing Rep 1
 ```
 
-**Progression principles**:
-- Each rep adds ONE new element
-- Increase difficulty based on success
-- Reference specific lessons from the content
-- Keep reps shippable (not theoretical)
-
-### Step 7: Connect to Content
-
-For each rep, reference the source material:
-
-- "This implements the [concept] from minute X"
-- "You're practicing the [technique] mentioned in the video"
-- "This tests the advice about [topic]"
-
-**But**: Always emphasize DOING over studying. Point to resources only when needed for the specific rep.
-
-## Conversation Style
-
-**Direct but supportive**:
-- No fluff, but encouraging
-- "Ship it, then we'll improve it"
-- "What's the smallest version you could do this week?"
-
-**Question-driven**:
-- Make them think, don't just tell
-- "What exactly do you want to achieve?" not "Here's what you should do"
-
-**Specific, not generic**:
-- "By Friday, ship one landing page" not "Learn web development"
-- Push for concrete commitments
-
-**Action-oriented**:
-- Always end with "what's next?"
-- Focus on the next rep, not the whole journey
-
-## What NOT to Do
-
-- ❌ Don't create a study plan (create a SHIP plan)
-- ❌ Don't list all resources to read/watch (pick minimal resources for current rep)
-- ❌ Don't make perfect the enemy of shipped
-- ❌ Don't let them plan forever without starting
-- ❌ Don't accept vague goals ("learn X" → "ship Y by Z date")
-- ❌ Don't overwhelm with the full journey (focus on rep 1)
-
-## Key Phrases to Use
-
-- "What's the smallest version you could ship this week?"
-- "What do you need to learn JUST to do that?"
-- "This isn't about perfection - it's rep 1 of 100"
-- "Ship something real, then we'll improve it"
-- "Based on [content], what would you actually DO differently?"
-- "Learning = doing better, not knowing more"
-
-## Example Output Structure
-
-```markdown
-# Your Ship-Learn-Next Quest: [Title]
-
-## Quest Overview
-**Goal**: [What they want to achieve in 4-8 weeks]
-**Source**: [The content that inspired this]
-**Core Lessons**: [3-5 key actionable takeaways from content]
-
 ---
 
-## Rep 1: [Specific, Shippable Goal]
-
-**Ship Goal**: [Concrete deliverable]
-**Timeline**: [This week / By [date]]
-**Success Criteria**:
-- [ ] [Specific thing 1]
-- [ ] [Specific thing 2]
-- [ ] [Specific thing 3]
-
-**What You'll Practice** (from the content):
-- [Skill/concept 1 from source material]
-- [Skill/concept 2 from source material]
+## Philosophy
 
-**Action Steps**:
-1. [Concrete step]
-2. [Concrete step]
-3. [Concrete step]
-4. Ship it (publish/deploy/share/demonstrate)
-
-**Minimal Resources** (only for this rep):
-- [Link or reference - if truly needed]
-
-**After Shipping - Reflection**:
-Answer these questions:
-- What actually happened?
-- What worked? What didn't?
-- What surprised you?
-- Rate this rep: _/10
-- What's one thing to try differently next time?
+The bottleneck is never content — it's reps. Every rep produces more usable learning than any amount of consumption, because the artifact surfaces what the advice actually requires vs. what the user imagined it required.
 
 ---
 
-## Rep 2: [Next Iteration]
-
-**Builds on**: Rep 1 + [what you learned]
-**New element**: [One new challenge/skill]
-**Ship goal**: [Next concrete deliverable]
+## NEVER
 
-[Similar structure...]
+- NEVER design Rep 1 that takes longer than 7 days — because urgency is the mechanism; longer reps let the user exit the loop before the feedback lands, and the plan dies.
+- NEVER make the quest goal a learning outcome ("understand cold email") — because learning outcomes have no artifact and cannot fail, making reflection impossible; goals must produce observable outputs.
+- NEVER list resources beyond what blocks Rep 1 — because every resource added is an invitation to consume instead of ship, which is the exact behavior this framework exists to break.
+- NEVER generate reps 3-5 with full detail before rep 1 ships — because detailed future reps are a comfort object that substitutes for the discomfort of starting; they also become wrong the moment rep 1 reveals real constraints.
+- NEVER accept "this week" as a deadline — because it's a non-commitment; always resolve to a specific day and optionally time, which surfaces whether the user has actually blocked time or is optimistically planning.
+- NEVER skip the reflection structure — because the LEARN phase is where the framework's compounding value lives; a rep without structured reflection is just a task, not a learning cycle.
+- NEVER let a quest contain more than ONE primary skill being practiced — because skill isolation is what makes reps measurable; multi-skill quests produce ambiguous feedback ("did I fail at writing or at research?").
 
 ---
 
-## Rep 3-5: Future Path
+## When Things Go Wrong
 
-**Rep 3**: [Brief description]
-**Rep 4**: [Brief description]
-**Rep 5**: [Brief description]
-
-*(Details will evolve based on what you learn in Reps 1-2)*
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| User returns after rep 1 with "it went fine, nothing to reflect on" | Reflection prompts were too open-ended; no measurement criterion was embedded | Ask for the specific number (emails sent, responses received, score/10, artifacts produced) — vague reports indicate the success criteria weren't concrete enough |
+| Rep 1 isn't shipped after 2+ weeks | Rep was sized too large, or user had an unacknowledged prerequisite (tool, permission, audience) | Diagnose: "What specifically stopped you?" then cut the rep in half or isolate the prerequisite as a mini-rep-0 |
+| Content has no actionable advice — it's all theory | Source is a thought leadership piece or overview, not a practitioner guide | Reframe: what does the user already do that this content comments on? Design a rep around their existing behavior, annotated by the content |
+| User wants to plan reps 6-20 before shipping rep 1 | Anxiety disguised as planning; planning feels safer than shipping | Name it directly: "Planning reps 6-20 is a way to avoid rep 1. What's stopping you from starting rep 1 today?" |
+| Quest goal shifts between reps | User is discovering a more interesting adjacent problem | Acceptable after rep 2+; before that, preserve the original goal so the first rep has a fair test |
 
 ---
 
-## Remember
-
-- This is about DOING, not studying
-- Aim for 100 reps over time (not perfection on rep 1)
-- Each rep = Plan → Do → Reflect → Next
-- You learn by shipping, not by consuming
+## Workflow
 
-**Ready to ship Rep 1?**
-```
+### 1. Ingest the content
+Read the file provided. If no file: ask for it. Do not proceed without source material.
 
-## Processing Different Content Types
+See [references/content-extraction.md](references/content-extraction.md) for extraction patterns by content type.
 
-### YouTube Transcripts
-- Focus on advice, not stories
-- Extract concrete techniques mentioned
-- Identify case studies/examples to replicate
-- Note timestamps for reference later (but don't require watching again)
+### 2. Extract lessons (actionable only)
 
-### Articles/Tutorials
-- Identify the "now do this" parts vs theory
-- Extract the specific workflow/process
-- Find the minimal example to start with
+Three-filter test for each lesson:
+1. **Behavior test**: Is this something the user must *do differently*, not just *know*?
+2. **Rep test**: Can this be practiced in a 1-7 day rep?
+3. **Artifact test**: Does practicing it produce something observable?
 
-### Course Notes
-- What's the smallest project from the course?
-- Which modules are needed for rep 1? (ignore the rest for now)
-- What can be practiced immediately?
+Drop anything that fails filter 1. Downgrade to "resources" anything that passes filter 1 but fails filter 2 or 3.
 
-## Success Metrics
+### 3. Define the quest
 
-A good Ship-Learn-Next plan has:
-- ✅ Specific, shippable rep 1 (completable in 1-7 days)
-- ✅ Clear success criteria (user knows when they're done)
-- ✅ Concrete artifacts (something real to show)
-- ✅ Direct connection to source content
-- ✅ Progression path for reps 2-5
-- ✅ Emphasis on action over consumption
-- ✅ Honest reflection built in
-- ✅ Small enough to start today, big enough to learn
+Ask the user: "In 4-8 weeks, what would you have **shipped** that would make this content feel worth your time?"
 
-## Saving the Plan
+Push until you get: a specific artifact + a number + a timeframe.
+- Bad: "Get better at sales"
+- Good: "Book 3 discovery calls from cold outreach in 6 weeks"
 
-**IMPORTANT**: Always save the plan to a file for the user.
+### 4. Size Rep 1
 
-### Filename Convention
+Rep 1 must:
+- Produce ONE artifact (not a list of tasks)
+- Ship in 1-7 days
+- Require only what the user already has access to
+- Be embarrassingly smaller than the quest goal
 
-Always use the format:
-- `Ship-Learn-Next Plan - [Brief Quest Title].md`
+Ask: "What's the smallest version of this that would still teach you something real?"
 
-Examples:
-- `Ship-Learn-Next Plan - Build in Proven Markets.md`
-- `Ship-Learn-Next Plan - Learn React.md`
-- `Ship-Learn-Next Plan - Cold Email Outreach.md`
+Use the domain sizing table in [references/rep-templates.md](references/rep-templates.md) if the user's domain is listed.
 
-**Quest title should be**:
-- Brief (3-6 words)
-- Descriptive of the main goal
-- Based on the content's core lesson/theme
+### 5. Build the plan
 
-### What to Save
+Use templates from [references/rep-templates.md](references/rep-templates.md).
 
-**Complete plan including**:
-- Quest overview with goal and source
-- All reps (1-5) with full details
-- Action steps and reflection questions
-- Timeline commitments
-- Reference to source material
+Resolve every deadline to a specific day. Replace "this week" with "by [day]".
 
-**Format**: Always save as Markdown (`.md`) for readability
+### 6. Save the plan
 
-## After Creating the Plan
+Filename: `Ship-Learn-Next Plan - [Quest Title].md`
+Quest title: 3-6 words, outcome-focused (not topic-focused).
 
-**Display to user**:
-1. Show them you've saved the plan: "✓ Saved to: [filename]"
-2. Give a brief overview of the quest
-3. Highlight Rep 1 (what's due this week)
+Always save before presenting. Then show the user Rep 1 only.
 
-**Then ask**:
-1. "When will you ship Rep 1?"
-2. "What's the one thing that might stop you? How will you handle it?"
-3. "Come back after you ship and we'll reflect + plan Rep 2"
+### 7. Close
 
-**Remember**: You're not creating a curriculum. You're helping them ship something real, learn from it, and ship the next thing.
+After showing Rep 1, ask exactly two questions:
+1. "What day will you ship Rep 1?"
+2. "What's the one thing most likely to stop you — and how will you handle it?"
 
-Let's help them ship.
+Do not ask for a commitment on the full quest. Commitment on rep 1 is sufficient.
diff --git a/skills/ship-learn-next/references/content-extraction.md b/skills/ship-learn-next/references/content-extraction.md
new file mode 100644
index 0000000..5587a80
--- /dev/null
+++ b/skills/ship-learn-next/references/content-extraction.md
@@ -0,0 +1,60 @@
+# Content Extraction by Source Type
+
+## Extraction Decision Tree
+
+```
+Is the lesson a behavior change or a knowledge fact?
+  → Behavior change: KEEP (it's actionable)
+  → Knowledge fact only: DROP (user can't practice a fact)
+
+Does the advice require reps to internalize?
+  → Yes: Include in plan
+  → No (one-time setup): Put in Resources section only
+
+Can the user produce an artifact from this advice within 7 days?
+  → Yes: Design a rep around it
+  → No: Break it down further until they can
+```
+
+## YouTube Transcripts
+
+**Extract**:
+- Specific techniques with named steps (not vague principles)
+- Case studies/examples the speaker used — these become Rep 1 templates
+- Timestamps only when the user will need to re-reference mid-rep
+
+**Drop**:
+- Motivational framing, backstory, intros
+- Theory without a corresponding practice
+- Advice that requires tools/resources not yet available
+
+**Trap**: Speakers often bury the actual technique inside a story. Extract the behavior, not the anecdote.
+
+## Articles / Blog Posts
+
+**Extract**:
+- The "how to" sections verbatim — these map to action steps
+- Any stated metrics ("they sent 50 emails and got 3 responses") — these calibrate Rep 1 success criteria
+- The author's specific sequence, not just their conclusion
+
+**Drop**:
+- Statistical claims without a corresponding action
+- Comparisons to other methods (irrelevant until Rep 3+)
+
+## Course Notes
+
+**Extract**:
+- The smallest project from the course
+- Which modules enable Rep 1 (ignore the rest until needed)
+- Any exercises or drills — these are pre-designed reps
+
+**Pattern**: Most courses are structured as 20% essential + 80% depth. Identify the 20% for Rep 1.
+
+## Books / Long-Form
+
+**Extract**:
+- The author's core process (often in one chapter)
+- Any frameworks with named steps
+- Specific examples you can replicate
+
+**Trap**: Books optimized for sales often front-load motivation and back-load the actual technique. Scan the table of contents for "how to" chapters first.
diff --git a/skills/ship-learn-next/references/rep-templates.md b/skills/ship-learn-next/references/rep-templates.md
new file mode 100644
index 0000000..fec082d
--- /dev/null
+++ b/skills/ship-learn-next/references/rep-templates.md
@@ -0,0 +1,86 @@
+# Rep Templates & Output Structure
+
+## Rep Plan Template
+
+```markdown
+## Rep [N]: [Specific, Shippable Goal]
+
+**Ship Goal**: [Concrete deliverable — a URL, file, sent message, published post]
+**Timeline**: By [specific date, not "this week"]
+**Success Criteria**:
+- [ ] [Observable artifact exists]
+- [ ] [Delivery action taken — published/sent/shared]
+- [ ] [One measurement recorded]
+
+**What You'll Practice** (from the content):
+- [Skill/concept 1 mapped to source material]
+- [Skill/concept 2 mapped to source material]
+
+**Action Steps**:
+1. [First concrete action — verb + noun + time estimate]
+2. [Second concrete action]
+3. [Third concrete action]
+4. SHIP: [Specific delivery action — deploy/send/publish/share]
+
+**Minimal Resources** (only what's needed for THIS rep):
+- [Reference only if genuinely blocking — otherwise omit]
+
+**After Shipping — Reflection**:
+- What actually happened? (facts, not feelings)
+- What worked? What didn't?
+- What surprised you?
+- Rep score: _/10
+- One thing to try differently in Rep 2:
+```
+
+## Rep Progression Structure
+
+```markdown
+## Rep 2: [Next Iteration]
+**Builds on**: [Specific insight from Rep 1 reflection]
+**New element**: [ONE new variable introduced]
+**Ship goal**: [Next concrete deliverable]
+
+## Rep 3–5: Future Path
+*(Outline only — details evolve from actual Rep 1-2 learnings)*
+**Rep 3**: [Brief description]
+**Rep 4**: [Brief description]
+**Rep 5**: [Brief description]
+```
+
+## Quest Overview Template
+
+```markdown
+# Ship-Learn-Next Quest: [3-6 word title]
+
+**Goal**: [What you want to achieve — measurable, 4-8 weeks]
+**Source**: [Content that generated this plan]
+**Core Lessons** (actionable only):
+1. [Lesson → How to practice it]
+2. [Lesson → How to practice it]
+3. [Lesson → How to practice it]
+```
+
+## File Naming Convention
+
+`Ship-Learn-Next Plan - [Brief Quest Title].md`
+
+Examples:
+- `Ship-Learn-Next Plan - Cold Email 10 Responses.md`
+- `Ship-Learn-Next Plan - React Portfolio Launch.md`
+- `Ship-Learn-Next Plan - Proven Market Validation.md`
+
+Quest title: 3-6 words, action-oriented, outcome-focused (not topic-focused).
+
+## Rep Sizing Reference
+
+| Domain | Rep 1 Target | Typical Duration |
+|--------|-------------|-----------------|
+| Writing/content | 1 published piece | 2-3 days |
+| Cold outreach | 5-10 sent messages | 1-2 days |
+| Code/software | Deployed hello-world version | 3-5 days |
+| Sales/bizdev | 1 real conversation | 1-3 days |
+| Creative (design, audio) | 1 finished artifact shared | 2-4 days |
+| Teaching/coaching | 1 session delivered | 1 week |
+
+**Calibration rule**: If Rep 1 takes longer than 7 days, it's too big. Cut scope, not quality.
diff --git a/skills/skill-creator/SKILL.md b/skills/skill-creator/SKILL.md
new file mode 100644
index 0000000..1f3dbd5
--- /dev/null
+++ b/skills/skill-creator/SKILL.md
@@ -0,0 +1,158 @@
+---
+name: skill-creator
+description: Create high-quality SKILL.md files that pass skill-judge evaluation. Use when authoring a new skill from scratch, deciding what expert knowledge to encode, structuring SKILL.md sections (Mindset/Navigation/Philosophy/NEVER/When-Things-Go-Wrong), writing descriptions that trigger activation, calibrating knowledge delta, or diagnosing why a draft skill scores poorly. Triggers on phrases like "create a new skill", "write a SKILL.md", "help me author a skill", "what should go in my skill", "how do I write a good skill".
+---
+
+# Skill Creator
+
+Build skills that pass production — not just skills that look good on paper.
+
+---
+
+## Mindset
+
+1. **Description first, body second — always.** The description is the only thing Claude reads before deciding to activate a skill. A perfect body with a vague description is an invisible skill. Start every skill session by nailing the description; everything else is secondary.
+2. **You are writing for Claude, not for a human.** Humans need context, background, definitions. Claude does not. Every line should be a decision framework, a non-obvious heuristic, or a practitioner-only edge case — not an explanation of what the domain is.
+3. **Knowledge delta is the only currency.** Ask for every paragraph: "Does Claude already know this?" If yes, delete it. Claude's training covers documentation, tutorials, and best practices. What it lacks is the earned judgment that comes from repeated failure in a specific domain.
+4. **The NEVER list is half the skill.** Expert knowledge is as much about what to avoid as what to do. A skill without a NEVER section is missing 50% of its value. Each NEVER item must name the specific action AND its non-obvious consequence — not a warning, a fact.
+5. **Progressive disclosure is not optional.** SKILL.md under 500 lines, references/ for depth. But a reference file with no loading trigger in SKILL.md is dead content. Wire it explicitly or don't create it.
+
+---
+
+## Navigation
+
+**Use this skill when**:
+- Authoring a SKILL.md file from scratch
+- Identifying what expert knowledge is worth encoding vs. what Claude already knows
+- Deciding what goes in SKILL.md body vs. references/
+- Writing a description that reliably triggers activation
+- Diagnosing a draft skill that scores poorly on skill-judge
+
+**Do NOT use this skill when**:
+- Evaluating an existing skill for quality (use skill-judge)
+- Refactoring bloated plugin instruction files (use agent-md-refactor)
+- Wiring a skill into a plugin manifest (use plugin-forge)
+
+**Decision tree — what to do first**:
+
+| User starts with... | First action |
+|---------------------|-------------|
+| "I want to create a skill about X" | Ask: what does a practitioner know about X that took years to learn? Start there. |
+| Existing notes / docs / spec to convert | Run knowledge delta scan — mark [E] expert, [A] activation-ok, [R] redundant — before writing a single line of SKILL.md |
+| A draft SKILL.md that scored below B on skill-judge | Load [`references/antipatterns.md`](references/antipatterns.md) — identify which of the three failure patterns the draft matches |
+| A skill that never activates despite good content | Description problem — rewrite description before touching the body |
+
+---
+
+## Philosophy
+
+A skill is a compressed expert brain, not a tutorial. The moment you find yourself explaining what something is, stop — that belongs in documentation, and Claude already read the documentation. The question is never "is this correct?" It is always "would a domain expert say this captures knowledge that took years to earn?"
+
+The test: would someone who has done this work for five years, reading your SKILL.md, say "yes — that captures the hard-won judgment I couldn't find anywhere else"? If the answer is "it's good but I could have read this from the docs", the skill fails.
+
+---
+
+## NEVER
+
+- **NEVER start writing SKILL.md content before the description field is finalized** — the description drives skill activation; if it doesn't reliably trigger on the right phrases, no amount of body quality saves it. A skill that never activates has zero value regardless of its score.
+- **NEVER include a README.md with content that duplicates SKILL.md** — the agent reads SKILL.md exclusively; README.md is for humans browsing a repo. Duplicate content means you'll maintain two versions that diverge, and the agent only ever sees the SKILL.md copy, even when the README is the one you last edited.
+- **NEVER add a references/ file without a MANDATORY loading trigger in SKILL.md** — files in references/ are only loaded when SKILL.md explicitly tells Claude to load them via a decision tree or workflow step. A list of references at the bottom without "load X when Y" instructions is Pattern 3 (Orphan References) — the files are never read.
+- **NEVER write a NEVER item without a specific non-obvious consequence** — "NEVER do X" with no reason is useless because Claude doesn't know why to avoid it and can't weigh it against competing considerations. The consequence must be non-obvious: something that a reasonable person would not infer without domain experience.
+- **NEVER create a skill by wrapping official documentation** — Claude's training already covers official docs, README files, and tutorials. Wrapping them adds tokens without adding knowledge delta. Provide practitioner judgment: what goes wrong, what edge cases kill you, what the docs don't say.
+- **NEVER write a description without WHAT, WHEN, and trigger KEYWORDS** — descriptions that describe but don't trigger fail in production. The WHEN clause ("Use when...") is mandatory. Domain-specific trigger phrases ("incidents", "playbooks", "deploy", etc.) must appear verbatim so the routing matches natural user phrasing.
+- **NEVER mistake length for quality** — a 50-line skill with pure expert knowledge outperforms a 400-line skill with 70% redundancy. Long skills pay a context cost on every activation. Every line must earn its tokens.
+
+---
+
+## Workflow: Building a Skill from Scratch
+
+### Step 1 — Identify the knowledge worth encoding
+
+Before writing anything, answer these questions:
+- What does an expert in this domain know that took years (not hours) to learn?
+- What are the common mistakes practitioners make that aren't obvious from documentation?
+- What decision frameworks do experts use that they couldn't easily articulate if asked?
+- What are the top 5 things an expert would never do, and WHY (specifically)?
+
+If you can't answer these, the skill isn't ready to write yet.
+
+### Step 2 — Write the description (before anything else)
+
+A valid description answers all three:
+- **WHAT**: what does this skill help with?
+- **WHEN**: what situations trigger its use? Include a "Use when..." clause.
+- **KEYWORDS**: what domain-specific terms would a user naturally say?
+
+Template:
+```
+[Action verbs covering the core function]. Use when [specific trigger scenario 1], [trigger scenario 2], or [trigger scenario 3]. Triggers on phrases like "[natural phrase 1]", "[natural phrase 2]", "[domain term]".
+```
+
+Test: read your description and ask "would someone asking '[trigger phrase]' get routed here?" If not, rewrite.
+
+### Step 3 — Knowledge delta scan on your raw material
+
+If converting notes, docs, or a draft:
+- Mark each paragraph **[E] Expert** — Claude genuinely doesn't know this
+- Mark each paragraph **[A] Activation** — Claude knows this, but a brief reminder is useful
+- Mark each paragraph **[R] Redundant** — Claude definitely knows this
+
+Target ratio: >70% E, <20% A, <10% R. Delete all R content before writing SKILL.md.
+
+### Step 4 — Structure the body
+
+Use the canonical pattern: **Mindset → Navigation → Philosophy → NEVER → Workflow/Process → When Things Go Wrong**
+
+| Section | What it contains | Common mistake |
+|---------|-----------------|----------------|
+| Mindset | 4-6 numbered principles that transfer expert thinking patterns — not facts, not procedures | Making it a list of tips instead of a compressed worldview |
+| Navigation | Use/Don't Use cases + decision tree for ambiguous inputs | Forgetting the "Do NOT use when" clause — causes cross-skill routing errors |
+| Philosophy | 2-4 sentences: the organizing insight that makes the other sections coherent | Writing a mission statement instead of a practitioner truth |
+| NEVER | 5-7 items: specific action + specific non-obvious consequence | Writing 3 obvious warnings instead of 5 practitioner-only failure modes |
+| Workflow/Process | Step-by-step with decision points — use tables and checklists | Prose descriptions instead of structured steps |
+| When Things Go Wrong | Table: Symptom → Likely Cause → Fix | Missing — this is the section most likely to be omitted |
+
+### Step 5 — Wire references/ correctly
+
+If deep reference material exists or is needed:
+1. Create `references/<filename>.md`
+2. In SKILL.md, add an explicit conditional: "Load `references/<filename>.md` when [specific scenario]"
+3. The trigger must appear in a decision tree or workflow step — not in a "see also" footer
+
+### Step 6 — Run the self-test
+
+Before submitting for skill-judge review, ask:
+- Does the description include WHAT + WHEN + keywords?
+- Is SKILL.md under 500 lines?
+- Does every NEVER item state the specific non-obvious consequence?
+- Are all references/ files wired to loading triggers (not just listed)?
+- Does the Mindset section transfer thinking patterns, not just facts?
+- Would a domain expert say "yes, this captures what took me years to learn"?
+
+Scoring the draft yourself against skill-judge's 8 dimensions before review catches roughly 80% of issues.
+
+---
+
+## The Three Failure Patterns to Avoid
+
+Load [`references/antipatterns.md`](references/antipatterns.md) for full detail on these patterns. Summary:
+
+| Pattern | What it looks like | Why it fails |
+|---------|-------------------|-------------|
+| Tutorial Pattern | SKILL.md explains what the domain is, walks through basics, defines terms | Everything in it is in Claude's training; zero knowledge delta; the skill scores F on D1 |
+| Dump Pattern | SKILL.md is every note, link, checklist, and thought you ever had about the topic | 500+ lines, <30% expert content; context cost is massive; no compression of wisdom |
+| Orphan References Pattern | references/ directory exists with detailed files, nothing in SKILL.md points to them conditionally | Reference files are never loaded; effectively invisible; wastes disk and misleads authors |
+
+---
+
+## When Things Go Wrong
+
+| Problem | Likely Cause | Fix |
+|---------|-------------|-----|
+| Skill never activates even though content is relevant | Description missing WHEN clause or domain trigger keywords | Rewrite description: add "Use when...", add 2-3 verbatim trigger phrases users would say |
+| Skill activates for the wrong tasks | Description is too broad — WHAT clause describes too many scenarios | Add "Do NOT use when..." clause to description; tighten the WHEN clause |
+| Skill scored below C on skill-judge D1 (Knowledge Delta) | Body contains basics Claude knows — tutorials, definitions, documentation rewrites | Run knowledge delta scan (Step 3); delete all [R] content; replace with practitioner edge cases |
+| NEVER items scored 1-3/15 by skill-judge D3 | NEVER items are vague warnings without specific consequences | Rewrite each: "NEVER [specific action] — [specific non-obvious consequence that only practitioners know]" |
+| References/ files exist but are never loaded | No conditional loading triggers in SKILL.md body | Add decision tree rows or workflow steps that say exactly when to load each file |
+| Skill looks complete but domain expert says "this is too basic" | Knowledge delta not sourced from actual practitioner experience | Interview a practitioner or draw on first-hand failure modes; replace textbook knowledge with scar tissue |
+| Draft is over 500 lines | Mixed SKILL.md + reference content into one file | Move deep reference material to references/ files with loading triggers; keep SKILL.md to decision frameworks only |
diff --git a/skills/skill-creator/references/antipatterns.md b/skills/skill-creator/references/antipatterns.md
new file mode 100644
index 0000000..282d0e6
--- /dev/null
+++ b/skills/skill-creator/references/antipatterns.md
@@ -0,0 +1,104 @@
+# Skill Anti-Patterns — Detailed Reference
+
+Load this file when: a draft skill matches one of the three patterns below, or when skill-judge scores D1 below 12/20.
+
+---
+
+## Pattern 1: The Tutorial Pattern
+
+**What it looks like**
+
+SKILL.md opens with "What is X?", defines terms, walks through a beginner workflow, and closes with "Next steps." It reads like a blog post or onboarding doc.
+
+**Why it fails**
+
+Claude's training includes most published tutorials, official documentation, and introductory guides for every major domain. A skill that recaps this content provides zero knowledge delta. On skill-judge D1, it will score 4-8/20. On D2, it will score 4-6/15 because it transfers no thinking patterns, only facts Claude already has.
+
+**Diagnostic test**
+
+Search your SKILL.md for these phrases:
+- "X is a..."
+- "To get started..."
+- "The following steps..."
+- "For more information, see..."
+- Any sentence that defines a term or concept
+
+If you find three or more of these, you have the Tutorial Pattern.
+
+**Fix**
+
+Delete every definition and introductory paragraph. Replace them with:
+- The decision you make that beginners get wrong
+- The edge case the documentation doesn't cover
+- The failure mode that only surfaces in production
+
+Example replacement:
+- Instead of: "A playbook is a structured response procedure for security incidents."
+- Write: "Playbooks fail at the branch points — the places where the procedure says 'if X, do Y' but doesn't say how to determine whether X is true. Every NEVER item should be about a branch point failure, not a general warning."
+
+---
+
+## Pattern 2: The Dump Pattern
+
+**What it looks like**
+
+SKILL.md contains every note, link, checklist, decision tree, script snippet, and passing thought the author ever had about the domain. It is 600-900 lines. Sections are comprehensive. Nothing is missing. Nothing is compressed.
+
+**Why it fails**
+
+Every skill activation loads the full SKILL.md into context. A 700-line skill costs 700 lines of context on every trigger — whether the user needed 50 lines or all 700. The cognitive compression that makes expert knowledge valuable is absent: you have the raw material but not the insight. On skill-judge, it will score 4-8/20 on D1 (volume without delta), 4-8/15 on D5 (progressive disclosure failure), and likely fail D7 (pattern recognition) if sections aren't structured.
+
+**Diagnostic test**
+
+- Line count over 500? Likely Dump Pattern.
+- Does every section have roughly equal length? Dump Pattern.
+- Are there 3+ tables of reference data that never change? Dump Pattern.
+- Could you cut 40% of the content and lose nothing an expert would miss? Dump Pattern.
+
+**Fix**
+
+1. Apply the knowledge delta scan: mark [E], [A], [R] on every paragraph.
+2. Delete all [R] content.
+3. For [A] content: cut by 60%, keep only the trigger/reminder phrases.
+4. For [E] content that is deep reference material (tables, schemas, examples): move to `references/` files with loading triggers.
+5. What remains in SKILL.md should be decision frameworks and heuristics only.
+
+---
+
+## Pattern 3: The Orphan References Pattern
+
+**What it looks like**
+
+The skill has a `references/` directory with 3-5 well-written, detailed files. SKILL.md ends with a "Reference Docs" section listing them. When Claude activates the skill, it reads SKILL.md, sees the list, and does not load the files — because there is no conditional trigger telling it to.
+
+**Why it fails**
+
+Claude does not speculatively load files referenced in a list. It loads files only when explicitly instructed by a conditional in a decision tree or workflow step. A references/ directory without loading triggers is dead content — it wastes disk space, misleads the author into thinking depth is covered, and results in a skill that appears complete but behaves as if the reference files don't exist.
+
+**Diagnostic test**
+
+Look at every reference to a `references/` file in SKILL.md:
+- Is it in a decision tree row with a conditional ("Load X when Y")?
+- Is it in a workflow step with a conditional ("If scenario Z, load references/X.md")?
+
+If it is only in a list, table, or footer without a conditional trigger, it is an orphaned reference.
+
+**Fix**
+
+For each reference file, identify the specific scenario that warrants loading it:
+- "Load `references/scoring-rubric.md` when detailed rubric for any single dimension is needed"
+- "Load `references/failure-patterns.md` when diagnosing a structural failure"
+
+Add these as rows in a decision tree or as conditional steps in a workflow. If you cannot identify a specific scenario, delete the reference file — it is not needed.
+
+---
+
+## Scoring Impact of Each Pattern
+
+| Pattern | D1 Impact | D5 Impact | Other |
+|---------|-----------|-----------|-------|
+| Tutorial | -8 to -12 | none | D2 -6, D8 -4 |
+| Dump | -8 to -12 | -8 to -12 | D7 -4 |
+| Orphan References | none | -8 to -12 | D8 -4 |
+
+A skill with all three patterns simultaneously will score below 60/120 (F grade) regardless of how professional it looks.
diff --git a/skills/skill-judge/README.md b/skills/skill-judge/README.md
deleted file mode 100644
index 9e66a6e..0000000
--- a/skills/skill-judge/README.md
+++ /dev/null
@@ -1,245 +0,0 @@
-# Skill Judge
-
-A comprehensive evaluation framework for assessing Agent Skill quality against official specifications and best practices. This skill provides multi-dimensional scoring and actionable improvement suggestions for SKILL.md files and skill packages.
-
-## Purpose
-
-Skill Judge exists to solve a critical problem: **most Skills waste tokens on knowledge Claude already has**.
-
-The skill helps you evaluate whether a Skill actually adds value by measuring its "knowledge delta" - the gap between what the Skill provides and what Claude already knows. A good Skill should be a compressed expert brain, not a tutorial.
-
-### The Core Formula
-
-> **Good Skill = Expert-only Knowledge - What Claude Already Knows**
-
-This skill helps you identify:
-- Token-wasting redundant content (things Claude already knows)
-- Genuine expert knowledge that adds value
-- Structural issues that prevent Skills from being activated or used effectively
-
-## When to Use
-
-Use Skill Judge when you need to:
-
-- **Review a Skill before publishing**: Evaluate quality and identify improvements
-- **Audit existing Skills**: Systematic assessment against best practices
-- **Improve a SKILL.md file**: Get specific, actionable suggestions
-- **Learn Skill design patterns**: Understand what makes a great Skill
-- **Compare Skills**: Assess relative quality using consistent criteria
-
-**Trigger phrases**:
-- "Evaluate this skill"
-- "Review my SKILL.md"
-- "Audit this skill"
-- "Score this skill"
-- "How can I improve this skill?"
-- "Is this skill well-designed?"
-
-## How It Works
-
-### Evaluation Protocol
-
-1. **First Pass - Knowledge Delta Scan**: Read the SKILL.md and categorize each section as:
-   - **[E] Expert**: Claude genuinely doesn't know this (value-add)
-   - **[A] Activation**: Claude knows but brief reminder is useful (acceptable)
-   - **[R] Redundant**: Claude definitely knows this (should delete)
-
-2. **Structure Analysis**: Check frontmatter validity, line count, reference files, design pattern, and loading triggers
-
-3. **Score Each Dimension**: Evaluate against 8 dimensions with specific evidence and justifications
-
-4. **Calculate Total and Grade**: Sum scores (max 120 points) and assign grade
-
-5. **Generate Report**: Produce structured report with scores, critical issues, and improvements
-
-### The 8 Evaluation Dimensions (120 points total)
-
-| Dimension | Max Points | What It Measures |
-|-----------|------------|------------------|
-| **D1: Knowledge Delta** | 20 | Does the Skill add genuine expert knowledge? (THE CORE DIMENSION) |
-| **D2: Mindset + Procedures** | 15 | Does it transfer expert thinking patterns and domain-specific workflows? |
-| **D3: Anti-Pattern Quality** | 15 | Does it have effective NEVER lists with specific reasons? |
-| **D4: Specification Compliance** | 15 | Is the frontmatter valid? Is the description comprehensive? |
-| **D5: Progressive Disclosure** | 15 | Is content properly layered for on-demand loading? |
-| **D6: Freedom Calibration** | 15 | Is specificity appropriate for task fragility? |
-| **D7: Pattern Recognition** | 10 | Does it follow an established official pattern? |
-| **D8: Practical Usability** | 15 | Can an Agent actually use this Skill effectively? |
-
-### Grading Scale
-
-| Grade | Percentage | Meaning |
-|-------|------------|---------|
-| A | 90%+ (108+) | Excellent - production-ready expert Skill |
-| B | 80-89% (96-107) | Good - minor improvements needed |
-| C | 70-79% (84-95) | Adequate - clear improvement path |
-| D | 60-69% (72-83) | Below Average - significant issues |
-| F | <60% (<72) | Poor - needs fundamental redesign |
-
-## Key Features
-
-### Knowledge Classification System
-
-The skill teaches you to recognize three types of content:
-
-| Type | Definition | Treatment |
-|------|------------|-----------|
-| **Expert** | Claude genuinely doesn't know this | Must keep - this is the Skill's value |
-| **Activation** | Claude knows but may not think of | Keep if brief - serves as reminder |
-| **Redundant** | Claude definitely knows this | Should delete - wastes tokens |
-
-### Five Official Design Patterns
-
-Skill Judge identifies and evaluates against five established patterns:
-
-| Pattern | Lines | Best For | Example |
-|---------|-------|----------|---------|
-| **Mindset** | ~50 | Creative tasks requiring taste | frontend-design |
-| **Navigation** | ~30 | Multiple distinct scenarios | internal-comms |
-| **Philosophy** | ~150 | Art/creation requiring originality | canvas-design |
-| **Process** | ~200 | Complex multi-step projects | mcp-builder |
-| **Tool** | ~300 | Precise operations on specific formats | docx, pdf, xlsx |
-
-### Common Failure Pattern Detection
-
-The skill identifies 9 common failure patterns:
-
-1. **The Tutorial**: Explains basics Claude already knows
-2. **The Dump**: Everything in one 800+ line file
-3. **The Orphan References**: Reference files that never get loaded
-4. **The Checkbox Procedure**: Mechanical steps without thinking frameworks
-5. **The Vague Warning**: "Be careful" without specific guidance
-6. **The Invisible Skill**: Great content but poor description
-7. **The Wrong Location**: Trigger info in body instead of description
-8. **The Over-Engineered**: Unnecessary auxiliary files
-9. **The Freedom Mismatch**: Wrong freedom level for task type
-
-## Usage Examples
-
-### Basic Evaluation
-
-```
-Evaluate the skill at skills/my-new-skill/SKILL.md
-```
-
-### Comparative Analysis
-
-```
-Compare the quality of skills/skill-a and skills/skill-b
-```
-
-### Targeted Improvement
-
-```
-How can I improve the knowledge delta in my skill?
-```
-
-### Pattern Identification
-
-```
-What pattern does this skill follow, and is it the right choice?
-```
-
-## Output
-
-Skill Judge produces a structured evaluation report:
-
-```markdown
-# Skill Evaluation Report: [Skill Name]
-
-## Summary
-- **Total Score**: X/120 (X%)
-- **Grade**: [A/B/C/D/F]
-- **Pattern**: [Mindset/Navigation/Philosophy/Process/Tool]
-- **Knowledge Ratio**: E:A:R = X:Y:Z
-- **Verdict**: [One sentence assessment]
-
-## Dimension Scores
-[Table with scores for all 8 dimensions]
-
-## Critical Issues
-[Must-fix problems]
-
-## Top 3 Improvements
-[Prioritized improvement suggestions]
-
-## Detailed Analysis
-[In-depth analysis for dimensions scoring below 80%]
-```
-
-## Best Practices
-
-### When Evaluating Skills
-
-**Do:**
-- Always check the description field first (it's the most critical)
-- Ask "Does Claude already know this?" for every section
-- Look for specific anti-patterns with non-obvious reasons
-- Verify decision trees actually lead to correct choices
-- Check that loading triggers are embedded in workflows
-
-**Never:**
-- Give high scores just because content looks professional
-- Ignore token waste from redundant explanations
-- Let length impress you (43 lines can outperform 500)
-- Forgive explaining basics as "helpful context"
-- Put "when to use" information only in the body
-
-### The Meta-Question
-
-When evaluating any Skill, always ask:
-
-> "Would an expert in this domain, looking at this Skill, say: 'Yes, this captures knowledge that took me years to learn'?"
-
-If yes, the Skill has genuine value. If no, it's compressing what Claude already knows.
-
-## Quick Reference Checklist
-
-```
-SKILL EVALUATION QUICK CHECK
-
-KNOWLEDGE DELTA (most important):
-  [ ] No "What is X" explanations for basic concepts
-  [ ] No step-by-step tutorials for standard operations
-  [ ] Has decision trees for non-obvious choices
-  [ ] Has trade-offs only experts would know
-  [ ] Has edge cases from real-world experience
-
-MINDSET + PROCEDURES:
-  [ ] Transfers thinking patterns (how to think about problems)
-  [ ] Has "Before doing X, ask yourself..." frameworks
-  [ ] Includes domain-specific procedures Claude wouldn't know
-
-ANTI-PATTERNS:
-  [ ] Has explicit NEVER list
-  [ ] Anti-patterns are specific, not vague
-  [ ] Includes WHY (non-obvious reasons)
-
-SPECIFICATION:
-  [ ] Valid YAML frontmatter
-  [ ] Description answers: WHAT, WHEN, KEYWORDS
-  [ ] Description specific enough for Agent activation
-
-STRUCTURE:
-  [ ] SKILL.md < 500 lines (ideal < 300)
-  [ ] Loading triggers embedded in workflow
-  [ ] Has "Do NOT Load" for preventing over-loading
-
-FREEDOM:
-  [ ] Creative tasks -> High freedom (principles)
-  [ ] Fragile operations -> Low freedom (exact scripts)
-
-USABILITY:
-  [ ] Decision trees for multi-path scenarios
-  [ ] Working code examples
-  [ ] Error handling and fallbacks
-```
-
-## Prerequisites
-
-None. Skill Judge is self-contained and requires no external tools or dependencies.
-
-## Related Concepts
-
-- **Tool vs Skill**: Tools define capability boundaries (what Claude CAN do). Skills inject knowledge (what Claude KNOWS how to do).
-- **Progressive Disclosure**: Three-layer loading system (metadata -> SKILL.md body -> resources)
-- **Freedom Calibration**: Matching constraint level to task fragility
diff --git a/skills/skill-judge/SKILL.md b/skills/skill-judge/SKILL.md
index 6d18397..ebb3f35 100644
--- a/skills/skill-judge/SKILL.md
+++ b/skills/skill-judge/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: skill-judge
-description: Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.
+description: Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, scoring, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring across 8 dimensions (120 points total) and actionable improvement suggestions. Triggers on phrases like "evaluate this skill", "review my SKILL.md", "audit this skill", "score this skill", "how can I improve this skill", or "is this skill well-designed".
 ---
 
 # Skill Judge
@@ -9,537 +9,98 @@ Evaluate Agent Skills against official specifications and patterns derived from
 
 ---
 
-## Core Philosophy
+## Mindset
 
-### What is a Skill?
-
-A Skill is NOT a tutorial. A Skill is a **knowledge externalization mechanism**.
-
-Traditional AI knowledge is locked in model parameters. To teach new capabilities:
-```
-Traditional: Collect data → GPU cluster → Train → Deploy new version
-Cost: $10,000 - $1,000,000+
-Timeline: Weeks to months
-```
-
-Skills change this:
-```
-Skill: Edit SKILL.md → Save → Takes effect on next invocation
-Cost: $0
-Timeline: Instant
-```
-
-This is the paradigm shift from "training AI" to "educating AI" — like a hot-swappable LoRA adapter that requires no training. You edit a Markdown file in natural language, and the model's behavior changes.
-
-### The Core Formula
-
-> **Good Skill = Expert-only Knowledge − What Claude Already Knows**
-
-A Skill's value is measured by its **knowledge delta** — the gap between what it provides and what the model already knows.
-
-- **Expert-only knowledge**: Decision trees, trade-offs, edge cases, anti-patterns, domain-specific thinking frameworks — things that take years of experience to accumulate
-- **What Claude already knows**: Basic concepts, standard library usage, common programming patterns, general best practices
-
-When a Skill explains "what is PDF" or "how to write a for-loop", it's compressing knowledge Claude already has. This is **token waste** — context window is a public resource shared with system prompts, conversation history, other Skills, and user requests.
-
-### Tool vs Skill
-
-| Concept | Essence | Function | Example |
-|---------|---------|----------|---------|
-| **Tool** | What model CAN do | Execute actions | bash, read_file, write_file, WebSearch |
-| **Skill** | What model KNOWS how to do | Guide decisions | PDF processing, MCP building, frontend design |
-
-Tools define capability boundaries — without bash tool, model can't execute commands.
-Skills inject knowledge — without frontend-design Skill, model produces generic UI.
-
-**The equation**:
-```
-General Agent + Excellent Skill = Domain Expert Agent
-```
-
-Same Claude model, different Skills loaded, becomes different experts.
-
-### Three Types of Knowledge in Skills
-
-When evaluating, categorize each section:
-
-| Type | Definition | Treatment |
-|------|------------|-----------|
-| **Expert** | Claude genuinely doesn't know this | Must keep — this is the Skill's value |
-| **Activation** | Claude knows but may not think of | Keep if brief — serves as reminder |
-| **Redundant** | Claude definitely knows this | Should delete — wastes tokens |
-
-The art of Skill design is maximizing Expert content, using Activation sparingly, and eliminating Redundant ruthlessly.
-
----
-
-## Evaluation Dimensions (120 points total)
-
-### D1: Knowledge Delta (20 points) — THE CORE DIMENSION
-
-The most important dimension. Does the Skill add genuine expert knowledge?
-
-| Score | Criteria |
-|-------|----------|
-| 0-5 | Explains basics Claude knows (what is X, how to write code, standard library tutorials) |
-| 6-10 | Mixed: some expert knowledge diluted by obvious content |
-| 11-15 | Mostly expert knowledge with minimal redundancy |
-| 16-20 | Pure knowledge delta — every paragraph earns its tokens |
-
-**Red flags** (instant score ≤5):
-- "What is [basic concept]" sections
-- Step-by-step tutorials for standard operations
-- Explaining how to use common libraries
-- Generic best practices ("write clean code", "handle errors")
-- Definitions of industry-standard terms
-
-**Green flags** (indicators of high knowledge delta):
-- Decision trees for non-obvious choices ("when X fails, try Y because Z")
-- Trade-offs only an expert would know ("A is faster but B handles edge case C")
-- Edge cases from real-world experience
-- "NEVER do X because [non-obvious reason]"
-- Domain-specific thinking frameworks
-
-**Evaluation questions**:
-1. For each section, ask: "Does Claude already know this?"
-2. If explaining something, ask: "Is this explaining TO Claude or FOR Claude?"
-3. Count paragraphs that are Expert vs Activation vs Redundant
-
----
-
-### D2: Mindset + Appropriate Procedures (15 points)
-
-Does the Skill transfer expert **thinking patterns** along with **necessary domain-specific procedures**?
-
-The difference between experts and novices isn't "knowing how to operate" — it's "how to think about the problem." But thinking patterns alone aren't enough when Claude lacks domain-specific procedural knowledge.
-
-**Key distinction**:
-| Type | Example | Value |
-|------|---------|-------|
-| **Thinking patterns** | "Before designing, ask: What makes this memorable?" | High — shapes decision-making |
-| **Domain-specific procedures** | "OOXML workflow: unpack → edit XML → validate → pack" | High — Claude may not know this |
-| **Generic procedures** | "Step 1: Open file, Step 2: Edit, Step 3: Save" | Low — Claude already knows |
-
-| Score | Criteria |
-|-------|----------|
-| 0-3 | Only generic procedures Claude already knows |
-| 4-7 | Has domain procedures but lacks thinking frameworks |
-| 8-11 | Good balance: thinking patterns + domain-specific workflows |
-| 12-15 | Expert-level: shapes thinking AND provides procedures Claude wouldn't know |
-
-**What counts as valuable procedures**:
-- Workflows Claude hasn't been trained on (new tools, proprietary systems)
-- Correct ordering that's non-obvious (e.g., "validate BEFORE packing, not after")
-- Critical steps that are easy to miss (e.g., "MUST recalculate formulas after editing")
-- Domain-specific sequences (e.g., MCP server's 4-phase development process)
-
-**What counts as redundant procedures**:
-- Generic file operations (open, read, write, save)
-- Standard programming patterns (loops, conditionals, error handling)
-- Common library usage that's well-documented
-
-**Expert thinking patterns look like**:
-```markdown
-Before [action], ask yourself:
-- **Purpose**: What problem does this solve? Who uses it?
-- **Constraints**: What are the hidden requirements?
-- **Differentiation**: What makes this solution memorable?
-```
-
-**Valuable domain procedures look like**:
-```markdown
-### Redlining Workflow (Claude wouldn't know this sequence)
-1. Convert to markdown: `pandoc --track-changes=all`
-2. Map text to XML: grep for text in document.xml
-3. Implement changes in batches of 3-10
-4. Pack and verify: check ALL changes were applied
-```
-
-**Redundant generic procedures look like**:
-```markdown
-Step 1: Open the file
-Step 2: Find the section
-Step 3: Make the change
-Step 4: Save and test
-```
-
-**The test**:
-1. Does it tell Claude WHAT to think about? (thinking patterns)
-2. Does it tell Claude HOW to do things it wouldn't know? (domain procedures)
-
-A good Skill provides both when needed.
-
----
-
-### D3: Anti-Pattern Quality (15 points)
-
-Does the Skill have effective NEVER lists?
-
-**Why this matters**: Half of expert knowledge is knowing what NOT to do. A senior designer sees purple gradient on white background and instinctively cringes — "too AI-generated." This intuition for "what absolutely not to do" comes from stepping on countless landmines.
-
-Claude hasn't stepped on these landmines. It doesn't know Inter font is overused, doesn't know purple gradients are the signature of AI-generated content. Good Skills must explicitly state these "absolute don'ts."
-
-| Score | Criteria |
-|-------|----------|
-| 0-3 | No anti-patterns mentioned |
-| 4-7 | Generic warnings ("avoid errors", "be careful", "consider edge cases") |
-| 8-11 | Specific NEVER list with some reasoning |
-| 12-15 | Expert-grade anti-patterns with WHY — things only experience teaches |
-
-**Expert anti-patterns** (specific + reason):
-```markdown
-NEVER use generic AI-generated aesthetics like:
-- Overused font families (Inter, Roboto, Arial)
-- Cliched color schemes (particularly purple gradients on white backgrounds)
-- Predictable layouts and component patterns
-- Default border-radius on everything
-```
-
-**Weak anti-patterns** (vague, no reasoning):
-```markdown
-Avoid making mistakes.
-Be careful with edge cases.
-Don't write bad code.
-```
-
-**The test**: Would an expert read the anti-pattern list and say "yes, I learned this the hard way"? Or would they say "this is obvious to everyone"?
-
----
-
-### D4: Specification Compliance — Especially Description (15 points)
-
-Does the Skill follow official format requirements? **Special focus on description quality.**
-
-| Score | Criteria |
-|-------|----------|
-| 0-5 | Missing frontmatter or invalid format |
-| 6-10 | Has frontmatter but description is vague or incomplete |
-| 11-13 | Valid frontmatter, description has WHAT but weak on WHEN |
-| 14-15 | Perfect: comprehensive description with WHAT, WHEN, and trigger keywords |
-
-**Frontmatter requirements**:
-- `name`: lowercase, alphanumeric + hyphens only, ≤64 characters
-- `description`: **THE MOST CRITICAL FIELD** — determines if skill gets used at all
-
----
-
-**Why description is THE MOST IMPORTANT field**:
-
-```
-┌─────────────────────────────────────────────────────────────────────┐
-│  SKILL ACTIVATION FLOW                                              │
-│                                                                     │
-│  User Request → Agent sees ALL skill descriptions → Decides which  │
-│                 (only descriptions, not bodies!)     to activate    │
-│                                                                     │
-│  If description doesn't match → Skill NEVER gets loaded            │
-│  If description is vague → Skill might not trigger when it should  │
-│  If description lacks keywords → Skill is invisible to the Agent   │
-└─────────────────────────────────────────────────────────────────────┘
-```
-
-**The brutal truth**: A Skill with perfect content but poor description is **useless** — it will never be activated. The description is the **only chance** to tell the Agent "use me in these situations."
-
----
-
-**Description must answer THREE questions**:
-
-1. **WHAT**: What does this Skill do? (functionality)
-2. **WHEN**: In what situations should it be used? (trigger scenarios)
-3. **KEYWORDS**: What terms should trigger this Skill? (searchable terms)
-
-**Excellent description** (all three elements):
-```yaml
-description: "Comprehensive document creation, editing, and analysis with support
-for tracked changes, comments, formatting preservation, and text extraction.
-When Claude needs to work with professional documents (.docx files) for:
-(1) Creating new documents, (2) Modifying or editing content,
-(3) Working with tracked changes, (4) Adding comments, or any other document tasks"
-```
-
-Analysis:
-- WHAT: creation, editing, analysis, tracked changes, comments
-- WHEN: "When Claude needs to work with... for: (1)... (2)... (3)..."
-- KEYWORDS: .docx files, tracked changes, professional documents
-
-**Poor description** (missing elements):
-```yaml
-description: "Handles document-related functions"
-```
-
-Problems:
-- WHAT: vague (which "document-related functions", specifically?)
-- WHEN: missing (when should Agent use this?)
-- KEYWORDS: missing (no ".docx", no specific scenarios)
-
-**Another poor example**:
-```yaml
-description: "A helpful skill for various tasks"
-```
-
-This is useless — Agent has no idea when to activate it.
-
----
-
-**Description quality checklist**:
-- [ ] Lists specific capabilities (not just "helps with X")
-- [ ] Includes explicit trigger scenarios ("Use when...", "When user asks for...")
-- [ ] Contains searchable keywords (file extensions, domain terms, action verbs)
-- [ ] Specific enough that Agent knows EXACTLY when to use it
-- [ ] Includes scenarios where this skill MUST be used (not just "can be used")
-
----
-
-### D5: Progressive Disclosure (15 points)
-
-Does the Skill implement proper content layering?
-
-Skill loading has three layers:
-```
-Layer 1: Metadata (always in memory)
-         Only name + description
-         ~100 tokens per skill
-
-Layer 2: SKILL.md Body (loaded after triggering)
-         Detailed guidelines, code examples, decision trees
-         Ideal: < 500 lines
-
-Layer 3: Resources (loaded on demand)
-         scripts/, references/, assets/
-         No limit
-```
-
-| Score | Criteria |
-|-------|----------|
-| 0-5 | Everything dumped in SKILL.md (>500 lines, no structure) |
-| 6-10 | Has references but unclear when to load them |
-| 11-13 | Good layering with MANDATORY triggers present |
-| 14-15 | Perfect: decision trees + explicit triggers + "Do NOT Load" guidance |
-
-**For Skills WITH references directory**, check Loading Trigger Quality:
-
-| Trigger Quality | Characteristics |
-|-----------------|-----------------|
-| Poor | References listed at end, no loading guidance |
-| Mediocre | Some triggers but not embedded in workflow |
-| Good | MANDATORY triggers in workflow steps |
-| Excellent | Scenario detection + conditional triggers + "Do NOT Load" |
-
-**The loading problem**:
-```
-Loading too little ◄─────────────────────────────────► Loading too much
-- References sit unused                    - Wastes context space
-- Agent doesn't know when to load          - Irrelevant info dilutes key content
-- Knowledge is there but never accessed    - Unnecessary token overhead
-```
-
-**Good loading trigger** (embedded in workflow):
-```markdown
-### Creating New Document
-
-**MANDATORY - READ ENTIRE FILE**: Before proceeding, you MUST read
-[`docx-js.md`](docx-js.md) (~500 lines) completely from start to finish.
-**NEVER set any range limits when reading this file.**
-
-**Do NOT load** `ooxml.md` or `redlining.md` for this task.
-```
-
-**Bad loading trigger** (just listed):
-```markdown
-## References
-- docx-js.md - for creating documents
-- ooxml.md - for editing
-- redlining.md - for tracking changes
-```
-
-**For simple Skills** (no references, <100 lines): Score based on conciseness and self-containment.
-
----
-
-### D6: Freedom Calibration (15 points)
-
-Is the level of specificity appropriate for the task's fragility?
-
-Different tasks need different levels of constraint. This is about matching freedom to fragility.
-
-| Score | Criteria |
-|-------|----------|
-| 0-5 | Severely mismatched (rigid scripts for creative tasks, vague for fragile ops) |
-| 6-10 | Partially appropriate, some mismatches |
-| 11-13 | Good calibration for most scenarios |
-| 14-15 | Perfect freedom calibration throughout |
-
-**The freedom spectrum**:
-
-| Task Type | Should Have | Why | Example Skill |
-|-----------|-------------|-----|---------------|
-| Creative/Design | High freedom | Multiple valid approaches, differentiation is value | frontend-design |
-| Code review | Medium freedom | Principles exist but judgment required | code-review |
-| File format operations | Low freedom | One wrong byte corrupts file, consistency critical | docx, xlsx, pdf |
-
-**High freedom** (text-based instructions):
-```markdown
-Commit to a BOLD aesthetic direction. Pick an extreme: brutally minimal,
-maximalist chaos, retro-futuristic, organic natural...
-```
-
-**Medium freedom** (pseudocode or parameterized):
-```markdown
-Review priority:
-1. Security vulnerabilities (must fix)
-2. Logic errors (must fix)
-3. Performance issues (should fix)
-4. Maintainability (optional)
-```
-
-**Low freedom** (specific scripts, exact steps):
-```markdown
-**MANDATORY**: Use exact script in `scripts/create-doc.py`
-Parameters: --title "X" --author "Y"
-Do NOT modify the script.
-```
-
-**The test**: Ask "if Agent makes a mistake, what's the consequence?"
-- High consequence → Low freedom
-- Low consequence → High freedom
+1. **Value = Delta, not volume.** A 43-line skill outperforms a 500-line skill when every line in the short one is expert-only knowledge. Length is a liability, not a credential.
+2. **Description is load-bearing.** The Agent never sees the body until the description triggers activation. A perfect skill with a weak description is a dead skill.
+3. **Expert ≠ Thorough.** The question is not "did I cover it?" but "does Claude already know it?" Anything Claude already knows is token waste.
+4. **Anti-patterns are half the knowledge.** Experts know what NOT to do as much as what to do. A skill without a NEVER list is missing 50% of its value.
+5. **Freedom must match fragility.** Creative tasks need principles. Fragile operations (file formats, APIs) need exact scripts. Confusing these is a structural failure.
 
 ---
 
-### D7: Pattern Recognition (10 points)
-
-Does the Skill follow an established official pattern?
+## Navigation
 
-Through analyzing 17 official Skills, we identified 5 main design patterns:
+**Use this skill when**: reviewing a SKILL.md before publishing, auditing an existing skill, generating an improvement report, or learning skill design patterns.
 
-| Pattern | ~Lines | Key Characteristics | Example | When to Use |
-|---------|--------|---------------------|---------|-------------|
-| **Mindset** | ~50 | Thinking > technique, strong NEVER list, high freedom | frontend-design | Creative tasks requiring taste |
-| **Navigation** | ~30 | Minimal SKILL.md, routes to sub-files | internal-comms | Multiple distinct scenarios |
-| **Philosophy** | ~150 | Two-step: Philosophy → Express, emphasizes craft | canvas-design | Art/creation requiring originality |
-| **Process** | ~200 | Phased workflow, checkpoints, medium freedom | mcp-builder | Complex multi-step projects |
-| **Tool** | ~300 | Decision trees, code examples, low freedom | docx, pdf, xlsx | Precise operations on specific formats |
+**Do NOT use this skill when**: the user wants to create a new skill from scratch (use skill-creator), or wants general writing/formatting feedback unrelated to agent skill design.
 
-| Score | Criteria |
-|-------|----------|
-| 0-3 | No recognizable pattern, chaotic structure |
-| 4-6 | Partially follows a pattern with significant deviations |
-| 7-8 | Clear pattern with minor deviations |
-| 9-10 | Masterful application of appropriate pattern |
+**Decision tree — what to load**:
 
-**Pattern selection guide**:
+| Scenario | Action |
+|----------|--------|
+| Quick score needed | Run 5-step protocol below; load nothing extra |
+| Score seems off / need calibration | Load [`references/worked-example.md`](references/worked-example.md) |
+| Detailed rubric for any dimension | Load [`references/scoring-rubric.md`](references/scoring-rubric.md) |
+| Diagnosing a structural failure pattern | Load [`references/failure-patterns.md`](references/failure-patterns.md) |
+| Full audit (all dimensions deep) | Load all three references |
 
-| Your Task Characteristics | Recommended Pattern |
-|---------------------------|---------------------|
-| Needs taste and creativity | Mindset (~50 lines) |
-| Needs originality and craft quality | Philosophy (~150 lines) |
-| Has multiple distinct sub-scenarios | Navigation (~30 lines) |
-| Complex multi-step project | Process (~200 lines) |
-| Precise operations on specific format | Tool (~300 lines) |
+Do NOT load references just because they exist — load only what the current scenario requires.
 
 ---
 
-### D8: Practical Usability (15 points)
-
-Can an Agent actually use this Skill effectively?
-
-| Score | Criteria |
-|-------|----------|
-| 0-5 | Confusing, incomplete, contradictory, or untested guidance |
-| 6-10 | Usable but with noticeable gaps |
-| 11-13 | Clear guidance for common cases |
-| 14-15 | Comprehensive coverage including edge cases and error handling |
-
-**Check for**:
-- **Decision trees**: For multi-path scenarios, is there clear guidance on which path to take?
-- **Code examples**: Do they actually work? Or are they pseudocode that breaks?
-- **Error handling**: What if the main approach fails? Are fallbacks provided?
-- **Edge cases**: Are unusual but realistic scenarios covered?
-- **Actionability**: Can Agent immediately act, or needs to figure things out?
-
-**Good usability** (decision tree + fallback):
-```markdown
-| Task | Primary Tool | Fallback | When to Use Fallback |
-|------|-------------|----------|----------------------|
-| Read text | pdftotext | PyMuPDF | Need layout info |
-| Extract tables | camelot-py | tabula-py | camelot fails |
-
-**Common issues**:
-- Scanned PDF: pdftotext returns blank → Use OCR first
-- Encrypted PDF: Permission error → Use PyMuPDF with password
-```
+## Philosophy
 
-**Poor usability** (vague):
-```markdown
-Use appropriate tools for PDF processing.
-Handle errors properly.
-Consider edge cases.
-```
+A Skill's entire value is the gap between what it provides and what Claude already knows. Maximize that gap. Minimize everything else. The best Skills are compressed expert brains — not tutorials, not documentation, not software projects.
 
 ---
 
-## NEVER Do When Evaluating
+## NEVER
 
-- **NEVER** give high scores just because it "looks professional" or is well-formatted
-- **NEVER** ignore token waste — every redundant paragraph should result in deduction
-- **NEVER** let length impress you — a 43-line Skill can outperform a 500-line Skill
-- **NEVER** skip mentally testing the decision trees — do they actually lead to correct choices?
-- **NEVER** forgive explaining basics with "but it provides helpful context"
-- **NEVER** overlook missing anti-patterns — if there's no NEVER list, that's a significant gap
-- **NEVER** assume all procedures are valuable — distinguish domain-specific from generic
-- **NEVER** undervalue the description field — poor description = skill never gets used
-- **NEVER** put "when to use" info only in the body — Agent only sees description before loading
+- **NEVER give high scores because content looks professional or well-formatted** — polish hides redundancy; always ask "does Claude already know this?" for every paragraph
+- **NEVER let length impress you** — a 750-line skill that restates basics scores below a 50-line skill with pure expert knowledge; line count is inversely correlated with quality when content is redundant
+- **NEVER forgive explaining basics as "helpful context"** — token waste is token waste; "What is a PDF" belongs in a textbook, not a skill; context window is a shared resource
+- **NEVER overlook a missing NEVER list** — if the skill has no explicit anti-patterns with reasons, D3 cannot exceed 7/15 regardless of other quality
+- **NEVER put triggering information only in the body** — Agent makes the activation decision from the description alone; "When to Use" sections in the body are useless for routing
+- **NEVER accept vague anti-patterns** ("be careful", "avoid errors"): generic warnings stay in the 4-7/15 band for D3; valid anti-patterns name the specific action AND the non-obvious consequence
+- **NEVER assume references are being used** — check for embedded MANDATORY loading triggers in workflow steps; a list at the bottom is Pattern 3 (Orphan References)
+- **NEVER skip testing decision trees mentally** — walk each branch; if a branch leads to an ambiguous or wrong choice, D8 takes a hit regardless of how the tree looks visually
 
 ---
 
-## Evaluation Protocol
-
-### Step 1: First Pass — Knowledge Delta Scan
-
-Read SKILL.md completely and for each section ask:
-> "Does Claude already know this?"
+## Evaluation Protocol (5 Steps)
 
-Mark each section as:
-- **[E] Expert**: Claude genuinely doesn't know this — value-add
-- **[A] Activation**: Claude knows but brief reminder is useful — acceptable
-- **[R] Redundant**: Claude definitely knows this — should be deleted
+### Step 1: Knowledge Delta Scan
+Read SKILL.md completely. For each section mark:
+- **[E] Expert**: Claude genuinely doesn't know — keep
+- **[A] Activation**: Claude knows but brief reminder is useful — keep if brief
+- **[R] Redundant**: Claude definitely knows — should delete
 
-Calculate rough ratio: E:A:R
-- Good Skill: >70% Expert, <20% Activation, <10% Redundant
-- Mediocre Skill: 40-70% Expert, high Activation
-- Bad Skill: <40% Expert, high Redundant
+Calculate ratio. Good skill: >70% E, <20% A, <10% R.
 
 ### Step 2: Structure Analysis
-
 ```
-[ ] Check frontmatter validity
-[ ] Count total lines in SKILL.md
-[ ] List all reference files and their sizes
-[ ] Identify which pattern the Skill follows
-[ ] Check for loading triggers (if references exist)
+[ ] frontmatter valid (name: lowercase ≤64 chars, description present)
+[ ] SKILL.md line count (flag if >500)
+[ ] references/ exists and has loading triggers in workflow (not just listed)
+[ ] no auxiliary files (README.md, CHANGELOG.md — Pattern 8)
+[ ] identify which of 5 patterns the skill follows
 ```
 
 ### Step 3: Score Each Dimension
 
-For each of the 8 dimensions:
-1. Find specific evidence (quote relevant lines)
-2. Assign score with one-line justification
-3. Note specific improvements if score < max
-
-### Step 4: Calculate Total & Grade
-
-```
-Total = D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
-Max = 120 points
-```
-
-**Grade Scale** (percentage-based):
-| Grade | Percentage | Meaning |
-|-------|------------|---------|
-| A | 90%+ (108+) | Excellent — production-ready expert Skill |
-| B | 80-89% (96-107) | Good — minor improvements needed |
-| C | 70-79% (84-95) | Adequate — clear improvement path |
-| D | 60-69% (72-83) | Below Average — significant issues |
-| F | <60% (<72) | Poor — needs fundamental redesign |
+**For full scoring criteria per dimension** — load [`references/scoring-rubric.md`](references/scoring-rubric.md).
+
+| Dimension | Max | Key question |
+|-----------|-----|-------------|
+| D1: Knowledge Delta | 20 | Pure expert content, or mixed with basics Claude knows? |
+| D2: Mindset + Procedures | 15 | Transfers thinking patterns AND non-obvious workflows? |
+| D3: Anti-Pattern Quality | 15 | Specific NEVER list with non-obvious reasons? |
+| D4: Spec Compliance | 15 | Description answers WHAT + WHEN + KEYWORDS? |
+| D5: Progressive Disclosure | 15 | SKILL.md <500 lines, embedded loading triggers? |
+| D6: Freedom Calibration | 15 | Freedom level matches task fragility? |
+| D7: Pattern Recognition | 10 | Follows Mindset/Navigation/Philosophy/Process/Tool? |
+| D8: Practical Usability | 15 | Decision trees, error handling, edge cases? |
+
+### Step 4: Calculate & Grade
+
+| Grade | Score | Meaning |
+|-------|-------|---------|
+| A | 108+ (90%+) | Production-ready expert skill |
+| B | 96-107 (80-89%) | Good — minor improvements needed |
+| C | 84-95 (70-79%) | Adequate — clear improvement path |
+| D | 72-83 (60-69%) | Below average — significant issues |
+| F | <72 (<60%) | Poor — needs fundamental redesign |
 
 ### Step 5: Generate Report
 
@@ -558,7 +119,7 @@ Max = 120 points
 | Dimension | Score | Max | Notes |
 |-----------|-------|-----|-------|
 | D1: Knowledge Delta | X | 20 | |
-| D2: Mindset vs Mechanics | X | 15 | |
+| D2: Mindset + Procedures | X | 15 | |
 | D3: Anti-Pattern Quality | X | 15 | |
 | D4: Specification Compliance | X | 15 | |
 | D5: Progressive Disclosure | X | 15 | |
@@ -567,186 +128,30 @@ Max = 120 points
 | D8: Practical Usability | X | 15 | |
 
 ## Critical Issues
-[List must-fix problems that significantly impact the Skill's effectiveness]
+[Must-fix problems that significantly impact effectiveness]
 
 ## Top 3 Improvements
 1. [Highest impact improvement with specific guidance]
-2. [Second priority improvement]
-3. [Third priority improvement]
-
-## Detailed Analysis
-[For each dimension scoring below 80%, provide:
-- What's missing or problematic
-- Specific examples from the Skill
-- Concrete suggestions for improvement]
+2. [Second priority]
+3. [Third priority]
 ```
 
 ---
 
-## Common Failure Patterns
+## When Things Go Wrong
 
-### Pattern 1: The Tutorial
-```
-Symptom: Explains what PDF is, how Python works, basic library usage
-Root cause: Author assumes Skill should "teach" the model
-Fix: Claude already knows this. Delete all basic explanations.
-     Focus on expert decisions, trade-offs, and anti-patterns.
-```
-
-### Pattern 2: The Dump
-```
-Symptom: SKILL.md is 800+ lines with everything included
-Root cause: No progressive disclosure design
-Fix: Core routing and decision trees in SKILL.md (<300 lines ideal)
-     Detailed content in references/, loaded on-demand
-```
-
-### Pattern 3: The Orphan References
-```
-Symptom: References directory exists but files are never loaded
-Root cause: No explicit loading triggers
-Fix: Add "MANDATORY - READ ENTIRE FILE" at workflow decision points
-     Add "Do NOT Load" to prevent over-loading
-```
-
-### Pattern 4: The Checkbox Procedure
-```
-Symptom: Step 1, Step 2, Step 3... mechanical procedures
-Root cause: Author thinks in procedures, not thinking frameworks
-Fix: Transform into "Before doing X, ask yourself..."
-     Focus on decision principles, not operation sequences
-```
-
-### Pattern 5: The Vague Warning
-```
-Symptom: "Be careful", "avoid errors", "consider edge cases"
-Root cause: Author knows things can go wrong but hasn't articulated specifics
-Fix: Specific NEVER list with concrete examples and non-obvious reasons
-     "NEVER use X because [specific problem that takes experience to learn]"
-```
-
-### Pattern 6: The Invisible Skill
-```
-Symptom: Great content but skill rarely gets activated
-Root cause: Description is vague, missing keywords, or lacks trigger scenarios
-Fix: Description must answer WHAT, WHEN, and include KEYWORDS
-     "Use when..." + specific scenarios + searchable terms
-
-Example fix:
-BAD:  "Helps with document tasks"
-GOOD: "Create, edit, and analyze .docx files. Use when working with
-       Word documents, tracked changes, or professional document formatting."
-```
-
-### Pattern 7: The Wrong Location
-```
-Symptom: "When to use this Skill" section in body, not in description
-Root cause: Misunderstanding of three-layer loading
-Fix: Move all triggering information to description field
-     Body is only loaded AFTER triggering decision is made
-```
-
-### Pattern 8: The Over-Engineered
-```
-Symptom: README.md, CHANGELOG.md, INSTALLATION_GUIDE.md, CONTRIBUTING.md
-Root cause: Treating Skill like a software project
-Fix: Delete all auxiliary files. Only include what Agent needs for the task.
-     No documentation about the Skill itself.
-```
-
-### Pattern 9: The Freedom Mismatch
-```
-Symptom: Rigid scripts for creative tasks, vague guidance for fragile operations
-Root cause: Not considering task fragility
-Fix: High freedom for creative (principles, not steps)
-     Low freedom for fragile (exact scripts, no parameters)
-```
-
----
-
-## Quick Reference Checklist
-
-```
-┌─────────────────────────────────────────────────────────────────────────┐
-│  SKILL EVALUATION QUICK CHECK                                           │
-├─────────────────────────────────────────────────────────────────────────┤
-│                                                                         │
-│  KNOWLEDGE DELTA (most important):                                      │
-│    [ ] No "What is X" explanations for basic concepts                   │
-│    [ ] No step-by-step tutorials for standard operations                │
-│    [ ] Has decision trees for non-obvious choices                       │
-│    [ ] Has trade-offs only experts would know                           │
-│    [ ] Has edge cases from real-world experience                        │
-│                                                                         │
-│  MINDSET + PROCEDURES:                                                  │
-│    [ ] Transfers thinking patterns (how to think about problems)        │
-│    [ ] Has "Before doing X, ask yourself..." frameworks                 │
-│    [ ] Includes domain-specific procedures Claude wouldn't know         │
-│    [ ] Distinguishes valuable procedures from generic ones              │
-│                                                                         │
-│  ANTI-PATTERNS:                                                         │
-│    [ ] Has explicit NEVER list                                          │
-│    [ ] Anti-patterns are specific, not vague                            │
-│    [ ] Includes WHY (non-obvious reasons)                               │
-│                                                                         │
-│  SPECIFICATION (description is critical!):                              │
-│    [ ] Valid YAML frontmatter                                           │
-│    [ ] name: lowercase, ≤64 chars                                       │
-│    [ ] description answers: WHAT does it do?                            │
-│    [ ] description answers: WHEN should it be used?                     │
-│    [ ] description contains trigger KEYWORDS                            │
-│    [ ] description is specific enough for Agent to know when to use     │
-│                                                                         │
-│  STRUCTURE:                                                             │
-│    [ ] SKILL.md < 500 lines (ideal < 300)                               │
-│    [ ] Heavy content in references/                                     │
-│    [ ] Loading triggers embedded in workflow                            │
-│    [ ] Has "Do NOT Load" for preventing over-loading                    │
-│                                                                         │
-│  FREEDOM:                                                               │
-│    [ ] Creative tasks → High freedom (principles)                       │
-│    [ ] Fragile operations → Low freedom (exact scripts)                 │
-│                                                                         │
-│  USABILITY:                                                             │
-│    [ ] Decision trees for multi-path scenarios                          │
-│    [ ] Working code examples                                            │
-│    [ ] Error handling and fallbacks                                     │
-│    [ ] Edge cases covered                                               │
-│                                                                         │
-└─────────────────────────────────────────────────────────────────────────┘
-```
+| Problem | Likely Cause | Fix |
+|---------|-------------|-----|
+| Skill never activates despite good content | Description missing WHEN clause or trigger keywords | Rewrite description with "Use when..." + domain terms |
+| Score looks right but report feels off | Calibration drift — judging relative to other skills, not absolute standard | Load [`references/worked-example.md`](references/worked-example.md) to recalibrate against a fixed anchor |
+| Dimension score disputed by skill author | Vague rubric interpretation | Load [`references/scoring-rubric.md`](references/scoring-rubric.md) — share the specific band criteria |
+| Skill has all 9 failure patterns simultaneously | Classic "software project" confusion | Load [`references/failure-patterns.md`](references/failure-patterns.md) — address Pattern 2 (The Dump) first, then Pattern 8 (Over-Engineered) |
+| Author insists long = thorough | Fundamental misunderstanding of knowledge delta | Quote D1 rubric: "16-20: Pure knowledge delta — every paragraph earns its tokens" |
 
 ---
 
 ## The Meta-Question
 
-When evaluating any Skill, always return to this fundamental question:
-
-> **"Would an expert in this domain, looking at this Skill, say:**
-> **'Yes, this captures knowledge that took me years to learn'?"**
-
-If the answer is yes → the Skill has genuine value.
-If the answer is no → it's compressing what Claude already knows.
-
-The best Skills are **compressed expert brains** — they take a designer's 10 years of aesthetic accumulation and compress it into 43 lines, or a document expert's operational experience into a 200-line decision tree.
-
-What gets compressed must be things Claude doesn't have. Otherwise, it's garbage compression.
-
----
-
-## Self-Evaluation Note
-
-This Skill (skill-judge) should itself pass evaluation:
-
-- **Knowledge Delta**: Provides specific evaluation criteria Claude wouldn't generate on its own
-- **Mindset**: Shapes how to think about Skill quality, not just checklist items
-- **Anti-Patterns**: "NEVER Do When Evaluating" section with specific don'ts
-- **Specification**: Valid frontmatter with comprehensive description
-- **Progressive Disclosure**: Self-contained, no external references needed
-- **Freedom**: Medium freedom appropriate for evaluation task
-- **Pattern**: Follows Tool pattern with decision frameworks
-- **Usability**: Clear protocol, report template, quick reference
-
-
+> **"Would an expert in this domain, looking at this Skill, say: 'Yes, this captures knowledge that took me years to learn'?"**
 
-Evaluate this Skill against itself as a calibration exercise.
+If yes → genuine value. If no → compressing what Claude already knows.
diff --git a/skills/skill-judge/references/failure-patterns.md b/skills/skill-judge/references/failure-patterns.md
new file mode 100644
index 0000000..6f7e36b
--- /dev/null
+++ b/skills/skill-judge/references/failure-patterns.md
@@ -0,0 +1,74 @@
+# Common Failure Patterns in Skill Design
+
+Nine patterns observed across skill audits. Each has a signature symptom, root cause, and targeted fix.
+
+---
+
+## Pattern 1: The Tutorial
+**Symptom**: Explains what PDF is, how Python works, basic library usage
+**Root cause**: Author assumes Skill should "teach" the model
+**Fix**: Claude already knows this. Delete all basic explanations. Focus on expert decisions, trade-offs, anti-patterns.
+
+---
+
+## Pattern 2: The Dump
+**Symptom**: SKILL.md is 800+ lines with everything included
+**Root cause**: No progressive disclosure design
+**Fix**: Core routing and decision trees in SKILL.md (<300 lines ideal). Detailed content in `references/`, loaded on-demand.
+
+---
+
+## Pattern 3: The Orphan References
+**Symptom**: References directory exists but files are never loaded
+**Root cause**: No explicit loading triggers — just a list at the bottom
+**Fix**: Add "MANDATORY - READ ENTIRE FILE" at workflow decision points. Add "Do NOT Load" to prevent over-loading.
+
+---
+
+## Pattern 4: The Checkbox Procedure
+**Symptom**: Step 1, Step 2, Step 3... mechanical procedures
+**Root cause**: Author thinks in procedures, not thinking frameworks
+**Fix**: Transform into "Before doing X, ask yourself..." Focus on decision principles, not operation sequences.
+
+---
+
+## Pattern 5: The Vague Warning
+**Symptom**: "Be careful", "avoid errors", "consider edge cases"
+**Root cause**: Author knows things can go wrong but hasn't articulated specifics
+**Fix**: Specific NEVER list with concrete examples and non-obvious reasons: "NEVER use X because [specific problem that takes experience to learn]"
+
+---
+
+## Pattern 6: The Invisible Skill
+**Symptom**: Great content but skill rarely gets activated
+**Root cause**: Description is vague, missing keywords, or lacks trigger scenarios
+**Fix**: Description must answer WHAT, WHEN, and include KEYWORDS.
+
+```
+BAD:  "Helps with document tasks"
+GOOD: "Create, edit, and analyze .docx files. Use when working with
+       Word documents, tracked changes, or professional document formatting."
+```
+
+---
+
+## Pattern 7: The Wrong Location
+**Symptom**: "When to use this Skill" section in body, not in description
+**Root cause**: Misunderstanding three-layer loading — body loads AFTER triggering decision
+**Fix**: Move all triggering information to description field. Body is only loaded after skill is already selected.
+
+---
+
+## Pattern 8: The Over-Engineered
+**Symptom**: README.md, CHANGELOG.md, INSTALLATION_GUIDE.md, CONTRIBUTING.md alongside SKILL.md
+**Root cause**: Treating Skill like a software project
+**Fix**: Delete all auxiliary files. Only include what Agent needs for the task. No documentation about the Skill itself.
+
+---
+
+## Pattern 9: The Freedom Mismatch
+**Symptom**: Rigid scripts for creative tasks; vague guidance for fragile operations
+**Root cause**: Not considering task fragility when designing constraints
+**Fix**:
+- High freedom for creative tasks (principles, not steps)
+- Low freedom for fragile ops (exact scripts, no parameters)
diff --git a/skills/skill-judge/references/scoring-rubric.md b/skills/skill-judge/references/scoring-rubric.md
new file mode 100644
index 0000000..8613bb1
--- /dev/null
+++ b/skills/skill-judge/references/scoring-rubric.md
@@ -0,0 +1,182 @@
+# Scoring Rubric — Full Dimension Details
+
+## D1: Knowledge Delta (20 points) — THE CORE DIMENSION
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Explains basics Claude knows (what is X, standard library tutorials) |
+| 6-10 | Mixed: some expert knowledge diluted by obvious content |
+| 11-15 | Mostly expert knowledge with minimal redundancy |
+| 16-20 | Pure knowledge delta — every paragraph earns its tokens |
+
+**Red flags** (instant score ≤5):
+- "What is [basic concept]" sections
+- Step-by-step tutorials for standard operations
+- Generic best practices ("write clean code", "handle errors")
+- Definitions of industry-standard terms
+
+**Green flags**:
+- Decision trees for non-obvious choices ("when X fails, try Y because Z")
+- Trade-offs only an expert would know
+- Edge cases from real-world experience
+- "NEVER do X because [non-obvious reason]"
+
+**Knowledge classification**:
+- **[E] Expert**: Claude genuinely doesn't know — keep
+- **[A] Activation**: Claude knows but brief reminder is useful — keep if brief
+- **[R] Redundant**: Claude definitely knows — delete
+
+Good ratio: >70% Expert, <20% Activation, <10% Redundant
+
+---
+
+## D2: Mindset + Appropriate Procedures (15 points)
+
+| Score | Criteria |
+|-------|----------|
+| 0-3 | Only generic procedures Claude already knows |
+| 4-7 | Has domain procedures but lacks thinking frameworks |
+| 8-11 | Good balance: thinking patterns + domain-specific workflows |
+| 12-15 | Expert-level: shapes thinking AND provides procedures Claude wouldn't know |
+
+**Valuable thinking patterns look like**:
+```markdown
+Before [action], ask yourself:
+- **Purpose**: What problem does this solve? Who uses it?
+- **Constraints**: What are the hidden requirements?
+```
+
+**Valuable domain procedures look like** (non-obvious ordering, easy-to-miss steps):
+```markdown
+OOXML workflow: unpack → edit XML → validate → pack
+MUST recalculate formulas after editing (not before)
+```
+
+**Redundant generic procedures** (skip these): "Step 1: Open file, Step 2: Edit, Step 3: Save"
+
+---
+
+## D3: Anti-Pattern Quality (15 points)
+
+| Score | Criteria |
+|-------|----------|
+| 0-3 | No anti-patterns mentioned |
+| 4-7 | Generic warnings ("avoid errors", "be careful") |
+| 8-11 | Specific NEVER list with some reasoning |
+| 12-15 | Expert-grade anti-patterns with WHY — things only experience teaches |
+
+Expert anti-pattern structure: `NEVER [specific action] because [non-obvious consequence]`
+
+Test: Would an expert say "yes, I learned this the hard way"? Or "this is obvious to everyone"?
+
+---
+
+## D4: Specification Compliance (15 points)
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Missing frontmatter or invalid format |
+| 6-10 | Has frontmatter but description is vague or incomplete |
+| 11-13 | Valid frontmatter, description has WHAT but weak on WHEN |
+| 14-15 | Perfect: WHAT + WHEN + trigger KEYWORDS |
+
+**Description must answer THREE questions**:
+1. **WHAT**: What does this Skill do?
+2. **WHEN**: In what situations? ("Use when...", "When user asks for...")
+3. **KEYWORDS**: File extensions, domain terms, action verbs
+
+The brutal truth: A Skill with perfect content but poor description is useless — it never gets activated.
+
+**Description quality checklist**:
+- [ ] Lists specific capabilities (not just "helps with X")
+- [ ] Includes explicit trigger scenarios
+- [ ] Contains searchable keywords
+- [ ] Specific enough that the Agent knows EXACTLY when to use it
+
+---
+
+## D5: Progressive Disclosure (15 points)
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Everything dumped in SKILL.md (>500 lines, no structure) |
+| 6-10 | Has references but unclear when to load them |
+| 11-13 | Good layering with MANDATORY triggers present |
+| 14-15 | Perfect: decision trees + explicit triggers + "Do NOT Load" guidance |
+
+Three layers:
+- Layer 1 — Metadata (always in memory): name + description, ~100 tokens
+- Layer 2 — SKILL.md Body (loaded after triggering): ideal <500 lines
+- Layer 3 — Resources (loaded on demand): no limit
+
+**Good loading trigger** (embedded in workflow):
+```markdown
+**MANDATORY - READ ENTIRE FILE**: Before proceeding, read
+[`scoring-rubric.md`](references/scoring-rubric.md) completely.
+**Do NOT load** `failure-patterns.md` for this task.
+```
+
+**Bad loading trigger** (just listed at bottom): `- scoring-rubric.md - for scoring`
+
+---
+
+## D6: Freedom Calibration (15 points)
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Severely mismatched (rigid scripts for creative tasks, vague for fragile ops) |
+| 6-10 | Partially appropriate, some mismatches |
+| 11-13 | Good calibration for most scenarios |
+| 14-15 | Perfect freedom calibration throughout |
+
+| Task Type | Should Have | Example Skill |
+|-----------|-------------|---------------|
+| Creative/Design | High freedom (principles) | frontend-design |
+| Code review | Medium freedom (prioritized list) | code-review |
+| File format ops | Low freedom (exact scripts) | docx, xlsx, pdf |
+
+Test: "If Agent makes a mistake, what's the consequence?" High consequence → Low freedom.
+
+---
+
+## D7: Pattern Recognition (10 points)
+
+| Pattern | ~Lines | Key Characteristics | When to Use |
+|---------|--------|---------------------|-------------|
+| **Mindset** | ~50 | Thinking > technique, strong NEVER list | Creative tasks requiring taste |
+| **Navigation** | ~30 | Minimal SKILL.md, routes to sub-files | Multiple distinct scenarios |
+| **Philosophy** | ~150 | Two-step: Philosophy → Express | Art/creation requiring originality |
+| **Process** | ~200 | Phased workflow, checkpoints | Complex multi-step projects |
+| **Tool** | ~300 | Decision trees, code examples | Precise operations on specific formats |
+
+| Score | Criteria |
+|-------|----------|
+| 0-3 | No recognizable pattern, chaotic structure |
+| 4-6 | Partially follows a pattern |
+| 7-8 | Clear pattern with minor deviations |
+| 9-10 | Masterful application of appropriate pattern |
+
+---
+
+## D8: Practical Usability (15 points)
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Confusing, incomplete, contradictory, or untested guidance |
+| 6-10 | Usable but with noticeable gaps |
+| 11-13 | Clear guidance for common cases |
+| 14-15 | Comprehensive coverage including edge cases and error handling |
+
+**Check for**:
+- Decision trees: For multi-path scenarios, is there clear guidance on which path to take?
+- Code examples: Do they actually work? Or pseudocode that breaks?
+- Error handling: What if the main approach fails? Fallbacks provided?
+- Edge cases: Unusual but realistic scenarios covered?
+- Actionability: Can the Agent act immediately, or does it need to figure things out first?
+
+**Good usability** (decision tree + fallback):
+```markdown
+| Task | Primary Tool | Fallback | When to Use Fallback |
+|------|-------------|----------|----------------------|
+| Read text | pdftotext | PyMuPDF | Need layout info |
+```
diff --git a/skills/skill-judge/references/worked-example.md b/skills/skill-judge/references/worked-example.md
new file mode 100644
index 0000000..bafb190
--- /dev/null
+++ b/skills/skill-judge/references/worked-example.md
@@ -0,0 +1,122 @@
+# Worked Example: Evaluating a Skill End to End
+
+This walks through a complete evaluation of a hypothetical `pdf-processor` skill to calibrate scoring judgment.
+
+---
+
+## Skill Under Evaluation (excerpt)
+
+```markdown
+---
+name: pdf-processor
+description: Process PDF files
+---
+
+# PDF Processor
+
+This skill helps you work with PDF files.
+
+## What is a PDF?
+PDF (Portable Document Format) is a file format developed by Adobe...
+
+## Installation
+Install PyMuPDF: pip install PyMuPDF
+
+## Step by Step
+1. Open the PDF file
+2. Read the contents
+3. Extract what you need
+4. Save results
+
+## NEVER
+- Don't break things
+- Be careful with large files
+```
+
+---
+
+## Evaluation Walk-Through
+
+### Step 1: Knowledge Delta Scan
+
+Mark each section:
+- "What is a PDF?" → **[R] Redundant** — Claude definitely knows this
+- "Installation: pip install PyMuPDF" → **[A] Activation** — acceptable reminder
+- "Step by Step: Open, Read, Extract, Save" → **[R] Redundant** — generic, Claude knows
+- "NEVER: Don't break things" → **[R] Redundant** — vague, no value
+
+Knowledge ratio: E:0 / A:1 / R:3 — catastrophic ratio
+
+### Step 2: Structure Analysis
+```
+[ ] frontmatter validity — PASS (valid YAML)
+[ ] line count — 18 lines (good length, bad content)
+[ ] reference files — none
+[ ] pattern — attempts Tool but lacks decision trees
+[ ] loading triggers — N/A
+```
+
+### Step 3: Dimension Scores
+
+| Dimension | Score | Justification |
+|-----------|-------|---------------|
+| D1: Knowledge Delta | 2/20 | "What is PDF" section, generic steps — pure redundancy |
+| D2: Mindset + Procedures | 3/15 | No thinking frameworks; generic file I/O steps only |
+| D3: Anti-Pattern Quality | 1/15 | "Don't break things" — maximum vagueness, zero expert insight |
+| D4: Spec Compliance | 5/15 | Valid frontmatter; description "Process PDF files" has WHAT but no WHEN, no keywords |
+| D5: Progressive Disclosure | 12/15 | Short file; no references needed at this size (bonus for brevity, penalty for no triggers) |
+| D6: Freedom Calibration | 6/15 | Tool-type skill should have low freedom; vague guidance is wrong direction |
+| D7: Pattern Recognition | 3/10 | Attempts Tool pattern but lacks the defining element: decision trees |
+| D8: Practical Usability | 2/15 | No decision trees, no fallbacks, no error handling, no edge cases |
+
+**Total: 34/120 (28%) — F**
+
+---
+
+## What a Fixed Version Looks Like
+
+### Fixed description:
+```yaml
+description: "Extract text, tables, and images from PDF files using PyMuPDF and
+camelot-py. Use when processing scanned PDFs (needs OCR), extracting tables,
+handling encrypted files, or preserving layout information from .pdf documents."
+```
+
+### Fixed D1 content (expert knowledge, not basics):
+```markdown
+## Tool Selection Decision Tree
+
+| Task | Primary Tool | Fallback | When to Use Fallback |
+|------|-------------|----------|----------------------|
+| Extract text | pdftotext | PyMuPDF | Need page coordinates |
+| Extract tables | camelot-py | tabula-py | camelot's lattice parsing finds no table |
+| Scanned PDF | pytesseract | easyocr | tesseract accuracy <70% |
+| Encrypted | PyMuPDF + password | pikepdf | unknown encryption type |
+```
+
+### Fixed D3 content (expert anti-patterns):
+```markdown
+## NEVER
+- NEVER use pdftotext on scanned PDFs — it returns empty strings silently;
+  you won't know it failed until you see blank output downstream
+- NEVER use camelot's default lattice flavor on borderless tables: it requires visible grid lines;
+  use camelot's stream flavor or tabula-py, which handle whitespace-delimited tables
+- NEVER decrypt a PDF then re-encrypt with the same password and assume
+  it preserved permissions — re-encryption resets all permission flags
+- NEVER call text extraction on password-protected files without catching
+  PdfReadError — the error message varies by library version
+- NEVER assume page count from filename — multi-part PDFs often have
+  misleading names; always call len(doc) before iterating
+```
+
+---
+
+## Calibration Notes
+
+This example demonstrates:
+- A 2/20 D1 score (pure redundancy)
+- A 1/15 D3 score (vague warnings)
+- How specific, experience-derived anti-patterns look vs vague warnings
+- How a decision tree transforms D8 from 2 to 13+
+
+Use this as your anchor when uncertain whether a real skill's content qualifies as Expert vs Redundant.
diff --git a/skills/web-to-markdown/README.md b/skills/web-to-markdown/README.md
deleted file mode 100644
index 56f9dc0..0000000
--- a/skills/web-to-markdown/README.md
+++ /dev/null
@@ -1,204 +0,0 @@
-# web-to-markdown
-
-Convert web pages to clean Markdown using local browser automation with Puppeteer and Readability.
-
-## Purpose
-
-This skill enables Claude Code to convert web pages (including JavaScript-rendered content) into clean, readable Markdown format by leveraging the `web2md` CLI tool. It's particularly useful for extracting article content, documentation, or any web content that needs to be processed, archived, or analyzed in Markdown format.
-
-## When to Use
-
-Use this skill when you need to:
-
-- Extract article content from news sites, blogs, or documentation
-- Convert JavaScript-heavy pages that simple HTTP fetching can't handle
-- Archive web content in a readable, portable format
-- Process web content for analysis or documentation
-- Handle pages with login walls or human verification (using interactive mode)
-- Batch convert multiple URLs to Markdown files
-
-**Important:** This skill must be explicitly invoked by the user with phrases like:
-- "use the skill web-to-markdown ..."
-- "use a skill web-to-markdown ..."
-
-This is a hard requirement to prevent accidental usage when simpler tools might suffice.
-
-## How It Works
-
-The skill uses the `web2md` CLI tool which:
-
-1. **Launches a real browser** (Chrome/Chromium/Brave/Edge) via Puppeteer
-2. **Renders the page** including all JavaScript and dynamic content
-3. **Extracts main content** using Mozilla's Readability library
-4. **Converts to Markdown** using Turndown with cleaned links
-5. **Outputs** to stdout or saves to file(s) with optional YAML frontmatter
-
-This approach handles modern single-page applications and JavaScript-rendered content that simple HTTP fetchers cannot process.
-
-## Key Features
-
-- **JavaScript Support**: Renders pages with Puppeteer, capturing dynamically loaded content
-- **Smart Content Extraction**: Uses Readability to identify and extract main article content
-- **Flexible Output**: Print to stdout, save to specific files, or auto-name files in a directory
-- **Interactive Mode**: Handle login walls and human verification challenges
-- **Batch Processing**: Convert multiple URLs in one operation
-- **Wait Strategies**: Multiple options to ensure content is fully loaded before extraction
-- **Browser Profiles**: Support for persistent user data directories (sessions, logins)
-- **Clean Markdown**: Produces readable, well-formatted Markdown with optional metadata
-
-## Prerequisites
-
-The `web2md` CLI tool must be installed. The skill will check for it and provide installation instructions if needed:
-
-```bash
-cd ~/workspace/softaworks/projects/web2md
-npm install
-npm run build
-npm link
-```
-
-## Usage Examples
-
-### Basic Conversion
-
-Convert a single URL and save to a file:
-
-```bash
-use the skill web-to-markdown to convert https://example.com/article to article.md
-```
-
-This will run:
-```bash
-web2md 'https://example.com/article' --out ./article.md
-```
-
-### Auto-Named Output
-
-Convert a URL and let the tool name the file based on page title:
-
-```bash
-use the skill web-to-markdown to convert https://example.com/article and save to ./output/
-```
-
-This creates a directory and auto-names the file:
-```bash
-mkdir -p ./output
-web2md 'https://example.com/article' --out ./output/
-```
-
-### Print to Console
-
-Convert and display the Markdown (useful for quick inspection):
-
-```bash
-use the skill web-to-markdown to convert https://example.com/article and print the result
-```
-
-This will run:
-```bash
-web2md 'https://example.com/article' --print
-```
-
-### Interactive Mode (Login Walls)
-
-Handle pages requiring login or human verification:
-
-```bash
-use the skill web-to-markdown to convert https://example.com/protected-article in interactive mode
-```
-
-This will run:
-```bash
-mkdir -p ./tmp/web2md-profile
-web2md 'https://example.com/protected-article' --interactive --user-data-dir ./tmp/web2md-profile --out ./output/
-```
-
-The browser window will appear, allowing you to complete login or verification, then press Enter to continue.
-
-### Batch Conversion
-
-Convert multiple URLs:
-
-```bash
-use the skill web-to-markdown to convert these URLs:
-- https://example.com/article1
-- https://example.com/article2
-- https://example.com/article3
-Save them to ./articles/
-```
-
-This will create the directory and run separate commands for each URL:
-```bash
-mkdir -p ./articles
-web2md 'https://example.com/article1' --out ./articles/
-web2md 'https://example.com/article2' --out ./articles/
-web2md 'https://example.com/article3' --out ./articles/
-```
-
-### Advanced Wait Strategies
-
-For heavy JavaScript applications:
-
-```bash
-use the skill web-to-markdown to convert https://app.example.com/dashboard
-Wait for the main selector to appear
-```
-
-This will run:
-```bash
-web2md 'https://app.example.com/dashboard' --wait-until domcontentloaded --wait-for 'main' --out ./dashboard.md
-```
-
-## Advanced Options
-
-The skill supports various options to handle tricky pages:
-
-- `--chrome-path <path>`: Specify Chrome/Chromium location if auto-detection fails
-- `--interactive`: Show browser and pause for manual intervention
-- `--wait-until <event>`: Wait for `load`, `domcontentloaded`, `networkidle0`, or `networkidle2` (default: `networkidle2`)
-- `--wait-for '<selector>'`: Wait for specific CSS selector to appear
-- `--wait-ms <milliseconds>`: Additional wait time in milliseconds
-- `--headful`: Show browser window (useful for debugging)
-- `--no-sandbox`: Disable sandbox (sometimes required in containers/CI)
-- `--user-data-dir <dir>`: Use persistent browser profile (for sessions/logins)
-
-## Output Format
-
-The generated Markdown includes:
-
-- **YAML Frontmatter** (optional): Title, author, publication date, URL, excerpt
-- **Main Content**: Article text with preserved formatting
-- **Clean Links**: Properly formatted Markdown links
-- **Images**: Preserved with alt text and captions where available
-
-## Technical Details
-
-- **Browser Automation**: Uses `puppeteer-core` with local Chrome/Chromium
-- **Content Extraction**: Mozilla Readability library
-- **Markdown Conversion**: Turndown library
-- **Supported Browsers**: Chrome, Chromium, Brave, Edge (Chromium-based)
-
-## Troubleshooting
-
-**Chrome not found:**
-- Install Chrome/Chromium or specify path with `--chrome-path`
-
-**Page content incomplete:**
-- Try `--wait-until networkidle2` (waits for network to settle)
-- Use `--wait-for '<selector>'` to wait for specific elements
-- Add `--wait-ms 2000` for additional delay
-
-**Login required:**
-- Use `--interactive` mode to manually login
-- Use `--user-data-dir` to persist session across runs
-
-**Verification/CAPTCHA:**
-- Use `--interactive` mode to complete verification manually
-
-## Version
-
-Current version: 0.1.0
-
-## Related Tools
-
-For simpler use cases without JavaScript, consider using Claude Code's built-in `WebFetch` tool instead.
diff --git a/skills/web-to-markdown/SKILL.md b/skills/web-to-markdown/SKILL.md
index fb51af0..c0a4979 100644
--- a/skills/web-to-markdown/SKILL.md
+++ b/skills/web-to-markdown/SKILL.md
@@ -1,77 +1,77 @@
 ---
 name: web-to-markdown
-description: "Use ONLY when the user explicitly says: 'use the skill web-to-markdown ...' (or 'use a skill web-to-markdown ...'). Converts webpage URLs to clean Markdown by calling the local web2md CLI (Puppeteer + Readability), suitable for JS-rendered pages."
+description: "Converts web pages (including JavaScript-rendered SPAs) to clean Markdown using the local web2md CLI (Puppeteer + Readability). Use when the user asks to: fetch a URL as markdown, scrape/convert/archive a web page, extract article content, save a webpage to markdown, convert a site page to text, or handle login-walled content. Trigger keywords: fetch URL markdown, web page to markdown, scrape to markdown, convert webpage, extract article, save page as markdown."
 metadata:
-  version: 0.1.0
+  version: 0.2.0
 ---
 
 # web-to-markdown
 
-Convert web pages to clean Markdown by driving a locally installed browser (via `web2md`).
+## Mindset
 
-## Hard trigger gate (must enforce)
+- A page that "loaded" in a browser is not the same as a page ready for extraction: SPAs hydrate after the `DOMContentLoaded` event fires, so the wait strategy determines whether you get real content or garbage output.
+- Readability scores content by density and structure; nav bars, cookie banners, and sidebars share markup patterns with articles — if output is noisy, the selector or wait strategy is wrong, not Readability.
+- Browser sessions are stateful; `--user-data-dir` is the correct persistence primitive — never try to replay login flows with scripted clicks when interactive mode exists.
+- `networkidle0` sounds safer than `networkidle2` but routinely hangs on websocket-connected pages and analytics beacons; `networkidle2` is the right default.
+- web2md auto-names files from `<title>` slugified — identical titles on different pages silently overwrite; always check output file count when batch-converting.
 
-This skill MUST NOT be used unless the user explicitly wrote **exactly** a phrase like:
-- `use the skill web-to-markdown ...`
-- `use a skill web-to-markdown ...`
+## Navigation
 
-If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including: `use the skill web-to-markdown`.
+**Use this skill when**:
+- User asks to fetch a URL as Markdown, convert/scrape/archive a webpage, extract article content
+- Page is JavaScript-rendered (React, Vue, Angular, Next.js) and raw HTTP fetch returns empty/incomplete content
+- User needs to handle a login wall, CAPTCHA, or bot-detection gate before extracting
+- Batch converting multiple URLs to markdown files
 
-## What this skill does
+**Do NOT use this skill when**:
+- A simple `WebFetch` built-in tool would suffice (static HTML, no JS rendering required)
+- The user wants to interact with a live web app — this is read-only content extraction
+- The URL is behind a corporate VPN with no local Chrome access
 
-- Handles JS-rendered pages (Puppeteer → user Chrome).
-- Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via `puppeteer-core`.
-- Extracts main content (Readability).
-- Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter.
+**Decision tree for ambiguous requests**:
+- User says "get this page" → ask: does it need JS rendering? If no, suggest WebFetch; if yes, use this skill
+- User says "scrape" → this skill (Puppeteer handles it)
+- User says "convert to markdown" + URL → this skill
 
-## Non-goals
+## Philosophy
 
-- Do not use Playwright or other browser automation stacks; the mechanism is `web2md`.
+Treat every page as JS-rendered until proven otherwise. A slow, correct extraction that waits for the real content is worth more than a fast extraction of boilerplate shell markup.
 
-## Inputs you should collect (ask only if missing)
+## NEVER
 
-- `url` (or a list of URLs)
-- Output preference:
-  - Print to stdout (`--print`), OR
-  - Save to a file (`--out ./file.md`), OR
-  - Save to a directory (`--out ./some-dir/` to auto-name by page title)
-- Optional rendering controls for tricky pages:
-  - `--chrome-path <path>` (if Chrome auto-detection fails)
-  - `--interactive` (show Chrome and pause so the user can complete human checks/login, then press Enter)
-  - `--wait-until load|domcontentloaded|networkidle0|networkidle2`
-  - `--wait-for '<css selector>'`
-  - `--wait-ms <milliseconds>`
-  - `--headful` (debug)
-  - `--no-sandbox` (sometimes required in containers/CI)
-  - `--user-data-dir <dir>` (login/session; use a dedicated profile directory)
+- NEVER use `--wait-until networkidle0` as a default — because pages with persistent WebSocket connections or analytics SDKs (Segment, Intercom, Heap) never reach zero open connections, causing indefinite hangs.
+- NEVER write to the user's real Chrome profile directory, because the Puppeteer-driven Chrome that web2md launches can corrupt the profile's lock files and extension state; always use a dedicated `--user-data-dir` path like `./tmp/web2md-profile`.
+- NEVER skip output validation after writing files, because web2md exits 0 even on partial renders; an empty or near-empty `.md` file (under 500 bytes) signals extraction failure, not success.
+- NEVER run multiple web2md batch processes concurrently against the same `--user-data-dir` — because Chrome's profile locking causes a crash-loop with no useful error message.
+- NEVER use `--wait-for '.container'` or other layout wrappers — because layout elements render before data arrives in SPAs; wait for content-bearing selectors (e.g., `.article-body p`, `[data-content]`) that only appear when real data is present.
+- NEVER omit `--no-sandbox` in Docker/CI environments without `SYS_ADMIN` capability, because Chrome's sandbox requires kernel-level privileges and, without them, the launch fails with a cryptic "Target closed" error rather than a clear permission denial.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Output file is empty or < 500 bytes | Wait strategy too fast; Readability found no main content | Add `--wait-ms 2000` or `--wait-for '<content-selector>'`; try `--headful` to inspect visually |
+| "Chrome not found" error | Auto-detection failed (Snap/Flatpak/NixOS install) | Set `CHROME_PATH` env var or pass `--chrome-path $(which chromium)` |
+| Command hangs indefinitely | `networkidle0` on websocket page, or profile lock collision | Switch to `networkidle2`; kill Chrome processes and clear `--user-data-dir` lockfile |
+| "Target closed" in CI | Missing sandbox privileges | Add `--no-sandbox`; or run container with `--cap-add SYS_ADMIN` |
+| Login wall blocks content | Session not persisted | Run once with `--interactive --user-data-dir ./tmp/web2md-profile`; subsequent runs reuse the session |
+| Duplicate filenames overwrite in batch | Multiple URLs share the same `<title>` | Pass an explicit, unique `--out ./out/<name>.md` file path per URL instead of a directory, or rename each output before the next run |
 
 ## Workflow
 
-1) Confirm the user explicitly invoked the skill (`use the skill web-to-markdown`).
-2) Validate URL(s) start with `http://` or `https://`.
-3) Ensure `web2md` is installed:
-   - Run: `command -v web2md`
-   - If missing, instruct the user to install it (assume the project exists at `~/workspace/softaworks/projects/web2md`):
-     - `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link`
-     - Or: `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm install -g .`
-4) Convert:
-   - Single URL → file:
-     - `web2md '<url>' --out ./page.md`
-   - Single URL → auto-named file in directory:
-     - `mkdir -p ./out && web2md '<url>' --out ./out/`
-   - Human verification / login walls (interactive):
-     - `mkdir -p ./out && web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/`
-     - Then: complete the check in the browser window and press Enter in the terminal to continue.
-   - Print to stdout:
-     - `web2md '<url>' --print`
-   - Multiple URLs (batch):
-     - Create output dir (e.g. `./out/`) then run one `web2md` command per URL using `--out ./out/`
-5) Validate output:
-   - If writing files, verify they exist and are non-empty (e.g. `ls -la <path>` and `wc -c <path>`).
-6) Return:
-   - The saved file path(s), or the Markdown (stdout mode).
-
-## Defaults (recommended)
-
-- For most pages: `--wait-until networkidle2`
-- For heavy apps: start with `--wait-until domcontentloaded --wait-ms 2000`, then add `--wait-for 'main'` (or another stable selector) if needed.
+1. Check `web2md` is installed: `command -v web2md`
+   - If missing: `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link`
+2. Validate URL(s) start with `http://` or `https://`
+3. Select wait strategy (default: `--wait-until networkidle2`; see references/wait-strategies.md for decision guide)
+4. Run conversion:
+   - Single URL to file: `web2md '<url>' --wait-until networkidle2 --out ./page.md`
+   - Single URL, auto-named in dir: `mkdir -p ./out && web2md '<url>' --wait-until networkidle2 --out ./out/`
+   - Print to stdout: `web2md '<url>' --print`
+   - Login-walled: `web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/`
+   - Batch: loop one `web2md` per URL into shared `--out ./out/`
+5. Validate: `ls -la <output>` and `wc -c <file>` — flag anything under 500 bytes as likely failed
+6. Return file path(s) or Markdown content
+
+## Reference
+
+- Wait strategy selection guide and CI notes: `references/wait-strategies.md`
diff --git a/skills/web-to-markdown/references/wait-strategies.md b/skills/web-to-markdown/references/wait-strategies.md
new file mode 100644
index 0000000..51db7b1
--- /dev/null
+++ b/skills/web-to-markdown/references/wait-strategies.md
@@ -0,0 +1,62 @@
+# web2md Wait Strategy Reference
+
+## Choosing the Right Wait Strategy
+
+| Page Type | Recommended Strategy | Reasoning |
+|-----------|---------------------|-----------|
+| Static HTML / news articles | `--wait-until load` | Fast; content is in initial HTML |
+| React/Vue/Angular SPA | `--wait-until networkidle2` | Waits for JS hydration |
+| Infinite scroll / lazy images | `--wait-until networkidle2 --wait-ms 2000` | Extra settle time |
+| Dashboard with async data | `--wait-until domcontentloaded --wait-for '[data-loaded]'` | Wait for data-ready attribute |
+| Login-walled | `--interactive --user-data-dir ./tmp/web2md-profile` | Human completes login |
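+| Cloudflare / bot detection | `--interactive --user-data-dir ./tmp/web2md-profile` on the first run, then the same `--user-data-dir` on later runs | Human passes the check once; later runs reuse the stored session |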
+| Heavy analytics (GA, Segment) | `--wait-until networkidle0` | Waits for ALL network quiet (slower) |
+
+## networkidle0 vs networkidle2
+
+- `networkidle0`: zero open connections for 500ms — catches analytics beacons but often hangs on websocket pages
+- `networkidle2`: fewer than 3 connections for 500ms — practical default; rarely hangs
+
+## Selector Wait Pattern
+
+Use `--wait-for` with the most stable, content-specific selector — not a container that renders before data arrives.
+
+Bad: `--wait-for '.app-container'`  (renders empty)
+Good: `--wait-for '.article-body p'` (renders with content)
+
+## Batch Conversion
+
+For multiple URLs, run one `web2md` command per URL into a shared output directory. web2md auto-names each file from the slugified page `<title>` tag. Pages with distinct titles get distinct files; pages with identical titles overwrite each other, so add a counter prefix if needed.
+
+```bash
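+mkdir -p ./out  # assumes urls is a bash array defined earlier, e.g. urls=('https://example.com/a' 'https://example.com/b')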
+for url in "${urls[@]}"; do
+  web2md "$url" --wait-until networkidle2 --out ./out/
+done
+```
+
+## Chrome Path Auto-Detection Order
+
+web2md tries (in order):
+1. `CHROME_PATH` env var
+2. Standard install locations per OS (Linux: `/usr/bin/google-chrome`, `/usr/bin/chromium-browser`)
+3. `which google-chrome`, `which chromium`
+
+Override with `--chrome-path` when running in non-standard environments (Snap, Flatpak, NixOS).
+
+## Container / CI Notes
+
+In headless CI (Docker, GitHub Actions):
+- Add `--no-sandbox` — kernel sandbox is unavailable without `SYS_ADMIN` capability
+- Or run the container with `--cap-add SYS_ADMIN`
+- `networkidle0` often hangs in CI due to keepalive connections; prefer `networkidle2` + `--wait-ms`
+
+## Persistent Sessions
+
+`--user-data-dir` stores cookies, localStorage, and service workers. Use a **dedicated directory** (not your real Chrome profile) to avoid corrupting browser state. Safe pattern:
+
+```bash
+web2md 'https://example.com' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/
+# After first interactive login, subsequent runs reuse the session:
+web2md 'https://example.com/protected' --user-data-dir ./tmp/web2md-profile --out ./out/
+```
diff --git a/skills/writing-clearly-and-concisely/SKILL.md b/skills/writing-clearly-and-concisely/SKILL.md
index 215c9b2..702b071 100644
--- a/skills/writing-clearly-and-concisely/SKILL.md
+++ b/skills/writing-clearly-and-concisely/SKILL.md
@@ -1,93 +1,100 @@
 ---
 name: writing-clearly-and-concisely
-description: Use when writing prose humans will read—documentation, commit messages, error messages, explanations, reports, or UI text. Applies Strunk's timeless rules for clearer, stronger, more professional writing.
+description: Use when writing or editing prose for humans, such as documentation, commit messages, error messages, explanations, reports, API copy, UI text, and READMEs. Triggers on phrases like "write clearly", "edit for clarity", "too wordy", "make this concise", "sounds like AI", or "clean up this text".
 ---
 
-# Writing Clearly and Concisely
+## Mindset
 
-## Overview
+Expert writers know what novices don't:
 
-Write with clarity and force. This skill covers what to do (Strunk) and what not to do (AI patterns).
+1. **Weak prose is rarely a word-choice problem.** It is usually a structure problem dressed up as a vocabulary problem. Reordering the sentence often beats replacing the words.
+2. **Vagueness is a tax on the reader.** Every abstract word (leverage, impact, ensure, facilitate) forces the reader to mentally insert the concrete thing you omitted. Name the thing.
+3. **The edit loop has a direction.** Start with structure (is the argument sequenced?), then paragraphs (one idea each?), then sentences (active, positive, short?), then words. Editing words before fixing structure is wasted effort.
+4. **LLMs regress to the statistical mean.** The most-likely-next-token is always a blurry paraphrase of thousands of prior documents. The result is simultaneously less specific and more exaggerated — generic praise that could describe any subject. The antidote is specificity, not elegance.
+5. **Omission is a writing choice, not a failure.** "Omit needless words" means cut concepts, not just syllables. If a sentence says what the previous sentence implied, delete it.
 
-## When to Use This Skill
+## Navigation
 
-Use this skill whenever you write prose for humans:
+**Use this skill when:**
+- Writing any prose a human will read (docs, READMEs, commit messages, error messages, UI copy, PR descriptions, reports, emails)
+- Editing to reduce wordiness, cut AI-sounding phrases, or improve clarity
+- The draft sounds promotional, hedging, or generic
 
-- Documentation, README files, technical explanations
-- Commit messages, pull request descriptions
-- Error messages, UI copy, help text, comments
-- Reports, summaries, or any explanation
-- Editing to improve clarity
+**Do NOT use this skill when:**
+- Writing code comments that follow language-specific conventions (those have their own norms)
+- Translating between languages (different skill domain)
+- Creating purely structured output (JSON, YAML, tables with no prose)
 
-**If you're writing sentences for a human to read, use this skill.**
+**Quick triage:** Does the output contain sentences a human will read? Use this skill. Does it contain only code or data? Skip it.
 
-## Limited Context Strategy
+## Philosophy
 
-When context is tight:
+Write to transfer a specific idea from your head into the reader's head with minimum friction. Every word that doesn't serve that transfer is overhead. Every abstraction that could be a concrete noun is a missed opportunity.
 
-1. Write your draft using judgment
-2. Dispatch a subagent with your draft and the relevant section file
-3. Have the subagent copyedit and return the revision
+## NEVER
 
-Loading a single section (~1,000-4,500 tokens) instead of everything saves significant context.
+- **NEVER end a sentence with a participial "-ing" analysis phrase** ("ensuring reliability", "highlighting its importance", "reflecting its legacy") — these phrases perform analysis without doing it. The "highlighting" is a claim, not a fact; it adds zero information.
+- **NEVER open a conclusion with "In summary", "In conclusion", or "Overall"** — because it signals the reader that you are about to repeat yourself. If the structure was clear, the summary is redundant. If the structure was unclear, the summary doesn't fix it.
+- **NEVER use "it's important to note/remember/consider"** — because this construction hedges instead of asserting. If the fact is important, state it. If it needs a qualifier, attach the qualifier directly to the claim.
+- **NEVER write "Despite its [positive], [subject] faces challenges"** — this formula is the LLM equivalent of a five-paragraph essay conclusion. It signals a canned structure, not a real argument. If there are challenges, state them without the rhetorical frame.
+- **NEVER use promotional-positive stacking** (groundbreaking, seamless, robust, cutting-edge, enduring legacy, nestled, boasts) — because these words describe how you want the reader to feel about the subject, not what the subject actually does. They erode credibility.
+- **NEVER substitute abstract nouns for verbs** ("provide assistance" → "help", "make a decision" → "decide", "give consideration to" → "consider") — nominalizing verbs buries the action and adds syllables without meaning.
+- **NEVER add a "challenges and future prospects" section as a formula** — this structure signals the writer ran out of facts and is padding. If you have a real forward-looking claim with evidence, state it in-line.
 
-## Elements of Style
+## 3-Pass Editing Procedure
 
-William Strunk Jr.'s *The Elements of Style* (1918) teaches you to write clearly and cut ruthlessly.
+Run passes in this order — structure before sentences before words:
 
-### Rules
+**Pass 1 — Structure (paragraph level)**
+- Does each paragraph have exactly one idea?
+- Is the most important information first (inverted pyramid)?
+- Delete any paragraph that only restates a previous one.
 
-**Elementary Rules of Usage (Grammar/Punctuation)**:
+**Pass 2 — Sentences**
+- Convert passive to active: find "is/are/was/were [verb]ed by" → flip subject and object.
+- Positive form: "not many" → "few", "did not remember" → "forgot".
+- Break any sentence over ~25 words into two.
 
-1. Form possessive singular by adding 's
-2. Use comma after each term in series except last
-3. Enclose parenthetic expressions between commas
-4. Comma before conjunction introducing co-ordinate clause
-5. Don't join independent clauses by comma
-6. Don't break sentences in two
-7. Participial phrase at beginning refers to grammatical subject
+**Pass 3 — Words**
+- Replace every puffery word (pivotal, crucial, vital, testament, leverages, ensures) with the specific word or delete the sentence.
+- Delete "very", "really", "quite", "rather".
+- Replace "-ing" phrase tails with declarative clauses or delete them.
 
-**Elementary Principles of Composition**:
+## Before / After Examples
 
-8. One paragraph per topic
-9. Begin paragraph with topic sentence
-10. **Use active voice**
-11. **Put statements in positive form**
-12. **Use definite, specific, concrete language**
-13. **Omit needless words**
-14. Avoid succession of loose sentences
-15. Express co-ordinate ideas in similar form
-16. **Keep related words together**
-17. Keep to one tense in summaries
-18. **Place emphatic words at end of sentence**
+**Superficial analysis tail (AI tell):**
+Before: `The new API supports OAuth 2.0, reflecting the team's commitment to security best practices.`
+After: `The new API supports OAuth 2.0.`
+Why: "reflecting the team's commitment" is unattributed narration, not information.
 
-### Reference Files
+**Passive + puffery:**
+Before: `Significant performance improvements were delivered by the refactor, ensuring a more seamless user experience.`
+After: `The refactor cut median response time from 420 ms to 85 ms.`
+Why: Named the actual outcome; eliminated passive voice and the empty "-ing" tail.
 
-The rules above are summarized from Strunk's original text. For complete explanations with examples:
+**Needless preamble:**
+Before: `It is important to note that this setting only applies when the feature flag is enabled.`
+After: `This setting only applies when the feature flag is enabled.`
+Why: "It is important to note that" contributes no content.
 
-| Section | File | ~Tokens |
-|---------|------|---------|
-| Grammar, punctuation, comma rules | `02-elementary-rules-of-usage.md` | 2,500 |
-| Paragraph structure, active voice, concision | `03-elementary-principles-of-composition.md` | 4,500 |
-| Headings, quotations, formatting | `04-a-few-matters-of-form.md` | 1,000 |
-| Word choice, common errors | `05-words-and-expressions-commonly-misused.md` | 4,000 |
+## When Things Go Wrong
 
-**Most tasks need only `03-elementary-principles-of-composition.md`** — it covers active voice, positive form, concrete language, and omitting needless words.
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Edited text sounds clinical/cold | Correct grammar, wrong register — cut too many hedges without considering audience | Restore one hedge per paragraph where uncertainty is real; match sentence length to conversational tone |
+| Text is shorter but still unclear | Fixed word count, not structure — sentences were trimmed, not reorganized | Run Pass 1 again: check whether each paragraph has exactly one main claim |
+| Draft still sounds like AI after edits | Surface edits only — swapped puffery words but kept abstract nouns and "-ing" tails | Identify the concrete subject/object/verb; delete the frame around it |
+| Reader says "too terse, needs context" | Over-application of concision to background explanation | Restore the sentence that answers "why does this matter", not the one that restates what was already said |
 
-## AI Writing Patterns to Avoid
+## Reference Files
 
-LLMs regress to statistical means, producing generic, puffy prose. Avoid:
+Load only what you need — one file covers most tasks:
 
-- **Puffery:** pivotal, crucial, vital, testament, enduring legacy
-- **Empty "-ing" phrases:** ensuring reliability, showcasing features, highlighting capabilities
-- **Promotional adjectives:** groundbreaking, seamless, robust, cutting-edge
-- **Overused AI vocabulary:** delve, leverage, multifaceted, foster, realm, tapestry
-- **Formatting overuse:** excessive bullets, emoji decorations, bold on every other word
+| Need | File | ~Tokens |
+|------|------|---------|
+| Active voice, positive form, concision, paragraph structure | `elements-of-style/03-elementary-principles-of-composition.md` | 4,500 |
+| Grammar, possessives, comma rules | `elements-of-style/02-elementary-rules-of-usage.md` | 2,500 |
+| Word choice, commonly misused terms | `elements-of-style/05-words-and-expressions-commonly-misused.md` | 4,000 |
+| Full AI pattern catalog (Wikipedia field-tested) | `references/signs-of-ai-writing.md` | load only when auditing for AI tells |
 
-Be specific, not grandiose. Say what it actually does.
-
-For comprehensive research on why these patterns occur, see `signs-of-ai-writing.md`. Wikipedia editors developed this guide to detect AI-generated submissions — their patterns are well-documented and field-tested.
-
-## Bottom Line
-
-Writing for humans? Load the relevant section from `elements-of-style/` and apply the rules. For most tasks, `03-elementary-principles-of-composition.md` covers what matters most.
+**Context-tight fallback:** Write the draft using the 3-pass procedure above, then dispatch a subagent with the draft and only the one reference file needed for the specific issue.
diff --git a/skills/writing-clearly-and-concisely/signs-of-ai-writing.md b/skills/writing-clearly-and-concisely/references/signs-of-ai-writing.md
similarity index 100%
rename from skills/writing-clearly-and-concisely/signs-of-ai-writing.md
rename to skills/writing-clearly-and-concisely/references/signs-of-ai-writing.md

From 9eb413b29d08f1f46fa2d456a23f4e3753e29430 Mon Sep 17 00:00:00 2001
From: Jay Vucic <jay.vucic@gmail.com>
Date: Fri, 29 May 2026 16:40:54 -0700
Subject: [PATCH 3/3] =?UTF-8?q?feat(skills):=20Sprint=208=20=E2=80=94=206?=
 =?UTF-8?q?=20new=20skills=20+=20feedback/difficult-conversations=20merge?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New skills:
- git-workflow: branch strategy, rebase/merge decision, bisect mastery, stash hygiene, worktree patterns, reflog recovery (77L SKILL.md + 5 references)
- code-review: severity calibration, diff vs file review, 3-category feedback taxonomy, language anti-patterns, test review protocol (165L + 3 references)
- docker-and-containers: multi-stage build patterns, layer cache invalidation, image size, health checks, compose startup ordering, container security (126L + 2 references)
- testing-strategy: pyramid vs trophy, test double taxonomy, Playwright vs Cypress, flakiness root causes, mutation testing, boundary patterns (161L + 3 references)
- presentation-to-audience: narrative arc selection, audience calibration, opening strategy, Q&A bridge technique, cognitive load design (122L + 2 references)
- monorepo-workspace: Nx/Turborepo selection, build graph, shared config inheritance, circular dependency detection, publishing strategy, remote caching (3 references)

Merge:
- feedback-mastery + difficult-workplace-conversations → workplace-conversations
- Both original skills now have redirect stubs pointing to merged skill

Library: 55 → 61 skills total

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 skills/code-review/SKILL.md                   | 165 ++++++++
 .../references/feedback-phrasing.md           | 119 ++++++
 .../references/language-antipatterns.md       | 215 ++++++++++
 skills/code-review/references/test-review.md  | 126 ++++++
 .../SKILL.md                                  |  94 +----
 skills/docker-and-containers/SKILL.md         | 126 ++++++
 .../references/compose-examples.md            | 199 +++++++++
 .../references/dockerfile-patterns.md         | 149 +++++++
 skills/feedback-mastery/SKILL.md              | 108 +----
 skills/git-workflow/SKILL.md                  |  77 ++++
 .../git-workflow/references/bisect-mastery.md | 134 +++++++
 .../references/branch-strategies.md           |  58 +++
 .../references/index-precision.md             | 134 +++++++
 .../references/reflog-recovery.md             | 147 +++++++
 .../references/worktree-patterns.md           | 117 ++++++
 skills/monorepo-workspace/SKILL.md            | 139 +++++++
 .../references/remote-cache-setup.md          | 166 ++++++++
 .../references/shared-config-patterns.md      | 163 ++++++++
 .../references/tool-comparison.md             | 161 ++++++++
 skills/presentation-to-audience/SKILL.md      | 122 ++++++
 .../references/audience-calibration.md        | 101 +++++
 .../references/narrative-arcs.md              |  82 ++++
 skills/testing-strategy/SKILL.md              | 161 ++++++++
 .../references/boundary-testing-patterns.md   |  89 +++++
 .../references/e2e-framework-patterns.md      | 133 ++++++
 .../references/pyramid-vs-trophy.md           |  55 +++
 skills/workplace-conversations/SKILL.md       | 264 ++++++++++++
 .../references/conversation-framework.md      | 353 ++++++++++++++++
 .../references/delivery-scripts.md            | 362 +++++++++++++++++
 .../references/emotional-regulation.md        | 378 ++++++++++++++++++
 .../references/expectation-alignment.md       | 271 +++++++++++++
 .../references/feedback-sbi-model.md          | 206 ++++++++++
 .../references/preparation-template.md        | 370 +++++++++++++++++
 33 files changed, 5348 insertions(+), 196 deletions(-)
 create mode 100644 skills/code-review/SKILL.md
 create mode 100644 skills/code-review/references/feedback-phrasing.md
 create mode 100644 skills/code-review/references/language-antipatterns.md
 create mode 100644 skills/code-review/references/test-review.md
 create mode 100644 skills/docker-and-containers/SKILL.md
 create mode 100644 skills/docker-and-containers/references/compose-examples.md
 create mode 100644 skills/docker-and-containers/references/dockerfile-patterns.md
 create mode 100644 skills/git-workflow/SKILL.md
 create mode 100644 skills/git-workflow/references/bisect-mastery.md
 create mode 100644 skills/git-workflow/references/branch-strategies.md
 create mode 100644 skills/git-workflow/references/index-precision.md
 create mode 100644 skills/git-workflow/references/reflog-recovery.md
 create mode 100644 skills/git-workflow/references/worktree-patterns.md
 create mode 100644 skills/monorepo-workspace/SKILL.md
 create mode 100644 skills/monorepo-workspace/references/remote-cache-setup.md
 create mode 100644 skills/monorepo-workspace/references/shared-config-patterns.md
 create mode 100644 skills/monorepo-workspace/references/tool-comparison.md
 create mode 100644 skills/presentation-to-audience/SKILL.md
 create mode 100644 skills/presentation-to-audience/references/audience-calibration.md
 create mode 100644 skills/presentation-to-audience/references/narrative-arcs.md
 create mode 100644 skills/testing-strategy/SKILL.md
 create mode 100644 skills/testing-strategy/references/boundary-testing-patterns.md
 create mode 100644 skills/testing-strategy/references/e2e-framework-patterns.md
 create mode 100644 skills/testing-strategy/references/pyramid-vs-trophy.md
 create mode 100644 skills/workplace-conversations/SKILL.md
 create mode 100644 skills/workplace-conversations/references/conversation-framework.md
 create mode 100644 skills/workplace-conversations/references/delivery-scripts.md
 create mode 100644 skills/workplace-conversations/references/emotional-regulation.md
 create mode 100644 skills/workplace-conversations/references/expectation-alignment.md
 create mode 100644 skills/workplace-conversations/references/feedback-sbi-model.md
 create mode 100644 skills/workplace-conversations/references/preparation-template.md

diff --git a/skills/code-review/SKILL.md b/skills/code-review/SKILL.md
new file mode 100644
index 0000000..ef8d306
--- /dev/null
+++ b/skills/code-review/SKILL.md
@@ -0,0 +1,165 @@
+---
+name: code-review
+description: Expert code review covering severity calibration, diff-vs-full-file gaps, feedback categorization (correctness/comprehensibility/conventions), language-specific anti-patterns in JS/TS/Python/SQL, the second-read technique, test review, and feedback phrasing. Use when performing a code review, reviewing a pull request, giving or receiving PR feedback, or evaluating code quality. Triggers on phrases like "review this code", "review this PR", "give me feedback on this diff", "how should I review this", "what should I check", "code quality feedback", or "how do I respond to review comments".
+---
+
+# Code Review
+
+A code review has one job: find the things the author couldn't find because they're the author.
+
+---
+
+## Mindset
+
+1. **Severity inflation is a social phenomenon, not a technical one.** Reviewers upgrade findings to blockers to feel impactful or to ensure they're taken seriously. The result: authors stop distinguishing real blockers from theater, and everything becomes a negotiation. Your credibility as a reviewer lives and dies by accurate severity labels — one false blocker devalues the next ten real ones.
+2. **A diff is a crime scene, not the crime.** What you see changed; you don't see what accumulated. Dead code that's been there three years, a function that grew from 20 to 80 lines across six PRs, a pattern that diverged from the rest of the codebase six months ago — none of these appear in the diff, but all of them compound every time a new diff lands on top of them.
+3. **The author knows things you don't — but that's the bug.** If understanding the code requires knowing the PR description, the Slack thread, or the author's intent, the code is wrong. The code must communicate that context itself. Feedback that says "I needed the PR description to understand this" is always valid.
+4. **Tests are the second implementation.** A test that always passes regardless of what the implementation does is not a test — it is false confidence deployed at scale. Reviewing tests is not optional; it is the highest-leverage activity in the review because a missed test failure will appear in production, not in a review thread.
+5. **"I'd do it differently" is not a review finding.** Style preference dressed as correctness feedback is the primary source of reviewer-author friction. Know which category your feedback belongs to before you type it.
+
+---
+
+## Navigation
+
+**Use this skill when**:
+- Performing a code review on a PR, diff, or file set
+- Drafting review comments and need calibration on tone or label
+- Receiving review comments and deciding how to respond
+- Designing a code review process or checklist for a team
+
+**Do NOT use this skill when**:
+- The task is security-specific (use `security-review` for XSS, injection, auth — the coverage there is deeper)
+- The task is purely formatting or style (configure a linter; don't use human attention)
+- The code has no test suite at all and the task is to add one (that's authoring, not reviewing)
+
+**Decision tree — what to load for the current scenario**:
+
+| Scenario | Action |
+|----------|--------|
+| Quick diff review | Apply 2-pass protocol below; no references needed |
+| Reviewing tests specifically | Load [`references/test-review.md`](references/test-review.md) |
+| JS/TS or Python/SQL anti-patterns | Load [`references/language-antipatterns.md`](references/language-antipatterns.md) |
+| Writing or improving review comments | Load [`references/feedback-phrasing.md`](references/feedback-phrasing.md) |
+| Receiving and responding to comments | Load [`references/feedback-phrasing.md`](references/feedback-phrasing.md) |
+| Full review audit (all dimensions) | Load all three references |
+
+Load only what the scenario requires.
+
+---
+
+## Philosophy
+
+The goal of a code review is not to demonstrate that you read the code — it is to transfer knowledge, catch failure modes the author cannot see, and ensure the codebase is maintainable by people who weren't in the room. Every comment should serve one of those three purposes or it shouldn't be there.
+
+---
+
+## NEVER
+
+- **NEVER approve a PR without checking the test diff** — untested code is the single most common regression source regardless of how clean the logic looks; no test diff means no confidence in correctness.
+- **NEVER leave a comment without a severity label** — unlabeled feedback forces the author to guess whether the review is blocking; when in doubt, authors merge; the bug ships.
+- **NEVER comment "this could be simplified" without showing the simplification** — vague suggestions create back-and-forth that stalls the PR and wastes two people's time; if you can't show it, it's not actionable feedback.
+- **NEVER review style inconsistencies that a linter should catch** — your review attention is finite and nonrenewable; automating what's automatable is not laziness, it is correct resource allocation.
+- **NEVER mark a PR "changes requested" for nits** — that status should be reserved for correctness and major issues; nits should accompany an approval with suggestions; blocking a PR over naming is reviewer overreach.
+- **NEVER skip the PR description** — a missing or bad description is itself a review finding; a description that doesn't explain *why* the change exists tells you the author may not have thought it through, which means the code deserves extra scrutiny.
+- **NEVER mix unlabeled correctness and style feedback in the same comment** — when a comment conflates "this is wrong" with "I prefer another approach," authors fix the preference and miss the bug.
+- **NEVER use "just" in review comments** ("just rename this", "just extract a function") — "just" signals that the reviewer thinks the change is trivial; it is almost always perceived as dismissive even when the reviewer doesn't intend it.
+
+---
+
+## The 4-Tier Severity System
+
+| Tier | Label | Meaning | PR action |
+|------|-------|---------|-----------|
+| 1 | **blocker** | Correctness failure, security hole, data loss, or contract violation | Must fix before merge |
+| 2 | **major** | Significant maintainability damage, performance regression, or design smell | Should fix; author must acknowledge if not |
+| 3 | **minor** | Improvement with meaningful but non-urgent value | Can defer; approve with suggestions |
+| 4 | **nit** | Preference, style, trivial naming | Approve; author decides |
+
+**Calibration rule**: before labeling a finding "blocker," ask: "Does merging this today cause a user-visible failure, data corruption, or a security breach?" If no, it is at most "major."
+
+The over-promotion failure mode: reviewers label things blocker because they want to be heard. This is understandable and wrong. It trains authors to treat blockers as negotiating positions rather than hard stops. When a real blocker appears, it gets the same response as the last five false ones.
+
+---
+
+## The 2-Pass Review Protocol
+
+### Pass 1: Correctness
+
+Read the diff as if you're testing it. For each changed function or method:
+- What are the inputs? What happens at the boundaries?
+- What invariants does the caller assume? Does this maintain them?
+- What is the error path? Is it handled or silently swallowed?
+- For async/concurrent code: where are the race conditions?
+
+Do not think about style. Do not think about naming. Find things that are wrong.
+
+### Pass 2: The Second Read
+
+Close the diff. Imagine you are a maintainer picking up this code in two years. You have no PR context, no author to ask, no Slack thread. Read the changed files (not the diff — the full files in context):
+- Is the function's purpose clear from its name and signature alone?
+- Does this function fit on one screen, or has it grown past the point where a new contributor can hold it in working memory?
+- Does this code follow the patterns established elsewhere in this file? In this module?
+- Is there dead code that the change didn't need but also didn't clean up?
+
+Findings from Pass 2 are typically "major" or "minor" — they are about the future cost of the code, not correctness today.
+
+---
+
+## 3 Categories of Feedback
+
+Label every comment with its category. When categories are mixed unlabeled, authors don't know what's mandatory.
+
+| Category | Definition | Label prefix |
+|----------|-----------|-------------|
+| **Correctness** | The code is wrong — it will produce incorrect results, errors, or security failures in some input condition | `correctness:` |
+| **Comprehensibility** | The code is hard to understand — a future maintainer will spend extra time here | `comprehensibility:` |
+| **Conventions** | The code diverges from team patterns — it's not wrong, but it increases cognitive load for the team | `conventions:` |
+
+**Why this matters**: A reviewer who comments "this is confusing" is giving comprehensibility feedback. If the author hears it as conventions feedback, they'll rename a variable and consider it resolved. The label removes the ambiguity.
+
+---
+
+## Reviewing Tests
+
+Load [`references/test-review.md`](references/test-review.md) for the full protocol. Core principle:
+
+A test that would pass even if the implementation were wrong is not a test. It is a coverage number.
+
+Checklist for each test in the diff:
+- [ ] Would this test fail if you deleted the function under test?
+- [ ] Would this test fail if the function returned the wrong type but not an error?
+- [ ] Is the assertion on the thing that actually matters, or on a proxy?
+- [ ] Are edge cases tested (empty, null, boundary values, error paths)?
+- [ ] Is there a test for the behavior described in the PR, not just the happy path?
+
+Tests that mock everything and assert on mock calls test the implementation, not the behavior. Implementation-coupled tests break on refactors and provide no correctness signal.
+
+---
+
+## Giving Feedback
+
+- Prefix every comment with its category and severity: `correctness (blocker):`, `nit:`, `comprehensibility (minor):`
+- Ask questions when uncertain rather than asserting: "Does this handle the case where X is null?" not "This will crash if X is null."
+- Distinguish "I'd do it differently" from "this is wrong" — explicitly. Writing "I'd probably extract this into a helper, but I could see the case for keeping it inline" is better than either asserting wrongness or staying silent.
+- When you suggest a simplification, show the simplified version. If you can't show it in a comment, it wasn't a clear enough idea to be actionable.
+
+## Receiving Feedback
+
+Load [`references/feedback-phrasing.md`](references/feedback-phrasing.md) for the full receiving protocol. The single most important rule:
+
+**NEVER explain the context that makes the feedback wrong — fix the code so it communicates that context itself.**
+
+If a reviewer misunderstood your code, the code is unclear regardless of your intent. The correct response to "I don't understand why this is a loop" is not "I explained it in the PR description" — it is a clarifying comment in the code.
+
+---
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| PR review becomes a negotiation with the author | Reviewer didn't label severity; author is treating all findings as "major" by default | Re-label every comment with explicit tier; distinguish blockers from suggestions explicitly |
+| Reviews take 3+ rounds of back-and-forth | Feedback is vague ("simplify this") or unlabeled; author is guessing what's required | Rewrite comments with category prefix, severity, and concrete suggestion inline |
+| Author merged without addressing a blocker | Blocker wasn't marked as such; or reviewer mixed blockers and nits with no differentiation | Post-merge: open a follow-up issue immediately; in the team: calibrate what "blocker" means |
+| Test coverage is high but bugs still ship | Tests are asserting on implementation details (mock calls) rather than behavior; or tests have tautological assertions | Review test assertions against the "would this fail if the impl were wrong?" checklist |
+| Reviewer finds nothing in a large PR | Diff-only review mode; missed accumulation issues; or false confidence from clean diff | Run Pass 2 (full file in context); specifically look for growing function complexity and pattern drift |
+| Author rewrites code in response to a nit | Reviewer marked a nit as blocker or changes-requested; author is being overly compliant | Establish team-level norms: nits accompany approvals, not change requests |
diff --git a/skills/code-review/references/feedback-phrasing.md b/skills/code-review/references/feedback-phrasing.md
new file mode 100644
index 0000000..b406a4a
--- /dev/null
+++ b/skills/code-review/references/feedback-phrasing.md
@@ -0,0 +1,119 @@
+# Feedback Phrasing
+
+The technical finding and the comment about the technical finding are separate skills. Good findings delivered badly get ignored, disputed, or worked around without being fixed. Bad phrasing also signals low reviewer calibration, reducing the weight of the next comment.
+
+---
+
+## Giving Feedback: The Comment Formula
+
+```
+[category] ([severity]): [observation or question]. [suggested fix if applicable].
+```
+
+Examples:
+- `correctness (blocker): this will throw if items is empty — Array.prototype.reduce with no initial value on an empty array throws TypeError. Add an initial value: items.reduce((acc, x) => acc + x, 0).`
+- `comprehensibility (minor): after the second read this function does three distinct things (validate, transform, persist). A future maintainer will have to read all 60 lines to understand any one of them. Consider extracting into three functions.`
+- `nit: "data" could be "userData" to distinguish from the API response object above.`
+
+---
+
+## Asking vs. Asserting
+
+Use questions when you're uncertain whether the code is wrong or you're missing context:
+
+- "Does this handle the case where `user.role` is undefined?" (not "this crashes if role is undefined")
+- "Is there a reason this is sequential rather than Promise.all?" (not "this is slow, use Promise.all")
+- "What happens here if the DB write fails after the cache is updated?" (not "race condition")
+
+Questions invite the author to confirm or provide context. If the author confirms the concern, the finding stands. If they provide context you didn't have, you update your understanding — and then the question is whether the code should make that context visible to the next reader without the PR thread.
+
+**When to assert instead of ask**: when you're certain the code is wrong — a syntax error, a known language pitfall (mutable default arg, late binding closure), a provably incorrect algorithm. Asking "is this intentional?" for a mutable default arg wastes cycles. Label it correctness and show the fix.
+
+---
+
+## "I'd Do It Differently" vs. "This Is Wrong"
+
+State this explicitly. Authors can't read your intent.
+
+- "This is wrong because [reason] — correctness (blocker)."
+- "I'd approach this differently — I'd use X because Y — but this implementation is correct. Nit if you want to reconsider."
+- "Not sure which approach is better here. I'd lean toward X because it's more consistent with how we handle this in module Z. Minor."
+
+The cost of false certainty: if you mark a preference as a correctness finding and the author complies, you've introduced unnecessary churn. If they push back and you're wrong, you've damaged your reviewer credibility for future real findings.
+
+---
+
+## Feedback Phrases to Avoid
+
+| Phrase | Problem | Better |
+|--------|---------|--------|
+| "just rename this" | Minimizes effort; perceived as dismissive | "nit: `data` → `userData` would distinguish it from the API response variable above" |
+| "this could be simplified" | Not actionable without the simplification | "comprehensibility (minor): [show the simplified version]" |
+| "consider using X" | Too soft for a correctness finding, too strong for a nit | Label it explicitly: "nit: consider..." or "correctness (major): use X because..." |
+| "this is wrong" | No reason, no fix | "correctness (blocker): this is wrong because [specific reason]. Fix: [specific fix]." |
+| "not sure about this" | Author can't act on uncertainty without a specific concern | State the concern: "not sure this handles the null case — what happens when X is null?" |
+| "..." | A review comment with no content | Delete it; or write the thought |
+
+---
+
+## Receiving Feedback: The Author Protocol
+
+### The NEVER Rule
+
+**NEVER explain the context that makes the feedback wrong — fix the code so it communicates that context itself.**
+
+When a reviewer misunderstands your code, the correct response is not:
+- "I explained this in the PR description"
+- "This was discussed in the design doc"
+- "If you look at the calling code, it's clear that..."
+
+Those responses are telling the reviewer they're wrong and that the code is fine. But the code is not fine: it is unclear enough that a qualified person, reading it fresh, misunderstood it. That is the bug. Fix the code, not the reviewer.
+
+### The Author Response Workflow
+
+For each piece of feedback:
+
+1. **Read the label.** Is it a blocker, major, minor, or nit? If it's unlabeled, ask: "Is this blocking or a suggestion?"
+2. **Distinguish "wrong" from "different."** If the reviewer says the code is incorrect, verify. If they say they'd do it differently, you are allowed to disagree — but explain why your choice is better, not just that it's yours.
+3. **Apply the NEVER rule.** If your first instinct is to write a comment explaining why the feedback is based on a misunderstanding, stop. Ask yourself: "How do I fix the code so that misunderstanding is impossible?"
+4. **Respond to every comment.** Even if you're not making a change, respond with why. "Declining: this is consistent with the pattern in [module] — see [file:line]." Silence makes reviewers feel ignored and escalates nits into blocked PRs.
+5. **Don't argue about nits.** If it's a nit, apply it or explain briefly why you're not. Spending three comments defending a variable name is a worse signal than a variable name you disagree with.
+
+### When the Feedback Is Genuinely Wrong
+
+It happens. Reviewers miss context. Handle it without winning:
+
+- State the specific reason the feedback doesn't apply: "This branch is only reachable when X is already validated at [line]. The null case can't reach here."
+- Then ask: "Does the code make that clear enough, or should I add a comment to the guard clause?" This turns a disagreement into a collaboration on clarity.
+- If the reviewer still disagrees, escalate to a call rather than a thread. Comment threads are the wrong medium for resolving genuine technical disagreement.
+
+---
+
+## Review Thread Health
+
+A review thread is healthy when it ends. Signs of an unhealthy thread:
+
+- The same point is stated multiple times with increasing emphasis (both sides have stopped listening)
+- The discussion is about who is right, not what the code should do
+- The author is explaining intent rather than fixing code
+- The reviewer is listing examples of how they would do it differently rather than identifying what is wrong
+
+When a thread goes unhealthy, one person needs to end it: either "you've convinced me, I'll fix it" or "I'm not going to fix this one because [one sentence reason], marking resolved."
+
+---
+
+## PR Description as a Review Finding
+
+If the PR description is missing or inadequate, it is a finding — not a meta-comment about process:
+
+```
+comprehensibility (minor): the PR description explains what changed but not why.
+A future `git log` or `git blame` reader will land here without context for the decision.
+Suggested addition: one sentence on why X approach was chosen over Y.
+```
+
+A good description contains:
+1. What changed and what it does (the "what")
+2. Why it was changed — what problem it solves (the "why")
+3. What reviewers should pay most attention to (the "where to look")
+4. What was explicitly NOT changed and why, if relevant (reduces scope questions)
diff --git a/skills/code-review/references/language-antipatterns.md b/skills/code-review/references/language-antipatterns.md
new file mode 100644
index 0000000..fddb10a
--- /dev/null
+++ b/skills/code-review/references/language-antipatterns.md
@@ -0,0 +1,215 @@
+# Language-Specific Anti-Patterns
+
+Patterns that pass visual inspection and linting but introduce correctness failures or performance problems.
+
+---
+
+## JavaScript / TypeScript
+
+### Promise.all vs. Sequential Await
+
+**Pattern reviewers miss:**
+```ts
+// Looks clean. Runs sequentially. Latency = A + B + C.
+const user = await fetchUser(id);
+const perms = await fetchPermissions(id);
+const prefs = await fetchPreferences(id);
+
+// Correct for independent fetches. Latency = max(A, B, C).
+const [user, perms, prefs] = await Promise.all([
+  fetchUser(id),
+  fetchPermissions(id),
+  fetchPreferences(id),
+]);
+```
+
+**Why missed**: Sequential await reads like clean, linear logic. The performance cost is invisible in a diff unless you know the calls are async and independent. In hot paths (API handlers, server-side rendering), three sequential calls of similar duration take roughly three times as long as the same calls run in parallel.
+
+**Review trigger**: any block with 2+ awaits on independent data — check whether they can be parallelized.
+
+---
+
+### Type Assertion Hiding Runtime Errors
+
+**Pattern reviewers miss:**
+```ts
+// Compiles clean. Runtime crash if API changes response shape.
+const data = (await response.json()) as ApiResponse;
+console.log(data.user.id); // TypeError if data.user is undefined
+
+// Correct: validate at the boundary.
+const raw = await response.json();
+const data = ApiResponseSchema.parse(raw); // Zod/Valibot/io-ts
+```
+
+**Why missed**: `as SomeType` reads as a type annotation, not a lie. TypeScript types are erased at runtime — the assertion tells the compiler to trust you, not to enforce anything. Every `as` on untrusted external data (API responses, localStorage, URL params, form inputs) is a potential runtime crash disguised as type safety.
+
+**Review trigger**: any `as` cast applied to `JSON.parse()`, `response.json()`, `localStorage.getItem()`, or URL/form data.
+
+---
+
+### Non-Null Assertion on Values That Can Be Null
+
+**Pattern reviewers miss:**
+```ts
+const el = document.getElementById('root')!;
+el.appendChild(child); // crash if 'root' doesn't exist in test env or SSR
+
+// Correct: explicit guard
+const el = document.getElementById('root');
+if (!el) throw new Error("Missing #root element");
+el.appendChild(child);
+```
+
+**Why missed**: `!` is visually subtle. Reviewers read past it because it looks like punctuation. In test environments or SSR contexts, the element may not exist, so the crash only appears outside the happy path.
+
+---
+
+### Mutating State in Array Methods
+
+**Pattern reviewers miss:**
+```ts
+// Looks like a transform. Mutates original.
+const updated = items.map(item => {
+  item.processed = true; // mutation
+  return item;
+});
+
+// Correct: return new object
+const updated = items.map(item => ({ ...item, processed: true }));
+```
+
+**Why missed**: `.map()` signals "transform without mutation" by convention. A reviewer skimming the shape of the code assumes immutability. The mutation causes hard-to-trace bugs in React state, Redux stores, and any caller that holds a reference to the original array.
+
+---
+
+## Python
+
+### Mutable Default Arguments
+
+**Pattern reviewers miss:**
+```python
+# The default list is created ONCE at function definition time.
+def add_item(item, collection=[]):
+    collection.append(item)
+    return collection
+
+add_item("a")  # ['a']
+add_item("b")  # ['a', 'b'] — not ['b']
+
+# Correct: use None sentinel
+def add_item(item, collection=None):
+    if collection is None:
+        collection = []
+    collection.append(item)
+    return collection
+```
+
+**Why missed**: The mutable default arg issue is in Python FAQs, but it still ships regularly because reviewers spot-check function signatures without running the mental model of "this default is evaluated once." The bug only appears once a second call reuses the default, so a test that calls the function a single time never sees it.
+
+**Review trigger**: any default argument that is a list, dict, or set literal.
+
+---
+
+### Late Binding in Closures
+
+**Pattern reviewers miss:**
+```python
+# All lambdas capture the variable `i`, not its value.
+funcs = [lambda: i for i in range(3)]
+funcs[0]()  # returns 2, not 0
+funcs[1]()  # returns 2, not 1
+
+# Correct: bind at definition time via default argument
+funcs = [lambda i=i: i for i in range(3)]
+```
+
+**Why missed**: The loop looks like it creates three independent functions. The fact that Python closures capture variables (not values) is non-obvious. This exact pattern appears in callback registration, event handler setup, and test parametrization. The bug manifests at call time, not definition time, so it's invisible in code that only sets up the closures.
+
+**Review trigger**: any lambda or nested function defined inside a loop that references the loop variable.
+
+---
+
+### Exception-Swallowing Bare Except
+
+**Pattern reviewers miss:**
+```python
+try:
+    result = fetch_data()
+except:
+    result = None  # also swallows KeyboardInterrupt and SystemExit
+
+# Correct: catch specific exceptions
+try:
+    result = fetch_data()
+except (ConnectionError, TimeoutError) as e:
+    log.warning("fetch failed: %s", e)
+    result = None
+```
+
+**Why missed**: `except:` looks equivalent to `except Exception:` but is not: it catches `BaseException`, including `KeyboardInterrupt` and `SystemExit`. Code that swallows these ignores Ctrl-C and `sys.exit()`, including SIGTERM handlers that shut down by raising `SystemExit`, which causes deployment and container shutdown problems.
+
+---
+
+## SQL / ORM
+
+### N+1 in ORMs That Look Clean
+
+**Pattern reviewers miss:**
+```python
+# Django ORM (SQLAlchemy lazy loading behaves similarly): looks like one query, executes N+1
+posts = Post.objects.all()
+for post in posts:
+    print(post.author.name)  # one SELECT per post
+
+# Correct: eager load
+posts = Post.objects.select_related('author').all()
+```
+
+**Why missed**: The ORM abstracts the queries. In the diff, it looks like two lines accessing an object. The N+1 only appears when you understand that `post.author` triggers a lazy load. In development with small datasets it's invisible; in production with thousands of rows it's a page timeout.
+
+**Review trigger**: any loop that accesses a relationship attribute on an ORM object without a visible `select_related`, `prefetch_related`, `joinedload`, or `eager_load` call.
+
+---
+
+### Missing Index on Foreign Key Columns
+
+**Pattern reviewers miss:**
+```sql
+-- Foreign key created, but no index on the referencing column.
+ALTER TABLE orders ADD COLUMN user_id INT REFERENCES users(id);
+
+-- Without an index, JOINs and WHERE filters on user_id may scan the whole table.
+-- Correct:
+ALTER TABLE orders ADD COLUMN user_id INT REFERENCES users(id);
+CREATE INDEX idx_orders_user_id ON orders(user_id);
+```
+
+**Why missed**: The schema looks correct. PostgreSQL does not index the referencing column automatically, and hand-written migrations often add the constraint without the index (MySQL's InnoDB, by contrast, creates one implicitly). The performance failure only appears under load when the table has meaningful rows. This is a diff-review blind spot because the constraint and the missing index are on the same column, so the change reads as complete.
+
+---
+
+### SELECT * in Application Queries
+
+**Pattern reviewers miss:**
+```python
+# Fetches all columns including large BLOBs, deprecated fields, secrets
+cursor.execute("SELECT * FROM users WHERE id = %s", [user_id])
+
+# Correct: explicit column list
+cursor.execute("SELECT id, email, display_name FROM users WHERE id = %s", [user_id])
+```
+
+**Why missed**: `SELECT *` is acceptable for exploratory queries and is common in tutorial code. In application code, it causes three problems: it over-fetches data (performance), it breaks when columns are added or removed (fragility), and it may return columns containing sensitive data that the application code then accidentally logs or serializes.
+
+---
+
+## Cross-Language: Error Path Review
+
+For any function that can fail, check the diff for:
+
+1. **Does the error path return a meaningful error?** Or does it return `nil`, `None`, `null` with no signal about why?
+2. **Is the error logged at the right level?** Watch for real failures logged at DEBUG that should be ERROR, and for expected errors (rate limits, cache misses) logged at ERROR that generate alert noise.
+3. **Does the error path clean up resources?** Files, connections, locks, and goroutines acquired before the failure — are they released in the error path?
+
+These are correctness findings, not style. Label them `correctness (blocker):` if the missing cleanup causes a resource leak or data corruption.
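+
+A minimal sketch of an error path that satisfies all three checks, assuming a hypothetical `save_row` helper and a `FakeConnection` stand-in (neither name comes from a real library):
+
+```python
+import logging
+
+log = logging.getLogger(__name__)
+
+
+class FakeConnection:
+    """Hypothetical stand-in for a real database connection."""
+
+    def __init__(self):
+        self.closed = False
+
+    def write(self, row):
+        if row is None:
+            raise ValueError("row must not be None")
+
+    def close(self):
+        self.closed = True
+
+
+def save_row(connect, row):
+    conn = connect()
+    try:
+        conn.write(row)
+    except ValueError as exc:
+        # Real failure: log at ERROR and re-raise so the caller gets a signal.
+        log.error("write failed: %s", exc)
+        raise
+    finally:
+        # Runs on success and on failure, so the connection is always released.
+        conn.close()
+```
+
+In a review, the thing to look for is the `finally` (or a context manager): cleanup placed only after the happy-path call is skipped whenever that call raises.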
diff --git a/skills/code-review/references/test-review.md b/skills/code-review/references/test-review.md
new file mode 100644
index 0000000..ce91b91
--- /dev/null
+++ b/skills/code-review/references/test-review.md
@@ -0,0 +1,126 @@
+# Test Review Protocol
+
+Tests that always pass are worse than no tests. No tests produce an obvious gap in coverage. Tests that always pass produce false confidence and mask regressions.
+
+---
+
+## The Fundamental Test Question
+
+Before reading a test's assertions, ask: **"What would I have to break in the implementation to make this test fail?"**
+
+If the answer is "almost nothing" or "I can't think of anything easily," the test is not providing correctness signal. It is covering lines without covering behavior.
+
+---
+
+## The 4 Test Failure Modes
+
+### 1. Tautological Assertion
+
+```python
+# Passes for almost any return value. Verifies almost nothing.
+result = process_data(payload)
+assert result is not None
+
+# Better: assert on the actual value
+assert result == {"status": "ok", "count": 3}
+```
+
+`is not None` is the most common tautological assertion. It passes when the function returns any non-None value, including wrong values, empty values, and error objects.
+
+### 2. Mock-Coupled Test
+
+```python
+from unittest import mock
+
+# Tests that the function called the mock, not that behavior is correct.
+def test_send_email():
+    with mock.patch('app.mailer.send') as mock_send:
+        notify_user(user_id=1)
+    mock_send.assert_called_once()  # confirms a call happened, not what was sent
+```
+
+Implementation-coupled tests break on refactors that don't change behavior (e.g., switching email libraries) and pass when the behavior is wrong (e.g., wrong recipient). The assertion should be on observable outputs or side effects, not on internal call patterns.
+
+When mocking is unavoidable (external services, I/O), assert on what was passed to the mock, not just that it was called.
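+
+A minimal sketch of asserting on the arguments, assuming a hypothetical `notify_user` that receives its mailer and address book as parameters (all names here are illustrative):
+
+```python
+from unittest import mock
+
+
+# Hypothetical function under test: looks up an address and sends one email.
+def notify_user(user_id, send, addresses):
+    send(to=addresses[user_id], subject="Your report is ready")
+
+
+def test_notify_user_sends_to_the_right_recipient():
+    send = mock.Mock()
+    notify_user(1, send, addresses={1: "ada@example.com"})
+    # Asserts on what was passed, so a wrong recipient fails the test.
+    send.assert_called_once_with(
+        to="ada@example.com", subject="Your report is ready"
+    )
+
+
+test_notify_user_sends_to_the_right_recipient()
+```
+
+Unlike a bare `assert_called_once()`, this version fails if the recipient or subject changes, which is the behavior a user would notice.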
+
+### 3. Happy-Path-Only Coverage
+
+A test file where every test case uses valid, in-range, expected inputs tests that the code works when it works. The bugs live in:
+- Empty collections
+- Zero values and null/None/undefined inputs
+- Boundary values (off-by-one at list ends, date boundaries, max integer values)
+- Error paths (what happens when the dependency throws?)
+- Concurrent access (if the code has any shared state)
+
+**Review trigger**: a test file with 10 tests, all with clean, valid inputs. Ask: "Where is the test for the null case? For the empty list? For the error response?"
+
+### 4. Test Describes Implementation, Not Behavior
+
+```python
+# Describes implementation
+def test_uses_cache():
+    ...
+    assert cache.get.called
+
+# Describes behavior
+def test_returns_same_result_on_repeated_calls():
+    result1 = get_user(id=1)
+    result2 = get_user(id=1)
+    assert result1 == result2
+```
+
+If the most accurate name for a test describes internal mechanics rather than user-visible behavior, the test is testing the wrong thing. When the implementation changes (even for the better), implementation-describing tests break and create noise that obscures real failures.
+
+---
+
+## Test Review Checklist
+
+For each test added or modified in the diff:
+
+```
+[ ] Would this test fail if the function under test were deleted?
+[ ] Would this test fail if the function returned the wrong type (but not an error)?
+[ ] Is the assertion on the specific value that matters, not just its presence?
+[ ] Is the happy path covered?
+[ ] Is at least one edge case covered (null, empty, boundary, error)?
+[ ] Does the test name describe behavior, not implementation?
+[ ] Are mocks asserting on what was passed, not just that they were called?
+[ ] Is the test independent (no shared mutable state with other tests)?
+```
+
+---
+
+## Missing Tests: What to Flag
+
+When the diff adds new logic but no new tests, label the finding:
+
+```
+correctness (major): no test for [specific behavior]. If this branch were wrong,
+no existing test would catch it. Suggest adding a test for [specific case].
+```
+
+Specific cases worth calling out explicitly:
+- New error handling path with no test that exercises the error
+- New conditional branch with no test that exercises the case where the condition is false
+- New async function with no test that awaits it
+- New validation logic with no test that passes invalid input
+
+---
+
+## When Test Coverage Numbers Lie
+
+High coverage (>90%) with low test quality usually means one or more of:
+- Tests were written after the code to hit a coverage gate
+- Tests import and call every function but don't assert meaningful values
+- Coverage was measured on unit tests; integration paths are untested
+
+High coverage does not establish correctness: it shows which lines ran, not whether anything checked their results. In a review, treat coverage numbers as a signal to look harder at the tests, not as evidence that they're good.
+
+---
+
+## Reviewing Test Infrastructure Changes
+
+When the diff modifies test helpers, fixtures, factories, or setup/teardown:
+- Does a change to a shared fixture silently affect other tests that depend on it?
+- Does a change to a factory default change assumptions in tests that use it without overrides?
+- Does a new mock helper have the same failure modes as the mocks it replaces?
+
+Shared test infrastructure bugs are the hardest to find because they affect all tests that use the infrastructure, not just the changed test.
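+
+One way to keep a shared helper from leaking state between tests is to have it build a fresh object per call. A minimal sketch, assuming a hypothetical `make_user` factory (not taken from any real test suite):
+
+```python
+def make_user(**overrides):
+    """Hypothetical factory: a fresh dict per call, so tests stay independent."""
+    user = {"name": "ada", "roles": []}
+    user.update(overrides)
+    return user
+
+
+def test_grant_admin():
+    user = make_user()
+    user["roles"].append("admin")
+    assert user["roles"] == ["admin"]
+
+
+def test_new_user_has_no_roles():
+    # Passes in any order because make_user() never reuses state.
+    assert make_user()["roles"] == []
+
+
+test_grant_admin()
+test_new_user_has_no_roles()
+```
+
+If `make_user` instead returned one module-level dict, the second test would pass or fail depending on execution order, and a diff that changes a default such as `"roles": []` would alter every test that relies on it without an override.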
diff --git a/skills/difficult-workplace-conversations/SKILL.md b/skills/difficult-workplace-conversations/SKILL.md
index 21a8a8e..252c7aa 100644
--- a/skills/difficult-workplace-conversations/SKILL.md
+++ b/skills/difficult-workplace-conversations/SKILL.md
@@ -1,96 +1,8 @@
 ---
 name: difficult-workplace-conversations
-description: Practitioner guide for preparing, navigating, and following up on high-stakes workplace conversations — performance issues, peer conflict, upward feedback, terminations, salary negotiations, and emotionally charged discussions. Trigger phrases: "how do I talk to", "preparing for a hard conversation", "giving difficult feedback", "conflict with coworker", "talking to my manager about", "addressing performance", "someone is defensive".
-allowed-tools: Read, Glob, Grep
+description: "MERGED — use workplace-conversations instead. This skill has been merged into workplace-conversations which covers both feedback delivery and difficult conversation navigation."
 ---
 
-## Mindset
+# Merged
 
-- **The goal is a shared understanding, not a verdict.** Most failed conversations happen because one party enters to win. Practitioners enter to learn something they don't know yet — even when the facts seem clear.
-- **Timing is a power lever most people waste.** Raising an issue when the other person is stressed, public, or cornered guarantees defense. The right moment (private, unhurried, not post-incident) does half the work for you.
-- **Specificity is protection.** Vague feedback ("you need to be more professional") creates defensiveness because it leaves the other person to guess. Specificity removes the guess and removes the threat.
-- **A prepared BATNA changes how you show up.** Knowing what you'll do if the conversation fails — not as a threat, but as a real alternative — eliminates desperation from your posture. The other person feels it even if you never say it.
-- **Regression is normal, not evidence of failure.** Behavior changes after a hard conversation often reverse within 2–3 weeks. Practitioners schedule a check-in at week 3 precisely because of this, not because they distrust the other person.
-
-## Navigation
-
-**Use this skill when**:
-- Preparing a conversation with real stakes: performance, conflict, termination, salary, upward feedback
-- Someone is defensive, emotional, or has a pattern of avoiding accountability
-- You need to give feedback to someone with more organizational power than you
-- A prior conversation didn't produce lasting change
-
-**Do NOT use this skill when**:
-- Casual feedback in a low-stakes context (use `feedback-mastery` instead)
-- HR/legal issues where process must lead (loop in HR first, then use this for your role in that process)
-- Written communication only (this skill is optimized for live dialogue)
-
-**Triage — what kind of conversation is this?**
-
-```
-Is there a power differential (you are the manager)?
-  YES → Performance/termination track → load references/conversation-framework.md
-  NO  → Is this a pattern or a one-time incident?
-          PATTERN → Conflict/peer track → load references/delivery-scripts.md
-          ONE-TIME → Is it upward (to your manager)?
-                       YES → Upward feedback track → see Upward Feedback section below
-                       NO  → Sensitive topic track → load references/emotional-regulation.md
-```
-
-## Philosophy
-
-The hardest conversations are hard because both parties have a legitimate version of events. Your job is not to correct their version but to build a third version together — one specific enough to produce durable action.
-
-## NEVER
-
-- **NEVER open with impact before the other person understands what behavior you're describing** — they'll spend the entire conversation defending their intention instead of hearing the impact. Behavior first, impact second, always.
-- **NEVER have a high-stakes conversation without a written BATNA** — not to use as leverage, but because without one your desperation leaks into your tone and the other person calibrates to it. Write down what you'll do if this goes nowhere.
-- **NEVER follow up a difficult conversation only once** — single follow-ups let people wait you out. Practitioners set a 3-week check-in at the close of the original conversation. Most regressions happen at week 2–3, not week 1.
-- **NEVER summarize the conversation immediately afterward in a way that positions you as correct** — written summaries ("per our discussion, you agreed to...") feel like legal filings. Instead, ask the other person to summarize what they're taking away. Their words reveal whether there's real alignment.
-- **NEVER conflate "they heard me" with "they agreed"** — acknowledgment ("I understand your concern") is not commitment. Close every conversation with a specific, owned action: who does what, by when, and how you'll both know it happened.
-- **NEVER raise a performance issue for the first time in a formal review** — the employee experiences it as an ambush, the feedback becomes legally and relationally contaminated, and you've lost 6–12 months of intervention time. Informal conversations must precede formal ones.
-- **NEVER use "we need to talk" as a meeting invite subject** — it triggers 48–72 hours of anxiety that arrives to your conversation as pre-loaded defensiveness. Name the topic neutrally: "checking in on Q3 deliverables."
-
-## When Things Go Wrong
-
-| Situation | Likely Cause | Recovery |
-|-----------|-------------|----------|
-| Other person shuts down or goes silent | Felt accused, not invited | Explicitly name what you're seeing: "I notice you've gone quiet — I want to make sure this feels safe to discuss." Then pause. Don't fill the silence. |
-| Conversation spirals into unrelated grievances | No shared agenda set upfront | Interrupt cleanly: "I want to hear all of this. Can we agree to finish this topic first and schedule time for the rest?" |
-| Agreed actions don't happen after the conversation | Agreement was vague or socially coerced | Return to the conversation with curiosity, not accusation. Ask what got in the way. Vague commitments usually mean the person didn't actually agree — they just ended the discomfort. |
-| Person becomes hostile or escalates | Conversation revealed a threat to their status/livelihood | De-escalate by naming it: "I can see this is hitting hard. I'm not here to threaten your position — I'm here because I want this to work." Then stop talking and let them respond. |
-| Your own emotions take over mid-conversation | Ambush response to something unexpected | Use the explicit pause: "I want to make sure I respond thoughtfully. Can I have a moment?" Practitioners normalize pausing; it signals self-awareness, not weakness. |
-
-## Reference Files (Load on Demand)
-
-| Reference | When to Load |
-|-----------|-------------|
-| `references/conversation-framework.md` | Full preparation workflow, fact/story separation exercise, contribution mapping |
-| `references/preparation-template.md` | Blank worksheet to fill out before the conversation; covers goal, BATNA, trigger anticipation |
-| `references/delivery-scripts.md` | Opening formulas by situation type, de-escalation phrases, closing scripts |
-| `references/emotional-regulation.md` | Pre-conversation regulation techniques; managing your own state during high-charge moments |
-
-Load the relevant reference when the user needs scripts, templates, or deep-dive frameworks. Do not reproduce reference content inline unless the user asks for it directly.
-
-## Upward Feedback (Special Case)
-
-Giving feedback to someone with power over you has a different risk profile. Three adjustments:
-
-1. **Frame as impact on your effectiveness, not a critique of their behavior** — "When X happens, I find it harder to do Y" is harder to dismiss than "you do X."
-2. **Request, don't demand** — end with a question, not a directive: "Would you be open to trying Z?" gives them an exit that isn't losing face.
-3. **Accept that they may not change** — your goal is to have said it clearly once, not to fix them. If nothing changes, that's data for your own decisions.
-
-## Conversation Types Quick Reference
-
-| Type | Primary Risk | Key Move |
-|------|-------------|----------|
-| Performance issue | They feel blindsided | Cite prior informal conversations; ask what they'd need to succeed |
-| Peer conflict | Both parties feel wronged | Separate fact from interpretation before the conversation, not during |
-| Termination | Emotional flooding | Short, clear, no hedging; deliver the decision in the first 60 seconds |
-| Salary/promotion | Negotiation feels adversarial | Anchor to market data + contribution, not personal need |
-| Upward feedback | Power retaliation | See Upward Feedback section above |
-
-## See Also
-
-- `feedback-mastery` — SBI model deep dive, feedback calibration
-- `professional-communication` — written communication, email framing
+This skill has been merged into `workplace-conversations`. Use that skill instead.
diff --git a/skills/docker-and-containers/SKILL.md b/skills/docker-and-containers/SKILL.md
new file mode 100644
index 0000000..fb20375
--- /dev/null
+++ b/skills/docker-and-containers/SKILL.md
@@ -0,0 +1,126 @@
+---
+name: docker-and-containers
+version: 1.0.0
+description: "Expert Docker and container patterns — multi-stage build design, layer cache ordering, image size optimization, signal handling (SIGTERM/PID 1 gap), compose health dependencies, and container security hardening. Use when writing Dockerfiles, diagnosing build cache misses, sizing images, securing containers, debugging startup race conditions, or designing compose service dependencies."
+---
+
+# Docker and Containers
+
+## Mindset
+
+1. **Within a build stage, the layer cache is a chain: one miss invalidates everything after it.** Every instruction after a cache miss re-executes, so the ordering rule is: most stable content first. `COPY package.json` before `COPY .` because your application code changes every commit but dependencies change far less often. A single mis-ordered COPY invalidates all downstream layers on every build.
+2. **Shell form ENTRYPOINT is always wrong.** `ENTRYPOINT ["node", "server.js"]` (exec form) makes Node.js PID 1 — it receives SIGTERM directly. `ENTRYPOINT node server.js` (shell form) makes `/bin/sh -c` PID 1 — it does not forward signals. Your app never receives SIGTERM and Kubernetes/Docker kills it with SIGKILL after the grace period, causing ungraceful shutdown every time.
+3. **Image layers are append-only: deletion is not deletion.** `COPY . .` then `RUN rm -rf secrets/` leaves `secrets/` in the lower layer permanently, recoverable by anyone who can pull the image and unpack its layers (for example with `docker save`). Same for `RUN apt-get install X` followed by a separate `RUN apt-get clean`: the package cache was written in layer N, the clean ran in layer N+1, the cache is still in layer N. Commands that write and delete must be in the same `RUN`.
+4. **A health check that checks a port is not a health check.** `curl -f http://localhost:8080/` returns 200 whether the app has a working database connection or not. Check an endpoint that exercises an actual dependency: `/healthz` that queries the DB and returns 200 only when ready. Compose `depends_on: condition: service_healthy` is meaningless without a health check that represents real readiness.
+5. **Secrets baked into images are permanent.** `ENV API_KEY=secret` is visible in `docker inspect` on any host with pull access. `ARG SECRET` is visible in `docker history`. Neither is erased by a subsequent layer. Runtime secret injection (mounted files, environment from a secrets manager at startup) is the only safe path.
+
+## Navigation
+
+**Use this skill when:**
+- Writing or reviewing a Dockerfile (new service, base image upgrade, size reduction)
+- Diagnosing slow or broken build cache (layer invalidation debugging)
+- Designing compose service startup order with health dependency requirements
+- Hardening a container for production (non-root, capabilities, read-only filesystem)
+- Debugging graceful shutdown failures (SIGTERM not received, zombie processes)
+- Choosing between single-stage vs. multi-stage build
+
+**Do NOT use this skill when:**
+- Kubernetes deployment YAML design (pod specs, resource limits, liveness vs. readiness probes at the k8s layer) — those have different semantics than Docker health checks
+- Container runtime selection (containerd, CRI-O, gVisor) — this skill covers Docker/Compose-layer decisions
+- CI/CD pipeline orchestration beyond the image build step
+
+**Decision tree — multi-stage vs. single-stage:**
+
+```
+Is the build producing a compiled artifact (Go binary, Java jar, webpack bundle)?
+├── YES → Multi-stage: builder stage compiles, production stage copies artifact only
+└── NO → Is the runtime the same as the build environment?
+         ├── YES → Single-stage is correct; multi-stage adds complexity with no benefit
+         └── NO (e.g., Python venv to slim image) → Multi-stage to avoid build tools in production
+```
+
+**Decision tree — alpine vs. slim vs. distroless:**
+
+```
+Does the app need a shell at runtime (startup scripts, exec into container)?
+├── YES → use slim (debian-based, has bash/sh, ~80MB base)
+└── NO → How is the binary linked?
+          ├── Statically linked → distroless/static (~2MB, no shell, no package manager)
+          ├── Dynamically linked against glibc → distroless/base (~20MB, glibc only)
+          └── Unknown / uses musl → alpine (~5MB, musl libc; verify all dependencies compile against musl)
+```
+
+## Philosophy
+
+Container images are artifacts, not environments. The image defines what runs, not where it runs. Every byte in the image is a liability: a larger attack surface, a slower pull, a staler dependency. Build the smallest image that runs the application correctly, handles signals cleanly, and contains zero credentials.
+
+## NEVER
+
+- **NEVER use the `latest` tag in production Dockerfiles** — `latest` is a mutable pointer; a registry push by any team member silently changes what your `docker pull` fetches. Pin to a digest (`image@sha256:...`) or an immutable version tag. Your build is otherwise non-deterministic across environments.
+- **NEVER store secrets in `ENV`, `ARG`, or `COPY`** — they appear in `docker inspect`, `docker history`, and are baked into the layer graph permanently. A subsequent layer that deletes the secret does not remove it from earlier layers. Use runtime secret injection: mounted secret files, environment variables set by the orchestrator at start time, or a secrets manager SDK.
+- **NEVER run `RUN apt-get update` in a separate layer from `apt-get install`**: the update layer is cached. On a subsequent build days later, Docker reuses the stale cached update layer, `apt-get install` runs against an outdated package index, and the build either fails (packages missing from the index or a 404 on download) or installs older versions than intended. Always `RUN apt-get update && apt-get install -y ... && rm -rf /var/lib/apt/lists/*` in a single instruction.
+- **NEVER use shell form for `ENTRYPOINT`** — `ENTRYPOINT node server.js` wraps the process in `/bin/sh -c`, which becomes PID 1. The shell does not forward SIGTERM to child processes. Your application never receives the graceful-shutdown signal and gets SIGKILL'd after the stop grace period expires. Use exec form: `ENTRYPOINT ["node", "server.js"]`.
+- **NEVER use `ADD` instead of `COPY` for local files** — `ADD` has two implicit behaviors that `COPY` does not: it auto-extracts tar archives and it fetches from URLs. If a file path happens to be tar-shaped or a future maintainer adds a URL argument, `ADD` silently does something different than expected. `COPY` copies files. Use `COPY` for files; use `RUN curl` with explicit flags if you need network fetch.
+- **NEVER run the production container process as root when a non-root option exists** — a container escape (kernel vulnerability, misconfigured mount) with root inside the container is root on the host in many configurations. Add a non-root user and place `USER nonroot` after all `RUN` instructions that require root (package installs, chowns), but before the final `COPY` of application files.
+- **NEVER write-then-delete in separate `RUN` instructions** — files written in one layer persist in the image even if deleted in the next. Package caches, build artifacts, and secrets must be created and removed in the same `RUN` command using `&&`.
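+
+For a secret that is needed only while building (a private registry token, for example), a BuildKit secret mount is one way to keep it out of every layer. A minimal sketch, assuming BuildKit is enabled; the secret id `npmrc` and its source path are hypothetical names chosen for illustration:
+
+```dockerfile
+# syntax=docker/dockerfile:1
+FROM node:20-slim
+WORKDIR /app
+COPY package.json package-lock.json ./
+# The secret file is mounted only for the duration of this RUN; it is not written to a layer
+RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci --omit=dev
+```
+
+This would be built with something like `docker build --secret id=npmrc,src=$HOME/.npmrc .`; the file's contents are not recorded in the image layers or in `docker history`.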
+
+## Core Patterns
+
+See `references/dockerfile-patterns.md` for annotated production-ready Dockerfiles.
+See `references/compose-examples.md` for compose health dependency and override file patterns.
+
+### Signal-safe ENTRYPOINT
+
+```dockerfile
+# WRONG — shell form, PID 1 is /bin/sh, SIGTERM not forwarded
+ENTRYPOINT node server.js
+
+# CORRECT — exec form, PID 1 is node, receives SIGTERM directly
+ENTRYPOINT ["node", "server.js"]
+```
+
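+If the application does not install its own SIGTERM handler (PID 1 gets no default signal dispositions, so the signal is otherwise ignored), run it under `tini` as a minimal init. The example assumes an Alpine-based image: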
+```dockerfile
+RUN apk add --no-cache tini
+ENTRYPOINT ["/sbin/tini", "--", "node", "server.js"]
+```
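+
+On a Debian-based image such as `node:20-slim`, the same idea might look like the sketch below. It assumes the Debian `tini` package, which installs the binary at `/usr/bin/tini`:
+
+```dockerfile
+FROM node:20-slim
+RUN apt-get update && apt-get install -y --no-install-recommends tini \
+    && rm -rf /var/lib/apt/lists/*
+# tini runs as PID 1, forwards signals to node, and reaps zombie processes
+ENTRYPOINT ["/usr/bin/tini", "--", "node", "server.js"]
+```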
+
+### Layer ordering for cache efficiency
+
+```dockerfile
+# WRONG — code change invalidates npm install on every build
+COPY . .
+RUN npm ci
+
+# CORRECT — package.json changes rarely; code changes often
+COPY package.json package-lock.json ./
+RUN npm ci --omit=dev
+COPY . .
+```
+
+### Non-root user placement
+
+```dockerfile
+# WRONG: USER is set before a package install that needs root
+USER nonroot
+RUN apt-get install -y curl    # fails: apt-get must run as root
+
+# CORRECT — root for installs, then drop privileges before app code
+RUN apt-get update && apt-get install -y --no-install-recommends curl \
+    && rm -rf /var/lib/apt/lists/* \
+    && useradd --uid 1001 --no-create-home nonroot
+USER nonroot
+COPY --chown=nonroot:nonroot . .
+```
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Build cache always misses at `RUN npm install` | `COPY . .` appears before `COPY package.json` — code changes invalidate package copy | Reorder: copy lockfile first, install, then copy source |
+| `docker stop` hangs for the full grace period (10 s by default), then the container exits with code 137 | Shell-form ENTRYPOINT; app never receives SIGTERM; Docker sends SIGKILL after grace period | Switch to exec-form ENTRYPOINT; optionally add tini for signal forwarding |
+| `apt-get install` fails with "Unable to locate package" or a 404 in one environment but not another | `apt-get update` sits in its own cached layer, so `install` runs against whatever package index that cache holds | Merge update and install into a single `RUN` instruction |
+| Image is unexpectedly large after `RUN rm -rf build-artifacts/` | Deletion in a separate layer — files still exist in the layer that wrote them | Combine write and delete in one `RUN`; use multi-stage to avoid the files entirely |
+| Compose service starts before database is ready, crashes | `depends_on` with `condition: service_started` (default) — only waits for container start, not readiness | Add `healthcheck` to the database service; use `condition: service_healthy` in dependent service |
+| Secret visible in `docker history` | `ARG SECRET` or `ENV SECRET` used during build | Remove from Dockerfile entirely; inject at runtime via orchestrator secrets or mounted file |
+| Volume data stale after git branch switch | Bind mount persists host directory state across branches; node_modules or build cache reflects old branch | Use named volume for dependency caches in development; bind mount only source code |
diff --git a/skills/docker-and-containers/references/compose-examples.md b/skills/docker-and-containers/references/compose-examples.md
new file mode 100644
index 0000000..00cd5f4
--- /dev/null
+++ b/skills/docker-and-containers/references/compose-examples.md
@@ -0,0 +1,199 @@
+# Compose Patterns — Production Reference
+
+## Health-Gated Startup (service_healthy vs. service_started)
+
+```yaml
+# compose.yml — base configuration
+services:
+  postgres:
+    image: postgres:16.2
+    environment:
+      POSTGRES_DB: appdb
+      POSTGRES_USER: app
+      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
+    secrets:
+      - db_password
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U app -d appdb"]
+      interval: 5s
+      timeout: 3s
+      retries: 5
+      start_period: 10s    # grace period before first check counts as failure
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+
+  api:
+    build: .
+    depends_on:
+      postgres:
+        condition: service_healthy    # waits for postgres healthcheck to pass
+        restart: true                 # restart api when Compose restarts or updates postgres (Compose v2.17+)
+    environment:
+      DATABASE_URL: postgres://app@postgres/appdb
+    secrets:
+      - db_password
+
+secrets:
+  db_password:
+    file: ./secrets/db_password.txt   # dev only — use external: true in prod
+
+volumes:
+  postgres_data:
+```
+
+**`service_started` vs. `service_healthy`:**
+- `service_started` (default): compose waits only until the container's process starts. For a database, this means the container is running but the DB may still be initializing and not yet accepting connections. Race condition at startup.
+- `service_healthy`: compose polls the healthcheck until it passes before starting dependent services. Eliminates the startup race condition. Requires a healthcheck on the dependency, defined either in the Compose file or in the image; if there is none, `docker compose up` reports that the dependency has no healthcheck configured and does not start the dependent service.
+
+## Override Files for Dev vs. Prod
+
+```
+compose.yml          # base: what runs everywhere
+compose.override.yml # dev overrides: auto-loaded by `docker compose up`
+compose.prod.yml     # prod overrides: explicit `docker compose -f compose.yml -f compose.prod.yml up`
+```
+
+```yaml
+# compose.yml — base
+services:
+  api:
+    image: myapp/api:${IMAGE_TAG:-local}
+    environment:
+      NODE_ENV: production
+    restart: unless-stopped
+```
+
+```yaml
+# compose.override.yml — dev (auto-loaded)
+services:
+  api:
+    build: .                          # build locally instead of pulling
+    image: myapp/api:local
+    environment:
+      NODE_ENV: development
+    volumes:
+      - .:/app                        # bind mount for live reload
+      - /app/node_modules             # anonymous volume to shadow host node_modules
+    ports:
+      - "9229:9229"                   # debugger port not exposed in prod
+    command: ["node", "--inspect=0.0.0.0:9229", "dist/server.js"]
+```
+
+```yaml
+# compose.prod.yml — production
+services:
+  api:
+    deploy:
+      replicas: 2
+      resources:
+        limits:
+          cpus: "1.0"
+          memory: 512M
+    read_only: true
+    tmpfs:
+      - /tmp
+    security_opt:
+      - no-new-privileges:true
+    cap_drop:
+      - ALL
+```
+
+**Why not a single compose.yml with environment conditionals?** Compose does not support conditional blocks. Environment variables can override individual values but cannot toggle entire blocks (volumes, ports, deploy). Override files are the idiomatic and readable solution.
+
+## Volume vs. Bind Mount Decision
+
+```
+Is this production?
+├── YES → Use named volumes only.
+│         Bind mounts couple the container to a specific host path.
+│         Permission drift (UID mismatch) causes silent write failures.
+│         Host path may not exist on a new node in a cluster.
+└── NO (local development)
+         ├── Source code → bind mount (you want edits to reflect live)
+         ├── Dependencies (node_modules, .venv) → anonymous volume or named volume
+         │   Reason: switching git branches changes package.json; if node_modules
+         │   is bind-mounted, the host directory reflects the old branch's installs
+         │   until you re-run npm install. Use a volume to keep container deps isolated.
+         └── Persistent data (DB data dir) → named volume
+             Reason: named volumes survive container recreation; anonymous volumes do not.
+```
+
+```yaml
+# Correct dev pattern: source mounted, deps isolated
+services:
+  app:
+    volumes:
+      - .:/app                        # bind: source code (live reload)
+      - app_node_modules:/app/node_modules   # volume: isolated deps
+      - app_dist:/app/dist            # volume: build output
+
+volumes:
+  app_node_modules:
+  app_dist:
+```
+
+## Secrets in Compose (never ENV for production secrets)
+
+```yaml
+# WRONG: secret visible in `docker inspect`, process environment, /proc/PID/environ
+services:
+  api:
+    environment:
+      DATABASE_PASSWORD: s3cr3t
+
+# CORRECT: secret mounted as a file, read by application at startup
+services:
+  api:
+    secrets:
+      - db_password
+    environment:
+      DB_PASSWORD_FILE: /run/secrets/db_password   # app reads file, not env var
+
+secrets:
+  db_password:
+    external: true   # managed by Docker Swarm secrets or an external secrets manager
+```
+
+Application-side pattern (Node.js):
+```javascript
+const fs = require('node:fs');
+// Prefer the mounted secret file; fall back to a plain env var for local dev without secrets
+const password = process.env.DB_PASSWORD_FILE
+  ? fs.readFileSync(process.env.DB_PASSWORD_FILE, 'utf8').trim()
+  : process.env.DATABASE_PASSWORD;
+```
+
+## Compose for Local Development: Full Example
+
+```yaml
+# compose.override.yml — development
+services:
+  postgres:
+    ports:
+      - "5432:5432"    # expose to host for local DB tools (TablePlus, psql)
+
+  redis:
+    image: redis:7.2-alpine
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+      timeout: 2s
+      retries: 3
+
+  api:
+    build:
+      context: .
+      target: builder   # stop at the builder stage: it installs devDependencies, which `deps` omits
+    volumes:
+      - .:/app
+      - api_node_modules:/app/node_modules
+    environment:
+      NODE_ENV: development
+      LOG_LEVEL: debug
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+volumes:
+  api_node_modules:
+```
diff --git a/skills/docker-and-containers/references/dockerfile-patterns.md b/skills/docker-and-containers/references/dockerfile-patterns.md
new file mode 100644
index 0000000..f04fae7
--- /dev/null
+++ b/skills/docker-and-containers/references/dockerfile-patterns.md
@@ -0,0 +1,149 @@
+# Dockerfile Patterns — Production Reference
+
+## Multi-Stage: Node.js Application
+
+```dockerfile
+# ---- Stage 1: build dependencies ----
+FROM node:20-slim AS deps
+WORKDIR /app
+COPY package.json package-lock.json ./
+RUN npm ci --omit=dev
+
+# ---- Stage 2: build (only if transpilation is needed) ----
+FROM node:20-slim AS builder
+WORKDIR /app
+COPY package.json package-lock.json ./
+RUN npm ci                          # includes devDependencies for build
+COPY . .
+RUN npm run build
+
+# ---- Stage 3: production image ----
+FROM node:20-slim AS production
+# Create non-root user before dropping privileges
+RUN useradd --uid 1001 --no-create-home --shell /bin/false appuser
+WORKDIR /app
+# Copy only what runtime needs
+COPY --from=deps --chown=appuser:appuser /app/node_modules ./node_modules
+COPY --from=builder --chown=appuser:appuser /app/dist ./dist
+COPY --chown=appuser:appuser package.json ./
+USER appuser
+EXPOSE 3000
+# Exec form — PID 1 receives SIGTERM
+ENTRYPOINT ["node", "dist/server.js"]
+```
+
+**Why three stages instead of two:**
+- `deps` installs only production deps (no devDependencies contamination in the final image)
+- `builder` installs all deps for compilation, its heavy `node_modules` is discarded
+- `production` receives only the compiled output and production deps
+- If there is no build step (pure JS, no transpilation), collapse `deps` and `builder` into one stage
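+
+For the no-build-step case in the last bullet, a two-stage version might look like this. It is a sketch only: it assumes the entry point is a plain `server.js` at the repository root and that a `.dockerignore` excludes the host `node_modules`:
+
+```dockerfile
+# ---- Stage 1: production dependencies ----
+FROM node:20-slim AS deps
+WORKDIR /app
+COPY package.json package-lock.json ./
+RUN npm ci --omit=dev
+
+# ---- Stage 2: production image ----
+FROM node:20-slim AS production
+RUN useradd --uid 1001 --no-create-home --shell /bin/false appuser
+WORKDIR /app
+COPY --from=deps --chown=appuser:appuser /app/node_modules ./node_modules
+# The source is the artifact; .dockerignore is assumed to keep host node_modules out
+COPY --chown=appuser:appuser . .
+USER appuser
+ENTRYPOINT ["node", "server.js"]
+```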
+
+## Multi-Stage: Go Binary
+
+```dockerfile
+FROM golang:1.22-alpine AS builder
+WORKDIR /src
+COPY go.mod go.sum ./
+RUN go mod download                 # cached separately from source changes
+COPY . .
+RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app ./cmd/server
+
+# Distroless: no shell, no package manager, minimal attack surface
+FROM gcr.io/distroless/static-debian12:nonroot AS production
+COPY --from=builder /app /app
+EXPOSE 8080
+ENTRYPOINT ["/app"]
+```
+
+**When NOT to use distroless:** if your service requires `exec` for debugging (`kubectl exec -it`), use `gcr.io/distroless/base` with the `:debug` tag in non-production environments, or maintain a separate debug image. Never ship the debug image to production.
+
+## Single-Stage: Python Script (correct, simpler than multi-stage)
+
+```dockerfile
+FROM python:3.12-slim
+# Install OS deps and clean in one layer
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        libpq-dev \
+    && rm -rf /var/lib/apt/lists/*
+# Create non-root user
+RUN useradd --uid 1001 --no-create-home appuser
+WORKDIR /app
+# Dependencies before source — cache pip install separately from code changes
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Drop privileges before copying app code
+USER appuser
+COPY --chown=appuser:appuser . .
+CMD ["python", "-m", "app.main"]
+```
+
+**Why single-stage here:** Python runs interpreted; the source IS the artifact. Multi-stage would copy the same files and achieve nothing except complexity.
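+
+The single-stage choice stops paying off once `pip install` needs a compiler or `-dev` headers that the runtime does not. A hedged sketch of the multi-stage alternative, assuming dependencies are installed into a virtualenv at `/opt/venv` (a path chosen here for illustration) and that the app needs only the `libpq5` runtime library:
+
+```dockerfile
+# ---- Stage 1: build wheels with compiler and headers available ----
+FROM python:3.12-slim AS builder
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        build-essential libpq-dev \
+    && rm -rf /var/lib/apt/lists/*
+RUN python -m venv /opt/venv
+ENV PATH="/opt/venv/bin:$PATH"
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# ---- Stage 2: runtime image without build tools ----
+FROM python:3.12-slim AS production
+RUN apt-get update && apt-get install -y --no-install-recommends libpq5 \
+    && rm -rf /var/lib/apt/lists/* \
+    && useradd --uid 1001 --no-create-home appuser
+# Same base image and same path, so the venv's interpreter links stay valid
+COPY --from=builder /opt/venv /opt/venv
+ENV PATH="/opt/venv/bin:$PATH"
+WORKDIR /app
+USER appuser
+COPY --chown=appuser:appuser . .
+CMD ["python", "-m", "app.main"]
+```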
+
+## Health Check Design
+
+```dockerfile
+# USELESS — checks that a port is bound, not that the app is ready
+HEALTHCHECK CMD curl -f http://localhost:8080/ || exit 1
+
+# BETTER — checks an endpoint that verifies actual application readiness
+HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=3 \
+    CMD curl -f http://localhost:8080/healthz || exit 1
+```
+
+The `/healthz` endpoint in application code should:
+1. Attempt a lightweight read from each critical dependency (DB ping, cache ping)
+2. Return 200 only when all dependencies respond within a tight timeout (100-500ms)
+3. Return 503 with a JSON body identifying which dependency failed
+4. NOT perform writes, migrations, or expensive queries — health checks fire every N seconds
+
+```javascript
+// Node.js example
+app.get('/healthz', async (req, res) => {
+  try {
+    await db.raw('SELECT 1');  // DB connectivity check
+    res.json({ status: 'ok', db: 'ok' });
+  } catch (err) {
+    res.status(503).json({ status: 'degraded', db: err.message });
+  }
+});
+```
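+
+Slim base images such as `node:20-slim` do not ship `curl`, so the `HEALTHCHECK` above fails there unless `curl` is installed. One alternative is to probe with the runtime itself. A sketch, assuming Node.js 18 or newer (global `fetch`) and the same port and `/healthz` path as above:
+
+```dockerfile
+# Exit 0 when /healthz answers with a 2xx status, 1 on any other status or on a connection error
+HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=3 \
+    CMD ["node", "-e", "fetch('http://localhost:8080/healthz').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]
+```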
+
+## Security Hardening Checklist
+
+```dockerfile
+# Read-only root filesystem: app cannot write to its own image layer
+# Requires writable volumes for any directory the app writes to (logs, tmp, uploads)
+# Set in docker run: --read-only, or in compose:
+#   read_only: true
+#   tmpfs:
+#     - /tmp
+#     - /var/run
+
+# Drop all capabilities, add back only what is needed
+# Set in compose:
+#   cap_drop:
+#     - ALL
+#   cap_add:
+#     - NET_BIND_SERVICE   # only if binding to port < 1024 as non-root (prefer port > 1024)
+
+# No new privileges escalation
+# Set in compose:
+#   security_opt:
+#     - no-new-privileges:true
+```
+
+## Image Tag Strategy
+
+```dockerfile
+# WRONG: non-deterministic across builds
+FROM node:latest
+FROM node:20
+
+# CORRECT (digest pinning — immutable):
+FROM node:20-slim@sha256:a1b2c3d4...
+
+# ACCEPTABLE in practice (version + variant, no mutable latest):
+FROM node:20.12.2-slim
+```
+
+For internal base images, use a registry mirror with immutable tag policies rather than digest pinning — digest changes on every security patch and requires automation to update. Mutable-but-versioned (`node:20.12.2`) balances determinism with maintainability for application images.
diff --git a/skills/feedback-mastery/SKILL.md b/skills/feedback-mastery/SKILL.md
index ea2e3dd..97006b8 100644
--- a/skills/feedback-mastery/SKILL.md
+++ b/skills/feedback-mastery/SKILL.md
@@ -1,110 +1,8 @@
 ---
 name: feedback-mastery
-description: Deliver high-stakes feedback, navigate difficult workplace conversations, and coach others through performance issues. Use when giving feedback, preparing for a hard 1:1, addressing conflict, managing underperformance, or coaching someone on receiving feedback. Keywords: feedback, difficult conversation, performance, conflict, behavior, 1:1, one-on-one, confrontation, underperformance, coaching.
-allowed-tools: Read, Glob, Grep
+description: "MERGED — use workplace-conversations instead. This skill has been merged into workplace-conversations which covers both feedback delivery and difficult conversation navigation."
 ---
 
-# Feedback Mastery
+# Merged
 
-## Mindset
-
-1. **Feedback is information transfer, not judgment delivery.** The moment the receiver perceives evaluation, their prefrontal cortex partially shuts down — defensiveness is physiological, not character flaw. Frame as "I noticed / I want to understand" not "you did X wrong."
-
-2. **Timing destroys or amplifies impact.** Feedback given within 48 hours of an event lands 3–4x more effectively than delayed feedback. "Annual review surprises" are managerial malpractice — they signal you withheld useful information for months.
-
-3. **Psychological safety is the prerequisite, not the outcome.** If the receiver doesn't feel safe, no framework saves you. Check safety first: if someone is visibly activated (defensive posture, clipped answers, raised voice), name it before proceeding.
-
-4. **The goal of feedback is behavior change, not catharsis.** If you need to vent, do it elsewhere. The conversation exists to help them, not to make you feel like you said something.
-
-5. **Pattern vs. instance matters more than severity.** A single bad code review matters less than 3 in a row. Leading with "I've noticed a pattern" signals seriousness without catastrophizing a one-time event.
-
-## Navigation
-
-**Use this skill when**:
-- Preparing to give feedback (corrective or positive) to a colleague or direct report
-- Navigating conflict, tension, or misaligned expectations with stakeholders
-- Coaching someone who received feedback poorly and needs to process it
-- Handling underperformance conversations where HR escalation is possible
-- Facilitating a conversation between two people in conflict
-
-**Do NOT use this skill when**:
-- Writing performance review prose (use a writing skill instead)
-- Termination conversations (require HR, legal review — out of scope)
-- Therapy or mental health support (refer appropriately)
-
-**Quick decision tree for ambiguous inputs:**
-```
-Is this corrective or positive feedback?
-  ├─ Corrective + first time → SBI + curiosity (explore root cause)
-  ├─ Corrective + recurring pattern → SBI + explicit stakes + plan
-  ├─ Corrective + HR-level severity → load references/difficult-conversation-scripts.md
-  └─ Positive → SBI + specific impact (skip the "but" — never pair positive/corrective)
-```
-
-## Philosophy
-
-Feedback is an act of respect — it assumes the person can change and is worth investing in. Withholding honest feedback to avoid discomfort is not kindness; it's abandonment. Deliver what's true, specifically, without cruelty.
-
-## NEVER
-
-- **NEVER use the feedback sandwich (positive-negative-positive)** — because research shows it trains people to distrust positive feedback and miss the corrective message. Receivers remember the bread, not the filling. Use pure SBI instead.
-
-- **NEVER give corrective feedback in public** — because social threat activates the same neural circuits as physical threat (fMRI studies by Lieberman, 2013). Public correction compounds shame and guarantees defensiveness; the person will protect ego, not process content.
-
-- **NEVER delay feedback to "find the right moment" indefinitely** — because memory encoding degrades within 72 hours. Waiting for perfect conditions means the specific behavior is no longer vivid to either party, and the conversation devolves into abstract debate about character.
-
-- **NEVER pair positive and corrective feedback in the same sentence with "but"** — because "but" neurologically erases everything before it. "You did great work on the API, but the tests were weak" lands as "your tests were weak." Use separate conversations or at minimum separate paragraphs with a full stop.
-
-- **NEVER give feedback when you or the receiver are emotionally flooded** — because cortisol and adrenaline impair complex cognition in both parties. A flooded conversation produces defensive agreements that don't stick. Reschedule explicitly: "I want to have this conversation when we're both at our best — can we do 10am tomorrow?"
-
-- **NEVER interpret behavior aloud without evidence** — because attributing motive ("you clearly don't care about quality") activates the fundamental attribution error and makes the receiver defend their character rather than examine their behavior. Stick to observable actions and their effects.
-
-- **NEVER skip the "impact" step of SBI** — because behavior without impact sounds like nitpicking. Impact connects the behavior to something the receiver actually cares about (team success, trust, project outcomes). Without it, feedback feels like personal preference, not professional necessity.
-
-## Core Technique: SBI (Extended Practitioner Version)
-
-Claude already knows the basic SBI framework. What it misses:
-
-**The "I" in Impact must connect to receiver values, not just your frustration.** If you know they care about team perception, connect the impact to that. If they care about code quality, connect there. Generic impact ("it slowed us down") lands weaker than specific impact ("it meant Sarah had to redo 3 hours of work and now doubts whether to surface issues").
-
-**SBI+ for recurring patterns:**
-> Situation → Behavior → Impact → **Expectation** → **Stakes**
-
-Add: "Going forward, I need [specific behavior]. If this continues, [concrete consequence]." The stakes must be real and proportional — don't threaten what you won't follow through on.
-
-**Positive SBI is not trivial.** Most managers skip it entirely or give generic praise. Specific positive SBI ("In Thursday's incident review, you asked the question no one else would — which unlocked the root cause and saved us 2 days") builds the safety account that makes corrective feedback land better later.
-
-## Timing Heuristics
-
-| Scenario | Optimal Timing | Why |
-|----------|---------------|-----|
-| In-meeting behavior | Within same day, privately | Memory sharp; behavior still vivid |
-| Code/work quality | Within 48 hours of delivery | Before they're deep in next task |
-| Interpersonal conflict | After 2–4 hours cooling, same day | Enough distance to avoid flood; close enough to be concrete |
-| Pattern (recurring issue) | Scheduled 1:1, not ad hoc | Signals importance, allows preparation |
-| Crisis/incident behavior | After incident closes, not during | During crisis, feedback = distraction |
-
-## When Things Go Wrong
-
-| Situation | Likely Cause | Recovery |
-|-----------|-------------|----------|
-| Receiver goes silent or shuts down | Social threat response activated; they feel ambushed or shamed | Name it: "I notice you've gone quiet — I want to make sure this feels like a conversation, not a verdict. What's your reaction?" |
-| Receiver becomes defensive / attacks back | Behavior was stated as interpretation, not observation; or they feel blindsided | Return to facts: "I want to make sure I'm working from what I actually observed. Can we back up to [specific event]?" |
-| Receiver agrees but nothing changes | Agreement was to end the discomfort, not from genuine buy-in | Explicit follow-through plan required: named actions, dates, check-in. "What specifically will you do differently by [date]?" |
-| You realize mid-conversation you lack enough specifics | Feedback was prepared based on impressions not evidence | Pause honestly: "I realize I should have more specific examples ready. Can we schedule 30 minutes tomorrow when I can come prepared?" |
-| Positive feedback lands flat / receiver dismisses it | Generic praise, or receiver distrusts your motives | Increase specificity: name exact action, exact effect, exact why it mattered. Vague praise is discounted; surgical praise lands. |
-
-## Reference Loading Triggers
-
-Load `references/difficult-conversation-scripts.md` when:
-- User needs exact opening lines, scripts, or word-for-word phrasing
-- Scenario involves HR-sensitivity, termination risk, or legal exposure
-- User is conflict-averse and needs scripted scaffolding to start
-
-Load `references/feedback-sbi-model.md` when:
-- User needs more SBI examples across different contexts (code reviews, meetings, deliverables)
-- User is learning the framework and needs worked examples
-
-Load `references/expectation-alignment.md` when:
-- Issue is stakeholder misalignment, scope creep, or "moving goalposts"
-- User needs to reset expectations without creating conflict
+This skill has been merged into `workplace-conversations`. Use that skill instead.
diff --git a/skills/git-workflow/SKILL.md b/skills/git-workflow/SKILL.md
new file mode 100644
index 0000000..a5a70f6
--- /dev/null
+++ b/skills/git-workflow/SKILL.md
@@ -0,0 +1,77 @@
+---
+name: git-workflow
+description: "Expert Git workflow guidance: branch strategy selection, rebase vs. merge trade-offs, bisect with test scripts, stash hygiene, worktree patterns, shallow clone pitfalls, reflog recovery, and precision index use. Trigger phrases: git strategy, branching model, rebase or merge, git bisect, worktree, stash, reflog, recover branch, partial staging."
+---
+
+# Git Workflow
+
+## Mindset
+
+- Branch strategy is a CI maturity decision, not a preference. The question is: can your pipeline validate a commit in under 10 minutes? If not, trunk-based development will drown your team in broken main.
+- The index is a precision instrument. Practitioners who understand `git add -p` and `git reset HEAD <file>` make reviewable commits; everyone else makes noise.
+- Rebase rewrites history — that is its power and its danger. On a private branch it's surgery; on a shared branch it's sabotage.
+- `git bisect` is only as good as your test script. A flaky test makes bisect produce wrong answers with high confidence — more dangerous than no bisect at all.
+- Reflog is your time machine, but its entries expire: by default after 90 days, or after 30 days for entries no longer reachable from the current branch tip. The window is not infinite; act within it.
+
+## Navigation
+
+**Use this skill when**: choosing a branching model, deciding rebase vs. merge, automating bisect, troubleshooting stash conflicts, running parallel work with worktrees, recovering from reset/delete/detached HEAD, or staging partial changes.
+
+**Do NOT use this skill when**: writing commit messages (use `commit-work`), creating PRs, or setting up CI pipelines — those warrant dedicated treatment.
+
+**Strategy decision tree**:
+```
+Team size ≤ 5 AND releases are continuous (no scheduled releases)?
+  └─ YES → GitHub Flow (short-lived feature branches, merge to main, deploy immediately)
+  └─ NO → Release cadence is scheduled (weekly/monthly/quarterly)?
+            └─ YES → Gitflow (develop, release/, hotfix/ branches for parallel support)
+            └─ NO → CI pipeline validates in < 10 min AND team practices feature flags?
+                      └─ YES → Trunk-Based Development (TBD) — single main, short-lived branches ≤ 2 days
+                      └─ NO → GitHub Flow with protected main (gate merges behind CI green)
+```
+
+**Rebase vs. merge decision tree**:
+```
+Branch is shared (others have checked it out or pushed to it)?
+  └─ YES → MERGE ONLY. Never rebase.
+  └─ NO → Want linear history for bisect and log readability?
+            └─ YES → Rebase onto target before merging (rebase locally, merge with --no-ff or fast-forward)
+            └─ NO → Preserving merge topology (shows when a feature was integrated)?
+                      └─ YES → Merge with --no-ff
+                      └─ NO → Fast-forward merge (linear, no merge commit)
+```
+
+## Philosophy
+
+Git's power is that every operation is reversible if you know where to look — the reflog, the index, the stash list. Expert Git practice is less about knowing commands and more about knowing which state you're in and what the escape routes are before you act.
+
+## NEVER
+
+- NEVER `git add -A` or `git add .` before inspecting untracked files — `git status` first, every time. Build artifacts, `.env` files, and IDE configs are silently swept in; once pushed, secrets require a full history rewrite to remove.
+- NEVER rebase a branch others have checked out — their local branches are built on the SHA history you're about to rewrite. The only resolution is `git pull --rebase` or force-push on their end, and they will miss it if you don't coordinate. This is how teams lose commits.
+- NEVER `git commit --amend` after push — amend rewrites the local tip commit SHA. The remote still has the original. Every subsequent `git push` will reject with "non-fast-forward." The only escape is `--force-with-lease`, which rewrites remote history. If others have already pulled, their branches now diverge from an SHA that no longer exists on origin.
+- NEVER `git reset --hard` without `git status` first — it overwrites the working tree and index silently, and the reflog only tracks commits. Unstaged changes are gone for good; staged changes survive at best as dangling blobs that `git fsck --lost-found` may surface.
+- NEVER `git stash pop` after switching branches without checking `git stash list` and noting which branch the stash was created on — stash pop applies the diff with no context awareness. Popping a stash from a different branch causes conflicts that are structurally wrong (not just textually wrong), and resolving them produces subtly broken code.
+- NEVER treat a successful `git branch -d` as proof that a branch was safe to delete — `-d` only checks that the branch is fully merged into its upstream, or into the current `HEAD` when no upstream is set, which may not be the branch you release from. When `-d` refuses, reaching for `-D` skips the check entirely and deletes unmerged work without warning. Use `git branch --merged <target>` to verify against the actual target branch first.
+- NEVER run `git bisect` on a shallow clone (`--depth N`) — shallow clones truncate history. Bisect will exhaust the available commits and either error out or identify the wrong commit as the culprit because the actual introducing commit is below the shallow boundary.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| `git push` rejected after commit | Local and remote diverged (likely a prior amend or reset) | `git fetch && git log --oneline origin/branch..HEAD` to see what diverged; if yours is right, `git push --force-with-lease` (not `--force`) |
+| Detached HEAD after `git checkout <SHA>` | You checked out a commit directly, not a branch | `git branch recover-work` while still detached — this names the current state; then `git checkout recover-work` |
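+| `git stash pop` caused unexpected conflicts | Stash was created on a different branch with different base | A conflicted pop keeps the stash entry. Resolve or discard the conflicted files, inspect the stash with `git stash show -p stash@{0}`, re-apply it on the branch where it was created (or by hand), and run `git stash drop` only once the changes are safely applied |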
+| Bisect points to a merge commit as "bad" | The regression was introduced in a branch merged here | `git bisect` the source branch directly: `git bisect start <merge-commit> <merge-base>` |
+| Branch deleted before verifying merge | Panicked `-D` or automated cleanup | `git reflog` to find the last SHA on that branch; `git branch recover-name <SHA>` |
+| Reset --hard wiped uncommitted work | No stash, no commit | Unstaged changes: unrecoverable. Staged changes: `git fsck --lost-found` may surface dangling blobs if the index was written |
+
+## Detailed References
+
+Heavy reference content is in `references/`:
+
+- [Branch strategies deep dive](references/branch-strategies.md) — trunk-based CI requirements, gitflow release mechanics, GitHub Flow edge cases
+- [Bisect mastery](references/bisect-mastery.md) — automated bisect scripts, flaky test handling, merge-commit bisect
+- [Worktree patterns](references/worktree-patterns.md) — when worktrees beat stash, concurrent build patterns, CI simulation
+- [Reflog and recovery](references/reflog-recovery.md) — complete recovery playbook for every common disaster scenario
+- [Index precision](references/index-precision.md) — partial staging, hunk splitting, resetting individual files
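
The strategy decision tree in `skills/git-workflow/SKILL.md` can also be read as a small pure function. The sketch below is one possible encoding in shell, not part of the skill itself: the function name `choose_branch_strategy` and its five positional arguments are hypothetical, and the thresholds (team size ≤ 5, CI under 10 minutes) are copied from the tree as written.

```bash
#!/usr/bin/env bash
# Sketch of the strategy decision tree as a function.
# Arguments (hypothetical names, chosen for this illustration):
#   $1 team_size      integer
#   $2 continuous     yes|no  (releases are continuous)
#   $3 scheduled      yes|no  (release cadence is scheduled)
#   $4 ci_minutes     integer (time for CI to validate a commit)
#   $5 feature_flags  yes|no  (team practices feature flags)
choose_branch_strategy() {
    team_size=$1
    continuous=$2
    scheduled=$3
    ci_minutes=$4
    feature_flags=$5

    # Branches are tested in the same order as the tree.
    if [ "$team_size" -le 5 ] && [ "$continuous" = yes ]; then
        echo "GitHub Flow"
    elif [ "$scheduled" = yes ]; then
        echo "Gitflow"
    elif [ "$ci_minutes" -lt 10 ] && [ "$feature_flags" = yes ]; then
        echo "Trunk-Based Development"
    else
        echo "GitHub Flow with protected main"
    fi
}

choose_branch_strategy 4 yes no 25 no    # small team, continuous releases
choose_branch_strategy 30 no yes 25 no   # scheduled releases
choose_branch_strategy 30 no no 8 yes    # fast CI plus feature flags
choose_branch_strategy 30 no no 25 no    # none of the above
```

Because the branches are evaluated top to bottom, a small team with continuous releases gets GitHub Flow even when its CI is fast and it uses feature flags. That ordering is a property of the tree as written, not a claim about which model is better.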
diff --git a/skills/git-workflow/references/bisect-mastery.md b/skills/git-workflow/references/bisect-mastery.md
new file mode 100644
index 0000000..546e580
--- /dev/null
+++ b/skills/git-workflow/references/bisect-mastery.md
@@ -0,0 +1,134 @@
+# Git Bisect Mastery
+
+## Automated Bisect with a Test Script
+
+The most powerful bisect usage runs a script instead of manual good/bad marking. The script must exit 0 for "good", 1 to 127 (except 125) for "bad", and 125 to skip the commit (skip is critical for handling build failures mid-bisect). Any other exit code aborts the bisect.
+
+```bash
+git bisect start
+git bisect bad HEAD                  # current commit is bad
+git bisect good v2.3.0               # last known good tag or SHA
+
+git bisect run ./scripts/test-regression.sh
+```
+
+**`test-regression.sh` structure**:
+```bash
+#!/usr/bin/env bash
+# No `set -e` here: the pytest exit status is captured below and handed to bisect
+
+# Build the project — if it fails, skip (exit 125) rather than marking bad
+make build 2>/dev/null || exit 125
+
+# Run the specific test for the regression
+# Exit 0 = good, 1-127 except 125 = bad (pytest exits 1 when tests fail)
+python -m pytest tests/test_payment_processing.py::test_refund_calculation -x -q
+exit_code=$?
+
+exit $exit_code
+```
+
+**Critical exit code contract**:
+- `0` — commit is good (test passes)
+- `1`–`124`, `126`–`127` — commit is bad (test fails)
+- `125` — skip this commit (build broken, unrelated failure, or flaky test hit — don't mark it)
+
+---
+
+## Handling Flaky Tests During Bisect
+
+Flaky tests are the silent bisect killer. If a test gives the wrong verdict on 10% of runs, every bisect step has a 10% chance of sending the search the wrong way. A 1000-commit range needs about 10 steps, so the chance of at least one wrong verdict is 1 − 0.9^10 ≈ 65%, and a single wrong verdict is enough to land on the wrong commit.
+
+**Strategies**:
+
+### Retry wrapper
+```bash
+#!/usr/bin/env bash
+# Retry up to 3 times; only mark bad if all attempts fail
+for i in 1 2 3; do
+    python -m pytest tests/test_feature.py -x -q && exit 0
+done
+exit 1
+```
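+This reduces the impact of false failures at the cost of up to 3x runtime on failing commits (passing commits usually exit on the first attempt). It cannot help when the regression itself is intermittent: a retry can then mark a bad commit as good.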
+
+### Skip on known flaky conditions
+```bash
+#!/usr/bin/env bash
+# If the test infrastructure isn't available, skip rather than marking bad
+curl -sf http://localhost:8080/health > /dev/null 2>&1 || exit 125
+python -m pytest tests/test_feature.py -x -q
+```
+
+### Validate bisect result independently
+After bisect identifies a commit, **always verify manually**:
+```bash
+git checkout <bisect-identified-bad-commit>
+# Run the test 5 times
+for i in $(seq 5); do python -m pytest tests/test_feature.py -x -q; done
+```
+If it passes sometimes, bisect was fooled by a flaky test. Restart bisect with a retry wrapper.
+
+---
+
+## When Bisect Points to a Merge Commit
+
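+By default bisect walks into merged branches, so it normally lands on a commit inside the branch. It stops at a merge commit in three cases: the session was started with `--first-parent`, the branch's commits were skipped (exit 125), or both parents are good and the merge itself is the first bad commit.
+
+**What this means**: In the first two cases the regression is somewhere in the commits that were merged, and the procedure below finds it. In the third case it comes from the merge itself: conflicts resolved incorrectly, or two changes that only break in combination.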
+
+**Recovery procedure**:
+```bash
+# bisect told us this merge commit is the first bad:
+# abc1234 Merge branch 'feature/payment-v2' into main
+
+# Find the parent SHAs of the merge commit
+git show --no-patch --format='%h parents: %P' abc1234
+git cat-file -p abc1234 | grep ^parent
+# parent def5678   (first parent = main before merge)
+# parent ghi9012   (second parent = tip of the merged branch)
+
+# Bisect the source branch directly
+git bisect start
+git bisect bad ghi9012              # tip of the merged branch (bad)
+git bisect good def5678             # main before the merge (good — doesn't have the feature at all)
+git bisect run ./scripts/test-regression.sh
+```
+
+**The merge conflict resolution case**: If the regression was introduced during conflict resolution (not in a commit in either branch), bisect within the branch won't find it. Check the merge commit diff directly:
+```bash
+git diff def5678..abc1234 -- path/to/relevant/file
+# Or compare both parents to the merge result:
+git diff def5678 abc1234
+git diff ghi9012 abc1234
+```
+Hunks that differ from both parents are the merge resolution, so examine those first. `git show abc1234` prints a combined diff that surfaces such hunks.
+
+---
+
+## Bisect with Shallow Clones
+
+Do not attempt bisect on a shallow clone. The history is truncated; bisect will either fail with "Not a valid object name" when it tries to check out commits below the shallow boundary, or worse, it will identify the oldest available commit as "bad" because the actual introducing commit is not present.
+
+**Detection**:
+```bash
+git rev-parse --is-shallow-repository
+# "true" = shallow clone, bisect will be unreliable
+```
+
+**Fix**:
+```bash
+git fetch --unshallow
+# or fetch enough history to cover the good..bad range:
+git fetch --depth=500
+```
+
+---
+
+## Bisect Cleanup
+
+Always end a bisect session properly:
+```bash
+git bisect reset         # returns to original HEAD and cleans bisect state
+```
+Failing to reset leaves `BISECT_LOG`, `BISECT_START`, and the `refs/bisect/*` refs in `.git/`. Git then still considers the bisect in progress, and the working tree stays on whichever commit bisect checked out last.
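
The flaky-test arithmetic in `references/bisect-mastery.md` can be checked with a few lines of shell. This is a rough model, assuming every bisect step independently has the same chance of a wrong verdict (a simplification: a false failure only misleads bisect on commits that are actually good). The helper name `wrong_verdict_probability` is made up for illustration.

```bash
#!/usr/bin/env bash
# wrong_verdict_probability FLAKE_RATE STEPS
# Probability that at least one of STEPS independent bisect steps gets a
# wrong verdict when each step is wrong with probability FLAKE_RATE.
# (Hypothetical helper; the independence assumption is ours.)
wrong_verdict_probability() {
    LC_ALL=C awk -v p="$1" -v n="$2" \
        'BEGIN { printf "%.2f\n", 1 - (1 - p) ^ n }'
}

wrong_verdict_probability 0.10 10    # one test run per step: prints 0.65
wrong_verdict_probability 0.001 10   # three retries per step: prints 0.01
```

Under the same assumptions, a retry wrapper that needs three consecutive false failures lowers the per-step rate from 0.10 to 0.10^3 = 0.001, which is why the second call drops to about 1%.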
diff --git a/skills/git-workflow/references/branch-strategies.md b/skills/git-workflow/references/branch-strategies.md
new file mode 100644
index 0000000..b94a0c7
--- /dev/null
+++ b/skills/git-workflow/references/branch-strategies.md
@@ -0,0 +1,58 @@
+# Branch Strategy Deep Dive
+
+## Trunk-Based Development (TBD)
+
+### Requirements (non-negotiable)
+TBD fails without these — teams that try it without them end up with a permanently broken main:
+
+1. **CI pipeline < 10 minutes** — developers must get a green/red signal before moving to the next task. 20+ minute pipelines cause developers to push without waiting, which defeats the whole model.
+2. **Feature flags** — incomplete features must be hidden behind flags, not branches. Without flags, TBD forces you to either ship incomplete features or delay commits, which negates the trunk benefit.
+3. **Branch life ≤ 2 days** — the longer a branch lives, the further main moves underneath it and the more conflicts pile up at integration time. Treat 2 days as a rule of thumb rather than a measured threshold; the point is to integrate before divergence becomes expensive.
+4. **Comprehensive automated test suite** — trunk is only safe if a broken commit is caught before others pull it. Manual QA gates defeat the model.
+
+### Non-obvious failure modes
+- **Monorepo + TBD without module isolation**: changing a shared dependency breaks all consumers simultaneously on main. Mitigate with internal package versioning or explicit API contracts even for internal libs.
+- **TBD with no deploy pipeline**: TBD assumes every green main commit is deployable. If deployment is manual, you'll cherry-pick commits eventually, which reintroduces branch management overhead.
+
+---
+
+## Gitflow
+
+### When it's actually correct
+Gitflow gets dismissed as overengineered, but it's the right model when:
+- You support **multiple active release versions simultaneously** (e.g., v2.x and v3.x both get security patches)
+- Releases are **date-gated** (quarterly, compliance-driven, or tied to external partners) with a stabilization period
+- Teams are **large enough** that direct trunk commits create merge contention on main
+
+### The non-obvious mechanics
+- `develop` is not a "staging" branch — it's an integration branch. Treating it as staging causes teams to test on develop instead of in isolated environments, which pollutes develop with debugging commits.
+- `release/` branches exist to **freeze scope** while allowing bug fixes. The correct flow is: branch `release/x.y` from develop, only bug-fixes go into release, merge release into both main AND develop on completion. Many teams skip the back-merge to develop, then rediscover the bug in the next release cycle.
+- `hotfix/` branches must merge to both main AND develop. Skipping the develop merge means the hotfix is in production but not in the next scheduled release — a category of regression that's hard to diagnose months later.
+
+### Team size threshold
+Gitflow adds overhead. Under ~8 engineers, the branch management cost typically exceeds the isolation benefit. The inflection point is when two engineers can simultaneously be working on features that conflict — small teams can coordinate verbally.
+
+---
+
+## GitHub Flow
+
+### The edge cases nobody documents
+
+**Long-running PRs and drift**: GitHub Flow assumes short-lived branches, but product work doesn't always cooperate. A PR open for 2+ weeks will drift significantly from main. The convention `git fetch origin && git rebase origin/main` keeps it current, but rebasing a branch that already has an open PR requires a force-push, and each force-push can fragment the review history in code review tools.
+
+**Hotfix pattern**: GitHub Flow has no explicit hotfix branch. The correct approach is to branch from main (not from an existing feature branch), push directly to a `hotfix/desc` branch, open a PR, get expedited review, merge. Don't tag it differently — the merge commit to main IS the record.
+
+**The "squash merge" trap**: GitHub Flow repos often enable squash-merge. Squash merge produces clean history but destroys individual commit context. `git bisect` on a squash-merged repo will identify a squash commit as bad, but that commit may encompass 40 individual changes — you've narrowed to a PR, not a change. Use squash merge only if your PRs are already single-unit commits.
+
+---
+
+## Choosing Between Models: The Real Criteria
+
+| Criterion | TBD | GitHub Flow | Gitflow |
+|-----------|-----|-------------|---------|
+| CI < 10 min | Required | Helpful | Not required |
+| Feature flags in codebase | Required | Optional | Not needed |
+| Multiple supported versions | No | No | Yes |
+| Scheduled release dates | No | No | Yes |
+| Team size | Any (with CI) | ≤ 20 | Large or distributed |
+| Release cadence | Continuous | Weekly or faster | Monthly+ |
diff --git a/skills/git-workflow/references/index-precision.md b/skills/git-workflow/references/index-precision.md
new file mode 100644
index 0000000..a1964c8
--- /dev/null
+++ b/skills/git-workflow/references/index-precision.md
@@ -0,0 +1,134 @@
+# Index Precision: The Staging Area as a Tool
+
+## The Core Concept
+
+The index is a buffer between your working tree and the object store. Most developers treat it as a formality ("staging area = just run git add"). Expert usage treats it as a precision assembly tool: you construct the exact snapshot you want committed, independent of what the working tree currently looks like.
+
+---
+
+## Partial Staging with `git add -p`
+
+`git add -p` (patch mode) breaks each changed file into hunks and asks about each one individually. This lets you commit a subset of your changes — half a file, specific functions, everything except the debug print statements.
+
+```bash
+git add -p [file]    # omit file to review all changed tracked files
+```
+
+**Hunk commands** (type `?` during add -p for the full list):
+- `y` — stage this hunk
+- `n` — skip this hunk
+- `s` — split the hunk into smaller hunks (only offered when unchanged lines separate the changes inside the hunk)
+- `e` — manually edit the hunk (most powerful — opens the diff in $EDITOR)
+- `q` — quit, leaving remaining hunks unstaged
+
+**The `e` (edit) command** is the escape hatch when hunks are too coarse. You edit the raw diff format:
+- Lines starting with `+` are additions — remove a `+` line to not stage that addition
+- Lines starting with `-` are deletions — change `-` to ` ` (space) to not stage that deletion
+- Do not change context lines (starting with ` `)
+
+---
+
+## Resetting Individual Files Without Touching Working Tree
+
+`git reset HEAD <file>` (or `git restore --staged <file>` in newer Git) removes a file from the index without changing the working tree. Use this when you've staged something you didn't intend to.
+
+```bash
+# Unstage a file (but keep working tree changes)
+git restore --staged path/to/file.py
+# equivalent older form:
+git reset HEAD path/to/file.py
+
+# Unstage specific hunks (reverse of git add -p)
+git restore -p --staged path/to/file.py
+# or:
+git reset -p HEAD -- path/to/file.py
+```
+
+---
+
+## Restoring Working Tree from Index (or Commit)
+
+```bash
+# Discard working tree changes to a file, restoring to what's staged
+git restore path/to/file.py
+
+# Discard working tree changes and staged changes, restoring to HEAD
+git restore --staged --worktree path/to/file.py
+# equivalent older form:
+git checkout HEAD -- path/to/file.py
+```
+
+**Critical distinction**: `git checkout HEAD -- file` overwrites BOTH the index and working tree. `git restore --staged file` only resets the index. Choose based on whether you want to preserve the working tree state.
+
+---
+
+## Viewing the Index State
+
+```bash
+git diff --cached           # diff between index and HEAD (what you're about to commit)
+git diff                    # diff between working tree and index (unstaged changes)
+git diff HEAD               # diff between working tree and HEAD (all changes)
+
+# Show what's staged with filenames only:
+git diff --cached --name-only
+
+# Show the actual content of a file as it exists in the index:
+git show :path/to/file.py
+```
+
+---
+
+## Common Precision Index Patterns
+
+### Pattern: Split one file's changes into two commits
+
+```bash
+# File has both a bug fix and a refactor mixed together
+git add -p auth.py
+# Stage only the bug fix hunks → commit 1: "fix: null check on user token"
+git commit -m "fix: null check on user token"
+
+# Now stage the refactor
+git add auth.py
+git commit -m "refactor: extract token validation to separate method"
+```
+
+### Pattern: Commit with a generated file excluded
+
+```bash
+git add src/           # stage everything in src/
+git restore --staged src/generated/api_client.py  # unstage the generated file
+git commit -m "feat: add payment endpoint"
+# generated file remains modified in working tree but is not committed
+```
+
+### Pattern: Stage a deleted file but keep it locally
+
+```bash
+# You want to stop tracking a file but keep the local copy (e.g., config with secrets)
+git rm --cached path/to/config.local
+echo "config.local" >> .gitignore
+git add .gitignore
+git commit -m "chore: untrack local config, add to gitignore"
+# config.local still exists locally, just no longer tracked
+```
+
+---
+
+## Index State During Conflicts
+
+During a merge or rebase conflict, the index holds three versions of each conflicted file (stages 1, 2, 3):
+
+```bash
+git ls-files -u             # list all unmerged (conflict) entries with stage numbers
+# 100644 abc1234 1    path/to/file  (common ancestor = base)
+# 100644 def5678 2    path/to/file  (ours = current branch)
+# 100644 ghi9012 3    path/to/file  (theirs = incoming branch)
+
+# Extract any version manually:
+git show :1:path/to/file    # ancestor
+git show :2:path/to/file    # ours
+git show :3:path/to/file    # theirs
+```
+
+This is the mechanism that merge tools use. You can use it to manually resolve conflicts by comparing all three versions before editing.
diff --git a/skills/git-workflow/references/reflog-recovery.md b/skills/git-workflow/references/reflog-recovery.md
new file mode 100644
index 0000000..e936650
--- /dev/null
+++ b/skills/git-workflow/references/reflog-recovery.md
@@ -0,0 +1,147 @@
+# Reflog and Recovery
+
+## What Reflog Is (and Is Not)
+
+Reflog records every time a ref (branch, HEAD) moves — commits, checkouts, resets, rebases, merges, cherry-picks. It is local only — `git push` does not push reflog entries, and cloning does not copy them.
+
+**Reflog expires**: default expiry is 90 days for reachable entries, 30 days for unreachable (dangling) entries. Configure with `gc.reflogExpire` and `gc.reflogExpireUnreachable`.
+
+**What reflog CAN recover**:
+- Commits that were on a branch before reset
+- Deleted branches (until garbage collected)
+- Detached HEAD state
+- Commits overwritten by `--amend`
+
+**What reflog CANNOT recover**:
+- Uncommitted changes destroyed by `reset --hard` or `checkout --`
+- Changes never added to the index (untracked files deleted by `git clean`)
+- Entries older than the expiry window
+
+---
+
+## Recovery Playbook
+
+### Scenario 1: Detached HEAD — work committed to no branch
+
+```bash
+# You committed work while in detached HEAD state
+# git checkout main moved HEAD and you've "lost" the commits
+
+git reflog
+# 5a3b2c1 HEAD@{0}: checkout: moving from 9e8d7f6 to main
+# 9e8d7f6 HEAD@{1}: commit: add rate limiting logic
+# 4f3e2d1 HEAD@{2}: commit: refactor token validation
+
+# The SHA you want is 9e8d7f6 (the most recent commit before checkout)
+git branch recover-detached-work 9e8d7f6
+git checkout recover-detached-work
+```
+
+### Scenario 2: Branch deleted before merge
+
+```bash
+git branch -D feature/my-work   # oops
+
+# Find the last commit SHA on that branch
+git reflog | grep "feature/my-work"
+# or scan for the commit message:
+git reflog | grep "my-work"
+# abc1234 HEAD@{14}: commit: implement payment webhook handler
+
+git branch feature/my-work abc1234
+```
+
+### Scenario 3: `git reset --hard` wiped commits
+
+```bash
+git reset --hard HEAD~5   # removed 5 commits
+
+git reflog
+# current HEAD@{0}: reset: moving to HEAD~5
+# abc1234 HEAD@{1}: commit: fifth commit
+# def5678 HEAD@{2}: commit: fourth commit
+# ...
+
+# Restore to before the reset:
+git reset --hard abc1234   # SHA of the commit before reset
+```
+
+### Scenario 4: Rebase went wrong — restore pre-rebase state
+
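+Git records the previous tip in `ORIG_HEAD` before operations that move `HEAD` drastically (rebase, reset, merge). The next such operation overwrites it, so use it right after the mistake. If reflog is confusing, check ORIG_HEAD first: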
+
+```bash
+git rebase main   # something went wrong
+
+# Option A: abort if rebase is still in progress
+git rebase --abort
+
+# Option B: reset to pre-rebase state if already completed
+git reset --hard ORIG_HEAD
+```
+
+For rebases that completed but produced wrong results, use reflog:
+```bash
+git reflog | grep -A1 "rebase (start)"
+# Reflog is newest-first: the pre-rebase state is the entry listed right after "rebase (start)"
+git reset --hard HEAD@{N}   # where N is the index of that pre-rebase entry
+```
+
+### Scenario 5: `--amend` created an orphan (especially in shallow clones)
+
+```bash
+# You amended a commit, then pushed --force; someone else has the original SHA
+# Or: you're in a shallow clone and amend created an orphan
+
+git reflog
+# abc1234 HEAD@{0}: commit (amend): updated commit message
+# def5678 HEAD@{1}: commit: original commit
+
+# The original commit is def5678 — it's still in local reflog
+# To recover it as a branch:
+git branch recover-original def5678
+```
+
+---
+
+## Reflog Navigation
+
+```bash
+git reflog                  # HEAD movements
+git reflog show main        # movements of the main branch specifically
+git reflog show --all       # all refs
+
+# Time-based navigation
+git checkout HEAD@{2.hours.ago}
+git checkout main@{yesterday}
+
+# Show with timestamps
+git reflog --date=relative
+git reflog --date=iso
+```
+
+---
+
+## Shallow Clone Recovery Procedures
+
+Shallow clones (created with `--depth N`) have truncated history. The truncation point is recorded in `.git/shallow`, which lists the boundary commits whose parents were not fetched. Issues that arise:
+
+**Bisect fails**: `git bisect` needs to traverse history to the good commit. If that commit is below the shallow boundary, bisect reports "Not a valid object name."
+
+**Amend creates orphan**: In a shallow clone, the parents of the boundary commit are not present locally. Amending the tip commit creates a new SHA while the remote still references the old one. A subsequent `push --force` replaces the remote's tip; the original can then be fetched by SHA only if the server still has the object and permits fetching unreachable commits (for example via `uploadpack.allowAnySHA1InWant`):
+```bash
+# Recover the original from remote after a force-push in shallow clone
+git fetch origin <original-SHA>
+git branch recover-from-orphan FETCH_HEAD
+```
+
+**Detection and fix**:
+```bash
+git rev-parse --is-shallow-repository
+# "true" = shallow
+
+# Deepen to cover the range you need:
+git fetch --depth=1000
+# Or fully unshallow:
+git fetch --unshallow
+```
diff --git a/skills/git-workflow/references/worktree-patterns.md b/skills/git-workflow/references/worktree-patterns.md
new file mode 100644
index 0000000..f00f290
--- /dev/null
+++ b/skills/git-workflow/references/worktree-patterns.md
@@ -0,0 +1,117 @@
+# Worktree Patterns
+
+## When Worktrees Beat Stash + Branch Switch
+
+Stash + branch switch is optimized for "save my work for 5 minutes." Worktrees are optimized for "I need two or more complete working states simultaneously." The break-even is roughly: does your other work require a build, test run, or server process?
+
+| Situation | Use stash | Use worktree |
+|-----------|-----------|--------------|
+| Quick hotfix while mid-feature | Yes, if the fix is < 30 min | Yes, if it needs a full build |
+| Reviewing a PR while working on your own branch | No — both need build artifacts | Yes — each worktree has its own build output |
+| Running tests on two branches simultaneously | No — single working tree | Yes — both test suites run independently |
+| CI simulation locally | No | Yes — mount the worktree as the CI would see it |
+| Long-lived parallel work (> 1 day) | No — stash is not meant for this | Yes — name it descriptively |
+
+## Creating and Using Worktrees
+
+```bash
+# Create a worktree on a new branch (here started from main)
+git worktree add -b hotfix/auth-bypass ../my-project-hotfix main
+
+# Create a worktree on a new local branch that tracks an existing remote branch
+git worktree add --track -b feature/payments ../my-project-review origin/feature/payments
+
+# List all worktrees
+git worktree list
+
+# Remove a worktree (clean up)
+git worktree remove ../my-project-hotfix
+# or if the worktree directory was manually deleted:
+git worktree prune
+```
+
+**Naming convention**: use a parallel directory structure:
+```
+~/projects/
+  my-project/          # main working tree
+  my-project-hotfix/   # worktree for hotfix
+  my-project-pr-142/   # worktree for reviewing PR #142
+```
+
+## Worktree Constraints (Non-Obvious)
+
+- **A branch can only be checked out in one worktree at a time.** Attempting to check out a branch that's already active in another worktree will fail with "already checked out." You must either remove that worktree or create a new branch from it.
+- **Worktrees share one repository.** Each linked worktree holds a `.git` file that points into the main repo's `.git` directory, so stashes, branches, and config are shared, while `HEAD` and the index are per worktree. A `git stash` in one worktree is visible in all others.
+- **Build artifacts are NOT shared** — each worktree has its own working tree directory, so compiled output, `node_modules`, `.venv`, etc. are independent. This is the entire point for concurrent build scenarios.
+- **Submodules are not initialized in new worktrees** — run `git submodule update --init --recursive` in the new worktree if your project uses submodules.
+
+## CI Local Simulation Pattern
+
+The canonical worktree use case for CI simulation:
+
+```bash
+# Create a clean worktree from the PR branch with no local modifications
+git worktree add /tmp/ci-sim origin/feature/my-pr
+
+# Run tests exactly as CI would — no local .env, no cached artifacts
+cd /tmp/ci-sim
+git clean -fdx          # remove all untracked/ignored files (CI starts clean)
+npm ci                  # clean install pinned to the lockfile (removes any existing node_modules)
+npm test
+
+# Clean up (--force in case the install or test run left untracked files behind)
+cd -
+git worktree remove --force /tmp/ci-sim
+```
+
+This catches environment contamination bugs — cases where tests pass locally due to a `.env`, local config file, or cached build artifact that won't exist in CI.
+
+## Stash Hygiene
+
+### Named stashes are mandatory for anything non-trivial
+
+```bash
+# Bad — impossible to know what this is 3 days later
+git stash
+
+# Good — inspectable and intentional
+git stash push -m "WIP: auth refactor, token validation incomplete" --include-untracked
+```
+
+### `--include-untracked` is almost always what you want
+
+Plain `git stash` does not stash untracked files. New files you've created are left behind. If you switch branches, those new files are now in the wrong branch context. Use `--include-untracked` as the default unless you intentionally want to leave new files behind.
+
+```bash
+git stash push --include-untracked -m "WIP: feature name"
+```
+
+### Before popping, check your stash list and context
+
+```bash
+git stash list
+# stash@{0}: On feature/auth: WIP: auth refactor, token validation incomplete
+# stash@{1}: On main: quick experiment with rate limiting
+
+# Confirm you're on the right branch before popping
+git branch --show-current
+git stash show stash@{0}   # view the diff summary without applying
+git stash show -p stash@{0}  # view the full diff
+```
+
+Popping `stash@{1}` (created on main) while on `feature/auth` can conflict wherever the stash and the feature branch changed the same lines. The conflicts are not wrong code. They are the right code in the wrong context, which is harder to detect than a normal merge conflict.
+
+### When pop creates unexpected conflicts: recovery
+
+```bash
+# git stash pop failed with conflicts
+# Option A: abort and apply manually
+git reset --hard HEAD      # discard the conflicted apply; this also drops any other uncommitted changes
+git stash show -p stash@{0} > /tmp/stash.patch
+# review /tmp/stash.patch, apply manually (the stash itself is still in the list)
+
+# Option B: stage the conflict resolution and drop the stash
+# (resolve conflicts, then:)
+git add -A
+git stash drop stash@{0}   # stash is consumed — do NOT pop again
+```
diff --git a/skills/monorepo-workspace/SKILL.md b/skills/monorepo-workspace/SKILL.md
new file mode 100644
index 0000000..7e0b4f4
--- /dev/null
+++ b/skills/monorepo-workspace/SKILL.md
@@ -0,0 +1,139 @@
+---
+name: monorepo-workspace
+description: Expert workspace orchestration for monorepos. Covers tool selection (Nx vs Turborepo vs pnpm workspaces vs Lerna), build graph and affected computation, shared config inheritance, dependency graph discipline, publishing strategy (independent vs fixed versioning), remote caching, TypeScript internal package references, and package granularity sizing. Trigger phrases: "set up a monorepo", "nx vs turborepo", "affected packages", "workspace config", "monorepo build", "remote cache", "internal packages", "circular dependency in monorepo", "workspace versioning".
+license: MIT
+metadata:
+  version: 1.0.0
+---
+
+# Monorepo Workspace
+
+## Mindset
+
+- **The build graph is the product** — the workspace tool is irrelevant; what matters is whether the dependency graph accurately reflects your actual code graph. A mismatched graph produces false-positive "affected" runs or, worse, silently skips rebuilds. Keep them in sync by treating package.json `dependencies` as a code contract, not metadata.
+- **Coarse packages beat fine-grained every time** — splitting one domain into ten packages to achieve "isolation" means ten versions of `tsconfig.json`, ten places to add a lint rule, and ten entries in changelogs for every cross-cutting change. One package per domain is almost always the right boundary. Split only when release cadences truly diverge or when a package needs to be published independently.
+- **Shared utilities are a dependency magnet** — every file you add to a `shared/` or `utils/` package marks every consumer as "affected" on every change. Keep shared packages small and stable; a file that changes weekly belongs in the package that owns the domain, not in a shared utility.
+- **`workspace:` protocol is an install-time contract, not a publishing one**: package managers resolve `workspace:*` to the local package at install time, but the range means nothing outside the workspace, so a consumer's package manager cannot install a published manifest that still contains it. Published packages must ship concrete version ranges; internal-only packages can stay on `workspace:`.
+- **Incremental CI is the only reason to have a monorepo** — if you're running all tests on every commit, you've adopted 100% of the monorepo complexity with 0% of the benefit. Affected computation is non-optional.
+
+## Navigation
+
+**Use this skill when**:
+- Choosing or migrating between Nx, Turborepo, pnpm workspaces, or Lerna
+- Debugging why "affected" computation marks too many or too few packages
+- Setting up shared tsconfig / ESLint config across packages
+- Detecting or resolving circular dependencies between workspace packages
+- Deciding between independent and fixed versioning strategies
+- Configuring remote caching (Nx Cloud, Turborepo Remote Cache, self-hosted)
+- Troubleshooting TypeScript path aliases vs `workspace:` protocol issues
+- Sizing package granularity (too many vs too few packages)
+
+**Do NOT use this skill when**:
+- Updating individual package versions within an existing monorepo — use **dependency-updater** for that
+- Setting up a single-package repository with no workspace tooling
+- Questions about container orchestration or service meshes (different kind of "mono")
+
+**Tool selection decision tree**:
+```
+What is your primary constraint?
+├── Polyglot (Go, Rust, Python alongside JS/TS)?
+│   └── → Nx  (generator ecosystem + polyglot executor support)
+├── Pure JS/TS, want minimal config, fastest cold-cache?
+│   └── → Turborepo  (zero-config cache, simpler mental model)
+├── No build orchestrator wanted, just package linking?
+│   └── → pnpm workspaces alone  (workspace: protocol, no task runner overhead)
+└── Existing Lerna repo?
+    ├── Publishing-only use case → keep Lerna for versioning, add Turbo for tasks
+    └── Greenfield → migrate off Lerna; it's in maintenance mode
+```
+
+**Versioning strategy decision tree**:
+```
+Do all packages in the repo share a single release?
+├── Yes (e.g., a design system with a single version promise)
+│   └── Fixed versioning — acceptable; changelogs will be noisy for small repos
+└── No (packages have independent release cadences)
+    └── Independent versioning — required; fixed versioning forces semver lies
+```
+
+## Philosophy
+
+A monorepo is a discipline problem, not a tooling problem. The tools enforce what the team already agreed on. Pick the simplest tool that accurately computes the affected set and caches task outputs; spend the saved time on keeping the dependency graph honest.
+
+## NEVER
+
+- **NEVER import across package boundaries using relative paths** (`../../other-package/src/index.ts`): relative imports bypass the dependencies declared in `package.json`, so tools that rely on those declarations (such as `turbo --filter`) cannot see the edge, Nx's module-boundary lint rule rejects them, and they silently break when packages are moved, renamed, or published. Always import by the package name declared in `package.json` (`import { x } from '@acme/utils'`).
+- **NEVER add a shared utility to the root `package.json`** — root-level dependencies become implicit globals for every package in the workspace. When a package is published or extracted, the dependency disappears without warning. Declare all runtime dependencies in the consuming package's own `package.json`.
+- **NEVER ignore circular dependency warnings**: in a cycle, which module evaluates first depends on which one happens to be imported first, so an import can still be `undefined` when the other module runs, and a package-level cycle leaves the task runner with no valid build order. The failure mode is intermittent: it can work from one entry point or with previously built output in place and break on a cold CI build. By the time you see it fail, the graph is already tangled and untangling it requires refactoring across multiple packages simultaneously.
+- **NEVER use fixed versioning when packages have independent release cadences** — fixed versioning bumps every package to the same version on every release. A one-line change in `@acme/button` forces `@acme/data-layer` to publish a new version with no actual changes. Consumers subscribe to changelogs and receive noise, eroding trust in semver signals.
+- **NEVER run `npm install` (or `yarn install`) inside an individual workspace package directory** — running install at the package level creates a nested `node_modules` that shadows the hoisted root dependencies. You get two copies of React, two copies of TypeScript, peer-dependency mismatches, and builds that differ between developers depending on what order they ran commands. Always install from the workspace root.
+- **NEVER skip affected computation by running all tasks on every CI commit** — in a 50-package monorepo, running the full test suite on a one-line doc change is the reason teams abandon monorepos. Affected computation is the value proposition; bypassing it to "be safe" is the same as not having a monorepo at all.
+- **NEVER use TypeScript path aliases (`paths` in tsconfig) as a substitute for `workspace:` references in published packages** — path aliases are resolved by the TypeScript compiler during local development but are invisible to consumers who install the published package. At publish time, the import resolves to a path that doesn't exist in the npm tarball. Use path aliases only for internal app packages that are never published.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| `nx affected` marks every package on every PR | A shared utility package is touched frequently; all consumers are downstream | Audit the shared package's change frequency; move volatile files to the owning domain package; split stable helpers into a truly-stable `@acme/core` |
+| Turborepo cache never hits in CI | `outputs` in `turbo.json` doesn't match what the task actually writes, or the task reads env vars not listed in `env` | Add the missing env vars to `globalEnv` or task-level `env`; verify `outputs` globs with `turbo run build --dry=json` |
+| `tsc` succeeds locally but fails in CI with "cannot find module '@acme/utils'" | Path aliases work locally (tsconfig `paths`) but the package isn't installed in CI because it's not in `dependencies` | Add `"@acme/utils": "workspace:*"` to the consuming package's `dependencies`; never rely solely on path aliases for cross-package imports |
+| Circular dependency causes intermittent build failure | Two packages import from each other, even indirectly | Run `nx graph` or `madge --circular --extensions ts src/` to visualize; extract the shared interface/type to a third package that both packages depend on and that depends on neither of them |
+| Published package missing files at runtime | `package.json` `files` field doesn't include build output directory | Audit `files` array; add `dist/` or equivalent; verify with `npm pack --dry-run` before publishing |
+| `workspace:*` in a published package causes install failure for consumers | `workspace:` protocol was not replaced with a concrete version before publishing | Ensure publish pipeline runs `pnpm publish` (which auto-replaces `workspace:*`) or add a `prepack` script to rewrite versions |
+
+---
+
+## Tool Deep-Dives
+
+See [`references/tool-comparison.md`](references/tool-comparison.md) for decision criteria, configuration examples, and migration notes for Nx, Turborepo, pnpm workspaces, and Lerna.
+
+See [`references/shared-config-patterns.md`](references/shared-config-patterns.md) for tsconfig extends chains, ESLint flat config layout, and the "config package" pattern.
+
+See [`references/remote-cache-setup.md`](references/remote-cache-setup.md) for Nx Cloud, Turborepo Remote Cache, and self-hosted cache configuration.
+
+---
+
+## Package Granularity Sizing
+
+**The right question is not "should this be its own package?" but "does this have a different release cadence or a different set of consumers than its sibling code?"**
+
+Signs a package is too fine-grained:
+- More than one package changes in every PR
+- The package has only one consumer and that consumer is in the same repo
+- The package contains fewer than ~5 source files
+- Changelog entries always appear in pairs or triples across packages
+
+Signs a package is too coarse:
+- Different teams own different subdirectories and frequently conflict on semver
+- A single package is consumed by both internal apps and external npm consumers
+- Build times for a single package exceed 60 seconds (indicates too many responsibilities)
+
+**Default rule**: one package per product domain (auth, payments, ui-components, api-client). Split only when you have evidence of divergent release cadence, not in anticipation of it.
+
+---
+
+## Dependency Graph Discipline
+
+Detecting circular dependencies early is cheaper than debugging them later:
+
+```bash
+# Nx — visualize full graph, highlights cycles
+nx graph
+
+# TypeScript — list all files compiled (reveals unexpected cross-package imports)
+tsc --listFiles --noEmit 2>&1 | grep other-package
+
+# madge — dedicated circular dep detector for JS/TS
+npx madge --circular --extensions ts ./packages/
+
+# pnpm — workspace dependency graph
+pnpm list --recursive --depth 1
+```
+
+**Extraction pattern for breaking a cycle**: If `package-a` imports from `package-b` and vice versa, the shared type or interface they both need belongs in a `package-shared` that both depend on and that depends on neither of them. Do not move logic. Move only the shared contract (types, interfaces, constants).
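+
+A minimal sketch of the manifests after such an extraction, assuming the hypothetical package names `@acme/package-a`, `@acme/package-b`, and `@acme/package-shared` (illustrative only): both original packages declare the shared contract package, and the shared package declares neither of them.
+
+```json
+// packages/package-a/package.json (hypothetical)
+{ "name": "@acme/package-a", "dependencies": { "@acme/package-shared": "workspace:*" } }
+
+// packages/package-b/package.json (hypothetical)
+{ "name": "@acme/package-b", "dependencies": { "@acme/package-shared": "workspace:*" } }
+
+// packages/package-shared/package.json (hypothetical): types and constants only, no workspace dependencies
+{ "name": "@acme/package-shared" }
+```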
+
+---
+
+## Navigation Callout
+
+For per-project dependency updates within the monorepo (upgrading lodash, bumping React version, fixing a CVE in a specific package), use **dependency-updater** — this skill covers workspace structure and build orchestration, not individual package version management.
diff --git a/skills/monorepo-workspace/references/remote-cache-setup.md b/skills/monorepo-workspace/references/remote-cache-setup.md
new file mode 100644
index 0000000..89b1281
--- /dev/null
+++ b/skills/monorepo-workspace/references/remote-cache-setup.md
@@ -0,0 +1,166 @@
+# Remote Cache Setup
+
+Reference for the monorepo-workspace skill — what gets cached, invalidation models, and setup for Nx Cloud, Turborepo Remote Cache, and self-hosted options.
+
+---
+
+## What Remote Caching Actually Caches
+
+Remote caches store **task output artifacts**, not source code. When a cache hit occurs:
+- The task (build, test, lint) is skipped entirely
+- Cached output files are restored to the expected `outputs` directory
+- Cached stdout/stderr is replayed so CI logs look identical
+
+**What is NOT cached**:
+- The node_modules installation step (use lockfile caching for that)
+- Any side effects a task performs outside its declared `outputs` (e.g., writing to a database, uploading to S3)
+- Tasks that explicitly set `"cache": false`
+
+**The invalidation model**: Cache keys are computed from a hash of:
+1. Input file contents (source files + config files affecting the task)
+2. Task configuration (the task definition itself)
+3. Declared environment variables
+4. Dependency task cache keys (recursive — if a dep rebuilds, the consumer also rebuilds)
+
+A cache miss on an upstream package propagates to all downstream consumers. This is correct behavior. A change to `@acme/utils` should invalidate the cache for every app that depends on it.
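+
+As a rough illustration of how those four hash components map onto configuration, here is a sketch of a Turborepo task definition. The input globs and the `API_URL` variable are assumptions chosen for the example, not values required by the tool.
+
+```json
+// turbo.json (illustrative sketch)
+{
+  "tasks": {
+    "build": {
+      "dependsOn": ["^build"],
+      "inputs": ["src/**", "package.json", "tsconfig.json"],
+      "env": ["API_URL"],
+      "outputs": ["dist/**"]
+    }
+  }
+}
+```
+
+Here `inputs` covers component 1, the task definition itself is component 2, `env` is component 3, and `dependsOn: ["^build"]` ties the task to its dependencies' builds (component 4).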
+
+---
+
+## Nx Cloud
+
+**Free tier**: 500 CI pipeline executions/month for open source; paid plans for private repos beyond the free tier.
+
+**What it adds over local cache**: Shared cache across all CI runners and developer machines. Developer A builds `@acme/utils`; developer B gets a cache hit when they run the same task with the same inputs, even on a different machine.
+
+**Setup**:
+```bash
+npx nx connect
+# Connects the workspace to Nx Cloud and updates nx.json; store the token in your CI secrets as NX_CLOUD_ACCESS_TOKEN
+```
+
+```json
+// nx.json (updated by nx connect; Nx 17+ layout, where caching is enabled per target)
+{
+  "nxCloudAccessToken": "...",
+  "targetDefaults": {
+    "build": { "cache": true },
+    "test":  { "cache": true },
+    "lint":  { "cache": true },
+    "e2e":   { "cache": true }
+  }
+}
+```
+
+**Non-obvious**: Nx Cloud also provides distributed task execution (DTE) — CI tasks are distributed across agents automatically. This is a separate feature from remote caching and requires additional CI config. Do not conflate them.
+
+**Cache miss debugging**:
+```bash
+nx run my-lib:build --verbose
+# Shows the cache key computation and which inputs caused a miss
+```
+
+---
+
+## Turborepo Remote Cache
+
+**Default provider**: Vercel (free for Vercel users, $0 storage cost).
+
+**Setup with Vercel**:
+```bash
+npx turbo login       # authenticate with Vercel account
+npx turbo link        # link repo to Vercel team for shared cache
+```
+
+**CI environment variables** (set in CI secrets, not committed):
+```
+TURBO_TOKEN=<vercel-token>
+TURBO_TEAM=<vercel-team-slug>
+```
+
+**Verifying cache hits in CI**:
+```bash
+turbo run build --dry=json | jq '.tasks[] | {task: .taskId, cache: .cache}'
+# "cache": {"status": "HIT"} or "MISS"
+```
+
+**Non-obvious — the `env` field is mandatory for correct invalidation**:
+```json
+// turbo.json
+{
+  "tasks": {
+    "build": {
+      "env": ["NODE_ENV", "API_URL"],
+      "outputs": ["dist/**"]
+    }
+  }
+}
+```
+If `build` reads `API_URL` but it's not in `env`, cache hits will restore output built against a different API URL. This is a silent correctness bug, not a performance bug — the output is wrong, not just stale.
+
+---
+
+## Self-Hosted Remote Cache
+
+### Turborepo: ducktors/turborepo-remote-cache (open source, self-hosted)
+
+`ducktors/turborepo-remote-cache` is a commonly used open-source Turborepo cache server:
+
+```bash
+docker run -p 3000:3000 \
+  -e TURBO_TOKEN=your-secret-token \
+  ducktors/turborepo-remote-cache
+```
+
+CI config:
+```bash
+export TURBO_API=https://your-cache-server.internal
+export TURBO_TOKEN=your-secret-token
+export TURBO_TEAM=your-team
+turbo run build
+```
+
+### Nx: Custom Remote Cache
+
+Nx supports custom remote caches via a task-runner plugin. A simple self-hosted option is the community `@pellegrims/nx-remotecache-s3` runner, which uses S3-compatible storage:
+
+```bash
+npm install @pellegrims/nx-remotecache-s3 -D
+```
+
+```json
+// nx.json
+{
+  "tasksRunnerOptions": {
+    "default": {
+      "runner": "@pellegrims/nx-remotecache-s3",
+      "options": {
+        "region": "us-east-1",
+        "bucket": "your-nx-cache-bucket",
+        "cacheableOperations": ["build", "test", "lint"]
+      }
+    }
+  }
+}
+```
+
+**IAM policy** (minimum required):
+```json
+{
+  "Effect": "Allow",
+  "Action": ["s3:GetObject", "s3:PutObject"],
+  "Resource": "arn:aws:s3:::your-nx-cache-bucket/*"
+}
+```
+
+---
+
+## Cache Debugging Checklist
+
+| Symptom | Check |
+|---------|-------|
+| Cache never hits despite identical code | Env vars not declared in `env`/`globalEnv`; generated files among the task inputs that embed timestamps or other per-run values; non-deterministic build output |
+| Cache hits but output is wrong | Side effect outside declared `outputs`; env var affecting output not in cache key |
+| Cache hits locally but misses in CI | `inputs` includes files that differ between local/CI (e.g., `.env.local`); add to `.gitignore` AND `turbo.json` `inputs` exclusions |
+| Stale cache persisting after config change | Task config hash includes `turbo.json` content — a config change should bust the cache; if not, verify the cache server is using the correct hash algorithm version |
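+
+One way to express the local-vs-CI exclusion from the table, assuming a Turborepo version that supports the `$TURBO_DEFAULT$` placeholder in `inputs` (the task name and `outputs` glob are illustrative):
+
+```json
+// turbo.json (illustrative sketch)
+{
+  "tasks": {
+    "build": {
+      "inputs": ["$TURBO_DEFAULT$", "!.env.local"],
+      "outputs": ["dist/**"]
+    }
+  }
+}
+```
+
+`$TURBO_DEFAULT$` keeps the default input set, and the negated glob removes the machine-specific file from the hash.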
diff --git a/skills/monorepo-workspace/references/shared-config-patterns.md b/skills/monorepo-workspace/references/shared-config-patterns.md
new file mode 100644
index 0000000..c1f4c3a
--- /dev/null
+++ b/skills/monorepo-workspace/references/shared-config-patterns.md
@@ -0,0 +1,163 @@
+# Shared Config Patterns
+
+Reference for the monorepo-workspace skill — tsconfig inheritance, ESLint layout, and the config-package pattern.
+
+---
+
+## TypeScript: tsconfig Extends Chain
+
+### The Three-Layer Pattern
+
+```
+tsconfig.base.json          ← root: compiler options only, no files/include/exclude
+  └── packages/*/tsconfig.json    ← per-package: extends base, adds include/paths
+        └── packages/*/tsconfig.build.json  ← build variant: excludes tests, emits .d.ts
+```
+
+**Root `tsconfig.base.json`** (compiler options only — never include `include`, `exclude`, or `references` here):
+```json
+{
+  "compilerOptions": {
+    "strict": true,
+    "target": "ES2022",
+    "module": "NodeNext",
+    "moduleResolution": "NodeNext",
+    "declaration": true,
+    "declarationMap": true,
+    "sourceMap": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true
+  }
+}
+```
+
+**Per-package `tsconfig.json`** (development + IDE, includes tests; type-check only, because `tests/` sits outside the build's `rootDir`):
+```json
+{
+  "extends": "../../tsconfig.base.json",
+  "compilerOptions": {
+    "noEmit": true,
+    "paths": {
+      "@acme/utils": ["../utils/src/index.ts"]
+    }
+  },
+  "include": ["src/**/*", "tests/**/*"]
+}
+```
+
+**Per-package `tsconfig.build.json`** (emit only, excludes tests):
+```json
+{
+  "extends": "./tsconfig.json",
+  "exclude": ["**/*.test.ts", "**/*.spec.ts", "tests/**/*"],
+  "compilerOptions": {
+    "noEmit": false,
+    "outDir": "dist",
+    "rootDir": "src",
+    "paths": {}
+  }
+}
+```
+
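+**Critical**: The `paths` override in `tsconfig.build.json` clears aliases because published consumers resolve imports through `node_modules`, not through TypeScript path mapping. `tsc` never rewrites import specifiers, so an alias that is not also a real package name is emitted verbatim into the `.js` and `.d.ts` files and is unresolvable for external consumers. Clearing `paths` also forces the build to resolve `@acme/utils` as an installed dependency, which surfaces a missing `dependencies` entry at build time.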
+
+### Project References (Alternative to paths)
+
+For large repos where IDE performance matters, TypeScript project references are more reliable than `paths`:
+
+```json
+// packages/app/tsconfig.json
+{
+  "extends": "../../tsconfig.base.json",
+  "references": [
+    { "path": "../utils" }
+  ]
+}
+```
+
+```bash
+# Build with incremental compilation — only rebuilds changed packages
+tsc --build packages/app/tsconfig.json
+```
+
+Project references require each referenced package to have a `tsconfig.json` with `"composite": true` and `"declaration": true`. The build order is computed by `tsc --build` automatically — you do not need a separate task runner for TypeScript-only dependency ordering.
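+
+A minimal sketch of the referenced package's config under those requirements, assuming the `packages/utils` layout and root `tsconfig.base.json` used above (the `outDir` and `rootDir` values are illustrative):
+
+```json
+// packages/utils/tsconfig.json
+{
+  "extends": "../../tsconfig.base.json",
+  "compilerOptions": {
+    "composite": true,
+    "declaration": true,
+    "outDir": "dist",
+    "rootDir": "src"
+  },
+  "include": ["src/**/*"]
+}
+```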
+
+---
+
+## ESLint: Flat Config in Monorepos
+
+**Flat config (`eslint.config.mjs`) is the correct choice for new monorepos.** The legacy `extends`-based config has resolution ambiguity in hoisted `node_modules` that causes plugin version conflicts.
+
+### Root flat config with per-package overrides
+
+```js
+// eslint.config.mjs (root)
+import js from '@eslint/js';
+import ts from 'typescript-eslint';
+
+export default ts.config(
+  js.configs.recommended,
+  ...ts.configs.strict,
+  {
+    // Global ignores
+    ignores: ['**/dist/**', '**/node_modules/**', '**/*.gen.ts']
+  },
+  {
+    // Per-package overrides via files glob
+    files: ['packages/app/**/*.ts'],
+    rules: {
+      // app-specific rules
+    }
+  }
+);
+```
+
+Each package can have its own `eslint.config.mjs` that imports and extends the root:
+```js
+// packages/utils/eslint.config.mjs
+import rootConfig from '../../eslint.config.mjs';
+export default [
+  ...rootConfig,
+  {
+    rules: { 'no-console': 'error' }  // stricter for library packages
+  }
+];
+```
+
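+**Non-obvious**: Flat config does not cascade. ESLint uses the first `eslint.config.*` file it finds (by default, ESLint 9 searches upward from the current working directory) and ignores any config above it. A package-level `eslint.config.mjs` therefore completely replaces the root config unless you explicitly import and spread it, and it only takes effect when ESLint runs from that package directory. This is different from the legacy behavior, where configs in parent directories were merged.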
+
+---
+
+## The "Config Package" Pattern — When It's Right vs Over-Engineering
+
+### When it's right
+
+A dedicated `@acme/tsconfig` or `@acme/eslint-config` package makes sense when:
+- Multiple **separate repos** need to share the same config (publish to npm)
+- Config changes need their own semantic versioning and changelog
+- More than ~8 packages consume the config and drift is a real problem
+
+```
+packages/
+  tsconfig/           ← @acme/tsconfig
+    tsconfig.base.json
+    tsconfig.react.json
+    tsconfig.node.json
+    package.json
+```
+
+```json
+// packages/app/package.json
+{
+  "devDependencies": { "@acme/tsconfig": "workspace:*" }
+}
+// packages/app/tsconfig.json
+{ "extends": "@acme/tsconfig/tsconfig.react.json" }
+```
+
+### When it's over-engineering
+
+A config package is overhead when:
+- All consumers are in the same monorepo (just use `../../tsconfig.base.json`)
+- The team is fewer than 5 engineers — the coordination cost exceeds the consistency benefit
+- You're spending time versioning config changes — config should be boring
+
+**Default rule**: start with a root-level `tsconfig.base.json` and a root `eslint.config.mjs`. Only extract to a config package when you have a second repo that needs to consume it.
diff --git a/skills/monorepo-workspace/references/tool-comparison.md b/skills/monorepo-workspace/references/tool-comparison.md
new file mode 100644
index 0000000..63eecff
--- /dev/null
+++ b/skills/monorepo-workspace/references/tool-comparison.md
@@ -0,0 +1,161 @@
+# Tool Comparison: Nx vs Turborepo vs pnpm Workspaces vs Lerna
+
+Reference for the monorepo-workspace skill.
+
+---
+
+## Decision Matrix
+
+| Criterion | Nx | Turborepo | pnpm Workspaces | Lerna |
+|-----------|-----|-----------|-----------------|-------|
+| Polyglot support | Yes (custom executors) | No (JS/TS only) | No | No |
+| Code generators | Yes (schematics) | No | No | No |
+| Affected computation | Yes (project graph) | Yes (file hash) | No (manual) | No |
+| Remote cache | Nx Cloud (paid/free tier) | Vercel Remote Cache (free) | No | No |
+| Config overhead | High (project.json per pkg) | Low (turbo.json root only) | Minimal | Medium |
+| Publishing support | Plugin-based | No | `pnpm publish -r` | First-class |
+| Learning curve | High | Low | Low | Medium |
+| Active maintenance | Yes | Yes | Yes | Maintenance mode |
+
+---
+
+## Nx
+
+**Best for**: Polyglot repos, teams that want scaffolding/generators, large orgs with many packages that need enforced module boundaries.
+
+**Non-obvious criteria**:
+- Nx's "module boundary" lint rules (`@nx/enforce-module-boundaries`) are the only tool in this space that can enforce which packages may import from which — critical for platform teams that need to prevent app-level code from importing internal platform code.
+- Nx generators create consistent new packages from templates, eliminating the "copy-paste tsconfig and forget to rename" failure.
+- The Nx project graph is built from source-code imports, `package.json` dependencies, AND Nx-specific `implicitDependencies`. Anything the first two cannot see (e.g., a shared build script) must be declared as an implicit dependency, or affected computation will miss it.
+
+**Minimum config**:
+```json
+// nx.json
+{
+  "targetDefaults": {
+    "build": { "dependsOn": ["^build"], "cache": true },
+    "test":  { "cache": true },
+    "lint":  { "cache": true }
+  }
+}
+```
+
+```json
+// packages/my-lib/project.json
+{
+  "name": "my-lib",
+  "targets": {
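+    "build": {
+      "executor": "@nx/js:tsc",
+      "options": {
+        "outputPath": "dist/packages/my-lib",
+        "main": "packages/my-lib/src/index.ts",
+        "tsConfig": "packages/my-lib/tsconfig.lib.json"
+      }
+    }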
+  }
+}
+```
+
+---
+
+## Turborepo
+
+**Best for**: Pure JS/TS repos, teams that want minimal configuration overhead, projects already using Vercel infrastructure.
+
+**Non-obvious criteria**:
+- Turbo's cache key is computed from input file hashes + task config hash. If a task reads from environment variables, those vars MUST be declared in `env` or `globalEnv` — otherwise the cache key ignores them and cache hits return stale output when the env changes.
+- `turbo run build --filter=@acme/app` runs build for `@acme/app` AND all its dependencies whenever the `build` task declares `dependsOn: ["^build"]` (it is configured in `turbo.json`, not a built-in default). This is correct behavior, not a bug, but it surprises engineers who expect filter to mean "only this package."
+- The `outputs` field must cover everything the task writes. Files missing from the `outputs` globs are not stored, so a cache hit restores an incomplete build. Extra globs that match nothing (e.g., `.next/**` when only `dist/` is written) are generally harmless noise.
+
+**Minimum config**:
+```json
+// turbo.json
+{
+  "$schema": "https://turbo.build/schema.json",
+  "tasks": {
+    "build": {
+      "dependsOn": ["^build"],
+      "outputs": ["dist/**"],
+      "cache": true
+    },
+    "test": {
+      "dependsOn": ["^build"],
+      "cache": true
+    },
+    "lint": {
+      "cache": true
+    }
+  }
+}
+```
+
+---
+
+## pnpm Workspaces (No Orchestrator)
+
+**Best for**: Small monorepos (2–5 packages), teams that want workspace linking without task runner complexity, library-only repos where publish order matters more than incremental builds.
+
+**Non-obvious criteria**:
+- Without a task runner, "affected" computation doesn't exist — you run all tasks always. This is acceptable for small repos but does not scale.
+- `pnpm -r run build` runs build in all packages respecting topological order (dependencies build before consumers) — this is built in, no config needed.
+- `pnpm deploy` copies a specific workspace package + its resolved dependencies to a target directory, suitable for deploying a single service from a monorepo without shipping the whole workspace.
+
+**Workspace config**:
+```yaml
+# pnpm-workspace.yaml
+packages:
+  - 'packages/*'
+  - 'apps/*'
+  - '!**/__tests__/**'
+```
+
+**Internal reference**:
+```json
+// packages/app/package.json
+{
+  "dependencies": {
+    "@acme/utils": "workspace:*"
+  }
+}
+```
+
+---
+
+## Lerna
+
+**Current status**: Maintained by the Nx team. Feature development is minimal. The publishing and versioning features remain solid; for task orchestration, recent Lerna versions delegate `lerna run` to Nx, so use Nx or Turbo directly as the task runner.
+
+**When to keep Lerna**: Existing repos that use `lerna publish` and `lerna version` with a well-established changelog workflow. Do not migrate unless the publishing workflow is also broken.
+
+**When to migrate off Lerna**: Greenfield repos, repos where you're adopting Lerna for task running (use Turbo or Nx instead), repos where the changelog noise from fixed versioning is causing real pain.
+
+**Hybrid pattern** (common in legacy repos):
+```json
+// lerna.json: publishing only, no task running (Lerna 7+ removed "useWorkspaces"; drop that key there)
+{
+  "version": "independent",
+  "npmClient": "pnpm",
+  "useWorkspaces": true,
+  "command": {
+    "publish": { "conventionalCommits": true }
+  }
+}
+```
+Then add Turborepo for task running alongside. Lerna touches `package.json` versions + git tags; Turbo runs build/test/lint. They don't conflict.
+
+---
+
+## Migration Notes
+
+### pnpm workspaces → Turborepo
+1. Add `turbo.json` to root (see minimum config above)
+2. Add `turbo` to root `devDependencies`
+3. Replace `pnpm -r run build` with `turbo run build` in CI
+4. Verify `outputs` globs match your actual build artifacts
+
+### Nx → Turborepo (JS/TS only repos wanting simpler config)
+1. Export Nx project graph to understand current dependency structure
+2. Move each `project.json` target into a `package.json` script and declare it under `tasks` in `turbo.json`
+3. Remove `nx.json`, `project.json` files
+4. Note: you lose module boundary enforcement and generators
+
+### Lerna → Independent versioning with Changesets
+1. Install `@changesets/cli`
+2. Run `changeset init`
+3. Replace `lerna version` with `changeset version`
+4. Replace `lerna publish` with `changeset publish`
+5. Remove `lerna.json` last — verify CI is green first
diff --git a/skills/presentation-to-audience/SKILL.md b/skills/presentation-to-audience/SKILL.md
new file mode 100644
index 0000000..9f11b44
--- /dev/null
+++ b/skills/presentation-to-audience/SKILL.md
@@ -0,0 +1,122 @@
+---
+name: presentation-to-audience
+description: Use when designing a presentation's narrative structure, calibrating content to audience type, planning opening and closing strategy, or preparing for hostile Q&A. Trigger phrases: "structure my presentation", "who is my audience", "what should I open with", "how do I handle questions", "is this the right narrative", "executive deck", "technical pitch", "board presentation". Covers narrative arc, audience calibration, and delivery design — NOT slide mechanics (use marp-slide for Marp syntax, layout, export).
+---
+
+# Presentation to Audience
+
+## Mindset
+
+1. **The deck is not the presentation — the audience's decision is.** Every structural choice flows backward from one question: what does this audience need to decide or believe when they leave? Content that doesn't serve that decision is decoration, regardless of how accurate or interesting it is.
+2. **Narrative arc is a trust instrument, not an organizational preference.** Choosing the wrong arc for an audience type doesn't just reduce clarity — it actively destroys credibility. Technical audiences who see vision-first assume you haven't done the work. Exec audiences who see data-first assume you can't synthesize.
+3. **The opening is your highest-leverage 30 seconds.** Audiences form a listening posture in the first 30 seconds and rarely revise it. An agenda slide in that window tells them the session is procedural; a sharp problem statement tells them this will matter to them. You don't recover from a procedural opening.
+4. **Q&A is not separate from the presentation — it is a continuation of the narrative.** Hostile questions are attempts to redirect the narrative to a frame you didn't choose. The bridge technique exists precisely because answering on the questioner's terms cedes that frame permanently.
+5. **Cognitive load is not a preference, it is physics.** Audiences literally cannot read text and listen to speech simultaneously; the two tasks compete for the same language-processing hardware. A slide that requires reading is a slide that is not being heard.
+
+## Navigation
+
+**Use this skill when**:
+- Designing the structure and argument flow of a presentation from scratch
+- Adapting existing content to a specific audience type (exec, technical, hostile, mixed)
+- Choosing which narrative arc fits the situation
+- Planning opening statements, closing calls to action, or Q&A strategy
+- The user says a deck "isn't landing" or "feels flat" without a clear Marp syntax issue
+
+**Do NOT use this skill when**:
+- The request is purely about slide layout, Marp syntax, theme selection, or export — use `marp-slide` instead
+- The user wants to write prose (a report, doc, or email) — use `writing-clearly-and-concisely`
+- The request is about speaker delivery mechanics (pacing, voice, body language) — out of scope for both skills
+
+**Audience type → arc selection decision tree**:
+
+```
+What is the audience's primary posture?
+
+  TECHNICAL (skeptical, evidence-driven, peer reviewers)
+  └─ Use: Problem → Solution → Proof
+     Reason: credibility is earned through methodology, not vision;
+             leading with vision reads as sales, not substance
+
+  EXECUTIVE (time-constrained, synthesis-driven, decision authority)
+  └─ Use: Vision → Gap → Path
+     Reason: execs decide first, validate later; leading with data
+             forces them to synthesize — your job, not theirs
+
+  HOSTILE / SKEPTICAL (pre-opposed, adversarial, or politically misaligned)
+  └─ Use: Situation → Complication → Resolution (McKinsey SCR)
+     Reason: agreeing on the situation first creates common ground
+             before you introduce the complication they may dispute;
+             jumping to your solution without shared context triggers
+             immediate counter-argument reflex
+
+  MIXED (unknown or multi-stakeholder)
+  └─ Default to Situation → Complication → Resolution
+     Reason: SCR is the only arc that doesn't require you to assume
+             the audience's existing belief set; it builds the shared
+             frame before asserting anything
+```
+
+## Philosophy
+
+A presentation is a structured argument delivered in real time to people with limited attention and competing priors. Its job is not to show everything you know — it is to move the audience from their current belief or decision state to the one you need them to reach. Every slide either serves that movement or slows it down.
+
+## NEVER
+
+- **NEVER open with an agenda slide** — because it signals a procedural frame before the audience has any reason to care about the content; audiences who see an agenda first assume the session is informational, not decisional, and they disengage before the argument begins. Earn their attention with the sharpest version of the problem or claim first; the structure will be obvious from the content.
+- **NEVER use full sentences as bullet points** — because complete sentences compete directly with the speaker's spoken words for the audience's language-processing bandwidth; the audience reads the sentence, hears the speaker say something different, and loses both; use phrase-fragments that are grammatically incomplete without the speaker's voice to complete them.
+- **NEVER show a table with more than 5 columns to a live audience** — because column-dense tables require horizontal scanning that is impossible from a projected screen at normal viewing distance; the audience will spend the entire slide attempting to read data they cannot reach, missing everything the speaker says; extract the single number or comparison that actually matters and present it as a chart or a highlighted call-out.
+- **NEVER answer the question that was asked if it cedes the narrative frame** — because precise literal answering of hostile questions hands the questioner control of the agenda; use the bridge: acknowledge the question, then pivot to the point your presentation was making; audiences rarely notice and almost never penalize the pivot.
+- **NEVER use transitions and animations that don't carry information** — because motion that doesn't signal "this is new" or "this is causally connected" consumes attentional resources the audience needs for comprehension; decorative wipes and fly-ins subtract from the presentation's credibility, especially in technical and executive contexts where restraint signals confidence.
+- **NEVER end with "Any questions?" as your last slide** — because the last slide is the last thing the audience remembers; "Any questions?" ends on a procedural placeholder instead of your strongest claim or call to action; put your CTA or top-line conclusion as the final slide and let questions come while it is visible.
+- **NEVER use a pie chart with more than 3 segments or any segment under 10%** — because pie charts require angular estimation, which the human visual system performs poorly; audiences cannot distinguish a 12% slice from a 15% slice, and charts with 4+ slices force them to read the legend repeatedly; use a ranked bar chart, which is readable in a single left-to-right pass.
+- **NEVER bluff an answer you don't have** — because technical and expert audiences will detect the imprecision immediately, and the credibility loss from one bluffed answer outweighs the gain from appearing knowledgeable; "I don't have that number — I'll send it to you by [specific day]" is the highest-credibility response to a gap.
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Audience disengages in the first 3 slides | Opening established context before establishing stakes — they don't yet know why this matters to them | Restructure: lead with the consequence, not the background; the background becomes slide 3, not slide 1 |
+| Exec audience keeps asking "but what's the ask?" mid-deck | SCR or Problem-Solution arc used with an exec audience — they want the conclusion first | Reorder: put the recommendation and its decision on slide 2; supporting evidence follows as backup material |
+| Technical audience pushes back on methodology before you've shown it | Vision-first arc used with skeptical peers — they inferred you skipped rigor | Switch to Problem → Solution → Proof; open with the problem evidence and show the methodology before asking them to accept the claim |
+| Hostile questioner derails the session | Answering on the questioner's terms without bridging; once you accept their frame, the audience adopts it | Bridge immediately on the next question: "That's worth noting — what I'd add is [your point]"; do not attempt to re-answer the previous question |
+| "So what?" reaction to data-heavy slides | Data without explicit decision implication — numbers were presented but the consequence was left to the audience to infer | Add a one-line "What this means for [audience]:" below every data slide; make the implication explicit, not implied |
+| Presentation feels long even at the right duration | Slides with multiple independent ideas — the audience is experiencing compound cognitive load across ideas, not just slide count | Apply the "so what" test: any slide whose removal wouldn't change the audience's conclusion is a candidate for deletion |
+| Q&A runs over and loses the room | No planned stopping mechanism and no return-to-CTA protocol | Close Q&A 2 minutes early with "I want to make sure we leave time for one more thing" and return to your CTA slide; never let Q&A be the last thing the audience experiences |
+
+## Narrative Arc Reference
+
+Load `references/narrative-arcs.md` when:
+- The user needs a full structural template for any arc
+- The presentation spans more than 20 slides and arc fidelity is in question
+- The user has mixed stakeholder types and needs a hybrid structure
+
+## Audience Calibration Reference
+
+Load `references/audience-calibration.md` when:
+- The user hasn't identified their audience type and needs diagnostic questions
+- The audience includes multiple stakeholder types with conflicting information needs
+- The user is presenting cross-functionally (e.g., technical content to a business audience)
+
+## Rehearsal Protocol
+
+Three passes, in order — run each separately, never combined:
+
+**Pass 1 — Read-aloud pass**: Read every bullet fragment aloud as if speaking it. Any phrase that sounds unnatural in speech is a written phrase masquerading as a spoken one. Rewrite it until it sounds like something a person would say.
+
+**Pass 2 — Timing pass**: Time each slide at the intended pace. Any slide under 45 seconds is either too sparse (add a second point) or too obvious (delete it). Any slide over 3 minutes contains more than one idea.
+
+**Pass 3 — Cold-audience pass**: Read each slide assuming the audience has never heard of the topic. Any slide that requires knowledge from outside the deck to understand contains assumed context. Make the context explicit or move it to an earlier slide.
+
+## The "So What" Test
+
+Apply to every slide before finalizing:
+
+> "If I remove this slide, does the audience reach a different conclusion or make a different decision?"
+
+If no: delete the slide.  
+If yes: the slide stays, but its title should state the conclusion explicitly, not the topic.
+
+Weak title: "Q3 Customer Churn Data"  
+Strong title: "Churn accelerated in Q3 — and the cause is recoverable"
+
+The strong title does the cognitive work so the slide can carry the evidence.
diff --git a/skills/presentation-to-audience/references/audience-calibration.md b/skills/presentation-to-audience/references/audience-calibration.md
new file mode 100644
index 0000000..f221d43
--- /dev/null
+++ b/skills/presentation-to-audience/references/audience-calibration.md
@@ -0,0 +1,101 @@
+# Audience Calibration
+
+## The Single-Question Test
+
+Before writing a single slide, answer this:
+
+> **"What decision does this audience need to make, and when?"**
+
+Every piece of content in the presentation is either moving toward that answer or it is not. Content that does not serve the decision is not "nice to have" — it is a tax on the audience's attention that they will charge against your credibility.
+
+If you cannot answer the single-question test, you are not ready to build the deck. You need more information about the audience or the context, not more slides.
+
+---
+
+## Audience Type Diagnostics
+
+Use these questions when the user hasn't clearly identified their audience type:
+
+**1. What authority does this audience have?**
+- Approval authority over budget, headcount, or direction → Executive arc
+- Technical validation authority (they can say "this won't work") → Technical arc
+- Political or stakeholder authority (they can block but not build) → SCR arc
+- No formal authority, informational only → Use whatever arc fits their background
+
+**2. What is their information diet?**
+- They read dashboards and summaries → lead with conclusion (Executive arc)
+- They read technical specs and primary data → lead with problem and methodology (Technical arc)
+- They read reports with competing perspectives → build shared ground first (SCR arc)
+
+**3. What is their prior disposition toward your conclusion?**
+- Aligned or neutral → any arc works; use Executive or Technical based on (1) and (2)
+- Mildly skeptical → Technical arc with strong proof section
+- Actively hostile or politically opposed → SCR is mandatory; all other arcs give them a foothold to reject the premise
+
+**4. How much time do they have?**
+- Under 20 minutes → Executive arc; maximum 7 slides; appendix for everything else
+- 30–60 minutes with Q&A → Technical or SCR arc; 12–18 slides; Q&A budget = 1/3 of total time
+- Workshop or working session → Not a presentation, different tool; break into segments
+
+---
+
+## Multi-Stakeholder Presentations
+
+When one presentation must serve multiple audience types simultaneously (e.g., a CPO and a principal engineer in the same room):
+
+**The layered deck technique**:
+- Slides 1–5: Vision/conclusion layer (serves the exec; tolerates technical skepticism)
+- Slides 6–12: Evidence/methodology layer (serves the technical audience; provides the proof that makes the exec comfortable)
+- Appendix: Deep technical detail (available if asked; never shown unless requested)
+
+The exec exits after slide 5 satisfied. The technical audience stays for slides 6–12. This only works if the layers are explicitly separated — mixing them produces a deck that satisfies neither.
+
+**Seating-aware delivery**: In a room with mixed stakeholders, make eye contact with executives when delivering conclusion slides; shift to technical peers when delivering evidence slides. Both audiences feel the deck was built for them.
+
+---
+
+## The Content Triage Matrix
+
+Apply to every candidate slide before including it:
+
+| Does this slide... | If YES | If NO |
+|-------------------|--------|-------|
+| Directly answer the audience's core decision question? | Include | Evaluate further |
+| Create necessary context for a later slide that does answer the question? | Include (but minimize) | Delete |
+| Provide evidence the audience will ask for in Q&A? | Move to appendix | Delete |
+| Show your work in a way the audience isn't qualified to evaluate? | Delete | N/A |
+| Exist because you feel proud of the analysis? | Delete | N/A |
+
+**The hardest deletion**: Slides that represent genuine analytical effort but don't serve the audience's decision. These are the slides the presenter most wants to keep and the audience least wants to see. Delete them. Move them to the appendix if you can't let go.
+
+---
+
+## Calibrating Data Density
+
+Different audiences have different signal/noise thresholds for quantitative content:
+
+**Executive audience**: 1–2 numbers per slide maximum. Each number needs a unit, a comparison point (vs. last period, vs. target, vs. competitor), and a one-line "what this means" statement. Raw data tables are never appropriate.
+
+**Technical audience**: Can handle 4–6 data points per slide if they are organized and labeled. Still applies the cognitive load rule — the organization must be instantly parseable (grouped, ranked, or sequenced), not left to the viewer to sort.
+
+**Mixed audience**: Calibrate to the technical floor, not the executive floor. Executives can skim data they don't need; non-technical stakeholders cannot extrapolate from data they lack the context to interpret.
+
+---
+
+## Visualization Selection Guide
+
+Choosing the wrong chart type is not an aesthetic error — it is a communication error that forces the audience to do work you should have done.
+
+| You want to show | Use | Never use |
+|-----------------|-----|-----------|
+| Comparison across categories | Bar chart (horizontal for long labels) | Pie chart |
+| Change over time | Line chart | Bar chart (implies discrete, not continuous) |
+| Relationship between two variables | Scatter plot | Line chart (implies time) |
+| Part-to-whole (max 3 parts, none under 10%) | Pie chart | Stacked bar with many segments |
+| Distribution | Histogram or box plot | Average alone (hides variance) |
+| Single number that matters | Large typographic call-out | Any chart |
+| Geographic variation | Choropleth map | Bar chart (loses spatial relationships) |
+
+**The common pie chart mistake**: Pie charts are only valid when the parts sum to a meaningful whole, there are at most 3 segments, and no segment is under ~10%. A 5-segment pie chart with a 4% sliver requires a legend, cannot be read from a screen, and should always be replaced with a ranked bar chart.
+
+**The common bar chart mistake**: Using a bar chart for time-series data implies the values are discrete (Q1, Q2, Q3) rather than continuous. If the underlying reality is continuous change over time, a line chart is accurate; a bar chart is technically a misrepresentation.
diff --git a/skills/presentation-to-audience/references/narrative-arcs.md b/skills/presentation-to-audience/references/narrative-arcs.md
new file mode 100644
index 0000000..eaeccce
--- /dev/null
+++ b/skills/presentation-to-audience/references/narrative-arcs.md
@@ -0,0 +1,82 @@
+# Narrative Arc Templates
+
+Three arcs cover the vast majority of professional presentations. Selection criteria are in SKILL.md. Use these as structural scaffolding — every section should be shrunk to the minimum number of slides that carries the argument.
+
+---
+
+## Arc 1: Problem → Solution → Proof
+
+**Best for**: Technical audiences, peer reviews, skeptical stakeholders, grant applications, engineering postmortems.
+
+**Logic**: Credibility in technical contexts is earned by demonstrating you understand the problem before you claim to have solved it. Leading with the solution before the problem reads as sales, not substance.
+
+**Structure**:
+
+| Position | Slide purpose | Title pattern |
+|----------|--------------|--------------|
+| 1 | Hook — the sharpest version of the pain | "X is broken in a way most teams haven't noticed yet" |
+| 2–3 | Problem depth — evidence, scope, cost of inaction | "[Metric] is [direction] and the proximate cause is [specific thing]" |
+| 4 | Decision pivot — what changes if we act vs. don't | "The window for low-cost intervention closes in [timeframe]" |
+| 5–6 | Solution — mechanism, not just name | "We eliminate [root cause] by [specific change]" |
+| 7–9 | Proof — data, benchmarks, prior work, failure modes considered | "Under [condition], the solution holds because [mechanism]" |
+| Last | CTA — specific, time-bound, owner-named | "Decision needed: [action] by [date] from [role]" |
+
+**Common failure mode**: Spending too many slides on the problem and running out of time for proof. Cap the problem-depth section at 2 slides.
+
+---
+
+## Arc 2: Situation → Complication → Resolution (SCR)
+
+**Best for**: Hostile or politically complex audiences, cross-functional presentations, situations where you don't control the audience's prior beliefs.
+
+**Logic**: The SCR arc (McKinsey Pyramid Principle variant) creates common ground before introducing the complication. Both parties agree on the Situation; the Complication is why the current state is insufficient; the Resolution is the only logical response to that combination. Audiences that disagree with your conclusion but accept the Situation and Complication have to argue against their own logic to reject the Resolution.
+
+**Structure**:
+
+| Position | Slide purpose | Title pattern |
+|----------|--------------|--------------|
+| 1 | Situation — state of the world both parties accept | "Here is where we are today" (use data both sides already believe) |
+| 2–3 | Complication — why the current state is insufficient | "This matters because [consequence] is arriving by [date]" |
+| 4 | The decision forced by the complication | "Continuing as-is means [cost]; acting means [trade-off]" |
+| 5–7 | Resolution — your recommendation with constraints acknowledged | "The path that addresses [complication] without [objection] is [X]" |
+| 8+ | Evidence — supporting the resolution's feasibility | [Data, precedent, risk mitigation] |
+| Last | CTA with explicit owner and timeline | |
+
+**Non-obvious constraint**: The Situation slide must contain only facts the hostile audience already accepts as true. Introducing a contested fact into the Situation slide gives skeptics a foothold to derail the entire arc before you reach your argument.
+
+---
+
+## Arc 3: Vision → Gap → Path
+
+**Best for**: Executive audiences, board presentations, fundraising, organizational strategy, product launches to business leadership.
+
+**Logic**: Executives make decisions first and validate later. They don't have time to process data and synthesize conclusions — that's the presenter's job. Leading with vision gives them the conclusion to react to; the Gap shows why the status quo doesn't reach the vision; the Path is the investment ask. Data appears after the decision frame, as validation, not as input.
+
+**Structure**:
+
+| Position | Slide purpose | Title pattern |
+|----------|--------------|--------------|
+| 1 | Vision — the future state worth investing in | "[Outcome] is achievable in [timeframe] at [cost]" |
+| 2 | Gap — the specific distance between now and the vision | "We are at [current state]; the gap is [specific delta]" |
+| 3 | Cost of gap — why the gap is not acceptable | "[Gap] costs [specific number] per [period] and compounds" |
+| 4 | Path — the sequence of decisions that close the gap | "Three decisions close the gap: [1], [2], [3]" |
+| 5–7 | Supporting evidence — for any executive who wants validation | [Data for those who ask; executives who don't ask are already convinced] |
+| Last | The single decision this audience needs to make | "Decision: [specific ask] by [date]" |
+
+**Non-obvious constraint**: Never show more than 7 slides to an executive audience in a first-meeting context. If you have more evidence, put it in an appendix and offer it during Q&A. Showing more slides than the exec asked for signals that you don't know what's important.
+
+---
+
+## Hybrid Structures
+
+**Technical content to a business audience (most common cross-functional case)**:
+
+Use Vision → Gap → Path for the main deck (5–7 slides).  
+Prepare a Problem → Solution → Proof appendix (5–10 slides) for the technical validation questions that always arrive in Q&A.  
+Never mix the two arcs in the same slide sequence — it creates tonal whiplash.
+
+**All-hands or town hall (large mixed audience)**:
+
+Use Situation → Complication → Resolution.  
+The Situation slide does double duty: it creates shared context for newcomers and signals "we're not pretending things are fine" to skeptics.  
+Keep Resolution concrete and specific — vague aspirations in an all-hands are credibility-negative.
diff --git a/skills/testing-strategy/SKILL.md b/skills/testing-strategy/SKILL.md
new file mode 100644
index 0000000..c89dc62
--- /dev/null
+++ b/skills/testing-strategy/SKILL.md
@@ -0,0 +1,161 @@
+---
+name: testing-strategy
+description: Design automated test suites — test double taxonomy, pyramid vs. trophy tradeoffs, TDD applicability, Playwright vs. Cypress selection, flakiness diagnosis, coverage policy, and boundary-value patterns. Use when asked to "design a test strategy", "choose a test framework", "fix flaky tests", "set coverage thresholds", "write unit vs. integration tests", or "should I TDD this". Sits alongside qa-test-planner (manual testing) and code-review (correctness review).
+trigger: explicit
+---
+
+## Mindset
+
+- **The test that never fails has negative value.** It consumes maintenance time, produces false confidence, and is invisible until it misleads someone at a critical moment. Before writing any test, ask: "What specific incorrect behavior would this catch?" If you can't answer, don't write it.
+- **Integration is where bugs live, not units.** A system where every unit test passes and users still see errors is telling you something: the units are wrong, or the contracts between them are. Heavy unit pyramids are a bet that your units are the failure mode — that bet loses constantly.
+- **Mocks are a debt instrument.** Every mock you write is a promise to keep the mock in sync with the real implementation. That promise defaults silently: the test keeps passing, the production call has changed, users see the failure. Prefer fakes; they break loudly when the contract changes.
+- **Flakiness is determinism debt.** A flaky test has a hidden dependency on time, ordering, or external state. It will not spontaneously fix itself. Treat the first flaky test in a suite as a severity-1 issue: it destroys trust in the entire suite, because developers start ignoring red builds.
+- **Coverage is a floor detector, not a quality signal.** A line covered without an assertion proves only that the code ran. Mutation testing reveals what coverage hides: tests that watch code execute but assert nothing meaningful.
+
+## Navigation
+
+**Use this skill when**:
+- Designing the automated test architecture for a new project or service
+- Choosing between unit-heavy pyramid vs. integration-heavy trophy structure
+- Selecting Playwright vs. Cypress for an E2E layer
+- Diagnosing and fixing flaky tests in CI
+- Setting or challenging coverage threshold policies
+- Deciding whether to TDD a component or feature
+- Untangling stub/mock/fake/spy confusion in an existing suite
+
+**Do NOT use this skill when**:
+- Writing manual test cases, test plans, or regression suites — use **qa-test-planner**
+- Reviewing production code for logic bugs — use **code-review**
+- Designing acceptance criteria during sprint planning — that's requirements-clarity territory
+
+**Ambiguous input decision tree**:
+```
+"Write tests for X"
+  Is X a pure function with defined I/O? → YES → unit tests, consider TDD
+                                         → NO  ↓
+  Is X a UI component or page flow?      → YES → integration or E2E; skip unit layer
+  Is X a cross-service contract?         → YES → contract test (Pact) or integration test
+  Does X involve a database?             → YES → integration with real DB in container; no mocks
+
+"Our tests are flaky"
+  Fails intermittently in CI, passes locally? → timing/environment drift (see Flakiness section)
+  Fails when run in bulk, passes alone?       → shared mutable state (see State Pollution)
+  Fails after unrelated code change?          → over-specified mock (replace mock with fake)
+
+"What coverage should we target?"
+  Greenfield project:                    → 80% line as floor, enforce branch coverage
+  Legacy codebase:                       → ratchet (never allow regression), not arbitrary target
+  Critical financial/safety system:      → mutation testing required; line % is insufficient
+```
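+
+The legacy-codebase "ratchet" branch of the tree above can be sketched as a small check that fails only when coverage drops below a stored baseline. This is a minimal illustration, not a real tool's API: the `Coverage` shape and the `checkRatchet` and `nextBaseline` names are assumptions introduced here, and a real setup would read the numbers from the coverage reporter's output.
+
+```typescript
+// Coverage percentages (0-100) per metric; the field names are illustrative.
+interface Coverage {
+  lines: number;
+  branches: number;
+}
+
+// Returns one message per metric that regressed below the stored baseline.
+function checkRatchet(baseline: Coverage, current: Coverage): string[] {
+  const failures: string[] = [];
+  for (const metric of ["lines", "branches"] as const) {
+    if (current[metric] < baseline[metric]) {
+      failures.push(
+        `${metric}: ${current[metric]}% is below baseline ${baseline[metric]}%`,
+      );
+    }
+  }
+  return failures;
+}
+
+// After a passing run, the baseline only ever moves up, never down.
+function nextBaseline(baseline: Coverage, current: Coverage): Coverage {
+  return {
+    lines: Math.max(baseline.lines, current.lines),
+    branches: Math.max(baseline.branches, current.branches),
+  };
+}
+```
+
+The point of the design is that no absolute target is imposed: a legacy suite at 40% passes as long as it does not regress, and each improvement becomes the new floor.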
+
+## Philosophy
+
+A test suite is a safety net, not a trophy case. Its job is to catch regressions fast enough that the team trusts it and runs it constantly. Speed, reliability, and signal-to-noise ratio matter more than total test count or coverage percentage. A slow, flaky suite that developers skip in CI is worse than no suite: it's actively misleading.
+
+## NEVER
+
+- **NEVER use `sleep()` in tests** — it sets a minimum wait that will still race on slow CI, while adding that minimum latency to every run forever. Use `waitFor`, polling assertions (`expect.poll`), or event-driven synchronization (`waitForSelector`, `waitForResponse`).
+- **NEVER assert on implementation details (function was called with X) instead of observable behavior (output is Y)** — implementation assertions break on every valid refactor, training developers to treat red tests as expected noise. Mock-verification tests confirm process, not outcomes.
+- **NEVER share mutable state between tests without explicit reset** — the test that fails on CI but passes locally most often has one cause: run ordering. Global state initialized once becomes an implicit ordering dependency that the test runner will eventually expose.
+- **NEVER use a mock when a working fake is available** — mocks couple test assertions to call signatures; fakes exercise real behavior through the real interface. A fake breaks loudly when behavior changes; a mock passes silently while the production code diverges.
+- **NEVER write a test that cannot fail** — a `describe` block wrapping an empty `it` block, or a `try/catch` around `expect(...)` that swallows the assertion error, both produce permanently-green tests. They consume maintenance time and generate false confidence.
+- **NEVER use real network calls in unit or integration tests** — latency makes the suite slow; network unavailability makes it flaky; third-party rate limits make it unreliable in burst CI. Seal all HTTP with MSW (browser/Node), Nock (Node), or WireMock (JVM/polyglot).
+- **NEVER treat 80% line coverage as a goal** — it's a floor. A suite at 80% line coverage with no branch coverage and no assertions on edge cases is theater. Mutation testing is the most direct way to distinguish covered-and-asserting from covered-and-watching.
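+
+As an illustration of the first rule, here is a minimal sketch of a polling wait that replaces a fixed `sleep()`. The `waitUntil` helper and the `startJob` stand-in are hypothetical names introduced for this example; in a real suite, prefer the framework's own primitives (such as `expect.poll`) where they exist.
+
+```typescript
+// Hypothetical helper: poll a condition until it holds or a deadline passes.
+async function waitUntil(
+  condition: () => boolean | Promise<boolean>,
+  timeoutMs = 2000,
+  intervalMs = 10,
+): Promise<void> {
+  const deadline = Date.now() + timeoutMs;
+  for (;;) {
+    if (await condition()) return; // ready: stop waiting immediately
+    if (Date.now() >= deadline) {
+      throw new Error(`condition not met within ${timeoutMs}ms`);
+    }
+    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
+  }
+}
+
+// Hypothetical system under test: becomes ready after an unpredictable delay.
+function startJob(delayMs: number): { done: boolean } {
+  const job = { done: false };
+  setTimeout(() => {
+    job.done = true;
+  }, delayMs);
+  return job;
+}
+```
+
+Unlike a fixed sleep, the wait returns as soon as the condition holds, and a slow CI machine gets the full timeout instead of racing a hard-coded delay.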
+
+## When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Test passes locally, fails in CI | Environment drift or timing dependency | Set `CI=true` locally to reproduce; eliminate `sleep`; audit shared state reset |
+| Test passes alone, fails in suite | Mutable global state leaked between tests | Add `beforeEach` reset; grep for module-level mutables; run in random order to surface more |
+| Refactor breaks 30 tests that should still pass | Tests assert on implementation (mocks verifying call args) | Replace mock-verification tests with behavior assertions; rewrite to test outputs, not calls |
+| Coverage drops after adding features | New branches not covered; not a problem if thresholds are branch-based | Add boundary tests for new conditionals; do not chase line coverage alone |
+| E2E suite takes 45+ minutes | Too many E2E tests covering what integration tests should handle | Audit which E2E tests duplicate integration coverage; move them down the pyramid |
+| Flaky test fixed with a longer sleep | The race condition was treated, not cured | Find the event or state change that indicates readiness and wait on that explicitly |
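+
+The "passes alone, fails in suite" row can be reproduced in a few lines. The sketch below is illustrative only: `cache`, `resetState`, and `runAll` are hypothetical names, and `runAll` is a hand-rolled stand-in for a test runner's `beforeEach` hook rather than any real runner's API.
+
+```typescript
+// Module-level mutable state: the usual source of order-dependent failures.
+let cache: Record<string, number> = {};
+
+function resetState(): void {
+  cache = {};
+}
+
+function increment(key: string): number {
+  const next = (cache[key] ?? 0) + 1;
+  cache[key] = next;
+  return next;
+}
+
+type TestCase = { name: string; run: () => boolean };
+
+// Both cases assume a fresh cache; without a reset, whichever runs second fails.
+const cases: TestCase[] = [
+  { name: "first increment returns 1", run: () => increment("hits") === 1 },
+  {
+    name: "two increments return 2",
+    run: () => {
+      increment("hits");
+      return increment("hits") === 2;
+    },
+  },
+];
+
+// Minimal stand-in for a beforeEach hook: optionally reset, then run each case.
+function runAll(order: TestCase[], withReset: boolean): string[] {
+  resetState(); // start each suite run from a clean slate
+  const failed: string[] = [];
+  for (const testCase of order) {
+    if (withReset) resetState();
+    if (!testCase.run()) failed.push(testCase.name);
+  }
+  return failed;
+}
+```
+
+Running the cases in both orders with and without the reset shows the pattern: with the reset, order does not matter; without it, the failure moves to whichever case runs second.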
+
+## Test Double Taxonomy
+
+Practitioners use these terms interchangeably and incorrectly. Precision matters because the choice changes what the test proves:
+
+| Double | Definition | Tests | Breaks When |
+|--------|-----------|-------|-------------|
+| **Stub** | Returns canned data; no call verification | Code handles specific return values | Canned data drifts from real API response shape |
+| **Mock** | Verifies specific calls were made | A specific interaction occurred | Refactor changes the call signature (even correctly) |
+| **Fake** | Working implementation with shortcuts (in-memory DB) | Real behavior through real interface | Fake diverges from real implementation semantics |
+| **Spy** | Records calls without blocking them | Side effects occurred while real logic ran | The real call has unacceptable side effects in test |
+
+**Decision rule**: Use fakes by default. Use stubs when you control the return value shape and it won't drift. Use spies when you need to verify side effects without blocking production logic. Use mocks only when you have no other option and accept that the test is coupled to implementation.
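+
+The taxonomy above can be illustrated with hand-rolled doubles. This is a sketch, not a recommendation of any mocking library: the `UserRepo` interface, `renameUser`, and every helper name are hypothetical. A mock would add a built-in expectation on the recorded calls; the spy's `calls` array shows what such an expectation would inspect.
+
+```typescript
+// Hypothetical port used only for illustration.
+interface UserRepo {
+  findName(id: number): string | undefined;
+  save(id: number, name: string): void;
+}
+
+// Unit under test: renames an existing user and reports whether it succeeded.
+function renameUser(repo: UserRepo, id: number, name: string): boolean {
+  if (repo.findName(id) === undefined) return false;
+  repo.save(id, name.trim());
+  return true;
+}
+
+// Stub: canned answers, no call verification.
+const stubRepo: UserRepo = { findName: () => "Alice", save: () => {} };
+
+// Fake: a working in-memory implementation of the same interface.
+class InMemoryUserRepo implements UserRepo {
+  private rows = new Map<number, string>();
+  findName(id: number): string | undefined { return this.rows.get(id); }
+  save(id: number, name: string): void { this.rows.set(id, name); }
+}
+
+// Spy: wraps a real implementation and records calls while the real logic runs.
+function spyOnSave(inner: UserRepo) {
+  const calls: Array<[number, string]> = [];
+  const repo: UserRepo = {
+    findName: (id) => inner.findName(id),
+    save: (id, name) => { calls.push([id, name]); inner.save(id, name); },
+  };
+  return { repo, calls };
+}
+
+// Behavior assertion through the fake: checks the stored result, not how it got there.
+const fake = new InMemoryUserRepo();
+fake.save(1, "Alice");
+const renamed = renameUser(fake, 1, "  Bob ");
+const storedName = fake.findName(1);
+
+// Interaction check through the spy: coupled to the shape of the save call.
+const inner = new InMemoryUserRepo();
+inner.save(2, "Carol");
+const spy = spyOnSave(inner);
+renameUser(spy.repo, 2, "Dana");
+```
+
+The fake-based check keeps passing if `renameUser` is refactored to call the repository differently; the spy-based check does not, which is the coupling the decision rule warns about.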
+
+## Pyramid vs. Trophy
+
+The test pyramid (many unit, fewer integration, few E2E) is the right default when:
+- The system is composed of pure functions and well-scoped units
+- Integration points are thin and well-defined (a single API client, a single DB adapter)
+- Unit tests run in milliseconds and give fast feedback loops
+
+The test trophy (fewer unit, more integration, some E2E) is the right choice when:
+- The system is CRUD-heavy and the "business logic" is mostly data transformation through layers
+- Integration failures are the most common bug category (mismatched schemas, wrong query, missing join)
+- The team has historically seen unit-passing/integration-failing bugs
+
+**Diagnostic**: Look at your last 10 production bugs. Were they unit-level logic errors or integration/contract failures? The answer tells you which shape your suite should be.
+
+See `references/pyramid-vs-trophy.md` for worked examples.
+
+## TDD Decision Framework
+
+TDD accelerates when:
+- The interface is known before you start (function signature is defined by a contract)
+- The behavior is pure and well-scoped (no UI, no external I/O in the unit)
+- You need to drive the design (writing the test first reveals bad API surface before implementation locks it in)
+
+TDD fights you when:
+- The interface is unknown and exploratory (you're figuring out what the function should look like)
+- The unit under test is UI-heavy (the test framework overhead exceeds the feedback value)
+- The code is data-pipeline heavy and test data setup dominates test writing time
+- You're working in a system where the seams don't exist yet and you'd spend 80% of the time creating test infrastructure
+
+**Practical rule**: TDD for business logic and domain functions. Spike-then-test for UI, data ingestion, and exploratory integrations. Retrofit tests on spikes before merging.
+
+## Playwright vs. Cypress
+
+| Criterion | Playwright | Cypress |
+|-----------|-----------|---------|
+| Multi-browser (Chromium, Firefox, WebKit) | Native | Chromium-family, Electron, and Firefox; WebKit support is experimental |
+| Multi-tab / multi-window testing | Native | Not supported |
+| API mocking (intercept, mock route) | `route.fulfill()` — first-class | `cy.intercept()` — good but browser-scoped |
+| CI performance | Parallel sharding out of the box | Requires Cypress Cloud for parallelism |
+| Interactive debugging DX | Trace Viewer (after the fact) | Time-travel with real-time UI |
+| Network request inspection | Built-in HAR capture | DevTools integration |
+| Component testing | Available (experimental) | Mature component testing mode |
+
+**Decision rule**: Choose Playwright when multi-browser, multi-tab, or strong API mocking is required, or when CI parallelism is important at scale. Choose Cypress when the team needs interactive debugging in Chrome-only projects and component testing maturity matters.
+
+See `references/e2e-framework-patterns.md` for CI configuration patterns for both.
+
+## Boundary-Value Pattern
+
+Bugs live at boundaries, not in the middle of ranges. For every conditional in business logic:
+
+```
+If condition is: value >= minimum
+Test:            minimum - 1 (should fail)
+                 minimum     (should pass — the exact boundary)
+                 minimum + 1 (should pass)
+
+If condition is: array.length > 0
+Test:            empty array (0 items)
+                 single item (1 item)
+                 two items (behavior stable)
+```
+
+Common missed boundaries: empty string vs. whitespace-only string, null vs. undefined vs. missing key, zero vs. negative, max integer overflow, end of month/year in date logic, timezone boundaries at midnight.
+
+See `references/boundary-testing-patterns.md` for a full catalog by data type.
+
+## Coverage Policy
+
+- **Line coverage 80%**: minimum floor; below this, you have blind spots large enough to hide feature-scale bugs
+- **Branch coverage**: more meaningful than line; a line with three branches at 100% line coverage but 33% branch coverage is a test gap
+- **Mutation testing** (Stryker, PITest): the only tool that distinguishes "covered" from "asserting"; run on critical modules; a mutation score below 70% means your tests are watching, not verifying
+- **Ratchet pattern for legacy code**: set threshold to current value; enforce "never go below current"; add tests incrementally rather than chasing an arbitrary target in one sprint
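+
+The ratchet pattern can be sketched as a small check that a CI step might run after coverage is collected. This is an illustration under assumptions: the `Coverage` shape and the `ratchet` function are hypothetical, and reading real coverage reports is left out.
+
+```typescript
+// Hypothetical ratchet check: coverage may rise or hold, never fall below the recorded baseline.
+interface Coverage { lines: number; branches: number } // percentages, 0-100
+
+function ratchet(baseline: Coverage, current: Coverage): {
+  ok: boolean;
+  failures: string[];
+  nextBaseline: Coverage;
+} {
+  const failures: string[] = [];
+  for (const key of ["lines", "branches"] as const) {
+    if (current[key] < baseline[key]) {
+      failures.push(`${key}: ${current[key]}% is below baseline ${baseline[key]}%`);
+    }
+  }
+  // Ratchet up: the next baseline keeps the better value per metric.
+  const nextBaseline: Coverage = {
+    lines: Math.max(baseline.lines, current.lines),
+    branches: Math.max(baseline.branches, current.branches),
+  };
+  return { ok: failures.length === 0, failures, nextBaseline };
+}
+
+const improved = ratchet({ lines: 62, branches: 48 }, { lines: 63.5, branches: 48 });
+const regressed = ratchet({ lines: 62, branches: 48 }, { lines: 64, branches: 47 });
+```
+
+Here `improved.ok` is true and the baseline for lines moves up to 63.5, while `regressed.ok` is false because branch coverage fell even though line coverage rose.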
diff --git a/skills/testing-strategy/references/boundary-testing-patterns.md b/skills/testing-strategy/references/boundary-testing-patterns.md
new file mode 100644
index 0000000..a085152
--- /dev/null
+++ b/skills/testing-strategy/references/boundary-testing-patterns.md
@@ -0,0 +1,89 @@
+# Boundary-Value Testing Patterns by Data Type
+
+Bugs live at boundaries. This is not a heuristic — it is a documented empirical finding across software testing research going back to Myers (1979). For every conditional in business logic, test n-1, n, and n+1 around every threshold.
+
+## Numeric Boundaries
+
+| Scenario | Test Values |
+|----------|------------|
+| `age >= 18` | 17, 18, 19 |
+| `quantity > 0` | -1, 0, 1 |
+| `amount <= 1000` | 999, 1000, 1001 |
+| `0 <= score <= 100` | -1, 0, 1, 99, 100, 101 |
+| Integer overflow | MAX_INT - 1, MAX_INT, MAX_INT + 1 |
+| Float precision | 0.1 + 0.2 (not 0.3), values near epsilon |
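+
+The rows above translate directly into table-driven cases. The sketch below assumes two hypothetical rules, `isAdult` and `isValidScore`, and uses plain functions rather than any particular test runner.
+
+```typescript
+// Hypothetical rules under test.
+const isAdult = (age: number): boolean => age >= 18;
+const isValidScore = (score: number): boolean => score >= 0 && score <= 100;
+
+// Each case is [input, expected]; the values are n-1, n, n+1 around every threshold.
+const adultCases: Array<[number, boolean]> = [[17, false], [18, true], [19, true]];
+const scoreCases: Array<[number, boolean]> = [
+  [-1, false], [0, true], [1, true],
+  [99, true], [100, true], [101, false],
+];
+
+// Returns the inputs whose actual result differs from the expected one.
+function failingInputs(rule: (n: number) => boolean, cases: Array<[number, boolean]>): number[] {
+  return cases.filter(([input, expected]) => rule(input) !== expected).map(([input]) => input);
+}
+
+// Integer overflow for JavaScript numbers: precision is lost past MAX_SAFE_INTEGER.
+const overflowCollides = Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2; // true
+
+// Float precision: compare with a tolerance rather than ===.
+const naiveEqual = 0.1 + 0.2 === 0.3;                             // false
+const tolerantEqual = Math.abs(0.1 + 0.2 - 0.3) < Number.EPSILON; // true
+```
+
+If someone changes `>=` to `>` in `isAdult`, `failingInputs(isAdult, adultCases)` returns `[18]`, which is exactly the off-by-one the boundary row exists to catch.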
+
+## String Boundaries
+
+| Scenario | Test Values |
+|----------|------------|
+| Required field | empty string `""`, whitespace-only `"   "`, one char `"a"` |
+| Max length (e.g., 255) | 254 chars, 255 chars, 256 chars |
+| Email format | `a@b.c` (minimal valid), no `@`, double `@`, trailing dot |
+| URL | `http://`, `https://`, missing protocol, localhost |
+| Encoding | ASCII-safe string, Unicode (emoji, CJK), null byte `\0`, SQL injection payload |
+
+## Collection/Array Boundaries
+
+| Scenario | Test Values |
+|----------|------------|
+| Non-empty check | `[]`, `[item]`, `[item, item]` |
+| Pagination | page 0, page 1, last page, page beyond last |
+| First/last element access | index 0, index -1, index length-1, index length |
+| Deduplication logic | all unique, all same, one duplicate |
+
+## Date and Time Boundaries
+
+These are the most commonly missed boundaries in practice:
+
+| Scenario | Test Values |
+|----------|------------|
+| End of month | Feb 28 (non-leap), Feb 28 (leap), Feb 29, Mar 1 |
+| End of year | Dec 31, Jan 1 of next year |
+| Timezone at midnight | 23:59:59 UTC is next day in UTC+1 |
+| Daylight saving time | 1:59 AM → 3:00 AM (spring forward); the 1:00–1:59 AM hour occurs twice (fall back), under US rules |
+| Epoch | Unix timestamp 0 (Jan 1 1970), negative timestamps |
+| Leap second | Not usually testable but worth noting for financial systems |
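+
+Several of these rows can be checked with the built-in `Date` alone. The sketch below covers month, year, fixed-offset, and epoch boundaries; the helper names are hypothetical, and DST cases are left out because they depend on the time zone database of the runtime.
+
+```typescript
+// Day after a given UTC calendar date (month is 1-based here).
+function nextDayUtc(year: number, month: number, day: number): string {
+  // Date.UTC normalizes an overflowed day into the following month or year.
+  const d = new Date(Date.UTC(year, month - 1, day + 1));
+  return d.toISOString().slice(0, 10);
+}
+
+// Calendar date of a UTC instant as seen at a fixed offset (hours east of UTC).
+function dateAtOffset(isoUtc: string, offsetHours: number): string {
+  const shifted = new Date(Date.parse(isoUtc) + offsetHours * 3600000);
+  return shifted.toISOString().slice(0, 10);
+}
+
+const boundaries = {
+  feb28NonLeap: nextDayUtc(2023, 2, 28),                    // "2023-03-01"
+  feb28Leap: nextDayUtc(2024, 2, 28),                       // "2024-02-29"
+  feb29Leap: nextDayUtc(2024, 2, 29),                       // "2024-03-01"
+  endOfYear: nextDayUtc(2024, 12, 31),                      // "2025-01-01"
+  midnightUtcPlus1: dateAtOffset("2024-03-31T23:59:59Z", 1), // "2024-04-01"
+  epoch: new Date(0).toISOString(),                         // "1970-01-01T00:00:00.000Z"
+  beforeEpoch: new Date(-1).toISOString(),                  // "1969-12-31T23:59:59.999Z"
+};
+```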
+
+## Null / Undefined / Missing
+
+JavaScript/TypeScript specific — the most common source of runtime errors:
+
+| Value | Distinct from |
+|-------|--------------|
+| `null` | `undefined` (explicitly set to no-value vs. never set) |
+| `undefined` | missing key entirely (`obj.key === undefined` vs. `'key' in obj === false`) |
+| `""` | `null` (empty string is a value; null is absence of value) |
+| `0` | `false`, `null`, `undefined` (all falsy; only some are "no value") |
+| `NaN` | every value, including itself (`NaN !== NaN` under IEEE 754); produced by invalid numeric operations |
+
+**Pattern**: any function that accepts an optional parameter must be tested with the parameter absent, null, undefined, and an empty-equivalent value. Do not assume they're the same.
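+
+A short sketch of that pattern, assuming a hypothetical `displayLabel` function that deliberately gives each "empty" input its own result so a test can tell them apart:
+
+```typescript
+// Hypothetical options object with an optional, nullable field.
+interface Options { label?: string | null | undefined }
+
+function displayLabel(opts: Options): string {
+  if (!("label" in opts)) return "(not provided)"; // key missing entirely
+  if (opts.label === undefined) return "(unset)";  // key present, value undefined
+  if (opts.label === null) return "(cleared)";     // explicitly no value
+  if (opts.label.trim() === "") return "(blank)";  // empty or whitespace-only
+  return opts.label;
+}
+
+// One case per distinct "no value" flavor, plus a normal value.
+const labelResults = [
+  displayLabel({}),
+  displayLabel({ label: undefined }),
+  displayLabel({ label: null }),
+  displayLabel({ label: "" }),
+  displayLabel({ label: "   " }),
+  displayLabel({ label: "Alice" }),
+];
+```
+
+Whether a real function should distinguish all of these is a design decision; the point of the test set is that the decision is made explicitly for each input.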
+
+## Boolean Boundary Anti-Patterns
+
+Tests that only cover `true` and `false` miss:
+- Default value when not set
+- What happens when the flag is toggled mid-operation
+- What happens when two boolean flags interact (4 combinations, not 2)
+
+For two independent flags A and B, test all four: `(F,F), (T,F), (F,T), (T,T)`.
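+
+Enumerating the combinations in a loop avoids hand-picking a subset. The rule below, `canPublish`, is a hypothetical example of two interacting flags.
+
+```typescript
+// Hypothetical rule with two interacting flags.
+function canPublish(isOwner: boolean, isLocked: boolean): boolean {
+  return isOwner && !isLocked;
+}
+
+type FlagRow = { isOwner: boolean; isLocked: boolean; allowed: boolean };
+
+// Build the full truth table in the order (F,F), (T,F), (F,T), (T,T).
+const truthTable: FlagRow[] = [];
+for (const isLocked of [false, true]) {
+  for (const isOwner of [false, true]) {
+    truthTable.push({ isOwner, isLocked, allowed: canPublish(isOwner, isLocked) });
+  }
+}
+
+// Only the owner of an unlocked item may publish.
+const allowedRows = truthTable.filter((row) => row.allowed);
+```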
+
+## State Machine Boundaries
+
+For any enum or status field:
+- Every valid transition (A → B when allowed)
+- Every invalid transition (A → C when not allowed — what is the error behavior?)
+- Boundary states: initial state, terminal state, error state
+- Duplicate transition (A → A — idempotent or error?)
+- Missing state (what if the value is an unexpected enum variant from a future API version?)
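+
+The checklist above maps onto a transition table plus one test per bullet. This sketch assumes a hypothetical four-state workflow and treats invalid, duplicate, and unknown-state transitions as errors; a real system might choose idempotent duplicates instead.
+
+```typescript
+type Status = "draft" | "submitted" | "approved" | "rejected";
+
+// Hypothetical transition table: each state lists the states it may move to.
+const allowedTransitions: Record<Status, Status[]> = {
+  draft: ["submitted"],
+  submitted: ["approved", "rejected"],
+  approved: [], // terminal state
+  rejected: ["draft"],
+};
+
+// Guards against values outside the enum, such as a variant from a newer API version.
+function isStatus(value: string): value is Status {
+  return Object.prototype.hasOwnProperty.call(allowedTransitions, value);
+}
+
+function transition(from: string, to: string): Status {
+  if (!isStatus(from)) throw new Error(`unknown state: ${from}`);
+  if (!isStatus(to) || allowedTransitions[from].indexOf(to) === -1) {
+    throw new Error(`invalid transition: ${from} -> ${to}`);
+  }
+  return to;
+}
+
+// Small helper so each boundary can be checked without a test framework.
+function throwsError(fn: () => unknown): boolean {
+  try { fn(); return false; } catch (e) { return true; }
+}
+
+const validMove = transition("draft", "submitted");                         // valid transition
+const invalidMove = throwsError(() => transition("draft", "approved"));     // skips a state
+const duplicateMove = throwsError(() => transition("draft", "draft"));      // A -> A
+const fromTerminal = throwsError(() => transition("approved", "draft"));    // leaving a terminal state
+const unknownState = throwsError(() => transition("archived", "draft"));    // unexpected variant
+```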
+
+## Practical Workflow
+
+1. For each function with a conditional, list every threshold value.
+2. Write tests for n-1, n, n+1 around each threshold.
+3. For string inputs: always include empty, whitespace-only, and max-length.
+4. For collections: always include empty, single-element, and two-element.
+5. For dates: always include end-of-month, end-of-year, DST transition if timezone-aware.
+6. For nullable values: always include null, undefined, and empty-equivalent separately.
+
+The test count grows linearly with boundaries. The bug-catch rate grows much faster.
diff --git a/skills/testing-strategy/references/e2e-framework-patterns.md b/skills/testing-strategy/references/e2e-framework-patterns.md
new file mode 100644
index 0000000..4d35d9b
--- /dev/null
+++ b/skills/testing-strategy/references/e2e-framework-patterns.md
@@ -0,0 +1,133 @@
+# E2E Framework Patterns: Playwright and Cypress
+
+## Playwright: CI Configuration
+
+### Parallel sharding (recommended for suites > 5 min)
+
+```yaml
+# .github/workflows/playwright.yml
+strategy:
+  matrix:
+    shard: [1/4, 2/4, 3/4, 4/4]
+steps:
+  - run: npx playwright test --shard=${{ matrix.shard }}
+```
+
+### API mocking with route.fulfill
+
+```typescript
+test('shows error on API failure', async ({ page }) => {
+  await page.route('**/api/users', route =>
+    route.fulfill({ status: 503, body: 'Service Unavailable' })
+  );
+  await page.goto('/users');
+  await expect(page.getByText('Unable to load users')).toBeVisible();
+});
+```
+
+### Multi-tab pattern
+
+```typescript
+test('opens in new tab and syncs state', async ({ context }) => {
+  const page1 = await context.newPage();
+  await page1.goto('/dashboard');
+  // Start listening for the new tab before the click that opens it
+  const newPagePromise = context.waitForEvent('page');
+  await page1.click('[data-testid="open-in-new-tab"]');
+  const page2 = await newPagePromise;
+  // Both pages share the same browser context (cookies, localStorage)
+  await expect(page2.getByText('Dashboard')).toBeVisible();
+});
+```
+
+### waitForSelector vs. sleep
+
+```typescript
+// WRONG — will race on slow CI
+await page.click('#submit');
+await page.waitForTimeout(2000); // sleep
+await expect(page.getByText('Success')).toBeVisible();
+
+// RIGHT — deterministic
+await page.click('#submit');
+await page.waitForSelector('[data-testid="success-message"]');
+// or: await expect(page.getByText('Success')).toBeVisible(); // has built-in retry
+```
+
+---
+
+## Cypress: CI Configuration
+
+### Parallelism via Cypress Cloud
+
+```yaml
+# .github/workflows/cypress.yml
+- name: Cypress run
+  uses: cypress-io/github-action@v6
+  with:
+    record: true
+    parallel: true
+    group: 'CI'
+  env:
+    CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
+```
+
+### API interception
+
+```javascript
+cy.intercept('GET', '/api/users', { fixture: 'users.json' }).as('getUsers');
+cy.visit('/users');
+cy.wait('@getUsers');
+// Assumes users.json holds three users, each rendered as a direct child of the list
+cy.get('[data-testid="user-list"]').children().should('have.length', 3);
+```
+
+### Avoiding sleep in Cypress
+
+```javascript
+// WRONG
+cy.get('#submit').click();
+cy.wait(3000); // arbitrary sleep
+cy.get('.success').should('be.visible');
+
+// RIGHT
+cy.get('#submit').click();
+cy.get('.success', { timeout: 10000 }).should('be.visible');
+// Cypress retries the assertion; no sleep needed
+```
+
+---
+
+## Choosing Between Them: Non-Obvious Factors
+
+**Playwright wins for**:
+- Multi-browser matrix (Firefox, WebKit) required by regulatory or accessibility requirement
+- Testing browser extensions or chrome:// pages
+- Testing flows that open multiple tabs (OAuth popups, PDF viewers, print dialogs)
+- Systems where network traces (HAR) are needed for debugging
+- Teams whose CI volume would exceed the Cypress Cloud free tier (Cypress's built-in parallelism runs through Cypress Cloud)
+
+**Cypress wins for**:
+- Teams that need interactive time-travel debugging during test authoring (the Cypress UI is genuinely superior for this)
+- Projects with existing Cypress investment and Chrome-only requirement
+- React/Vue component testing when Cypress's component testing runner is already in use
+- Teams that find Playwright's async/await model more cognitive overhead than they want
+
+**Never decide based on**: syntax preference, GitHub stars, "everyone uses X." Decide based on which framework's failure modes match your team's acceptable costs.
+
+---
+
+## MSW (Mock Service Worker) for Shared Mocking
+
+MSW works in both browser and Node.js, making it the right tool for sharing mock definitions between unit, integration, and E2E layers:
+
+```typescript
+// handlers.ts: shared across all test layers
+import { http, HttpResponse } from 'msw';
+
+export const handlers = [
+  http.get('/api/users', () =>
+    HttpResponse.json([{ id: 1, name: 'Alice' }])
+  ),
+];
+
+// unit/integration: setupServer from msw/node
+// browser/E2E: setupWorker from msw/browser
+```
+
+This pattern eliminates the drift problem where unit test mocks and E2E mocks describe different contracts for the same endpoint.
diff --git a/skills/testing-strategy/references/pyramid-vs-trophy.md b/skills/testing-strategy/references/pyramid-vs-trophy.md
new file mode 100644
index 0000000..40d3197
--- /dev/null
+++ b/skills/testing-strategy/references/pyramid-vs-trophy.md
@@ -0,0 +1,55 @@
+# Pyramid vs. Trophy: Worked Examples
+
+## When the Pyramid Wins
+
+**Scenario**: A tax calculation engine with 40 business rules.
+
+The business logic is pure: inputs are financial values, outputs are computed amounts. Each rule is testable in isolation. Integration is thin: one DB read at start, one write at end. Running 400 unit tests takes 2 seconds.
+
+Pyramid shape: 400 unit tests, 10 integration tests (DB round-trip), 3 E2E (critical user paths).
+
+The unit tests catch: wrong formula, wrong rounding mode, wrong bracket boundary, missing deduction.
+The integration tests catch: wrong ORM query, missing transaction, schema mismatch.
+The E2E tests catch: page crashes, form submission failure, result display error.
+
+**Signal that pyramid is right**: look at recent bugs — they were arithmetic errors in the logic layer, not integration failures.
+
+---
+
+## When the Trophy Wins
+
+**Scenario**: A CRUD admin panel — users, roles, organizations, permission inheritance.
+
+The "business logic" is: read from DB, transform slightly, write to DB. Unit testing the transformation layer tests almost nothing useful. The real failure modes are: wrong JOIN producing incorrect permission inheritance, missing WHERE clause exposing wrong tenant's data, ORM eager-loading N+1 that works in tests but crashes production under load.
+
+Trophy shape: 20 unit tests (pure transformation utilities), 150 integration tests (real DB in Docker, full request-to-response), 10 E2E (auth flow, permission boundary, multi-tenant isolation).
+
+The integration tests catch: wrong query, missing RLS policy, incorrect cascade behavior.
+The unit tests would have caught: none of the real bugs.
+
+**Signal that trophy is right**: every recent production bug was an integration failure; your unit tests have never caught a bug in production.
+
+---
+
+## The Diagnostic Test
+
+Before choosing pyramid or trophy for a new system, run this exercise:
+
+1. List the last 10 bugs that reached production.
+2. Classify each: "Would a unit test have caught this?" vs. "Would an integration test have caught this?"
+3. The ratio tells you your bug distribution. Match your test distribution to your bug distribution.
+
+If 8/10 were integration bugs and you build a unit-heavy pyramid, 80% of your test investment covers 20% of your risk.
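+
+The exercise is simple enough to do on paper, but a sketch makes the arithmetic explicit. The bug list below is invented for illustration and mirrors the 8/10 example; `Layer` and `distribution` are hypothetical names.
+
+```typescript
+type Layer = "unit" | "integration" | "e2e";
+
+// Hypothetical classification of the last 10 production bugs by the layer that would have caught each.
+const lastTenBugs: Layer[] = [
+  "integration", "integration", "unit", "integration", "integration",
+  "integration", "unit", "integration", "integration", "integration",
+];
+
+// Share of bugs per layer, as fractions of the total.
+function distribution(bugs: Layer[]): Record<Layer, number> {
+  const counts: Record<Layer, number> = { unit: 0, integration: 0, e2e: 0 };
+  for (const layer of bugs) counts[layer] += 1;
+  const total = bugs.length || 1; // avoid dividing by zero on an empty list
+  return {
+    unit: counts.unit / total,
+    integration: counts.integration / total,
+    e2e: counts.e2e / total,
+  };
+}
+
+const bugShare = distribution(lastTenBugs); // { unit: 0.2, integration: 0.8, e2e: 0 }
+```
+
+Comparing `bugShare` with the share of tests at each layer shows where the suite is over- or under-invested.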
+
+---
+
+## The Hybrid Trap
+
+Teams often build a hybrid without realizing it: unit tests for everything, plus integration tests for everything, plus E2E for everything — triple coverage with triple maintenance. This happens when the pyramid is applied as a rule ("always write unit tests") rather than a tool.
+
+Signs of the hybrid trap:
+- A unit test and an integration test for the same behavior in the same commit
+- E2E tests that test the same flows as integration tests, just slower
+- Test suite takes 30+ minutes; nobody runs it locally
+
+Fix: audit which layer is actually catching bugs. Cut the layers that aren't. The rule is coverage of risk, not coverage of layers.
diff --git a/skills/workplace-conversations/SKILL.md b/skills/workplace-conversations/SKILL.md
new file mode 100644
index 0000000..c704555
--- /dev/null
+++ b/skills/workplace-conversations/SKILL.md
@@ -0,0 +1,264 @@
+---
+name: workplace-conversations
+description: Practitioner guide for preparing and navigating high-stakes workplace conversations — giving corrective or positive feedback, addressing underperformance, managing live conflict, upward feedback, peer disagreements, salary negotiations, and emotionally charged discussions. Trigger phrases: "giving feedback", "difficult feedback", "how do I talk to", "preparing for a hard conversation", "conflict with coworker", "talking to my manager about", "addressing performance", "someone is defensive", "hard 1:1", "confrontation", "underperformance", "performance conversation", "SBI", "coaching".
+allowed-tools: Read, Glob, Grep
+---
+
+# Workplace Conversations
+
+## Navigation — Choose Your Path
+
+Before anything else, route to the right track:
+
+```
+What is the primary challenge right now?
+
+A. I need to PREPARE and DELIVER feedback
+   (structured message, written prep, SBI, async or scheduled 1:1)
+   → Path A: Feedback Delivery
+
+B. I am IN or about to enter a LIVE difficult conversation
+   (real-time conflict, someone just escalated, amygdala hijack, Q&A under pressure)
+   → Path B: Live Conversation Navigation
+
+Not sure?
+  → Is there a specific behavior + impact you want to communicate?
+      YES → Path A
+      NO, it's a relationship/conflict/emotional situation → Path B
+
+Both paths lead to the same outcome: durable behavior change and an intact relationship.
+```
+
+---
+
+## Path A: Feedback Delivery
+
+### Mindset
+
+1. **Feedback is information transfer, not judgment delivery.** The moment the receiver perceives evaluation, their prefrontal cortex partially shuts down — defensiveness is physiological, not a character flaw. Frame as "I noticed / I want to understand" not "you did X wrong."
+
+2. **Timing destroys or amplifies impact.** Feedback given within 48 hours of an event lands 3–4x more effectively than delayed feedback. "Annual review surprises" are managerial malpractice — they signal you withheld useful information for months.
+
+3. **The goal of feedback is behavior change, not catharsis.** If you need to vent, do it elsewhere. The conversation exists to help them.
+
+4. **Pattern vs. instance matters more than severity.** A single bad code review matters less than 3 in a row. Leading with "I've noticed a pattern" signals seriousness without catastrophizing a one-time event.
+
+5. **Positive SBI is not trivial.** Most managers skip it or give generic praise. Specific positive SBI builds the safety account that makes corrective feedback land better later.
+
+### Decision Tree for Feedback Type
+
+```
+Is this corrective or positive feedback?
+  ├─ Corrective + first time → SBI + curiosity (explore root cause)
+  ├─ Corrective + recurring pattern → SBI + explicit stakes + plan
+  ├─ Corrective + HR-level severity → load references/delivery-scripts.md
+  └─ Positive → SBI + specific impact (skip the "but" — never pair positive/corrective)
+```
+
+### Core Technique: SBI (Extended Practitioner Version)
+
+The SBI formula: **Situation → Behavior → Impact**
+
+**The "I" in Impact must connect to receiver values, not just your frustration.** If you know they care about team perception, connect the impact to that. Generic impact ("it slowed us down") lands weaker than specific impact ("it meant Sarah had to redo 3 hours of work and now doubts whether to surface issues").
+
+**SBI+ for recurring patterns:**
+> Situation → Behavior → Impact → **Expectation** → **Stakes**
+
+Add: "Going forward, I need [specific behavior]. If this continues, [concrete consequence]." The stakes must be real and proportional — don't threaten what you won't follow through on.
+
+For worked examples across code reviews, meetings, and deliverables: load `references/feedback-sbi-model.md`
+
+### Timing Heuristics
+
+| Scenario | Optimal Timing | Why |
+|----------|---------------|-----|
+| In-meeting behavior | Within same day, privately | Memory sharp; behavior still vivid |
+| Code/work quality | Within 48 hours of delivery | Before they're deep in next task |
+| Interpersonal conflict | After 2–4 hours cooling, same day | Enough distance to avoid flood; close enough to be concrete |
+| Pattern (recurring issue) | Scheduled 1:1, not ad hoc | Signals importance, allows preparation |
+| Crisis/incident behavior | After incident closes, not during | During crisis, feedback = distraction |
+
+### Path A: When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Receiver goes silent or shuts down | Social threat response; they feel ambushed or shamed | Name it: "I notice you've gone quiet — I want to make sure this feels like a conversation, not a verdict. What's your reaction?" |
+| Receiver becomes defensive / attacks back | Behavior stated as interpretation, not observation; or they feel blindsided | Return to facts: "I want to make sure I'm working from what I actually observed. Can we back up to [specific event]?" |
+| Receiver agrees but nothing changes | Agreement was to end discomfort, not genuine buy-in | Explicit follow-through plan required: named actions, dates, check-in. "What specifically will you do differently by [date]?" |
+| You realize mid-conversation you lack specifics | Feedback was based on impressions, not evidence | Pause honestly: "I realize I should have more specific examples ready. Can we schedule 30 minutes tomorrow when I can come prepared?" |
+| Positive feedback lands flat | Generic praise, or receiver distrusts motives | Increase specificity: name exact action, exact effect, exact why it mattered. Vague praise is discounted; surgical praise lands. |
+
+### Path A Reference Loading Triggers
+
+Load `references/feedback-sbi-model.md` when:
+- User needs SBI examples across different contexts (code reviews, meetings, deliverables)
+- User is learning the framework and needs worked examples
+
+Load `references/delivery-scripts.md` when:
+- User needs exact opening lines or word-for-word phrasing
+- Scenario involves HR-sensitivity, upward feedback, or legal exposure
+- User is conflict-averse and needs scripted scaffolding to start
+
+Load `references/expectation-alignment.md` when:
+- Issue is stakeholder misalignment, scope creep, or "moving goalposts"
+- User needs to reset expectations without creating conflict
+
+---
+
+## Path B: Live Conversation Navigation
+
+### Mindset
+
+1. **The goal is shared understanding, not a verdict.** Most failed conversations happen because one party enters to win. Practitioners enter to learn something they don't know yet — even when the facts seem clear.
+
+2. **Timing is a power lever most people waste.** Raising an issue when the other person is stressed, public, or cornered guarantees defense. The right moment (private, unhurried, not post-incident) does half the work.
+
+3. **A prepared BATNA changes how you show up.** Knowing what you'll do if the conversation fails — not as a threat, but as a real alternative — eliminates desperation from your posture. The other person feels it even if you never say it.
+
+4. **Regression is normal.** Behavior changes after a hard conversation often reverse within 2–3 weeks. Schedule a check-in at week 3 precisely because of this, not because you distrust the other person.
+
+5. **Specificity is protection.** Vague feedback ("you need to be more professional") creates defensiveness because it leaves the other person to guess. Specificity removes the guess and removes the threat.
+
+### Triage — What Kind of Conversation Is This?
+
+```
+Is there a power differential (you are the manager)?
+  YES → Performance/termination track → load references/conversation-framework.md
+  NO  → Is this a pattern or a one-time incident?
+          PATTERN → Conflict/peer track → load references/delivery-scripts.md
+          ONE-TIME → Is it upward (to your manager)?
+                       YES → Upward feedback track → see Upward Feedback section below
+                       NO  → Sensitive topic track → load references/emotional-regulation.md
+```
+
+### The Three Phases
+
+**Phase 1: Preparation (40% of success)** — Most difficult conversations fail before they start.
+
+Key preparation steps:
+1. Separate facts from the story you're telling yourself
+2. Identify your contribution — what did you do (or not do) that contributed?
+3. Understand their perspective before you walk in
+4. Define your goal: outcome goal, relationship goal, identity goal
+5. Know your BATNA
+6. Choose time and place (private, unhurried, not Friday)
+
+For the full preparation worksheet: load `references/preparation-template.md`
+
+**Phase 2: Delivery** — Opening sets the tone for everything.
+
+Opening formula:
+```
+[Purpose] + [Intent] + [Invitation]
+"I'd like to discuss [topic]. My goal is to [understand/resolve/improve]. I'd like to hear your perspective too."
+```
+
+Conversation structure: **Share → Listen → Explore → Resolve**
+
+**Phase 3: Follow-up (Ensuring Change)**
+
+- Send written summary within 24 hours — but ask THEM to summarize what they're taking away first. Their words reveal whether there's real alignment.
+- Schedule a 3-week check-in at the close of the original conversation. Most regressions happen at week 2–3.
+- Close every conversation with a specific, owned action: who does what, by when, and how you'll both know it happened.
+
+### Amygdala Hijack Recovery
+
+When you or the other person is triggered (raised voice, shutdown, flooding):
+
+1. **Recognize it** — physical signs: heart rate up, breathing shallow, face flushing, thoughts racing
+2. **Name it without blame:** "I notice this is hitting hard for both of us."
+3. **Use the explicit pause:** "I want to make sure I respond thoughtfully. Can I have a moment?"
+4. **Body reset:** Unclench jaw, drop shoulders, unfold arms, plant feet, lean back slightly
+5. **Perspective shift:** What might be their positive intent? What am I missing?
+6. **Exit if needed:** "I want to give this the attention it deserves. Can we take a break and continue tomorrow?"
+
+Recovery time after full hijack: 20–30 minutes. Don't push through — a flooded conversation produces defensive agreements that don't stick.
+
+### Path B: When Things Go Wrong
+
+| Situation | Likely Cause | Recovery |
+|-----------|-------------|----------|
+| Other person shuts down or goes silent | Felt accused, not invited | Name it: "I notice you've gone quiet — I want to make sure this feels safe to discuss." Then pause. Don't fill the silence. |
+| Conversation spirals into unrelated grievances | No shared agenda set upfront | Interrupt cleanly: "I want to hear all of this. Can we agree to finish this topic first and schedule time for the rest?" |
+| Agreed actions don't happen after the conversation | Agreement was vague or socially coerced | Return with curiosity, not accusation. Ask what got in the way. Vague commitments usually mean the person didn't actually agree — they just ended the discomfort. |
+| Person becomes hostile or escalates | Conversation revealed a threat to their status/livelihood | De-escalate: "I can see this is hitting hard. I'm not here to threaten your position — I'm here because I want this to work." Stop talking. Let them respond. |
+| Your own emotions take over mid-conversation | Ambush response to something unexpected | Use the explicit pause. Practitioners normalize pausing; it signals self-awareness, not weakness. |
+
+### Upward Feedback (Special Case)
+
+Giving feedback to someone with power over you has a different risk profile. Three adjustments:
+
+1. **Frame as impact on your effectiveness, not a critique of their behavior** — "When X happens, I find it harder to do Y" is harder to dismiss than "you do X."
+2. **Request, don't demand** — end with a question, not a directive: "Would you be open to trying Z?" gives them an exit that isn't losing face.
+3. **Accept that they may not change** — your goal is to have said it clearly once, not to fix them. If nothing changes, that's data for your own decisions.
+
+### Conversation Types Quick Reference
+
+| Type | Primary Risk | Key Move |
+|------|-------------|----------|
+| Performance issue | They feel blindsided | Cite prior informal conversations; ask what they'd need to succeed |
+| Peer conflict | Both parties feel wronged | Separate fact from interpretation before the conversation, not during |
+| Termination | Emotional flooding | Short, clear, no hedging; deliver the decision in the first 60 seconds |
+| Salary/promotion | Negotiation feels adversarial | Anchor to market data + contribution, not personal need |
+| Upward feedback | Power retaliation risk | See Upward Feedback section above |
+
+### Path B Reference Loading Triggers
+
+Load `references/conversation-framework.md` when:
+- User needs full preparation workflow, fact/story separation exercise, contribution mapping
+- Manager-subordinate performance or termination track
+
+Load `references/preparation-template.md` when:
+- User needs a blank worksheet to fill out before the conversation
+
+Load `references/delivery-scripts.md` when:
+- User needs opening formulas by situation type, de-escalation phrases, closing scripts
+
+Load `references/emotional-regulation.md` when:
+- User needs pre-conversation regulation techniques
+- User is managing their own state during high-charge moments
+
+---
+
+## Shared NEVER Rules
+
+- **NEVER use the feedback sandwich (positive-negative-positive)** — it trains people to distrust positive feedback and miss the corrective message. Use pure SBI instead.
+
+- **NEVER give corrective feedback in public** — social threat activates the same neural circuits as physical threat. Public correction compounds shame and guarantees defensiveness.
+
+- **NEVER delay feedback to "find the right moment" indefinitely** — memory encoding degrades within 72 hours. Waiting for perfect conditions means the specific behavior is no longer vivid to either party.
+
+- **NEVER pair positive and corrective feedback in the same sentence with "but"** — "but" neurologically erases everything before it. Use separate conversations or at minimum separate paragraphs with a full stop.
+
+- **NEVER give feedback when you or the receiver are emotionally flooded** — cortisol and adrenaline impair complex cognition. Reschedule explicitly: "I want to have this conversation when we're both at our best — can we do 10am tomorrow?"
+
+- **NEVER interpret behavior aloud without evidence** — attributing motive ("you clearly don't care about quality") is the fundamental attribution error in action: it explains behavior by character and ignores circumstances. Stick to observable actions and their effects.
+
+- **NEVER skip the "impact" step of SBI** — behavior without impact sounds like nitpicking. Impact connects the behavior to something the receiver actually cares about.
+
+- **NEVER open with impact before the other person understands what behavior you're describing** — they'll spend the entire conversation defending their intention instead of hearing the impact. Behavior first, impact second, always.
+
+- **NEVER have a high-stakes conversation without a written BATNA (best alternative to a negotiated agreement)** — without one, your desperation leaks into your tone and the other person calibrates to it.
+
+- **NEVER follow up a difficult conversation only once** — single follow-ups let people wait you out. Set a 3-week check-in at the close of the original conversation.
+
+- **NEVER raise a performance issue for the first time in a formal review** — the employee experiences it as an ambush, the feedback becomes legally and relationally contaminated, and you've lost 6–12 months of intervention time.
+
+- **NEVER use "we need to talk" as a meeting invite subject** — it triggers 48–72 hours of anxiety that arrives at your conversation as pre-loaded defensiveness. Name the topic neutrally: "checking in on Q3 deliverables."
+
+- **NEVER conflate "they heard me" with "they agreed"** — acknowledgment is not commitment. Close every conversation with a specific, owned action.
+
+## Philosophy
+
+Feedback is an act of respect — it assumes the person can change and is worth investing in. Withholding honest feedback to avoid discomfort is not kindness; it's abandonment.
+
+The hardest conversations are hard because both parties have a legitimate version of events. Your job is not to correct their version but to build a third version together — one specific enough to produce durable action.
+
+Deliver what's true, specifically, without cruelty.
+
+## Do NOT Use This Skill When
+
+- Writing performance review prose (use a writing skill instead)
+- Running a termination process (terminations require HR/legal review; this skill covers your role in that process, not the process itself)
+- Therapy or mental health support (refer appropriately)
+- HR/legal issues where process must lead (loop in HR first)
diff --git a/skills/workplace-conversations/references/conversation-framework.md b/skills/workplace-conversations/references/conversation-framework.md
new file mode 100644
index 0000000..0dc5727
--- /dev/null
+++ b/skills/workplace-conversations/references/conversation-framework.md
@@ -0,0 +1,353 @@
+# Difficult Conversation Framework
+
+Complete guide to the three-phase framework for challenging workplace conversations.
+
+## The Three Phases in Depth
+
+### Phase 1: Preparation (40% of Success)
+
+Most difficult conversations fail before they start. Thorough preparation dramatically improves outcomes.
+
+#### Step 1: Clarify the Facts
+
+##### What happened? (Objective)
+
+Write down only observable facts:
+
+- What did you see or hear?
+- When and where did it happen?
+- Who was present?
+
+##### What story am I telling? (Subjective)
+
+Notice your interpretations:
+
+- What assumptions am I making about their intent?
+- What conclusions have I jumped to?
+- What emotions am I attaching to the facts?
+
+##### Exercise: Separate Facts from Interpretations
+
+| Fact (Observable) | Interpretation (Story) |
+| ----------------- | ---------------------- |
+| "John submitted the report 3 days late" | "John doesn't care about deadlines" |
+| "Sarah interrupted me twice in the meeting" | "Sarah doesn't respect my opinions" |
+| "Alex didn't respond to my email for a week" | "Alex is avoiding me" |
+
+#### Step 2: Identify Your Contribution
+
+Even in situations where you feel wronged, consider:
+
+- What did I do (or not do) that contributed?
+- Did I set clear expectations?
+- Did I provide necessary resources/support?
+- Have I modeled the behavior I expect?
+- Did I address early warning signs?
+
+This isn't about blame - it's about taking ownership of what you can control.
+
+#### Step 3: Understand Their Perspective
+
+**Questions to Consider:**
+
+- How might they see this situation differently?
+- What pressures or constraints are they facing?
+- What might be their positive intention (even if the impact was negative)?
+- What do they care about that I can acknowledge?
+- What would they say if I asked them to describe what happened?
+
+**Empathy Exercise:**
+
+Complete these sentences:
+
+- "If I were in their position, I might..."
+- "They might be feeling..."
+- "They might want..."
+- "They might be worried about..."
+
+#### Step 4: Define Your Goal
+
+**Three Goals to Clarify:**
+
+1. **Outcome Goal:** What specific change do I want?
+2. **Relationship Goal:** How do I want us to relate after?
+3. **Identity Goal:** Who do I want to be in this conversation?
+
+**Goal Framing:**
+
+| Ineffective Goal | Effective Goal |
+| ---------------- | -------------- |
+| "Make them apologize" | "Understand what happened and agree on prevention" |
+| "Prove I'm right" | "Share my perspective and understand theirs" |
+| "Get them to change" | "Explore how we can work together differently" |
+
+#### Step 5: Choose Time and Place
+
+**Timing:**
+
+- Not when either party is stressed, rushed, or emotional
+- Not right before/after high-stakes situations
+- Allow enough time (at least 30 minutes, up to 60 for complex issues)
+- Not on Fridays (no time to follow up)
+
+**Location:**
+
+- Private, neutral space
+- No power imbalances (not your office if you're the boss)
+- Away from distractions
+- Comfortable (seating arrangement, temperature)
+
+### Phase 2: Delivery (The Conversation)
+
+#### Opening (First 2 Minutes)
+
+The opening sets the tone for everything. Get it right.
+
+**Bad Openings:**
+
+- "We need to talk about your performance..."
+- "I've noticed some problems..."
+- "I've been meaning to tell you..."
+
+**Good Opening Formula:**
+
+```text
+1. State purpose: "I'd like to discuss [topic]"
+2. Share your intent: "My goal is to [understand/resolve/improve]"
+3. Invite partnership: "I'd like to hear your perspective too"
+```
+
+**Opening Templates:**
+
+For Performance:
+> "I'd like to talk about the project timeline. My goal is to understand what's happening and figure out how I can better support you. Can you share your perspective on how things are going?"
+
+For Conflict:
+> "I wanted to discuss what happened in yesterday's meeting. I'd like to understand your viewpoint and share mine so we can work together more effectively. Is now a good time?"
+
+For Feedback:
+> "I have some feedback about the presentation. My intention is to help you succeed, and I'd also like to hear your thoughts on how it went."
+
+#### The Conversation Body
+
+##### Structure: Share → Listen → Explore → Resolve
+
+##### 1. Share Your Perspective (2-3 minutes)
+
+Use this formula:
+
+- **When** [specific situation/time]
+- **I noticed** [observable behavior - no judgment]
+- **The impact was** [effect on you/team/work]
+- **I felt** [your emotion - optional]
+- **I'd like to understand** [invitation to respond]
+
+Example:
+> "When we had the client presentation last week, I noticed the slides weren't ready until that morning. The impact was that I didn't have time to review them, and I felt anxious going in unprepared. I'd like to understand what happened on your end."
+
+##### 2. Listen to Understand (5-10 minutes)
+
+Your only job here is to understand their perspective.
+
+**Listening Techniques:**
+
+- **Paraphrase:** "So what you're saying is..."
+- **Clarify:** "Help me understand what you mean by..."
+- **Acknowledge emotion:** "It sounds like you felt..."
+- **Validate:** "That makes sense because..."
+
+**Questions That Deepen Understanding:**
+
+- "Can you tell me more about that?"
+- "What was going through your mind when...?"
+- "How do you see this differently?"
+- "What do you wish I understood?"
+
+##### 3. Explore Together (10-15 minutes)
+
+Look for:
+
+- Points of agreement
+- Underlying interests (not positions)
+- New information that changes your understanding
+- Shared goals
+
+**Exploration Questions:**
+
+- "What do we both want here?"
+- "What would success look like for both of us?"
+- "What's getting in the way?"
+- "What haven't we considered?"
+
+##### 4. Resolve and Agree (5-10 minutes)
+
+Get specific:
+
+- **What** specifically will change?
+- **Who** will do what?
+- **When** will it happen?
+- **How** will we know it's working?
+- **When** will we check in?
+
+**Agreement Template:**
+
+```markdown
+## What We Agreed
+
+**Actions:**
+- [Person A] will [action] by [date]
+- [Person B] will [action] by [date]
+
+**Check-in:**
+- We'll touch base on [date] to see how it's going
+
+**If issues continue:**
+- We'll [next step]
+```
+
+#### Handling Difficult Moments
+
+**If They Get Defensive:**
+
+- "I can see this is bringing up strong feelings"
+- "I'm not trying to attack you - I want to understand"
+- "Let's slow down. What's your main concern right now?"
+
+**If They Deny or Deflect:**
+
+- "Help me understand how you see it differently"
+- "What would you need to see to agree there's an issue?"
+- "Let's focus on what we can both agree happened"
+
+**If You Get Triggered:**
+
+- "I need a moment to think about that"
+- "I'm feeling defensive - can you help me understand your point?"
+- "Let's take a 5-minute break and come back to this"
+
+**If You Hit an Impasse:**
+
+- "We seem stuck. Can we agree on what we do agree on?"
+- "What would help us move forward?"
+- "Would it help to bring in [neutral party]?"
+
+### Phase 3: Follow-up (Ensuring Change)
+
+#### Immediate Follow-up (Within 24 Hours)
+
+1. **Send written summary** of what was agreed
+2. **Thank them** for the conversation
+3. **Reaffirm commitment** to the relationship
+
+**Email Template:**
+
+```text
+Subject: Following up on our conversation
+
+Hi [Name],
+
+Thank you for taking the time to talk today. I appreciated your
+openness in sharing your perspective.
+
+Here's what I understood we agreed to:
+- [Action 1]
+- [Action 2]
+
+Let's check in on [date] to see how things are going.
+
+I'm committed to [your commitment], and please let me know if
+anything comes up in the meantime.
+
+[Your name]
+```
+
+#### Scheduled Follow-up (Per Timeline)
+
+**What to Cover:**
+
+- What's working?
+- What's not working?
+- Do we need to adjust?
+- Are we ready to close this topic?
+
+**Signs of Success:**
+
+- Behavior has changed
+- Relationship is intact or improved
+- Both parties feel resolution
+- Issue doesn't recur
+
+**If Issues Continue:**
+
+- Acknowledge improvement made
+- Be specific about remaining concerns
+- Consider escalation if necessary
+
+## Special Conversation Types
+
+### The "Feedback Conversation"
+
+**Use for:** Sharing developmental feedback, addressing skill gaps
+
+**Key Principles:**
+
+- Timely (soon after event)
+- Specific (concrete examples)
+- Balanced (also acknowledge strengths)
+- Forward-looking (focus on growth)
+
+### The "Conflict Resolution Conversation"
+
+**Use for:** Interpersonal conflicts, disagreements, damaged relationships
+
+**Key Principles:**
+
+- Focus on interests, not positions
+- Acknowledge emotions on both sides
+- Look for win-win solutions
+- Rebuild trust through small actions
+
+### The "Accountability Conversation"
+
+**Use for:** Missed expectations, broken commitments, performance issues
+
+**Key Principles:**
+
+- Clear expectations were set
+- Gap is documented
+- Consequences are appropriate
+- Support for improvement offered
+
+### The "Sensitive Topic Conversation"
+
+**Use for:** Personal matters affecting work, layoffs, sensitive feedback
+
+**Key Principles:**
+
+- Extra privacy and confidentiality
+- More preparation time
+- HR/legal involvement if needed
+- Extended processing time
+
+## Measuring Conversation Success
+
+### Immediate Indicators
+
+- Both people spoke and felt heard
+- No one stormed out or shut down
+- Specific agreements were made
+- Relationship feels intact
+
+### Short-Term Indicators (1-2 weeks)
+
+- Agreed actions are happening
+- Communication continues normally
+- No escalation required
+- Positive momentum visible
+
+### Long-Term Indicators (1-3 months)
+
+- Issue is resolved
+- Relationship is stronger
+- Similar issues handled better
+- Pattern of open communication
diff --git a/skills/workplace-conversations/references/delivery-scripts.md b/skills/workplace-conversations/references/delivery-scripts.md
new file mode 100644
index 0000000..9756ff9
--- /dev/null
+++ b/skills/workplace-conversations/references/delivery-scripts.md
@@ -0,0 +1,362 @@
+# Delivery Scripts for Workplace Conversations
+
+Ready-to-use opening lines, response phrases, and scenario playbooks for both feedback delivery (Path A) and live conversation navigation (Path B).
+
+## Opening Scripts
+
+### Opening Formula
+
+```text
+[Purpose] + [Intent] + [Invitation]
+```
+
+### General Openings
+
+**Collaborative tone:**
+> "I'd like to talk about [topic]. My goal is to understand your perspective and share mine so we can [outcome]. Is this a good time?"
+
+**Direct tone:**
+> "We need to discuss [topic]. I want to be straightforward with you, and I also want to hear your side. Can we talk now?"
+
+**Curious tone:**
+> "I've noticed [observation] and I'd like to understand what's happening. Can you help me see your perspective?"
+
+### Specific Situation Openings
+
+**For Performance Issues:**
+> "I wanted to check in about [specific area]. I've noticed [specific observation], and I'd like to understand what's happening from your end so we can address it together."
+
+**For Missed Deadlines:**
+> "I've noticed the last few deliverables have come in past their due dates. I want to understand what's been challenging so we can figure out how to get back on track."
+
+**For Quality Concerns:**
+> "I want to discuss some patterns I've noticed in recent work. I know you're capable of great work, so I'm curious what's been different lately."
+
+**For Conflict with Peer:**
+> "I'd like to clear the air about [incident/pattern]. I value our working relationship and want to address this before it affects our collaboration. Can we talk?"
+
+**For Giving Feedback to a Manager (Upward):**
+> "I have some feedback I'd like to share. I'm hoping this can help us work together even better. Are you open to hearing it?"
+
+**For Requesting Raise or Promotion:**
+> "I'd like to discuss my [compensation/career path]. I've prepared some thoughts, and I'm also interested in understanding your perspective on my trajectory here."
+
+**For Sensitive Personal Topic:**
+> "I need to discuss something personal that's affecting my work. I want to be transparent with you while also keeping some boundaries. Is now a good time?"
+
+**For Giving Difficult News (project cuts, role changes):**
+> "I have some difficult news about [topic]. I want to share what I know and give you a chance to ask questions."
+
+**For Ending a Collaboration:**
+> "I've made the difficult decision to [end/change] our working arrangement. I want to explain my thinking and discuss how to transition smoothly."
+
+**For Facilitation Between Others:**
+> "I've noticed some friction around [topic] and I want to help us work through it. I'm not taking sides — I just want to understand both perspectives and find a path forward."
+
+---
+
+## Handling Defensive Responses
+
+### When They Deny
+
+**They say:** "That didn't happen" / "That's not what I said"
+
+**Options:**
+- "Help me understand how you remember it. Let's compare notes."
+- "I have a different recollection. Can we look at [evidence]?"
+- "Whether or not we agree on what happened, can we agree on [impact]?"
+- "I want to make sure we're talking about the same situation. [Describe specific instance]. Does that match your recollection?"
+
+### When They Deflect
+
+**They say:** "What about when YOU did X?" / "Everyone does that" / "I've been so busy with..."
+
+**Options:**
+- "I'm happy to discuss that separately. Right now, I'd like to focus on..."
+- "You may be right about that. And I still need to address this."
+- "I appreciate you raising that. Can we first finish discussing this?"
+- "Right now I want to focus on our conversation. If there are other issues to address, we can talk about those separately."
+
+### When They Attack
+
+**They say:** "You're being unfair" / "You're overreacting" / "You don't understand the whole situation!"
+
+**Options:**
+- "I understand you see it that way. Can you help me understand your perspective?"
+- "I hear that you feel this is unfair. Tell me what fairness looks like to you."
+- "I want to understand. Can you walk me through what happened from your perspective? I might be missing context."
+- "I may be wrong. Help me see what I'm missing."
+
+### When They Shut Down
+
+**Signs:** Silence, one-word answers, crossed arms, looking away
+
+**Options:**
+- "I notice you've gone quiet. What's coming up for you?"
+- "It seems like something I said landed wrong. Can we talk about it?"
+- "I want to make sure this feels like a conversation, not a verdict. What's your reaction?"
+- "Would it help to take a break and come back to this?"
+
+### When They Get Emotional
+
+**Signs:** Tears, raised voice, visible distress
+
+**Options:**
+- "I can see this is hitting hard. Would you like to take a few minutes? We can continue when you're ready, or pick this up another time."
+- "Your reaction tells me this matters to you. Help me understand."
+- "I didn't mean to upset you. What did I say that hit hard?"
+
+### When They Counter-Attack
+
+**They say:** "Well, what about when YOU [did something]?"
+
+**You respond:**
+> "That's a fair point, and I'm open to feedback. Let's finish this conversation first, and then I'd like to hear your concerns about me."
+
+### When They Agree Without Change
+
+**They say:** "You're right, I'll do better."
+
+**You respond:**
+> "I appreciate that. Let's get specific about what 'doing better' looks like so we're both clear. What will you do differently, and how can I support you?"
+
+---
+
+## Reframing Techniques
+
+### From Blame to Curiosity
+
+**Instead of:** "Why did you miss the deadline?"
+**Try:** "Help me understand what happened with the timeline."
+
+**Instead of:** "You should have told me."
+**Try:** "I would have liked to know earlier. What prevented that?"
+
+### From Character to Behavior
+
+**Instead of:** "You're unreliable."
+**Try:** "When deadlines are missed, it creates problems for the team."
+
+**Instead of:** "You're being difficult."
+**Try:** "I'm having trouble understanding your position. Can you explain?"
+
+### From Past to Future
+
+**Instead of:** "You always do this."
+**Try:** "Going forward, how can we handle this differently?"
+
+**Instead of:** "That presentation was a disaster."
+**Try:** "For the next presentation, let's talk about what would make it stronger."
+
+---
+
+## Phrases for Key Moments
+
+### Staying Curious
+- "Help me understand..."
+- "What was your thinking when..."
+- "What were you trying to accomplish?"
+- "What would you do differently now?"
+- "What would help you in this situation?"
+
+### Acknowledging Without Agreeing
+- "I hear that you see it differently..."
+- "I understand that's your experience..."
+- "That makes sense from your perspective..."
+- "I can see why you'd feel that way given..."
+- "Thank you for helping me understand your view..."
+
+### Redirecting to Facts
+- "Let's go back to what actually happened..."
+- "Can we focus on the specific incident?"
+- "Setting aside intentions, what was the impact?"
+- "What would a video camera have shown?"
+
+### Buying Time (When You Need a Moment)
+- "That's an interesting point. Let me think about that."
+- "I want to respond thoughtfully. Give me a moment."
+- "I hadn't considered that angle. Let me process it."
+- "Can you say more about that? I want to make sure I understand."
+
+### Redirecting to Solutions
+- "Now that we understand the situation, what can we do going forward?"
+- "What would help prevent this in the future?"
+- "What do you need from me to succeed?"
+- "How can we work together on this?"
+
+### Setting Clear Expectations
+- "Going forward, I need..."
+- "What I'm asking for is..."
+- "Here's what success looks like..."
+- "The expectation is..."
+- "By [date], I'd like to see..."
+
+### Closing Constructively
+- "I appreciate you having this conversation with me."
+- "I know this wasn't easy to hear. Thanks for being open."
+- "Let's check in on [date] to see how things are going."
+- "My door is open if you want to talk more."
+
+---
+
+## Closing Scripts
+
+### Summarizing Agreement
+> "Let me make sure I understand what we've agreed:
+> [Person A] will [action] by [date].
+> [Person B] will [action] by [date].
+> We'll check in on [date].
+> Does that match your understanding?"
+
+### Acknowledging Progress
+> "I appreciate you having this conversation with me. Even if we don't have everything solved, I feel like we understand each other better."
+
+### Setting Follow-Up
+> "I'd like to check in on this in [timeframe]. Does [date/time] work for you?"
+
+### Leaving Door Open
+> "If anything comes up before our check-in, please let me know. I want to address things as they arise."
+
+### When Resolution Isn't Reached
+> "We don't seem to be reaching agreement today. I'd like to [sleep on it / get more information / involve X]. Can we reconvene [when]?"
+
+---
+
+## Scenario Playbooks
+
+### Playbook: Addressing Chronic Lateness
+
+**Opening:**
+> "The standup starts at 9:00 AM, and I've noticed you've joined around 9:15 for the last several days. I want to understand what's happening."
+
+**If they're defensive:**
+> "I'm not trying to call you out. I just want to make sure there isn't something I'm missing — is there a scheduling conflict or something else going on?"
+
+**If there's a reason:**
+> "Thanks for sharing that. Let's figure out a solution that works — could we adjust the meeting time, or is there something else that would help?"
+
+**If there's no clear reason:**
+> "I need everyone present at the start for standups to be effective. Can you commit to being there at 9:00 going forward? What would help you make that happen?"
+
+**Follow-up:**
+> "Let's check in next week. If you're on time consistently, we'll consider this resolved."
+
+### Playbook: Quality Issues in Code
+
+**Opening:**
+> "I've noticed the last few PRs have had more issues than usual — bugs making it to QA, tests failing, that kind of thing. I want to understand what's going on."
+
+**Explore:**
+- "Are the requirements clear?"
+- "Is the timeline realistic?"
+- "Is there something about our testing setup that's making it harder?"
+- "Is there other work pulling your attention?"
+
+**If it's workload:**
+> "It sounds like you're stretched too thin. Let's look at your commitments and see what we can adjust."
+
+**If it's skill:**
+> "Would it help to pair with someone on the next feature? Or would a refresher on our testing practices be useful?"
+
+**Setting expectation:**
+> "Here's what I need going forward: PRs should have tests, and they should pass CI before review. What would help you meet that bar consistently?"
+
+### Playbook: Conflict Between Team Members
+
+**Meeting with each person separately first:**
+> "I've noticed some tension between you and [colleague] around [topic]. Before we all meet, I want to understand your perspective. What's been happening from your point of view?"
+
+**Facilitating together:**
+> "Thank you both for being here. My goal is to help us find a path forward that works for everyone. I'm not here to judge who's right — I want to understand both perspectives and figure out how we move forward as a team."
+
+**Finding common ground:**
+> "It sounds like you both want [shared goal]. Where you differ is [specific disagreement]. Is that right?"
+
+**Resolution:**
+> "What would a good solution look like for each of you?"
+
+### Playbook: Performance Conversation
+
+**Opening:**
+> "I'd like to discuss your work on [project/area]. I've noticed [specific observation] and want to understand what's happening so we can address it together."
+
+**Sharing impact:**
+> "When [behavior], the impact is [effect]. For example, [specific instance]."
+
+**Inviting response:**
+> "I'd like to hear your perspective. What's been going on from your end?"
+
+**Moving to solution:**
+> "Given what we've discussed, what do you think would help improve this?"
+
+**Setting expectation:**
+> "Going forward, I need to see [specific behavior]. Let's check in [when] to see how it's going."
+
+### Playbook: Upward Feedback (to Manager)
+
+**Opening:**
+> "I have some feedback I'd like to share. I'm hoping this can help us work together even better."
+
+**The feedback:**
+> "When [situation], I find myself [feeling/struggling with] because [impact on my effectiveness]. I'm wondering if [alternative approach] might work better."
+
+**Inviting response:**
+> "I may be missing context. Help me understand your perspective."
+
+### Playbook: Pushing Back on Leadership / Unrealistic Deadlines
+
+**Opening:**
+> "I want to make sure we deliver something great, and I have some concerns about the current timeline/scope I'd like to discuss."
+
+**Present data:**
+> "Based on similar past work and our current capacity, here's what I'm seeing: [data]. The timeline we have assumes [optimistic conditions]."
+
+**Offer trade-offs:**
+> "We can hit the date if we [reduce scope / add resources / accept these risks]. Or we can deliver the full scope by [realistic date]. What's most important here?"
+
+**If they push back:**
+> "I understand there's pressure on this. I want to set us up for success, which means being honest about what's realistic. If we commit to this timeline, here are the risks we're accepting..."
+
+### Playbook: Receiving Difficult Feedback
+
+**Acknowledging:**
+> "Thank you for telling me. I appreciate you being direct."
+
+**Clarifying:**
+> "Can you give me a specific example so I can understand better?"
+
+**Processing:**
+> "I need some time to think about this. Can I follow up with you [when]?"
+
+**Committing:**
+> "I hear what you're saying about [summary]. I'm going to work on [specific action]."
+
+### Playbook: Escalating an Issue
+
+**Opening:**
+> "I need to bring something to your attention. I've tried to address it directly, and I'm now at a point where I need your help."
+
+**Background:**
+> "Here's what's happened: [facts]. Here's what I've tried: [attempts]. Here's where we are: [current state]."
+
+**Ask:**
+> "I'm hoping you can [specific help needed]. What do you think the right next step is?"
+
+---
+
+## Words to Avoid
+
+| Avoid | Use Instead |
+| --- | --- |
+| "You always..." | "I've noticed a pattern where..." |
+| "You never..." | "In recent instances..." |
+| "You should have..." | "Going forward, I'd like..." |
+| "That's wrong" | "I see it differently" |
+| "Obviously..." | "From my perspective..." |
+| "To be honest..." | (Just be honest without the disclaimer) |
+| "Don't take this personally" | (Just don't make it personal) |
+| "No offense, but..." | (Just don't be offensive) |
+
+---
+
+**Related:** See `feedback-sbi-model.md` for structuring SBI feedback. See `conversation-framework.md` for the full three-phase preparation workflow.
diff --git a/skills/workplace-conversations/references/emotional-regulation.md b/skills/workplace-conversations/references/emotional-regulation.md
new file mode 100644
index 0000000..e4373df
--- /dev/null
+++ b/skills/workplace-conversations/references/emotional-regulation.md
@@ -0,0 +1,378 @@
+# Emotional Regulation for Difficult Conversations
+
+Techniques for managing your own emotions before, during, and after challenging workplace conversations.
+
+## Understanding Your Emotional Landscape
+
+### Why Emotions Run High
+
+Difficult conversations threaten core needs:
+
+- **Autonomy:** Someone is telling us what to do
+- **Competence:** Our abilities are being questioned
+- **Relatedness:** The relationship is at risk
+- **Fairness:** We feel treated unjustly
+- **Identity:** Our sense of self is challenged
+
+Understanding which need is threatened helps you address the root cause.
+
+### The Emotional Hijack
+
+When emotions take over:
+
+1. Amygdala detects threat
+2. Stress hormones flood the system
+3. Prefrontal cortex (rational thinking) goes offline
+4. We react (fight, flight, freeze) instead of respond
+
+**Time to recover:** 20-30 minutes once fully triggered
+
+**Implication:** It's better to prevent hijack than to recover from one.
+
+### Know Your Triggers
+
+Common workplace triggers:
+
+| Trigger | Example | Underlying Need |
+| ------- | ------- | --------------- |
+| Criticism | "Your code has issues" | Competence |
+| Being ignored | Not invited to meeting | Relatedness |
+| Unfairness | Colleague gets credit | Fairness |
+| Micromanagement | Being told how to do task | Autonomy |
+| Public embarrassment | Called out in meeting | Identity |
+
+**Personal Trigger Inventory:**
+
+Complete this for yourself:
+
+- When [trigger], I feel [emotion]
+- My body reacts by [physical symptom]
+- I tend to [behavioral response]
+- What I actually need is [underlying need]
+
+## Pre-Conversation Regulation
+
+### The Night Before
+
+**If you're anxious:**
+
+1. Write down worst-case scenario
+2. Write down how you'd cope with it
+3. Write down most likely outcome
+4. Write down best-case scenario
+5. Notice: You'd survive any of these
+
+**If you're angry:**
+
+1. Write an angry letter (don't send it)
+2. Do physical activity (walk, gym)
+3. Talk to someone uninvolved
+4. Sleep on it
+5. Ask: Is this about them, or me?
+
+### The Day Of
+
+**Physical Preparation:**
+
+- Get enough sleep the night before
+- Eat something (low blood sugar = low patience)
+- Avoid excess caffeine (amplifies anxiety)
+- Exercise in the morning if possible
+- Wear something that makes you feel confident
+
+**Mental Preparation:**
+
+- Review your preparation notes
+- Practice your opening out loud
+- Visualize a successful conversation
+- Set your intention (curious, calm, direct)
+- Accept that discomfort is part of the process
+
+### Immediately Before
+
+**5-Minute Pre-Conversation Ritual:**
+
+1. **Ground** (1 min): Feel your feet on the floor. Notice 5 things you can see.
+
+2. **Breathe** (2 min):
+   - Breathe in for 4 counts
+   - Hold for 4 counts
+   - Breathe out for 6 counts
+   - Repeat until the 2 minutes are up (about 8 cycles at one count per second)
+
+3. **Center** (1 min):
+   - Put hand on heart
+   - Say: "I can do this"
+   - Remember your positive intention
+
+4. **Release** (1 min):
+   - Let go of attachment to specific outcome
+   - Accept you can only control yourself
+   - Commit to staying curious
+
+## During-Conversation Regulation
+
+### Early Warning Signs
+
+Notice when you're starting to get triggered:
+
+**Physical:**
+
+- Heart rate increasing
+- Breathing getting shallow
+- Face flushing
+- Muscles tensing
+- Voice changing pitch
+
+**Mental:**
+
+- Thoughts racing
+- Catastrophizing
+- Rehearsing rebuttals instead of listening
+- Losing track of what they're saying
+- Feeling "attacked"
+
+### In-the-Moment Techniques
+
+**The Pause:**
+> "Let me think about that for a moment."
+
+Use this to:
+
+- Take a breath
+- Notice your emotional state
+- Choose your response instead of reacting
+
+**The Slow-Down:**
+
+- Speak more slowly than feels natural
+- Take a breath before responding
+- Ask them to repeat or clarify
+- Write something down
+
+**The Body Reset:**
+
+- Unclench your jaw
+- Drop your shoulders
+- Unfold your arms
+- Plant feet firmly on floor
+- Lean back slightly
+
+**The Perspective Shift:**
+Ask yourself:
+
+- What might be their positive intent?
+- What am I missing?
+- How would a neutral observer see this?
+- Will this matter in a year?
+
+### Phrases That Buy Time
+
+When you need to regulate:
+
+- "That's interesting. Tell me more about that."
+- "I want to respond to that thoughtfully. Give me a moment."
+- "I'm noticing a strong reaction in myself. Bear with me."
+- "Can we slow down? I want to make sure I understand."
+
+### When You Need to Stop
+
+It's okay to pause the conversation if:
+
+- You're too triggered to think clearly
+- The conversation has become unproductive
+- You need more information
+- Emotions are escalating for either party
+
+**How to exit gracefully:**
+> "I want to give this the attention it deserves. I'm finding it hard to think clearly right now. Can we take a break and continue [tomorrow/in an hour]?"
+
+## Post-Conversation Processing
+
+### Immediate Aftermath
+
+**Physical release:**
+
+- Take a walk
+- Stretch
+- Drink water
+- Breathe deeply
+
+**Mental processing:**
+
+- What went well?
+- What was hard?
+- What would I do differently?
+- What am I feeling now?
+
+### The Post-Conversation Debrief
+
+Write or talk through:
+
+1. **What happened:**
+   - What was said?
+   - What did I notice in their behavior?
+   - What did I notice in my own reactions?
+
+2. **What I learned:**
+   - About the situation
+   - About them
+   - About myself
+
+3. **What's next:**
+   - What actions need to happen?
+   - What follow-up is needed?
+   - What do I need to let go of?
+
+### Processing Lingering Emotions
+
+**If you're still upset:**
+
+- Write about it (journal, notes)
+- Talk to someone supportive (to process it, not to vent endlessly)
+- Exercise or move your body
+- Give yourself 24 hours before taking action
+
+**If you said something you regret:**
+
+- Acknowledge it as soon as possible
+- Apologize specifically ("I'm sorry I said X")
+- Don't over-explain or make excuses
+- Move forward without dwelling on it
+
+**If the conversation went poorly:**
+
+- Separate outcome from effort
+- Consider: What was in my control?
+- Ask: What can I learn?
+- Plan: What's the next step?
+
+## Building Emotional Resilience
+
+### Daily Practices
+
+**Mindfulness (5-10 min/day):**
+
+- Builds awareness of emotional states
+- Increases the gap between trigger and reaction
+- Improves focus and presence
+- Reduces baseline stress
+
+**Physical exercise:**
+
+- Helps lower stress hormone levels
+- Improves emotional regulation
+- Increases frustration tolerance
+- Enhances sleep quality
+
+**Adequate sleep:**
+
+- Emotional regulation requires a rested brain
+- Sleep deprivation makes you more reactive
+- 7-9 hours for most adults
+
+### Cognitive Practices
+
+**Reframing:**
+Change how you interpret situations.
+
+| Triggering Thought | Reframe |
+| ------------------ | ------- |
+| "They're attacking me" | "They have feedback for me" |
+| "This is unfair" | "I don't have all the information" |
+| "I can't handle this" | "This is uncomfortable and temporary" |
+| "They're wrong" | "They see it differently" |
+
+**Self-compassion:**
+Speak to yourself as you would a friend.
+
+- "This is hard. It's okay to struggle."
+- "Everyone has difficult conversations."
+- "I'm doing my best with what I have."
+- "I can learn from this."
+
+### Long-Term Development
+
+**Therapy/Coaching:**
+
+- Work through patterns and triggers
+- Develop personalized strategies
+- Process past experiences
+- Build emotional intelligence
+
+**Feedback seeking:**
+
+- Ask trusted colleagues how you handle conflict
+- Get real-time feedback when possible
+- Notice patterns in how you react
+
+**Reflection practice:**
+
+- Regular review of challenging interactions
+- Identify growth over time
+- Celebrate improvements
+
+## Special Situations
+
+### When You're the One Being Criticized
+
+**In the moment:**
+
+1. Breathe. Don't react immediately.
+2. Say: "Thank you for telling me."
+3. Ask: "Can you give me a specific example?"
+4. Listen. Really listen.
+5. Say: "I need time to think about this."
+
+**After:**
+
+1. Process the emotion (usually hurt, shame, anger)
+2. Separate the useful feedback from how it was delivered
+3. Identify what's true, even if uncomfortable
+4. Decide what to do with it
+5. Follow up with them
+
+### When You Made a Mistake
+
+**Acknowledge:**
+
+- "I made a mistake."
+- "That was my error."
+- "I should have [done X instead]."
+
+**Don't:**
+
+- Over-apologize
+- Make excuses
+- Blame others
+- Minimize
+
+**Do:**
+
+- Take responsibility
+- Explain what you'll do differently
+- Ask what they need from you
+- Move forward
+
+### When You Need to Hold Firm
+
+Sometimes you need to deliver a message that won't be well-received.
+
+**Regulate yourself:**
+
+- Accept that discomfort is part of the job
+- You're not responsible for their reaction
+- Clarity is kindness in the long run
+
+**Deliver clearly:**
+
+- Be direct without being cruel
+- State facts without excessive softening
+- Allow them to react
+- Hold the boundary
+
+**After:**
+
+- Don't apologize for the message (you can apologize for the pain it causes)
+- Stay available for questions
+- Follow up as appropriate
diff --git a/skills/workplace-conversations/references/expectation-alignment.md b/skills/workplace-conversations/references/expectation-alignment.md
new file mode 100644
index 0000000..9e1ac53
--- /dev/null
+++ b/skills/workplace-conversations/references/expectation-alignment.md
@@ -0,0 +1,271 @@
+# Expectation Alignment Guide
+
+Frameworks for setting, communicating, and managing expectations with teammates, stakeholders, and leadership.
+
+## Why Expectations Misalign
+
+Common causes of expectation misalignment in software teams:
+
+| Cause | Example |
+| --- | --- |
+| **Implicit assumptions** | "I assumed we'd use the existing API" vs "I thought we'd build a new one" |
+| **Different definitions** | "Done" means code complete to you, but deployed to them |
+| **Information gaps** | You know about a dependency; they don't |
+| **Timeline optimism** | Estimates given under pressure without accounting for reality |
+| **Scope creep** | Small additions pile up into major differences |
+| **Communication gaps** | Updates not shared, changes not communicated |
+
+## The Expectation Alignment Framework
+
+### 1. Define Success Explicitly
+
+Never assume alignment. Make success criteria explicit:
+
+| Vague | Explicit |
+| --- | --- |
+| "Make it fast" | "Page load under 2 seconds on 3G" |
+| "High quality" | "Zero P1 bugs in first month; 80% test coverage" |
+| "Soon" | "By end of sprint (Friday 5pm)" |
+| "User-friendly" | "New users can complete checkout in under 3 minutes" |
+
+### 2. Expose Assumptions
+
+Before starting work, surface hidden assumptions:
+
+**Questions to ask:**
+
+- "What are you assuming about [timeline/resources/scope]?"
+- "What would make this harder than expected?"
+- "What's the worst-case scenario?"
+- "What dependencies are we assuming will be available?"
+- "What's the definition of 'done' for this?"
+
+### 3. Document and Share
+
+Expectations that aren't written down don't exist:
+
+- Put acceptance criteria in tickets
+- Send follow-up emails after verbal agreements
+- Update documentation when things change
+- Share in public channels (not just DMs)
+
+### 4. Check Alignment Regularly
+
+Don't wait until delivery to discover misalignment:
+
+- Daily standups surface blockers early
+- Mid-sprint check-ins catch scope creep
+- Demo progress before "done" to validate direction
+- Regular stakeholder updates prevent surprises
+
+## Scenario: Unrealistic Deadlines
+
+### Recognition Signs
+
+- Timeline set without engineering input
+- Estimates significantly lower than past similar work
+- No buffer for unknowns
+- Dependencies assumed to be risk-free
+- "We just need to..." (minimizing language)
+
+### Response Framework
+
+**1. Acknowledge the goal:**
+> "I understand hitting this date is important for [business reason]."
+
+**2. Present data:**
+> "Based on similar past work, this typically takes [X]. Here's the breakdown..."
+
+**3. Offer options:**
+> "We have a few paths:"
+>
+> - "Hit the date by reducing scope to [core features]"
+> - "Deliver full scope by [realistic date]"
+> - "Add resources, but onboarding will take [time]"
+
+**4. Get explicit agreement:**
+> "Which approach would you like? Can I document this so we're aligned?"
+
+### Sample Conversation
+
+> **Stakeholder:** "We need the new dashboard by end of month."
+>
+> **You:** "I want to make sure we deliver something great. Let me share what I'm seeing: the full dashboard - charts, filters, export - typically takes 6-8 weeks. We have 4 weeks."
+>
+> **Stakeholder:** "We really need it by end of month."
+>
+> **You:** "I hear you. Here's what we could do: ship the core charts by end of month, then add filters and export in the following two weeks. That gets value to users faster. Would that work?"
+>
+> **Stakeholder:** "What about just working faster?"
+>
+> **You:** "I wish that worked, but rushing usually creates bugs that cost more time later. The scope-first approach gets you something solid you can show stakeholders. What matters most - the date or the full feature set?"
+
+## Scenario: Unclear Role Boundaries
+
+### Recognition Signs
+
+- Tasks falling through cracks
+- Duplicated work
+- "I thought you were handling that"
+- Confusion about who decides what
+- Finger-pointing when things go wrong
+
+### Response Framework
+
+**1. Identify the gap:**
+> "I've noticed some confusion about who owns [task/decision]. Can we clarify?"
+
+**2. Propose explicit ownership:**
+> "My understanding is that [Person A] owns [X] and [Person B] owns [Y]. Does that match your understanding?"
+
+**3. Define handoff points:**
+> "The handoff happens when [specific condition]. At that point, [Person B] takes over."
+
+**4. Document and share:**
+> "Let me send a quick summary so everyone's aligned."
+
+### RACI for Common Confusion Points
+
+| Task | Responsible | Accountable | Consulted | Informed |
+| --- | --- | --- | --- | --- |
+| Code complete | Developer | Developer | Reviewer | Team |
+| PR merged | Reviewer | Developer | - | Team |
+| Deployed to staging | DevOps/Dev | Team Lead | QA | Stakeholders |
+| Bug triage | Tech Lead | Product | Development | Stakeholders |
+| Scope changes | Product | Product | Tech Lead | Development |
+
+## Scenario: Scope Creep
+
+### Recognition Signs
+
+- "Can we just add..."
+- "One more thing..."
+- Requirements changing mid-sprint
+- Original estimate no longer realistic
+- Features growing beyond initial spec
+
+### Response Framework
+
+**1. Acknowledge the request:**
+> "That's a good idea. Let me think through the impact."
+
+**2. Quantify the change:**
+> "Adding [feature] would take approximately [time]. That would push our delivery from [date] to [new date]."
+
+**3. Present trade-offs:**
+> "We could add this by dropping [other feature] or extending the timeline. Which would you prefer?"
+
+**4. Get explicit approval:**
+> "Before I add this to the scope, can you confirm [the trade-off]?"
+
+### The Scope Change Template
+
+When scope changes are requested, respond with:
+
+```markdown
+**Change Request:** [What's being asked]
+
+**Impact:**
+- Timeline: [Current] → [New]
+- Resources: [Any changes]
+- Risk: [New risks introduced]
+
+**Options:**
+1. Add scope, extend timeline to [date]
+2. Add scope, drop [other feature]
+3. Defer to next phase
+
+**Recommendation:** [Your suggestion]
+
+**Need decision by:** [Date]
+```
+
+## Setting Expectations Proactively
+
+### At Project Start
+
+- Define success criteria explicitly
+- Surface assumptions and dependencies
+- Establish communication cadence
+- Agree on decision-making process
+- Document in shared location
+
+### During Work
+
+- Regular status updates (before asked)
+- Early warning on blockers
+- Proactive communication of changes
+- Demo progress incrementally
+- Update documentation as things change
+
+### At Delivery
+
+- Confirm acceptance criteria met
+- Document what was delivered vs planned
+- Capture lessons learned
+- Set expectations for next phase
+
+## Phrases for Expectation Conversations
+
+### Setting Expectations
+
+- "Let me make sure we're aligned on..."
+- "Here's what I'm committing to..."
+- "My understanding is that success looks like..."
+- "Just to be explicit about scope..."
+- "The definition of 'done' for this is..."
+
+### Surfacing Misalignment
+
+- "I want to flag a concern about..."
+- "I think we might have different assumptions about..."
+- "Can we clarify what we mean by...?"
+- "I'm seeing a gap between [X] and [Y]..."
+- "Before we go further, I want to make sure..."
+
+### Resetting Expectations
+
+- "Based on what we're learning, we need to adjust..."
+- "The original estimate didn't account for [X]..."
+- "Here's what's realistic given [constraints]..."
+- "I need to revise what I committed to because..."
+- "Let me give you an updated picture..."
+
+### Getting Commitment
+
+- "Can I confirm that we're agreeing to...?"
+- "Let me send a summary so we're all aligned."
+- "Is everyone okay with this plan?"
+- "Before we proceed, I need explicit approval for..."
+- "Can you confirm in writing?"
+
+## Expectation Alignment Checklist
+
+### Before Starting Work
+
+- [ ] Success criteria are explicit and measurable
+- [ ] Timeline includes buffer for unknowns
+- [ ] Dependencies are identified and owners confirmed
+- [ ] Assumptions are documented
+- [ ] Communication plan is agreed
+- [ ] Decision-making process is clear
+- [ ] RACI or ownership is explicit
+
+### During Work
+
+- [ ] Regular updates before being asked
+- [ ] Blockers raised immediately
+- [ ] Changes communicated proactively
+- [ ] Progress demonstrated incrementally
+- [ ] Documentation updated as things change
+
+### At Delivery
+
+- [ ] Acceptance criteria reviewed before "done"
+- [ ] Stakeholder sign-off obtained
+- [ ] Gaps between planned and delivered documented
+- [ ] Next steps and expectations clear
+
+---
+
+**Related:** Return to the `workplace-conversations` skill for feedback delivery and live-conversation navigation frameworks.
diff --git a/skills/workplace-conversations/references/feedback-sbi-model.md b/skills/workplace-conversations/references/feedback-sbi-model.md
new file mode 100644
index 0000000..9261dfe
--- /dev/null
+++ b/skills/workplace-conversations/references/feedback-sbi-model.md
@@ -0,0 +1,206 @@
+# SBI Feedback Model - Complete Guide
+
+The Situation-Behavior-Impact (SBI) model, developed by the Center for Creative Leadership, provides a structured approach to giving clear, objective feedback that minimizes defensiveness and promotes improvement.
+
+## The SBI Formula
+
+```text
+Situation + Behavior + Impact = Effective Feedback
+```
+
+| Component | What It Is | What It's NOT |
+| --- | --- | --- |
+| **Situation** | When and where the behavior occurred | Vague timing or generalizations |
+| **Behavior** | Observable actions the person took | Assumptions about motives or character |
+| **Impact** | Effect on you, the team, or the project | Judgment or evaluation |
+
+## Why SBI Works
+
+1. **Specificity** - Concrete examples are harder to dismiss than vague complaints
+2. **Objectivity** - Observable behaviors are hard to dispute; interpretations are easy to argue with
+3. **Actionability** - Clear situations and behaviors can be repeated or changed
+4. **Non-threatening** - Impact focuses on effects, not character attacks
+
+## SBI Examples: Positive Feedback
+
+### Code Review Excellence
+
+> **Situation:** "In the code review for the payment processing module last Tuesday..."
+>
+> **Behavior:** "...you not only caught the edge case that would have caused data loss, but you also suggested a more elegant solution using the repository pattern and included tests that demonstrated the issue..."
+>
+> **Impact:** "...which prevented a potentially costly bug in production and taught our junior developers about defensive coding. The whole team learned from your thoroughness."
+
+### Meeting Contribution
+
+> **Situation:** "During our architecture decision meeting this morning..."
+>
+> **Behavior:** "...you asked three questions that reframed the problem and suggested we document our trade-offs in an ADR..."
+>
+> **Impact:** "...which helped us reach consensus in half the usual time and gave us documentation we can reference later."
+
+### Mentorship
+
+> **Situation:** "Over the past month while onboarding Maya..."
+>
+> **Behavior:** "...you've been setting up daily 15-minute check-ins, creating a shared doc of resources, and giving her progressively more complex tasks..."
+>
+> **Impact:** "...which has accelerated her ramp-up significantly. She submitted her first independent PR two weeks ahead of our typical onboarding timeline."
+
+### Cross-Team Collaboration
+
+> **Situation:** "When the DevOps team asked for help migrating the deployment pipeline last week..."
+>
+> **Behavior:** "...you volunteered to pair with their engineer, documented the process as you went, and stayed late to ensure the cutover succeeded..."
+>
+> **Impact:** "...which built goodwill between our teams and created documentation that will help us next time."
+
+## SBI Examples: Constructive Feedback
+
+### Missed Deadlines
+
+> **Situation:** "This sprint, the API endpoint work was due on Wednesday..."
+>
+> **Behavior:** "...it was delivered on Friday without any communication about the delay..."
+>
+> **Impact:** "...which blocked the frontend team for two days, caused them to miss their milestone, and created tension between our teams."
+
+**Follow-up prompt:** "What happened? Is there something blocking you that I should know about?"
+
+### Code Quality Issues
+
+> **Situation:** "In the last three PRs you submitted..."
+>
+> **Behavior:** "...there were no unit tests and several of the changes broke existing tests that weren't updated..."
+>
+> **Impact:** "...which caused the CI pipeline to fail repeatedly, blocked other developers' merges, and required the tech lead to spend time debugging."
+
+**Follow-up prompt:** "I want to understand what's making testing difficult. Is there something about our test setup or your workload that's contributing to this?"
+
+### Meeting Behavior
+
+> **Situation:** "In yesterday's standup..."
+>
+> **Behavior:** "...when Alex was describing their blocker, you interrupted three times to offer solutions before they finished explaining..."
+>
+> **Impact:** "...which made Alex visibly frustrated and may have prevented them from sharing the full context of the problem."
+
+**Follow-up prompt:** "I know you're eager to help, and that's great. Can we try letting people finish before jumping in?"
+
+### Communication Gaps
+
+> **Situation:** "When the production issue happened on Monday..."
+>
+> **Behavior:** "...you started investigating immediately but didn't post updates in the incident channel for 45 minutes..."
+>
+> **Impact:** "...which left leadership and customer support without information to share with affected clients, and caused multiple people to ask for status updates, further distracting you."
+
+**Follow-up prompt:** "During incidents, even brief updates help. What would make it easier to communicate while you're troubleshooting?"
+
+### Lack of Engagement
+
+> **Situation:** "In our last three sprint planning sessions..."
+>
+> **Behavior:** "...you haven't provided estimates or raised concerns about any of the stories..."
+>
+> **Impact:** "...which means we're missing your technical perspective, and some stories have turned out to be much larger than expected."
+
+**Follow-up prompt:** "Your input is valuable. Is there something about our planning process that's making it hard to participate?"
+
+## Common Mistakes to Avoid
+
+### Mistake 1: Vague Situation
+
+❌ "Recently..."
+❌ "In general..."
+❌ "Sometimes..."
+
+✅ "In yesterday's standup..."
+✅ "During the deployment on March 15th..."
+✅ "In the code review for PR #1234..."
+
+### Mistake 2: Interpreting Instead of Observing
+
+❌ "You don't care about quality" (interpretation)
+✅ "The PR had 12 bugs caught in QA" (observable)
+
+❌ "You're not engaged" (interpretation)
+✅ "You were on your phone during the demo" (observable)
+
+❌ "You're being defensive" (interpretation)
+✅ "You raised your voice and said 'that's not my fault'" (observable)
+
+### Mistake 3: Generalizing Behavior
+
+❌ "You always miss deadlines" (generalization)
+✅ "The last three deliverables were past their due date" (specific)
+
+❌ "You never test your code" (generalization)
+✅ "The last two PRs had no unit tests" (specific)
+
+### Mistake 4: Making Impact Personal
+
+❌ "...which made me angry" (personal feeling as impact)
+✅ "...which caused the release to be delayed" (business impact)
+
+Personal feelings can be part of impact, but should be secondary to business/team impact:
+"...which delayed the release and, honestly, created frustration for the team."
+
+### Mistake 5: Forgetting the "So What"
+
+Feedback needs clear impact to be meaningful. Without it:
+
+❌ "In the meeting, you interrupted Alex three times." (So what?)
+
+✅ "In the meeting, you interrupted Alex three times, which prevented them from fully explaining the problem and may have led to us missing important context."
+
+## Extended SBI: Adding Intent and Next Steps
+
+Some practitioners extend SBI to **SBI-I** (adding Intent) or **SBI-NS** (adding Next Steps):
+
+### SBI-I: Checking Intent
+
+After delivering SBI feedback, ask about intent:
+
+> **SBI:** "When you pushed to main without a PR yesterday, it broke the build and blocked five developers for two hours."
+>
+> **I (Intent):** "I'm sure you didn't mean to cause a disruption. Can you help me understand what happened?"
+
+### SBI-NS: Proposing Next Steps
+
+End with a collaborative discussion of what to do differently:
+
+> **SBI:** "When you gave estimates without checking with the backend team, the sprint became overcommitted and we had to drop two stories."
+>
+> **NS:** "Going forward, could we establish a quick sync with backend before finalizing sprint commitments? What would make that work for you?"
+
+## When to Use SBI
+
+| Scenario | Use SBI? | Notes |
+| --- | --- | --- |
+| Formal performance feedback | ✅ Yes | SBI gives you specific examples to document |
+| Quick in-the-moment feedback | ✅ Yes | Keep it brief but structured |
+| Positive recognition | ✅ Yes | Makes praise specific and meaningful |
+| Annual reviews | ✅ Yes | Have multiple examples ready |
+| Giving yourself feedback | ✅ Yes | Use for self-reflection |
+| Venting frustration | ❌ No | Get calm first, then use SBI |
+
+## Practice Exercise
+
+Take a recent situation where you wanted to give feedback. Fill in:
+
+**Situation:** ________________________________________________
+
+**Behavior:** ________________________________________________
+
+**Impact:** ________________________________________________
+
+Now check:
+
+- [ ] Is the situation specific (time/place)?
+- [ ] Is the behavior observable (not an interpretation)?
+- [ ] Is the impact clear (business/team effect)?
+
+---
+
+**Related:** Return to the `workplace-conversations` skill for the full framework, or see `delivery-scripts.md` for scripted openers and scenario playbooks.
diff --git a/skills/workplace-conversations/references/preparation-template.md b/skills/workplace-conversations/references/preparation-template.md
new file mode 100644
index 0000000..fe2c950
--- /dev/null
+++ b/skills/workplace-conversations/references/preparation-template.md
@@ -0,0 +1,370 @@
+# Difficult Conversation Preparation Template
+
+Use this worksheet to prepare before any challenging conversation.
+
+## The Quick Prep (10 minutes)
+
+Use this when you don't have much time but need to prepare.
+
+```markdown
+## Quick Conversation Prep
+
+**Topic:** [What is this conversation about?]
+
+**Date/Time Planned:** [When will you have it?]
+
+**With:** [Who?]
+
+---
+
+### The Situation (Facts Only)
+[What happened? Observable behavior, no judgment]
+
+### The Impact
+[How did this affect you, the team, or the work?]
+
+### My Goal
+[What specific outcome do I want?]
+
+### Their Perspective
+[How might they see this differently?]
+
+### My Opening Line
+[Write the first thing you'll say]
+```
+
+## The Full Prep (30 minutes)
+
+Use this for high-stakes conversations or when you need to process your emotions.
+
+```markdown
+## Conversation Preparation Worksheet
+
+### Conversation Details
+
+**Topic:** [One sentence description]
+**With:** [Name/Role]
+**Planned Date/Time:** [When]
+**Location:** [Where]
+**Time Allocated:** [How long]
+
+---
+
+## Part 1: Understanding the Situation
+
+### What Happened (Facts)
+
+Describe only what a video camera would have recorded:
+-
+-
+-
+
+### The Story I'm Telling Myself
+
+What assumptions or interpretations have I added?
+-
+-
+-
+
+### My Emotional State
+
+What am I feeling about this? (Check all that apply)
+- [ ] Angry
+- [ ] Hurt
+- [ ] Frustrated
+- [ ] Anxious
+- [ ] Disappointed
+- [ ] Confused
+- [ ] Concerned
+- [ ] Other: _______
+
+Intensity (1-10): ____
+
+Am I calm enough to have this conversation productively?
+- [ ] Yes - proceed
+- [ ] No - wait until: _______
+
+### The Impact
+
+**On me:**
+
+**On the team:**
+
+**On the work/project:**
+
+**On our relationship:**
+
+---
+
+## Part 2: Considering Their Perspective
+
+### How They Might See It
+
+If I asked them to describe this situation, they might say:
+
+### Their Possible Constraints
+
+What pressures, limitations, or challenges might they be facing?
+-
+-
+-
+
+### Their Positive Intent
+
+Even if the impact was negative, what might their positive intention have been?
+
+### What They Care About
+
+What do they value that I should acknowledge?
+
+---
+
+## Part 3: My Contribution
+
+### What I Did (or Didn't Do)
+
+Be honest - even if it's small:
+-
+-
+
+### What I Could Have Done Differently
+
+-
+-
+
+### What I'm Willing to Own
+
+What responsibility will I acknowledge in the conversation?
+
+---
+
+## Part 4: Defining My Goals
+
+### What I Want to Accomplish
+
+**Primary outcome:** [Most important thing]
+
+**Secondary outcomes:**
+1.
+2.
+
+### What I'm Willing to Accept
+
+**Best case:** [Ideal outcome]
+
+**Acceptable:** [Minimum acceptable]
+
+**Deal breaker:** [What I won't accept]
+
+### Relationship Goal
+
+After this conversation, I want our relationship to be:
+
+### Identity Goal
+
+Who do I want to be in this conversation?
+(e.g., "calm and direct," "open and curious," "firm but fair")
+
+---
+
+## Part 5: Planning the Conversation
+
+### My Opening
+
+**Purpose statement:**
+"I'd like to discuss..."
+
+**Intent statement:**
+"My goal is to..."
+
+**Invitation:**
+"I'd like to hear your perspective..."
+
+**Full Opening (write it out):**
+
+### Key Points to Make
+
+1. [Situation: When/where]
+2. [Behavior: What happened]
+3. [Impact: What resulted]
+
+### Questions to Ask
+
+1.
+2.
+3.
+
+### Potential Solutions/Proposals
+
+1.
+2.
+
+---
+
+## Part 6: Anticipating Challenges
+
+### If They Get Defensive
+
+What they might say:
+
+My response:
+
+### If They Deny
+
+What they might say:
+
+My response:
+
+### If They Deflect
+
+What they might say:
+
+My response:
+
+### If I Get Triggered
+
+My triggers in this situation:
+
+My plan to stay calm:
+
+---
+
+## Part 7: Logistics
+
+### Pre-Conversation Checklist
+
+- [ ] Scheduled time that works for both
+- [ ] Private location identified
+- [ ] Enough time allocated (__ minutes)
+- [ ] Backup plan if interrupted
+- [ ] My emotional state is ready
+
+### Post-Conversation Plan
+
+- [ ] Will send follow-up email within 24 hours
+- [ ] Check-in scheduled for: _______
+- [ ] Documentation needed: [ ] Yes [ ] No
+```
+
+## Preparation Prompts by Conversation Type
+
+### For Performance Conversations
+
+**Additional Questions to Answer:**
+
+1. What specific examples do I have? (At least 2-3)
+2. Which clearly stated expectations are they not meeting?
+3. What support have I provided?
+4. What documentation exists?
+5. What are the consequences if this continues?
+6. What resources can I offer for improvement?
+
+### For Conflict Conversations
+
+**Additional Questions to Answer:**
+
+1. What is the history of this conflict?
+2. Have we discussed this before? What happened?
+3. Are there other people involved or affected?
+4. Is there a pattern, or is this a one-time issue?
+5. What do we have in common that we can build on?
+6. What would resolution look like for them?
+
+### For Upward Conversations (with Manager)
+
+**Additional Questions to Answer:**
+
+1. How does this align with their priorities?
+2. What's in it for them to address this?
+3. What political considerations exist?
+4. How will they look to their stakeholders?
+5. What alternatives can I propose if my first ask is rejected?
+6. What's my BATNA (best alternative to a negotiated agreement) if this fails?
+
+### For Compensation Conversations
+
+**Additional Questions to Answer:**
+
+1. What is my market value? (Research data)
+2. What are my key accomplishments since last adjustment?
+3. What is the timing relative to budget cycles?
+4. What am I willing to accept besides salary?
+5. What is my timeline if I don't get what I want?
+6. What is the company's current financial situation?
+
+## Mental Preparation Techniques
+
+### The 5-Minute Calm-Down
+
+Before entering the conversation:
+
+1. **Breathe:** 4 counts in, 4 counts hold, 4 counts out (3x)
+2. **Ground:** Feel your feet on the floor, notice the room
+3. **Remind:** State your positive intention ("I want to understand")
+4. **Release:** Let go of needing to "win"
+5. **Begin:** Walk in with curiosity, not certainty
+
+### The Perspective Shift
+
+If you're feeling adversarial:
+
+1. Imagine they're a friend who did this
+2. Imagine you did what they did - what would explain it?
+3. Imagine someone you respect is watching
+4. Ask: "What would a wise, fair person do?"
+
+### The Stakes Reality Check
+
+When anxiety is high:
+
+1. What's the worst that could happen?
+2. How likely is that really?
+3. If it did happen, could I survive it?
+4. What's most likely to happen?
+5. What's the best that could happen?
+
+## Common Preparation Mistakes
+
+### Over-Preparing
+
+**Signs:**
+
+- You've written a script word-for-word
+- You've anticipated 20 possible responses
+- You're more anxious, not less
+- You're delaying the conversation to "prepare more"
+
+**Fix:** Prepare themes, not scripts. Know your key points and opening, then trust yourself.
+
+### Under-Preparing
+
+**Signs:**
+
+- You're "winging it"
+- You haven't thought about their perspective
+- You don't have specific examples
+- You don't know what outcome you want
+
+**Fix:** At minimum, complete the Quick Prep template.
+
+### Preparing to Attack
+
+**Signs:**
+
+- Your notes focus on what they did wrong
+- You've built a legal case
+- You're rehearsing "gotcha" moments
+- You've decided the outcome already
+
+**Fix:** Add: What might I be missing? What's their perspective?
+
+### Preparing to Avoid
+
+**Signs:**
+
+- Your opening is full of softeners
+- You're planning to hint, not state
+- You've prepared escape routes
+- You're hoping they'll bring it up first
+
+**Fix:** Write a direct opening. Practice saying it out loud.