feat: improve reliability of generated agent skills #1284
… skills Add an OpenSpec change proposal (proposal/design/tasks + spec delta) that establishes a quality contract for the 11 generated agent skills: trigger disambiguation, canonical structure, explicit success criteria, named failure recovery, single-source skill/command generation, shared-snippet reuse, lean always-on body, and cross-skill navigation. Proposal only — no skill code or CLI behavior changes in this PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 Walkthrough: Adds planning docs for a unified …
Changes: skill-authoring-conventions planning documents
Estimated code review effort: 2 (Simple) | ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md (1)
73-79: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Reconcile unconditional requirement with conditional scenario.
The requirement states "Each skill SHALL reference the related or next skill" unconditionally, but the scenario's WHEN clause only triggers "when a natural next or sibling skill exists." This leaves terminal skills (e.g., feedback) without a defined behavior. Either:
- Add a scenario covering the absence of a related skill, or
- Soften the requirement to "SHALL where a natural next or sibling skill exists."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md` around lines 73 - 79, The Cross-Skill Navigation requirement is unconditional while the scenario in the skill-authoring-conventions spec only applies when a natural next or sibling skill exists. Update the Requirement: Cross-Skill Navigation text and/or add a complementary scenario in the same spec so terminal skills like feedback have explicit behavior, using the existing requirement and scenario wording as the anchor. Make the policy consistent by either qualifying the requirement with “where a natural next or sibling skill exists” or adding an absence case that defines what terminal skills should do.
🧹 Nitpick comments (2)
openspec/changes/improve-skill-instructions/design.md (1)
31-39: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Add language specifier to fenced code block.
Satisfy markdownlint MD040 by tagging the structural diagram as `text` (or `markdown`). No semantic change. The suggested diff replaces the bare opening fence above the diagram's first line (Use when: one line; includes the sibling boundary) with one tagged `text`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/improve-skill-instructions/design.md` around lines 31 - 39, The fenced structural diagram in the skill instructions is missing a language tag and needs to be annotated to satisfy markdownlint MD040. Update the fenced block in the design document so the diagram is explicitly marked as text (or markdown) while keeping the content unchanged; use the existing fenced section containing the “Use when”, “Inputs”, “Steps”, and “Guardrails” headings as the target.
Source: Linters/SAST tools
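As a rough illustration of what the MD040 finding checks for, the sketch below scans Markdown text for opening fences that carry no language tag. The function name `findUntaggedFences` is hypothetical and not part of markdownlint or this PR, and the fence handling is simplified (it does not compare fence characters or lengths the way a full Markdown parser would).

```typescript
// Hypothetical helper (not markdownlint's implementation): returns the
// 1-based line numbers of opening fences that have no language tag.
function findUntaggedFences(markdown: string): number[] {
  const untagged: number[] = [];
  let insideFence = false;
  markdown.split("\n").forEach((line, index) => {
    // A fence is three or more backticks or tildes, indented at most 3 spaces.
    const match = /^\s{0,3}(`{3,}|~{3,})(.*)$/.exec(line);
    if (match === null) return;
    const info = match[2].trim();
    if (!insideFence) {
      // Opening fence: the text after the fence characters is the language tag.
      if (info === "") untagged.push(index + 1);
      insideFence = true;
    } else if (info === "") {
      // A bare fence while inside a block closes it.
      insideFence = false;
    }
  });
  return untagged;
}
```

Running it over `design.md` before and after the suggested change would be one way to confirm the diagram's fence is the only untagged one.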
openspec/changes/improve-skill-instructions/proposal.md (1)
3-3: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Repetitive use of "right" weakens prose.
Three instances in one sentence dilute impact. Vary the wording: e.g., "correct skill," "proper steps," "intended place."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/improve-skill-instructions/proposal.md` at line 3, The sentence in the OpenSpec proposal repeats “right” three times, making the prose feel repetitive and weak. Revise that sentence in the proposal text to vary the wording while preserving meaning, using distinct phrasing such as “correct skill,” “proper steps,” and “intended place” so the opening reads more cleanly.
Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md`:
- Line 25: The skill-authoring convention text currently uses a lowercase
section name that conflicts with the proposal’s title-case naming. Update the
instructions in the spec so the required sections are named consistently as
title-case labels, matching the proposal’s “Use when / Inputs / Steps / Success
/ Failure & recovery / Guardrails / Related” ordering, and keep this wording
aligned wherever the section list is referenced so generators and tests can
match it deterministically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2afb5da9-8590-4a2f-95a4-fe390e1ad158
📒 Files selected for processing (4)
- openspec/changes/improve-skill-instructions/design.md
- openspec/changes/improve-skill-instructions/proposal.md
- openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md
- openspec/changes/improve-skill-instructions/tasks.md
- `src/core/templates/workflows/*.ts`: rewrite the 11 workflow instruction strings (and feedback) to the new conventions; collapse each skill/command pair onto one instruction source.
- `src/core/templates/workflows/store-selection.ts` (and likely new sibling snippet modules): house the shared change-selection, artifact-loop, and context/rules guardrail blocks.
- `src/core/shared/skill-generation.ts` / `src/core/templates/skill-templates.ts`: adjust the assembly so skill and command derive from one source.
- `openspec/specs/skill-authoring-conventions/`: new capability spec created on archive.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Fix typo: "on archive" → "on disk" (or "in the repository").
"On archive" does not fit the context of creating a new spec directory.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openspec/changes/improve-skill-instructions/proposal.md` at line 59, The
proposal text has a wording typo in the new capability spec reference: update
the phrase in the skill-authoring-conventions entry from “on archive” to “on
disk” or “in the repository.” Locate the bullet mentioning
openspec/specs/skill-authoring-conventions and correct the description so it
matches the repository context.
## 4. Cross-skill navigation

- [ ] 4.1 Add a Related line to every skill pointing to its natural next/sibling (e.g. `propose` → `apply`, `verify` → `archive`, `new-change` → `continue`)
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Clarify terminal skills without a natural next/sibling.
"Every skill" includes terminal or isolated skills that may not have a meaningful next step. Either enumerate exceptions (e.g., feedback) or change to "every skill that has a natural workflow successor," matching the spec's conditional scenario.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openspec/changes/improve-skill-instructions/tasks.md` at line 31, The task
item overstates that every skill must have a Related line, but terminal or
isolated skills may not have a natural successor. Update the wording in the task
list entry to explicitly scope it to skills that have a natural workflow
successor, or add a short exception list for terminal cases such as feedback;
keep the change aligned with the related skill-instruction spec language.
…eservation contract
- Correct duplication/size figures to measured values (onboard 543, bulk-archive 237, verify 160, explore 278 instruction lines; skill/command overlap 89-100% for 9 of 11 pairs; propose body 87% identical to ff-change).
- Add an audit-evidence table and worked before/after examples (trigger disambiguation, explicit success, failure recovery) to design.md.
- Add a Behavior Preservation requirement and tighten the single-source and lean-body scenarios to be normalizable/testable.
- Add behavior-preservation and single-source identity validation tasks.
Strict-validated with the repo CLI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alfred-openspec
left a comment
This proposal is solid. The measured audit evidence plus behavior-preservation contract makes this safe to take forward, and the single-source plan matches the existing template drift risk.
…ribution, and AGENTS.md guidance
Broaden the proposal from instruction quality to making OpenSpec's skills first-class Agent Skills packages and getting them listable in a public directory:
- skill-authoring-conventions: add standard-conformance and a generation/CI validation gate; anchor the lean-body rule to the standard's <500-line / ~5000-token budget with references/ split (onboard is the one over-budget body).
- skill-distribution (new capability): a validated, publishable bundle and a documented listing checklist.
- docs-agent-instructions (modified): openspec/AGENTS.md advertises the skills and the deterministic CLI so non-skill-loading agents follow the same workflow.
Notes: agents.sh is a voice product, not the registry; the target is the Agent Skills standard (agentskills.io) and the skills.sh directory. Verified all 11 skill names already satisfy name==folder and the charset rules; deliberately orthogonal to add-tool-command-surface-capabilities (no layout/delivery change).
Strict-validated; 3 spec deltas.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md (1)
73-77: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Clarify or relax the "one level deep" path constraint.
Requiring reference links to be `references/` files at "a relative path one level deep" is brittle if skill folders are nested or reorganized. Either explain why the depth matters (e.g., standard-mandated layout), or rephrase to require a stable relative path from `SKILL.md` without prescribing depth.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md` around lines 73 - 77, Update the “Reference material in on-demand files” scenario in skill-authoring-conventions so the link requirement is less brittle: either justify the “one level deep” constraint or change it to require a stable relative link from SKILL.md without mandating directory depth. Keep the existing references/ guidance and adjust the scenario text so authors can place linked material using the relevant relative path while preserving the rule that the body remains readable without opening the reference file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b5555378-acd1-4754-84df-87178d86198c
📒 Files selected for processing (6)
- openspec/changes/improve-skill-instructions/design.md
- openspec/changes/improve-skill-instructions/proposal.md
- openspec/changes/improve-skill-instructions/specs/docs-agent-instructions/spec.md
- openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md
- openspec/changes/improve-skill-instructions/specs/skill-distribution/spec.md
- openspec/changes/improve-skill-instructions/tasks.md
✅ Files skipped from review due to trivial changes (4)
- openspec/changes/improve-skill-instructions/specs/skill-distribution/spec.md
- openspec/changes/improve-skill-instructions/design.md
- openspec/changes/improve-skill-instructions/tasks.md
- openspec/changes/improve-skill-instructions/proposal.md
- allowed-tools: each skill declares its toolset and emits the standard's allowed-tools frontmatter; Bash scoped to Bash(openspec:*) for CLI-only skills, unrestricted Bash only for apply-change/onboard (arbitrary commands). Declared set is a validated superset of body usage, so strict-allowlist agents never block a needed tool and ignoring agents are unaffected: pure upside.
- New requirement + scenarios in skill-authoring-conventions; design rationale for the asymmetric-risk decision; tasks; validation-gate covers tool coverage.
- Coherence pass: clarify that conformance/distribution/allowed-tools target the 11 generated SKILL.md skills; feedback is held to the authoring bar only.
3 deltas, 14 reqs / 30 scenarios, strict-valid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
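The Bash scoping rule in this commit message can be sketched as a small selection function. This is only an illustration under stated assumptions: `declaredToolsFor` and `UNRESTRICTED_BASH_SKILLS` are hypothetical names, not the PR's `skill-tools.ts` API, and how the resulting list is serialized into `allowed-tools` frontmatter is deliberately left out.

```typescript
// Skills that run arbitrary commands, as named in the commit message above.
const UNRESTRICTED_BASH_SKILLS: readonly string[] = ["apply-change", "onboard"];

// Hypothetical helper: builds the tool list a skill would declare.
function declaredToolsFor(skillName: string, otherTools: string[]): string[] {
  // CLI-only skills get Bash scoped to the openspec binary; the two
  // arbitrary-command skills get unrestricted Bash.
  const bash =
    UNRESTRICTED_BASH_SKILLS.indexOf(skillName) !== -1 ? "Bash" : "Bash(openspec:*)";
  const all = [bash].concat(otherTools);
  // Drop duplicates while keeping first-seen order.
  return all.filter((tool, index) => all.indexOf(tool) === index);
}
```

Keeping the unrestricted set as an explicit short list makes the "narrow by default" decision visible in one place, which is presumably the point of the asymmetric-risk rationale.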
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md (1)
18-18: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Use title-case for section names.
The spec still uses lowercase "use when" here. Per the design's canonical structure (line 53), section names are title-case: Use when / Inputs / Steps / Success / Failure & recovery / Guardrails / Related. Use consistent title-case section names so generators and validators can match them deterministically. This applies to line 25 as well.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md` at line 18, Update the spec text in the relevant clause and any matching references so section names use title-case consistently; specifically, replace the lowercase “use when” wording in the affected requirement with “Use when,” and align the other section heading mention near the same area to the canonical title-case names used by the design structure. Keep the wording deterministic so generators and validators can match section names like Use when, Inputs, Steps, Success, Failure & recovery, Guardrails, and Related.
♻️ Duplicate comments (1)
openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md (1)
25-25: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Use title-case for section names.
The scenario still lists 'a "use when" line' in lowercase. Align with the design's canonical structure (line 53) using title-case `Use when`, and ensure `Inputs` is also capitalized for consistency within the same list.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md` at line 25, The canonical instruction sequence in the scenario still uses lowercase section labels; update the wording in the specification so the listed sections match the design’s title-case convention. In the relevant requirement text, change the “use when” entry to “Use when” and ensure “Inputs” remains capitalized, keeping the rest of the ordered list aligned with the same title-case style.
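To make the "match them deterministically" argument concrete, here is a minimal sketch of a validator that depends on exact label casing and order. It assumes sections are marked with bold `**Label:**` markers, as in the `**Use when:**` line quoted later in this thread; the function name `checkSectionLabels` is hypothetical, and the label list is the one given in the comment above.

```typescript
// Canonical labels, in order, as listed in the review comment.
const CANONICAL_SECTIONS: readonly string[] = [
  "Use when", "Inputs", "Steps", "Success", "Failure & recovery", "Guardrails", "Related",
];

// Hypothetical validator: reports labels with non-canonical casing or order.
function checkSectionLabels(body: string): string[] {
  const problems: string[] = [];
  const lowered = CANONICAL_SECTIONS.map((s) => s.toLowerCase());
  const marker = /\*\*([^*:]+):\*\*/g;
  let lastIndex = -1;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(body)) !== null) {
    const label = match[1];
    const exact = CANONICAL_SECTIONS.indexOf(label);
    if (exact === -1) {
      // Same words, different casing: the mismatch the comment warns about.
      const folded = lowered.indexOf(label.toLowerCase());
      if (folded !== -1) {
        problems.push('"' + label + '" should be written "' + CANONICAL_SECTIONS[folded] + '"');
      }
      continue; // Bold labels outside the canonical set are ignored.
    }
    if (exact < lastIndex) problems.push('"' + label + '" is out of canonical order');
    lastIndex = Math.max(lastIndex, exact);
  }
  return problems;
}
```

An exact-match lookup like this would silently skip a lowercase "use when" section if the case-folding branch were absent, which is why the spec and design need to agree on one spelling.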
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5f8d4c0f-014a-4cd5-b90a-fb65bfacdb75
📒 Files selected for processing (4)
- openspec/changes/improve-skill-instructions/design.md
- openspec/changes/improve-skill-instructions/proposal.md
- openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md
- openspec/changes/improve-skill-instructions/tasks.md
🚧 Files skipped from review as they are similar to previous changes (2)
- openspec/changes/improve-skill-instructions/tasks.md
- openspec/changes/improve-skill-instructions/proposal.md
#### Scenario: CLI bash pre-approved and narrowly scoped
- **WHEN** a skill invokes the OpenSpec CLI through a shell tool
- **THEN** its `allowed-tools` SHALL pre-approve the OpenSpec CLI invocation scoped to that binary (for example `Bash(openspec:*)`)
- **AND** unrestricted shell access SHALL be declared only for skills that run arbitrary build or test commands (for example the implementation skill)
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Clarify which skill is "the implementation skill."
The spec uses "the implementation skill" as the example for unrestricted Bash, but the design explicitly names apply-change and onboard. Use the actual skill name (apply-change) or clarify that "implementation skill" refers to it, so the generator and validation have an unambiguous target.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@openspec/changes/improve-skill-instructions/specs/skill-authoring-conventions/spec.md`
at line 127, The unrestricted shell access rule is ambiguous because it refers
to “the implementation skill” instead of the explicitly named skill. Update the
wording in the skill-authoring conventions spec to use apply-change directly, or
clearly define that “implementation skill” means apply-change, so generators and
validators have a single unambiguous target.
…ission-AI#1289) Address issue Fission-AI#1289: docs/concepts.md's "What a Spec Is (and Is Not)" guidance (what belongs in a spec vs. what to keep out) never reaches the skills that draft specs, so agents write implementation-laden specs unless separately instructed. Add a SPEC_CONTENT_GUIDANCE shared snippet, sourced from concepts.md and embedded by the spec-authoring skills (propose, ff-change, continue-change, sync-specs), plus a new "Embedded Spec-Content Guidance" requirement in skill-authoring-conventions and a test asserting the snippet stays aligned with the docs. Proposal, design, and tasks updated to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…guidance A doc-vs-skill audit found Fission-AI#1289 (spec-content guidance stranded in the docs) is one instance of a class: rules that shape artifact quality live only in docs/ and never reach the skills that draft artifacts, so agents don't follow them unless told. Confirmed absent from the templates by grep: right-sized rigor (Lite/Full), RFC-2119 keyword meanings, scenario quality (edge cases), and delta conventions (MODIFIED shows prior value, REMOVED says why). Generalize the requirement "Embedded Spec-Content Guidance" into "Embedded Authoring Guidance" (5 scenarios) covering the whole class, add a SPEC_CONVENTIONS_GUIDANCE shared snippet alongside SPEC_CONTENT_GUIDANCE, and require AGENTS.md (docs-agent-instructions) to carry the same conventions so non-skill agents get them too. Design gains an audit table plus two deliberately out-of-scope divergences (enabler-graph vs. gate wording; update-vs-fresh heuristics, owned by add-update-workflow). Now 15 requirements / 36 scenarios; strict-valid. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Incorporate maintainer direction on the skills architecture:
- Duplication across skill files is intentional (self-contained skills → independent rewrites), so drop the single-source/DRY pillar. Item 4 becomes "self-contained skills, shared conventions by reference"; the spec requirement, design principles/decisions/alternatives, and tasks (no single-source refactor, no extracted procedure constants) follow.
- Favor design/behavior guidance over procedure-heavy "if this then that" skills. Item 2 becomes guidance-first; the canonical structure's Steps section becomes Guidance, with deep/exact procedure moved to references/.
- Deliver the Fission-AI#1289-class authoring guidance as a proposal-writing reference the artifact-drafting skills link to (item 12), not inline shared snippets. AGENTS.md carries the same reference for non-skill agents. Tests assert the reference matches concepts.md and that skills link to it.
Three architecture principles now stated up front in What Changes and design. Still 15 requirements / 36 scenarios; strict-valid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rewrite
Start implementing skill-authoring-conventions in this PR, with a measured before/after as proof it earns its place.
- Add a conformance scorer (src/core/shared/skill-conformance.ts) that scores every skill against the conventions on objective signals (trigger boundary, success criteria, failure & recovery, guardrails, related-skills, body budget, authoring-reference link) and prints a scorecard.
- Add the authoring-conventions reference (the proposal-writing reference, src/core/templates/workflows/authoring-conventions.ts): compact form of docs/concepts.md (belongs/avoid, rigor, RFC-2119 meanings, scenario quality, delta conventions). Closes the Fission-AI#1289 class.
- Emit it on disk: getSkillReferenceFiles + init/update write references/ for exactly the skills that link it (verified e2e: openspec init emits openspec-propose/references/authoring-conventions.md; new-change gets none).
- Rewrite the create-a-change family (new/propose/ff/continue) and sync-specs skills to the conventions: trigger boundaries, Use when/Inputs/Success/Failure & recovery/Related, and the reference link for the spec-authoring ones. Behavior preserved (same commands/prompts/artifacts); command templates unchanged (self-contained, independently rewritable).
- Regenerate golden template hashes; add skill-conformance test.
Measured efficacy: convention checks passing rose 33/81 -> 57/81 across the 11 skills; the five rewritten skills now score full marks (7/7 or 8/8). Full suite green except a pre-existing, environment-specific zsh-installer failure (fails identically on baseline).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Complete the implementation across all 11 skills and the cross-cutting infra.
Skills (behavior-preserving, command templates unchanged):
- Rewrote apply, archive, bulk-archive, verify (procedural), explore (stance), onboard (tutorial), and feedback to the conventions: trigger boundaries, Use when/Inputs/Success/Failure & recovery/Related. Combined with the earlier create-a-change family + sync-specs, all 11 skills now conform.
allowed-tools (item 11):
- Declared per-skill toolsets (skill-tools.ts); generateSkillContent emits allowed-tools frontmatter. Bash scoped to Bash(openspec:*) for CLI-only skills; unrestricted Bash only for apply and onboard.
Conformance gate (item 8) + distribution (item 9/skill-distribution):
- validateSkillConformance enforces frontmatter validity, name==folder, resolvable references, and declared tools as hard errors; body budget is a warning. Wired into init/update (fail rather than write a bad skill) and covered in CI. Bundle-validation test + docs/skill-distribution.md checklist.
Efficacy: convention checks 33/81 -> 80/81 (the one miss is onboard's over-budget body, a documented warning). Full suite green except the pre-existing env-specific zsh-installer failure. Regenerated all golden hashes.
BLOCKED / flagged for maintainer: docs-agent-instructions (AGENTS.md, item 10) is left unbuilt because the codebase removed openspec/AGENTS.md generation and legacy-cleanup deletes it as obsolete; re-introducing it would contradict that direction. See tasks.md §9.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
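One of the hard errors listed above, name==folder, can be sketched as follows. This is not the PR's `validateSkillConformance`; `checkSkillName` is a hypothetical name, and the charset rule used here (lowercase letters, digits, single hyphens, at most 64 characters) is an assumption about the rules the earlier commit message refers to without spelling out.

```typescript
// Assumed charset rule: lowercase alphanumeric groups joined by single hyphens.
const SKILL_NAME_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const MAX_SKILL_NAME_LENGTH = 64; // assumed limit

// Hypothetical check: returns hard errors for a skill's frontmatter name.
function checkSkillName(frontmatterName: string, folderPath: string): string[] {
  const errors: string[] = [];
  // The folder name is the last non-empty path segment.
  const segments = folderPath.split("/").filter((s) => s.length > 0);
  const folder = segments.length > 0 ? segments[segments.length - 1] : "";
  if (frontmatterName !== folder) {
    errors.push('name "' + frontmatterName + '" does not match folder "' + folder + '"');
  }
  if (frontmatterName.length > MAX_SKILL_NAME_LENGTH || !SKILL_NAME_PATTERN.test(frontmatterName)) {
    errors.push('name "' + frontmatterName + '" violates the assumed charset rule');
  }
  return errors;
}
```

Returning a list of errors rather than throwing on the first one fits the "fail rather than write a bad skill" behavior described above while still letting a caller report every problem at once.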
…er-body, drop AGENTS.md
Methodical pass to fully satisfy the spec deltas:
- Lean body: moved deep reference material out of the skill bodies into emitted references/ files: onboard artifact skeletons (references/onboarding-artifact-templates.md, body now 456 lines, under the 500 budget), sync-specs delta-format (references/delta-format.md), bulk-archive conflict examples (references/conflict-resolution.md), verify dimension detail (references/verification-dimensions.md). Generalized getSkillReferenceFiles to a REFERENCE_REGISTRY; onboard's *command* keeps the skeletons inline (self-contained, no references/ dir).
- Declared tools cover body usage (6.3): the gate now fails if a body uses an unambiguous tool token (AskUserQuestion/TodoWrite/Grep/Glob/WebFetch/WebSearch) not in the declared allowed-tools.
- Reference/docs drift (10.7): a test asserts the authoring-conventions reference and docs/concepts.md share the same anchor items.
- Dropped the docs-agent-instructions capability: OpenSpec removed AGENTS.md generation (legacy-cleanup deletes openspec/AGENTS.md as obsolete), so there is no always-on surface to target. Spec delta deleted; proposal/design/tasks updated. Always-on guidance can return in a separate change once a surface exists.
Result: conformance scorecard 33/81 -> 81/81 (all skills fully conformant); 2 spec deltas, 14 requirements / 32 scenarios, strict-valid; full suite green except the pre-existing env-specific zsh-installer failure. Golden hashes regenerated (only the changed skills + onboard command; other command templates byte-identical).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
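The "declared tools cover body usage" gate described in this commit could look roughly like the sketch below. The token list is taken from the commit message; the function name `findUndeclaredTools` and the whole-word matching strategy are assumptions, not the PR's actual implementation.

```typescript
// Tool tokens the commit message calls unambiguous.
const UNAMBIGUOUS_TOOL_TOKENS: readonly string[] = [
  "AskUserQuestion", "TodoWrite", "Grep", "Glob", "WebFetch", "WebSearch",
];

// Hypothetical gate: returns tool tokens used in the body but not declared.
function findUndeclaredTools(body: string, declared: readonly string[]): string[] {
  return UNAMBIGUOUS_TOOL_TOKENS.filter((token) => {
    // Whole-word match so "Global" does not count as a use of "Glob".
    const used = new RegExp("\\b" + token + "\\b").test(body);
    return used && declared.indexOf(token) === -1;
  });
}
```

A non-empty result would be treated as a hard error by the gate; an empty one means the declared set is a superset of what the body visibly uses.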
…f-scope
Methodical verification that the spec is fully built out:
- Added a requirement-by-requirement coverage matrix (14/14 requirements, all scenarios) mapping each to concrete code/test evidence.
- Behavior Preservation proven: diffed the executed CLI command set of all 11 skills against the pre-rewrite baseline: identical for every skill (the apparent verify deltas are prose in the new recovery section, not executed commands); user-facing prompts preserved; per-skill behavioral specs hold.
- Clarified that slash-command templates are out of the spec's scope (no scenario governs them), not an open task; the 10 unchanged command templates stay byte-identical by design.
Scorecard 81/81; 2 deltas, 14 req / 32 scenarios; strict-valid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
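The command-set comparison mentioned above can be approximated with a sketch like this one. It assumes commands appear in inline code spans that start with `openspec `; both function names are hypothetical. Note that it cannot tell an executed command from one merely mentioned in prose, which is the distinction the commit message says had to be made by hand for the verify skill.

```typescript
// Hypothetical extractor: distinct `openspec ...` code spans, sorted.
function extractCliCommands(body: string): string[] {
  const commands: string[] = [];
  const span = /`(openspec [^`]+)`/g;
  let match: RegExpExecArray | null;
  while ((match = span.exec(body)) !== null) {
    const command = match[1].trim();
    if (commands.indexOf(command) === -1) commands.push(command);
  }
  return commands.sort();
}

// Compares the command sets of a skill body before and after a rewrite.
function diffCommandSets(before: string, after: string): { added: string[]; removed: string[] } {
  const oldSet = extractCliCommands(before);
  const newSet = extractCliCommands(after);
  return {
    added: newSet.filter((c) => oldSet.indexOf(c) === -1),
    removed: oldSet.filter((c) => newSet.indexOf(c) === -1),
  };
}
```

Empty `added` and `removed` lists for every skill would correspond to the "identical for every skill" result reported here.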
alfred-openspec
left a comment
Reviewed the latest skill-authoring updates through e46c3e4, including the bda1816 spec-content guidance change. The direction looks right: artifact-drafting skills now link to a shared authoring-conventions reference instead of duplicating docs guidance, and the proposal/spec/tasks are aligned with the self-contained-skill architecture.

Verified locally: targeted authoring/conformance/parity tests pass, build passes, and openspec validate improve-skill-instructions --strict passes.
TabishB
left a comment
Ok this PR is super rough. I think we need to start over here and make more intentional, deliberate changes, a bit at a time. A lot of it just makes the skill bulkier and I can't really see it adding much value.
The main part I like is the tools declaration and auto-approving OpenSpec commands.
| /** | ||
| * Scores one skill's description + instructions against the conventions. | ||
| */ | ||
| export function scoreSkillConformance(input: ConformanceInput): ConformanceResult { |
I'm not sure I agree that these are the key ingredients of every single skill. Skills differ in nature, and different types of skills serve different purposes.
For example, some skills are implicitly triggered while others are explicitly invoked.
These checks seem focused on implicit skills (which we don't really have in OpenSpec to begin with). Even if our skills were implicit, I don't think they would all follow the same pattern.
|
|
||
| ${STORE_SELECTION_GUIDANCE} | ||
|
|
||
| **Use when:** the user wants to write code and check off a change's tasks. To confirm the work is correct without modifying tasks, use \`openspec-verify-change\`; to create missing artifacts (proposal, design, tasks) rather than implement them, use \`openspec-continue-change\`. |
As mentioned in the call yesterday, "Use when" as part of the instruction makes no sense. By the time the instruction is loaded, the agent has already chosen to invoke the skill.
In general, most of the skills at the moment are expected to be explicitly triggered rather than implicitly triggered.
not to mention this just doubles up on the description above anyways
|
|
||
| **Use when:** the user wants to write code and check off a change's tasks. To confirm the work is correct without modifying tasks, use \`openspec-verify-change\`; to create missing artifacts (proposal, design, tasks) rather than implement them, use \`openspec-continue-change\`. | ||
|
|
||
| **Inputs:** optionally a change name. If omitted, infer it from conversation context; auto-select when only one active change exists; if vague or ambiguous you MUST run \`openspec list --json\` and prompt for available changes. |
This just seems to double up on the Input section below?
|
|
||
| **Failure & recovery** | ||
| - **Ambiguous or missing change name:** run \`openspec list --json\` and prompt with the AskUserQuestion tool; never guess. | ||
| - **\`state: "blocked"\` (missing artifacts):** stop implementing and invoke \`openspec-continue-change\` to create the missing artifacts, then re-run the apply instructions. |
A user might not have this skill installed. I'm also not sure that invoking continue would be the right thing to do here.
I would expect this to be a soft warning with a prompt that asks whether to proceed.
| - **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly`, | ||
| - **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly | ||
|
|
||
| **Success:** every task in the tasks file is checked \`- [x]\`, and \`openspec instructions apply --change "<name>" --json\` reports \`state: "all_done"\` with 0 remaining tasks. |
I'm not sure this is the real success criterion. Ideally, success means the change is implemented as expected, the tasks are ticked off, the result matches the specs, etc.
|
|
||
| ${STORE_SELECTION_GUIDANCE} | ||
|
|
||
| **Use when:** the user wants to finalize a single completed change - sync its delta specs and move it to the archive. To sync main specs without archiving (keeping the change active), use \`openspec-sync-specs\`; to archive several changes in one run, use \`openspec-bulk-archive-change\`. |
Ok, sensing a theme here: I think the assumptions the agent/model has made in this PR are just not right. It has repeated the same mistake here as above. We also have no evidence that this is actually what's missing in the skills.
There's no empirical or anecdotal evidence for the need for these additional sections. It doesn't feel tied to anything in particular, and we can't really prove this makes things better or worse.
| export const SKILL_TOOLS: Record<string, string[]> = { | ||
| 'openspec-explore': [CLI, 'Read', 'Grep', 'Glob', 'Write', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-new-change': [CLI, 'Read', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-continue-change': [CLI, 'Read', 'Write', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-apply-change': [FULL_BASH, 'Read', 'Write', 'Edit', 'Grep', 'Glob', 'AskUserQuestion', 'TodoWrite', 'Skill'], | ||
| 'openspec-ff-change': [CLI, 'Read', 'Write', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-sync-specs': [CLI, 'Read', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-archive-change': [CLI, 'Read', 'AskUserQuestion', 'TodoWrite', 'Skill'], | ||
| 'openspec-bulk-archive-change': [CLI, 'Read', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-verify-change': [CLI, 'Read', 'Grep', 'Glob', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-onboard': [FULL_BASH, 'Read', 'Grep', 'Glob', 'Write', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| 'openspec-propose': [CLI, 'Read', 'Write', 'Edit', 'AskUserQuestion', 'TodoWrite'], | ||
| }; |
Are these tools coding-agent agnostic? For example, are TodoWrite and AskUserQuestion agent agnostic? I don't think so.
FULL_BASH seems tricky. How does it work with user-level safeguards?
| if (shouldGenerateSkills) { | ||
| const conformanceErrors: string[] = []; | ||
| for (const { template, dirName } of skillTemplates) { | ||
| conformanceErrors.push(...validateSkillConformance(template, dirName).errors); | ||
| } | ||
| if (conformanceErrors.length > 0) { | ||
| throw new Error(`Skill conformance check failed:\n- ${conformanceErrors.join('\n- ')}`); | ||
| } | ||
| } | ||
|
|
It makes no sense for this to be gated during init. If this check did get added, I feel it should be a linting rule that runs when people create skills.
Imagine if we shipped an update to a skill that was non-conformant. This would just cause an error for users initializing openspec in their project.
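To make the suggestion concrete, here is a minimal sketch of running the conformance check as a lint step that reports issues instead of throwing inside init. All names (`SkillTemplate`, `checkSkill`, `lintSkills`) are hypothetical, `checkSkill` is only a stand-in for the PR's validateSkillConformance, and the sample skill descriptions are made up.

```typescript
// Hypothetical sketch: illustrative names and a stand-in check, not the PR's code.
interface SkillTemplate {
  name: string;
  description: string;
  instructions: string;
}

interface LintReport {
  skill: string;
  errors: string[];
}

// Stand-in for the real conformance check; the 500-line budget is the one
// mentioned in the commit message above.
function checkSkill(template: SkillTemplate): string[] {
  const errors: string[] = [];
  if (template.description.trim().length === 0) {
    errors.push('description is empty');
  }
  if (template.instructions.split('\n').length > 500) {
    errors.push('body exceeds 500 lines');
  }
  return errors;
}

/** Collects issues for every skill without throwing, so the caller decides how to surface them. */
function lintSkills(templates: SkillTemplate[]): LintReport[] {
  return templates
    .map((template) => ({ skill: template.name, errors: checkSkill(template) }))
    .filter((report) => report.errors.length > 0);
}

const lintReports = lintSkills([
  { name: 'openspec-apply-change', description: 'Implement tasks from a change.', instructions: 'Step 1\nStep 2' },
  { name: 'openspec-explore', description: '', instructions: 'Think it through.' },
]);

// A lint script or CI job can fail on a non-zero code; user-facing init never runs this path.
const lintExitCode = lintReports.length > 0 ? 1 : 0;
for (const report of lintReports) {
  console.warn(`${report.skill}: ${report.errors.join('; ')}`);
}
```

With this split, a non-conformant skill blocks the contributor who authored it in CI rather than the user running init in their own project.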
I opened #1300 to auto-approve the openspec CLI in generated skills and I am closing this PR now.
What this does
Upgrades the 11 agent skills OpenSpec generates (the SKILL.md files that tell coding agents how to run the OpenSpec workflow) so an agent can reliably pick the right skill, know when it's done, recover when it gets stuck, and write specs that follow OpenSpec's conventions. It also adds allowed-tools frontmatter so agents stop asking permission on every openspec call, and a validation gate so a malformed skill can't ship.
Nothing an agent actually does changes: same commands, same prompts, same artifacts. Only the instructions get clearer. (Verified by diffing every skill's executed CLI commands against main: identical.)
What it fixes
Measured across all 11 generated skills today:
- … openspec CLI, so agents prompt on every call.
- … docs/, so agents drafting specs never followed it.
Before → after (if merged)
- … openspec CLI pre-approved (no permission prompts)
An objective conformance scorecard (printed on every test run) goes from 33/81 → 81/81 checks passing.
Why it's safe to merge
- … main.
- … zsh-installer failure that also fails on main (unrelated shell-completion test).
- allowed-tools is pure upside: agents that honor it stop prompting; agents that ignore it are unaffected.
Notes for review
- … AGENTS.md guidance for agents that don't load skills was dropped, because OpenSpec no longer generates that file (legacy-cleanup deletes it as obsolete), so there's nowhere to put it.
- tasks.md has the full requirement-by-requirement coverage matrix if you want the details.
🤖 Generated with Claude Code