Cutout skills from flows #90
8e340b2 to 3d60a9a
Rosetta Triage Review
Summary: This PR refactors the Rosetta instruction set by extracting inline skill logic from workflow files into dedicated
Findings:
Suggestions:
Automated triage by Rosetta agent
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The description frontmatter field was deleted. BASE had description: MUST apply when automated QA/testing task is assigned...; NEW frontmatter has only name/tags/baseSchema. docs/schemas/workflow.md defines description as a required field that states WHEN/HOW to use the workflow and is the routing trigger used to select the workflow. Reason: Without the schema-required description, workflow selection/routing can fail to match this flow when a QA task arrives, and the file violates its declared baseSchema contract. Solution: Restore a description: line in the frontmatter (the WHEN/HOW routing trigger, e.g. the original 'MUST apply when automated QA/testing task is assigned...'), per docs/schemas/workflow.md. |
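
The Reference Integrity finding above could be caught mechanically before review. The following is a minimal sketch, assuming the frontmatter has already been parsed into a plain object; `missingRequiredFields` and the list of required fields passed to it are hypothetical, and only `description` is confirmed as required by the finding.

```typescript
// Returns the names of required frontmatter fields that are absent or blank.
// The caller supplies the required list; this sketch does not read the schema itself.
function missingRequiredFields(
  frontmatter: Record<string, unknown>,
  required: string[],
): string[] {
  return required.filter((field) => {
    const value = frontmatter[field];
    // A field counts as present only if it is a non-empty string.
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example frontmatter shaped like the NEW file described in the finding
// (name/tags/baseSchema only); the values here are illustrative.
const newFrontmatter: Record<string, unknown> = {
  name: "example-flow",
  tags: ["aqa"],
  baseSchema: "docs/schemas/workflow.md",
};

const missing = missingRequiredFields(newFrontmatter, ["name", "description"]);
```

With the example input, `missing` contains only `description`, which is the field the finding asks to restore.
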
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Workflow Completeness | Problem: BASE Phase 2 had a full 'Task 2: Define Explicit Assertions' (assertion types, 'Document all assertions in test plan', a 'Defined Assertions' template block, and an 'Assertions Defined: [Count]' state line). NEW deletes all of it and delegates to USE SKILL aqa-requirements-elicitation, but that skill's body only lists unknowns and produces no assertions. The assertion-definition responsibility is dropped by both the phase and the delegated skill, not relocated. Reason: Explicit per-step assertions are consumed downstream by Phase 6 (test implementation) and Phase 8 (correction). Phase 6 even validates 'All assertions from Phase 2 implemented'. Losing them means tests are authored against an assertion contract that no step writes. Solution: Restore an assertion-definition step plus a 'Defined Assertions' block to this phase's update_test_plan/validation_checklist (mirroring BASE), OR genuinely move it by adding explicit assertion-definition steps to aqa-requirements-elicitation. Do not keep the title 'Assertion Definition' while neither artifact produces assertions. |
| 🔵 Medium | Output Contract | Problem: The phase frontmatter description still reads 'Requirements Clarification and Assertion Definition', but the assertion-definition output was removed. BASE Task 2 'Define Explicit Assertions' and the test-plan template's '### Defined Assertions' (per-step assert + verification) are gone from NEW; the NEW <update_test_plan> template captures only Questions/Responses/Edge Cases/Test Data. Reason: The phase title still advertises 'Assertion Definition', but neither this phase template nor the delegated aqa-requirements-elicitation skill produces assertions, so the stated output no longer matches what is created. Solution: Either drop 'Assertion Definition' from the description to match the slimmer scope, or note that assertion definition is delegated to aqa-requirements-elicitation so the phase's stated output and its template stay consistent. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 3 | ⬇️ Slightly worse |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 3 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The base file grounded the abstract instruction 'follow the exact existing pattern' with concrete TypeScript examples (private readonly selector style, getter/action methods, new page-object skeleton). The new file deletes all code examples; the only example left is naming (getSubmitButton() vs submitBtn()) inside <skill_precedence>, which illustrates the conflict rule but not how to add a selector or method. Reason: An agent unfamiliar with the page-object convention now has no concrete anchor in this phase file and must infer the shape, which can produce inconsistent implementations. Solution: Keep the terse body but add one small grounding example (a 3-4 line before/after of adding a selector + accessor in the project pattern), or state that the concrete pattern is owned by the aqa-selector-management skill so the engineer knows where the example lives. |
| 🔵 Medium | Output Contract | Problem: The base file defined a concrete output template for the Phase 5 test-plan section (Page Objects Modified/Created with selector names, types, purposes, and methods). The new file removes that template entirely; the only remaining output spec is the terse agents/aqa-state.md bullet list in step 5.3 (counts and paths). The engineer no longer has a deterministic shape for the per-selector implementation record. Reason: Without a defined output shape the downstream Phase 7/8 consumers and the state file get inconsistent detail, reducing traceability across the chain. Solution: Add a short output-shape block (or a one-line reference) for the Phase 5 record listing the required fields per page object (path, selectors added with type/purpose, helper methods), similar to the compact <correction_output_shapes> block used in aqa-flow-test-correction.md. |
| ⚪ Low | Epistemic Honesty | Problem: The new file adds strong failure handling for zero-document ACQUIRE (<skill_acquire_failure>) but does not ask the engineer to surface uncertainty when an existing page-object convention is ambiguous (base had this implicit via 'understand its structure and patterns' detail). Nothing tells the agent to flag low-confidence pattern matches. Reason: Selector/page-object style guesses made silently can diverge from project conventions and only surface much later at test execution. Solution: Add one line to step 5.1 or the checklist: if the existing page-object convention is unclear or conflicting, record the uncertainty in agents/aqa-state.md and proceed with the closest match rather than guessing silently. |
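
For the Example Grounding finding above, a small grounding example might look like the sketch below. It is not the project's actual pattern: `PageLike`, `LoginPage`, and the selector strings are hypothetical, and the driver surface is reduced to two methods so the snippet stays self-contained. Only the `private readonly` selector style and the `getSubmitButton()` naming come from the finding.

```typescript
// Hypothetical minimal driver surface; a real project would use its own driver type.
interface PageLike {
  locate(selector: string): string;
  click(selector: string): void;
}

class LoginPage {
  // Existing selector, in the private readonly style the base file showed.
  private readonly usernameInput = "#username";
  // Newly added selector follows the same style.
  private readonly submitButton = "[data-testid='submit']";

  constructor(private readonly page: PageLike) {}

  // Existing accessor.
  getUsernameInput(): string {
    return this.page.locate(this.usernameInput);
  }

  // New accessor, named per the convention in the finding (getSubmitButton, not submitBtn).
  getSubmitButton(): string {
    return this.page.locate(this.submitButton);
  }

  // New action method that wraps the selector so tests never reference it directly.
  clickSubmit(): void {
    this.page.click(this.submitButton);
  }
}
```

Keeping the selector private and exposing only an accessor and an action method is the shape the base examples reportedly grounded; a before/after of this size would fit the "3-4 line" suggestion if trimmed to the two added members.
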
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 3 | ⬇️ Slightly worse |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Failure Handling | Problem: Step 6.1 forbids ACQUIRing coding/testing/aqa-test-authoring directly and delegates to the handoff. Sub-step 3 handles a narrative-only (orchestration-less) handoff doc only by recording a warning and asking the user. There is no defined route to actually produce the test when the handoff doc is thin AND the user cannot resolve it. The file has no <failure_handling> block at all, unlike the sibling qa-flow-test-implementation.md. Reason: A thin or stale handoff doc combined with the strict no-direct-skill policy and no failure_handling leaves the agent with no sanctioned way to author the test, so Phase 6 can dead-end. Solution: Add a <failure_handling> block mirroring qa-flow-test-implementation.md: define behavior for zero-doc/thin handoff, lint failures, and partial returns, including an explicit user-approved fallback to the domain authoring skill so the phase has a defined way to finish. |
| 🔵 Medium | Workflow Completeness | Problem: Step 6.1 hard-forbids ACQUIRing coding, testing, aqa-test-authoring directly because 'the handoff delegates internally', but step 3 only handles the case where the handoff doc lacks orchestration sections by recording a warning and asking the user. There is no defined path for actually producing the test when the handoff doc is narrative-only and the user is unavailable, so the phase can stall with no completion route. Reason: A thin or stale handoff KB document combined with the strict no-direct-skill policy leaves the agent with no sanctioned way to author the test, breaking the chain at Phase 6. Solution: Add an explicit fallback for the narrative-only handoff case (mirroring the test-correction phase's debugging/coding fallback chain) gated behind user approval, so the phase has a defined way to finish rather than only a warning-and-wait branch. |
| 🔵 Medium | Example Grounding | Problem: All concrete TypeScript test examples from base (setup, actions, explicit assertions like expect(welcomeMessage).toContain(...), cleanup hooks, TestRail comment) were removed. The new file has no code-level anchor for what a 'good' test looks like; it relies entirely on the handoff skill the agent may not have inspected. Reason: Without a grounded example and with authoring fully delegated, an agent whose handoff skill is thin (the file explicitly anticipates narrative-only handoff docs) has no fallback pattern to produce a correct test. Solution: Either retain one minimal end-to-end test example, or add a sentence pointing the engineer to the handoff/authoring skill as the source of the concrete test pattern so the abstraction 'create automated test' is grounded somewhere reachable. |
| 🔵 Medium | Output Contract | Problem: The new file delegates all authoring to the automation-test-implementation-handoff skill and keeps only a terse state-file output (step 6.4). The base file specified the concrete test-file structure expectations (imports order, describe blocks, explicit assertions, no hardcoded waits, TestRail reference) and a Phase 6 test-plan record template. With those deleted, the deliverable test file has no shape contract in this phase file beyond 'lint-clean' and 'assertions implemented'. Reason: The validate step 6.2 checks 'assertions implemented' and 'page objects used' but the harder constraints (no hardcoded waits, assertion explicitness) present in base are gone, so a weaker test can pass this phase. Solution: Add a short list of the non-negotiable acceptance properties for the produced test file (uses page objects only, all Phase 2 assertions present, no hardcoded sleeps, project import order) as a contract the handoff output must satisfy, or explicitly state these belong to the handoff skill so the verifier knows where they live. |
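
Two of the acceptance properties named in the Output Contract finding above (no hardcoded sleeps, all Phase 2 assertions present) can be checked with simple text matching. The sketch below is one way to express that contract; `checkTestFile`, `AcceptanceResult`, and the wait patterns are assumptions for illustration, and substring matching of assertions is a deliberate simplification rather than a real parser.

```typescript
// Result of checking a generated test file against the phase contract.
interface AcceptanceResult {
  ok: boolean;
  violations: string[];
}

// Hypothetical patterns for hardcoded waits; adjust to the project's framework.
const HARDCODED_WAIT_PATTERNS: RegExp[] = [
  /\bwaitForTimeout\s*\(/,
  /\bsetTimeout\s*\(/,
  /\bsleep\s*\(/,
];

function checkTestFile(source: string, phase2Assertions: string[]): AcceptanceResult {
  const violations: string[] = [];

  // Property: no hardcoded sleeps.
  for (const pattern of HARDCODED_WAIT_PATTERNS) {
    if (pattern.test(source)) {
      violations.push(`hardcoded wait matches ${pattern.source}`);
    }
  }

  // Property: every Phase 2 assertion appears in the test source.
  for (const assertion of phase2Assertions) {
    if (!source.includes(assertion)) {
      violations.push(`missing assertion: ${assertion}`);
    }
  }

  return { ok: violations.length === 0, violations };
}
```

A check like this would let validate step 6.2 reject a test that is lint-clean but still relies on fixed waits. The remaining properties (page objects only, import order) need project-specific rules and are left out of the sketch.
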
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Output Contract | 3 | ⬇️ Slightly worse |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: Base grounded selector-error detection with a concrete list of error-message patterns ('selector did not become visible', 'NoSuchElementException', 'TimeoutException', etc.) that trigger mandatory page-source analysis. The new file keeps only 'Verify page source analyzed for selector errors' (step 7.2.3) with no example patterns, so the trigger for the page-source analysis is now vague. Reason: Without the trigger patterns an agent may skip page-source analysis on a selector failure, the exact case the base file forced, weakening root-cause accuracy. Solution: Restore a short example list of the selector/locator error signatures that must trigger page-source analysis (even 3-4 representative patterns), so the agent reliably recognizes when this mandatory step applies. |
| 🔵 Medium | Output Contract | Problem: The base file gave a concrete per-failure analysis record template (Error Type, Error Message, Stack Trace, Likely Cause, Evidence Label/Rationale, full Page Source Analysis block, Suggested Fix, Priority) plus a Phase 7 test-plan section schema. The new file removes both and only specifies the agents/aqa-state.md bullets in step 7.3. The detailed failure record that Phase 8 consumes now has no shape in this phase file. Reason: Phase 8 (test-correction) consumes 'failure analysis from Phase 7'; without a defined record shape the handoff between phases can lose the page-source and evidence detail Phase 8 needs. Solution: Add a compact failure-record shape (fields: test name, error type, evidence label + rationale, page-source finding, recommended fix, priority) so the analysis output that Phase 8 corrections rely on stays deterministic, or reference that shape as owned by the failure-analysis skill. |
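
The two findings above can be read together: a compact failure record plus a list of trigger signatures. The sketch below uses the record fields from the suggested solution and the three error signatures quoted from the base file; the type and function names, the priority scale, and the case-insensitive matching are assumptions.

```typescript
// Compact failure record using the fields the finding suggests.
interface FailureRecord {
  testName: string;
  errorType: string;
  evidenceLabel: string;
  evidenceRationale: string;
  pageSourceFinding: string | null; // null when no page-source analysis was recorded
  recommendedFix: string;
  priority: "high" | "medium" | "low"; // assumed scale
}

// Signatures quoted in the finding; the base list was longer ("etc.").
const SELECTOR_ERROR_SIGNATURES: string[] = [
  "selector did not become visible",
  "NoSuchElementException",
  "TimeoutException",
];

// True when the error message should trigger mandatory page-source analysis.
function requiresPageSourceAnalysis(errorMessage: string): boolean {
  const message = errorMessage.toLowerCase();
  return SELECTOR_ERROR_SIGNATURES.some((sig) => message.includes(sig.toLowerCase()));
}

// A record is incomplete if analysis was required but no finding was recorded.
function isRecordComplete(record: FailureRecord, errorMessage: string): boolean {
  return !requiresPageSourceAnalysis(errorMessage) || record.pageSourceFinding !== null;
}
```

Tying completeness to the trigger list means step 7.2.3 has something concrete to verify, and Phase 8 can rely on `pageSourceFinding` being present for selector failures.
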
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 3 | ⬇️ Slightly worse |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/api-test-spec-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: Prerequisites require 'API endpoint contracts available' and 'Gap analysis and user clarifications completed', but the process gives no branch for when these inputs are missing or incomplete (step 1 just reads them). Reason: The pitfalls warn against placeholder values; without an explicit missing-input branch the agent will fabricate contract details, producing wrong specs. Solution: Add a guard at step 1: if endpoint contracts or clarifications are missing, stop and report the missing input to the caller rather than inventing request/response shapes. |
| 🔵 Medium | Success Criteria | Problem: The skill lists a 6-step process and pitfalls but has no explicit done-when block. There is no testable completion condition (e.g. every test case mapped to >=1 scenario, every scenario has exact values, file mapping covers all scenarios). Reason: Without a done-when the agent cannot self-check coverage and may emit partial specs that look complete. Solution: Add a short success-criteria block stating measurable completion: each test case yields >=1 scenario, every scenario has exact request/response values and explicit assertions, all scenarios appear in the file mapping table, and shared utilities reference their scenario IDs. |
| ⚪ Low | Epistemic Honesty | Problem: The skill never tells the author to flag scenarios where the contract is ambiguous or assumed; it pushes for 'exact test values' everywhere with no way to mark a value as inferred. Reason: Forcing exact values without an assumption marker hides guesses behind confident-looking specs. Solution: Add one line: when an exact value or status code is inferred rather than sourced from the contract, mark it as ASSUMED so the caller can verify. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Output Contract | Problem: Step 7 'Update Test Plan' lists the fields to add (framework, frontend analysis, page objects, similar tests, recommended location, utilities) but gives no output format or example, unlike the sibling skills aqa-selector-management and api-test-spec-authoring which both include a concrete markdown template. Reason: A field list without a format yields inconsistent sections, so the selector and implementation phases that consume this analysis cannot reliably find page-object and utility findings. Solution: Add an output_format block showing the markdown structure of the 'code analysis section' (headings and bullet shape) so the section is written consistently and downstream phases can parse it. |
| 🔵 Medium | Self-Validation | Problem: The skill has no verification step confirming the analysis findings before handing off (e.g. that referenced page objects and utilities actually exist at the paths recorded). Reason: Search-based findings can include stale or guessed paths; without a re-check the downstream implementation phase acts on unverified references. Solution: Add a final check that each reported page object, utility, and similar-test path was actually found in the codebase, not assumed. |
| 🔵 Medium | Failure Handling | Problem: Step 1 reads 'agents/user-app/project_description.md' as a hard prerequisite but gives no branch if that file is absent or lacks framework/structure info; only step 2 (user-instructions) has a skip-if-missing clause. Reason: The pitfall 'Assuming project structure without verification' is listed but no step prevents it when the description file is missing. Solution: Add a guard at step 1: if project_description.md is missing or lacks framework/structure, derive what is possible from the codebase and report the gap, or stop and ask the caller, instead of assuming structure (which the pitfalls already warn against). |
| 🔵 Medium | Success Criteria | Problem: No done-when block. The process ends at step 7 with no testable check that the analysis is complete (framework identified, page objects classified existing/missing/to-extend, test location decided with rationale). Reason: Without completion criteria the agent may skip steps (e.g. utility search) and still consider the analysis done. Solution: Add success criteria: framework and standards captured, every relevant page object classified existing/extend/create, test location chosen with rationale, reusable utilities listed. |
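
The Self-Validation finding above asks for a final check that reported paths actually exist. A minimal sketch of that re-check follows; `AnalysisFinding` and `verifyFindings` are hypothetical names, and the existence check is injected so the snippet does not depend on a particular file-system API.

```typescript
// One path reported by the analysis; kinds mirror the finding text.
interface AnalysisFinding {
  kind: "page-object" | "utility" | "similar-test";
  path: string;
}

// Splits findings into verified and unverified using the supplied existence check.
function verifyFindings(
  findings: AnalysisFinding[],
  exists: (path: string) => boolean,
): { verified: AnalysisFinding[]; unverified: AnalysisFinding[] } {
  const verified: AnalysisFinding[] = [];
  const unverified: AnalysisFinding[] = [];
  for (const finding of findings) {
    (exists(finding.path) ? verified : unverified).push(finding);
  }
  return { verified, unverified };
}
```

In a Node environment the `exists` argument could be `fs.existsSync`. Anything left in `unverified` would be reported as a gap rather than passed to the implementation phase.
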
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Goal Specification | Problem: The frontmatter description claims three jobs (analyze gaps, define explicit measurable assertions, prepare structured questions), but the body only does gap analysis. when_to_use is a single fragment 'Define gaps in test case understanding' and the assertion-definition and question-preparation goals are absent from the process. Reason: Description-body mismatch makes the workflow load this skill expecting assertions and questions it never produces. Solution: Reconcile description and body: either add process steps for defining measurable assertions and preparing structured questions, or narrow the description to gap analysis only so the stated goal matches the actual capability. |
| 🟠 Very High | Output Contract | Problem: The skill produces no defined output. The description promises 'define explicit measurable assertions' and 'prepare structured questions for user', but the only output the process names is step 3 'Preprate list unknowns and ambiguities' with no format, schema, or example, and no assertion artifact at all. Reason: A skill consumed by an AQA workflow with no output contract cannot reliably hand structured gaps/questions to the next phase, breaking the chain. Solution: Add an output_format block defining the deliverable: a structured list of gaps/unknowns and a set of clarifying questions (and, per the description, the measurable assertions), with a short markdown template and where it is written (the test plan file). |
| 🟡 High | Success Criteria | Problem: No done-when condition. There is no testable statement of when elicitation is complete (e.g. every ambiguous step has a question, every expected result is measurable). Reason: Without completion criteria the agent stops arbitrarily, leaving gaps unaddressed. Solution: Add success criteria: every vague step has a clarifying question and every expected result is stated as a measurable assertion. |
| 🟡 High | Decision Branching | Problem: There is no branch for the outcome of the completeness analysis: what to do when no gaps are found vs many gaps, or when the test plan file is missing. Reason: Without branches the agent does not know how to terminate when the plan is already complete or the input is missing. Solution: Add explicit branches: if no gaps, record 'no clarifications needed' and proceed; if gaps exist, produce the question list; if the test plan file is absent, stop and report to the caller. |
| 🟡 High | Example Grounding | Problem: The skill gives no example of a gap, an assertion, or a question, unlike the sibling AQA skills which include concrete templates. Reason: Without examples the abstract checklist (clear steps, measurable results, edge cases) is interpreted inconsistently. Solution: Add one concrete example of a measurable assertion and one clarifying question derived from a vague test step. |
| 🟡 High | Precision & Explicitness | Problem: Step 3 contains a broken instruction 'Preprate list unknowns and ambiguities' (typo, missing word). It is the single action verb of the skill and is malformed. Reason: The skill's core action is unreadable, so the agent may mis-execute or skip the only output-producing step. Solution: Rewrite as a clear directive, e.g. 'Prepare a list of unknowns and ambiguities, one item per gap, each phrased as a specific clarifying question.' |
| 🟡 High | Workflow Completeness | Problem: The process is three terse steps and stops at 'Preprate list unknowns and ambiguities'. There is no step to turn ambiguities into questions, to define assertions, or to write results anywhere; the chain implied by the description is incomplete. Reason: An incomplete process leaves the agent guessing the missing steps, producing inconsistent elicitation output. Solution: Add ordered steps covering: derive assertions from each requirement, convert each unknown into a specific question, and persist the gaps/questions/assertions to the test plan. |
| 🔵 Medium | Failure Handling | Problem: The prerequisite 'Test plan file exists' has no handling if the file is missing or empty. Reason: Reading a missing plan file yields no gaps and a silently empty result. Solution: Add a missing-input branch: if 'agents/plans/aqa-.md' is absent or empty, stop and report to the caller rather than proceeding. |
| 🔵 Medium | Self-Validation | Problem: No verification step ensures the produced gap/question list actually covers all five completeness dimensions listed in step 2. Reason: Without a re-check the agent may answer only some dimensions and still finish. Solution: Add a re-check that each of the five analysis dimensions (steps clear, results measurable, data defined, edge cases, success criteria) was assessed and reflected in the output. |
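
The three branches proposed in the Decision Branching and Failure Handling findings above (plan missing, no gaps, gaps found) can be sketched as a single decision function. `ElicitationOutcome`, `decideOutcome`, and the question wording are hypothetical; how gaps are detected is outside this sketch.

```typescript
// Possible terminal states of the elicitation skill.
type ElicitationOutcome =
  | { status: "blocked"; reason: string }
  | { status: "no-clarifications-needed" }
  | { status: "questions"; questions: string[] };

// planText is null when the test plan file is absent; gaps come from the completeness analysis.
function decideOutcome(planText: string | null, gaps: string[]): ElicitationOutcome {
  // Missing or empty plan: stop and report to the caller.
  if (planText === null || planText.trim() === "") {
    return { status: "blocked", reason: "test plan file is missing or empty" };
  }
  // Plan is already complete: record that and proceed.
  if (gaps.length === 0) {
    return { status: "no-clarifications-needed" };
  }
  // One question per gap, phrased as a specific clarification request.
  return { status: "questions", questions: gaps.map((gap) => `Please clarify: ${gap}`) };
}
```

Checking for the missing plan first avoids the silently empty result described in the Failure Handling row, since an absent file would otherwise look like a plan with no gaps.
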
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 1 | ❌ Much worse |
| Success Criteria | 2 | ⬇️ Slightly worse |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 2 | ⬇️ Slightly worse |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 2 | ⬇️ Slightly worse |
| Precision & Explicitness | 2 | ⬇️ Slightly worse |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 2 | ⬇️ Slightly worse |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 2 | ⬇️ Slightly worse |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 2 | ⬇️ Slightly worse |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: Step 4 says to analyze page source HTML 'only when frontend code unavailable or selectors still missing', and the prerequisites note 'or will request page source', but there is no branch for when neither the frontend code nor the page source is available, leaving missing selectors unresolved. Reason: The top pitfall is 'Guessing selectors without verifying'; without a no-source branch the agent has no defined safe exit and may fabricate selectors. Solution: Add a branch: if selectors remain unidentified after both frontend search and page source are exhausted (or page source cannot be obtained), stop and report the unresolved selectors to the caller instead of guessing. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Epistemic Honesty | Problem: Nothing in the process tells the agent to flag low-confidence or assumed implementation choices. Step 5 'Follow action patterns from similar tests' and Step 6 'Use project assertion style' rely on inference, but there is no instruction to disclose when a pattern was guessed versus confirmed. Reason: Test authoring leans on inferred project conventions; undisclosed assumptions cause hard-to-trace test failures and reviewer rework. Solution: Add one line (e.g. to the validation step or output_format) requiring the agent to record any assumptions made (assumed selector, inferred pattern, unverified standard) so the reviewer can confirm them. |
| 🔵 Medium | Failure Handling | Problem: The skill assumes its prerequisites (complete test plan, updated page objects, project standards) are always present. Step 1 'Consolidate from test plan' and Step 9 validation give no instruction for what to do if the test plan is missing assertions, a required page-object method does not exist, or project coding standards are unknown. Reason: Without explicit handling for missing inputs the agent will silently invent assertions or selectors, producing a test that does not match requirements. Solution: Add a gate at the start of the process (or to the prerequisites block) instructing the agent to stop and ask the user / route back to the prior phase when a listed prerequisite is missing (e.g. missing assertion, missing page-object selector, unknown coding standard) rather than fabricating implementation. |
| ⚪ Low | Input Contract | Problem: The skill description and frontmatter do not begin with 'Rosetta' as the skill schema (docs/schemas/skill.md, 'description: ["Rosetta" + ...]') requires, and there is no <core_concepts> block carrying the schema-mandated 'All Rosetta prep steps MUST be FULLY completed, load-context skill loaded' line; a different block, which is not a schema section, is used instead. Reason: Minor schema-contract drift; does not break behavior but is inconsistent with sibling skills in the same family and the base schema. Solution: Prefix description with 'Rosetta', and either add the schema's <core_concepts> with the standard prep-steps line or confirm that the block used instead is an accepted family convention. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Epistemic Honesty | Problem: Step 5 'Identify Patterns and Root Causes' and the output_format 'Root Cause: [analysis]' do not require the agent to distinguish a verified root cause from a hypothesis. A confidently-stated but guessed root cause leads to a wrong fix in Part B. Reason: Failure triage frequently guesses; flagging confidence prevents applying fixes to misdiagnosed failures. Solution: Add a directive in Step 5 (or the output template) to mark each root cause as verified-from-evidence vs suspected, and to state what additional data would confirm a suspected cause. |
| ⚪ Low | Input Contract | Problem: Description does not begin with 'Rosetta' and there is no <core_concepts> block with the schema-required 'All Rosetta prep steps MUST be FULLY completed, load-context skill loaded' line (docs/schemas/skill.md lines 4 and 56); a different block is used instead. Reason: Schema-contract drift consistent with the sibling authoring skill; minor and non-behavioral. Solution: Prefix description with 'Rosetta' and add the standard <core_concepts> prep-steps line, or confirm that the block used instead is an accepted family convention. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The failure categories in step 7 (environment, data, product regression, test bug, flakiness, infra timeout, auth/session, selector/locator, contract mismatch, unknown) are listed abstractly with no example of how a given log line maps to a category, and step 8 'tie to evidence' gives no worked example. Reason: Categorization is the core judgment of this skill; an example reduces inconsistent or ambiguous categorization across runs. Solution: Add one short worked example mapping a sample error string (e.g. 'TimeoutException on element visibility') to its category and the evidence snippet, mirroring the concrete pattern matching done in the sibling aqa-test-debugging skill. |
| 🔵 Medium | Output Contract | Problem: Step 9 says 'Produce or update the parent workflow's analysis artifact (path and template from phase file)' but the skill defines no fallback structure or required fields for that artifact. Unlike its sibling aqa-test-debugging (which has a concrete output_format block), this skill has no schema or canonical example, so the categorized findings format is fully delegated and unverifiable from within the skill. Reason: Without any in-skill output shape, two runs can emit incompatible artifacts and the downstream correction phase cannot rely on a stable structure. Solution: Add a minimal required-field list for the analysis artifact (e.g. failure id, category, evidence reference, verified-vs-hypothesis flag, suggested owner file) as a default to use when the phase file omits a template, plus one short example row. |
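
One possible shape for the worked example and the default artifact fields suggested above, as a hedged sketch. The category names come from the skill's step 7 list and the fields from the proposed required-field list. The regexes, the `Finding` and `categorize` names, and the mapping of the sample timeout string to `selector/locator` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative pattern-to-category rules (assumed, not taken from the skill).
CATEGORY_RULES = [
    (re.compile(r"TimeoutException.*element visibility", re.I), "selector/locator"),
    (re.compile(r"\b401\b|\b403\b|session expired", re.I), "auth/session"),
    (re.compile(r"connection refused|ECONNRESET", re.I), "environment"),
]

@dataclass
class Finding:
    failure_id: str
    category: str
    evidence: str   # the log line the category is tied to
    verified: bool  # False marks a hypothesis, not a confirmed cause

def categorize(failure_id: str, log_line: str) -> Finding:
    """Map one log line to a category; fall back to 'unknown'."""
    for pattern, category in CATEGORY_RULES:
        if pattern.search(log_line):
            return Finding(failure_id, category, log_line, verified=False)
    return Finding(failure_id, "unknown", log_line, verified=False)
```

Every finding starts as unverified, so a pattern match alone is never reported as a confirmed root cause.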
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Reference Integrity | Problem: Both AQA Phase 6 (aqa-flow-test-implementation.md step 6.1) and QA Phase 5 (qa-flow-test-implementation.md step 5.1.4) hard-forbid ACQUIRing the domain test-authoring skills (aqa-test-authoring, qa-test-implementation) directly and delegate all authoring to this handoff skill. But the handoff never ACQUIREs those domain skills — its process step 4 only loads coding-agents-prompt-authoring plus 'any skill the parent names', and neither phase file names a domain skill. On the normal path the new aqa-test-authoring and qa-test-implementation skills are unreachable (orphaned). Reason: The PR added rich domain authoring skills but left no execution route to them, so the core content of the test-implementation phases is dead and authoring silently regresses to generic coding/testing. Solution: Add an explicit ACQUIRE/USE of the domain authoring skill inside the handoff keyed off a parent-supplied variable, and make each phase pass the name (AQA Phase 6 -> aqa-test-authoring; QA Phase 5 -> qa-test-implementation); OR relax the phase ban so the phase ACQUIREs the domain authoring skill directly. |
| 🟡 High | Dependency Management | Problem: core_concepts and process step 4 force-load coding-agents-prompt-authoring — a meta-skill for authoring PROMPTS (skills/agents/workflows) — inside a skill whose job is to author automated TEST CODE. It is loaded on every test-implementation run. Reason: A mis-wired heavy meta-skill on the critical authoring path inflates context cost and biases the agent toward prompt edits rather than writing tests, lowering reliability of every Phase 5/6 run. Solution: Remove the mandatory coding-agents-prompt-authoring ACQUIRE/USE from core_concepts and process step 4; replace it with the actual domain test-authoring skill (aqa-test-authoring / qa-test-implementation). If prompt-authoring is genuinely needed, state in one phrase why a test-implementation phase authors prompts. |
| 🔵 Medium | Single Responsibility | Problem: The skill mandates loading four skills in core_concepts and process (repository-implementation-standards, coding, testing, coding-agents-prompt-authoring) plus hitl. Bundling general coding, testing, repo-standards, and prompt-authoring under one handoff skill widens its responsibility beyond 'land approved tests and hand off execution'. Reason: Always loading four heavy skills inflates cost/context for every run and dilutes the skill's single boundary responsibility. Solution: Keep the handoff focused on orchestration + boundary; make coding/testing/standards conditional on the parent workflow rather than always-load, and drop coding-agents-prompt-authoring unless justified, so the skill does one job: implement-validate-handoff. |
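
The first proposed fix (the handoff acquires the domain authoring skill keyed off a parent-supplied value) could look roughly like this. The phase-to-skill pairs are taken from the finding. The key strings, `DOMAIN_AUTHORING_SKILL`, and `skills_to_acquire` are hypothetical names used only for this sketch.

```python
# Phase-to-skill pairs from the finding; key names are assumptions.
DOMAIN_AUTHORING_SKILL = {
    "aqa-phase-6": "aqa-test-authoring",
    "qa-phase-5": "qa-test-implementation",
}

def skills_to_acquire(parent_phase: str, parent_named: list[str]) -> list[str]:
    """Return the ordered skill list the handoff should load."""
    if parent_phase not in DOMAIN_AUTHORING_SKILL:
        # Fail loudly instead of silently falling back to generic authoring.
        raise ValueError(f"no domain authoring skill for {parent_phase!r}")
    ordered = [DOMAIN_AUTHORING_SKILL[parent_phase], *parent_named]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(ordered))
```

With a mapping like this the domain skills are always reachable on the normal path, and an unknown phase is an explicit error.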
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Step 2.1 cites the qa-data-collection skill "step 4 for full discovery logic" and step 2.2 hands the spec source to swagger-contracts-analysis. The phase relies on internal step numbers of a separate skill, which is sibling-internal knowledge that can drift if that skill is renumbered. Reason: Referencing another skill's internal step number couples the phase to that skill's structure and breaks reference integrity when the skill is edited. Solution: Replace the hard-coded "step 4" pointer with a behavior reference (e.g., "per the backend-source discovery logic in qa-data-collection") so the phase does not depend on another artifact's internal numbering. |
| 🔵 Medium | Reference Integrity | Problem: Step 2.1 references the backend docs path as RefSrc/{project-name}/docs/ (capitalized) in two places. The canonical Rosetta path term is lowercase refsrc/ (per pa-rosetta.md folder list). On a case-sensitive Linux target repo, the capitalized path will not resolve to the real refsrc/ directory. Reason: A wrong-case path can silently fail to find the backend architecture docs, causing the phase to skip a valid spec source and fall back to weaker code-only analysis. Solution: Change RefSrc/{project-name}/docs/ to the canonical refsrc/{project-name}/docs/ in both occurrences of step 2.1 so the path matches the predefined Rosetta target folder. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Precision & Explicitness | Problem: Step 9 reads "USE SKILL with the Resolved MCP collection skill tag" and step 7 reads "USE SKILL confluence-source-harvesting". The Resolved MCP collection skill is a runtime-resolved variable, but step 9 uses the bare USE SKILL alias without making clear it must substitute the resolved tag, unlike the literal skill name in step 7. An agent could misread step 9 as a missing skill name. Reason: An ambiguous USE SKILL with no resolvable name can cause the agent to stall or pick the wrong skill at the actual collection step, which is the core action of the subflow. Solution: Reword step 9 to make the substitution explicit, e.g., "USE SKILL <Resolved MCP collection skill from step 1>", matching the explicit-variable convention already used in the output-contract COMPLETED row. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Decision Branching | Problem: The conditional HITL hinges on "only if config does not already exist" (step 0.1 item 4, header type=HITL-CONDITIONAL), but the file never defines the if/then/else for the alternate branch: what happens when config DOES exist (load it, validate freshness, ask if stale?) versus when it does not (collect from user). The existence check that drives the branch is also not specified (which path/file is probed). Reason: An unspecified branch lets the agent ask the user redundantly when config exists, or skip collection when it is missing, breaking the conditional-HITL contract. Solution: Make the branch explicit: if qa-project-config.md exists and is non-empty THEN load and proceed without asking; ELSE ask user for project info and create it. State where existence is checked (e.g., agents/qa/{IDENTIFIER}/qa-project-config.md). |
| 🟡 High | Failure Handling | Problem: Unlike its sibling phase files (Phase 3, 4, 5, 7) which all have a dedicated <failure_handling> block, this Phase 0 file has none. It does not say what to do when the qa-project-config skill ACQUIRE returns zero documents, when the session directory cannot be created, or when the user refuses to supply required project info. The parent qa-flow.md <failure_handling> covers zero-doc ACQUIRE generically, but the phase-local edge cases (directory creation failure, user declines mandatory config) are unhandled. Reason: Phase 0 is the foundation for all later phases; an unhandled config-collection failure can produce an empty or fabricated config that silently corrupts every downstream phase. Solution: Add a <failure_handling> block covering: (a) qa-project-config zero-doc ACQUIRE (defer to parent zero-doc rule), (b) directory creation failure under agents/qa/{IDENTIFIER}/, and (c) user refuses or cannot provide required project info when config is absent (stop, record blocked state, do not fabricate config). |
| 🔵 Medium | Epistemic Honesty | Problem: Step 0.1 loads or collects project config (Swagger availability, base URLs, auth scheme, spec locations) but never requires the agent to flag values that were assumed, inferred, or supplied with low confidence. update_state step 0.2 records a coarse 'Config Source: [Existing / User provided / Discovered]' label but there is no instruction to mark individual fields ASSUMED/UNVERIFIED when inferred rather than confirmed. Reason: Phase 0 config feeds every downstream phase; silently recording inferred endpoints/auth as confirmed hides guesses behind a confident-looking config and corrupts data collection and spec analysis with no audit trail. Solution: Add one line to step 0.1 or 0.2: when any required config field is inferred or uncertain (not confirmed by the user or read from a spec), mark it ASSUMED in qa-project-config.md and surface it to the user before Phase 1 rather than recording it as confirmed. |
| 🔵 Medium | Safety Boundaries | Problem: Step 0.1 item 4 "ASK USER for project info only if config does not already exist" has no guard against fabricating config when the user provides incomplete or no answers. Compared with sibling phases that explicitly forbid fabrication and silent-bypass, Phase 0 has no such boundary. Reason: Fabricated project config at Phase 0 propagates wrong assumptions into data collection, spec analysis, and implementation, causing systemic downstream failure. Solution: Add a boundary: do not invent endpoints, base URLs, auth schemes, or spec locations; if the user omits required fields, record the gap and stop rather than guessing. |
| 🔵 Medium | Output Contract | Problem: <workflow_context> and <validation_checklist> say the outputs are initial-data.md AND qa-project-config.md, but update_state step 0.2 "Files Created" lists only initial-data.md, qa-state.md; it omits qa-project-config.md and adds qa-state.md (which is the global state file, not a per-session output). The recorded file list contradicts the declared outputs. Reason: An inconsistent file ledger makes the skip-gate verification in the parent flow (which checks for expected artifacts) unreliable and can wrongly pass or fail resumption checks. Solution: Align step 0.2 "Files Created" with the declared outputs: list initial-data.md and qa-project-config.md; track qa-state.md separately as the state file, not a session output artifact. |
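
The explicit Phase 0 branch and the no-fabrication boundary proposed in the findings above can be sketched together. This is a minimal sketch under stated assumptions: `load_or_collect_config`, `ConfigBlocked`, the `ask_user` callback, and the field names in `REQUIRED_FIELDS` are hypothetical. Only the file name qa-project-config.md and the exists-and-non-empty rule come from the report.

```python
from pathlib import Path

class ConfigBlocked(Exception):
    """Required project info is missing and must not be invented."""

REQUIRED_FIELDS = ("base_url", "auth_scheme")  # hypothetical field names

def load_or_collect_config(config_path: Path, ask_user) -> str:
    # Branch 1: config exists and is non-empty -> load it, do not ask.
    if config_path.is_file():
        existing = config_path.read_text(encoding="utf-8")
        if existing.strip():
            return existing
    # Branch 2: config absent or empty -> ask the user for the fields.
    answers = ask_user(REQUIRED_FIELDS)
    missing = [f for f in REQUIRED_FIELDS if not answers.get(f)]
    if missing:
        # Boundary: record the gap and stop rather than guessing values.
        raise ConfigBlocked("missing required fields: " + ", ".join(missing))
    content = "\n".join(f"{k}: {answers[k]}" for k in REQUIRED_FIELDS) + "\n"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(content, encoding="utf-8")
    return content
```

Raising on missing fields keeps an incomplete answer from turning into a confident-looking config file.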
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The <common_patterns> section (Typical Contradictions / Typical Gaps / Typical Ambiguities) largely restates examples already given inside <identify_contradictions>, <identify_gaps>, and <identify_ambiguities> (e.g. 'Fast response (how fast?)' duplicates the 'fast' example in identify_ambiguities; 'Owner/assignee conflicts' duplicates the Owner value-mismatch example). Reason: Duplicated example lists add tokens to every load without adding decision value and dilute the single canonical place an agent looks for each category. Solution: Remove <common_patterns> or fold any non-duplicated entries into the relevant identify_* section so each example appears once. |
| 🔵 Medium | Self-Validation | Problem: The <process> ends at step 6 'Assess risk and produce findings' and <output_format> produces the document, but there is no step telling the agent to re-check its own output before finishing (e.g. confirm every finding has an exact source quote, confirm all four sections are populated, confirm IDs C/G/A are sequential and referenced in the Risk Assessment). <analysis_guidelines> and <pitfalls> state the rules but no explicit self-verification pass is required. Reason: Without an explicit re-check step the agent may emit findings lacking quotes or unreferenced IDs, which the downstream requirements/test phases cannot act on. Solution: Add a final process step or a <validation_checklist> requiring the agent to verify before output: each contradiction/gap/ambiguity carries an exact quote and source, every Risk Assessment entry references an existing finding ID, and all six output sections exist even when empty. |
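
The self-verification pass requested above can be expressed as a small check. This sketch covers only the quote/source and ID-reference rules. The dictionary keys (`id`, `quote`, `source`) and the `validate_findings` name are assumptions, since the skill's real output is a markdown document, not structured data.

```python
import re

def validate_findings(findings: list[dict], risk_refs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    ids = {f["id"] for f in findings}
    for f in findings:
        # IDs use the C/G/A prefixes named in the finding above.
        if not re.fullmatch(r"[CGA]\d+", f["id"]):
            problems.append(f"{f['id']}: ID must look like C1, G2 or A3")
        if not f.get("quote") or not f.get("source"):
            problems.append(f"{f['id']}: missing exact quote or source")
    for ref in risk_refs:
        # Every Risk Assessment entry must point at an existing finding.
        if ref not in ids:
            problems.append(f"risk assessment references unknown finding {ref}")
    return problems
```

Returning the full problem list lets the agent fix everything in one pass before emitting the document.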
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Self-Validation | Problem: The skill has no step asking the agent to verify its output before finishing (confirm child pages checked per <pitfalls>, confirm truncation noted, confirm space/URL/labels populated). The pitfalls list states 'always check with get_page_children' but nothing in <process> requires confirming it happened. Reason: Child-page detail often holds acceptance criteria; without a verification step the agent can silently omit it and the downstream gap-analysis sees a false-complete dataset. Solution: Add a closing verification step in <process> or a checklist that re-checks: child pages fetched for each parent, truncation annotated, and the output template fields filled. |
| 🔵 Medium | Success Criteria | Problem: There is no explicit 'done when' statement. The skill lists a <process> and an <output_format> but never states the completion condition (e.g. every retrieved parent had its children checked, truncation flagged where applied, fallback recorded when zero results). The companion skill confluence-source-harvesting has a <validation_checklist>; this MCP skill has none. Reason: Without testable completion criteria the agent may stop after fetching one page and skip child-page retrieval or truncation flagging, producing incomplete raw data for downstream phases. Solution: Add a short success-criteria or validation-checklist block: e.g. 'Done when each parent page has children checked, all pages over the word budget are flagged truncated, and a zero-result run ends with a recorded user decision or noted gap.' |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Self-Validation | Problem: No output-verification step. <pitfalls> notes some fields may be permission-restricted and that rendered HTML may need markdown conversion, but <process> never requires the agent to confirm description was converted or that restricted fields were flagged rather than dropped. Reason: Silently dropped or unconverted fields produce a normalized artifact that misleads downstream gap-analysis and test generation. Solution: Add a closing verification step: re-check that the description is in markdown, restricted/empty fields are explicitly marked, and custom fields discovered via jira_search_fields are reflected in the output. |
| 🔵 Medium | Success Criteria | Problem: No explicit completion condition. The <process> extracts fields and a <fallback> handles a missing ticket, but nothing states when extraction is considered done (e.g. all template fields populated or marked N/A, restricted fields noted rather than left blank). The output template lists 'Acceptance criteria' as a concept in the sibling qa-data-collection skill but this skill's template has no done-when. Reason: Without testable done-criteria the agent may stop after fetching summary/description and omit comments or custom fields that downstream phases need. Solution: Add a brief success-criteria/checklist: 'Done when every output-template field is populated or explicitly marked N/A, permission-restricted fields are labeled, and the ticket key + URL are present.' |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Failure Handling | Problem: Unlike the Jira skill (which has a <fallback> for 'ticket not found') and the Confluence skill (which has a 'no results' fallback), this skill's <process> has no failure branch. Step 2 'Call TestRail MCP (get_case with case_id)' has no handling for case-not-found, MCP unreachable, or access denied. The only related guidance is the pitfall 'Some fields may be empty — document gaps', which covers empty fields but not a failed fetch. Reason: With no failed-fetch branch the agent has no defined behavior on MCP error and may hallucinate test-case content or silently produce an empty artifact. Solution: Add a fallback step mirroring the Jira skill: if get_case returns not-found or an error, verify the case ID/URL with the user and stop with a recorded gap rather than fabricating a case. |
| 🔵 Medium | Self-Validation | Problem: No output-verification step. <pitfalls> says 'Some fields may be empty — document gaps, never assume content' but <process> step 4 just 'Output structured test case artifact' with no re-check that each step has an expected result and that empty fields were marked rather than invented. Reason: Test cases without per-step expected results or with invented content are unusable and risky for downstream test design. Solution: Add a closing verification step requiring the agent to confirm each test step has a paired expected result and that all empty fields are explicitly marked as gaps before emitting the artifact. |
| 🔵 Medium | Success Criteria | Problem: No explicit completion condition. The skill does not state when extraction is done (e.g. all template fields populated or marked as gap, steps and expected results captured per step). Reason: Without testable criteria the agent may stop after capturing the title and skip per-step expected results that downstream test generation depends on. Solution: Add a short done-when/checklist: 'Done when case ID/title/section/steps/expected results are captured, every empty field is marked as a gap, and custom fields are included or noted unavailable.' |
| 🔵 Medium | Decision Branching | Problem: The <process> is a flat 1-4 sequence with no conditional handling: no if/then for missing ID (only <prerequisites> says 'ask if missing'), no branch for fetch failure, and no branch for empty/partial case data. The sibling Confluence and Jira skills both carry explicit conditional branches inside <process>. Reason: A flat sequence with no branches gives the agent no instruction for the common error and empty-data paths, lowering reliability versus the companion MCP skills. Solution: Add explicit conditionals to <process>: if no ID/URL then ask; if get_case fails then verify and stop with gap; if fields empty then mark gap and continue. |
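
The three conditionals proposed in the Decision Branching finding (no ID, failed fetch, empty fields) can be illustrated with a short sketch. It makes no assumption about the real TestRail MCP interface: `fetch_case` is a hypothetical injected callable standing in for the `get_case` call, and `REQUIRED_FIELDS` and the returned status strings are illustrative names, not part of the skill.

```python
from typing import Callable, Optional

# Hypothetical field set; the real template fields live in the skill file.
REQUIRED_FIELDS = ("title", "section", "steps", "expected_results")


def collect_test_case(case_id: Optional[str],
                      fetch_case: Callable[[str], dict]) -> dict:
    """Walk the three branches: missing ID, failed fetch, partial data."""
    # Branch 1: no ID or URL, so ask the user instead of guessing.
    if not case_id:
        return {"status": "ask_user", "reason": "no case ID or URL provided"}
    # Branch 2: the fetch fails or returns nothing, so stop with a recorded gap.
    try:
        case = fetch_case(case_id)
    except Exception as exc:  # e.g. unreachable server or access denied
        return {"status": "stopped_with_gap", "reason": f"fetch failed: {exc}"}
    if not case:
        return {"status": "stopped_with_gap", "reason": "case not found"}
    # Branch 3: empty fields are marked as gaps, never filled in.
    gaps = [name for name in REQUIRED_FIELDS if not case.get(name)]
    return {"status": "ok", "case": case, "gaps": gaps}
```

The point of the sketch is only that each path ends in a defined outcome, so the agent never has to invent test-case content.
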
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Single Responsibility | Problem: This skill bundles five distinct jobs in one <process>: retrieve test cases (sec 2), search documentation (sec 3), analyze backend source code with framework detection across Spring/Express/FastAPI/.NET (sec 4), discover existing test patterns (sec 5), and produce the raw-data document (sec 6). Section 4 alone carries deep framework-marker and route-decorator knowledge (@GetMapping, router.get(), @app.get(), [HttpGet]) for four stacks. Reason: Five responsibilities in one skill enlarge the cognitive search space and make the file harder to maintain and reuse; the framework knowledge in sec 4 also belongs behind a discovery step rather than baked into the collector. Solution: Keep this skill as the orchestrator/aggregator that delegates via the existing USE SKILL calls, and extract the backend-source-analysis and existing-test-pattern-discovery detail (secs 4-5) into a dedicated skill it references, leaving this file to sequence and assemble the raw-data artifact. |
| 🔵 Medium | Self-Validation | Problem: The skill produces a rich raw-data template (Data Collection Summary with counts of test cases, docs, endpoints, test files) but <process> never asks the agent to verify the summary counts match what was actually collected, or that endpoints in the table trace to a source (TestCase/Docs/Code). No verification step closes the process. Reason: Unverified summary counts and untraced endpoints give downstream gap-analysis and test generation a misleadingly complete picture. Solution: Add a final verification step before emitting raw-data.md: confirm each summary count equals the items collected, each API-endpoint row cites its source, and skipped sections (e.g. backend analysis) are marked N/A rather than omitted. |
| 🔵 Medium | Cognitive Budget | Problem: The skill is the largest of the six (process spans secs 1-6 with multi-level numbered subtrees plus a full output template ~100 lines). Section 4's backend-source priority logic (3 path-resolution sources, RefSrc docs reading, framework markers for 4 stacks, Repomix XML vs source-dir branching) is a single step block an agent must hold while also doing secs 2,3,5,6. Reason: A long multi-branch process loaded at once raises the risk the agent skips sub-steps (e.g. reading RefSrc docs before grepping source), which the Rosetta reliability goal warns against. Solution: Decompose by moving secs 4-5 behind a referenced skill (as above) or split the process into two loadable parts (collection vs codebase analysis) so each delivered chunk is closer to the ~5-step reliable window. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: The prerequisites require raw-data.md and api-analysis.md to exist, but the process gives no handling if these files are missing, empty, or unreadable. The skill assumes they are present. Reason: If api-analysis.md is missing, the cross-reference table in step 1 produces meaningless results, silently degrading the whole chain. Solution: Add a step-0 check: if a prerequisite artifact is missing or empty, stop and ask the user or escalate to the orchestrator rather than cross-referencing against absent data. |
| 🔵 Medium | Success Criteria | Problem: The skill has no explicit testable done-condition. The process ends at step 5 (prepare questions) and <output_format> defines the document, but nothing states when gap analysis is considered complete (e.g. every test step cross-referenced, every gap categorized, all critical questions resolved or recorded as assumptions). Reason: Without a done-condition the agent may stop early after partial cross-referencing and proceed to test specification with unresolved gaps. Solution: Add a short success-criteria block stating the skill is done when every test step has a cross-reference entry, all gaps/contradictions/ambiguities are documented with IDs, and all Critical questions are either answered or recorded as assumptions in analysis.md. |
| ⚪ Low | Self-Validation | Problem: There is no output verification step. The agent is not told to re-check that every test step produced a cross-reference row, or that question counts in the Executive Summary match the questions actually listed. Reason: Self-check reduces the risk of the documented summary counts diverging from the actual content. Solution: Add a brief validation checklist: counts in Executive Summary match the documented gaps/contradictions/ambiguities/questions, and no test step is left without a cross-reference entry. |
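
The step-0 check recommended for this skill amounts to confirming that each prerequisite artifact exists and is non-empty before any cross-referencing starts. A minimal sketch, assuming the artifacts sit together in one folder; the function name `missing_prerequisites` and the folder layout are assumptions, while the two file names come from the finding above.

```python
from pathlib import Path


def missing_prerequisites(folder, names=("raw-data.md", "api-analysis.md")):
    """Return the prerequisite files that are absent, unreadable, or blank."""
    missing = []
    for name in names:
        path = Path(folder) / name
        try:
            # A whitespace-only file is treated the same as a missing one.
            usable = path.is_file() and bool(path.read_text(encoding="utf-8").strip())
        except (OSError, UnicodeDecodeError):
            usable = False  # unreadable counts as missing
        if not usable:
            missing.append(name)
    return missing
```

A non-empty result is the signal to stop and ask the user or escalate, rather than cross-referencing against absent data.
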
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 3 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 3 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 3 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The config location term is inconsistent and undefined. Step 3 says find 'qa-project-config.md in the repo's agent-specific directory', step 5 says save to '<agent_folder>/qa-project-config.md', and step 6 writes to 'agents/qa/{IDENTIFIER}/'. The terms 'agent-specific directory' and '<agent_folder>' are never resolved to a concrete path, so read (step 3) and write (step 5) may target different locations. Reason: If step 3 looks in one place and step 5 writes to another, the 'config not found' branch fires on every run, re-asking the user even when a valid config exists. Solution: Define one operational term for the config directory once (e.g. resolve to the same concrete path used for state, such as agents/), and use that identical term in both the load step (3) and the save step (5) so the file is read from and written to the same place. |
| 🔵 Medium | Precision & Explicitness | Problem: Step 3's branch keys off 'found and non-empty' vs 'not found' but does not handle a found-but-incomplete config (e.g. an existing file missing the minimum fields validated in step 4). The vague directory term ('agent-specific directory') also leaves the search scope ambiguous. Reason: An existing partial config would currently be accepted as valid, leaving later phases without required information such as Swagger availability. Solution: Make the directory path concrete and add an explicit branch for an existing-but-incomplete config: if found but missing minimum required fields, ask only for the missing fields rather than skipping to step 5. |
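
Both findings above reduce to two rules: resolve the config location through one shared definition, and treat a found-but-incomplete config as its own branch. The sketch below illustrates this under stated assumptions: `REQUIRED_FIELDS` is hypothetical (only Swagger availability is named in the report), and `config_path` and `next_action` are illustrative helpers, not part of the skill.

```python
from pathlib import Path

CONFIG_NAME = "qa-project-config.md"
# Hypothetical minimum fields; the real list is whatever step 4 validates.
REQUIRED_FIELDS = ("test_framework", "swagger_available")


def config_path(config_dir):
    """Single resolver used by both the load step and the save step."""
    return Path(config_dir) / CONFIG_NAME


def next_action(fields):
    """Decide what to do with a loaded config (None or {} means not found)."""
    if not fields:
        return "ask_all", list(REQUIRED_FIELDS)
    # A value of False is a valid answer; only None or "" count as missing.
    missing = [name for name in REQUIRED_FIELDS
               if fields.get(name) in (None, "")]
    if missing:
        return "ask_missing", missing
    return "use_existing", []
```

Because load and save both go through `config_path`, they cannot drift to different locations, and a partial config only triggers questions about the fields it lacks.
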
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 3 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The skill repeatedly couples itself to external workflow phase numbers it does not own: prerequisites cite 'User approval from Phase 4 received', step 2 says utilities 'identified in Phase 4', and step 6 says 'All assertions from Phase 4 included'. A skill must not depend on a sibling workflow's phase numbering, and if the workflow is renumbered these references break silently. Reason: Phase-number references make the skill fragile to workflow changes and violate skill/workflow isolation; tying to the named artifact keeps the dependency stable. Solution: Replace 'Phase 4' references with the artifact or input they actually depend on (e.g. 'the approved test-specs.md' / 'the shared-utilities plan in test-specs.md'), since the skill already names test-specs.md as the source of the file mapping and assertions. |
| 🔵 Medium | Output Contract | Problem: Unlike the sibling QA skills, this skill has no <output_format> section. It describes code to write and a validation checklist, but does not state the deliverable artifact or where implementation results/summary are recorded for the next phase (debugging). Reason: A defined handoff artifact lets the downstream debugging skill reliably locate what was implemented; its absence forces re-discovery. Solution: Add a short output contract naming the produced test files/utilities and any summary artifact (e.g. list of created/modified files recorded in the phase artifact) that the test-debugging phase will consume. |
| 🔵 Medium | Failure Handling | Problem: The prerequisites require an approved test-specs.md and identified existing patterns, but the process has no handling if test-specs.md is missing/unapproved or if no existing test framework/patterns can be found. Step 1 'consolidate from previous phases' assumes all inputs exist. Reason: Without these guards the agent may scaffold tests against an unconfirmed framework or incomplete specs, producing throwaway code. Solution: Add a guard: if test-specs.md is missing or not approved, stop and request it; if no existing test framework/pattern is discoverable, ask the user which framework to use instead of guessing. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 3 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: The skill assumes multi-source data exists. The prerequisites say 'Collected raw data from at least one source', but step 1 'Load all source data' has no handling for the case where a source file is missing, empty, or only a single thin source is available (which would make traceability and conflict resolution near-empty). Reason: Synthesizing from missing or empty sources would silently produce a hollow requirements document that looks complete but is not grounded in any source. Solution: Add a guard at step 1: if no source data is loadable, stop and request inputs; if only partial sources exist, proceed but record the missing sources as risks/assumptions so the gaps are visible in the output document. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 3 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The process list has two steps labeled 9 (step 9 "If the user requests skipping a phase..." followed immediately by step 9a), then jumps to step 10. The 9 / 9a numbering is inconsistent with the otherwise sequential integer numbering and makes ordering ambiguous. Reason: Ambiguous step numbering in a process that the skill itself enforces as strictly ordered can cause an agent to mis-sequence or skip the sub-step. Solution: Renumber so each process item has a unique sequential identifier (e.g., make 9a into step 10 and shift the subagent-dispatch step to 11), or clearly mark 9a as a labeled sub-step of 9 rather than a sibling integer. |
| 🔵 Medium | Cognitive Budget | Problem: Step 9a ("Verification-failure unilateral start") is a single ~200-word paragraph packing the trigger condition, a one-line announcement format, an embedded non-exhaustive MUST NOT list (AskUserQuestion, menus, confirmation phrasings), the "same turn" requirement, and an exception clause. This is far denser than the other one-line steps and risks the agent dropping a sub-directive when scanning. Reason: Briefing notes agents reliably handle a bounded number of directives at once; a wall-of-text gate hides sub-rules and reduces reliable compliance. Solution: Decompose step 9a into a labeled sub-block with discrete bullets: (a) trigger, (b) required one-line announcement format, (c) the MUST NOT list as its own bulleted set, (d) the only acceptable user input. Keep wording; only restructure into atomic items. |
| ⚪ Low | Bloat Control | Problem: Step 9a repeats the same idea (no AskUserQuestion / no menu / no confirmation request / no pause) several times with near-synonymous phrasings within one paragraph. Reason: Repetition adds tokens to a permanently-loaded skill without adding new behavioral information. Solution: State the prohibition once as a single bulleted list of forbidden actions; drop the repeated restatements. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 3 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 3 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Self-Validation | Problem: Unlike the other QA skills in this set (testrail-test-case-export, sequential-workflow-execution), this skill has no validation_checklist and no output-verification step. There is no instruction to confirm all target endpoints were covered or that extracted contracts are internally consistent. Reason: An extraction with silent gaps propagates 401/403/404 failures into downstream test design, which the pitfalls themselves warn about but provide no verification gate to catch. Solution: Add a validation_checklist (e.g., every target endpoint has a contract, auth determined per endpoint, data dependencies and creation order captured, code cross-checked against spec where both exist). |
| 🟡 High | Output Contract | Problem: The skill extracts endpoint contracts, auth requirements, and data dependencies but never specifies the shape of its output. when_to_use_skill says "the calling workflow determines ... where to write outputs," yet there is no schema, structure, or canonical example of what an extracted endpoint contract looks like (field set, format, grouping). The detailed bullet lists in step 2 describe what to look for, not the format to emit. Reason: Without a defined output shape, two runs can produce divergent structures, breaking workflows that consume the extracted contract. Solution: Add an output_contract / output_format section giving one canonical example of an extracted endpoint contract (e.g., a markdown or structured block with path, method, params, request/response schema, security, data dependencies) so downstream phases get a deterministic artifact. |
| 🔵 Medium | Example Grounding | Problem: The skill gives concrete framework pattern hints (e.g., router.get(), @GetMapping, [HttpGet]) but provides no example of a completed extraction for any single endpoint, so the abstract instruction set is not grounded in a worked output. Reason: A worked example anchors the expected granularity and format, reducing variance across runs. Solution: Include one short worked example showing an input endpoint definition and the resulting extracted contract. |
| 🔵 Medium | Failure Handling | Problem: Failure handling is limited to step 1, sub-step 4 ("If none found: report back ... request user input"). There is no guidance for partial discovery (spec found but a target endpoint missing from it), spec-vs-code conflicts (only listed as a pitfall), or malformed/unreachable spec. Reason: These are common real-world conditions; without if/then handling the agent may silently emit incomplete or contradictory contracts. Solution: Add explicit branches: endpoint present in spec but absent in code (and vice versa) -> flag discrepancy; spec unreachable/malformed -> fall back to code analysis and note degraded confidence; target endpoint not found anywhere -> report which endpoints are unresolved. |
| 🔵 Medium | Success Criteria | Problem: There is no explicit "done when X, Y, Z" for the analysis. The process ends after step 4 (data dependencies) with no statement of what constitutes a complete, accepted analysis. Reason: Without testable completion criteria the agent cannot reliably decide when to stop or hand back. Solution: Add explicit success criteria, e.g., done when all target endpoints have contract + auth + data-dependency entries and any unresolved gaps are reported to the calling workflow. |
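One possible shape for the missing output contract and success criteria is sketched below. This is an illustration only: the `EndpointContract` class, its field names, and the `is_complete` rule are assumptions made for this example, not something the skill currently defines.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class EndpointContract:
    """Hypothetical shape for one extracted endpoint; field names are illustrative."""
    path: str
    method: str
    params: dict = field(default_factory=dict)
    request_schema: dict = field(default_factory=dict)
    response_schema: dict = field(default_factory=dict)
    security: list = field(default_factory=list)
    data_dependencies: list = field(default_factory=list)
    # Names of sections that could not be resolved and must be reported upstream.
    unresolved: list = field(default_factory=list)


def is_complete(contract: EndpointContract) -> bool:
    """Assumed rule: contract, auth, and data dependencies are present or explicitly reported as gaps."""
    has_contract = bool(contract.path and contract.method and contract.response_schema)
    has_auth = bool(contract.security) or "security" in contract.unresolved
    has_deps = bool(contract.data_dependencies) or "data_dependencies" in contract.unresolved
    return has_contract and has_auth and has_deps


example = EndpointContract(
    path="/users/{id}",
    method="GET",
    params={"id": "integer, path, required"},
    response_schema={"200": {"id": "integer", "email": "string"}},
    security=["bearerAuth"],
    unresolved=["data_dependencies"],  # reported gap rather than a silent omission
)
print(json.dumps(asdict(example), indent=2))
```

Serializing one fixed structure per endpoint is what makes two runs comparable; the exact fields would be decided by the skill author.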
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 2 | ⬇️ Slightly worse |
| Success Criteria | 2 | ⬇️ Slightly worse |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 2 | ⬇️ Slightly worse |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: There is no guidance for inputs that don't fit the template: e.g., a requirement with no clear acceptance criterion for Traceability, more than 5 parameter combinations (pitfalls say split but the process doesn't state when/how to decide), or a test that is hard to express as single-action steps. Reason: Without explicit handling the author may silently drop required fields or overload a single case, degrading downstream export. Solution: Add brief failure/edge handling: when >5 parameter sets are needed, state the split rule in the process not just pitfalls; when traceability fields are unknown, mark them explicitly rather than omitting. |
| 🔵 Medium | Self-Validation | Problem: The skill defines strict format_rules (MUST use Steps + Expected Results, MUST NOT use BDD, MUST NOT include Post-conditions/Automation, steps numbered, expected results reference their step) but provides no validation_checklist to self-verify an authored test case against those rules before completion. Reason: A self-check gate catches format violations the author skill itself declares mandatory, before they reach export. Solution: Add a short validation_checklist mirroring the format_rules (e.g., no Given-When-Then; each step is a single action; every expected result names the step it follows; no Post-conditions/Automation fields; <=5 parameter sets). |
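The suggested validation_checklist could be mirrored by a mechanical check along these lines. It is a rough sketch: the dictionary layout of a test case (`steps`, `expected_results` keyed by step number, `parameter_sets`) is assumed for illustration, and the Given/When/Then detection is a simple heuristic.

```python
import re

# Heuristic: a step that starts with a BDD keyword is treated as Given-When-Then phrasing.
BDD_KEYWORDS = re.compile(r"^\s*(given|when|then)\b", re.IGNORECASE)
FORBIDDEN_FIELDS = {"Post-conditions", "Automation"}
MAX_PARAMETER_SETS = 5


def validate_test_case(case: dict) -> list:
    """Return a list of format problems; an empty list means the case passes."""
    problems = []
    steps = case.get("steps", [])
    expected = case.get("expected_results", {})  # assumed: {step_number: text}
    if any(BDD_KEYWORDS.match(step) for step in steps):
        problems.append("steps use Given-When-Then phrasing")
    if sorted(expected) != list(range(1, len(steps) + 1)):
        problems.append("expected results do not map one-to-one onto numbered steps")
    forbidden = sorted(FORBIDDEN_FIELDS & set(case))
    if forbidden:
        problems.append("forbidden fields present: " + ", ".join(forbidden))
    if len(case.get("parameter_sets", [])) > MAX_PARAMETER_SETS:
        problems.append("more than 5 parameter sets; split the case")
    return problems


good = {
    "steps": ["Open the login page", "Submit valid credentials"],
    "expected_results": {1: "Login form is shown", 2: "Dashboard is shown"},
}
print(validate_test_case(good))
```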
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Safety Boundaries | Problem: This skill performs external write actions (mcp_testrail_add_case creates cases in a live TMS). Step 7 exports each case and the pitfalls note "Re-running export creates duplicate test cases in TestRail (by design, preserves history)." There is no pre-export confirmation gate showing the user how many cases will be created into which section_id before the write loop starts, so an accidental re-run silently mass-creates duplicates. Reason: Bulk creation into a shared external system is hard to reverse; without a count/destination confirmation an accidental re-run pollutes the TMS with duplicates. Solution: Add a confirmation gate before step 7: state the case count and target section_id, optionally use mcp_testrail_get_cases to detect likely duplicates, and require user acknowledgement before the write loop (or explicitly state the parent workflow owns this gate). |
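A confirmation gate of the kind suggested here might look like the sketch below. The function name, the `existing_titles` argument, and the title-based duplicate heuristic are assumptions for illustration; the real gate would obtain existing cases through whatever TestRail tooling the skill uses, and this sketch does not call it.

```python
def confirm_export(cases, section_id, existing_titles, ask=input):
    """Show count, destination, and likely duplicates; return True only on explicit 'yes'."""
    # Assumed duplicate heuristic: exact title match against cases already in the section.
    duplicates = sorted({case["title"] for case in cases} & set(existing_titles))
    prompt = f"About to create {len(cases)} case(s) in section {section_id}."
    if duplicates:
        prompt += f" {len(duplicates)} title(s) already exist: {', '.join(duplicates)}."
    prompt += " Type 'yes' to proceed: "
    return ask(prompt).strip().lower() == "yes"


pending = [{"title": "Login succeeds"}, {"title": "Login fails on bad password"}]
# A stubbed answer stands in for the user here so the example does not block on input.
approved = confirm_export(pending, section_id=42,
                          existing_titles=["Login succeeds"],
                          ask=lambda prompt: "no")
print(approved)
```

Anything other than an explicit "yes" is treated as a refusal, so an accidental re-run stops before the write loop.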
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: Step 4 requires presenting "each proposed change with before/after snippets and file paths; batch if small, otherwise chunk for review," but no example shows the expected before/after presentation format, and no concrete example of an accepted vs rejected approval phrase is given (it defers entirely to skill hitl). Reason: The before/after format and the approval-token discrimination are the load-bearing parts of an approval gate; an example reduces variance in how the gate is presented and judged. Solution: Add one short illustrative before/after presentation block and a positive/negative approval-phrase example (or an explicit pointer that the exact token set comes from the parent workflow / hitl), so the agent renders the gate consistently. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The NEW file deleted the concrete operational examples that BASE had for the keyword-search branch: the example CQL query (`type=page AND space={PROJECT_KEY} AND (text ~ "{term1}" OR text ~ "{term2}")`), the result-ranking guidance, and the worked parent/child example (Parent: "Job Post" / Children: "Create a Job Post"...). Step 4 in NEW now only says "Extract search terms" and "Retrieve relevant Confluence pages", with no example of how a query is shaped or how children are traversed. Reason: The abstract instruction "Retrieve relevant Confluence pages" is harder to execute reliably without the concrete query example BASE provided; for an agent that fails to load the referenced skill, no grounding remains. Solution: Keep the search-term extraction and child-page traversal delegated to the skill, but add one short grounded example (a single CQL line and the parent/child illustration) inside <get_confluence> step 4, OR add an explicit note that confluence-source-harvesting owns the query-shape and child-traversal examples so the reader knows where to find them. |
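For reference, the BASE query template quoted above can be filled in mechanically. The helper below is a minimal sketch: `build_cql` is a hypothetical name, and dropping embedded double quotes is a conservative choice made here rather than a statement about CQL escaping rules.

```python
def build_cql(project_key: str, terms: list) -> str:
    """Fill the BASE template: type=page AND space={KEY} AND (text ~ "t1" OR text ~ "t2")."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Embedded double quotes are dropped instead of escaped (assumption for this sketch).
    clauses = " OR ".join('text ~ "{}"'.format(term.replace('"', "")) for term in terms)
    return f"type=page AND space={project_key} AND ({clauses})"


print(build_cql("PROJ", ["job post", "create a job post"]))
```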
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Output Contract | Problem: The NEW file deleted the complete self-contained analysis.md schema that BASE defined (Executive Summary, sections 1 Contradictions, 2 Gaps, 3 Ambiguities, 4 Cross-Reference, 5 Positive Findings, 6 Risk Assessment, plus the per-item C1/G1/A1 record formats). NEW <create_analysis_document> now says only "using the output format from the skill, with the following testgen-specific additions" and shows just section 7 (Next Steps) and Analysis Metadata. The phase's own output contract is now incomplete and fully dependent on gap-and-contradiction-analysis defining all those sections. Reason: An agent that loads this phase but whose skill output differs from BASE's structure can produce an analysis.md missing contradictions/gaps/risk sections, breaking the downstream Phase 3 question generation that consumed those sections. Solution: Do not re-inline the full schema, but make the dependency explicit and verifiable: in <create_analysis_document> state which top-level sections the skill must produce (contradictions, gaps, ambiguities, cross-reference, risk assessment) so the contract is checkable even if the skill output drifts, and confirm the gap-and-contradiction-analysis skill actually emits sections 1-6 in that order. |
| 🔵 Medium | Example Grounding | Problem: BASE grounded the analysis with concrete "Be Specific" good/bad examples (e.g. bad "Some details missing" vs good "User authentication method not specified (OAuth, SAML, basic auth?)") and typical contradiction/gap/ambiguity examples. NEW deleted all of these; <run_analysis> step 3 now just says "Identify contradictions, gaps, ambiguities" with no positive/negative example. Reason: Without a specificity example the agent is more likely to emit vague findings like "Some details missing", which is exactly the failure BASE's example warned against. Solution: Either confirm the gap-and-contradiction-analysis skill carries the specificity good/bad example, or add one short negative+positive example pair to <run_analysis> so the agent has a calibration anchor for what counts as a specific finding. |
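To make the output-contract dependency checkable, a phase could verify that the generated analysis.md carries the expected top-level sections. The sketch below assumes the sections appear as markdown headings containing the section name; that heading format and the function name are assumptions, not part of either file.

```python
import re

# Section names taken from the finding above; heading format is assumed.
REQUIRED_SECTIONS = ["Contradictions", "Gaps", "Ambiguities", "Cross-Reference", "Risk Assessment"]


def missing_sections(markdown: str) -> list:
    """Return required section names that do not appear in any markdown heading."""
    headings = [match.group(1).strip().lower()
                for match in re.finditer(r"^#{1,6}\s+(.*)$", markdown, re.MULTILINE)]
    return [name for name in REQUIRED_SECTIONS
            if not any(name.lower() in heading for heading in headings)]


draft = "# Analysis\n\n## 1. Contradictions\n...\n## 2. Gaps\n...\n"
print(missing_sections(draft))
```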
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 3 | ⬇️ Slightly worse |
| Decision Branching | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Structural Coherence | Problem: In <obtain_project_info step="0.4"> the numbered list has two items numbered 2: step "2. Ask user to confirm or customize the data retrieval process" appears after "1. ACQUIRE questioning/SKILL.md" and the example-format block, which is itself numbered 2. The duplicate ordinal makes the step sequence ambiguous. Reason: Duplicate step numbers can cause the agent to skip or merge a step in a sequential phase, though impact is low because the actions remain individually clear. Solution: Renumber the steps in <obtain_project_info> so each has a unique ordinal (ACQUIRE skill = 1, ask question with example = 2, confirm/customize = 3, validate = 4, save = 5). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Workflow Completeness | Problem: The parent testgen-flow.md declares export-report.md the on-disk evidence that Phase 6 ran and lists it in output_directory; this phase's validation_checklist (line 83) requires it with TMS IDs/URLs, per-case status, and timestamp. But the success-path step update_documents 6.6 only writes test-scenarios.md and testgen-state.md — it never writes export-report.md. The file is written only inside the step 6.2 fallback branches (manual/CSV/defer). On the normal TMS-export path the required deliverable is never created. Reason: A deliverable required by the validation checklist and by the parent flow's skip/evidence logic has no creation step on the main path, so an agent can mark Phase 6 complete (and a later run's skip-gate can pass) without the evidence file the chain depends on. Solution: Add an explicit instruction in step 6.6 to create agents/testgen/{TICKET-KEY}/export-report.md on the success path, populated from the per-case results tracked in step 6.5 (TMS IDs/URLs, status, timestamp), and make the step-6.2 fallbacks write into that same file so every branch has one owning step. |
| 🟡 High | Output Contract | Problem: The base file specified a concrete CSV column order and a canonical TestRail export-summary table example. The new file keeps a CSV column list only inside the fallback branch of step 6.2 (line 45) and gives no canonical example of the export-report.md content (the primary deliverable), only a prose list of required fields in the checklist (line 83). Reason: Without a canonical example for the main deliverable, the report layout is left to agent interpretation, reducing determinism of the Phase 6 evidence file. Solution: Add one short canonical export-report.md skeleton (target info + a TC-to-TMS-ID result table) near step 6.6, mirroring the field list already required by the checklist, so the output shape is deterministic without depending on a fallback branch. |
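As an illustration of what a canonical export-report.md skeleton could contain, the sketch below renders the fields the checklist already requires (TMS IDs/URLs, per-case status, timestamp). The layout, the function name, and the result-record keys are assumptions for this example, not the format defined by the workflow.

```python
from datetime import datetime, timezone


def render_export_report(ticket_key, target, results, timestamp=None):
    """Render a hypothetical export-report.md body from per-case export results."""
    stamp = timestamp or datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"# Export Report: {ticket_key}",
        "",
        f"- Target: {target}",
        f"- Exported at: {stamp}",
        "",
        "| Test case | TMS ID | URL | Status |",
        "|---|---|---|---|",
    ]
    for result in results:
        # Missing IDs or URLs (for example on a failed case) are shown as "-".
        lines.append("| {} | {} | {} | {} |".format(
            result["tc"], result.get("tms_id", "-"), result.get("url", "-"), result["status"]))
    return "\n".join(lines) + "\n"


report = render_export_report(
    "ABC-1",
    "TestRail section 42",
    [{"tc": "TC-001", "tms_id": "C101", "url": "https://example.invalid/C101", "status": "created"},
     {"tc": "TC-002", "status": "failed"}],
    timestamp="2024-01-01T00:00:00+00:00",
)
print(report)
```

Having every branch (success path and fallbacks) write through one renderer would also give the file a single owning step.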
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 3 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Safety Boundaries | Problem: The base validation checklist contained explicit negative format constraints (NO BDD format (Given-When-Then), NO Post-conditions field, NO Automation field) directly inside this phase. The new file removes all three negative constraints; the inline TC schema (lines 86-120) and checklist (lines 253-262) state only positive fields and rely on the testrail-test-case-authoring skill for the format ban. Reason: The deleted negative constraints are enforced by the mandatory skill, but the inline fallback template used when the skill is unavailable no longer restates them, so the format ban is not self-contained in the documented degraded path. Solution: No change required for correctness: verified that testrail-test-case-authoring/SKILL.md (lines 19-21, 192) carries the MUST NOT BDD / MUST NOT Post-conditions / MUST NOT Automation constraints, and step 5.3 mandates USE SKILL testrail-test-case-authoring. The constraint is preserved via delegation. Optionally add a one-line reminder in the inline-fallback note (line 87) that the self-contained template must also avoid BDD/Post-conditions/Automation, since that template is the explicit fallback when the skill is unavailable. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
|
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: Unlike its sibling AQA skills, this skill has no dedicated <input_contract> section. Inputs are described only in prose: the skill lists 'Raw test case data', 'API endpoint contracts', 'Gap analysis' as availability bullets, and <when_to_use_skill> says 'the calling workflow determines input/output file paths'. There is no table naming the expected file paths, formats, or required fields the calling workflow must supply. Reason: Without an explicit input contract the agent cannot validate inputs deterministically before the step-1 GATE, and the calling workflow has no precise handoff format to satisfy. Solution: Add a short <input_contract> listing the three required inputs (test cases, endpoint contracts, clarifications) with their expected form (file or inline) and the minimum content each must contain, mirroring the aqa-codebase-analysis input table. Keep paths workflow-supplied but state what shape is expected. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 3 | ⬆️ Slightly better |
| Output Contract | 4 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md is ~12.3K chars resident before any reference is loaded. Per the audit spec, prompts in the 10K-20K char band warrant a high-severity note for reliable evaluation/cognitive load. Most heavy material (report template, worked example) was correctly moved to references/report-template.md, but the SKILL.md still carries an 8-step process plus a full <input_contract> table, <output_format>, <safety_boundaries>, <failure_handling>, a 9-item <validation_checklist>, and a 10-item <pitfalls> list; several of these restate the same constraints (e.g. 'do not modify source files' appears in safety_boundaries, validation_checklist, and pitfalls). Reason: Resident SKILL.md size sits in the band the spec flags; the repetition of the same constraint across three sections inflates the cognitive budget for every agent that loads the skill, even when only a subset of steps applies. Solution: Trim duplicated constraints that appear in three places (the analysis-only / no-source-write rule, and the Coverage epistemic-honesty rule) to a single canonical statement referenced by the others, as the file already does for the 'Path precedence on conflict' rule. This shrinks the always-resident cost without losing any contract. |
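The size band this finding refers to can be checked with a few lines. Only the 10K-20K flagged band comes from the report; the function name, the other band labels, and the inclusive boundaries are assumptions for this sketch.

```python
def budget_band(char_count: int) -> str:
    """Classify a resident prompt size; only the 10K-20K band is taken from the report."""
    if char_count < 10_000:
        return "ok"          # assumed label for sizes below the flagged band
    if char_count <= 20_000:
        return "flagged"     # the 10K-20K band the audit spec calls out
    return "over"            # assumed label for sizes above the flagged band


# Sizes quoted in this report for two resident SKILL.md files.
for name, size in [("aqa-codebase-analysis", 12_300), ("aqa-selector-management", 13_300)]:
    print(name, budget_band(size))
```

In practice the count would come from `len(text)` of the SKILL.md file measured before any reference is loaded.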
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 3 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: SKILL.md is ~13.3K chars resident, within the 10K-20K band the audit spec flags as a high-cognitive-load note. Although Part-B mechanics, the tier table, the example pair, and the Part-B failure/validation/pitfalls extensions are correctly deferred to the reference, the file still keeps both the full Part-A and Part-B orchestration, the input_contract table, two conflict-precedence blocks, safety_boundaries, failure_handling, validation_checklist, and pitfalls resident at once, even though any single invocation runs only Part A or only Part B. Reason: A Part-A-only invocation currently pays the resident cost of Part-B's process steps 5-7 and conflict-precedence detail it will never use, putting the file in the size band the spec marks for cognitive-budget concern. Solution: Consider splitting the resident <validation_checklist> / <pitfalls> / <failure_handling> Part-A vs Part-B halves entirely into the reference (the file already does this for Part-B), so a Part-A invocation never loads Part-B orchestration steps 5-7 and vice versa. This further shrinks the per-invocation resident cost without losing the shared contracts. |
| 🔵 Medium | Single Responsibility | Problem: The skill bundles two distinct responsibilities: Part A 'read-only identification' (steps 1-4, invoked by aqa-flow-selector-identification) and Part B 'writes page-object files' (steps 5-7, invoked by aqa-flow-selector-implementation). <when_to_use_skill> states the two parts 'are invoked by separate phases and may run independently'. This stretches the 1-2 related-responsibilities guideline (identify vs. implement, read-only vs. write). Reason: Combining a read-only identification phase and a file-writing implementation phase in one skill is a real SRP tension, but it is consciously justified and scoped per-invocation, so it is a low-severity observation rather than a regression. Solution: No change required if kept as-is: the 'Why one file' rationale in references/strategy-and-template.md documents the deliberate decision (shared 4-tier taxonomy, selector-inventory shape, and fragile-selector handoff are tightly coupled and would drift if split). The per-phase scope binding in <input_contract> and lazy-loading of Part-B detail mitigate the cost. Recorded as a noted tradeoff, not a defect. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 3 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Step 2 hard-codes a cross-skill dependency on another skill's internal step number: 'the closest-aligned default with sibling aqa-codebase-analysis step 6's location rule' and <input_contract> references aqa-selector-management Part B. Skill-to-sibling-skill step-number coupling is fragile if that sibling renumbers. Reason: Citing a sibling skill's internal step number breaks skill-isolation boundaries and silently rots when the sibling is edited. Solution: Reference the owning phase/skill by logical name only (e.g. 'the selector-implementation phase owns page-object edits') without citing the sibling skill's internal step number; let the workflow bind the concrete threshold/decision. |
| 🔵 Medium | Cognitive Budget | Problem: SKILL.md is dense and long for an always-loaded entry: the <input_contract> carries a six-row table plus a four-level precedence rank plus a multi-bullet GATE, and step 2's location branch duplicates threshold logic also stated in <success_criteria>. A single read carries many parallel rule sets at once. Reason: The skill already uses progressive disclosure for the template; the contract/precedence detail is the next-largest block that could be deferred to keep the entry surface within reliable reading budget. Solution: Move the verbose existence/scope GATE bullets and the precedence rank into the existing reference file (alongside the template) and keep only a one-line pointer in SKILL.md, mirroring how the output template was already deferred. |
| 🔵 Medium | Bloat Control | Problem: The same rules are restated across multiple sections. The repo-docs-win precedence appears in <input_contract> (rank list), again in step 4, in <safety_boundaries>, in <success_criteria>, in <validation_checklist>, and in <pitfalls>. The 'do not silently drop unimplementable assertions' rule appears in step 4, success criteria, failure handling, validation checklist, and pitfalls. Reason: Heavy repetition inflates the always-loaded SKILL.md surface and raises the chance an agent edits one copy and leaves the others inconsistent. Solution: State each rule once in its owning section and reference it by name elsewhere (the file already does this for <validation_checklist> as single source of truth; apply the same to precedence and the silent-drop rule). |
| ⚪ Low | Example Grounding | Problem: The skill describes assertion-mapping, import order, and wait strategies abstractly (step 3a-3d) but the only concrete example lives in the separate template reference; no inline positive/negative example of, e.g., an implemented assertion vs an uncovered one. Reason: The most error-prone rule (silent assertion drop) has prose but no concrete grounding example to anchor the boundary. Solution: Add one short inline positive/negative pair for the highest-ambiguity rule (implemented assertion vs recorded uncovered assertion), or point step 3 readers to the sibling execution-analysis skill's worked example. |
| ⚪ Low | Structural Coherence | Problem: Step 5 says the output template is 'load on demand at step 4' while step 4 itself also references the template, and <output_format> repeats the same load-on-demand note; the load point is stated in three places with slightly different step attributions (step 4 vs step 5). Reason: Minor inconsistency about which step loads the template could make an agent load it at the wrong point or load twice. Solution: State the template load point once (e.g. 'load at step 4 before emitting') and have step 5 and <output_format> reference that single statement. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 3 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Conflict Resolution | Problem: The intro paragraph re-states the division of labor ('base SKILL.md keeps the process orchestration, GATE, precedence rules...') already declared in SKILL.md. A reference file restating its parent's responsibility list is mild duplication that can drift. Reason: Listing the parent's owned sections here duplicates SKILL.md and risks the two lists diverging on edit. Solution: Shorten the intro to one line stating the file holds only the verbatim template; drop the enumerated list of what SKILL.md owns. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: SKILL.md packs Part A (six numbered steps with sub-steps), Part B (three steps), the iteration-cap state-file protocol, a six-field proposed-change block, output format, full failure handling, two-part safety boundaries, and a two-part validation checklist into one entry. The reader carries many parallel rule sets simultaneously. Reason: The skill already uses progressive disclosure for the escalation note; the proposed-change/iteration mechanics are the next-largest deferrable block to keep the entry within reliable budget. Solution: Defer the verbose Part B proposed-change block and the iteration-cap state-file mechanics to a reference (the escalation template is already deferred), leaving a short pointer in SKILL.md. |
| 🔵 Medium | Bloat Control | Problem: The seven-category taxonomy is fully restated in step 3, then re-listed by name in <validation_checklist> ('Selector / Locator |
| 🔵 Medium | Single Responsibility | Problem: The skill owns both Part A (report analysis / categorization) and Part B (apply approved code corrections) plus the 3-iteration cap orchestration. That is two distinct responsibilities (read-only triage vs mutating the repo with approval gates) in one always-loaded skill. Reason: Analysis and code-mutation are different risk classes; bundling them enlarges the surface an agent must hold and the safety scope it must respect at once. Solution: No rewrite required, but note the dual responsibility; if it grows, splitting Part B's apply discipline into a reference or a sibling skill would tighten SRP. At minimum keep the A/B boundary explicit (already done). |
| ⚪ Low | Structural Coherence | Problem: <output_format> shows a Proposed Corrections block (File / Current / Proposed / Reason / Impact) while step 7's inline block and <validation_checklist> Part B add a 'Risk' field not present in the <output_format> example; the canonical proposed-change shape differs across the three places. Reason: Divergent field lists for the same artifact let an agent emit a change block missing the Risk field that the checklist then flags as incomplete. Solution: Align the field set in step 7, <output_format>, and the Part-B checklist so all three list the same fields (add Risk to the <output_format> example or drop it from the checklist). |
| ⚪ Low | Reference Integrity | Problem: Repeated cross-references to other phases by number ('Phase 4 selector-identification step 4.2', 'Phase 5', 'Phase 1 plan filename') and to aqa-flow-code-analysis.md <naming_convention> couple this skill to sibling phase internals and exact step numbers. Reason: Pinning to another file's internal step numbers is fragile and breaks skill-isolation boundaries when siblings are edited. Solution: Reference phases by logical role/name and the naming convention by name, without pinning to specific sibling step numbers that rot on renumber. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 3 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
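
The field-set divergence flagged under Structural Coherence for aqa-test-debugging could be caught mechanically. The following is a minimal sketch, not part of the PR: the function name `field_drift` and the dictionary keys are hypothetical, and the field lists are transcribed by hand from the finding rather than parsed from SKILL.md.

```python
def field_drift(field_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per location, the fields that appear elsewhere but are missing there."""
    # Union of every field named in any location.
    all_fields = set().union(*field_sets.values())
    return {
        name: all_fields - fields
        for name, fields in field_sets.items()
        if all_fields - fields
    }


# Field lists as described in the finding (hand-transcribed assumption).
base = {"File", "Current", "Proposed", "Reason", "Impact"}
locations = {
    "step 7": base | {"Risk"},
    "output_format": base,
    "validation_checklist Part B": base | {"Risk"},
}
drift = field_drift(locations)
```

An empty result would mean all three locations list the same fields; here only `output_format` is reported, as missing `Risk`.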
📄 instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Conflict Resolution | Problem: The closing paragraph restates the no-4th-iteration-without-waiver rule that step 9 in SKILL.md already owns ('the Track-iteration step 9 rule in SKILL.md governs this'). The reference and the parent both assert the rule. Reason: Duplicating the governing rule across parent and reference creates two edit points for one behavior. Solution: Keep the verbatim escalation note plus the 'ask the user' instruction; drop the restated governance sentence and let SKILL.md remain the single owner of the waiver rule. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Safety Boundaries | Problem: This skill writes a tracked analysis artifact from raw logs/CI output, request/response captures, and stack traces, but unlike the sibling aqa-test-debugging (which has an explicit redaction rule for Authorization: Bearer, API keys, PII) and confluence-source-harvesting, it has NO <safety_boundaries> section and no redaction instruction. Logs routinely embed tokens/PII. Reason: Test execution logs and HTTP captures frequently contain bearer tokens, API keys, and customer PII; writing them verbatim into a tracked, downstream-fed artifact is a leak the sibling skills explicitly guard against but this one does not. Solution: Add a <safety_boundaries> block requiring a credential/PII redaction scan on log lines, stack snippets, and request/response captures before writing the analysis artifact, mirroring the sibling debugging skill's redaction rule. |
| ⚪ Low | Structural Coherence | Problem: The Fact-vs-Hypothesis-flag-mandatory rule is stated three times: end of step 8, the note under <output_format>, and a <validation_checklist> line. Reason: Triple statement of one rule is minor bloat and an extra edit point, though here it reinforces a high-value rule so impact is low. Solution: State the mandatory-flag rule once (the output-format note is the natural owner) and reference it from the checklist. |
| ⚪ Low | Reference Integrity | Problem: The category list differs between the two places it appears: <process> step 7 lists 'environment, data, product regression, test bug, flakiness, infra timeout, auth/session, selector/locator, contract mismatch, unknown' while the <output_format> enum uses hyphenated variants 'product-regression |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 3 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
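
To make the Safety Boundaries finding above concrete, here is a minimal sketch of what a pre-write redaction scan might look like. It is an illustration only: the pattern list, the placeholder strings, and the `redact` function are assumptions, not the sibling skill's actual rule set, and a real scan would need broader coverage.

```python
import re

# Hypothetical patterns and placeholders; the real list would live in <safety_boundaries>.
REDACTION_PATTERNS = [
    # Bearer tokens in HTTP captures.
    (re.compile(r"(Authorization:\s*Bearer\s+)\S+", re.IGNORECASE), r"\1[REDACTED_TOKEN]"),
    # key=value or key: value style secrets.
    (re.compile(r"((?:api[_-]?key|token|secret)\s*[=:]\s*)\S+", re.IGNORECASE), r"\1[REDACTED_SECRET]"),
    # Email addresses as a simple stand-in for PII.
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Apply every redaction pattern to the text before it is written to an artifact."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


header_line = redact("Authorization: Bearer abc.def.ghi")
log_line = redact("api_key=sk123 user=jane@example.com")
```

The scan would run on log lines, stack snippets, and request/response captures before the analysis artifact is written, so raw values never reach the tracked file.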
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Dependency Management | Problem: <core_concepts> and step 4 instruct the implementation phase to ACQUIRE and USE SKILL coding-agents-prompt-authoring ('ACQUIRE coding-agents-prompt-authoring/SKILL.md FROM KB and USE SKILL coding-agents-prompt-authoring') as a mandatory load alongside the domain test-authoring skill. That skill authors prompts/skills, not automated test code; loading it into a test-implementation handoff couples this skill to an unrelated capability and adds a large always-on dependency. Reason: Pulling a prompt-authoring skill into every test-implementation run is an out-of-scope dependency that bloats context and can misdirect the agent toward prompt work instead of writing test code. Solution: Remove coding-agents-prompt-authoring from the mandatory ACQUIRE list in <core_concepts> and step 4 (keep coding, testing, repository-implementation-standards, and the parent-named domain skill). If a specific prompt-authoring step is genuinely needed, scope it to that step rather than as a blanket load. |
| 🔵 Medium | Bloat Control | Problem: The domain-skill-required rule and the 'never silently fall back to coding+testing' rule are restated in <core_concepts>, the <input_contract> row, step 4 GATE, <failure_handling> (two branches), <validation_checklist>, and <pitfalls>: six-plus repetitions of one constraint. Reason: One constraint stated six times inflates the entry and multiplies edit points for a single behavior. Solution: Keep the full rule in the step 4 GATE (its owning location) and reference it by name from the other sections rather than re-stating the silent-fallback prohibition each time. |
| 🔵 Medium | Single Responsibility | Problem: The skill loads four foundational skills (repository-implementation-standards, coding, testing, coding-agents-prompt-authoring) plus a parent-named domain skill plus its own handoff/state/HITL sequence. The mandatory multi-skill orchestration broadens the skill beyond the single 'land code, validate, hand off' responsibility. Reason: Each mandatory ACQUIRE/USE is an extra round trip and an extra responsibility the skill takes on; fewer, scoped loads keep it focused and cheaper. Solution: Trim the mandatory loads to the ones the handoff actually needs (see Dependency Management issue) so the skill's responsibility stays 'implement approved tests and hand off', not 'compose five skills'. |
| ⚪ Low | Structural Coherence | Problem: Steps 1-4 each begin with 'ACQUIRE ... FROM KB and USE SKILL ...' duplicating the identical list already in <core_concepts> bullets; the same five ACQUIRE lines appear twice (concepts and process). Reason: Verbatim duplication of the load list across two sections is redundant and risks the two copies diverging. Solution: List the skill loads once (either as concept bullets or as process steps) and have the other section reference them, not repeat the ACQUIRE strings verbatim. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The same behaviors are stated in <process>, then fully re-described in <failure_handling>, then again in <validation_checklist>, and again in <pitfalls>. Examples: the redaction rule appears in <safety_boundaries>, <failure_handling> ('Page contains material requiring redaction'), and <validation_checklist>; 'permission errors are not empty content' appears in <safety_boundaries>, <failure_handling>, <validation_checklist>, and <pitfalls> (four times). Reason: The 401/403 rule restated four times and the redaction rule three times inflate an already-large SKILL.md and create multiple drift points for one behavior. Solution: Keep each rule in its owning section (redaction in <safety_boundaries>, GATE branches in <failure_handling>) and reference by name from the checklist/pitfalls rather than re-describing. |
| ⚪ Low | Example Grounding | Problem: The redaction patterns are well exemplified (concrete shapes like eyJ..., postgres://user:pass@), but the truncation banner rule (step 6) and link-normalization (step 7) have no concrete before/after example for what a truncation banner or canonical URL should look like. Reason: The truncation/normalization rules are the least-grounded instructions in an otherwise example-rich skill. Solution: Add a one-line example banner string and one canonical-vs-display URL pair to step 6/7, matching the concreteness of the redaction examples. |
| ⚪ Low | Structural Coherence | Problem: The <input_contract> row for 'Configured Confluence site / base URL' carries an inline parenthetical clarification ('NOT this skill — but step 8 GATE relies on it being knowable') that mixes contract data with rationale, and <process> step references (3, 5, 8) are duplicated in the <failure_handling> preamble that re-points to the same GATEs. Reason: Mixing rationale into the contract cell and double-stating the GATE mapping is minor structural noise, low behavioral impact. Solution: Move the rationale out of the contract table cell into a one-line note below the table; state the GATE-to-failure-handling mapping once. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
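
The Bloat Control finding for confluence-source-harvesting counts how many sections restate one rule. A rough way to reproduce such a count is sketched below, assuming sections are delimited by flat `<name>...</name>` tags; the function name `sections_mentioning` and the sample text are hypothetical, and nested sections are not handled.

```python
import re

# Matches a flat <name>...</name> section; nesting is not supported in this sketch.
SECTION = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def sections_mentioning(skill_text: str, phrase: str) -> list[str]:
    """List the section names whose body contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [name for name, body in SECTION.findall(skill_text) if needle in body.lower()]


# Hypothetical stand-in for a SKILL.md body.
sample = """
<safety_boundaries>Permission errors are not empty content.</safety_boundaries>
<process>Fetch each page in order.</process>
<pitfalls>Treating permission errors as empty content.</pitfalls>
"""
hits = sections_mentioning(sample, "permission errors")
```

A rule that shows up in more than its owning section is a candidate for replacement with a by-name reference.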
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: Prerequisites name raw-data.md as an example input path but the skill never states who supplies the path or its required structure. <failure_handling> says 'whatever the parent workflow points at'; the actual input contract is left to the parent. Reason: Without a named input source the agent may guess the path when run standalone. Solution: Add one line in the prerequisites stating the input path is supplied by the parent workflow and that the skill accepts one or more source files in any text format. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: File is 18,635 chars (10K-20K band). The <vendor_replacement> block alone runs ~25 lines enumerating Notion/SharePoint/GitBook/etc. per item. For an MCP-extraction skill whose core process is 8 steps, the meta-portability prose dominates and competes with the operational instructions an agent must follow at runtime. Reason: Runtime extraction does not need the vendor-swap guidance; it inflates every cached send and the active reasoning surface. Solution: Move the <vendor_replacement> block to a references/vendor-swap.md loaded on demand (progressive disclosure), leaving a one-line pointer in SKILL.md. This is the same pattern qa-data-collection used for backend-source-analysis.md. |
| 🔵 Medium | Structural Coherence | Problem: <process> step 5 (Fallback: no results) logically belongs with the no-URL search path (step 2) but sits after child-page retrieval (step 3) and per-page extraction (step 4), so the zero-result branch is read out of execution order. Reason: Out-of-order branch placement makes the control flow harder to trace correctly. Solution: Reorder so the no-results fallback is adjacent to the search step, or add a forward-reference note at step 2.2 pointing to step 5. |
| 🔵 Medium | Bloat Control | Problem: Redaction targets, grep patterns, and placeholder vocabulary are stated in <safety_boundaries>, then re-described (with 'single source of truth' pointers) again in <process> step 8, <validation_checklist>, and <pitfalls>. The cross-pointers are correct but the repeated framing adds volume. Reason: Repetition increases length without adding new instruction. Solution: Keep the canonical list in <safety_boundaries> and shorten the re-pointers in step 8 / checklist / pitfalls to a bare reference without re-stating what is redacted. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: Step 5 (final redaction re-scan) and step 4 (pre-emit validation) both precede the actual emit, but the skill never has an explicit 'emit/write the artifact' step the way mcp-testrail step 5 ('Emit ...') does. The output-write action is implicit. Reason: An implicit write step leaves the terminal action of the skill unstated. Solution: Add an explicit final step 'Emit the artifact per <output_format>' after the redaction re-scan, mirroring mcp-testrail-data-collection step 5. |
| 🔵 Medium | Workflow Completeness | Problem: Step 6 reads 'Custom-field discovery fallback: see step 3 custom-fields branch (canonical) — no separate procedure.' It is a numbered step that contains no action, only a pointer back to step 3. A numbered execution step with no operation is a dangling placeholder. Reason: An empty numbered step invites the agent to look for an action that does not exist, breaking the sequential read. Solution: Remove step 6 as a numbered step and fold its pointer into step 3, or relabel it as a note rather than a process step. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Success Criteria | Problem: Unlike its sibling mcp-jira-data-collection, this skill has no dedicated <success_criteria> block; the done-conditions are only inferable from <validation_checklist>. The two sibling MCP skills are inconsistent in structure. Reason: Consistent completion contracts across the sibling MCP skills reduce ambiguity for the parent qa-data-collection workflow that delegates to all three. Solution: Add a short <success_criteria> block mirroring mcp-jira-data-collection so the completion contract is stated once explicitly rather than reconstructed from the checklist. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The skill orchestrates five distinct data-collection responsibilities: TMS retrieval (delegated), documentation search (delegated), backend source-code framework analysis (step 4), existing-test-pattern discovery (step 5), plus the safety/assembly pass. Steps 4 and 5 are substantial in-house analysis jobs layered on top of the orchestration role. Reason: A collection orchestrator that also performs deep codebase pattern analysis carries more than the healthy 1-2 responsibilities, raising the per-run reasoning load. Solution: Consider whether step 5 (existing-test-pattern discovery) belongs in a separate skill the way backend-source-analysis was extracted to a reference; at minimum keep it as orchestration with detail deferred. |
| ⚪ Low | Reference Integrity | Problem: Step 1 references qa-project-config.md without a path, while <failure_handling> refers to it as agents/qa/qa-project-config.md. The prerequisites list it as qa-project-config.md only. The path is given in one place but not the other two. Reason: Inconsistent path forms can lead the agent to read or create the config at the wrong location. Solution: Use the full agents/qa/qa-project-config.md path consistently in the prerequisites and step 1, matching <failure_handling>. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: Step 5 'Prepare Prioritized Questions' instructs asking the user and recording answers, but HITL/user interaction in Rosetta is meant to be governed by bootstrap-hitl-questioning, not embedded per-skill. The skill drives a user Q&A loop (Critical/Important/Optional questions, re-ask, defer) inline. Reason: Per the prompt-authoring hardening rule, active user involvement should route through the HITL bootstrap, not be duplicated as skill-local interaction control. Solution: Keep the question CATEGORIZATION here but state that the actual asking is performed under the workflow's HITL protocol, rather than embedding the ask/re-ask loop as skill-owned behavior. |
| 🔵 Medium | Bloat Control | Problem: The user-involvement / question-resolution logic appears in three places at length: <process> step 5, <success_criteria> (the four-state Critical-question resolution list), and <failure_handling> (the no-response deferral). The Critical-question resolution rule is restated nearly verbatim in success_criteria and failure_handling. Reason: The same resolution policy expressed three times inflates the file without adding behavior. Solution: State the four-state resolution rule once (in <success_criteria>) and have <failure_handling> reference it rather than re-describe the Assumption-with-Deferred-tag mechanics. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: The skill has a <prerequisites> block but no explicit <input_contract> defining the shape/format of the test case reference it must parse in step 1. The supported formats are shown as freeform quoted strings (e.g. "Write API tests for TC-1234") rather than a structured contract of accepted inputs and validation rules. Reason: Sibling skill qa-test-implementation in the same PR defines a formal <input_contract>; the missing formal contract here makes input validation slightly less determinate for the agent. Solution: Add a short input contract table naming the accepted reference kinds (TestRail ID pattern TC-NNNN, Jira key PROJ-NNN/URL, freeform description), which are required vs optional, and the validation each must pass before an IDENTIFIER is derived. |
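
As a rough illustration of the input contract suggested above for qa-project-config, the sketch below classifies a test case reference into the three kinds the finding names. The regular expressions, the `Reference` type, and `classify_reference` are hypothetical names introduced here; the report only gives the shapes TC-NNNN and PROJ-NNN, so the skill's real validation rules may differ.

```python
import re
from dataclasses import dataclass

# Assumed patterns: the report only names the shapes "TC-NNNN" and "PROJ-NNN".
TESTRAIL_ID = re.compile(r"\bTC-\d+\b")
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


@dataclass(frozen=True)
class Reference:
    kind: str        # "testrail", "jira", or "freeform"
    identifier: str  # the text an IDENTIFIER could later be derived from


def classify_reference(text: str) -> Reference:
    """Classify a user-supplied test case reference (hypothetical helper)."""
    text = text.strip()
    if not text:
        raise ValueError("empty test case reference")
    # TestRail is checked first because "TC-1234" also fits the Jira key shape.
    match = TESTRAIL_ID.search(text)
    if match:
        return Reference("testrail", match.group(0))
    # A Jira URL such as .../browse/PROJ-123 still contains the bare key.
    match = JIRA_KEY.search(text)
    if match:
        return Reference("jira", match.group(0))
    return Reference("freeform", text)
```

Checking the TestRail shape before the Jira shape is a design choice of this sketch, made because the two patterns overlap; an actual contract would need to state that precedence explicitly.
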
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Failure Handling | Problem: Unlike its sibling skills in this PR (qa-project-config, qa-test-implementation, repository-implementation-standards all have a dedicated <failure_handling> block), this skill has no <failure_handling> section. Failure cases are scattered: step 1 covers a missing report (ask user), and step 8 caps iterations at 3, but there is no handling for an unparseable/corrupt report, an empty report (zero results), or the user declining to provide a report path after the step-1 ask. Reason: The missing failure paths mean the agent has no defined behavior when the report input is bad, risking fabricated analysis or a silent stall, a reliability gap the sibling skills do not have. Solution: Add a <failure_handling> block covering: report path not provided after the step-1 ask (stop, report to workflow), report present but unparseable/empty, and zero failures found (skip Part B, mark complete). Keep iteration-cap escalation cross-referenced from step 8. |
| 🔵 Medium | Single Responsibility | Problem: The skill bundles two distinct responsibilities — Part A (report analysis / categorization, read-only) and Part B (applying code corrections, write + lint). Part B mutates test source files and runs linting, a materially different risk profile from the read-and-report Part A. Reason: The two parts have different safety surfaces; conflating them in the skill's stated responsibility slightly blurs when write access is in scope. Solution: This is acceptable as one skill since both center on the failure-to-fix loop, but make the split explicit in <when_to_use_skill> (it currently lists both as one flow) so a caller can invoke analysis-only without implying the correction mandate. |
| 🔵 Medium | Structural Coherence | Problem: The <pitfalls> block (line 221) instructs the agent to 'apply <safety_boundaries> redaction before writing' and step 3 (line 86) does likewise, but <safety_boundaries> is defined later in the file (line 225), after <pitfalls> and after the process that depends on it. The redaction rule the process leans on is the single source of truth yet sits below its first invocation. Reason: Ordering matters for an agent reading top-to-bottom; the step-3 template tells the agent to redact 'BEFORE writing' but the actual redaction target list appears far below, increasing the chance redaction is applied loosely on first read. Solution: Move <safety_boundaries> above <process> (or at least above the step-3 failure-documentation template), so the redaction targets are defined before the steps that reference them. |
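
The failure branches proposed in the High finding above for qa-test-debugging could be sketched as a single triage function. This is only an illustration under stated assumptions: the report is assumed to be JSON with a top-level `results` list whose items carry a `status` field, and the `Outcome` names and `triage_report` are hypothetical, since the report does not describe the skill's actual report format.

```python
import json
from enum import Enum
from typing import Optional


class Outcome(Enum):
    STOP_NO_REPORT = "stop: report not provided, report back to workflow"
    STOP_UNREADABLE = "stop: report unparseable or empty"
    COMPLETE_NO_FAILURES = "complete: zero failures, skip Part B"
    PROCEED = "proceed: analyze failures"


def triage_report(report_text: Optional[str]) -> Outcome:
    """Pick a failure-handling branch for a test report (hypothetical helper).

    `report_text` is None when the user declined to supply a report path.
    """
    if report_text is None:
        return Outcome.STOP_NO_REPORT
    try:
        data = json.loads(report_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return Outcome.STOP_UNREADABLE
    # Assumed shape: {"results": [{"status": "passed" | "failed", ...}, ...]}
    results = data.get("results") if isinstance(data, dict) else None
    if not isinstance(results, list) or not results:
        return Outcome.STOP_UNREADABLE
    failures = [
        item for item in results
        if isinstance(item, dict) and item.get("status") == "failed"
    ]
    return Outcome.PROCEED if failures else Outcome.COMPLETE_NO_FAILURES
```

Returning an explicit outcome for every input, rather than raising, mirrors the finding's point that each bad-input case needs a defined behavior so the agent neither fabricates analysis nor stalls.
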
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 3 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Safety Boundaries | Problem: This skill writes test code and shared utilities to disk, and step 3's AuthHelper example handles tokens, yet there is no <safety_boundaries> block. The validation checklist line 'No hardcoded URLs / credentials / production data in test files. Synthetic test data only' is the only guard, and it lives in the checklist rather than in an explicit boundary section like the one its sibling skills (qa-project-config, qa-test-debugging, requirements-synthesis) all carry. Reason: An implementation skill that produces auth helpers and request payloads is a credential-leak surface; the other QA skills in this PR all define an explicit boundary, so its absence is an inconsistency that weakens the guarantee. Solution: Add a short <safety_boundaries> block stating that generated test/helper code must never embed literal secrets or real production data (use env vars/fixtures), mirroring the redaction discipline the other QA skills define, and reference it from the checklist item. |
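
To make the "use env vars/fixtures" part of the solution above concrete, here is a minimal sketch of an auth helper that never embeds a literal secret. The class body, the `API_TEST_TOKEN` variable name, and the `get_token` method are assumptions made for illustration; only the `AuthHelper` and `auth_headers` names come from the findings in this report, and the skill's real helper may look different.

```python
import os


class AuthHelper:
    """Hypothetical test helper that reads its token from the environment."""

    def __init__(self, env_var: str = "API_TEST_TOKEN") -> None:
        # Only the variable *name* lives in source; the secret itself does not.
        self._env_var = env_var

    def get_token(self) -> str:
        token = os.environ.get(self._env_var)
        if not token:
            # Fail loudly instead of falling back to a hardcoded credential.
            raise RuntimeError(f"{self._env_var} is not set")
        return token

    def auth_headers(self) -> dict:
        """Build the Authorization header from the environment-supplied token."""
        return {"Authorization": f"Bearer {self.get_token()}"}
```

A test would then call `AuthHelper().auth_headers()` after the runner or a fixture has exported a synthetic token, which keeps generated files free of credential literals.
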
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: Minor cross-language inconsistency: the Python and TypeScript AuthHelper examples both expose an auth_headers/authHeaders convenience method, but the Java AuthHelper (lines 73-77) exposes only getToken and the test inlines the Authorization header construction. The three canonical helpers are not structurally parallel even though the file claims to 'mirror SKILL.md steps 3 and 4'. Reason: An agent copying the helper shape across languages may expect a parallel header helper and not find one in Java, a small consistency gap; not behavior-breaking since the test shows the inline alternative. Solution: Either add an authHeaders-equivalent to the Java AuthHelper for parity, or add a one-line note that Java RestAssured idiomatically inlines the header so the helper intentionally omits it. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: Inputs are described in <prerequisites> ('Collected raw data from at least one source', 'Analysis of gaps/contradictions (if performed)', 'User answers ... (if collected)') and step 1 says 'Load all source data (raw-data files, analysis output, user answers if present)', but no explicit input contract names the expected filenames/paths or how the agent locates them. The sibling skill qa-test-implementation in this PR formalizes its inputs in an <input_contract> table with default paths; this skill leaves them implicit. Reason: Without named input artifacts the agent must guess which files constitute 'source data', and <failure_handling> already references answers.md by name (line 153); that filename should be declared in an input contract for consistency. Solution: Add an <input_contract> (or expand prerequisites) naming the default artifact names the skill looks for (e.g. raw-data file(s), analysis output, answers.md) and that the calling workflow may override paths, mirroring the qa-test-implementation contract. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: Step 10b gives one concrete announcement string ('skip refused: state row missing -> starting at Phase 0') and there is a state-delta template, but the more nuanced gate-priority decisions (step 8 vs step 9 vs step 10) are explained only abstractly in the table with no worked transition example showing the agent picking the right step. Reason: The three-gate distinction is the most error-prone part of this skill (the pitfalls themselves call it out); a single grounded example reduces the chance of misapplying step 10's no-questions rule to a genuine HITL gate. Solution: Add one short worked example showing a transition that superficially matches both step 8 and step 10, and how the precedence rule resolves it to step 8, so the abstract precedence rule has a concrete anchor. |
| 🔵 Medium | Input Contract | Problem: The skill depends on several inputs supplied by the parent workflow (current phase id, the phase markdown ACQUIRE target, the workflow state file path, the in-scope subagent dispatch contract) but there is no explicit input-contract section listing them with required/optional status. Step 1 says 'Confirm current phase id and its ACQUIRE target from the parent workflow' and step 5 says 'Update the workflow state file path provided by the parent workflow' but these bindings are scattered through the process narrative. Reason: Without a consolidated input contract a subagent cannot tell at a glance which bindings must be present before stepping, raising the chance of starting a phase with a missing state-file path. Solution: Add a short input-contract table naming the parent-supplied bindings (phase id, phase-file ACQUIRE target, workflow state file path, HITL transition flags, loop target) with required/optional markers, mirroring the pattern already used in the sibling user-approved-code-changes skill. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 3 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: At 14714 chars the SKILL.md is in the 10K-20K range flagged by the audit spec as a high-attention size. The safety_boundaries section duplicates a long credential-pattern list that also appears in the validation_checklist redaction item and again in step 5's emit flow, repeating the same grep patterns ('Bearer ', 'Authorization:', 'password:', 'api_key=', 'client_secret', JWT 'eyJ...', 'BEGIN PRIVATE KEY', 'postgres://user:pass@') three times. Reason: The threefold repetition of the same pattern list inflates the always-loaded SKILL.md without adding new instruction, pushing the file toward the high-cost size band where a leaner version would load more cheaply on every invocation. Solution: State the credential-pattern list once in safety_boundaries and have the validation_checklist and step 5 reference it ('re-grep per safety_boundaries pattern list') rather than re-enumerating the full pattern set each time. |
| 🔵 Medium | Failure Handling | Problem: Failure modes are handled across the process (step 1.4 'if none found, report back', step 5.2 'flag back as a gap') and the pitfalls, but there is no consolidated failure_handling section covering ambiguous routing, parsing failure, conflicting spec-vs-code sources beyond Notes, or GraphQL fallback. The GraphQL case appears only as a pitfall ('Not handling GraphQL APIs — adapt analysis to use schema introspection') rather than a handled branch in process. Reason: Scattered failure rules are easy to miss; a subagent hitting an un-parseable route or a GraphQL API has to reconstruct the correct behavior from a pitfall line instead of a handled branch. Solution: Add a brief failure_handling section consolidating the not-found, ambiguous-routing, parse-failure, and GraphQL-fallback cases that are currently scattered between step 1.4, step 5.2, and the pitfalls, with explicit if/then for each. |
| 🔵 Medium | Success Criteria | Problem: There is no dedicated success-criteria / done-when section. The validation_checklist functions as a proxy (coverage, citations, no-fabrication, reconciliation evidence) but completion is never stated as a single testable condition the way the sibling testrail-test-case-authoring skill does with its explicit success_criteria block. Reason: A subagent needs an unambiguous done-when condition to stop reliably; relying on the reader to infer completion from a checklist is weaker than a stated success criterion. Solution: Add a short success_criteria block stating the case is done when every target endpoint has a contract entry or a flagged gap, every entry has a citation, all hybrid entries have non-empty Notes, and the redaction scan passed — pointing to the existing validation_checklist for the detail. |
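
The "state the pattern list once" suggestion in the Cognitive Budget finding above can be pictured as one shared pattern table that every check reuses. The sketch below is an assumption-laden illustration: the regular expressions approximate the grep strings quoted in the finding, and `CREDENTIAL_PATTERNS` and `scan_for_credentials` are hypothetical names, not part of the skill.

```python
import re
from typing import List, Tuple

# Single source of truth for credential shapes, approximating the grep
# strings quoted in the finding (the exact expressions are assumptions).
CREDENTIAL_PATTERNS = {
    "bearer token": re.compile(r"Bearer "),
    "authorization header": re.compile(r"Authorization:"),
    "password field": re.compile(r"password:"),
    "api key": re.compile(r"api_key="),
    "client secret": re.compile(r"client_secret"),
    "jwt": re.compile(r"eyJ[A-Za-z0-9_-]+"),
    "private key": re.compile(r"BEGIN PRIVATE KEY"),
    "connection string": re.compile(r"postgres://[^\s:/@]+:[^\s@]+@"),
}


def scan_for_credentials(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern label) for every credential-shaped hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CREDENTIAL_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

With a helper of this kind, a done-when condition such as "the redaction scan passed" reduces to `scan_for_credentials(output) == []`, and the checklist and emit step can both refer to the same table instead of re-listing the patterns.
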
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 3 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 3 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Safety Boundaries | Problem: This is a progressive-disclosure reference, not a procedural prompt, so most procedural gates are non-applicable; for those it inherits the parent SKILL.md context (score 4, comparison 3). The example body shows real-looking ID values ('o-123', 'c-1', 'customer_id') which is correct as synthetic, but the example never demonstrates the redaction discipline that the parent SKILL.md safety_boundaries mandates — a reader copying this 'one complete worked entry' as a template gets no in-example reminder that a real Authorization header value would need a placeholder. Reason: The example is explicitly positioned as the template to copy for the first entry of a new project; modeling redaction inside it reinforces the parent skill's most safety-critical rule at the exact point of imitation. Solution: Add one line in the example's Auth or Notes block showing a redacted credential placeholder (e.g. Authorization example value shown as '<redacted: bearer token>') so the canonical example also models the safety discipline it is meant to anchor. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The file is 16224 chars, in the 10K-20K high-attention band per the audit spec. The validation_checklist alone re-derives format-compliance, step discipline, naming, parameterization, traceability honesty, safety, and required-field checks that are each already stated as MUST/MUST-NOT rules earlier in format_rules, naming_conventions, epistemic_honesty, and safety_boundaries, so the agent processes the same constraint set twice at full length. Reason: A second full-length restatement of every rule as a checklist roughly doubles the constraint text the agent must hold for this one skill, increasing context load on every invocation without adding new behavior. Solution: Trim the validation_checklist to verification-only items that the upstream sections cannot self-enforce (e.g. the re-grep for BDD keywords and credential shapes) and reference the earlier sections for the rest, instead of re-stating each rule in full. |
| 🔵 Medium | Bloat Control | Problem: At 16224 chars this is the largest file in the group and three core rules are each stated four to six times. The 'do not invent FR-X/US-X/AC IDs' rule appears in success_criteria, input_contract, pitfalls, epistemic_honesty, failure_handling, and validation_checklist. The 5-parameter-set cap appears in success_criteria, input_contract, pitfalls, failure_handling, and validation_checklist. The safety redaction rule appears in success_criteria, pitfalls, safety_boundaries, and validation_checklist. Reason: The same three rules restated 4-6 times each inflate the always-loaded SKILL.md without new instruction, and a reader must reconcile near-duplicate phrasings to confirm they are identical. Solution: State each core rule once in its authoritative section (fabrication -> epistemic_honesty; 5-set cap -> failure_handling; redaction -> safety_boundaries) and have the other sections reference it ('per <epistemic_honesty>') rather than restating the full rule, as the file already does in a few places. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Input Contract | Problem: The skill consumes a fairly rich input set (the authored test cases with titles/steps/expected/priority/type, project_id, suite_id, and the workflow state file path) but there is no input_contract section listing them. Step 1 uses project_id, step 7 uses project_id and suite_id and the planned case set, and the validation_checklist refers to 'test-scenarios.md' as the source document — but none of these inputs is declared with required/optional status or expected format up front. Reason: This skill performs irreversible external writes; an undeclared input set raises the risk of the agent invoking the export against a wrong/missing project_id or suite_id because the binding source was never stated explicitly. Solution: Add an input_contract table listing project_id, suite_id, section_id, the authored case set (source doc, e.g. test-scenarios.md), and the workflow state file path with required/optional markers, mirroring the sibling testrail-test-case-authoring input_contract. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 3 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The <references> block (lines 160-180) re-lists every subagent, skill, and MCP that is already named inline in each phase block (e.g. discoverer, aqa-codebase-analysis, repository-implementation-standards, TestRail/Confluence/Playwright MCPs are all stated twice). The new file grew from 170 to 183 lines partly from this duplication. Reason: Duplicated reference lists cost cached tokens on every resend and risk drift when one copy is edited and the other is not. Solution: Trim the <references> section to only the items not already named in phase blocks (subagent capability descriptions and the MCP defaults), and drop the repeated skill list that duplicates each phase's Recommended skills line. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The base file (Task 6 / Update State) embedded the full Phase 3 test-plan section template and the verbatim agents/aqa-state.md block. The new file replaces these with a short field list (lines 58-68) and delegates the report shape to the skill's 9-section template. The exact on-disk markdown the workflow produces is now only fully specified inside aqa-codebase-analysis references/report-template.md. Reason: Schema was relocated to the bound skill (verified present), not dropped; the comparison reflects that the workflow file alone is now less self-contained. Solution: No fix required for correctness: the skill's <output_format> and report-template.md define the schema and the workflow's <validation_checklist> line 77 enforces the report path. Confirm the skill reference path stays valid in N-1 deployments. |
| 🔵 Medium | Workflow Completeness | Problem: The file shrank from 324 to 80 lines by moving all six analysis tasks (project description read, user-instructions extraction, frontend analysis, page-object inventory, similar-tests, utilities) into the aqa-codebase-analysis skill referenced at line 47 (USE SKILL aqa-codebase-analysis). The workflow's own <phase_steps> collapse this to one opaque line, '1. Execute codebase analysis'. This is acceptable delegation (the skill exists and preserves every moved task with equal or greater rigor), but a reader of the workflow alone no longer sees that user-instructions extraction and frontend analysis are part of Phase 3. Reason: The detail moved into a verified existing skill rather than being deleted, so this is a minor self-description gap, not a regression. Solution: No content was lost: verified aqa-codebase-analysis/SKILL.md steps 2-7 reproduce all six base tasks. Optionally add a one-line scope hint to <phase_steps> step 1 (e.g. 'covers project description, user instructions, page objects, similar tests, utilities') so the workflow self-describes its delegated scope. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Decision Branching | Problem: Base Task 2 ('Define Explicit Assertions', lines 33-50 of base) gave the agent inline guidance on the four assertion types and how to derive them. The new file moves derivation entirely into aqa-requirements-elicitation (step 2.1, line 33) and keeps only a transcription template at step 2.4. There is no inline if/then for the case where the bound skill returns assertions whose type does not map cleanly to Presence/State/Content/Behavioral. Reason: The derivation logic was relocated to a verified existing skill and the None-case is handled; this is a minor edge-branch gap, not a regression. Solution: No content loss: verified aqa-requirements-elicitation/SKILL.md (lines 42-47, 82) derives typed measurable assertions and explicitly handles the zero-derived case, which step 2.4 line 72 transcribes. Optionally add one branch in step 2.4 for an assertion the elicitation skill tagged with no clean type (e.g. 'record under the closest type with a note'). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: Base Task 5 contained a fully worded user-facing page-source request (the 'How to Provide Page Sources' / DevTools copy-outerHTML instructions) and base Task 7 contained the Phase 4 test-plan section template. The new file at step 4.2 reduces this to '2. Provide clear instructions to user for capturing HTML' without restating them, deferring to the skill. Reason: The user-facing instruction text and output template moved to the bound skill (verified present) rather than being dropped; the workflow file alone is now less self-contained for the HITL page-source branch. Solution: Verified the skill's <output_format> references a ## Selector Management section template and the page-source path; the DevTools capture wording now lives in the skill/references. Confirm references/strategy-and-template.md carries the user-facing capture instructions so the agent can still produce them when frontend code is unavailable. |
| 🔵 Medium | Workflow Completeness | Problem: The file shrank from 357 to 76 lines, moving the seven detailed tasks (interaction mapping, existing-selector check, frontend search, page-source request, HTML analysis, selector strategy, test-plan update) into aqa-selector-management Part A (referenced line 45 USE SKILL aqa-selector-management, 'Execute Part A only'). The workflow now reduces these to step 4.1 'Execute Part A'. Reason: Detail was relocated into a verified existing skill with explicit Part A scoping, not deleted. Solution: No content lost: verified aqa-selector-management/SKILL.md Part A steps 1-4 reproduce interaction mapping, existing-selector check, frontend search, and page-source HTML analysis, plus the 4-tier strategy table in references/strategy-and-template.md. No change needed beyond confirming the Part A/Part B split stays consistent with aqa-flow-selector-implementation.md. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Epistemic Honesty | Problem: Neither the base nor the slimmed version tells the engineer what to do when the selector value from Phase 4 is uncertain or the mapped selector cannot be located in the page source; the file assumes every selector is implementable. Reason: Without a low-confidence path the engineer may invent a locator silently, which surfaces only as a Phase 7 failure. Solution: Add one line in step 5.2 or skill_acquire_failure directing the engineer to flag any selector that cannot be implemented or confirmed and to record it in agents/aqa-state.md rather than guessing a locator. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The frontmatter still reads name: modernization-flow-reuse while the document is the AQA Phase 7 Test Report Analysis workflow. The PR edited this file (added the Scope clause and evidence-label blocks) but did not correct the mismatched name, unlike the r3 sibling which was renamed to aqa-flow-test-report-analysis in the same PR. Reason: A skill/workflow whose registered name points at an unrelated modernization flow can be acquired or routed incorrectly, and the inconsistency with the corrected r3 file invites confusion. Solution: Set the frontmatter name: to aqa-flow-test-report-analysis to match the r3 sibling and the document's actual purpose. |
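
A mismatch like this one could be caught mechanically. The sketch below assumes the frontmatter is a leading `---`-delimited block and that a workflow's `name:` is expected to equal its file stem (true of the renamed r3 sibling, but an assumption as a general rule). The helper names `frontmatter_name` and `name_matches_file` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def frontmatter_name(text: str) -> Optional[str]:
    """Return the `name:` value from a leading '---' frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of the frontmatter block
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip()
    return None


def name_matches_file(path: str, text: str) -> bool:
    """True when the frontmatter name equals the file stem (assumed convention)."""
    return frontmatter_name(text) == PurePosixPath(path).stem
```

Run against the r2 file as described above, `name_matches_file` would return `False` until the `name:` line is corrected.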
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/adhoc-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The step header uses step="2.1" for the determine-spec block but its inner text says "step 3 cross-check" is in another phase, and the phase_steps list is numbered 1-4 while the actual sub-blocks are tagged 2.1-2.5. The mismatch between the top-level phase_steps numbering (1,2,3,4) and the step ids (2.1...2.5) can mildly confuse step tracing. Reason: Consistent step numbering helps the agent track which sub-step it is on across a multi-step phase. Solution: Align the phase_steps list numbering with the step ids used in the section tags (either both 2.1-2.5 or both 1-4), so a reader maps the overview list to the detailed blocks without guessing. |
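
The alignment asked for above can be expressed as a small check. This is a sketch under two assumptions: the overview list uses `N. text` lines, and each detailed block carries a `step="..."` attribute as quoted in the finding. The function names and the sample tag names in the usage note are hypothetical.

```python
import re
from typing import List


def overview_numbers(phase_steps: str) -> List[str]:
    """Collect the leading numbers of 'N. text' lines in the overview list."""
    return re.findall(r"^\s*(\d+(?:\.\d+)*)\.\s", phase_steps, flags=re.MULTILINE)


def step_ids(body: str) -> List[str]:
    """Collect the values of step="..." attributes in the detailed blocks."""
    return re.findall(r'step="(\d+(?:\.\d+)*)"', body)


def numbering_aligned(phase_steps: str, body: str) -> bool:
    """True when the overview numbers and the step ids are the same sequence."""
    return overview_numbers(phase_steps) == step_ids(body)
```

With an overview numbered `1.`, `2.` and blocks tagged `step="2.1"`, `step="2.2"`, the check fails; renumbering either side to match makes it pass.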
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The four branch names (SKIPPED_NO_CONFIG, ACQUIRE_FAILED, EMPTY_HARVEST, COMPLETED) plus config-key precedence list (4 keys) plus in-scope signal list (7+ field names) plus three-pass verify-remediation cases must all be tracked together across resolve/harvest/verify sub-blocks. This is more than the ~5-step reliable window for one phase invocation. Reason: Holding many named branches and field lists at once during a single phase increases the chance of a skipped or mis-applied branch. Solution: Decompose into a tighter linear sequence or fold the in-scope signal field list into a single reference to the qa-project-config template rather than enumerating 7+ field names inline. |
| 🔵 Medium | Bloat Control | Problem: The subflow uses a heavy indirection style: branch triggers are named in <output_contract> and referenced by name, config-key precedence is in <workflow_context> and referenced, early-exit logic is restated in three places (the <execute_documentation_mcp> preamble early-exit rule, per-step -> early-exit markers, and <verify_remediation>). For a single-outcome write-one-row task this is a lot of cross-referential machinery that an agent must hold simultaneously. Reason: A small task (write one outcome line and verify it) carrying multiple layers of name-indirection raises the chance the agent loses track of which branch it is in. Solution: Consider consolidating the four branch outcomes and their triggers into one place (the table) and dropping the duplicated early-exit prose in the preamble, since each step already carries its own -> early-exit marker. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The two-layer binding is stated three times: once in <pinned_analysis_binding> (the layering order list), again in <execute_analysis> step 5 (domain_analysis_skill and output artifact path with the same Part A caveat), and the bypass-refusal rule appears in both the binding block and step 5. The Part A / Part B out-of-scope caveat is repeated at least three times across the file. Reason: Repeating the same binding values and caveats inflates the phase file and risks divergence if one copy is edited and the other is not (the maintainer note already flags this fragility). Solution: State the domain_analysis_skill = qa-test-debugging Part A binding and the bypass-refusal rule once (in the binding block) and have step 5 reference it by name rather than restating the values and the Part A caveat. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Epistemic Honesty | Problem: Neither BASE nor NEW instructs the agent to flag low-confidence or fabricated values. The TC template (step 5.3, lines 93-141) requires concrete Test Data and Expected Results but never tells the agent to mark guessed/uninferable values as gaps. The referenced testrail-test-case-authoring skill does enforce this in its success_criteria, but this phase file does not state that requirement or cite it. Reason: Test cases with invented IDs, ACs, or data values look authoritative but mislead downstream TestRail export and execution. Solution: Add one line in step 5.3 or the validation_checklist requiring that any test data / expected result not derivable from requirements.md be marked as an explicit assumption/gap (as enforced by testrail-test-case-authoring), not silently invented. |
| 🔵 Medium | Failure Handling | Problem: The modified file has no explicit failure-handling block. The BASE version also lacked one, so this is not a regression, but the phase still does not say what to do when its single input agents/testgen/{TICKET-KEY}/requirements.md is missing/empty, when zero requirements are extractable, or when the testrail-test-case-authoring skill returns an incompatible shape. Step 5.3 only describes the fallback template shape, not the stop/ask behavior. Reason: Without a missing-input path the agent may silently produce an empty or fabricated test-scenarios.md when the upstream artifact is absent. Solution: Add a small failure_handling block covering: missing/empty requirements.md (stop, record in testgen-state.md, return to Phase 4), zero extractable requirements, and skill-load failure beyond the inline-template fallback. |
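
The missing/empty branch of the proposed failure_handling block amounts to a guard on the phase's single input. The sketch below is illustrative only: `check_requirements_input` and its three status strings are hypothetical, and it covers only the missing and empty cases, not zero extractable requirements or skill-load failure.

```python
from pathlib import Path


def check_requirements_input(path: Path) -> str:
    """Classify the phase input as 'ok', 'missing', or 'empty'.

    On any status other than 'ok', the finding above suggests stopping,
    recording the condition in testgen-state.md, and returning to Phase 4
    rather than generating test-scenarios.md.
    """
    if not path.is_file():
        return "missing"
    if not path.read_text(encoding="utf-8").strip():
        return "empty"  # whitespace-only files count as empty
    return "ok"
```

A caller would pass the resolved `agents/testgen/{TICKET-KEY}/requirements.md` path and proceed to step 5.3 only on `"ok"`.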
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: BASE included a concrete worked example for child-page capture ('Parent: Job Post; Children: Create/Edit/Delete a Job Post') that grounded the child-page requirement; the NEW file keeps the abstract pitfall ('always use get_page_children') but drops the example. Reason: The abstract instruction survives and the example is recoverable from confluence-source-harvesting, so behavior is largely preserved; the loss is a minor reduction in grounding for the most error-prone step. Solution: No inline change required if confluence-source-harvesting carries the child-page recursion example; verify that skill includes a concrete parent/child illustration. |
| 🔵 Medium | Workflow Completeness | Problem: The NEW file dropped the explicit CQL query construction step (BASE step 3 'Build CQL query: type=page AND space={PROJECT_KEY} AND (text ~ "{term1}" OR ...)') and the relevance-ranking sub-steps. The NEW step 1.2.4 only says 'Extract search terms' and 'Retrieve relevant Confluence pages' without the query-building detail. Reason: The mechanic exists in mcp-confluence-data-collection so this is not a true loss, but the phase no longer signals that the search-query shape lives in the skill, leaving a small chance an agent skips the skill and improvises a weaker search. Solution: Confirm the CQL build and ranking detail is covered by the referenced skill mcp-confluence-data-collection; it is (the skill defines 'space = PROJ AND text ~ ...' and ranking), so no inline restatement is needed — only verify the phase points the agent to that skill for the query mechanics, which it does in step 1.2.2. |
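
For reference, the query shape quoted from BASE step 3 can be assembled as in the sketch below. The helper name and the quote-escaping rule are assumptions made for illustration; the authoritative mechanics stay in the mcp-confluence-data-collection skill.

```python
from typing import List


def build_cql(project_key: str, terms: List[str]) -> str:
    """Build a query of the form quoted in BASE step 3:
    type=page AND space={PROJECT_KEY} AND (text ~ "{term1}" OR ...)

    Hypothetical helper; backslash-escaping of embedded double
    quotes is an assumption, not taken from the reviewed files.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    quoted = ['"' + term.replace('"', '\\"') + '"' for term in terms]
    text_clause = " OR ".join("text ~ " + q for q in quoted)
    return f"type=page AND space={project_key} AND ({text_clause})"
```

Calling `build_cql("PROJ", ["login", "job post"])` yields one `text ~` clause per term joined by `OR`, which is the shape the finding says the phase no longer spells out inline.
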
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The NEW file keeps only the Executive Summary and Traceability Matrix snippets inline and defers the full requirements.md document skeleton (sections: Document Control, Out of Scope, Glossary, Appendices A/B/C, the per-US/FR/NFR/C/D/A/R entry templates) to the requirements-synthesis skill. The phase says 'using the output format from the skill' but does not list which sections the testgen requirements.md must contain. Reason: Core requirement sections are safely in the skill, but the BASE 'Document Control' version table and 'Appendices' (source docs / analysis summary / change log) are testgen-document-shaping that is neither inline nor in the skill output_format, a minor structural loss for document auditability. Solution: The requirements-synthesis skill (SKILL.md output_format) provides sections 1-10 incl. User Stories, FR, NFR, Constraints, Out of Scope, Assumptions, Risks, Traceability, Glossary, so the content is preserved; verify nothing testgen-specific (Document Control table, Appendices A/B/C from BASE) was lost — those appendix/version-control sections are NOT in the skill output_format and appear dropped without a new home. |
| ⚪ Low | Example Grounding | Problem: BASE carried concrete worked examples (US-1 User Login with AC1-3, FR-1 Password Validation, NFR-1 API Response Time 200ms/p95) that grounded what 'good' looks like; the NEW phase file drops all of them. Reason: These examples now live in the requirements-synthesis skill's references/output-schemas.md domain, so grounding is recoverable; the loss is cosmetic at the phase level. Solution: No inline change needed if output-schemas.md retains canonical US/FR/NFR examples; confirm presence in that reference file. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Example Grounding | Problem: BASE embedded hardcoded vendor specifics (TestRail project_id=69, suite_id=3300, griddynamics.testrail.io URLs, full priority/type_id mapping tables, custom_preconds python example) directly in the phase. The NEW file removes all of these and defers to the testrail-test-case-export skill, keeping only TMS-agnostic orchestration. Reason: Removing baked-in vendor IDs is a net gain, not a regression; the lone minor effect is reduced inline grounding, fully mitigated because the skill carries the mappings and the canonical preconditions example. Solution: This is an intentional and correct improvement for Dependency Management (the hardcoded org-specific IDs were a portability defect). All mappings (P0->4, type_id table, preconditions TEST DATA-first ordering, mcp_testrail_add_case signature) are verified present in testrail-test-case-export/SKILL.md, so no content was lost; the only residual is that the phase now has no inline worked example, relying entirely on the skill. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: The new <orchestration_and_escalation> adds a 'Verification-failure unilateral-start override' that tells the agent to start the earliest incomplete phase in the same turn and explicitly 'do NOT call AskUserQuestion'. This is a deliberate carve-out from the session-wide HITL default that governs the rest of the workflow, so two rules point opposite directions at one gate. Reason: An auto-start path that suppresses the approval prompt is the kind of branch an agent can over-generalize; the tight gating makes it acceptable, but the competing-rule pair should read as unambiguous precedence rather than two co-equal rules. Solution: The conflict is already bounded (3 ANDed preconditions, ambiguity defaults to ASK, scope limited to 'this gate only'); keep it but make the precedence one line more explicit by stating that this override outranks the hitl skill ONLY when all three preconditions are simultaneously true, mirroring the wording already used in the Scope bullet. |
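
The precedence described in the Solution (three ANDed preconditions, ambiguity defaults to ASK) can be written as a single predicate. The sketch is illustrative only: the report does not name the three preconditions, so they are modelled as anonymous tri-state values, and the function name and action labels are hypothetical.

```python
from typing import Optional, Sequence


def gate_action(preconditions: Sequence[Optional[bool]]) -> str:
    """Resolve the override against the HITL default.

    Each precondition is True, False, or None (ambiguous).
    The override applies only when exactly three preconditions
    are supplied and every one of them is unambiguously True.
    """
    if len(preconditions) == 3 and all(p is True for p in preconditions):
        return "AUTO_START"  # override outranks the HITL default
    return "ASK"  # any False or ambiguous value falls back to asking
```

Treating `None` the same as `False` is what keeps the carve-out from being over-generalized: the auto-start branch is reachable by only one input combination.
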
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Success Criteria | Problem: The condensed <validation_checklist> lost the explicit state-write check that the BASE had ('agents/aqa-state.md updated with Phase 4 completion'). The new checklist verifies interactions mapped, selectors checked, strategy documented, but does not assert that the Phase 4 completion was recorded in the state file, even though <update_state step="4.3"> is a real step that can be skipped. Reason: State-file updates drive the orchestrator's sequential cadence (a phase is 'done' only when state says so per aqa-flow.md); omitting it from the done-when list lets the agent finish identification without recording it, stalling phase transitions. Solution: Add the state-write completion bullet to the validation_checklist. Note the data-collection and code-analysis checklists do NOT currently carry a state-file bullet (they verify the plan/report file), so this standardizes rather than mirrors them. Behaviorally mitigated: the phase has a dedicated update_state section and the parent enforces state writes. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/adhoc-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: The diff renames the persistence skill from plan-manager to operation-manager in <description_and_purpose> and the plan-wbs/execute-track building blocks, but the unchanged <OPERATION_MANAGER> block still describes the alias as wrapping rosettify MCP / npx rosettify@latest and its commands. Now two names coexist: the macro OPERATION_MANAGER (rosettify) and an operation-manager skill, with no statement of how they relate. Reason: An executing agent may treat the operation-manager skill and the OPERATION_MANAGER rosettify alias as two different mechanisms, causing confusion about which to invoke for plan persistence. Solution: Add one line clarifying the relationship (e.g. state that the operation-manager skill is the skill form of the OPERATION_MANAGER alias), or use a single consistent term across the description, building blocks, and the macro block. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: The new verification skip-gate bans user interaction at verification failure (the agent 'MUST NOT AskUserQuestion'), scoped by 'At this gate', and is declared authoritative even if the sequential-workflow-execution skill fails to load. Unlike aqa-flow.md (which embeds an explicit precedence carve-out), testgen-flow has no inline statement that this no-pause rule does NOT override the genuine Phase 3/6 HITL approval gates. Reason: On the narrow path where the skill fails to load and a verification failure coincides with a real Phase 3/6 approval boundary, an agent could over-apply the no-pause rule and skip a required user approval. Solution: Add one inline sentence stating the no-pause verification rule does not override the Phase 3 and Phase 6 HITL approval gates, mirroring the precedence carve-out already present in aqa-flow.md. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The new <create_analysis_document> defers sections 1-6 entirely to the gap-and-contradiction-analysis skill ('This phase does NOT duplicate that template') and shows only the Pass-2 append-only block verbatim. The fenced Pass-2 block in the diff is partially truncated in the hunk (the ## Analysis Metadata content shows [...]-style ellipsis), so the phase no longer carries a complete, self-contained example of the final analysis.md shape if the skill fails to load. Reason: The phase defers analysis.md sections 1-6 entirely to the gap-and-contradiction-analysis skill and has no skill-LOAD-failure handling. If the skill is unavailable the agent has no Pass-1 structure and no stop/escalate path, so it can emit a malformed analysis.md (Pass-2 only) that breaks Phase 3. Solution: Either restore a one-line inline enumeration of the analysis.md sections (1-6) as a fallback contract, or add skill-load-failure handling that re-invokes once then stops and escalates (mirror the <failure_handling> block already in testgen-flow-requirements-document-generation.md). Do not rely solely on the skill loading. |
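
The "re-invoke once, then stop and escalate" behaviour suggested in the Solution can be sketched as follows. The exception type, the loader callable, and the use of `None` as the escalation signal are assumptions for illustration; the actual <failure_handling> block in the workflow files is prose, not code.

```python
from typing import Callable, Optional


class SkillLoadError(Exception):
    """Hypothetical error raised when a skill cannot be loaded."""


def load_skill_or_escalate(load: Callable[[], str],
                           max_attempts: int = 2) -> Optional[str]:
    """Try the loader, re-invoke once on failure, then give up.

    Returns the skill content on success. Returns None after the
    final failed attempt, which the caller must treat as
    'stop and escalate' rather than continuing without the skill.
    """
    for _ in range(max_attempts):
        try:
            return load()
        except SkillLoadError:
            continue  # one re-invocation with the default max_attempts=2
    return None
```

Returning a sentinel instead of falling through forces the caller to handle the failure explicitly, which is the stop/escalate path the finding says is missing.
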
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Output Contract | Problem: The refactor removed the full inline requirements.md skeleton (sections 1-11: User Stories, FRs, NFRs, Constraints, Dependencies, Out of Scope, Assumptions, Risks, Traceability, Glossary, Appendices) and replaced it with 'using the output format from the skill' plus only two inline fragments (Executive Summary and Traceability Matrix). The full section list now lives only in requirements-synthesis. Reason: If requirements-synthesis does not load, the agent lacks an inline contract for the primary deliverable's structure, risking missing sections that downstream Phase 5 depends on. Behaviorally mitigated: the phase has explicit <failure_handling> that re-invokes once then stops and escalates on skill-load failure, so it will not emit a malformed document; the cross-file dependency remains a maintainability concern. Solution: Add a one-line enumeration of the required top-level sections (US, FR, NFR, Constraints, Dependencies, Out of Scope, Assumptions, Risks, Traceability) inside <create_requirements_document> as a fallback contract, keeping the skill authoritative for the per-entry shape. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Success Criteria | Problem: The skill defines clear stop/route behaviors in <failure_handling> and a thorough <validation_checklist>, but unlike its sibling QA skills (qa-data-collection, qa-gap-analysis) it has no dedicated <success_criteria> section stating the single explicit 'done when X, Y, Z' contract. Completion is instead scattered across the Part A/B boundary note, step 8 iteration policy, and the validation checklist. Reason: A single explicit completion contract makes the skill testable and consistent with the rest of the qa-* family; relying on the reader to assemble it from three locations risks a caller declaring complete prematurely. Solution: Add a short <success_criteria> section consolidating the done-conditions already implied (Part A: execution-report.md written with all sections + safety re-scan passed; Part B when run: changes applied + lint-checked + test intent unchanged + iteration cap respected), mirroring the sibling QA skills. |
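
To illustrate what a single completion contract might look like, here is a rough sketch that folds the done-conditions listed in this finding into one predicate. The class and field names (`PartAResult`, `PartBResult`, `is_done`) are hypothetical; the skill itself is a prompt file, so this only models the logic a consolidated <success_criteria> section would state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartAResult:
    # Part A: execution-report.md written with all sections, safety re-scan passed.
    report_written_with_all_sections: bool
    safety_rescan_passed: bool


@dataclass
class PartBResult:
    # Part B (only when run): the four conditions named in the finding.
    changes_applied: bool
    lint_checked: bool
    test_intent_unchanged: bool
    iteration_cap_respected: bool


def is_done(part_a: PartAResult, part_b: Optional[PartBResult] = None) -> bool:
    """Single 'done when X, Y, Z' check; part_b is None when Part B was not run."""
    a_ok = part_a.report_written_with_all_sections and part_a.safety_rescan_passed
    if part_b is None:
        return a_ok
    return a_ok and all(
        [
            part_b.changes_applied,
            part_b.lint_checked,
            part_b.test_intent_unchanged,
            part_b.iteration_cap_respected,
        ]
    )
```

Keeping every condition in one place is the point of the finding: a caller cannot declare completion from the validation checklist alone while the iteration policy is still unmet.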
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Single Responsibility | Problem: The skill bundles two distinct responsibilities — Part A (read-only selector identification, steps 1-4) and Part B (page-object writing, steps 5-7) — invoked by two separate workflow phases (aqa-flow-selector-identification and aqa-flow-selector-implementation). This exceeds the 1-2 related-responsibilities guideline: one part is read-only analysis, the other mutates page-object source. Reason: A read-only phase and a file-writing phase sharing one skill increases the chance a Part A invocation accidentally crosses into Part B writes; the design mitigates this with per-phase scope binding but the dual responsibility remains a maintenance and safety consideration. Solution: Keep as-is only if the coupling is justified; the reference already documents the rationale at 'Why one file (design rationale)'. If maintainers later see drift, consider splitting. No rewrite needed now — flagging that the two-phase, read-vs-write split sits at the edge of the SRP guideline. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: Heavy cross-reference scaffolding repeats the same pointers many times: the canonical taxonomy is announced as 'single source of truth referenced by step 4, <success_criteria>, <validation_checklist>, <failure_handling>', the 6-field Proposed Change set is restated in step 7, <output_format>, part-b-mechanics.md, and the Part-B checklist, and the 'page sources missing' rule appears in <input_contract>, step 4 item 0, <failure_handling>, <success_criteria>, and <validation_checklist>. The meta-commentary about where each rule is canonical (e.g. 'single source of truth — referenced by...') adds words without adding instruction. Reason: The repeated cross-reference annotations are non-operational provenance/bookkeeping that inflate the always-loaded surface and obscure the actual actions. Solution: Keep the canonical definitions but trim the inline 'referenced by X, Y, Z' bookkeeping notes; one short note per canonical block is enough. Avoid re-stating the page-sources rule verbatim in five sections — point to <failure_handling> once. |
| ⚪ Low | Single Responsibility | Problem: The skill bundles two responsibilities with different risk profiles: Part A (read-only report analysis, steps 1-6) and Part B (writes test source files, runs lint, tracks iterations, steps 7-9). The <when_to_use_skill> section openly acknowledges this: 'The skill bundles two responsibilities with materially different risk profiles' and notes the split is preserved 'so future SRP tightening (extracting Part B to a sibling skill) is a one-step refactor.' Reason: A skill that both analyzes (safe) and mutates repo files (risky) in one surface raises the blast radius of an accidental full-skill invocation; the author already flags this as a known tightening target. Solution: Acceptable for now given the explicit Part A/Part B boundary, gating, and approval discipline. Track the noted future extraction of Part B into a sibling skill so a read-only Part-A invocation does not carry write-capability instructions. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: This skill is authored as a phase orchestrator that resolves and runs OTHER skills, which strains the Rosetta 'skills can't call skills / no lateral sibling awareness' boundary. Step 3 says 'USE SKILL debugging', step 4 says 'Resolve the parent-specified domain analysis skill', and step 6 says 'USE the resolved domain analysis skill; execute only Part A'. <core_concepts> and the step-6/step-7 text also assume knowledge of another skill's internal 'Part A / Part B' structure (e.g. 'aqa-test-debugging Part A'), which is sibling-internal awareness. Reason: Skills knowing and invoking sibling skills' internal structure breaks Rosetta's isolation model; if the referenced skill renames or merges its parts, this skill's instructions silently go stale. Solution: Either reclassify this as a workflow phase (phase schema) rather than a skill, or strip the dependence on a sibling skill's internal Part A/Part B partitioning and refer only to the contracted output (a categorized read-only analysis artifact) without naming another skill's internal sections. |
| 🔵 Medium | Bloat Control | Problem: The <safety_boundaries> redaction block is very long (~20 lines of grep patterns/examples) and appears nearly verbatim in aqa-test-debugging's <safety_boundaries> (automation-test-implementation-handoff has no <safety_boundaries> block). This is DRY/bloat debt across the skill family. Reason: Re-baking an identical multi-paragraph redaction policy into multiple skills is DRY/Bloat debt; if the policy changes it must be edited in many places, risking divergence. Solution: Keep the canonical grep list and 'structural stays verbatim' rule; compress the per-target prose into the table-style bullets it already mostly is, and consider sourcing the shared secret/PII redaction policy from a single sensitive-data reference rather than re-baking it per skill. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The skill's entire process is built on acquiring and using OTHER skills, which strains the Rosetta 'skills can't call skills' boundary. Step 1: 'ACQUIRE repository-implementation-standards/SKILL.md FROM KB and USE SKILL repository-implementation-standards'; step 2: 'ACQUIRE coding/SKILL.md ... USE SKILL coding'; step 3: 'ACQUIRE testing/SKILL.md ... USE SKILL testing'; step 4: ACQUIRE+USE a parent-named domain skill. <core_concepts> also says 'USE SKILL hitl'. A skill chaining four-plus sibling skills behaves as a workflow phase, not a leaf skill. Reason: Authoring a skill as a multi-skill loader violates Rosetta isolation; it couples this skill to the existence, names, and load order of sibling skills, making it brittle and ambiguous about who owns orchestration. Solution: Reclassify as a workflow phase (phase schema) if cross-skill orchestration is intended, OR recast steps 1-4 as recommended foundational skills the executing agent already loads (workflows 'recommend skills at least'), so this artifact does not itself drive skill loading. If kept, document explicitly why this skill is exempt from the no-skill-calls-skill rule. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md file is 12,659 characters (in the 10K-20K band). It carries a full input_contract table, 10-step process, large safety_boundaries, failure_handling, validation_checklist, best_practices, pitfalls, resources, and templates all in the always-loaded body. Reason: A large always-loaded skill body raises per-call token cost and cognitive load, which the shared-context size rule flags at the 10K-20K band. Solution: Move the lower-frequency detail (e.g. the long credential/PII grep-pattern list in <safety_boundaries> and the canonical-URL normalization example pair) into a skill-internal references/ file loaded on demand, keeping the operational steps and gates in SKILL.md. |
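
The size rule behind the Cognitive Budget findings can be sketched as a small classifier. This is a minimal illustration, not the validator's actual code: only the 10K-20K band is named in this report, so the other two labels are hypothetical, and "K" is assumed to mean 1,000 characters.

```python
def size_band(char_count: int) -> str:
    """Classify an always-loaded skill body by character count.

    Assumption: "K" means 1,000 characters. Only the "10K-20K" band is
    named in the report; the other two labels are hypothetical.
    """
    if char_count < 10_000:
        return "under-10K"
    if char_count <= 20_000:
        return "10K-20K"
    return "over-20K"


# Character counts quoted in this report's Cognitive Budget findings.
reported = {
    "confluence-source-harvesting": 12_659,
    "gap-and-contradiction-analysis": 11_107,
    "mcp-confluence-data-collection": 14_891,
    "mcp-jira-data-collection": 15_869,
    "mcp-testrail-data-collection": 10_730,
}
bands = {name: size_band(count) for name, count in reported.items()}
```

Moving on-demand detail into a `references/` file, as the findings suggest, lowers the count that this kind of check sees for `SKILL.md`.
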
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md file is 11,107 characters (in the 10K-20K band). Multiple large embedded markdown templates (contradiction entry, gap entry, ambiguity entry, cross-reference, and the full <output_format> document) plus four enumerated taxonomies live in the always-loaded body. Reason: The body is large enough to trip the shared-context 10K-20K size threshold, raising per-call token cost on every load. Solution: Consider moving the per-finding entry templates and the full <output_format> document skeleton into a skill-internal references/ file referenced on demand, keeping the process steps, finding taxonomies, and gates in SKILL.md. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md file is 14,891 characters (in the 10K-20K band) — the second largest in this batch. It carries success_criteria, an 8-step process with embedded CQL examples and ranking rules, full output_format, large safety_boundaries with grep patterns, failure_handling, a long validation_checklist, and pitfalls all always-loaded. Reason: A 14.9K always-loaded body is near the upper portability/cost band and the shared-context size rule flags 10K-20K bodies; the skill already has a references/ folder to absorb the overflow. Solution: Move maintainer-grade detail (the full grep-pattern catalog in <safety_boundaries> and the CQL/ranking example block in step 2) into the existing references/ folder loaded on demand; the vendor-swap.md split already shows the pattern. Keep steps, gates, and output schema in SKILL.md. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md file is 15,869 characters — the largest in this batch and in the 10K-20K band. The inline <vendor_replacement> section (lines 157-178) is a maintainer-only fork guide that is always loaded at runtime even though it is never needed during extraction. Reason: A 15.9K always-loaded body raises per-call token cost on every extraction; the sibling Confluence skill demonstrates the cheaper progressive-disclosure pattern, so this one is inconsistently heavier without benefit. Solution: Move the <vendor_replacement> fork guide into a skill-internal references/vendor-swap.md and reference it on demand, mirroring what mcp-confluence-data-collection already does — this removes maintainer-only content from the always-loaded runtime body. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md file is 10,730 characters (just inside the 10K-20K band). The inline <vendor_replacement> fork guide (lines 136-153) is maintainer-only content that is always loaded at runtime though never needed during extraction. Reason: Always-loaded maintainer-only content adds per-call token cost; the shared-context size rule flags 10K-20K bodies and the sibling Confluence skill already shows the cheaper pattern. Solution: Move the <vendor_replacement> fork guide into a skill-internal references/vendor-swap.md and load it on demand, mirroring mcp-confluence-data-collection, to drop the always-loaded body below the threshold. |
| 🔵 Medium | Success Criteria | Problem: Unlike its sibling skills mcp-jira-data-collection and mcp-confluence-data-collection, this skill has no <success_criteria> block. It relies only on <validation_checklist> for the done-condition. The two sibling MCP skills both open with an explicit single-paragraph success_criteria stating the testable 'complete when X OR failure-path-followed' contract. Reason: Without an explicit success_criteria the testable done-condition is implicit and inconsistent with the two sibling MCP collection skills, weakening the determinism of when the skill is considered complete. Solution: Add a <success_criteria> block matching the sibling pattern, e.g. 'Complete when the case was retrieved via get_case, normalized into every <output_format> section, every empty/missing required field recorded in Gaps, every credential/PII redacted and recorded — OR the not-found/auth-failure/transport-error path in <failure_handling> was followed and the user re-prompted.' |
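
A missing block such as `<success_criteria>` could be caught mechanically before review. The sketch below assumes skills mark their sections with XML-style tags, as the findings quote them; the function name and the sample text are hypothetical.

```python
import re


def missing_sections(skill_text: str, required: list[str]) -> list[str]:
    """Return the required section tags that have no opening tag in the text."""
    return [
        tag
        for tag in required
        if not re.search(rf"<{re.escape(tag)}\b[^>]*>", skill_text)
    ]


# Hypothetical skill body that relies on the checklist alone.
sample = """
<process>...</process>
<validation_checklist>...</validation_checklist>
<failure_handling>...</failure_handling>
"""

gaps = missing_sections(sample, ["success_criteria", "validation_checklist"])
```

Here `gaps` holds only the tags that never open in the body, so a skill with a checklist but no explicit done-condition is flagged for the latter alone.
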
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/api-test-spec-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: SKILL.md is 18,531 characters (10K-20K band). The redaction guidance is stated three times in overlapping prose: the <safety_boundaries> block, the <validation_checklist> 'Redaction scan ran' item, and the <pitfalls> 'Copying literal Bearer tokens...' bullet all restate the same credential/PII redaction rule. The success_criteria block also partially restates validation_checklist items it claims to defer to. Reason: A resident skill prompt of this size with repeated content raises per-call token cost and dilutes the load-bearing instructions; the sibling skill api-test-spec-authoring already extracted its redaction catalog to references/, so this one is inconsistently heavier for the same concern. Solution: Keep the canonical redaction targets + grep pattern list once in <safety_boundaries>; reduce the validation_checklist and pitfalls entries to a single pointer (e.g. 'redaction scan ran per <safety_boundaries>') without re-describing the targets. Consider moving the long redaction/grep catalog to a references/ file the way api-test-spec-authoring does, since the SKILL.md resident cost is large. |
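
To illustrate why one canonical redaction definition is enough, here is a minimal sketch of a single reusable scan. The pattern is an assumption made for illustration (the skill's real grep list is not reproduced in this report), and it covers only the literal Bearer-token case that the pitfalls bullet mentions.

```python
import re

# Illustrative pattern only; not the skill's actual grep list.
BEARER = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*")


def redact_bearer(text: str) -> tuple[str, int]:
    """Replace literal Bearer tokens with a placeholder.

    Returns the redacted text and the number of tokens replaced, so a
    checklist item can record that the scan ran and what it found.
    """
    return BEARER.subn("Bearer <REDACTED>", text)


redacted, hits = redact_bearer("Authorization: Bearer abc.def-123")
```

A checklist or pitfalls entry can then point at this one definition instead of re-describing the targets.
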
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: The skill consumes several parent-supplied bindings referenced only in prose: 'current phase id and its ACQUIRE target' (step 1), 'the workflow state file path provided by the parent workflow' (step 5), and the parent's HITL transition declaration (step 8). There is no structured input contract section listing these required inputs, their source, and required/optional status, unlike the sibling skills user-approved-code-changes and testrail-test-case-export which both use an explicit input-contract table. Reason: Without an explicit contract, an agent running this process shell can guess the state-file path or proceed when the phase ACQUIRE target was never bound, defeating the linear-execution guarantee the skill exists to provide. Solution: Add an <input_contract> section (table form) listing the required bindings: phase id + phase ACQUIRE target, workflow state file path, parent HITL-transition declaration, and dispatch contract. Mark each required/optional and name the source (parent workflow phase file). |
| ⚪ Low | Output Contract | Problem: The only declared output is the state-delta snippet under (Status/Completed/Outputs/Notes). The 'required announcement' lines (step 10b, e.g. 'skip refused: state row missing -> starting at Phase 0') and the '3-6 bullet phase summary' (best_practices) are described but not given an output format or example, so the emitted artifact shape is partly implicit. Reason: Phase summaries and refusal announcements are user-facing handoff artifacts; leaving their shape implicit produces inconsistent output across phases and agents. Solution: Either add the announcement line and the phase-summary bullets to an <output_format> section with one canonical example each, or cross-reference them from the existing block so all emitted artifacts are specified in one place. |
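
The suggested input contract can be pictured as a typed record that is checked before the phase runs. This is a sketch under stated assumptions: the field names are hypothetical labels for the bindings listed in the finding, and all five are treated as required, which the skill author may decide differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PhaseBindings:
    """Hypothetical names for the parent-supplied bindings in the finding."""

    phase_id: Optional[str] = None
    acquire_target: Optional[str] = None
    state_file_path: Optional[str] = None
    hitl_transition: Optional[str] = None
    dispatch_contract: Optional[str] = None


def unbound(bindings: PhaseBindings) -> list[str]:
    """Names of bindings the parent workflow never supplied (None or empty)."""
    return [f.name for f in fields(bindings) if not getattr(bindings, f.name)]


# Example: the parent bound only the phase id and the state file path.
partial = PhaseBindings(phase_id="phase-1", state_file_path="state.md")
gaps = unbound(partial)
```

An agent would refuse to proceed while `gaps` is non-empty rather than guess a state-file path or an ACQUIRE target.
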
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The done-condition is stated three times in near-identical prose: process step 5 ('The skill is complete after step 5 emits and only after step 4's validation passed'), the entire <success_criteria> block ('Complete when... NOT complete if...'), and again restated in <validation_checklist>. The full five-subsection list (Test File, Implementation Summary, Uncovered Assertions, Conflicts and Precedence, Validation) is spelled out verbatim in step 5, <success_criteria>, and <output_format>. This pushes the file to ~15.3K chars where a leaner version would carry the same contract. Reason: Per pa-hardening DRY and Bloat Control, repeating the same contract three times grows surface area and risks the copies drifting apart on future edits; a single canonical statement with references is more reliable. Solution: Keep the full done-condition only in <success_criteria> (its canonical home) and have process step 5 plus <output_format> reference it by name rather than restating 'Complete when / NOT complete'. State the five required subsections once (in the template/output_format) and reference 'the five required subsections' elsewhere instead of re-listing them. Do not change behavior, only remove the duplicated restatements. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The always-loaded SKILL.md is ~16K chars (10K-20K high-severity band per evaluation rules). It carries Part A steps 1-9, the full canonical taxonomy, <input_contract> table, <safety_boundaries> (including verbatim redaction examples), <success_criteria>, a full <validation_checklist> duplicating success criteria item-by-item, and that restate the same rules a third time. A Part-A-only caller pays this entire surface even though Part B detail was already deferred to references/part-b-mechanics.md. Reason: Per SHARED_CONTEXT, an oversized always-loaded prompt risks forcing overflow/compaction and pa-hardening targets prompt size; progressive disclosure already exists for Part B mechanics but the Part-B safety/validation text was not deferred with it. Solution: Move the Part-B-specific halves of <safety_boundaries> (approval discipline, test-code-only writes), the Part-B <validation_checklist> block, and the Part-B lines into references/part-b-mechanics.md alongside the mechanics already there, leaving only a one-line pointer in SKILL.md. This keeps the read-only Part-A surface lean and defers write-path safety detail to the file already loaded when Part B runs. |
| 🔵 Medium | Bloat Control | Problem: <success_criteria>, <validation_checklist>, <safety_boundaries>, and restate the same rules multiple times. Example: the no-inferred-approval rule appears in <safety_boundaries> ('Inferred approval from prose ... is forbidden'), in the Part-B <validation_checklist> ('no inferred approval'), and again in ('inferred approval from looks good / silence is forbidden'). The application-source rule and the silent-skip-page-source rule are similarly triplicated. Reason: pa-hardening flags redundancy and 'compressible without value loss'; the triplication inflates the already-large SKILL.md without adding new behavior. Solution: Keep the canonical statement in <safety_boundaries>/<failure_handling> and reduce <validation_checklist> and to terse pointers (e.g. 'approval explicit per <safety_boundaries>') instead of re-stating the full rule, trimming the always-loaded surface. |
| 🔵 Medium | Single Responsibility | Problem: The skill bundles two responsibilities with different risk profiles: Part A (read-only report analysis, steps 1-6) and Part B (writes test source files + lint + iteration tracking, steps 7-9). The prompt itself acknowledges this in <when_to_use_skill>: 'The skill bundles two responsibilities with materially different risk profiles' and notes the split is preserved 'so future SRP tightening (extracting Part B to a sibling skill) is a one-step refactor.' Reason: pa-hardening enforces SRP (1-2 responsibilities); read-only analysis and repository-mutating correction are two distinct jobs. The explicit boundary and Part-A-only path mitigate the risk, so this is a documented compromise rather than a hidden flaw. Solution: Acceptable as shipped because the boundary is explicit and a Part-A-only caller is told not to run steps 7-9. To fully satisfy SRP, extract Part B into a sibling skill (e.g. aqa-test-correction) that consumes Part A's artifact, as the prompt's own note anticipates. |
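
The Cognitive Budget finding above rests on a character-count band (the report cites a 10K-20K high-severity band). A minimal sketch of that check follows, assuming size is measured as plain character count; the function name `cognitive_budget_band` and the labels for sizes outside the cited band are hypothetical, since the report does not state the other bands.

```python
# Hypothetical helper. Only the 10K-20K "high" band is taken from the report;
# the labels returned for other sizes are placeholders, not the validator's own.
HIGH_BAND = (10_000, 20_000)


def cognitive_budget_band(text: str) -> str:
    """Classify an always-loaded prompt by its character count."""
    size = len(text)
    low, high = HIGH_BAND
    if size < low:
        return "below-high-band"
    if size <= high:
        return "high"
    return "above-high-band"
```

Running this over SKILL.md before and after moving the Part-B text into references/part-b-mechanics.md would show whether the deferral actually takes the file out of the high band.
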
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 3 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: SKILL.md is ~15.7K chars (10K-20K high-severity band). The <safety_boundaries> redaction policy alone spans a full multi-row table PLUS a separate canonical grep-pattern list PLUS a structural-content rule PLUS a re-scan rule, and this skill has no references/ file to defer detail to (progressive disclosure not used). The whole redaction policy is loaded on every invocation even though it only fires when inputs embed secrets. Reason: Per SHARED_CONTEXT and pa-hardening, an oversized always-loaded skill risks compaction; the redaction detail is reference-grade material that fits progressive disclosure, which this skill does not yet use. Solution: Extract the <safety_boundaries> redaction table + grep-pattern list + structural-content rule into a references/redaction-policy.md loaded on demand, leaving a one-line trigger in SKILL.md. The prompt's own 'DRY note (future)' already anticipates a shared redaction reference; pulling it out also shrinks the always-loaded surface. |
| 🔵 Medium | Bloat Control | Problem: Non-operational provenance note in <safety_boundaries>: the '> DRY note (future): the redaction policy ... is shared verbatim with sibling skills (aqa-test-debugging, qa-test-debugging). A single sensitive-data redaction reference would let all three skills source from one canonical location - tracked in docs/TODO.md for the next family refactor.' This is a future-plan / rationale annotation aimed at maintainers, not an instruction the executing agent acts on, and it also introduces sibling-skill awareness (names aqa-test-debugging and qa-test-debugging). Reason: pa-patterns ai-issues warns against inserting non-operational clarifications (history, rationale, future plans) into target prompts, and pa-hardening requires no lateral/sibling awareness; the note violates both without changing agent behavior. Solution: Remove the 'DRY note (future)' block from the prompt and track the refactor in docs/TODO.md only (where it already says it is tracked). Keep the skill source-agnostic and free of sibling-name coupling. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: SKILL.md is ~15K chars (10K-20K high-severity band) with no references/ deferral (no progressive disclosure). The step-4 domain-skill GATE rule is stated in full in <core_concepts>, again in the <input_contract> row, again at length in step 4 of , again in <failure_handling> (three separate bullets), again in <validation_checklist>, and again in — the same canonical rule re-expanded across six sections, all loaded every invocation. Reason: Per SHARED_CONTEXT, oversized always-loaded skills risk compaction; pa-hardening targets prompt size and pa-patterns flags redundancy. The rule is repeated rather than referenced, inflating cognitive load without new behavior. Solution: State the step-4 GATE rule canonically once (in step 4 as the prompt already designates 'canonical'), and reduce the other five sites to terse pointers ('domain skill required — see step 4 GATE') rather than re-expanding the rule. Optionally move the <recommended_foundational_skills> rationale paragraph to a reference. This shrinks the always-loaded surface below the high-severity band. |
| 🔵 Medium | Bloat Control | Problem: The 'Why this skill doesn't ACQUIRE/USE' paragraph in <recommended_foundational_skills> ('A skill that chains four-plus sibling skills behaves as a workflow phase, not a leaf skill, and couples to sibling names + load order ... By recasting these as recommended foundational skills ...') is design-rationale explaining a past authoring decision, not an instruction the executing agent acts on. Reason: pa-patterns ai-issues warns against injecting rationale/origin/explanatory meta-notes into target prompts (state-only, action-only); the paragraph adds no runtime behavior and enlarges an already high-band file. Solution: Remove the rationale paragraph; keep only the operative rule ('this skill verifies presence and applies discipline; it does NOT ACQUIRE/USE') already stated in <core_concepts>. Move the design reasoning to the change-log or PR description. |
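
The repetition described above (one rule re-expanded across several sections) can be located mechanically before trimming. The sketch below assumes the prompt delimits sections with paired, non-nested tags such as `<core_concepts>...</core_concepts>`; the function name `sections_mentioning` and the phrase-matching approach are illustrative assumptions, not part of the validator.

```python
import re


def sections_mentioning(prompt_text: str, phrase: str) -> list[str]:
    """Return the tag names of sections whose body contains the phrase.

    Assumes sections are paired, non-nested tags like <name>...</name>.
    Matching is case-insensitive substring search (an assumption).
    """
    hits = []
    for match in re.finditer(r"<(\w+)>(.*?)</\1>", prompt_text, flags=re.DOTALL):
        name, body = match.group(1), match.group(2)
        if phrase.lower() in body.lower():
            hits.append(name)
    return hits
```

A result with more than one section name marks a candidate for the fix the report proposes: keep the canonical statement in one section and reduce the others to short pointers.
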
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Success Criteria | Problem: The skill has no explicit <success_criteria> section stating testable done-when conditions. The sibling skill mcp-confluence-data-collection defines a dedicated <success_criteria> block ('Complete when target pages were retrieved... OR the failure path was followed'), but confluence-source-harvesting only has a <validation_checklist> that lists post-conditions to verify rather than a single completion statement. An agent reading this skill must infer when the harvest is 'done' from step 10 plus the checklist. Reason: Without an explicit completion statement, the agent may treat a partial harvest (e.g. parents fetched, children skipped) as done, which the validation_checklist forbids but does not gate at the right moment. Solution: Add a <success_criteria> section mirroring the sibling skill: state the skill is complete when all user URLs/derived pages were fetched and embedded as page entries, children were checked or waived, truncation/redaction applied, the step 10 summary was written, OR a <failure_handling> stop path was followed; and that it is NOT complete on a silent zero-page emit. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Conflict Resolution | Problem: The SKILL's <safety_boundaries> rule 3 declares the three-tier risk scheme High/Medium/Low the single source of truth and explicitly forbids introducing 'Critical/Urgent/Blocker as a fourth tier'. But the output document skeleton it points to (referenced from <output_format>) defines an Executive Summary field 'Severity: [Critical / High / Medium / Low]', introducing exactly the Critical tier the SKILL prohibits. The SKILL and its own referenced template contradict each other on the allowed tier vocabulary. Reason: An agent following rule 3 will refuse to write 'Critical' while the template instructs it to, producing inconsistent documents across runs and a contradiction the validation_checklist ('single risk tier from <risk_assessment>') cannot resolve. Solution: Align the two: either remove 'Critical' from the Executive Summary Severity field in entry-templates-and-document-skeleton.md so it reads '[High / Medium / Low]', or add an explicit note in <safety_boundaries> rule 3 that the Executive-Summary Severity field is a separate overall-document rating distinct from per-finding risk tiers and may use Critical. Pick one and state it in the SKILL so the vocabulary is unambiguous. |
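
The contradiction above can be caught with a small vocabulary check on the template. This is a sketch only, assuming the field is written as `Severity: [A / B / C]` as quoted in the finding and that the first proposed resolution (three tiers only) is adopted; `ALLOWED_TIERS` and `disallowed_tiers` are hypothetical names.

```python
import re

# Three-tier scheme quoted from <safety_boundaries> rule 3 in the finding.
ALLOWED_TIERS = {"High", "Medium", "Low"}


def disallowed_tiers(template_text: str) -> list[str]:
    """List tier names in 'Severity: [...]' fields that are outside ALLOWED_TIERS."""
    found = []
    for match in re.finditer(r"Severity:\s*\[([^\]]*)\]", template_text):
        for tier in match.group(1).split("/"):
            tier = tier.strip()
            if tier and tier not in ALLOWED_TIERS and tier not in found:
                found.append(tier)
    return found
```

If the second resolution is chosen instead (the Executive Summary severity is a separate whole-document rating that may use Critical), this check would need to skip that field rather than flag it.
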
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Conflict Resolution | Problem: The Output Document Skeleton's Executive Summary defines 'Severity: [Critical / High / Medium / Low]', which includes a 'Critical' value the parent SKILL.md's <safety_boundaries> rule 3 explicitly forbids ('Do not introduce Critical/Urgent/Blocker as a fourth tier'). As a detail layer of the parent skill, this template contradicts the parent's authoritative tier rule. Reason: The reference is loaded at document-assembly time; if it instructs the agent to emit 'Critical' while the parent forbids it, the agent gets contradictory write-time guidance and document tier vocabulary becomes non-deterministic across runs. Solution: Remove 'Critical' from the Severity field so it reads '[High / Medium / Low]' to match the parent SKILL's three-tier rule, OR add an inline note in this skeleton clarifying the Executive-Summary Severity is a whole-document rollup distinct from the per-finding High/Medium/Low tiers (matching whichever resolution the parent SKILL adopts). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Safety Boundaries | Problem: The redaction target list in mcp-testrail-data-collection <safety_boundaries> is weaker than its Jira sibling. It names the credential and PII categories but gives no concrete grep patterns (no `Bearer `, `Authorization:`, JWT `eyJ...`, `BEGIN PRIVATE KEY`, email/phone/card regex shapes), and omits database connection strings entirely. The <validation_checklist> then says 'grepped ... per <safety_boundaries>' but <safety_boundaries> provides nothing to grep. Reason: Without concrete patterns the agent decides ad hoc what looks like a secret, so embedded tokens or PII in step text or preconditions can pass the scan and land in a tracked artifact. Solution: Add the same concrete grep-pattern set and the database-connection-string category to the testrail <safety_boundaries> so the redaction scan is operational, matching the depth of the Jira skill that the parallel chain re-emits into version control. |
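To illustrate what an operational redaction scan might look like, here is a minimal sketch covering the pattern categories named in this finding. The regular expressions, the `REDACTION_PATTERNS` name, and the `scan_for_secrets` helper are assumptions made for illustration; they are not the patterns defined in the Jira skill, and phone and card shapes are left out.

```python
import re

# Hypothetical pattern set: category names follow the finding, but the regex
# shapes are illustrative assumptions and would need tuning before real use.
REDACTION_PATTERNS = {
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
    "authorization_header": re.compile(r"Authorization:\s*\S+", re.IGNORECASE),
    "jwt": re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|jdbc:[a-z]+)://\S+",
        re.IGNORECASE,
    ),
}


def scan_for_secrets(text):
    """Return the sorted names of every pattern category that matches the text."""
    return sorted(
        name for name, pattern in REDACTION_PATTERNS.items() if pattern.search(text)
    )


if __name__ == "__main__":
    step_text = "Precondition: call the API with Authorization: Bearer abc123.def"
    print(scan_for_secrets(step_text))
```

A scan like this only reports which categories matched; deciding on placeholders and rewriting the artifact would remain with the skill's own redaction rules.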
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: At ~16.8K chars this SKILL.md sits in the 10K-20K high-severity band. It is the orchestrator skill yet it carries seven full process steps, a long output_format template, pitfalls, safety_boundaries, success_criteria, failure_handling, and a validation_checklist all always-loaded. Step 5 (Discover Existing Test Patterns) restates many enumerations (frameworks, HTTP clients, directory globs) that overlap with the deferred backend-source-analysis reference, adding to the always-loaded budget. Reason: A large always-loaded orchestrator skill that itself delegates to MCP sub-skills consumes context the parent workflow also needs; the bigger it is, the higher the risk of skipped steps and compaction during the multi-skill collection chain. Solution: Push the detailed step-5 enumerations (framework/import/HTTP-client/test-structure lists) into a deferred reference the same way step 4 already defers to references/backend-source-analysis.md, keeping step 5 as a thin orchestration entry. This trims the always-loaded surface back toward the lean-SKILL target. |
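When deciding which sections to defer, it can help to measure them first. The sketch below assumes the SKILL.md wraps its sections in lowercase XML-style tags and that the 10K-20K band edges are inclusive; `section_sizes`, `in_high_band`, and `HIGH_BAND` are hypothetical names, not part of the validator.

```python
import re

# Band edges taken from the finding; inclusive edges are an assumption.
HIGH_BAND = (10_000, 20_000)

# Matches <tag>...</tag> pairs; nested tags are counted as part of the
# outermost section that contains them.
SECTION_RE = re.compile(r"<([a-z_]+)>(.*?)</\1>", re.DOTALL)


def in_high_band(skill_text):
    """True when the file's character count falls inside the high-severity band."""
    return HIGH_BAND[0] <= len(skill_text) <= HIGH_BAND[1]


def section_sizes(skill_text):
    """Map each tagged section to its character count, largest first.

    If a tag name repeats, the last occurrence wins.
    """
    sizes = {m.group(1): len(m.group(0)) for m in SECTION_RE.finditer(skill_text)}
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    sample = "<process>" + "x" * 100 + "</process>\n<pitfalls>short</pitfalls>"
    print(in_high_band(sample), section_sizes(sample))
```

The largest entries in the result are the natural candidates for a deferred reference file, in the same way step 4 already defers to references/backend-source-analysis.md.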
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The file is ~18.3K characters, the largest of the QA skills and within the 10K-20K high-severity band flagged in the shared context. The size comes largely from carrying two full responsibilities (Part A analysis + Part B corrections) plus four heavy governance sections (<safety_boundaries>, <failure_handling>, <success_criteria>, <validation_checklist>) where <success_criteria> and <validation_checklist> restate the same Part A/Part B contract (e.g., 'every <output_format> section present', 'no literal credential/PII', 'iteration 3 escalation recorded' appear in both). A single skill load this large competes for context budget against the rest of the QA chain artifacts the agent must also hold. Reason: At 18.3K a single on-demand skill consumes a large share of the working context window; the overlap between success_criteria and validation_checklist is non-functional repetition that can be compressed without losing any check, lowering load cost for every invocation. Solution: Reduce duplication between <success_criteria> and <validation_checklist>: keep <success_criteria> as the high-level done-condition and have it reference <validation_checklist> for item-level checks (the file already declares <validation_checklist> as 'single source of truth' but still restates the items in <success_criteria>). If the Single Responsibility split is adopted, the per-skill size drops naturally below the high band. No behavioral change is required, only de-duplication of the two governance sections. |
| 🟡 High | Single Responsibility | Problem: The skill bundles two responsibilities with materially different risk profiles into one file: Part A (steps 1-5) is read-only report analysis producing execution-report.md, and Part B (steps 6-8) writes test source files and runs lint after user approval. The <when_to_use_skill> section itself states 'The skill bundles two responsibilities with materially different risk profiles' and has to add a 'Part A / Part B usage boundary' note plus a rule that 'a Part-A-only invocation MUST NOT execute steps 6-8' to manage the coupling. A read-only analyzer and a code-mutating fixer are two distinct jobs (the schema target is 1-2 related responsibilities, here the second one mutates source and carries approval/lint/iteration machinery the first does not). Reason: One skill doing both read-only analysis and approval-gated source mutation enlarges the cognitive search space and risks an agent sliding from analysis into applying changes without the explicit approval gate; the prompt already needs guardrail prose to prevent exactly that, which signals the responsibilities are separable. Solution: Consider splitting into two skills: a read-only qa-report-analysis skill (current Part A, steps 1-5, producing execution-report.md) and a separate qa-test-correction skill (current Part B, steps 6-8, consuming execution-report.md as its input contract and owning the approval/lint/3-iteration policy). If the QA family intentionally keeps them together for chaining, keep the current explicit Part A/Part B boundary note but make the split-skill option an explicit recorded decision so the coupling is deliberate rather than incidental. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 3 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 3 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md is ~16KB. The main section (5 numbered sections, each with multi-level sub-bullets), <output_format> (full per-endpoint template), <validation_checklist> (8 proof items), <success_criteria>, <failure_handling> (7 distinct branches including GraphQL adaptation), and the remaining sections all stay loaded together. An agent executing a single endpoint extraction carries the GraphQL branch, the reconciliation-conflict branch, and the citation-source-unavailable branch in context even when none apply, exceeding the reliable ~5-decision-at-once budget per pa-patterns.md ai-issues. Reason: Progressive disclosure keeps the always-loaded surface lean and within the cognitive budget; the skill already proves it can defer detail (redaction-catalog.md), so the rarely-hit failure branches are the natural next deferral. Solution: Move the lower-frequency <failure_handling> branches (GraphQL API adaptation, spec-vs-code reconciliation-conflict-beyond-Notes, citation-source-unavailable) into a references/ file loaded on demand, mirroring the redaction-catalog lazy-loading the skill already uses; keep only the common locate/coverage/parse-failure stops inline in the base SKILL.md. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md is ~16KB. Three full worked examples (Happy Path, Negative-with-parameterized, Role-based-merged) plus a complete <test_case_template>, an 8-item <validation_checklist>, a 5-branch <failure_handling>, a <safety_boundaries> redaction catalog inline, an <epistemic_honesty> per-field gap-marker list, and the remaining sections are all loaded together to author a single test case. The example bodies and the full safety catalog stay in context for every authoring call, pushing past the reliable ~5-decision budget noted in pa-patterns.md ai-issues (overload causes skipped steps). Reason: Worked examples and the verbatim redaction catalog are detail layers consulted only when a field-shape question or a redaction arises; loading them on every authoring call inflates the always-resident surface area without adding per-call value. Solution: Defer the three full example blocks and the inline redaction target/placeholder catalog in <safety_boundaries> to a references/ file loaded on demand, keeping the template, format_rules, success_criteria, and the gap-marker rules inline. The sibling swagger skill already shows this lazy-loading pattern (references/redaction-catalog.md, references/canonical-example.md). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The SKILL.md is ~16KB. The full <vendor_replacement> section (~25 lines describing how to fork the skill for Zephyr/Xray/qTest/Polarion, the per-vendor re-binding catalog, and the workflow-side coupling note) stays loaded during every actual TestRail export run even though it governs a future authoring task, not the runtime export. Combined with <process>, <input_contract>, <safety_boundaries>, <validation_checklist>, and the remaining runtime sections, this carries non-execution maintenance guidance in the execution context. Reason: Vendor-porting guidance is consumed by a prompt-maintainer task, not by the export-runtime agent; keeping it inline mixes a maintenance concern into the runtime cognitive budget with no per-export value, per pa-patterns.md work-curiosity-limit and progressive disclosure. Solution: Move <vendor_replacement> to a references/ file (e.g. references/vendor-porting.md) loaded only when someone is forking the skill to a new TMS, leaving the runtime export path (process, input_contract, safety_boundaries, validation_checklist) as the resident surface. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The diff deletes the concrete Example Questions block (critical/edge/test-flow sample questions such as 'should we match exact text "Success!" or just verify message contains "Success"') and the inline Define Explicit Assertions task with per-step assertion examples. The new file gives typed assertion patterns (Presence/State/Content/Behavioral templates) but no worked sample question and relies on bound skills for question content. Reason: The deleted sample questions showed the agent the desired specificity (exact-match vs contains); without an equivalent example downstream, question quality may regress to vague prompts. Solution: Verify questioning and aqa-requirements-elicitation carry sample questions and a worked typed-assertion example; if not, add one short example question and one filled assertion bullet to those skills (not the workflow) so the abstract templates have a concrete anchor. |
| ⚪ Low | Conflict Resolution | Problem: The new <description_and_purpose> and <workflow_context> assert assertion derivation happens 'inside the bound skill at step 2.1' while step 2.4 transcribes; but <identify_gaps step=2.1> only says 'USE SKILL aqa-requirements-elicitation' and 'Prepare a list of unknowns', not that it derives typed assertions. The authority-chain claim and the step body are slightly out of sync. Reason: If step 2.1 does not visibly produce derived assertions, step 2.4's 'collect every derived assertion' has no clearly defined source within the phase text. Solution: Add an explicit bullet to <identify_gaps> stating the skill also derives the typed Derived assertion field per item, so step 2.1 and the authority-chain narrative agree. |
| ⚪ Low | Decision Branching | Problem: The base Task 4 had explicit DO NOT PROCEED to Phase 3 until answers received. The new <wait_for_user> keeps STOP AND WAIT but the only else-branch (zero derived assertions) is handled in step 2.4; there is no explicit branch for 'user provides partial answers' or 'user declines to answer'. Reason: Partial-answer handling is a realistic HITL path; leaving it implicit risks the agent silently proceeding or stalling. Solution: Add a one-line else-branch in <wait_for_user> or <update_test_plan> for partial/declined answers (record gap, proceed with documented unknowns vs re-ask), matching the None-clause pattern already used for assertions. |
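
To make the exact-match vs contains distinction from the Example Grounding finding concrete, here is a minimal sketch of a typed Content assertion. The `ContentAssertion` shape, its field names, and `checkContent` are hypothetical illustrations; they are not taken from aqa-requirements-elicitation or any other skill in this PR.

```typescript
// Hypothetical shape for a "Content" assertion; field names are illustrative only.
type MatchMode = "exact" | "contains";

interface ContentAssertion {
  kind: "Content";
  target: string;   // human-readable element name, e.g. "success banner"
  expected: string; // text the user confirmed during clarification
  match: MatchMode; // the specificity the deleted sample question asked about
}

// Returns true when the rendered text satisfies the assertion.
function checkContent(assertion: ContentAssertion, actualText: string): boolean {
  return assertion.match === "exact"
    ? actualText === assertion.expected
    : actualText.includes(assertion.expected);
}

const exactAssertion: ContentAssertion = {
  kind: "Content",
  target: "success banner",
  expected: "Success!",
  match: "exact",
};

const containsAssertion: ContentAssertion = {
  kind: "Content",
  target: "success banner",
  expected: "Success",
  match: "contains",
};

const renderedText = "Success! Your order was placed.";
console.log(checkContent(exactAssertion, renderedText));    // false
console.log(checkContent(containsAssertion, renderedText)); // true
```

The same rendered text passes one assertion and fails the other, which is why a sample question that forces the user to choose between the two modes is worth keeping somewhere in the skill chain.
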
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Example Grounding | Problem: The diff deletes the entire detailed user-facing page-source capture protocol: the step-by-step F12 / right-click Inspect / Copy outerHTML / include 2-3 parent levels instructions, the {page-name}.html kebab-case naming convention, and the full User Interaction Format message template. The new <handle_page_source> reduces all of this to 'Provide clear instructions to user for capturing HTML'. Reason: This is user-facing instruction content; 'provide clear instructions' is non-operational and gives the agent no template, so the user-facing capture message quality regresses and non-technical users may not capture usable HTML. Solution: Confirm aqa-selector-management (Part A) owns this user-facing capture protocol; if it does not carry the HTML-capture steps and naming convention, restore a compact version (or an explicit pointer) in <handle_page_source> since this is user-facing output that compression rules protect. |
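
As an illustration of the `{page-name}.html` kebab-case convention named in this finding, the sketch below derives a file name from a free-form page name. `pageSourceFileName` is a hypothetical helper, and the normalization rules (splitting camelCase, collapsing punctuation) are assumptions; the deleted protocol may have defined the convention differently.

```typescript
// Hypothetical helper: turns a free-form page name into "{page-name}.html" in kebab-case.
function pageSourceFileName(pageName: string): string {
  const kebab = pageName
    .trim()
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // split camelCase boundaries
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")            // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");               // strip leading and trailing hyphens

  if (kebab.length === 0) {
    throw new Error("page name must contain at least one letter or digit");
  }
  return `${kebab}.html`;
}

console.log(pageSourceFileName("Login Page"));      // login-page.html
console.log(pageSourceFileName("checkoutSummary")); // checkout-summary.html
```

A deterministic rule like this is the kind of detail that 'provide clear instructions' leaves open, so it needs to live either in the skill or in a compact pointer from the workflow.
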
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 2 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The diff deletes the Phase 5 test-plan output template (Page Objects Modified/Created with added selectors and methods, Implementation Notes, Files Modified) and the code-pattern examples (TypeScript selector/getter/new-page-object skeletons). The new file documents only state-file fields in <update_state> and a checklist; the implementation artifact shape now depends on aqa-selector-management Part B. Reason: The concrete getter/accessor code examples grounded the 'follow existing patterns exactly' instruction; the <skill_precedence> positive/anti examples partly compensate but the general page-object skeleton is gone, so consistency guidance is thinner if the skill lacks it. Solution: Confirm aqa-selector-management Part B carries the page-object code pattern and the modified/created selector reporting shape; if not, add a minimal anchor. Do not re-inline the full TypeScript examples; they belong in the skill. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔴 Critical | Conflict Resolution | Problem: Step 6.1 says 'Do not ACQUIRE or USE coding, testing, repository-implementation-standards, or aqa-test-authoring directly from this phase file - the handoff delegates internally and is the only entry point that loads them', and sub-step 3 says 'the handoff is responsible for ACQUIRing and applying it [aqa-test-authoring]'. But the bound skill automation-test-implementation-handoff/SKILL.md states the opposite: 'This skill does NOT drive skill loading... it does NOT itself ACQUIRE/USE other skills' and its step-4 GATE STOPS with a failure if the domain/foundational skills were not already loaded by the calling workflow. The phase and the skill it delegates to give contradictory loading responsibilities. Simulation confirms this is a hard execution deadlock, not just inconsistent prose: the parent's repository-implementation-standards load does not rescue it because coding, testing, and aqa-test-authoring remain unloaded (parent lists them only as 'Recommended skills'), so the handoff still STOPS at its verify GATE and Phase 6 cannot complete. Reason: The agent following the phase will withhold loading the domain/foundational skills expecting the handoff to load them; the handoff then hits its step-4 GATE, reports 'foundational skill not loaded', and stalls Phase 6. A wrong instruction that contradicts the delegated skill reliably breaks the chain. Escalated to critical because the chain fails (hard stall), not merely degrades. Solution: Align step 6.1 with the handoff contract: instruct this phase (or the parent aqa-flow Phase 6, which already lists coding/testing/aqa-test-authoring as recommended skills) to ACQUIRE+load coding, testing, repository-implementation-standards, and aqa-test-authoring BEFORE USE SKILL of the handoff, and reword sub-step 3 so the handoff 'verifies presence and applies discipline' rather than 'is responsible for ACQUIRing'. Remove the 'only entry point that loads them' claim. |
| 🟡 High | Reference Integrity | Problem: Step 6.1 binds aqa-test-authoring as the 'domain test implementation skill the handoff must apply' and asserts the handoff 'delegates internally and is the only entry point that loads them'. The referenced handoff skill explicitly disclaims internal delegation/loading (recommended_foundational_skills: 'this skill only verifies presence... does NOT itself ACQUIRE/USE other skills'). The reference resolves to a file, but the described behavior of that file is incorrect. Reason: A reference whose described semantics contradict the target file misleads the agent about which component performs loading, producing the same chain stall as the Conflict Resolution issue. Solution: Update the prose so the reference matches the handoff's actual contract: the parent workflow loads the foundational + domain skills, the phase names aqa-test-authoring as the domain skill, and the handoff verifies-and-applies. Cite the handoff's recommended_foundational_skills section instead of claiming it loads skills. |
| 🔵 Medium | Example Grounding | Problem: The rewrite deleted all concrete code examples that the base provided (TypeScript test scaffold, setup, assertions like expect(welcomeMessage).toBeVisible(), cleanup hooks, state-file template). The new phase has no canonical example of the implemented artifact it governs. Reason: Examples grounded the previous implementation step; their removal is a deletion. The loss is partly mitigated by delegation, but the phase itself now offers no concrete anchor, so the comparison is slightly worse. Solution: Since authoring detail is now delegated to aqa-test-authoring, this is acceptable IF the phase notes that concrete authoring examples live in aqa-test-authoring's output_format. Add one short canonical example of the expected state-file update or validation_checklist outcome to keep the phase self-grounding. |
| 🔵 Medium | Decision Branching | Problem: The rewrite collapsed the prior explicit branching (new-file vs existing-file in old Task 2, cleanup-needed vs not in old Task 8) into a single 'Execute test authoring' delegation. Within this phase the only remaining branch is the skill_handoff acceptable/unacceptable check; authoring decision points now live only inside the delegated skill, so the phase no longer states the if/then for the common authoring forks. Reason: Without a pointer, an agent reading only this phase cannot tell whether the missing branches are intentionally delegated or accidentally dropped, risking skipped decisions. Solution: Either add a one-line note that file-location and cleanup branching are owned by aqa-test-authoring (so the reader knows where the decision logic moved), or keep a brief if/then pointer. No need to restore the full task list. |
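
The loading deadlock described in the Critical finding can be reproduced with a small model. The skill names below come from the finding; everything else (`handoffGate`, `GateResult`, representing loaded skills as a set) is a hypothetical simplification of the verify-only contract, not the actual behavior of automation-test-implementation-handoff.

```typescript
// Skill names are quoted from the finding; the gate model itself is an assumption.
const REQUIRED_SKILLS = [
  "coding",
  "testing",
  "repository-implementation-standards",
  "aqa-test-authoring",
] as const;

type GateResult =
  | { status: "PROCEED" }
  | { status: "STOP"; missing: string[] };

// Models a verify-only handoff: it checks presence and never loads anything itself.
function handoffGate(loaded: ReadonlySet<string>): GateResult {
  const missing = REQUIRED_SKILLS.filter((skill) => !loaded.has(skill));
  return missing.length === 0 ? { status: "PROCEED" } : { status: "STOP", missing };
}

// As written: the phase withholds loading, so only the parent's
// repository-implementation-standards load has happened.
const asWritten = handoffGate(new Set<string>(["repository-implementation-standards"]));

// Proposed fix: all four skills are loaded before USE SKILL of the handoff.
const afterFix = handoffGate(new Set<string>(REQUIRED_SKILLS));

console.log(asWritten.status); // STOP
console.log(afterFix.status);  // PROCEED
```

Under this model the phase as written can never pass the gate, because no component is assigned the load. Moving the load ahead of the handoff is the smallest change that lets it proceed.
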
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 2 | ⬇️ Slightly worse |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬇️ Slightly worse |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Example Grounding | Problem: The rewrite removed the base's concrete failure-record markdown template (Error Type / Error Message / Stack Trace / Page Source Analysis fields) and Phase 7 plan section template. The new file defers the artifact schema to the domain skill's output_format and gives no inline example of a labeled root-cause entry. Reason: The evidence-strength labeling rule is the central new behavior; one concrete labeled example would reduce mislabeling risk. Low severity because the Confirmed/Assumption/Unknown definitions and tie-break are spelled out in prose. Solution: Add one short inline example of a labeled root cause (e.g. a 'Confirmed' line with a one-line rationale) or explicitly point to aqa-test-debugging output_format for the artifact shape. Schema delegation is fine; a single anchor example keeps the evidence-label rule concrete. |
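
One possible form for the single anchor example suggested above is sketched here. The three labels are the ones named in the finding; the `RootCauseEntry` fields, the output line format, and the sample failure are hypothetical and do not reproduce the aqa-test-debugging output_format.

```typescript
// Labels are from the finding; the entry shape and line format are assumptions.
type EvidenceLabel = "Confirmed" | "Assumption" | "Unknown";

interface RootCauseEntry {
  label: EvidenceLabel;
  cause: string;
  rationale: string; // one-line justification for the chosen label
}

// Renders one labeled root-cause line and rejects entries without a rationale.
function formatRootCause(entry: RootCauseEntry): string {
  if (entry.rationale.trim() === "") {
    throw new Error(`${entry.label} root cause needs a one-line rationale`);
  }
  return `- [${entry.label}] ${entry.cause} (rationale: ${entry.rationale})`;
}

// Illustrative entry only; the failure details are invented for the example.
console.log(
  formatRootCause({
    label: "Confirmed",
    cause: "Submit button selector no longer matches the page source",
    rationale: "captured HTML shows the id was renamed",
  }),
);
```

Requiring a rationale on every entry is one way to keep the label from being applied by default, which is the mislabeling risk the finding raises.
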
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/adhoc-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The subflow uses heavy cross-referencing indirection: the <execute_documentation_mcp> intro states 'branch triggers live in <output_contract> and are referenced by name', 'Config-key precedence lives in <workflow_context> and is referenced, not relisted', and the early-exit rule plus <verify_remediation> all point back to <output_contract> branch names (SKIPPED_NO_CONFIG, ACQUIRE_FAILED, EMPTY_HARVEST, COMPLETED). This DRY-by-reference style avoids duplication but forces the agent to jump between four sections to resolve a single branch, adding cognitive overhead for what is one optional collection branch. Reason: Per pa-patterns ai-issues, agents skip steps and lose context when forced to resolve directives across multiple distant sections; co-locating the resolved value reduces round-trips and skipped-branch risk. Solution: Inline the one-line outcome string next to each branch trigger in <harvest_and_collect> and (e.g. after 'apply SKIPPED_NO_CONFIG' append the literal outcome line), keeping <output_contract> as the canonical table but removing the need to cross-jump on every branch. Reduces lookups without reintroducing the full duplication the author was avoiding. |
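
The co-location suggested in the Solution can be sketched as a lookup that resolves a branch and returns its outcome line in one step. The four branch names come from the finding. The trigger conditions in `resolveBranch` are inferred from those names, and the outcome strings are placeholders, since the real ones live in <output_contract>.

```typescript
// Branch names are from the finding; triggers and outcome text are assumptions.
type Branch = "SKIPPED_NO_CONFIG" | "ACQUIRE_FAILED" | "EMPTY_HARVEST" | "COMPLETED";

// Placeholder outcome lines standing in for the canonical table in <output_contract>.
const OUTCOME: Record<Branch, string> = {
  SKIPPED_NO_CONFIG: "documentation MCP not configured; subflow skipped",
  ACQUIRE_FAILED: "documentation MCP could not be acquired; continuing without it",
  EMPTY_HARVEST: "documentation MCP returned no usable content",
  COMPLETED: "documentation harvested and recorded",
};

interface RunFacts {
  configKeyPresent: boolean;
  acquireSucceeded: boolean;
  harvestedItems: number;
}

// Evaluates the branches in order; the first matching trigger wins.
function resolveBranch(facts: RunFacts): Branch {
  if (!facts.configKeyPresent) return "SKIPPED_NO_CONFIG";
  if (!facts.acquireSucceeded) return "ACQUIRE_FAILED";
  if (facts.harvestedItems === 0) return "EMPTY_HARVEST";
  return "COMPLETED";
}

// Returns the branch name together with its outcome line, so no second lookup is needed.
function inlineOutcome(facts: RunFacts): string {
  const branch = resolveBranch(facts);
  return `${branch}: ${OUTCOME[branch]}`;
}

console.log(inlineOutcome({ configKeyPresent: false, acquireSucceeded: false, harvestedItems: 0 }));
```

The table stays the single source of truth, while each branch trigger gets its resolved outcome alongside it. That is the reduction in cross-section jumps the finding asks for.
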
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔴 Critical | Conflict Resolution | Problem: Step 5.1 forbids the phase from loading the foundational/domain skills ('do not USE SKILL or ACQUIRE coding, testing, repository-implementation-standards, or qa-test-implementation directly from this phase file - the handoff delegates to them internally and is the only entry point that loads them'), but the bound skill automation-test-implementation-handoff/SKILL.md states the opposite: it 'does NOT drive skill loading' and STOPS if those skills are not already loaded by the calling workflow. Parent qa-flow.md Phase 5 lists them only as non-binding 'Recommended skills'. So nothing actually loads them and the handoff stops at its first verify GATE - Phase 5 deadlocks every run. Reason: Tracing the chain shows the phase forbids exactly the skills the bound skill gates on, so the documented happy path cannot complete. Reliability is the primary goal; a broken execution chain is critical. Solution: Make the parent qa-flow.md Phase 5 (or this phase file) ACQUIRE+load coding, testing, repository-implementation-standards, and qa-test-implementation as a BINDING load before USE SKILL of the handoff; reword step 5.1 so the handoff 'verifies presence and applies discipline' rather than being 'responsible for ACQUIRing' or 'the only entry point that loads them'. Mirror the consistent pattern used by automation-test-execution-analysis (which correctly drives loading). |
| 🟠 Very High | Reference Integrity | Problem: Step 5.1 prose describes the handoff's loading behavior incorrectly ('delegates to them internally and is the only entry point that loads them'), contradicting the bound automation-test-implementation-handoff contract which verifies-but-does-not-load. The reference to the handoff's responsibility does not resolve to the skill's actual behavior. Reason: A phase that misstates what a bound skill does makes the agent rely on behavior that never happens, breaking the chain. Solution: Correct the prose to match the handoff's verify-only contract, and point Phase 5 to the binding skill-load step that must precede it. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 2 | ⬇️ Slightly worse |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The base file included a full verbatim testgen-state.md template (Phase Completion Status checklist, Metrics block, Phase Details block). The new <update_state step="1.4"> replaced it with the single instruction 'Update testgen-state.md with Phase 1 complete and metrics' and no field list. Reason: Without an explicit state schema the agent may write inconsistent state files across phases, weakening the cross-phase self-check the parent flow relies on. Solution: List the minimum required state-file fields inline (current phase, completion checkbox, Jira fields count, Confluence pages count) or add a TERM reference to one shared state-file template defined once in the parent flow, so each phase produces a deterministic state shape. |
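The four minimum fields named in the solution above could be checked mechanically. The following is a minimal sketch under stated assumptions: the labels `Current Phase`, `Jira Fields`, `Confluence Pages`, and the checkbox wording are hypothetical, as is the helper name `missing_state_fields`; the real testgen-state.md template may word these differently.

```python
import re

# Hypothetical labels for the minimum fields named in the finding above;
# the actual testgen-state.md template may use different wording.
REQUIRED_PATTERNS = {
    "current_phase": re.compile(r"^Current Phase:\s*\S+", re.MULTILINE),
    "phase_1_checkbox": re.compile(r"^- \[[ xX]\] Phase 1\b", re.MULTILINE),
    "jira_fields_count": re.compile(r"^Jira Fields:\s*\d+\s*$", re.MULTILINE),
    "confluence_pages_count": re.compile(r"^Confluence Pages:\s*\d+\s*$", re.MULTILINE),
}


def missing_state_fields(state_text: str) -> list[str]:
    """Return the names of required fields absent from the state file text."""
    return [
        name
        for name, pattern in REQUIRED_PATTERNS.items()
        if not pattern.search(state_text)
    ]


# Illustrative state file content; the counts are made-up sample values.
example_state = """Current Phase: 1
- [x] Phase 1: Data Collection
Jira Fields: 12
Confluence Pages: 3
"""

print(missing_state_fields(example_state))
```

A check of this kind would let each phase confirm the state shape before handing off, whichever option (inline field list or shared template) the flow adopts.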
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The new <create_requirements_document step="4.3"> defers the main document structure to the requirements-synthesis skill ('using the output format from the skill') and only specifies the testgen-specific Executive Summary and Traceability Matrix additions. The base file carried a full inline requirements document template; that full template is now gone from this phase. Reason: If the skill's output_format does not enumerate the same sections the phase expects, the agent has no in-phase contract to verify the document is complete, risking an under-structured requirements.md. Solution: Confirm requirements-synthesis actually defines the full requirements section structure (US/FR/NFR/C/D/A/R bodies); if it does, keep the deferral but add one line naming the canonical section list expected so the agent can self-validate. If the skill does not define it, restore a minimal inline section list. |
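The self-validation suggested above could be as small as a prefix check against the canonical section list. This is a rough sketch, not the skill's actual contract: the prefixes come from the finding (US/FR/NFR/C/D/A/R), while the `PREFIX-<number>` ID format and the helper name `missing_sections` are assumptions made for illustration.

```python
import re

# Prefixes taken from the section list named in the finding above.
# The "PREFIX-<number>" ID format is an assumption for illustration only.
CANONICAL_PREFIXES = ("US", "FR", "NFR", "C", "D", "A", "R")


def missing_sections(document: str) -> list[str]:
    """Return canonical prefixes that have no 'PREFIX-<n>' ID in the document."""
    found = set(re.findall(r"\b([A-Z]+)-\d+\b", document))
    return [prefix for prefix in CANONICAL_PREFIXES if prefix not in found]


# Illustrative requirements text that deliberately omits any R-<n> entry.
sample = "US-1 login\nFR-1 validate\nNFR-1 latency\nC-1 budget\nD-1 API\nA-1 users\n"

print(missing_sections(sample))
```

Naming the expected list once in the phase and checking against it keeps the deferral to the skill while still giving the agent something concrete to verify.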
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The per-value honesty / [ASSUMED: ...] discipline and the redaction discipline are each stated three+ times. The honesty rule appears in step 3 ('Per-value honesty rule', 'Confident fabrication is forbidden'), in <pitfalls> ('Confidently emitting an invented field value...'), in <failure_handling> (the empty-schema branch), and again in <validation_checklist> ('Exact-value rule' + 'Assumptions section populated'). The same redundancy occurs for the GATE conditions, which are restated almost verbatim across step 1 GATE, <failure_handling>, and <validation_checklist>. Reason: The same rule written four ways inflates the resident SKILL.md (~10.9KB) and forces the agent to reconcile near-duplicate phrasings, which is where drift and contradictions creep in. Solution: Keep the canonical statement in step 1 GATE / step 3 and have <failure_handling> and <validation_checklist> cross-reference it by name (e.g. 'per step 1 GATE' / 'per the per-value honesty rule') instead of re-expanding the full condition text. The file already uses this pattern in places; apply it consistently to the honesty and GATE conditions. |
| ⚪ Low | Input Contract | Problem: The <prerequisites> block lists 'Raw test case data available', 'API endpoint contracts available', 'Gap analysis and user clarifications completed' but never states the concrete input paths/format; step 1 says only 'Read all input documents provided by the calling workflow' with no path defaults or shape, unlike the sibling aqa-codebase-analysis skill which has an explicit input table with default paths. Reason: Without an explicit input shape the agent can misread which document is the contract vs the test cases, weakening the otherwise strong GATE that depends on telling them apart. Solution: Add a short input table (input name, expected format e.g. markdown/JSON, who supplies it) mirroring the aqa-codebase-analysis <input_contract> table, even if all paths are workflow-supplied with no defaults. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The 'Coverage epistemic-honesty rule' and the 9-required-sections requirement are each restated across multiple blocks. The Coverage rule is defined in step 8 ('canonical — referenced from steps 3, 4, <failure_handling>') yet its full effect is re-expanded in steps 3, 4, 7-path note, <failure_handling> (two branches), <validation_checklist> ('Coverage section enumerates every optional input...'), and <pitfalls> ('Silently omitting absent optional inputs...'). 'All 9 sections required / no section blank' likewise appears in step 8, <output_format>, and <validation_checklist>. The SKILL is ~12.3KB resident. Reason: Multiple full restatements of the same two rules bloat the always-resident SKILL.md and create maintenance drift risk when one copy is edited and the others are not. Solution: Since step 8 already declares the Coverage rule canonical, replace the re-expansions in steps 3/4/7 and the pitfall with a short pointer ('apply the Coverage rule, step 8'). Steps 3 and 4 already do this for the trigger, but the rule's full text then reappears elsewhere. Collapse the duplicate 9-section assertions to one canonical statement plus pointers. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The skill hard-codes specific step numbers of its parent workflow phase: <when_to_use_skill> says "those are the parent phase's responsibility (<ask_questions step="2.2"> uses the questioning skill)"; step 6 repeats <ask_questions step="2.2">; the worked example references "step 2.2 of aqa-flow-requirements-clarification" and "step 2.4 of the clarification phase". A skill is not supposed to know which workflow/phase runs it or that phase's internal step numbering (sibling/reverse awareness). If the clarification phase renumbers 2.2/2.4, this skill silently goes stale. Reason: Embedding a sibling phase's internal step numbers couples the skill to that phase's structure; the numbers will drift out of sync and mislead the agent about where the handoff actually goes. Solution: Refer to the handoff by role/keyword only, e.g. 'the parent clarification phase's questioning step (uses the questioning skill)', and drop the literal step="2.2" / step 2.4 numbers. Keep the questioning-skill keyword as a semantic contract cue (allowed), but remove the phase-internal step identifiers. |
| 🔵 Medium | Reference Integrity | Problem: References to aqa-flow-code-analysis.md <naming_convention> appear in <prerequisites> and <failure_handling> for resolving <test-name>. The naming convention is owned by a workflow phase the skill does not execute; the skill points into that phase's internal anchor for an operational term it depends on. Reason: Depending on another phase's internal anchor for a core operational term means the skill breaks if that anchor is renamed, and it assumes knowledge of a sibling the skill should not have. Solution: Either define the <test-name> slug rule inline in the <input_contract> (it is a simple filename-parse rule), or describe it as 'the <test-name> slug supplied/resolved by the calling workflow' without deep-linking the phase file's internal section. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Single Responsibility | Problem: The skill bundles two responsibilities that the file itself says are invoked by two different phases: Part A (read-only identification, invoked by aqa-flow-selector-identification) and Part B (writes page-object files, invoked by aqa-flow-selector-implementation). The healthy SRP target is 1-2 related responsibilities; here identification and implementation have different safety profiles (read-only vs file-writing) and different invoking phases, which is why every block (<safety_boundaries>, <failure_handling>, <validation_checklist>, <pitfalls>) has to fork into Part-A-inline vs Part-B-deferred halves. Reason: Two phases with opposite write profiles in one skill raises the risk that a read-only Part A run accidentally follows a Part B write instruction; tight scope binding is the only thing preventing a safety-boundary crossover. Solution: This is acceptable IF the design rationale holds, but it should be challenged: the 'why one file' rationale (shared 4-tier taxonomy + inventory shape + handoff semantics) is real, so the merge is defensible; keep it, but make the resident SKILL.md carry ONLY Part A inline and move ALL Part B mechanics/checklist/pitfalls to the reference (already mostly done), so a Part A invocation never pays Part B cognitive cost. Verify no Part B write-path detail leaks into the always-resident SKILL.md. |
| 🔵 Medium | Bloat Control | Problem: The Part-A-vs-Part-B scope rule and the fragile-selector discipline are each restated several times. The scope rule is declared 'canonical' in <when_to_use_skill> but re-expanded in <input_contract> ('Existence + scope validation'), <safety_boundaries> ('Part A / Part B scope is governed by the canonical rule... Enforcement:'), and implicitly in every forked block. The fragile-selector rule appears in step 7 (gate), <safety_boundaries> ('Fragile-selector discipline'), <failure_handling>, and <pitfalls>. SKILL.md is ~13.3KB, the largest of the set. Reason: At 13.3KB resident, repeated full restatements push the always-loaded portion up and invite drift between the copies; the file already adopts a 'canonical + pointer' pattern, so the duplicates are avoidable. Solution: State the Part-A/Part-B scope rule and the fragile-selector rule once as canonical (already labeled so in <when_to_use_skill> / step 7) and have the other blocks cross-reference them by name rather than re-expanding the enforcement text. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: SKILL.md is 15KB. Much of the volume is meta-commentary about where the single source of truth lives rather than instructions: e.g. step 5 'Template load point (canonical): the verbatim template at [...] is loaded once at step 5 (the emit step) — <output_format> references this load point, not its own.' and <output_format> 'The verbatim template is loaded at the canonical load point declared in step 5 (see process step 5; not repeated here).' These two blocks restate the same DRY-bookkeeping fact in both directions. Reason: Per-call cost is paid on every invocation. The agent does not need prose explaining why a rule is stated once; it needs the rule. Trimming meta-commentary lowers cognitive load and token cost without losing any behavior. Solution: Collapse the bidirectional 'canonical load point' notes to a single one-line pointer at step 5 and drop the mirrored disclaimer in <output_format>. Remove parenthetical self-justifications ('not its own', 'not repeated here', 'mirrors the sibling pattern') that explain the DRY mechanism rather than instruct. |
| 🔵 Medium | Cognitive Budget | Problem: The 10K-20K size band combined with dense cross-reference bookkeeping (repeated 'canonical', 'single source of truth', 'referenced here, not restated' phrases across <success_criteria>, <output_format>, <validation_checklist>, ) raises resident cognitive load for what is fundamentally a write-the-test skill. Reason: Repeated DRY-pointer phrasing competes for attention with the actual procedure and risks the agent over-weighting bookkeeping over the authoring task. Solution: Deduplicate the repo-docs-win / silent-drop pointers: state each once and let other blocks name it in <=5 words instead of re-explaining the SoT relationship each time. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Single Responsibility | Problem: The skill explicitly bundles two responsibilities with different risk profiles: '<when_to_use_skill>' states 'The skill bundles two responsibilities with materially different risk profiles: Part A — Report Analysis (read-only)... Part B — Corrections (writes test source files...)'. It even pre-announces the eventual split: 'The split is preserved so future SRP tightening (extracting Part B to a sibling skill) is a one-step refactor.' A read-only analyzer and an approval-gated code mutator are two skills sharing one file. Reason: Combining read-only triage with an approval-gated write path in one skill widens the blast radius: a Part-A-only caller still loads the write-path framing, and the safety boundary for Part B must be re-asserted defensively throughout. SRP separation would make the read-only contract unambiguous. Solution: Either split Part B into a sibling skill now (the file already documents this as a one-step refactor), or, if kept together for the v1, drop the speculative future-refactor sentence — it is non-operational meta-commentary that does not change agent behavior. |
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md is large (~12-15KB, in the 10K-20K band) and is an always-loaded entry file; the repeated DRY-bookkeeping meta-commentary about where rules live adds to the resident token cost on every invocation. Reason: Smaller always-loaded entry files leave more room for task context and reduce the risk of context compaction that makes the agent unreliable. Solution: Move the heavy Part A/Part B procedural detail into the existing references/ files and keep only the dispatch logic and canonical rules in SKILL.md; declare each rule once. |
| 🔵 Medium | Bloat Control | Problem: SKILL.md is ~13KB. A recurring meta-pattern adds volume without instruction: phrases like 'Canonical taxonomy (single source of truth)', 'Downstream sections reference this list by name', 'canonical', 'always-loaded', 'Loaded only when Part B runs' recur across nearly every block. The Part B half is a bare list of 'see references/part-b-mechanics.md#...' pointers that duplicates the reference's own pitfalls header. Reason: The repeated load-split bookkeeping is paid on every invocation and competes with the procedural steps the agent must actually execute. Solution: State the taxonomy-is-canonical and Part-A/Part-B load-split facts once near the top; let later blocks omit the repeated 'canonical/always-loaded' qualifiers. Replace the Part B stub list with a single line pointing to part-b-mechanics.md pitfalls. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: SKILL.md is ~12.6KB. The domain-skill-contract concept is restated three times in near-identical wording: <core_concepts> ('This skill orchestrates around the domain skill's read-only output contract... without knowledge of the domain skill's internal structure'), <input_contract> row ('invoked under its analysis-only / read-only output contract — its job here is to emit the categorized analysis artifact, not to mutate source'), and process step 6 ('USE the resolved domain analysis skill under its analysis-only / read-only output contract — it MUST emit the categorized analysis artifact and MUST NOT mutate source files'). The same MUST/MUST-NOT pair appears each time. Reason: Triple restatement of the same contract inflates the always-loaded body and dilutes the single authoritative statement the agent should anchor to. Solution: State the read-only domain-skill contract once (it belongs in <core_concepts>) and reference it tersely at step 6 and in the input_contract row rather than re-spelling the MUST/MUST-NOT each time. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The file is ~14.2K chars (in the 10K-20K high-severity bloat band per the audit spec). The canonical 'domain skill required + no silent fallback' rule is restated at least five times: core_concepts ('Canonical ... rule lives in step 4 GATE'), the <recommended_foundational_skills> table row, process step 4 GATE ('Silent fallback to coding + testing alone is forbidden'), <failure_handling> ('Domain skill name not supplied AND no conventional fallback discoverable'), and <pitfalls> ('Silently proceeding when the parent did not name a domain skill'). The same applies to the 'verify foundational skill is loaded, this skill does NOT load it' statement, which appears in core_concepts, the table preamble, every process step 1-4, and <failure_handling>. Reason: The same constraint repeated five ways inflates the prompt without adding meaning and increases the chance the agent skips a step under load; one authoritative statement plus pointers is more reliable and far smaller. Solution: State the 'no silent fallback' rule once at step 4 GATE and replace the other four occurrences with a bare cross-reference (e.g. 'see step 4 GATE'). State the 'verify-don't-load' contract once (already in core_concepts) and drop the repeated 'this skill does NOT load it; the calling workflow recommends + loads it' clause from steps 1-4 and the table. |
| 🔵 Medium | Cognitive Budget | Problem: Five overlapping list sections carry near-identical content: <recommended_foundational_skills> table, <process> steps 1-4, <failure_handling>, <validation_checklist>, and <pitfalls> all enumerate the same foundational-skill and domain-skill verification logic. An agent must hold all five in working memory to act on step 4. Reason: Fewer non-overlapping sections reduce the cognitive search space and the risk the agent reconciles contradictory-looking duplicates. Solution: Collapse <pitfalls> into <failure_handling> (they restate the same failure modes) and trim <validation_checklist> to outcomes not already implied by <process> GATEs. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
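
The restatement counts cited in the findings above can be spot-checked mechanically. The sketch below counts how often a phrase recurs in a prompt file, ignoring case and whitespace differences. It assumes the reviewer picks the phrases by hand; `count_restatements` and the sample text are hypothetical and are not how the validator itself works.

```python
import re


def count_restatements(text: str, phrases: list[str]) -> dict[str, int]:
    """Count case-insensitive, whitespace-tolerant occurrences of each phrase."""
    # Collapse runs of whitespace so line wraps do not hide a match.
    normalized = re.sub(r"\s+", " ", text).lower()
    counts: dict[str, int] = {}
    for phrase in phrases:
        needle = re.sub(r"\s+", " ", phrase).lower()
        counts[phrase] = normalized.count(needle)
    return counts


# Hypothetical excerpt standing in for a SKILL.md body.
sample = """
step 4 GATE: Silent fallback to coding + testing alone is forbidden.
pitfalls: silent   fallback again.
"""

result = count_restatements(sample, ["silent fallback", "verify presence"])
```

Literal matching only finds verbatim repeats; paraphrased restatements of the same rule still need a human or model reviewer.
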
📄 instructions/r3/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: The <orchestration_and_escalation> 'Verification-failure unilateral-start override' instructs the agent to start the earliest incomplete phase in the same turn and explicitly 'do NOT call AskUserQuestion, present options, or ask how do you want to proceed'. This is an auto-proceed decision embedded in a workflow, which sits in tension with the file's own NO-ASSUMPTIONS rule and the principle that HITL/user-involvement defaults live in the hitl skill / bootstrap-hitl-questioning. The override is well-guarded (3-part precondition, ambiguity-defaults-to-ASK fallback, scoped to one gate, cites the hitl skill as authority), so the risk is contained, but the workflow still hardcodes a no-ask branch. Reason: Embedding a 'do not ask the user' branch in a workflow can be over-applied by an agent under load; an explicit one-line scope-lock keeps the narrow exception from leaking into other decisions. Solution: Keep the gate but add one explicit deference line stating that this single override is the only sanctioned deviation from the hitl skill defaults and applies only when all three preconditions hold, so a reader cannot generalize the no-ask behavior to other branches. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The <workflow_context> block mixes operational inputs/outputs with non-operational KB-taxonomy meta-notes that belong to documentation, not phase execution. The bullet 'KB catalog / ACQUIRE success: Tags above resolve to Rosetta markdown in this repository (instructions/r3/core/skills/confluence-source-harvesting/SKILL.md, instructions/r3/core/rules/bootstrap-guardrails.md). Broader taxonomy: docs/definitions/skills.md, docs/definitions/rules.md' spells out internal repository file paths and taxonomy pointers that an executing agent does not need to run Phase 1, and per pa-rosetta a phase should reference prompts by logical name (ACQUIRE tag) only, not by deep file path. Reason: Deep internal file paths and taxonomy notes are documentation, not runtime instructions; they add length and risk drift if files move, while the logical ACQUIRE tag is the agent-agnostic contract that actually matters. Solution: Keep the operational part ('Successful ACQUIRE means Rosetta returns >=1 non-empty document for the tag') and drop the explicit instructions/... file paths and docs/definitions/* taxonomy pointers from the bullet; reference the skills by their ACQUIRE tags only. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
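
The operational rule quoted in the finding above ("Successful ACQUIRE means Rosetta returns >=1 non-empty document for the tag") reduces to a one-line predicate. The sketch below assumes the returned documents arrive as a list of strings and that whitespace-only content counts as empty; both are illustrative assumptions, and `acquire_succeeded` is a hypothetical name rather than a Rosetta API.

```python
def acquire_succeeded(documents: list[str]) -> bool:
    """Return True when at least one returned document has non-blank content."""
    # Whitespace-only documents are treated as empty (an assumption).
    return any(doc.strip() for doc in documents)


# Hypothetical results for three ACQUIRE calls.
no_documents = acquire_succeeded([])
only_blank = acquire_succeeded(["", "   \n"])
one_real = acquire_succeeded(["", "# SKILL\nSteps..."])
```
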
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The mandatory ### Explicit Assertions transcription rule and its 'typed Presence/State/Content/Behavioral + per-assertion granularity + None-clause' requirement are stated four times: in <description_and_purpose>, in <workflow_context> ('Assertion authority chain'), in step 2.4, and twice in <validation_checklist> ('Explicit Assertions subsection present...' and 'Per-assertion granularity'). The None-clause text 'None — no observable behavior derivable from current clarifications; Phase 6 will surface this as Uncovered' is reproduced verbatim three times. Reason: Repeating the same multi-clause rule and its exact fallback string four times bloats the phase and makes future edits error-prone (one copy can drift); a single authoritative copy with pointers is smaller and safer. Solution: Define the typed-assertion format and the None-clause once in step 2.4 (the operational owner) and reduce the <description_and_purpose>, <workflow_context>, and checklist mentions to a one-line pointer to step 2.4 instead of restating the full rule and verbatim None string. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The handoff loading-responsibility contract is restated nearly verbatim across three blocks. <workflow_context> says the handoff "does NOT drive skill loading ... The handoff itself only verifies presence at its step-4 GATE"; <skill_handoff> repeats "The handoff verifies presence ... it does NOT ACQUIRE/USE them" and "its step-4 GATE STOPS ..."; then <execute_authoring> step 1 and the closing "User-instruction-override refusal" paragraph repeat "missing-load causes the handoff's step-4 GATE to STOP, halting Phase 6" again. The same single fact (workflow loads, handoff verifies) is asserted at least four times. Reason: Repeating the same contract four times inflates the phase and competes for attention with the actual ordered steps, making the genuinely load-bearing instructions harder to spot. Solution: Keep the contract statement once in <skill_handoff> and reduce the <workflow_context> "Loading responsibility" bullet and the step-1/refusal repetitions to a one-line back-reference (e.g. "per <skill_handoff>"). Do not delete the refusal rule itself, only the re-explanation of why. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The added evidence-label machinery uses heavy cross-reference scaffolding for a single concept. Task 5 bullet 3 opens with a meta-sentence ("This is the single source of truth for evidence labels — Task 3, Task 6, Completion Criteria, Update State, and Important Notes all reference this block by name ..."), and the same single-source-of-truth pointer is then echoed in Task 3, Task 6, Completion Criteria, Update State, and the **Evidence Labels** Important Note — a non-operational provenance/cross-link layer on top of the actual rule. Reason: The rule and its labels are valuable, but the repeated bookkeeping about which sections reference it is non-operational noise that grows the phase without changing agent behavior. Solution: Keep the definitions, tie-break, output rule, and undecidable fallback in Task 5. Trim the repeated "per Task 5 / single source of truth" annotations elsewhere to a bare reference (e.g. "label per Task 5") and drop the self-describing meta-sentence listing every referencing section. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md packs an 8-step process, a full output template (~100 lines), pitfalls, safety, success_criteria, failure_handling, and validation_checklist into one always-loaded file (16.6K). Two reference files already use progressive disclosure for steps 4 and 5; the output template and the safety/validation triplet remain inline. Reason: The output template is only needed at the final write step, so deferring it keeps earlier steps lighter and reduces the chance of context overflow on the entry file. Solution: Consider moving the verbatim <output_format> markdown template to a referenced asset loaded on demand at step 7 (the same lazy-load pattern steps 4/5 already use), leaving a thin pointer in the SKILL body. |
| 🔵 Medium | Bloat Control | Problem: SKILL.md is 16.6K chars. The safety/validation layer is partly duplicated across <safety_boundaries>, <pitfalls>, <success_criteria> step 6.1, <failure_handling>, and <validation_checklist> — e.g. the secret-scan requirement is restated in step 6.1, success_criteria, the validation_checklist 'Safety re-check' item, and the pitfalls 'Copying literal .env values' bullet. The skill itself acknowledges this ('single source of truth' is invoked repeatedly), but the cross-referencing prose adds length on every load. Reason: This is the entry-point file loaded on every invocation; trimming repeated restatements lowers per-call context cost without losing the single-source guarantee. Solution: Keep <safety_boundaries> as the single source and shorten the re-statements in success_criteria / validation_checklist / pitfalls to one-line pointers (e.g. 'secret-scan per <safety_boundaries>') rather than re-describing the credential list each time. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: SKILL.md is 14.7K chars with notable overlap between <success_criteria> and <validation_checklist>: the Cross-Reference-per-step requirement, Executive-Summary-counts-match-body, redaction re-scan, and assumption-fields checks each appear in both sections phrased differently (e.g. success_criteria 'Every test step ... has been cross-referenced' vs validation_checklist 'Cross-Reference entry-per-step grep'). The file flags this overlap itself ('section-presence ... enforced by <success_criteria>; this checklist verifies things <success_criteria> cannot directly assert') yet still restates the shared items. Reason: This entry file loads on every invocation; collapsing the duplicated done-conditions lowers per-call cost while keeping a single authoritative statement of each rule. Solution: Reduce the restated items to a one-line pointer so each contract is stated once, keeping only the genuinely proof-oriented grep checks unique to <validation_checklist>. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: SKILL.md is 15.6K chars. The canonical-path constraint (agents/qa/qa-project-config.md project-wide, not per-IDENTIFIER) and the redaction-at-intake rule are each restated 3-4 times — appearing in <process> step 3/5, <safety_boundaries>, <failure_handling>, <pitfalls> ('Writing the project config under .../{IDENTIFIER}/...'), and <validation_checklist> ('Canonical paths only' + 'No literal credentials persisted'). The IDENTIFIER-consistency rule is similarly spread across step 2, failure_handling, pitfalls, and validation_checklist. Reason: Repeating the same constraint four times on an always-loaded entry file inflates per-call context without strengthening the contract. Solution: State the canonical-path rule and the redaction-at-intake rule once each in their authoritative section and replace the duplicate restatements in pitfalls/success_criteria/validation_checklist with short pointers. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Single Responsibility | Problem: The skill explicitly bundles two responsibilities with 'materially different risk profiles' — Part A (read-only report analysis, steps 1-5) and Part B (writes test source files + runs lint, steps 6-8): '<when_to_use_skill>' itself states 'The skill bundles two responsibilities'. A read-only analysis capability and a code-mutating correction capability are gated together in one 17.1K skill, and the prompt has to add a 'Part A / Part B usage boundary' guard plus a 'must not be conflated' warning to keep them separate. Reason: Coupling a safe read-only path with a destructive write path in one skill forces extra guard prose and raises the risk a caller accidentally authorizes mutation when only analysis was wanted. Solution: Consider splitting into two skills (e.g. qa-test-report-analysis = Part A, qa-test-correction = Part B) so the read-only and code-mutating mandates are independently invocable; if kept together, the explicit Part-A-only guard is the right mitigation but the SRP cost remains. |
| 🔵 Medium | Cognitive Budget | Problem: This 17.1K SKILL.md carries an 8-step two-part process, a 7-entry failure-category catalog, two embedded markdown templates (per-failure entry + <output_format>), pitfalls, safety, failure_handling, success_criteria, and validation_checklist all inline with no progressive disclosure, unlike its sibling qa-data-collection which offloads detail to references. Reason: Keeping all detail inline on the entry file increases the chance of context pressure and makes the >5-step process harder to execute reliably. Solution: Apply the same reference-file split the qa-data-collection skill uses (failure catalog and/or per-failure template as on-demand assets), so the entry file stays light and the heavy detail loads only when the relevant step runs. |
| 🔵 Medium | Bloat Control | Problem: At 17.1K chars this is the largest assigned file. The failure-category catalog (7 categories, each with Symptoms/Root Cause/Action) in step 3 plus the full per-failure markdown template, the <output_format> template, and the <validation_checklist> create overlap: e.g. the safety re-scan target list appears in <safety_boundaries>, in step 3's inline note, in <pitfalls>, and in the validation_checklist 'Safety re-scan ran' item. Reason: 17K on an always-loaded entry file is a real per-invocation cost; the detailed category catalog is only needed when failures are actually being classified. Solution: Move the 7-category failure catalog (step 3) to an on-demand reference file (same progressive-disclosure pattern qa-data-collection uses), leaving a thin category list in the SKILL body; reduce the duplicated safety-scan restatements to pointers. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The skill is ~12.9KB. The <gate_priority> block restates the same step-8-vs-step-10 precedence three times: once in the table (step 8 wins), once in the Precedence rule paragraph, and again in the Reconciliation with hitl skill paragraph, and it is restated a fourth time in <pitfalls> ('when in doubt, gate_priority says step 8 wins'). The same fact is also embedded in step 8 and step 10 of <process>. Reason: The precedence rule is correct but repeated 4-5 times, inflating context cost for every agent that loads this MUST-apply skill on every multi-phase workflow without adding new information. Solution: Collapse <gate_priority> to the table plus a single one-line precedence rule; drop the Reconciliation with hitl skill paragraph (its content is already implied by the table's 'User input role' column) and remove the redundant restatement in <pitfalls>. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The Spec-vs-code branch trigger references the calling workflow's internal step numbers verbatim: "the routine 'spec vs code cross-check' step (step 1.5 / step 2.4 equivalent in the calling workflow's process)". This skill reference asset hard-codes another artifact's (the workflow phase's) internal numbering, which crosses the skill/workflow isolation boundary — the skill should not know the calling workflow's step IDs. Those exact numbers do not even appear in SKILL.md (its own cross-check is step 5.1). Reason: Hard-coding a sibling artifact's step numbers breaks if the workflow renumbers, and leaks workflow internals into the skill, which is a boundary violation that makes the reference brittle. Solution: Refer to the skill's own reconciliation step by name (e.g. SKILL.md <process> step 5 "Reconcile and Validate") instead of citing the calling workflow's step 1.5 / step 2.4. Drop the cross-workflow step IDs. |
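
A check of this kind could be automated. The sketch below is illustrative only and not part of the Rosetta tooling: it assumes step pointers are written as "step N" or "step N.M", and the function name `find_step_refs` is hypothetical. It lists every numeric step pointer in a text so a reviewer can decide which ones point into another artifact.

```python
import re

# Assumption: step pointers look like "step 4" or "step 2.4" (case-insensitive).
STEP_REF_RE = re.compile(r"\bstep\s+(\d+(?:\.\d+)?)\b", re.IGNORECASE)


def find_step_refs(text: str) -> list[str]:
    """Return the step numbers cited in `text`, in order of appearance."""
    return STEP_REF_RE.findall(text)


if __name__ == "__main__":
    quoted = "(step 1.5 / step 2.4 equivalent in the calling workflow's process)"
    for ref in find_step_refs(quoted):
        print(f"numeric step pointer: step {ref}")
```

The scan only surfaces candidates; whether a pointer targets the skill's own process or a sibling artifact still needs a human or a second rule to decide.
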
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Safety Boundaries | Problem: The <skip_rules> block (line ~29) states that a user instruction to bypass a gate without supplying artifacts "must be refused ... and Phase 0 still begins in the same turn", while line ~28 forbids calling AskUserQuestion. This overrides an explicit user instruction and auto-starts Phase 0 unilaterally, with no "ambiguity defaults to ASK" carve-out and no scope-lock deferring to the hitl skill — the strongest unilateral-start form in the PR. Reason: Auto-overriding an explicit user instruction is a stronger HITL deviation than the contained verification-failure case; without an ambiguity fallback, borderline phrasings get force-restarted and the agent appears to ignore the user. Solution: Add an ambiguity-defaults-to-ASK carve-out and an aqa-flow-style scope-lock sentence: when the user gives an explicit skip instruction with missing artifacts, announce-and-proceed in one line (state artifacts are missing) rather than framing it as "refusing a user instruction"; route any partial/uncertain state to the normal HITL ask path. |
| 🔵 Medium | Bloat Control | Problem: The <skip_rules> block (lines 24-37) is disproportionately large for a workflow entry file (~12K chars total, in the 10K-20K signal range). The single skip-gate concept is restated many times: "MUST NOT ... call AskUserQuestion; present a list / menu / options block; ask the user 'how do you want to proceed', 'should I start at X', 'do you want me to', or any equivalent confirmation request; pause for input" plus a near-duplicate restatement "User instruction to bypass the gate without supplying the artifacts must be refused with the same one-line announcement and Phase 0 still begins in the same turn." The anti-confirmation rule is asserted three separate ways. Reason: A workflow entry file is loaded into context on every run; redundant restatement of one gate inflates the always-loaded budget and buries the numbered phase list that is the file's real job. Solution: Collapse the skip-gate prohibition into one MoSCoW line plus the example. Keep the (a)/(b)/(c) verification and the one-line refusal announcement; remove the duplicated 'in the same turn' / 'must be refused' restatements that repeat the same behavior. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 3 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 2.1 cites RefSrc/{project-name}/docs/ twice ("from Rosetta docs at RefSrc/{project-name}/docs/" and "read ARCHITECTURE.md and CODEMAP.md from RefSrc/{project-name}/docs/"). The canonical Rosetta target-project folder is refsrc/ (lower-case, per the standard structure). The wrong casing will not resolve on case-sensitive filesystems and is inconsistent with the rest of the Rosetta folder vocabulary. Reason: A mis-cased path silently fails to resolve on Linux, so the agent skips the architecture/codemap pre-read it was told to do, degrading the analysis. Solution: Change RefSrc/{project-name}/docs/ to the canonical refsrc/{project-name}/docs/ (both occurrences) to match the standard target-project folder name. |
| 🔵 Medium | Reference Integrity | Problem: Step 2.1 deep-links into another skill's internal step numbering: "(see qa-data-collection skill, step 4 for full discovery logic)". A workflow phase should not depend on a sibling skill's private step numbers; if qa-data-collection renumbers, this pointer breaks, and it crosses the phase/skill boundary. Reason: Citing a sibling skill's internal step number couples the two artifacts and breaks silently when the skill is edited. Solution: Reference the qa-data-collection skill by name and the logical activity (backend-source discovery) rather than its internal step 4. |
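
The casing finding above lends itself to a mechanical check. The following is a minimal sketch, assuming the canonical folder name is `refsrc` as stated in the finding; the function name `find_miscased_roots` and the regex boundaries are illustrative choices, not existing validator code.

```python
import re

CANONICAL_ROOT = "refsrc"  # canonical folder name, taken from the finding above


def find_miscased_roots(text: str, canonical: str = CANONICAL_ROOT) -> list[str]:
    """Return path roots that equal `canonical` ignoring case but not exactly.

    A root is only considered when it starts a path segment and is
    immediately followed by "/".
    """
    pattern = re.compile(
        rf"(?<![\w/.-])({re.escape(canonical)})(?=/)", re.IGNORECASE
    )
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) != canonical]


if __name__ == "__main__":
    step_text = "read ARCHITECTURE.md and CODEMAP.md from RefSrc/{project-name}/docs/"
    print(find_miscased_roots(step_text))
```

Running a scan like this over the workflow files before merge would catch the mismatch on any operating system, rather than only when the path fails to resolve on a case-sensitive one.
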
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: At 9835 chars this phase file is the second largest in the flow and directly violates the phase schema's explicit directive (docs/schemas/phase.md): 'the file must be small and short, skills already define how things work! Be concise! Save tokens!'. The <config_contract> block carries a 9-row field table AND a full illustrative markdown snippet AND a separate <initial_data_contract> template, while the file simultaneously admits the authoritative template 'lives in the qa-project-config skill'. The config field semantics are thus maintained in two places. Reason: Duplicated config templates drift apart over time; the larger file also eats cognitive budget that the schema deliberately reserves for the skill. Solution: Keep the bound-field table (downstream phases need exact key names) but drop the full illustrative # QA Project Config snippet, since the skill owns the canonical template and this duplicates it. Reference the skill template by name instead of reproducing a representative shape. |
| 🔵 Medium | Cognitive Budget | Problem: The same fact (project-wide config lives at agents/qa/qa-project-config.md and is NOT copied per-session) is restated in <workflow_context> Output bullet, <execute_config> steps 3 and 4, <config_contract> intro, <failure_handling>, and <validation_checklist>: five repetitions of one invariant. Reason: Repeating one invariant five times inflates the prompt without adding new instruction, increasing the chance the agent skips a real step buried among restatements. Solution: State the canonical-path / not-per-session invariant once in <workflow_context> and reference it; remove the re-explanations in the contract intro and failure-handling prose. |
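
Repetition counts like the one above can be approximated with a simple scan. This is a sketch under stated assumptions: the prompt file uses flat, non-nested `<tag>...</tag>` sections with lowercase names, and the invariant can be recognised by a key phrase such as the config path. The name `sections_mentioning` is hypothetical and does not refer to any existing validator function.

```python
import re

# Assumption: sections are flat <tag>...</tag> blocks; a section nested inside
# another is attributed to the outer section only.
SECTION_RE = re.compile(r"<(?P<tag>[a-z_]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)


def sections_mentioning(prompt: str, phrase: str) -> list[str]:
    """Return the names of sections whose body contains `phrase` (case-insensitive)."""
    needle = phrase.casefold()
    return [
        m.group("tag")
        for m in SECTION_RE.finditer(prompt)
        if needle in m.group("body").casefold()
    ]


if __name__ == "__main__":
    sample = (
        "<workflow_context>Config lives at agents/qa/qa-project-config.md</workflow_context>\n"
        "<failure_handling>Never copy agents/qa/qa-project-config.md per session</failure_handling>\n"
        "<pitfalls>Unrelated note</pitfalls>"
    )
    print(sections_mentioning(sample, "agents/qa/qa-project-config.md"))
```

A phrase that shows up in more than one or two sections is a candidate for the "state once, point elsewhere" fix suggested in the finding; paraphrased restatements would not be caught by a literal match.
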
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Precision & Explicitness | Problem: Step 4.4 full-approve branch says approval requires 'an exact approval token (per the strict-token rule shared with step 7.2)'. The 'strict-token rule' is defined in a different phase file (qa-flow-test-correction.md step 7.2), but no token list is stated here. Per the phase-isolation model phases do not read sibling phases, so an agent running Phase 4 standalone has no concrete token list to enforce 'exact'. Reason: A cross-phase pointer to a token list the agent cannot see at Phase-4 time makes the 'exact token' requirement unenforceable, weakening the approval gate it is meant to harden. Solution: Inline the closed token list (e.g. approved / approve / yes, case-insensitive) directly in step 4.4, or define it once in the parent qa-flow.md and reference the parent. Do not point laterally to step 7.2 as the source of the rule. |
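
To make "exact token" concrete, the closed-list rule could look roughly like the sketch below. The token set is the example list from the Solution above (approved / approve / yes), not a confirmed canonical list, and trimming surrounding whitespace before comparison is an added assumption. `is_exact_approval` is a hypothetical name.

```python
# Example tokens from the finding; the real closed list is whatever the
# workflow authors define in step 4.4 or the parent qa-flow.md.
APPROVAL_TOKENS = frozenset({"approved", "approve", "yes"})


def is_exact_approval(reply: str) -> bool:
    """True only when the whole reply is one approval token (case-insensitive).

    Assumption: leading and trailing whitespace is ignored; anything else in
    the reply, including qualifiers, makes it a non-approval.
    """
    return reply.strip().casefold() in APPROVAL_TOKENS


if __name__ == "__main__":
    for reply in ("Approved", " YES ", "yes, but change step 3", "looks good"):
        print(repr(reply), is_exact_approval(reply))
```

Matching the whole reply rather than searching for a token inside it is the point of the strict rule: a qualified answer such as "yes, but change step 3" falls through to the normal clarification path instead of being read as approval.
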
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The handoff contract (calling workflow loads the 4 skills; handoff only verifies presence at its step-4 GATE; does NOT itself ACQUIRE/USE) is stated four separate times: in <workflow_context> ('Loading responsibility' bullet), in <skill_handoff> (full restatement plus acceptance criteria), in <execute_implementation> 'Routing' preamble, and again inside step 5.1 item 1 and item 4. This is the core redundancy in a 9724-char file that the phase schema asks to keep 'small and short'. Reason: Restating one contract four ways quadruples reading cost and risks the agent treating the variants as distinct rules; the schema explicitly reserves this detail for the skill, not the phase. Solution: Keep <skill_handoff> as the single authoritative statement of the handoff contract; reduce <workflow_context> and the <execute_implementation> preamble to one-line pointers ('handoff contract: see <skill_handoff>'). Remove the duplicated GATE explanation from step 5.1 items 1 and 4. |
| 🔵 Medium | Cognitive Budget | Problem: The phase file (~9.7KB) restates the handoff contract four times and carries two overlapping validation lists, inflating the per-turn context for a phase that should be small per the phase schema. Reason: Leaner phase files keep the orchestration loop within the reliable step budget and lower token cost on every call. Solution: State the handoff contract once and merge the two validation lists into one MECE checklist, deferring mechanics to the referenced skill. |
| 🔵 Medium | Structural Coherence | Problem: Phase-exit criteria are split across three overlapping blocks: <validate> step 5.2 ('in-progress validation items'), <validation_checklist> ('authoritative exit gate'), and the contract prose, with explicit cross-annotations like 'covers <validate> item 1'. The file itself acknowledges the divergence ('Supersedes any divergence with <validate> step 5.2'), signaling the two lists are not MECE. Reason: Two overlapping validation lists with cross-reference annotations force the agent to reconcile which list governs, inviting missed or double-counted checks. Solution: Collapse <validate> 5.2 and <validation_checklist> into one authoritative checklist; if an in-progress vs exit distinction is truly needed, keep 5.2 to only the items NOT in the exit checklist instead of restating overlapping items with 'covers item N' annotations. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 2 | ⬇️ Slightly worse |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: Step 6 "Custom-field discovery fallback: see step 3 custom-fields branch (canonical) — no separate procedure." is a pure pointer step that carries no instruction; the custom-field logic is already fully stated in step 3. It exists only to be cross-referenced from <failure_handling> and <validation_checklist>, padding the numbered process. The same redaction content also appears across <safety_boundaries>, step 5, <validation_checklist>, and <pitfalls> (grep patterns and placeholder vocabulary partially restated), and at ~13.3KB the file sits in the 10K-20K bloat-signal band. Reason: An instruction-free numbered step and repeated pattern lists inflate the read cost of an always-loaded SKILL.md without adding behavior, raising the chance an agent skips or mis-sequences steps. Solution: Delete the empty step 6 and renumber, or fold its cross-reference into step 3's heading; keep grep patterns/placeholders only in <safety_boundaries> as the single source of truth and have other sections reference it by name rather than partially restating pattern lists. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The full vendor-replacement porting guide is inlined in <vendor_replacement> (lines 139-156: per-vendor rebind list for Zephyr/Xray/qTest/Polarion, identifier formats, field semantics, swap pattern). The two sibling skills (mcp-jira-data-collection, testrail-test-case-export) instead push this maintainer-only material to a references/vendor-swap.md / vendor-porting.md loaded on demand, keeping the always-loaded SKILL.md lean. Here the porting guide is always loaded even though the file states it is "load only when forking, not at runtime" elsewhere in the family. The <safety_boundaries> redaction targets/patterns are also restated again in <validation_checklist> and <pitfalls>. Reason: Maintainer-only fork instructions are loaded into every runtime extraction, wasting context the family deliberately reserves via progressive disclosure; inconsistency with the two sibling skills also confuses future maintainers. Solution: Move the <vendor_replacement> body to references/vendor-swap.md and leave only a one-line on-demand pointer in SKILL.md, matching the sibling Jira/export skills; reference the redaction targets from <safety_boundaries> by name in the checklist/pitfalls instead of re-listing patterns. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The redaction discipline is stated three times across the always-loaded SKILL.md: <success_criteria> ("No literal credentials / tokens / real PII appear"), <pitfalls> ("Pasting literal real-account passwords ... apply <safety_boundaries> placeholders"), and the full <safety_boundaries> operational block, which then ALSO points to the references catalog. The shape-preserving-placeholder sentence ("If a real production value would be the natural example, replace it with a clearly-fake placeholder of the same shape") appears in both <safety_boundaries> and again verbatim in the references file. Reason: Repeating the same safety rule across four blocks inflates an already 13KB always-loaded skill and risks the copies drifting out of sync on future edits. Solution: Keep the operational redaction rule once in <safety_boundaries>; in <success_criteria> and <pitfalls> reference it by tag name rather than restating the rule text. Remove the duplicated shape-placeholder sentence from one location. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Line 6 deep-links a sibling skill's private internals: "Mirrors the same lazy-loading pattern the sibling swagger-contracts-analysis skill uses (references/redaction-catalog.md + references/canonical-example.md)." This names a sibling skill and points into its private references/ files. Per the skill-isolation boundary (no lateral/sibling awareness, no cross-skill deep linking), a skill's own reference file must not know about or link into another skill's private content. The paths happen to exist today but the coupling is a boundary violation regardless. Reason: Cross-skill awareness couples two independently-evolving skills: if swagger-contracts-analysis renames or removes those reference files, this note silently rots, and it teaches the maintainer that deep-linking sibling internals is acceptable, eroding the isolation guarantee. Solution: Delete the parenthetical sibling reference on line 6. If a rationale for the lazy-loading split is wanted, state it generically ("split per progressive-disclosure best practice") without naming swagger-contracts-analysis or its private file paths. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: Numeric vendor mappings are baked directly into the runtime <process>: priority priority_id: 4/3/2/1 (step 3) and type type_id: 1/7/6/8/9/10 (step 4). These are TestRail-instance-specific magic numbers. The skill does note <input_contract> allows optional per-case priority_id/type_id overrides and <pitfalls> warns they "may differ per TestRail instance", but the defaults remain hardcoded constants rather than retrieved from project config.Reason: Hardcoded priority/type IDs silently mis-map cases when a TestRail instance uses a customized priority/type table, producing wrong-priority cases in an irreversible external write; this is the portability concern the gate targets. Solution: Keep the numeric defaults but explicitly state they are the documented TestRail-default fallback and instruct the agent to prefer the parent workflow's TMS-config mapping when supplied (the override path already exists in <input_contract>); reference the config source by name so the baked-in numbers are clearly the last resort. |
| 🔵 Medium | Bloat Control | Problem: The step-7 confirmation-gate / dedup / sensitive-scan rules are stated in full in <process> step 7, then restated almost in full again in <safety_boundaries> (no-write-without-confirmation, dedup-pre-scan-every-run, redaction targets) and a third time in <validation_checklist> and a fourth time in <pitfalls>. The placeholder examples diverge across blocks — <safety_boundaries> uses {valid_token}, {admin_token}, <bearer-token-for-test-user> while the sibling authoring skill standardizes on <valid bearer token> shape — risking inconsistent placeholders in exported cases.Reason: Four near-duplicate copies of the destructive-write gate enlarge an always-loaded skill and can drift apart on edits; mismatched placeholder styles between author and export steps can let an inconsistent or unredacted value slip into an irreversible external write. Solution: State the gate procedure once in step 7; have <safety_boundaries>/<validation_checklist>/<pitfalls> reference it by step number instead of re-describing it. Align the placeholder vocabulary with the authoring skill's catalog so the same token shapes are used end-to-end. |
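The Dependency Management suggestion above (numeric defaults as the last resort, after any supplied mapping) can be sketched as a small lookup. This is a minimal illustration only: the label names, the `DEFAULT_PRIORITY_IDS` table, the `resolve_priority_id` helper, and the precedence order (per-case override, then TMS-config mapping, then default) are assumptions for the example and are not taken from the skill file.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical fallback table: the labels are illustrative assumptions; only the
# numeric range 4/3/2/1 is mentioned in the finding above.
DEFAULT_PRIORITY_IDS = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def resolve_priority_id(
    label: str,
    case_override: Optional[int] = None,
    config_mapping: Optional[Mapping[str, int]] = None,
) -> Tuple[int, str]:
    """Return (priority_id, source) so the caller can report where the id came from."""
    # 1. An explicit per-case override wins (assumed precedence).
    if case_override is not None:
        return case_override, "case-override"
    key = label.strip().lower()
    # 2. Prefer the mapping supplied by the parent workflow's TMS config.
    if config_mapping and key in config_mapping:
        return config_mapping[key], "tms-config"
    # 3. Fall back to the baked-in defaults only as a last resort.
    if key in DEFAULT_PRIORITY_IDS:
        return DEFAULT_PRIORITY_IDS[key], "default-fallback"
    raise ValueError(f"No priority mapping for label {label!r}")
```

Returning the source alongside the id lets the confirmation gate show when a default fallback was used before the external write happens.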
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/adhoc-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Instruction Ordering | Problem: The verification-failure no-ask override is placed as two bare top-level bullets (lines ~22-23) in the shared <workflow_phases> preamble, directly adjacent to "USER CONFIRMATION: Wait for approval before next phase" (line ~28), with no precedence marker, no scope-lock, and no "ambiguity defaults to ASK" fallback. aqa-flow.md isolates the identical rule in a dedicated <orchestration_and_escalation> block with both guards. Reason: The same rule is safe in aqa-flow and leak-prone here purely due to placement and a missing ambiguity fallback; under context compaction an agent can carry the absolute "MUST NOT call AskUserQuestion" framing into the happy path. Solution: Mirror aqa-flow: move the two override bullets into a dedicated fenced block, add the "if any precondition is uncertain or only partially true → fall back to the normal HITL ask path; ambiguity defaults to ASK" sentence, and add one precedence line stating the per-phase USER CONFIRMATION still governs the happy path. |
| 🟡 High | Conflict Resolution | Problem: The <workflow_phases> preamble contains two directly competing instructions with no stated precedence. The added anti-skip gate says: "the only correct next action is a one-line announcement ... followed by beginning the earliest incomplete phase in the same turn, without yielding to user input" and "the agent MUST NOT ... call AskUserQuestion; ... pause for input before starting the earliest incomplete phase." Two bullets later the same block still says "USER CONFIRMATION: Wait for approval before next phase." A reader cannot tell whether to pause for approval or to proceed same-turn, and the gate's scope (verification-failure only) is not fenced off from the general per-phase confirmation rule. Reason: Without an explicit hierarchy the agent may either skip a legitimate HITL confirmation or stall when it should resume, producing inconsistent behavior across runs. Solution: Explicitly scope the no-pause/no-AskUserQuestion gate to the verification-failure resume case only (e.g., prefix it "On verification failure ONLY:") and add one precedence line stating that normal per-phase USER CONFIRMATION still applies for the happy path. Keep both rules but mark which wins in which situation. |
| 🔵 Medium | Dependency Management | Problem: New phase headers hardcode dated model identifiers, e.g. subagent_recommended_model="claude-opus-4-6, gpt-5.4-high" on phases 2,3,4. These version pins are not parameterized and will rot; a retired model id can fail or silently downgrade subagent dispatch. Reason: Baked-in model version strings become wrong within one model cycle; tier-based hints stay correct across vendors and releases, matching Rosetta's agent-agnostic principle. Solution: Replace concrete dated model ids with capability tiers (e.g. tier: complex / tier: workhorse) defined in the bootstrap, or centralize the model map in one referenced config instead of per-phase pins. |
| 🔵 Medium | Safety Boundaries | Problem: The new gate forbids AskUserQuestion and any confirmation request "before starting the earliest incomplete phase" and asserts "there is nothing for the user to confirm." This is a broad suppression of HITL that, if read out of its intended narrow scope, overrides the session-wide HITL questioning policy that normally governs approval gates. Reason: Over-broad suppression of user confirmation can cause the agent to bypass required human approval, which is a safety regression in an enterprise workflow. Solution: Constrain the prohibition to the exact verification-failure branch and add an explicit carve-out that genuine HITL approval gates (Phase 3, Phase 6) and any safety/destructive confirmations are unaffected. |
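The precedence the findings above ask for (no-ask only on a confirmed verification failure, ambiguity defaults to ASK, HITL and destructive confirmations untouched) can be written out as a small decision function. This is a hedged sketch: `GateState`, its field names, and the `"ask"` / `"resume"` return values are hypothetical and only model the rule ordering, not the workflow file's actual wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateState:
    # None means "uncertain or only partially true" (hypothetical encoding).
    verification_failed: Optional[bool]
    earliest_incomplete_phase_known: Optional[bool]
    at_hitl_approval_gate: bool = False
    destructive_action_pending: bool = False


def next_action(state: GateState) -> str:
    """Return "ask" (normal HITL path) or "resume" (same-turn resume, no question)."""
    # Carve-out: genuine approval gates and destructive confirmations always ask.
    if state.at_hitl_approval_gate or state.destructive_action_pending:
        return "ask"
    preconditions = (state.verification_failed, state.earliest_incomplete_phase_known)
    # Ambiguity defaults to ASK.
    if any(p is None for p in preconditions):
        return "ask"
    # Verification-failure branch ONLY: resume without yielding to the user.
    if all(preconditions):
        return "resume"
    # Happy path: per-phase USER CONFIRMATION still governs.
    return "ask"
```

Checking the carve-outs first means the override can never suppress an approval gate, which is the leak the Safety Boundaries finding describes.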
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 3 | ⬇️ Slightly worse |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 3 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 3 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The rewrite generally replaced hardcoded mcp_Jira_MCP_* calls with skill-mediated operations, but the <common_issues> block still contains a residual concrete tool name: "Always check for child pages using confluence_get_page_children() for each found page," and <pitfalls> references get_page_children. These bypass the new skill-abstraction (confluence-source-harvesting / mcp-confluence-data-collection) that the rest of the file deliberately routes through. Reason: Mixed abstraction levels make the dependency portability inconsistent; a target project whose MCP exposes a differently named operation gets contradictory guidance. Solution: Replace the literal confluence_get_page_children() / get_page_children references in <pitfalls> and <common_issues> with the abstracted phrasing already used elsewhere (e.g., "the child-page traversal operation per confluence-source-harvesting"), matching the jira_search_fields treatment. |
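A quick way to confirm that no literal tool names remain after the fix is a line-level scan of the workflow text. The sketch below is an assumption-laden helper, not part of the validation tooling: `find_residual_tool_names` and the regex are hypothetical and cover only the two names quoted in the finding above.

```python
import re
from typing import List, Tuple

# Matches get_page_children with or without the confluence_ prefix and trailing "()".
RESIDUAL_TOOL_PATTERN = re.compile(r"\b(?:confluence_)?get_page_children\b(?:\(\))?")


def find_residual_tool_names(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, matched literal) for each residual tool name."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in RESIDUAL_TOOL_PATTERN.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits
```

An empty result means every child-page reference now goes through the abstracted skill phrasing; extending the pattern to other tool names is left to whoever owns the check.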
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The <create_analysis_document> section is dominated by meta-instructions about its own structure rather than the document structure. The same point is restated three times: the intro says "The fence is the complete append-only target", then "End of Pass 2 append-only block. Nothing else is appended in Pass 2. If you find yourself adding a section here, it belongs to Pass 1's skill-owned set and is misplaced.", then "### Modifiers ... These are not additional sections to append". The Pass-1/Pass-2/Modifier framing wraps a fairly small concrete output (two appended sections + a vague/specific table) in heavy self-referential scaffolding.Reason: Repeated self-referential meta-commentary inflates cognitive load and obscures the small concrete action the phase actually performs, making the instruction harder to follow reliably. Solution: Collapse the repeated "do not duplicate sections 1-6 / nothing else is appended / these are not sections" warnings into a single sentence after the fenced block, and drop the "If you find yourself adding a section here..." introspective aside. |
| 🔵 Medium | Bloat Control | Problem: The section spends roughly half its length re-explaining the Pass 1 / Pass 2 / Modifier ownership split ("This phase does NOT duplicate that template", "The fence is a delta on top of the skill's output, NOT the whole document template", "One positive / one negative pair kept inline so the rule survives even when the skill is not loaded"). These are provenance/rationale notes about why content lives where it does, which target prompts should avoid. Reason: Non-operational rationale and redundant boundary reminders are compressible without value loss and dilute the actionable steps. Solution: Remove the rationale clauses explaining why sections are split between skill and phase; keep only the operative instruction (run the skill for sections 1-6, then append the two-section delta verbatim). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
…dditional_context length limit
… of code-analysis-flow + requirements-authoring-flow; regenerate plugins
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-codebase-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-selector-management/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-test-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/automation-test-execution-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/automation-test-execution-analysis/references/redaction-policy.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The file is 13302 chars (10K-20K band). The verify-don't-load contract is restated three times: in <core_concepts>, in the <recommended_foundational_skills> table prose, and again in each numbered process step (steps 1-4 each repeat 'Verify X is loaded ... If absent, stop per <failure_handling>'). The same 'no silent fallback to coding+testing' rule appears in step 4 GATE and in <failure_handling>. Reason: Instructions are not user-facing, so compression is expected; the triple restatement of the same contract adds tokens to every resend without adding behavior. Solution: Collapse the per-step 'Verify ... If absent stop per <failure_handling>' repetition by stating the verify-then-apply contract once and letting the table's 'Verified at' / 'If not loaded' columns carry the per-skill detail; keep only the domain-skill GATE inline since it is the high-signal one. |
| 🔵 Medium | Conflict Resolution | Problem: Step 5 says validate that 'tests compile or parse', while <core_concepts> says 'parsing failures belongs to a later analysis phase'. A reader could read these as competing (does this phase handle parse outcomes or not?). The intended distinction (compile/parse of authored code here vs. parsing test-run reports later) is implied but not stated explicitly. Reason: Without the explicit boundary the agent may either skip the step-5 parse check or attempt report parsing it should defer. Solution: Add one clause distinguishing 'static compile/parse of the authored test code' (in scope at step 5) from 'parsing of execution reports' (later analysis phase) so the two statements are not read as contradictory. |
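The boundary described in the Conflict Resolution finding can be illustrated with a minimal sketch of a static parse check. It assumes the authored tests are Python files, which the report does not state; `parses_cleanly` is a hypothetical helper name. The check only parses the source and never executes it or reads any execution report.

```python
import ast


def parses_cleanly(source: str, filename: str = "<authored-test>") -> tuple[bool, str]:
    """Static syntax check of authored test code (hypothetical helper).

    The source is parsed but never executed, so this stays within the
    'compile or parse' scope and does not touch execution reports.
    """
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return False, f"{filename}:{exc.lineno}: {exc.msg}"
    return True, "ok"


good = "def test_login():\n    assert 1 + 1 == 2\n"
bad = "def test_login(:\n    pass\n"

print(parses_cleanly(good))    # syntax is valid
print(parses_cleanly(bad)[0])  # syntax error is reported, not raised
```

Parsing of test-run reports would be a separate function in a later phase; keeping the two apart in code mirrors the distinction the finding asks the prompt to state.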
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-best-practices.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-extract.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Epistemic Honesty | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/confluence-source-harvesting/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: At 14112 chars this single SKILL.md falls in the 10K-20K range that the rubric flags as a reliability concern. Decision-time content (GATEs, failure branches, safety rules) competes for attention with the redundant success/validation restatements, raising the chance an agent skips a step in a 10-step process. Reason: Smaller decision surface improves step-following reliability, the rubric's stated primary goal; the file is large enough to trip the 10K-20K threshold. Solution: Move the redundant 'NOT complete' enumeration out (see Bloat Control fix) so the remaining decision content fits well under 10K; the per-pattern detail is already correctly offloaded to references/redaction-and-normalization.md. |
| 🟡 High | Bloat Control | Problem: The same completion rules are stated three times: as positive done-conditions in <success_criteria> 'Complete when', again as the negative 'NOT complete' list (silent zero-page emit, children skipped, permission errors hidden, missing required input, redaction skipped), and a third time as line items in <validation_checklist>. Each of the five NOT-complete bullets restates a <failure_handling> branch or a <validation_checklist> line almost verbatim. Reason: Instructions are token-billed on every turn; triple-stating the same five conditions inflates the file to 14112 chars without adding behavior an agent does not already get from the checklist and failure-handling blocks. Solution: Drop the bulleted 'NOT complete' list in <success_criteria> and rely on <validation_checklist> plus <failure_handling> (which already own those checks); keep <success_criteria> to the single 'Complete when' sentence pointing at the checklist. |
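For anyone wanting to check the size band locally before and after trimming, here is a rough sketch. Only the 10K-20K band comes from the report; the band labels for the other ranges, the inclusive edges, and the function names are assumptions made for illustration, not the validator's actual logic.

```python
from pathlib import Path


def size_band(char_count: int) -> str:
    """Classify a character count. Only the 10K-20K band is named in the
    report; the other labels and the inclusive edges are assumptions."""
    if char_count < 10_000:
        return "under-10K"
    if char_count <= 20_000:
        return "10K-20K"
    return "over-20K"


def file_band(path: str) -> str:
    """Count characters (not bytes) of a UTF-8 text file and classify it."""
    return size_band(len(Path(path).read_text(encoding="utf-8")))


# Character counts quoted in the findings of this report.
print(size_band(14112))
print(size_band(13302))
```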
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/gap-and-contradiction-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/gitnexus-cli/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Safety Boundaries | Problem: clean deletes the .gitnexus/ directory and unregisters the repo, and wiki --gist publishes a PUBLIC GitHub Gist of generated documentation, yet the skill states no caution or confirmation requirement for either. --force is documented as 'skip confirmation prompt' with no warning that data loss follows. Reason: An agent could run clean --force or wiki --gist without realizing the action is destructive or publishes content externally. Solution: Add a brief safety note that clean is destructive (recommend status first) and that wiki --gist makes content public (warn about leaking private-repo documentation); do not bake in a tool-specific gate, just flag the irreversible/public effects. |
| 🔵 Medium | Success Criteria | Problem: No explicit testable done-condition for the skill. <commands> documents what each command does and <when_to_use_skill> says when, but there is no 'done when X' so an agent invoking this skill to index a repo has no completion check. Reason: Missing completion criteria means the agent may run a command and stop without verifying the intended state was reached. Solution: Add a short success line per primary action, e.g. 'analyze done when status reports index present and not stale'; reuse the existing freshness signal already named in the analyze 'When to run' note. |
| 🔵 Medium | Output Contract | Problem: This is a CLI reference card with no <output_contract> or stated expected-result of running each command. After analyze an agent has no canonical signal of success (e.g. what status should then show), so it cannot confirm the index built. Acceptable for a reference card but weaker than sibling skills which all define output expectations. Reason: Without an expected result the agent cannot self-confirm a command achieved its purpose, only that it ran. Solution: Add a one-line expected-result per command (e.g. 'success: status reports a fresh index with symbol/relationship counts'); no full schema needed for a reference card. |
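The Safety Boundaries finding could be turned into a small pre-flight check along these lines. This is a sketch only: the subcommand and flag names (`clean`, `--force`, `wiki --gist`) are taken from the finding above, while `risk_notes` is a hypothetical helper that inspects an argument list and never invokes the CLI.

```python
def risk_notes(args: list[str]) -> list[str]:
    """Return safety notes for an argument list (the part after the program name).

    Hypothetical helper: it inspects arguments only and runs nothing.
    The effects described are those stated in the review finding.
    """
    notes: list[str] = []
    if not args:
        return notes
    if args[0] == "clean":
        notes.append("destructive: deletes the .gitnexus/ directory and unregisters the repo")
        if "--force" in args:
            notes.append("--force skips the confirmation prompt")
    if args[0] == "wiki" and "--gist" in args:
        notes.append("public: publishes a public GitHub Gist of generated documentation")
    return notes


for candidate in (["status"], ["clean", "--force"], ["wiki", "--gist"]):
    print(candidate, risk_notes(candidate))
```

This flags the irreversible or public effects without baking in a gate, which is the shape the suggested solution asks for.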
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/gitnexus-setup/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/gitnexus-tools/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The block points to gitnexus-usage/assets/gn-examples.md, but the asset actually lives at gitnexus-tools/assets/gn-examples.md. There is no gitnexus-usage skill in the release (only gitnexus-cli, gitnexus-setup, gitnexus-tools). Reason: A wrong asset path means the agent ACQUIRE fails or fetches nothing, so the worked examples never load when an agent needs them to pick the right tool. Solution: Change the ACQUIRE path in the block from gitnexus-usage/assets/gn-examples.md to gitnexus-tools/assets/gn-examples.md so the reference resolves to the bundled asset. |
| 🔵 Medium | Failure Handling | Problem: No guidance for the ambiguous case beyond the context tool's name-collision note, and none for when query returns no processes or when no repo is indexed. Reason: Missing empty-result handling can leave the agent stuck or silently producing no tool call when GitNexus has no match. Solution: Add a brief fallback line: if query returns nothing or no repo is indexed, READ gitnexus://repos first, then broaden the query or fall back to standard Rosetta code search. |
| 🔵 Medium | Success Criteria | Problem: The skill states its purpose (pick the right GitNexus tool with the right params) but gives no explicit 'done when' test for a correct selection, so there is no self-check that the chosen tool/params actually match intent. Reason: Without a testable completion marker the agent cannot verify it selected correctly, lowering reliability on the skill's only job. Solution: Add one short success line in <core_concepts>, e.g. selection is complete when the chosen tool/resource and its required parameters match the user intent and the schema was read before any cypher query. |
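
The Reference Integrity finding above can be checked mechanically. The following is a minimal sketch, assuming that the first path component of an ACQUIRE target names the owning skill; the helper names `owning_skill` and `resolves` are hypothetical and are not part of Rosetta.

```python
# Skill directories that the finding lists as present in the release.
RELEASE_SKILLS = {"gitnexus-cli", "gitnexus-setup", "gitnexus-tools"}


def owning_skill(asset_path: str) -> str:
    """Return the first path component, treated here as the owning skill (assumption)."""
    return asset_path.split("/", 1)[0]


def resolves(asset_path: str, skills=RELEASE_SKILLS) -> bool:
    """True when the asset path starts with a skill that exists in the release."""
    return owning_skill(asset_path) in skills


broken = "gitnexus-usage/assets/gn-examples.md"  # path flagged by the finding
fixed = "gitnexus-tools/assets/gn-examples.md"   # path proposed in the solution

print(resolves(broken), resolves(fixed))
```

A check of this kind only validates the skill prefix; it does not confirm that the asset file itself exists.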
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/gitnexus-tools/assets/gn-examples.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Precision & Explicitness | Problem: Examples call tools as gitnexus_query(...), gitnexus_context(...), gitnexus_impact(...), gitnexus_rename(...), gitnexus_detect_changes(...), but the parent SKILL.md defines them as query, context, impact, rename, detect_changes. The same concept uses two different names across the skill and its asset. Reason: Divergent tool names can make an agent guess the actual MCP tool identifier, risking a wrong or invalid tool call. Solution: Make the tool names consistent: either prefix the tool definitions in SKILL.md with gitnexus_ or drop the prefix in the examples, so one term maps to one concept. |
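
The naming divergence described above could be detected with a small consistency check. This is a sketch under stated assumptions: the tool names are copied from the finding, while `strip_prefix` and `naming_mismatches` are hypothetical helpers rather than anything shipped with the skill.

```python
PREFIX = "gitnexus_"

# Names as defined in the parent SKILL.md, per the finding.
DEFINED = ["query", "context", "impact", "rename", "detect_changes"]

# Names as called in the examples asset, per the finding.
USED = [
    "gitnexus_query",
    "gitnexus_context",
    "gitnexus_impact",
    "gitnexus_rename",
    "gitnexus_detect_changes",
]


def strip_prefix(name: str) -> str:
    """Drop the gitnexus_ prefix when present."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def naming_mismatches(defined, used):
    """Names that differ from a defined tool only by the prefix."""
    defined_set = set(defined)
    return sorted(
        name
        for name in set(used)
        if name not in defined_set and strip_prefix(name) in defined_set
    )


mismatches = naming_mismatches(DEFINED, USED)
# After dropping the prefix in the examples, nothing should be reported.
after_fix = naming_mismatches(DEFINED, [strip_prefix(n) for n in USED])
print(mismatches, after_fix)
```

Either direction of the fix (prefixing the definitions or stripping the examples) would make the reported list empty.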
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/init-workspace-documentation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/init-workspace-rules/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/init-workspace-verification/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: The NEW file deletes the 'DEPRECATED ARTIFACTS (notify user, do NOT auto-delete)' block that flagged the r1 state file agents/init-rosetta-shells-flow-state.md and the local init-rosetta-shells-flow.md. Verification no longer instructs the agent to notify the user about these stale r1 artifacts during an upgrade. Reason: Removing the notice means an N-1 upgrade can silently leave obsolete r1 files behind, so the workspace is left in an inconsistent state with no user notification. Solution: Restore a short deprecated-artifacts notice in the verification process (notify user, do NOT auto-delete) covering leftover r1 shell-flow state/files, or confirm equivalent cleanup guidance exists in another init-workspace skill so the upgrade path still surfaces stale artifacts. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Workflow Completeness | 4 | ⬇️ Slightly worse |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/load-context-instructions/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: Fallback mode reads bootstrap files from the repo and lists docs, but there is no guidance for when none of the bootstrap files exist or get_context_instructions fails partway. The blocking-gate language ('do not proceed until complete') has no paired branch for an unrecoverable load failure. Reason: Without an explicit failure path the agent may pass the gate with an empty/partial bootstrap and run without guardrails, which is the exact unreliable state this skill exists to prevent. Solution: Add a short failure branch: if no bootstrap files are found in fallback mode, or the MCP call fails after retry, stop and tell the user that Rosetta context could not be loaded rather than silently proceeding. |
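
The proposed failure branch can be illustrated as follows. This is a minimal sketch, not the skill's actual loader: `ContextLoadError`, `load_bootstrap`, the `read_file` callable, and the file names are all hypothetical, chosen only to show a gate that refuses to pass on an empty load.

```python
class ContextLoadError(RuntimeError):
    """Raised when no bootstrap content could be loaded (hypothetical error type)."""


def load_bootstrap(candidates, read_file):
    """Collect the bootstrap files that exist; stop loudly when none do.

    read_file is any callable that returns the file text, or None when missing.
    """
    loaded = {}
    for name in candidates:
        text = read_file(name)
        if text is not None:
            loaded[name] = text
    if not loaded:
        # The branch the finding asks for: do not pass the gate silently.
        raise ContextLoadError("Rosetta context could not be loaded")
    return loaded


# Illustrative in-memory "repo"; real code would read from disk.
repo = {"bootstrap-a.md": "rules"}
print(load_bootstrap(["bootstrap-a.md", "bootstrap-b.md"], repo.get))
```

Raising instead of returning an empty mapping keeps the caller from mistaking a failed load for a successful one.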
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/load-context/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Precision & Explicitness | Problem: The new step 2 grep uses ^#{1,3} both in prose and in the inline bash command grep -n "^#{1,3}" .... In plain grep (basic regular expressions) this matches a literal # followed by literal {1,3}, not 1-3 leading hashes, so markdown headers will not be matched; ripgrep, by contrast, already treats {1,3} as an interval. The base file had no such command. Reason: A non-functional header grep returns nothing, so IMPLEMENTATION/MEMORY/PATTERNS/REQUIREMENTS headers are never surfaced and the agent loads incomplete project context. Solution: Use a pattern that actually matches 1-3 leading hashes, e.g. grep -nE "^#{1,3} " (extended regex) or rg "^#{1,3} ", and align the prose accordingly. |
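
The difference between the two readings of the pattern can be shown in a short sketch. Python's `re` module treats `{1,3}` as an interval, much as `grep -E` does; the basic-regex reading, where the braces are literal, is emulated here by escaping them. The sample lines are made up for illustration.

```python
import re

lines = [
    "# Title",
    "## Section",
    "### Subsection",
    "#### Too deep",
    "#{1,3} literal braces",
    "plain text",
]

# Interval reading: one to three hashes followed by a space.
interval = re.compile(r"^#{1,3} ")

# Literal-brace reading, emulating how basic-regex grep parses the same text.
literal = re.compile(r"^#\{1,3\}")

interval_hits = [line for line in lines if interval.search(line)]
literal_hits = [line for line in lines if literal.search(line)]

print(interval_hits)
print(literal_hits)
```

Only the interval reading surfaces the level 1 to 3 headers; the literal reading matches nothing but a line that happens to contain the braces verbatim.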
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 3 | ⬇️ Slightly worse |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/load-workflow/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Failure Handling | Problem: <process> step 1 is ACQUIRE <workflow TAG from available workflows> FROM KB but the skill has no branch for when no workflow tag matches the request, when ACQUIRE returns nothing, or when the request is too ambiguous to map to one workflow. Sibling loader skills handle their failure case: load-context has a <troubleshooting> block for missing files, load-context-instructions has per-mode handling. This router has none, yet it gates the whole session. Reason: Without a no-match branch the agent may silently pick a wrong workflow, sending the whole session down an incorrect execution path. Solution: Add a no-match branch in <process>: if no workflow tag matches or ACQUIRE returns empty, stop and ask the user to confirm intent or fall back to the ad-hoc/lightweight workflow; state which fallback applies. |
| 🔵 Medium | Output Contract | Problem: The skill produces a side effect (plan phases injected) but defines no output marker or confirmation the orchestrator can check. Reason: A router with no observable output makes it hard for the parent agent to confirm the correct workflow was activated. Solution: Specify the expected post-state as the contract: name the workflow selected and confirm phases were upserted, so the orchestrator has a deterministic signal the router ran. |
| 🔵 Medium | Success Criteria | Problem: There is no explicit testable done-when. <next-steps> describes what happens next, not the completion condition of this skill (workflow selected, phases injected, state restored when resuming). The sibling load-context base carried an explicit completion gate (its deletion was itself flagged). Reason: Without a testable completion condition the agent cannot reliably tell when routing is finished, risking premature handoff to execution. Solution: Add a one-line success condition: complete when the best-matching workflow is loaded, its phases are upserted into the plan via OPERATION_MANAGER, and resume-state is restored if the user asked to continue. |
| 🔵 Medium | Self-Validation | Problem: <next-steps> says only Execute all accumulated plan phases and steps. There is no verification that the workflow was actually selected/loaded, that its phases were injected into the plan, or that resume-state (step 2) was restored before execution begins. Reason: If phase injection or state restore silently fails, the agent proceeds with an empty or stale plan and skips workflow steps, which is the primary reliability failure this skill exists to prevent. Solution: Add a verification step after step 4: confirm the chosen workflow's phases are present in OPERATION_MANAGER and (when resuming) that completed steps and current phase were restored, before declaring the skill done. |
| 🔵 Medium | Decision Branching | Problem: Step 3 `Handle planning and auto mode correctly — distinguish auto vs No HITL` states a decision but gives no explicit if/then/else. Only step 2 (resume) carries a real branch. Sibling load-context-instructions scores well here because it spells out explicit mode branches. Reason: Left implicit, the agent may treat auto-approval mode as `No HITL` and skip required human gates, which is the exact failure the bootstrap warns against. Solution: Convert step 3 into explicit branches: if `No HITL` requested → proceed without approval gates; else (including auto/auto-approval) → keep HITL approval gates active per the `hitl` skill. |
| 🔵 Medium | Input Contract | Problem: Step 1 consumes an <available workflows> list but the skill never states where that list comes from (bootstrap prep step, KB listing, or context). The sibling load-context names its exact input files; this skill leaves its primary input source implicit. Reason: If the workflow list is not reliably present the router cannot match, and the agent cannot tell whether an empty match means no workflow or a missing input. Solution: Name the source of the available-workflows list in <prerequisites> or step 1 (e.g. the workflow catalog listed during bootstrap prep steps), so the router has a defined input rather than an implicit one. |
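
Several of the load-workflow findings (no-match branch, explicit `No HITL` branching, observable output) can be pictured together in one routing sketch. Everything here is an assumption made for illustration: the tag-overlap scoring, the `RoutingResult` fields, and the workflow names are hypothetical and do not describe how the skill actually selects a workflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingResult:
    workflow: Optional[str]        # selected workflow, or None when nothing matched
    hitl_gates_active: bool        # whether human approval gates stay on
    needs_user_confirmation: bool  # True on the no-match branch


def route(request_tags, available_workflows, no_hitl_requested):
    """Pick the workflow with the largest tag overlap (illustrative scoring)."""
    best, best_score = None, 0
    for name in sorted(available_workflows):
        score = len(set(request_tags) & set(available_workflows[name]))
        if score > best_score:
            best, best_score = name, score

    # Explicit branch: only an explicit No HITL request disables the gates;
    # auto or auto-approval mode alone keeps them active.
    hitl_gates_active = not no_hitl_requested

    # No-match branch: report it instead of silently picking something.
    return RoutingResult(best, hitl_gates_active, best is None)


catalog = {"example-qa-flow": {"qa", "testing"}, "example-coding-flow": {"code"}}
matched = route({"qa"}, catalog, no_hitl_requested=False)
unmatched = route({"unrelated"}, catalog, no_hitl_requested=True)
print(matched)
print(unmatched)
```

Returning a structured result gives the orchestrator a deterministic signal to inspect, which is the kind of post-state the Output Contract finding asks for.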
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/mcp-confluence-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: The redaction catalog (grep patterns, placeholder vocabulary, the 5 redaction target categories) is fully baked into <safety_boundaries> inline in the SKILL.md, while the sibling confluence skill moved the identical catalog into references/cql-and-redaction.md and loads it on demand. The jira skill keeps it always-loaded inline, duplicating domain knowledge that could be retrieved. Reason: Always-loading the full pattern catalog inflates the jira skill's context cost on every invocation and creates two copies of the same redaction knowledge that can drift apart. Solution: Move the inline redaction pattern/placeholder catalog out of jira's <safety_boundaries> into a lazy-loaded reference (mirroring the confluence skill), keeping only the operational decision-time rules inline. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/mcp-jira-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: SKILL.md is 10,603 chars (10K-20K band). The <safety_boundaries> block repeats the redaction policy at high length with regex patterns, and <failure_handling> plus <validation_checklist> restate the same failure cases and read-only contract already covered in , , and <safety_boundaries>. Reason: Per the rubric the 10K-20K size band warrants a high-severity flag; the redaction regex detail is maintainer-grade and not needed in every runtime extraction, so it inflates resent history tokens without changing runtime behavior. Solution: Move the detailed regex redaction pattern catalog into the existing references/vendor-swap.md sibling or a new references/redaction.md and leave a one-line pointer in <safety_boundaries>, mirroring the on-demand <vendor_replacement> split already used in this file. Collapse the duplicate read-only / case-not-found statements so each appears once. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/mcp-testrail-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/operation-manager/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The <core_concepts> fallback bullet has an unbalanced/misplaced backtick in the CLI template: npx rosettify@latest <command> <subcommand> <plan_file>; the closing backtick sits after plan_file's angle bracket, so the inline-code span and the <plan_file> placeholder render incorrectly. Reason: A garbled command template can be copied verbatim by the agent, producing a malformed CLI call; it is a precision defect on an operational reference, not a style nitpick. Solution: Fix the backtick placement to npx rosettify@latest <command> <subcommand> <plan_file> so the code span closes after the placeholder, matching the correctly-formatted invocations used elsewhere in . |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/operation-manager/assets/om-schema.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/orchestrator-contract/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/qa-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-data-collection/references/existing-test-patterns.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-data-collection/references/output-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-gap-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Self-Validation | Problem: The validation_checklist item 'Question count <= 20 per batch (pitfall 2)' counts only Critical+Important questions, but the success_criteria 'Questions Asked = Critical+Important+Optional combined' and the Executive Summary 'Questions Asked' count include Optional too. The batching cap and the reported count use different denominators, so an artifact with many Optional questions could pass the <=20 grep while the Executive Summary reports a much higher 'Questions Asked'. Reason: Two adjacent rules use the same word 'questions' with different scopes, which can make the self-validation grep and the reported count disagree without an actual error. Solution: In qa-gap-analysis/SKILL.md make the batch-cap basis explicit and consistent: state that the <=20 cap applies to Critical+Important only (already implied) and that the Executive Summary 'Questions Asked' total is the combined Critical+Important+Optional, so the two numbers are expected to differ; or align both on the same basis. |
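
The two denominators described above can be made explicit in a few lines. This is a sketch assuming each question carries a `priority` of Critical, Important, or Optional and that a single batch is being checked; the function name and record shape are hypothetical.

```python
def question_counts(questions, batch_cap=20):
    """Report the batch-cap basis and the combined total separately."""
    capped = sum(1 for q in questions if q["priority"] in {"Critical", "Important"})
    return {
        "batch_basis": capped,             # Critical + Important only
        "questions_asked": len(questions), # Critical + Important + Optional
        "within_cap": capped <= batch_cap,
    }


batch = (
    [{"priority": "Critical"}] * 8
    + [{"priority": "Important"}] * 10
    + [{"priority": "Optional"}] * 10
)
counts = question_counts(batch)
print(counts)
```

With 18 capped questions and 10 optional ones, the batch passes the cap while the reported total is 28, which is the expected divergence the finding asks the skill to state.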
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-project-config/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: Unlike the sibling qa-test-implementation skill (which has a dedicated <input_contract> table with default paths and required content), qa-test-debugging specifies inputs only via its prerequisites section and step 1 prose. The expected report format (JUnit XML / JSON / plain log) is named only implicitly in <failure_handling> ('malformed JSON/XML/JUnit'), not declared as an accepted-input contract up front. Reason: Inputs are recoverable from prerequisites + failure handling, but the format contract is implicit, slightly weaker than the sibling skill's explicit table. Solution: Add an explicit accepted-report-formats line near the prerequisites section or step 2 listing the parseable formats (JUnit XML, JSON, plain text log) so the agent validates format against a stated contract rather than inferring it from the failure branch. |
| 🔵 Medium | Single Responsibility | Problem: The skill explicitly bundles two responsibilities with different risk profiles: Part A (read-only report analysis) and Part B (writes test source + runs lint). The <when_to_use_skill> section acknowledges this and defends it ('A caller may invoke Part A only'), so it is well-managed, but it remains two jobs in one prompt rather than the healthy 1-2 single-purpose ideal. Reason: Read-only analysis and write-path correction are coupled, but the boundary statement and progressive disclosure mitigate the coupling, so this is a minor note not a regression. Solution: Acceptable as-is given the explicit Part-A-only invocation boundary and the lazy-loaded part-b-mechanics.md keeping Part B material out of context for analysis-only calls. If future drift adds a third job, split Part B into a sibling skill. No change required now. |
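
One way the accepted-report-formats contract from the Input Contract finding could be checked up front is sketched below. This is an assumption-laden illustration, not the skill's actual logic: the function name, the returned labels, and the rule that a JUnit report has a `testsuite` or `testsuites` root element are choices made for this example.

```python
import json
import xml.etree.ElementTree as ET


def detect_report_format(text: str) -> str:
    """Classify a test report as 'junit-xml', 'json', or 'plain-log'.

    Raises ValueError for empty or malformed input so the caller can take
    its failure-handling branch instead of guessing.
    """
    stripped = text.strip()
    if not stripped:
        raise ValueError("empty report")

    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError as exc:
            raise ValueError(f"malformed XML report: {exc}") from exc
        if root.tag in ("testsuite", "testsuites"):
            return "junit-xml"
        raise ValueError(f"unsupported XML root element: {root.tag}")

    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError as exc:
            if stripped.startswith("{"):
                raise ValueError(f"malformed JSON report: {exc}") from exc
            # A leading "[" also appears in log lines such as "[INFO] ...",
            # so fall through and treat the input as a plain log.

    return "plain-log"
```

Declaring the formats in one place like this lets the agent reject an unsupported report before analysis starts, rather than discovering the problem in the failure branch.
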
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-debugging/references/failure-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/qa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/qa-test-implementation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Safety Boundaries | Problem: The skill writes test + utility source files and runs lint, yet has no dedicated <safety_boundaries> section. Write-scope protection is distributed: 'No hardcoded URLs / credentials / production data' and 'Synthetic test data only' live in the <validation_checklist> and , and the step 1 GATE handles approval. There is no single statement bounding which files the skill may write (test/helper only) versus app source, unlike the sibling qa-test-debugging which states a 'Test-code-only writes' rule. Reason: Approval gate and no-hardcoded-secrets checks exist, but the affirmative write-scope boundary (no app-source writes) is implicit, a subtle gap for a write-path skill. Solution: Add a short <safety_boundaries> section (or a write-scope line in step 1/step 4) stating the skill writes only test files, shared helper/utility files, and never application/product source, mirroring qa-test-debugging's Test-code-only-writes rule. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/requirements-synthesis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: The skill consumes multiple input artifacts (raw-data files, analysis output, user answers / answers.md) but never names their expected paths or formats. lists them only as informal bullets ('Collected raw data from at least one source'), and step 1 says 'Load all source data' without a path contract. The sibling g10 skill sequential-workflow-execution has an explicit input-contract table with Source/Required columns; this skill lacks an equivalent, so an agent must guess where inputs live. Reason: Without an explicit input contract the agent may read the wrong files or mislocate answers.md, weakening the otherwise strong source-priority and failure-handling logic that depend on those inputs. Solution: Add a short input table (or extend ) naming the expected inputs explicitly: raw-data file location, analysis-output location, and the answers.md path the <failure_handling> 'No user answers' branch already references — with required/optional flags and the supplying source (parent workflow phase). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 3 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/sequential-workflow-execution/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/references/redaction-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/testrail-test-case-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The redaction/safety discipline is restated three times: inline in <safety_boundaries>, again in (literal-credential line), and again as a re-scan item in <validation_checklist>, plus the catalog in the reference file. The SKILL.md body is ~12.9K chars (10K-20K range), partly driven by this repetition of the same MUST-not-leak-credentials rule. Reason: Same instruction repeated in four places is re-sent every call with no added behavioral value and pushes the file into the 10K-20K size band the rubric flags. Solution: Keep the operational redaction rule canonical in <safety_boundaries> only; have and <validation_checklist> reference it by a short pointer (e.g. 'safety re-scan per <safety_boundaries>') rather than re-listing the same grep targets and placeholder logic. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testrail-test-case-export/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: <safety_boundaries> deep-links into a SIBLING skill's private reference: ../testrail-test-case-authoring/references/examples-and-redaction.md#targets-to-placeholder-never-literal. Cross-skill deep linking into another skill's internal references violates the skill-folder isolation boundary (skills must not deep-link private content of another skill). Reason: A consumer reaching into another skill's private references couples the two skills' internal layouts; if the authoring skill renames its reference or anchor, the export skill's safety catalog link silently breaks at the exact moment redaction matters. Solution: Replace the cross-skill deep link with an inline copy of the small placeholder vocabulary this skill actually needs (the 4-5 placeholder tokens already partly listed inline), or move the shared placeholder vocabulary to a neutral shared location both skills ACQUIRE; do not reach into the authoring skill's private references folder. |
| 🔵 Medium | Bloat Control | Problem: The confirmation-gate and dedup-pre-scan rules are stated three times: full detail in step 7, restated in <safety_boundaries>, and again line-by-line in <validation_checklist>. Same with the redaction targets list. SKILL.md body is ~14K chars (10K-20K band). Reason: Triplicated procedure text is re-sent every call without behavioral value and is the main driver pushing the file into the 10K-20K size band the rubric flags. Solution: Keep step 7 as the canonical gate description; have <safety_boundaries> and <validation_checklist> point to 'step 7 (canonical)' for the procedure rather than re-listing the dedup/scan/confirm sequence and the redaction target list a second and third time. |
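
The isolation rule behind the Reference Integrity finding can be approximated with a small link check. This is a hedged sketch: the function name is hypothetical, and it assumes links are written relative to the linking skill's own folder and that private material lives under a `references` directory, as the paths in this report suggest.

```python
import posixpath


def is_cross_skill_deep_link(link: str) -> bool:
    """Return True when a relative link leaves the current skill folder
    and lands inside another skill's references/ directory."""
    path = link.split("#", 1)[0]  # drop any anchor fragment
    parts = posixpath.normpath(path).split("/")
    return (
        len(parts) >= 3
        and parts[0] == ".."       # steps out of the current skill folder
        and parts[1] != ".."       # into a sibling skill folder
        and "references" in parts[2:]
    )


flagged = is_cross_skill_deep_link(
    "../testrail-test-case-authoring/references/"
    "examples-and-redaction.md#targets-to-placeholder-never-literal"
)
own_reference = is_cross_skill_deep_link("references/vendor-porting.md")
```

A check of this shape would flag the link cited in the finding while leaving a skill's links to its own references untouched.
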
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testrail-test-case-export/references/vendor-porting.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/adhoc-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Conflict Resolution | Problem: The rename of the <plan_manager> block to <OPERATION_MANAGER> DELETED the orchestrator/subagent coordination directives that the base carried: 'todo tasks/built-in planners are for tracking INSIDE step execution only', 'Orchestrator MUST tell subagents all above MUST as MUST (within their scope)', and 'MUST tell subagents: "tell orchestrator to modify plan if work is outside your scope"'. The new generic command-catalog block (copied from the bootstrap) does not carry the subagent-scope-escalation rule. Reason: adhoc-flow delegates to subagents (see <building_blocks> subagent-delegation and phase 4); losing the explicit 'subagent reports out-of-scope work to orchestrator' rule lets subagents silently mutate the plan or drift, which the deleted lines were specifically there to prevent. Solution: Re-add the deleted orchestrator->subagent coordination lines (subagent escalates out-of-scope work back to orchestrator; built-in/todo planners are intra-step tracking only) to the new <OPERATION_MANAGER> or block; the generic command list does not replace this scope-coordination contract. |
| 🔵 Medium | Reference Integrity | Problem: The base block ended with 'ACQUIRE plan-manager/assets/pm-schema.md FROM KB for data structure reference.' The rename dropped any equivalent schema-acquire pointer; the new block references command shapes inline but gives no path to the plan/data-structure schema for upsert authoring. Reason: upsert with RFC-7396 merge needs the data-structure schema; the base gave an explicit acquire path and the new version removed it, so a plan author building upserts has no in-workflow pointer to the schema. Solution: Add an 'ACQUIRE operation-manager/assets/.md FROM KB' pointer (matching the renamed skill's actual asset) so plan authors retain the structured-schema reference the base provided for upsert payloads. |
| 🔵 Medium | Bloat Control | Problem: The base <plan_manager> block was a compact ~13-line workflow-scoped contract. It was replaced with the full ~20-line OPERATION_MANAGER command catalog (help plan, next, create-with-template, upsert-with-template, update_status, query, show_status, RFC 7396 note, loop note) which is verbatim identical to the OPERATION_MANAGER block already always present in the bootstrap/CLAUDE.md that is resent every turn. Reason: Re-stating the entire bootstrap command catalog inside a workflow that loads only after the bootstrap duplicates always-in-context content, adding tokens every call with no new information. Solution: Since the full command catalog is already guaranteed in the always-loaded bootstrap, the workflow only needs the workflow-specific deltas (which building blocks call operation-manager, the loop/upsert obligations, and the subagent-coordination rules). Replace the duplicated catalog with a short pointer plus those deltas. |
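
For readers unfamiliar with the RFC 7396 merge semantics referenced in the findings above, the following sketch implements the JSON Merge Patch algorithm as the RFC defines it. Whether operation-manager applies it exactly this way is not confirmed by this report, and the plan fields in the example (`name`, `phases`, `status`, `notes`) are hypothetical.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the merged value.

    A non-object patch replaces the target outright; inside an object
    patch, a null (None) value deletes the key and any other value is
    merged recursively.
    """
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical plan document and upsert payload.
plan = {
    "name": "adhoc",
    "phases": {"p1": {"status": "open", "notes": "draft"}},
}
upsert = {"phases": {"p1": {"status": "done", "notes": None}}}
merged = merge_patch(plan, upsert)
```

Because a null deletes a key and arrays are replaced wholesale, an upsert author needs to know the plan's structure to avoid removing fields by accident, which is why the finding asks for the schema pointer to be restored.
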
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Conflict Resolution | 3 | ⬇️ Slightly worse |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Bloat Control | 3 | ⬇️ Slightly worse |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: BASE listed six explicit numbered tasks (read project description, read user-instructions, frontend analysis, page-object inventory, similar-test search, reusable-utility identification, plan update) inline. NEW collapses all detail into a single <execute_analysis> step (USE SKILL aqa-codebase-analysis) plus a 4-item <validate_findings> checklist. The ordered sub-task detail is no longer in the phase file. Reason: Content was relocated to a well-formed bound skill via progressive disclosure, not lost; instructions are not user-facing so this compression is acceptable and reduces context cost. Minor severity because phase no longer self-documents the step sequence, relying on skill being loaded. Solution: No fix required for correctness: verified the relocated detail (project description, user-instructions Must/Should/Nice categorization, frontend analysis, page-object inventory, similar-test search, reusable utilities, 9-section report template) is fully present in the aqa-codebase-analysis SKILL and its report-template reference. If any phase-level traceability is desired, keep the <validate_findings> checklist as the anchor (already present). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-identification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: BASE had 7 explicit tasks including Task 1 (map every test step to required UI interactions), Task 2 (check existing page objects with an Available/Missing/Uncertain categorization), and the selector-strategy preference order (data-testid > id > class > XPath). NEW reduces selector work to one line: USE SKILL aqa-selector-management Execute Part A only. The interaction-mapping and existing-selector-check steps are no longer enumerated in the phase. Reason: The page-source capture protocol (the genuinely user-facing part) was correctly kept verbatim with a documented rationale; the deleted detail is mechanical selector analysis suited to the skill. Low severity assuming the skill covers it; flagged because the phase no longer makes the existing-selector-check an explicit gate. Solution: Confirm aqa-selector-management Part A owns interaction-mapping, the existing-page-object availability check, and the selector-strategy preference order; if any of those are not covered by the skill, restore a one-line pointer in <execute_identification> naming them as Part A deliverables. Do not re-inline the full BASE tables. |
| ⚪ Low | Output Contract | Problem: BASE prescribed a full ## Phase 4: Selector Identification test-plan section (interaction map, existing-vs-missing, identified selectors, selector strategy, notes). NEW only updates agents/aqa-state.md fields and a 5-item validation checklist; the structured selector documentation written into the test plan is no longer specified here.Reason: Likely relocated to the skill output contract; minor because the state-file echo still captures counts and strategy. Cosmetic-level traceability concern only. Solution: Verify the identified-selector documentation output is owned by aqa-selector-management Part A's output template; if so this is fine. Otherwise add one line stating where the selector map is recorded (test plan vs report). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: BASE Task 4 (Add Documentation (If Project Uses It): JSDoc/TSDoc for new selectors and methods) is not represented anywhere in NEW. NEW's steps are: ACQUIRE+USE the two skills, run Part B, lint, update state. The conditional documentation-of-selectors step was dropped and is not echoed in the validation checklist. Reason: BASE explicitly made selector documentation conditional on project convention; if no skill owns it the convention-matching behavior is silently lost. Severity 2 because it is conditional and low-blast-radius, but it is a genuine deleted behavioral step, so comparison on Workflow Completeness is below neutral. Solution: Confirm aqa-selector-management Part B (or repository-implementation-standards) covers conditional selector/method documentation matching existing project doc style. If neither does, add a one-line item to <execute_implementation> or the <validation_checklist>: 'document new selectors/methods only if the project already uses JSDoc/TSDoc, matching existing style.' |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: BASE enumerated fix categories explicitly (Selector / Timing / Assertion / Setup / Test-code issues) in Task 2 and a fix-prioritization scheme (Critical/High/Medium-Low). NEW delegates all correction preparation to the user-approved-code-changes skill (with a debugging->coding->aqa-test-debugging Part B fallback) and does not enumerate the categories or prioritization in the phase. Reason: The critical HITL approval gate is preserved and hardened (explicit approval tokens, preparation-only guardrail forbidding writes before step 8.3, disambiguation rule, fallback chain), a net safety improvement. The dropped item is the fix taxonomy/prioritization, mechanical detail suited to the skill. Severity 2: only a regression if no bound skill carries the taxonomy. Solution: Verify user-approved-code-changes and/or aqa-test-debugging Part B own the fix-type taxonomy and prioritization. If not, add a single pointer line in <execute_corrections> referencing where the Selector/Timing/Assertion/Setup taxonomy and Critical/High/Low prioritization live. No need to re-inline the BASE tables. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The NEW state-file template was reduced to a bare phase checklist. BASE included per-phase 'Test Details' subsections capturing concrete fields (TestRail Case, Confluence Pages, Existing Page Objects, Test File path, Tests Failed count, Root Causes, etc.) plus completion dates per row. Those structured capture fields are gone from the NEW template. Reason: The state file is the cross-phase memory; dropping the structured fields makes later spot checks rely on free-form text and weakens deterministic resume after compaction. Solution: Re-add a minimal set of per-phase capture fields to the state-file template (at least Phase 3 page-object list, Phase 6 test file path, Phase 7 root-causes list) or point the template at the per-phase docs that own those fields, so downstream phases and the success-criteria spot checks have a defined place to read prior outputs. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: BASE prerequisite 3 ('Orchestrator and subagents MUST USE SKILL coding-agents-prompt-authoring') was deleted from the rewritten prerequisites block. Behavioral tracing shows phases 2-7 still delegate to the prompt-engineer subagent, which independently binds that skill, so the skill is not unbound system-wide; but the orchestrator no longer has an explicit mandate to load it for the coordination/blueprint work it performs directly. Reason: The subagent binding mitigates the deletion, but removing the explicit orchestrator-level mandate weakens the guarantee that the authoring skill governs orchestrator-side decisions. Solution: Restore an explicit one-line binding in the workflow prerequisites that the orchestrator MUST USE SKILL coding-agents-prompt-authoring, or confirm in the workflow that all skill-dependent work is delegated to the prompt-engineer subagent so the orchestrator-level binding is intentionally unnecessary. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/coding-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Structural Coherence | Problem: The newly added user_review_design phase=3 block lists all four items with the marker 1. (four 1. lines instead of 1-4). Every other phase in the file uses sequential 1-N numbering. Reason: Repeated 1. markers in an ordered HITL gate are inconsistent with the rest of the file and can blur step ordering, though the steps still read sequentially so impact is cosmetic. Solution: Renumber the four items in user_review_design to 1, 2, 3, 4. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/external-lib-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: The added **Phase 0: Prerequsites** block numbers its two items as 1. then 3. (skips 2). In a workflow that explicitly stresses 'Do not skip steps!' and 'Make sure to have todo tasks for each step', a gap in the numbered list can cause an agent to expect/look for a missing step 2. Reason: Sequence integrity matters in a step-driven onboarding flow; a 1-then-3 list is a small but concrete ordering defect introduced by this change. Solution: Renumber the Phase 0 items to 1 and 2. |
| ⚪ Low | Structural Coherence | Problem: The added Phase 0 header is misspelled 'Prerequsites'. Sibling workflows label this section 'Prerequisites'/'prerequisites'. Reason: Cosmetic typo in a section header; does not change behavior but reduces consistency with other workflow files. Solution: Fix the spelling to 'Prerequisites'. |
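
The numbering defects flagged here and in coding-flow.md (a 1-then-3 list, four repeated `1.` markers) could be caught mechanically before review. The following is a minimal sketch, not part of the validation tooling; the function name and the sample item text are hypothetical, and it assumes each call receives the lines of a single ordered list.

```python
import re

# Matches an ordered-list marker such as "1. " or "3. " at the start of a line.
ITEM = re.compile(r"^\s*(\d+)\.\s+")

def numbering_gaps(lines):
    """Return (line_index, expected, found) for each out-of-sequence marker."""
    problems = []
    expected = 1
    for index, line in enumerate(lines):
        match = ITEM.match(line)
        if match is None:
            continue  # not a list item, ignore
        found = int(match.group(1))
        if found != expected:
            problems.append((index, expected, found))
        expected = found + 1
    return problems

# Hypothetical item text; only the markers mirror the reported defect.
phase_0 = ["1. First prerequisite", "3. Second prerequisite"]
print(numbering_gaps(phase_0))
```

An empty result means the list is numbered 1..N without gaps or repeats.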
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Workflow Completeness | Problem: The verification phase (now Phase 9) deleted base step 4 'Notify user: delete init-rosetta-shells-flow.md.' The cleanup notification to remove the obsolete bootstrap shells flow file no longer fires at the end of the workflow. Reason: Removing the cleanup notice is safe only if the stale file is never produced; otherwise leftover bootstrap files could re-trigger an old flow. Solution: Confirm the init-rosetta-shells-flow.md artifact is no longer generated by Phase 2; if it can still appear in upgrade-from-R2 workspaces, restore a step instructing the user to delete it. Grep of instructions/r2 shows no remaining references, so the deletion is consistent cleanup; only restore if upgrade paths can leave the stale file. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/init-workspace-flow-context.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-discovery.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-questions.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-rules.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-shells.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/modernization-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-api-spec-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The <execute_documentation_mcp> preamble paragraph and the early-exit rule add a layer of meta-narration ('Branch triggers reference <output_contract> by name; the literal outcome line is inlined parenthetically at each trigger site...') that explains the file's own cross-referencing design rather than giving directives. An agent must parse the indirection between the inlined parenthetical outcome lines, the <output_contract> table, and the <verify_remediation> block to execute a single branch. Reason: The narration is design commentary, not an instruction; removing it reduces parsing load without losing any branch behavior. Solution: Drop the meta-explanation paragraph (lines describing why outcome lines are inlined) and keep only the operative early-exit rule. The branch directives already inline the literal outcome line, so the narration restating that design is non-functional. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The <validation_checklist> requires agents/qa-state.md to be 'created with Phase 0 marked complete and IDENTIFIER: field matching the agents/qa/{IDENTIFIER}/ directory name' and requires the {IDENTIFIER} value to be identical across the qa-state.md IDENTIFIER field. But the parent qa-flow.md state-file template (its <state_file> block) has no IDENTIFIER: row; it lists Last Updated / Current Phase / Test Case Source / Feature / API Base URL only. The checklist binds to a state field the canonical template does not define. Reason: The validation step checks a field that the producing template never writes, so the check can never pass against a state file built only from the parent template. Solution: Either add an **IDENTIFIER:** line to the qa-flow.md <state_file> template, or change this phase's <update_state> step 0.2 to explicitly write the IDENTIFIER: line into qa-state.md so the validation target exists. Reference the qa-flow.md template field name exactly. |
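
A mismatch like this, where a checklist validates a field the template never writes, can be detected by comparing the two sets of field names. The sketch below is illustrative only: it assumes template rows are written as `**Name:** value` (the form the suggested `**IDENTIFIER:**` line implies), and the helper name and placeholder values are hypothetical.

```python
import re

# Assumed row shape: "**Field Name:** value" at the start of a line.
FIELD = re.compile(r"^\*\*(.+?):\*\*", re.MULTILINE)

def missing_fields(template, required):
    """Return the required field names that the template does not define."""
    defined = set(FIELD.findall(template))
    return [name for name in required if name not in defined]

# Field names taken from the finding above; the values are placeholders.
state_template = """\
**Last Updated:** <date>
**Current Phase:** <phase>
**Test Case Source:** <source>
**Feature:** <feature>
**API Base URL:** <url>
"""

print(missing_fields(state_template, ["Current Phase", "IDENTIFIER"]))
```

Running a check of this kind against both files would surface the missing `IDENTIFIER` row whichever of the two suggested fixes is chosen.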
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The <skip_rules> block (lines 24-47) is a single dense decision-tree for ONE override (skip Phases 0-2). It restates the same scope-lock idea many times: 'only sanctioned no-ask deviation', 'Scope: applies ONLY at this skip-verification gate', 'Authority on ask-before-action elsewhere', plus a separate Rationale line, plus an explicit carve-outs list that duplicates the per-phase HITL gate type="HITL" markers already present on each phase block. The redundancy makes the single most safety-relevant branch harder to parse. Reason: A safety override that is restated four ways increases the chance the agent skips or misreads a step; compressing it improves reliability of the one branch most likely to bypass HITL. Solution: Collapse the repeated scope-lock statements (the 'Deference', 'Scope', 'Authority', and 'Rationale' clauses) into one short precondition + one action + one fallback. Drop the carve-out re-listing of HITL gates since each phase block already carries type="HITL"; reference them by pointer instead of re-enumerating. |
| 🔵 Medium | Cognitive Budget | Problem: The router mixes high-density prose (the skip-rules override at lines 27-37) with the otherwise clean phase table. The override paragraph packs ~7 nested conditions (a/b/c preconditions, uncertain-partial branch, unambiguous-instruction branch, carve-outs) into prose rather than decomposed if/then lines, exceeding the ~5-step reliable handling guidance for a single block. Reason: Prose with many embedded conditions is processed less reliably than enumerated branches; decomposition reduces skipped-condition risk at the only no-ask gate. Solution: Decompose the override into a short numbered if/then/else list (precondition check -> hold -> skip; fail+unambiguous -> announce+start Phase 0; uncertain -> ASK). Keep one line per branch so the agent can execute it as discrete steps. |
| ⚪ Low | Output Contract | Problem: The router itself defines the agents/qa-state.md template (lines 159-178) and per-phase output paths, but does not give a canonical example of the skip-gate refusal one-liner beyond an inline parenthetical at line 34; the announced format ('skip-gate refused: ...') is described in words only. Reason: The format is already specified inline and understandable; a fenced example would marginally improve determinism but is not a behavioral gap. Solution: No change required for correctness; optionally add the refusal line as a fenced example next to the existing state-file fence so the emitted format is deterministic. Low priority. |
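
The decomposition suggested in the Cognitive Budget finding (hold -> skip; fail + unambiguous -> announce and start Phase 0; uncertain -> ASK) can be read as one branch per line. The sketch below only illustrates that shape. The actual a/b/c preconditions are not reproduced in this report, so they are collapsed into a single hypothetical three-valued status, and routing "fail + ambiguous" to ASK is an assumption rather than something the workflow states.

```python
from enum import Enum

class Check(Enum):
    """Hypothetical summary of the skip-gate preconditions."""
    HOLD = "hold"
    FAIL = "fail"
    UNCERTAIN = "uncertain"

def skip_gate(preconditions, instruction_unambiguous):
    """Return the single action for the skip-verification gate."""
    if preconditions is Check.HOLD:
        return "skip Phases 0-2"
    if preconditions is Check.FAIL and instruction_unambiguous:
        return "announce refusal and start Phase 0"
    # Assumption: every remaining case, including uncertainty, asks the user.
    return "ASK user"

print(skip_gate(Check.UNCERTAIN, instruction_unambiguous=True))
```

Keeping each outcome on its own branch lets the default fall through to asking, which is the conservative direction for the only no-ask gate.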
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The phase asserts (lines 25-34) the exact internal contract of the automation-test-implementation-handoff skill: that it declares a <recommended_foundational_skills> block and a 'step-4 GATE' that emits foundational skill <name> not loaded by calling workflow. This couples the phase to the named internal anchors of a sibling skill. If the handoff skill's internal section names drift, the acceptance-criteria check at step 5.1 sub-step 4 (lines 31-34) becomes a false-negative and blocks the phase even when the skill is correct. Reason: Asserting a sibling skill's private section names violates skill isolation and creates a brittle gate that can deadlock the phase on a cosmetic rename of the handoff skill. Solution: Keep the behavioral contract (handoff must verify-presence and must not ACQUIRE foundational skills) but soften the dependency on the skill's exact internal section name; check the observable behavior (does it verify presence / does it claim to ACQUIRE) rather than the literal <recommended_foundational_skills> tag name. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/research-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: Phase 1 directs reading CONTEXT.md, ARCHITECTURE.md, and IMPLEMENTATION.md but gives no handling if those files are missing or empty, and no fallback when the research subagent returns no grounded references. This was unchanged from base, but the diff restructured the adjacent prerequisites without adding any missing-input handling. Reason: Without a missing-file fallback the researcher subagent may stall or fail silently on workspaces lacking those standard files. Solution: Add a prerequisite or phase-1 note: if any of CONTEXT/ARCHITECTURE/IMPLEMENTATION is absent, proceed with available context and record the gap in research-flow-state.md rather than aborting. |
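
The fallback proposed above (proceed with whatever context exists and record the gap) might look like the sketch below. The three context file names and research-flow-state.md come from the finding; the function names, the wording of the recorded line, and the choice to treat an empty file as missing are assumptions for illustration.

```python
from pathlib import Path

CONTEXT_FILES = ("CONTEXT.md", "ARCHITECTURE.md", "IMPLEMENTATION.md")

def load_context(workspace):
    """Read the context files that exist; list the ones missing or empty."""
    available, gaps = {}, []
    for name in CONTEXT_FILES:
        path = Path(workspace) / name
        text = path.read_text(encoding="utf-8").strip() if path.is_file() else ""
        if text:
            available[name] = text
        else:
            gaps.append(name)
    return available, gaps

def record_gaps(state_file, gaps):
    """Append one line per missing input to the state file instead of aborting."""
    if not gaps:
        return
    with open(state_file, "a", encoding="utf-8") as handle:
        for name in gaps:
            handle.write(f"- Missing input: {name} (proceeding with available context)\n")
```

A caller would run `load_context`, pass the returned gaps to `record_gaps` with the path of research-flow-state.md, and continue Phase 1 with the available files.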
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/self-help-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The diff wrapped phase-0 prerequisites in a <prerequisites phase="0", applies="ALL"> block, but the opening tag is duplicated (lines 20 and 27) with NO closing </prerequisites>, so the block leaks into phase 1. The opener also uses malformed XML attribute syntax (comma-separated phase="0", applies="ALL"). Sibling research-flow uses the correct single, closed block. Reason: The unclosed/duplicated tag breaks the XML section boundaries the agent relies on to delimit phases, risking prerequisites and phase-1 content being read as one block. Solution: Replace the duplicate line-27 opener with a closing </prerequisites> tag and remove the comma between attributes, matching research-flow's correct pattern. |
| 🔵 Medium | Reference Integrity | Problem: Same change as above: the two identical <prerequisites phase="0", applies="ALL"> opening tags with no matching close make the section structure self-inconsistent. Sibling workflow research-flow.md uses the same wrapper correctly (open + close), confirming the intended pattern was a properly closed block. Reason: Consistent, resolvable section tags keep the workflow parseable and aligned with the sibling flow's convention. Solution: Mirror research-flow.md: a single <prerequisites phase="0" applies="ALL"> opener (no comma between attributes) and a single </prerequisites> closer around the four numbered prerequisite items. |
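
A quick way to catch this class of defect is to count openers and closers for a section tag. The sketch below is an illustration only: `tag_balance` is a hypothetical helper, and the two sample strings are reduced stand-ins for the workflow file, not its real content.

```python
import re


def tag_balance(text: str, tag: str) -> tuple:
    """Return (opening, closing) counts for an XML-style section tag."""
    name = re.escape(tag)
    # An opener is "<tag" followed by whitespace or ">", so "</tag>" is not counted.
    openers = len(re.findall(rf"<{name}(?=[\s>])", text))
    closers = len(re.findall(rf"</{name}\s*>", text))
    return openers, closers


# Shape described in the finding: two openers, no closer.
broken = (
    '<prerequisites phase="0", applies="ALL">\n'
    "1. first prerequisite\n"
    '<prerequisites phase="0", applies="ALL">\n'
)

# Shape the Solution asks for: one opener without the comma, one closer.
fixed = (
    '<prerequisites phase="0" applies="ALL">\n'
    "1. first prerequisite\n"
    "</prerequisites>\n"
)

print(tag_balance(broken, "prerequisites"))  # (2, 0)
print(tag_balance(fixed, "prerequisites"))   # (1, 1)
```

A count check does not validate attribute syntax, so the comma between attributes still needs a separate look.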
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬇️ Slightly worse |
| Structural Coherence | 2 | ⬇️ Slightly worse |
📄 instructions/r2/core/workflows/testgen-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The detailed per-entry document formats for contradictions (C1), gaps (G1), and ambiguities (A1), including their exact field structure (Type, Source quotes, Impact, Needs Clarification), were deleted from this phase. The NEW file delegates sections 1-6 entirely to the gap-and-contradiction-analysis skill and only keeps the appended sections 7 + Metadata. Reason: Phase 2's primary output (categorized findings with source quotes) must have its shape defined somewhere; the phase moved ownership to the skill rather than dropping it, so the regression risk is contained to skill-coverage verification, not a hard contract loss. Solution: This is acceptable since the skill now owns the per-entry schema, but confirm gap-and-contradiction-analysis/SKILL.md (or its references) actually defines C/G/A entry shapes with source quotes. If the skill defines them, no action is needed; if not, the contract for the core analysis output is now defined in neither place. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-question-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The user-facing questions.md template lost the detail of its in-document How to Answer numbered procedure (open file, fill answer, save, notify) and dropped both the ## Additional Questions or Comments free-text section and the ## Completion Checklist. The NEW How to Answer paragraph is terser. Since questions.md is filled out by a human, this is a user-facing artifact where clarity matters per rubric. Reason: The dropped sections were partly redundant with the agent-side validation, but the free-text Additional Questions or Comments slot was the only channel for user input outside the generated questions, so its loss slightly narrows the HITL capture surface. Solution: The NEW How to Answer paragraph still tells the user to replace [Leave blank for user] and how to mark UNKNOWN, so the core instruction survives. Optionally restore a one-line ## Additional Comments slot so users can add context not covered by generated questions; the completion checklist removal is low-impact since validate_answers (step 3.3) re-checks completeness agent-side. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The BASE router contained concrete 'Common Patterns' examples (initial-prompt formats, Jira/Confluence input formats, Confluence search CQL type=page AND space=PROJ AND text ~ 'feature'). The NEW router drops all of these; the only remaining inline example is in <phase_5_6_standards_gate> (cypress path). Reason: Input-format and CQL examples are execution detail for the data-collection phase, not the router. Moving them down keeps the router thin; grounding for the agent is preserved at the phase level. Low impact, hence severity 2. Solution: Verify the Confluence search/CQL examples now live in testgen-flow-data-collection.md (the phase that actually performs collection) so the guidance is available where used. The router is a router; examples belong in phase files. Keep at least the one standards-gate example present. |
| 🔵 Medium | Failure Handling | Problem: The BASE router had a dedicated 'Error Handling' section (Jira ticket not found, no Confluence results, user doesn't answer questions, incomplete requirements) plus per-phase 'Validation Rules' (e.g. raw-data.md must contain both sections; >=80% exported). The NEW router removes both sections; the router itself now states no failure-path behavior inline. Reason: Removed-from-router error/validation content was relocated to phase files, not deleted; router stays thin. Minor because failure handling for the orchestrator-level concerns is still covered via the verification-failure override and the per-phase files. Solution: Confirmed these cases migrated into child phase files (testgen-flow-data-collection.md handles 'Jira ticket not found'; testgen-flow-test-case-export.md handles the 80% threshold with PARTIAL/HALT logic). No content was truly lost, so this is a relocation appropriate for a thin router. No action required beyond ensuring the router's <validation_checklist> keeps pointing at the same artifacts. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/api-test-spec-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md is ~13.3K chars (10K-20K band). As a SKILL.md shell it lands in agent context automatically, so its full weight is paid on every call that loads this skill. The verify-don't-load contract, input contract table, 10-step process, output templates, and failure-handling are all detailed inline. Reason: SKILL.md shells are always in context; trimming the always-loaded portion lowers per-call token cost and compaction risk while leaving decision-time rules where the agent needs them. Solution: Move the two verbatim output templates (the user-facing handoff message block and the state-update template) and the per-stack command examples into a references/ file loaded on demand at step 7/step 10, keeping the GATEs and contract tables inline. This trims the auto-context shell without losing decision-time content. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md is ~14.1K chars (10K-20K band) and is the largest of the three g23 SKILL shells. It restates the success-criteria done-condition, NOT-complete list, and validation-checklist with substantial overlap (the five NOT-complete items mirror the five validation-checklist items), all loaded into context automatically as a SKILL shell. Reason: The NOT-complete list and validation checklist duplicate the same five failure conditions; one canonical list cuts shell tokens paid on every load without losing any gate coverage. Solution: Collapse the redundancy between <success_criteria> 'NOT complete' bullets and <validation_checklist> — both enumerate the same five regressions (silent zero-page, children skipped, permission hidden, missing input, redaction skipped). Keep one canonical list and have the other reference it, reducing the always-loaded shell size. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: The redaction catalog (grep patterns, placeholder vocabulary, the 5 redaction target categories) is fully baked into <safety_boundaries> inline in the SKILL.md, while the sibling confluence skill moved the identical catalog into references/cql-and-redaction.md and loads it on demand. The jira skill keeps it always-loaded inline, duplicating domain knowledge that could be retrieved. Reason: Always-loading the full pattern catalog inflates the jira skill's context cost on every invocation and creates two copies of the same redaction knowledge that can drift apart. Solution: Move the inline redaction pattern/placeholder catalog out of jira's <safety_boundaries> into a lazy-loaded reference (mirroring the confluence skill), keeping only the operational decision-time rules inline. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: SKILL.md is 10,603 chars (10K-20K band). The <safety_boundaries> block repeats the redaction policy at high length with regex patterns, and <failure_handling> plus <validation_checklist> restate the same failure cases and read-only contract already covered in , , and <safety_boundaries>. Reason: Per the rubric the 10K-20K size band warrants a high-severity flag; the redaction regex detail is maintainer-grade and not needed in every runtime extraction, so it inflates resent history tokens without changing runtime behavior. Solution: Move the detailed regex redaction pattern catalog into the existing references/vendor-swap.md sibling or a new references/redaction.md and leave a one-line pointer in <safety_boundaries>, mirroring the on-demand <vendor_replacement> split already used in this file. Collapse the duplicate read-only / case-not-found statements so each appears once. |
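
The size-band check behind this finding can be sketched as follows. The band edges come from the "10K-20K band" wording in the report; treating the lower edge as inclusive, counting characters rather than bytes, and the helper names `in_flagged_band` and `skill_shell_size` are all assumptions made for illustration.

```python
from pathlib import Path

# Edges taken from the "10K-20K band" wording; inclusivity is an assumption.
BAND_LOW, BAND_HIGH = 10_000, 20_000


def in_flagged_band(char_count: int) -> bool:
    """True when a character count falls inside the flagged 10K-20K band."""
    return BAND_LOW <= char_count < BAND_HIGH


def skill_shell_size(path: Path) -> int:
    """Character count of a SKILL.md shell (characters, not bytes: an assumption)."""
    return len(path.read_text(encoding="utf-8"))


print(in_flagged_band(10_603))  # True: the size reported for this SKILL.md
print(in_flagged_band(9_999))   # False
```

Re-running the check after moving the regex catalog into a references/ file would show whether the shell has dropped below the band.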
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/qa-data-collection/references/output-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Self-Validation | Problem: The validation_checklist item 'Question count <= 20 per batch (pitfall 2)' counts only Critical+Important questions, but the success_criteria 'Questions Asked = Critical+Important+Optional combined' and the Executive Summary 'Questions Asked' count include Optional too. The batching cap and the reported count use different denominators, so an artifact with many Optional questions could pass the <=20 grep while the Executive Summary reports a much higher 'Questions Asked'. Reason: Two adjacent rules use the same word 'questions' with different scopes, which can make the self-validation grep and the reported count disagree without an actual error. Solution: In qa-gap-analysis/SKILL.md make the batch-cap basis explicit and consistent: state that the <=20 cap applies to Critical+Important only (already implied) and that the Executive Summary 'Questions Asked' total is the combined Critical+Important+Optional, so the two numbers are expected to differ; or align both on the same basis. |
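
The two denominators described above can be made concrete with a small worked example. This is a sketch under stated assumptions: the priority labels and the cap of 20 come from the finding, while `question_counts` and the sample distribution of questions are hypothetical.

```python
from collections import Counter

BATCH_CAP = 20  # the "<= 20 per batch" cap quoted in the finding


def question_counts(priorities: list) -> tuple:
    """Return (batch_basis, questions_asked) for a list of priority labels.

    batch_basis counts Critical + Important only (the basis of the batch cap);
    questions_asked also includes Optional (the Executive Summary total).
    """
    counts = Counter(priorities)
    batch_basis = counts["Critical"] + counts["Important"]
    questions_asked = batch_basis + counts["Optional"]
    return batch_basis, questions_asked


# Hypothetical artifact: 12 Critical, 6 Important, 15 Optional questions.
sample = ["Critical"] * 12 + ["Important"] * 6 + ["Optional"] * 15
basis, asked = question_counts(sample)
print(basis, asked, basis <= BATCH_CAP)  # 18 33 True
```

Here the artifact passes the cap with 18 questions while the summary reports 33, which is the expected divergence the Solution asks the skill to state explicitly.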
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/references/failure-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md is 12907 chars (10K-20K band). The success_criteria, safety_boundaries, failure_handling, and validation_checklist sections each restate the same MUST/MUST-NOT rules (BDD ban, gap-marker discipline, redaction discipline) in slightly different wording, increasing the always-loaded budget. Reason: Smaller always-loaded shell reduces per-turn token cost and the chance of contradictory drift between the four restating sections; content is r2-identical so this is a pre-existing trait carried into r3, scored low severity. Solution: Keep the operational rules canonical in one section (e.g. format_rules + safety_boundaries) and have success_criteria / validation_checklist reference them by name rather than re-stating each rule's full text. No behavioral content needs to change. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The SKILL.md is 14041 chars (10K-20K band). Step 7's confirmation gate, the safety_boundaries section, the validation_checklist, and the pitfalls block each re-describe the same destructive-write rules (no-write-without-confirmation, dedup pre-scan, ambiguity-defaults-to-cancel, redaction). The dedup/confirmation rule is stated at least four times. Reason: This is the largest always-loaded file in the group; trimming the restated mechanics lowers per-turn cost without weakening the gate, since step 7 already holds the authoritative version. Content is r2-identical, so low severity. Solution: Mark step 7 as the single canonical home for the confirmation-gate and dedup rules (it already labels itself 'canonical'), and reduce safety_boundaries / validation_checklist / pitfalls to one-line pointers to step 7 instead of re-stating the full mechanics. |
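The destructive-write rules named in this finding (no write without confirmation, dedup pre-scan, ambiguity defaults to cancel) can be illustrated with a minimal sketch. This is not the skill's actual implementation: the function names are hypothetical, the accepted answers (`yes`, `y`) are an assumption, and matching duplicates by case-insensitive title is an assumption, since the report does not state the skill's dedup key.

```python
def is_confirmed(answer: str) -> bool:
    """Only an explicit affirmative counts; anything ambiguous defaults to cancel."""
    # Assumption: "yes" / "y" are the accepted affirmatives.
    return answer.strip().lower() in {"yes", "y"}


def filter_new_cases(candidate_titles: list[str], existing_titles: set[str]) -> list[str]:
    """Dedup pre-scan: drop candidates whose title already exists.

    Assumption: duplicates are detected by case-insensitive title match.
    """
    seen = {t.strip().lower() for t in existing_titles}
    fresh = []
    for title in candidate_titles:
        key = title.strip().lower()
        if key not in seen:
            seen.add(key)  # also guards against duplicates inside the batch
            fresh.append(title)
    return fresh


def export_cases(candidate_titles: list[str], existing_titles: set[str], answer: str) -> list[str]:
    """Return the titles that would be written; an empty list means nothing is written."""
    to_write = filter_new_cases(candidate_titles, existing_titles)
    if not to_write or not is_confirmed(answer):
        return []
    return to_write
```

Keeping the gate in one function mirrors the finding's suggestion: the rule has a single canonical home, and other sections point to it instead of restating it.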
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/adhoc-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Decision Branching | Problem: BASE had explicit inline if/then handling for two conditional tasks: Task 1.5 stated 'If agents/user-instructions/ directory does not exist or is empty, skip this task ... Document that no user instructions files were found', and Task 2 had 'If frontend code NOT available, skip to Task 3'. NEW removes both explicit branches and only leaves parenthetical conditions ('(if directory exists)', 'user instructions extracted (if available)') in the validation checklist, deferring the actual skip/empty handling to the aqa-codebase-analysis skill without restating the else-path in the workflow. Reason: The deleted else-paths told the agent what to do when an input is missing; relying only on a parenthetical 'if available' can let an agent stall or omit the 'document none found' step. Solution: In <execute_analysis> or <validate_findings>, add one line stating the explicit else for each conditional (e.g. 'if agents/user-instructions/ is absent/empty, record none-found and continue; if frontend source is absent, skip frontend analysis and continue') so the branch is anchored in the phase even though the skill performs the work. |
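The two explicit else-paths this finding asks for can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow's implementation: `plan_analysis`, its parameters, and the note strings are hypothetical; only the `agents/user-instructions/` path and the two skip conditions come from the finding.

```python
from pathlib import Path
from typing import Optional


def plan_analysis(repo_root: Path, frontend_dir: Optional[Path]) -> list[str]:
    """Return one note per conditional task (hypothetical helper)."""
    notes = []

    # Branch 1: directory absent or empty -> record none-found and continue.
    instructions_dir = repo_root / "agents" / "user-instructions"
    if instructions_dir.is_dir() and any(instructions_dir.iterdir()):
        notes.append("user-instructions: extract")
    else:
        notes.append("user-instructions: none found, continue")

    # Branch 2: frontend source absent -> skip frontend analysis and continue.
    if frontend_dir is not None and frontend_dir.is_dir():
        notes.append("frontend: analyze")
    else:
        notes.append("frontend: skipped, continue")

    return notes
```

Each condition has both a then-path and an else-path that produces a recorded outcome, which is the property the deleted BASE text provided.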
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 3 | ⬇️ Slightly worse |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: BASE provided a detailed ## Phase 4: Selector Identification test-plan output template (Required Selectors Analysis, Existing vs Missing, Frontend Code Analysis, Identified Selectors, Selector Strategy, Notes) plus a worked 'Identified Selectors' documentation example with HTML/selector/type/usage rows. NEW deletes the test-plan output template and the worked selector-documentation block entirely; the phase now only names 'complete selector map with values and strategy' in <workflow_context> and defers the format to aqa-selector-management Part A without an in-phase schema or example. Reason: The deleted template/example was the only concrete output shape; without confirming the skill owns it, Phase 5 (which consumes the 'selector map from Phase 4') has no guaranteed field set to read. Solution: Confirm aqa-selector-management Part A owns and emits the selector-map schema (HTML source, chosen selector, type, usage, strategy); if it does, add a one-line pointer in <execute_identification> ('selector-map schema owned by aqa-selector-management Part A'). If it does not own a concrete schema, restore a minimal selector-map field list in the phase. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: BASE carried concrete code-pattern examples (TypeScript page-object selector additions, helper methods, new-page-object template with imports/constructor) and a ## Phase 5: Selector Implementation test-plan output template. NEW removes all code examples and the test-plan template, keeping only a state-file echo of fields ('Page Objects Modified/Created, Total Selectors Added, Helper Methods Added, Linting'); the implementation pattern and conventions are deferred to aqa-selector-management Part B and repository-implementation-standards. Reason: The deleted examples/template were the in-phase output anchor; the new file's correctness now fully depends on the referenced skills owning the conventions, so a pointer keeps the contract traceable. Solution: This deferral is reasonable since the skills own 'follow project conventions exactly'; to fully close the gap, confirm aqa-selector-management Part B emits the page-object/test-plan output template, and add a one-line pointer to it in <execute_implementation>. No need to restore the inline code blocks. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: BASE shipped a full worked TypeScript test example (imports, describe block, setup, actions, explicit assertions, cleanup) plus a ## Phase 6: Test Implementation test-plan output template. NEW deletes all of it and keeps only a state-file update example; the actual test-code shape and test-plan section are deferred to aqa-test-authoring's <output_format>. The <skill_handoff> contract is strong, but no in-phase test example remains. Reason: The deleted worked example was the concrete grounding for what a passing test looks like; correctness now depends entirely on the bound aqa-test-authoring skill, so the ownership pointer must stay accurate. Solution: Deferral is acceptable because aqa-test-authoring is explicitly named as owner of authoring decisions and <output_format>; confirm that skill carries the test-code/test-plan example and keep the existing pointer. No restoration of inline code needed. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The base inlined the full entry schemas for contradictions (C1: Type/Source1/Source2/Impact/NeedsClarification), gaps (G1: Type/Context/Missing/Impact/SuggestedQuestion), ambiguities (A1), and the whole analysis.md skeleton (sections 1-6 with formats). The new file delegates sections 1-6 to the gap-and-contradiction-analysis skill and only keeps the appended sections 7-8 plus a single vague-vs-specific example. The phase no longer states the per-entry field shapes. Reason: The schema moved to the skill rather than being lost, but the phase now depends entirely on the skill for the entry contract, which is a single point of failure if the skill drifts. Solution: Acceptable as progressive disclosure provided the gap-and-contradiction-analysis skill defines the C/G/A entry shapes and the sections 1-6 skeleton in its <output_format>. Verify the skill carries them; if not, the phase has dropped the only place the entry schema was specified. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Failure Handling | Problem: The BASE router carried an explicit ## Error Handling block at the router level (Jira ticket not found -> verify key; no Confluence results -> proceed Jira-only or ask; user doesn't answer questions -> remind, cannot proceed to Phase 4). The NEW router removes this block. The router still names HITL gates and the verification-failure resume, but per-phase failure cases (e.g. ticket-not-found, no-Confluence-results) are no longer surfaced at the routing layer. Reason: Failure cases that exist in neither the router nor the phase file would be silently lost during the restructure. Verification showed the deleted detail largely relocated into phase reference files, so this is a relocation risk, not a confirmed loss. Solution: Confirm each removed failure case is owned by its ACQUIRE'd phase file (data-collection, question-generation). If a case has no home in a phase file, add a one-line router-level pointer or restore it. Do not re-inline full detail; a cross-reference is sufficient for a router. |
| ⚪ Low | Example Grounding | Problem: The BASE router included concrete grounding examples (initial-prompt formats, a sample Confluence CQL search string, contradiction/gap type catalogs). The NEW router drops these from the router body. The router now relies on phase files for examples. Reason: For a top router, moving examples into ACQUIRE'd phase files is the intended progressive-disclosure pattern and reduces router bloat, so the absolute capability is preserved as long as the phase files carry the examples. Solution: Verify the CQL search example and prompt-format examples are present in testgen-flow-data-collection.md and testgen-flow-project-config-loading.md. Keep them out of the router (correct for progressive disclosure); only act if a phase file lacks its example. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
|
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-codebase-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-selector-management/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/aqa-test-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/automation-test-execution-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/automation-test-execution-analysis/references/redaction-policy.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/automation-test-implementation-handoff/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/automation-test-implementation-handoff/references/templates.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-best-practices.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-extract.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/confluence-source-harvesting/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/gap-and-contradiction-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/gitnexus-cli/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Safety Boundaries | Problem: Destructive commands clean (deletes .gitnexus/ and unregisters repo) and clean --all (deletes ALL indexed repos) are documented with no guardrail; the --force flag to skip confirmation is presented neutrally. Reason: An agent could irreversibly delete indexes across all repos when only the current repo was intended. Solution: Add a boundary note that clean --all and clean --force are destructive and must not be auto-run without explicit user approval. |
| ⚪ Low | Failure Handling | Problem: The troubleshooting block covers three known symptoms but the skill defines no behavior for command failure in general (non-zero exit, missing API key for wiki, network failure during analyze). Reason: Silent command failure would let downstream steps run against a stale or absent index. Solution: Add a brief failure-handling rule: on non-zero exit, surface stderr to the user and stop; do not silently proceed as if indexing succeeded. |
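
A minimal sketch of the failure-handling rule suggested above, assuming a POSIX shell. `run_or_stop` is a hypothetical helper name and is not part of gitnexus. It leaves stderr attached so the user sees the tool's own message, then reports the exit code and returns it so the caller can stop.

```shell
# run_or_stop CMD [ARGS...]
# Hypothetical wrapper: run the command with stderr left attached to the
# terminal, and on a non-zero exit report the code and hand it back.
run_or_stop() {
  rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    # Name the failing command so the user can see which step stopped.
    echo "command failed (exit $rc): $*" >&2
  fi
  return "$rc"
}
```

A caller could guard indexing with `run_or_stop npx gitnexus analyze || exit 1`, so that later steps do not run against a stale or absent index.
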
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/gitnexus-setup/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Safety Boundaries | Problem: The install steps run npx gitnexus analyze and npx gitnexus setup, the latter writing global MCP config that auto-detects and modifies editor configuration, with no boundary requiring user awareness that a global, machine-wide config write occurs. Reason: A global config write is a system-level side effect; flagging it prevents unexpected machine-wide changes during init. Solution: Add a boundary note that setup writes global editor/MCP configuration and should run only with the user's awareness, consistent with the when-to-use opt-in gate. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/gitnexus-tools/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Reference Integrity | Problem: The new block references gitnexus-usage/assets/gn-examples.md, but the asset actually lives at gitnexus-tools/assets/gn-examples.md. No gitnexus-usage skill folder exists in r2. Reason: When the agent runs ACQUIRE on the wrong path the examples will not load, so the worked examples this skill points to are unreachable. Solution: Change the reference in the block to gitnexus-tools/assets/gn-examples.md to match the real asset path inside this skill. |
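
A path-existence check is one way to catch this class of error before review. The following is a sketch under stated assumptions: `check_refs` is a hypothetical helper, and the caller supplies the root directory and the relative paths to verify.

```shell
# check_refs ROOT PATH...
# Hypothetical helper: print every PATH that does not exist under ROOT
# and return 1 if at least one is missing, 0 otherwise.
check_refs() {
  root="$1"
  shift
  missing=0
  for ref in "$@"; do
    if [ ! -e "$root/$ref" ]; then
      echo "missing reference: $ref"
      missing=1
    fi
  done
  return "$missing"
}
```

Going by the finding above, `check_refs instructions/r2/core/skills gitnexus-usage/assets/gn-examples.md` would be expected to report the path as missing, while the `gitnexus-tools/assets/gn-examples.md` form would pass.
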
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 4 | ✅ Much better |
| Success Criteria | 4 | ✅ Much better |
| Conflict Resolution | 4 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ✅ Much better |
| Workflow Completeness | 4 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 2 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 4 | ✅ Much better |
| Epistemic Honesty | 4 | ✅ Much better |
| Self-Validation | 4 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 4 | ✅ Much better |
📄 instructions/r2/core/skills/gitnexus-tools/assets/gn-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 4 | ✅ Much better |
| Success Criteria | 4 | ✅ Much better |
| Conflict Resolution | 4 | ✅ Much better |
| Decision Branching | 4 | ✅ Much better |
| Instruction Ordering | 4 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 4 | ✅ Much better |
| Epistemic Honesty | 4 | ✅ Much better |
| Self-Validation | 4 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ✅ Much better |
📄 instructions/r2/core/skills/init-workspace-documentation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/init-workspace-rules/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Precision & Explicitness | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/init-workspace-verification/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: The PR deletes the DEPRECATED ARTIFACTS block that told the agent to notify the user (without auto-delete) about r1 leftovers agents/init-rosetta-shells-flow-state.md and local init-rosetta-shells-flow.md. The verification phase no longer surfaces these stale artifacts during an upgrade. Reason: Removing the only notification step means upgrades that still carry these r1 files leave stale artifacts behind silently, a small loss of an upgrade safety check. Solution: If these r1 artifacts are no longer reachable in supported upgrade paths, the deletion is fine; otherwise restore a short deprecated-artifacts notification step so upgrades from r1 still flag the leftover state/flow files for the user. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Workflow Completeness | 3 | ⬇️ Slightly worse |
| Bloat Control | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/load-context-instructions/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 4 | ✅ Much better |
| Success Criteria | 4 | ✅ Much better |
| Conflict Resolution | 4 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 4 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ✅ Much better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 4 | ✅ Much better |
| Epistemic Honesty | 4 | ✅ Much better |
| Self-Validation | 4 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/load-context/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The added bash example uses grep -n "^#{1,3}" .... Plain grep is BRE and treats {1,3} literally, so this command matches nothing for markdown headers. The mitigation in step 3 (use built-in tools if available) does not fix the literal example the agent is shown. Reason: An agent that copies the shown command verbatim gets empty output and silently loses the IMPLEMENTATION/MEMORY/PATTERNS/REQUIREMENTS header context this step is meant to gather. Solution: Use grep -nE "^#{1,3} " (extended regex, with a trailing space) or grep -nE "^#+ " so the header-extraction example actually returns the intended ToC lines. |
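
The difference between the two patterns is easy to reproduce. The sketch below builds a throwaway sample file (its contents are made up for illustration) and runs both the original pattern and the suggested one against it.

```shell
# Sample file: three header levels, one deeper header, body text, and a
# line that starts with "#" but is not a header.
sample="$(mktemp)"
printf '%s\n' '# Title' 'body text' '## Section' '### Subsection' \
  '#### Too deep' '#hashtag' > "$sample"

# Plain grep uses BRE, where the braces are literal characters. This only
# matches a line that starts with the text "#{1,3}", and the sample has none.
bre_hits="$(grep -c '^#{1,3}' "$sample" || true)"

# grep -E uses ERE, where {1,3} is an interval. The trailing space keeps
# "#### Too deep" and "#hashtag" out of the result.
ere_hits="$(grep -nE '^#{1,3} ' "$sample")"

echo "BRE match count: $bre_hits"
echo "$ere_hits"
rm -f "$sample"
```

The first command reports a count of zero, while the second lists the three header lines with their line numbers.
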
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 2 | ⬇️ Slightly worse |
| Failure Handling | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r2/core/skills/load-workflow/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Failure Handling | Problem: No handling for the case where no workflow matches the request or the ACQUIRE fails. Reason: An unmatched request currently has no defined path, risking a stall. Solution: Add a failure branch: if no workflow matches, fall back to the ad-hoc/lightweight workflow or ask the user. |
| ⚪ Low | Decision Branching | Problem: Step 2 (resume) and step 3 (auto vs No HITL) name branch conditions but give no explicit else/handling when the state file is missing or the mode is ambiguous. Reason: Variable resume/mode scenarios without an else can leave the agent stalled or silently picking a mode. Solution: Add the else branch: if no state file exists on a resume request, state that and start fresh; define default when mode is unclear. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-confluence-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The always-loaded SKILL.md packs the full success_criteria, an 8-step process, output template, 5 safety rules, 8 failure cases, 9 validation items, and 8 pitfalls into one entry file (~14K chars), exceeding the ~5-step reliable-handling guidance the spec cites. Reason: A single oversized entry file increases the odds the agent drops steps; progressive disclosure of the checklist would cut active load without losing the gate. Solution: Move the validation_checklist (largely a mirror of success_criteria + failure_handling) to a reference loaded only at pre-emit time, leaving the entry file focused on process + safety + failure. |
| 🟡 High | Bloat Control | Problem: New 13.9K-char SKILL.md heavily restates the same rules across <success_criteria>, the process steps, <safety_boundaries>, <failure_handling>, <validation_checklist>, and the pitfalls (e.g. 'permission errors are not empty content', truncation-at-5000-words, redact-before-writing each appear 3-4 times). Reason: The 10K-20K size band is a high-severity reliability concern; repeated prose inflates the always-loaded context and raises the chance the agent skips list items. Solution: Keep each rule in its primary section (failure_handling for error paths, safety_boundaries for redaction) and have validation_checklist/pitfalls reference it rather than restate the full prose. |
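
The size band cited in these findings can be checked locally before a file is submitted. This is only a sketch: the report does not say how the validator measures size, so counting characters with `wc -m` is an assumption, and `size_band` is a hypothetical helper. The 10000-character threshold mirrors the lower edge of the 10K-20K band named above.

```shell
# size_band FILE
# Hypothetical helper: print "high:<chars>" when FILE has 10000 or more
# characters (assumed lower edge of the 10K-20K band), else "ok:<chars>".
size_band() {
  # tr strips the padding some wc implementations add around the count.
  chars="$(wc -m < "$1" | tr -d ' ')"
  if [ "$chars" -ge 10000 ]; then
    echo "high:$chars"
  else
    echo "ok:$chars"
  fi
}
```

Running `size_band SKILL.md` after each extraction to `references/` shows whether the entry file has dropped below the band.
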
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The always-loaded entry file carries success_criteria + 5-step process + full output template + 5 safety rules + 7 failure cases + 8 validation items + 7 pitfalls (~12K chars), well beyond the ~5-step reliable-handling guidance. Reason: An oversized single entry file raises the chance the agent drops steps; the checklist is the most duplicative block and is the natural candidate for progressive disclosure. Solution: Move the validation_checklist (a near-duplicate of success_criteria + failure_handling) into a reference loaded only at the pre-emit step, trimming the entry file. |
| 🟡 High | Bloat Control | Problem: New 12.4K-char SKILL.md restates the same guarantees across <success_criteria>, step 3, <safety_boundaries>, <failure_handling>, <validation_checklist>, and the pitfalls (restricted-not-empty, redact-before-writing, comments-cap-at-10, no-fabrication each repeated 3-4 times). Reason: The 10K-20K size band is a high-severity reliability concern; duplicated prose inflates always-loaded context and increases skipped-item risk. Solution: Keep each rule in one home section and have validation_checklist/pitfalls reference it instead of duplicating the full prose. |
| 🔵 Medium | Workflow Completeness | Problem: The new step list is numbered 1-5 and places jira_search_fields in step 3, but the new sibling references/vendor-swap.md (line 13) calls it 'step 6 fallback'. There is no step 6, so the step numbering referenced across the skill family is inconsistent. Reason: Mismatched step references in the same skill family make maintainer edits error-prone and signal that the steps were renumbered without updating refs. Solution: Fix the vendor-swap.md cross-reference to say 'step 3 + pitfalls' to match the actual SKILL.md step numbering. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-jira-data-collection/references/redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-jira-data-collection/references/vendor-swap.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Line 13 refers to 'jira_search_fields (step 6 fallback + pitfalls)' but the sibling SKILL.md only has steps 1-5 and places jira_search_fields in step 3 — the cited step number does not resolve. Reason: A dangling step reference inside the same skill family misleads maintainers porting the skill and indicates the SKILL.md was renumbered without updating this guide. Solution: Change 'step 6 fallback + pitfalls' to 'step 3 + pitfalls' to match the actual SKILL.md numbering. |
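
A dangling step number like this one can be detected mechanically. The sketch below assumes the steps in SKILL.md are written as a markdown numbered list (`1. `, `2. `, ...) and that cross-references use the lowercase form `step N`. Both are assumptions about the file layout, and `max_step` and `dangling_steps` are hypothetical helper names.

```shell
# max_step FILE
# Highest number that starts a numbered list item ("N. ") in FILE.
max_step() {
  grep -oE '^[0-9]+\. ' "$1" | tr -d '. ' | sort -n | tail -n 1
}

# dangling_steps SKILL GUIDE
# Print each "step N" mentioned in GUIDE whose N is larger than the
# highest step number found in SKILL.
dangling_steps() {
  max="$(max_step "$1")"
  grep -oE 'step [0-9]+' "$2" | sort -u | while read -r word n; do
    if [ "$n" -gt "${max:-0}" ]; then
      echo "$word $n"
    fi
  done
}
```

Called as `dangling_steps SKILL.md references/vendor-swap.md`, it prints nothing when every cited step resolves.
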
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/mcp-testrail-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
📄 instructions/r2/core/skills/mcp-testrail-data-collection/references/redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r2/core/skills/mcp-testrail-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/operation-manager/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
📄 instructions/r2/core/skills/operation-manager/assets/om-schema.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/orchestrator-contract/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/qa-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The newly added SKILL.md is ~14.2K chars in one file. Although step 4/5/7 enumerations were correctly extracted to references/, the SKILL.md body still carries the full <safety_boundaries>, <success_criteria>, <failure_handling>, and <validation_checklist> blocks, which overlap heavily (the secret-scan rule is restated across pitfalls, safety, step 6.1, success criteria, and validation checklist). Reason: A 14K single-load skill body with the same rule repeated in five sections raises cognitive load and dilutes the one-term-per-concept principle, making it harder for the agent to reliably act on the canonical version. Solution: Keep the single authoritative statement of the secret-scan and anti-assumption rules in <safety_boundaries> and reference them by tag from the other blocks instead of restating the full procedure, reducing the in-context body below the ~10K reliable-load band. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-data-collection/references/existing-test-patterns.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-data-collection/references/output-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-gap-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The new SKILL.md is ~15K chars and is fully inline (no references/ extraction, unlike its sibling qa-data-collection). The <validation_checklist> 'Question count <= 20 per batch' note (line 270) and the Executive Summary 'Questions Asked' note (line 182) spend a large prose budget reconciling two different question-count denominators (Critical+Important vs Critical+Important+Optional), restating the same distinction three times. Reason: A 15K single-load body that triple-explains the same count-denominator nuance increases cognitive load and risks the agent applying the wrong denominator; consolidating to one canonical definition improves reliability. Solution: State the two count definitions once in <output_format> (Executive Summary) and reference that single definition from <validation_checklist> and <success_criteria> rather than re-explaining the deliberate denominator difference in each block; consider extracting the gap/contradiction/ambiguity entry templates to a references file as qa-data-collection does. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-project-config/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The new SKILL.md is ~14.9K chars fully inline. The 'Redaction at intake' rule is the canonical source in <safety_boundaries> but is then restated by pointer in <failure_handling> (line 215), (line 227), and <validation_checklist> (line 243); combined with the two embedded markdown templates (state-file stub + project-config template) the single-load body sits well above the ~10K reliable band. Reason: A ~15K skill body that re-points to the same redaction rule from four blocks raises cognitive load; consolidating to one canonical location keeps the agent reliably acting on a single version of the safety contract. Solution: Keep the project-config markdown template only in step 5 and the redaction rule only in <safety_boundaries>, and let the other blocks reference by tag without re-describing; this trims the single-load body toward the reliable range without losing any rule. |
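
The size check behind this finding can be approximated with a few lines of Python. This is a minimal sketch, not the validator's actual logic: `over_budget` and `CHAR_BUDGET` are hypothetical names, the 10,000-character limit is an assumption based on the "~10K reliable band" wording above, and the third sample size is invented for illustration.

```python
# Hypothetical limit, assumed from the "~10K reliable band" mentioned in the finding.
CHAR_BUDGET = 10_000


def over_budget(docs, limit=CHAR_BUDGET):
    """Return (path, char_count) for each document longer than `limit`, largest first."""
    sized = [(path, len(text)) for path, text in docs.items()]
    too_big = [item for item in sized if item[1] > limit]
    return sorted(too_big, key=lambda item: item[1], reverse=True)


# Stand-in bodies: the first two sizes mirror the approximate figures quoted in
# the report, the third is an invented example of a file inside the budget.
sample = {
    "qa-project-config/SKILL.md": "x" * 14_900,
    "qa-gap-analysis/SKILL.md": "x" * 15_000,
    "qa-test-debugging/SKILL.md": "x" * 8_000,
}

flagged = over_budget(sample)
for path, size in flagged:
    print(f"{path}: {size} chars exceeds {CHAR_BUDGET}")
```

In a real run the dictionary would be filled from the files on disk; counting characters after loading keeps the measure aligned with the "single-load body" the finding refers to.
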
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-debugging/references/failure-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ✅ Much better |
| Instruction Ordering | 4 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ✅ Much better |
| Epistemic Honesty | 4 | ✅ Much better |
| Self-Validation | 4 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-implementation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 4 | ✅ Much better |
| Success Criteria | 4 | ✅ Much better |
| Conflict Resolution | 4 | ✅ Much better |
| Decision Branching | 4 | ✅ Much better |
| Instruction Ordering | 4 | ✅ Much better |
| Workflow Completeness | 4 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 4 | ✅ Much better |
| Epistemic Honesty | 4 | ✅ Much better |
| Self-Validation | 4 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/requirements-synthesis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/sequential-workflow-execution/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/swagger-contracts-analysis/references/redaction-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testrail-test-case-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/testrail-test-case-export/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testrail-test-case-export/references/vendor-porting.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/adhoc-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Line 30 newly adds the OPERATION_MANAGER recovery path: 'MUST FALLBACK to built-in todo task tools ACQUIRE todo-tasks-fallback.md FROM KB'. That file exists only in r3 (instructions/r3/core/rules/todo-tasks-fallback.md); there is no todo-tasks-fallback.md anywhere in r2. Per release isolation (one agent works with one release, no cross-refs), an r2 agent that reaches this path ACQUIREs a document that resolves to zero results. Reason: Simulated r2 agent whose rosettify MCP and npx CLI both fail runs ACQUIRE on a non-existent doc, gets nothing back, and is left with no execution-tracking mechanism on its only recovery route. r3 agents are unaffected. Solution: Add todo-tasks-fallback.md to the r2 KB (mirroring r3), or change the r2 line to describe the built-in todo-task fallback inline instead of pointing at a missing file. |
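
A release-isolation check of this kind could be automated. The sketch below is an illustration under stated assumptions, not part of the Rosetta tooling: `unresolved_acquires` is a hypothetical helper, the regular expression assumes the `ACQUIRE <file>.md FROM KB` wording quoted above, documents are matched by file name only, and the sample file contents are abbreviated stand-ins.

```python
import re
from pathlib import PurePosixPath

# Assumed reference syntax, taken from the line quoted in the finding.
ACQUIRE_RE = re.compile(r"ACQUIRE\s+(\S+\.md)\s+FROM\s+KB")


def unresolved_acquires(files, release):
    """Return (source_path, missing_doc) pairs for ACQUIRE targets absent from `release`.

    `files` maps repository paths to file text. Only paths under
    instructions/<release>/ are considered, which models release isolation.
    """
    prefix = f"instructions/{release}/"
    in_release = {p: t for p, t in files.items() if p.startswith(prefix)}
    known_names = {PurePosixPath(p).name for p in in_release}
    missing = []
    for path, text in sorted(in_release.items()):
        for doc_name in ACQUIRE_RE.findall(text):
            if doc_name not in known_names:
                missing.append((path, doc_name))
    return missing


# Illustrative repository snapshot (contents shortened, r3 workflow path assumed).
fallback_line = "MUST FALLBACK to built-in todo task tools ACQUIRE todo-tasks-fallback.md FROM KB"
files = {
    "instructions/r2/core/workflows/adhoc-flow.md": fallback_line,
    "instructions/r3/core/workflows/adhoc-flow.md": fallback_line,
    "instructions/r3/core/rules/todo-tasks-fallback.md": "fallback rules",
}

r2_missing = unresolved_acquires(files, "r2")
r3_missing = unresolved_acquires(files, "r3")
print(r2_missing)
```

With this snapshot the r2 release reports the dangling reference while r3 resolves it, which matches the asymmetry described in the finding.
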
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Safety Boundaries | Problem: The PR adds a new <orchestration_and_escalation> 'Verification-failure unilateral-start override' (lines 116-125, not present in origin/main) that authorizes the agent to start the earliest incomplete phase in the SAME turn without asking the user ('do NOT call AskUserQuestion'). This is a new sanctioned no-ask deviation from session-wide HITL defaults. The auditors flagged the identical override on the sibling testgen-flow.md as Safety Boundaries sev 3 but marked this r2 aqa-flow.md clean, which is inconsistent. Reason: Any new sanctioned no-ask path is a HITL relaxation that deserves reviewer attention; the override is well scope-locked (3 preconditions, 'Ambiguity defaults to ASK', explicit carve-outs) so severity is moderate, not critical. Solution: Apply the same Safety Boundaries flag as testgen-flow.md: keep the carve-outs but make the override auditable (log the override decision into agents/aqa-state.md, not just print one line) and re-confirm it is clearly subordinate to bootstrap-hitl-questioning policy. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 3 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/coding-agents-prompting-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/coding-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Decision Branching | Problem: Phase renumber from 11 to 13 phases broke the SMALL-mode HITL combine target. New <user_review_impl phase="10"> line 126 says 'SMALL: combined with Phase 12 checkpoint', but in NEW Phase 12 is <review_tests>, a non-HITL subagent reviewer phase. In BASE the same impl-approval gate (then phase 8) combined with Phase 4, which was a HITL user-review gate. A user-approval gate now points at a subagent review phase that never presents to the user, and there is no later HITL phase to fold into. Reason: Simulated SMALL-mode agent at phase 10 reads 'combine with Phase 12', treats the impl user-approval as already covered by the test-review phase, and proceeds to write/run tests without the explicit 'Do NOT proceed to tests until explicit approval' user gate firing. The impl HITL approval can be silently skipped in SMALL runs. Solution: Repoint the SMALL combine target of phase 10 to an actual type="HITL" gate. Since no HITL phase follows phase 10, either keep phase 10 as a standalone HITL checkpoint in SMALL mode (remove the combine note) or pair it with phase 6 like the design gate does. The combine target MUST be a HITL phase. |
| 🟡 High | Reference Integrity | Problem: In the new <user_review_impl phase="10"> the cross-reference was renumbered to 'SMALL: combined with Phase 12 checkpoint' (line 126), but Phase 12 is now <review_tests>, a non-HITL subagent reviewer phase. The BASE pointed to 'Phase 4', which was a HITL user-review checkpoint. A HITL user approval gate cannot logically be merged into a subagent code-review phase. Reason: The phase renumber (old 8-phase -> new 13-phase) updated the number but not the semantics; pointing a user-approval combine at a subagent review breaks the SMALL-mode checkpoint collapsing logic. Solution: Repoint the SMALL combination target to a HITL checkpoint phase (e.g. the final HITL user review, or remove the combine note if no later HITL gate exists). The combine target must be a type="HITL" phase. |
| 🔵 Medium | Workflow Completeness | Problem: The new <user_review_design phase="3"> block lists all four of its sub-steps with the marker '1.' (lines 55-58) instead of 1,2,3,4. Step ordering/dependency within this newly added HITL phase is not numbered. Reason: These lines are newly added in this diff; repeated '1.' loses the intended ordering of the approval-gate sub-steps. Solution: Renumber the four sub-steps 1-4 so the present-solution -> present-specs -> do-not-assume-approval -> SMALL-combine sequence is explicit. |
| ⚪ Low | Structural Coherence | Problem: Same <user_review_design> block (lines 55-58) breaks atomic-step numbering by repeating '1.' four times, reducing scannability of the new HITL gate. Reason: Introduced by this diff; non-sequential numbering in a HITL gate degrades structural clarity of the changed section. Solution: Apply sequential numbering 1-4 to the added sub-steps. |
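
The rule stated in the first two findings ("the combine target MUST be a HITL phase") can be checked mechanically. The sketch below is illustrative only: the `Phase` record and `invalid_combine_targets` are hypothetical names, and the two sample phases merely restate what the findings describe (phase 10 is a HITL gate that combines with phase 12, which is a non-HITL reviewer phase).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Phase:
    number: int
    name: str
    is_hitl: bool
    small_combine_target: Optional[int] = None  # phase number this gate folds into in SMALL mode


def invalid_combine_targets(phases: List[Phase]) -> List[Tuple[int, int]]:
    """Return (phase, target) pairs whose SMALL-mode target is missing or not a HITL phase."""
    by_number = {p.number: p for p in phases}
    problems = []
    for phase in phases:
        if phase.small_combine_target is None:
            continue
        target = by_number.get(phase.small_combine_target)
        if target is None or not target.is_hitl:
            problems.append((phase.number, phase.small_combine_target))
    return problems


phases = [
    Phase(10, "user_review_impl", is_hitl=True, small_combine_target=12),
    Phase(12, "review_tests", is_hitl=False),
]
print(invalid_combine_targets(phases))
```

A check of this shape would flag the renumbering defect regardless of which phase numbers are involved, because it validates the target's type rather than its number.
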
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 2 | ⬇️ Slightly worse |
| Workflow Completeness | 3 | ⬇️ Slightly worse |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 3 | ⬇️ Slightly worse |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/external-lib-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Workflow Completeness | Problem: The newly added 'Phase 0: Prerequsites' block numbers its two items '1.' then '3.' (skips 2), so the prerequisite list has a broken sequence. Reason: These lines are added in this diff; the 1 then 3 numbering is a defect in the added content that misrepresents step count. Solution: Renumber the two prerequisite items as 1 and 2. |
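
A numbering defect like the one above (items marked `1.` then `3.`) is easy to detect with a small lint. This is a minimal sketch, assuming each list item starts its own line with a `N.` marker; `numbering_gaps` is a hypothetical helper and the sample item text is invented for illustration.

```python
import re
from typing import List, Tuple

ITEM_MARKER = re.compile(r"\s*(\d+)\.\s")


def numbering_gaps(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (expected, found) for every ordered-list marker that breaks the 1, 2, 3... sequence."""
    problems = []
    expected = 1
    for line in lines:
        match = ITEM_MARKER.match(line)
        if match is None:
            continue  # not a numbered item, ignore
        found = int(match.group(1))
        if found != expected:
            problems.append((expected, found))
        expected += 1
    return problems


print(numbering_gaps(["1. first prerequisite", "3. second prerequisite"]))
```

The same helper also reports a list whose items all carry the marker `1.`, since every item after the first then differs from its expected position.
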
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 3 | ⬇️ Slightly worse |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/init-workspace-flow-context.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-discovery.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-questions.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/init-workspace-flow-rules.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow-shells.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/init-workspace-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: The new verification phase (now phase 9) deleted step 'Notify user: delete init-rosetta-shells-flow.md'. The old base had this cleanup step; the new version drops it, so a stale shell file may be left behind after init completes. Reason: Losing a cleanup instruction can leave stale config files that mislead later sessions. Solution: If the deletion is intentional because the file is no longer generated, leave as-is; otherwise restore a cleanup/notify step in the verification phase for any obsolete shell file. Confirm intent before final merge. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/modernization-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-api-spec-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: Line 28's meta-paragraph explains the file's own authoring rationale ("the literal outcome line is inlined parenthetically ... so the agent does not have to cross-jump", "The canonical table ... still lives in <output_contract>", "Config-key precedence lives in <workflow_context> and is referenced, not relisted"). This is non-operational provenance/design commentary rather than an instruction the agent acts on. Reason: Non-operational design rationale inflates the prompt and competes for attention with the actual branch logic, which the hardening reference flags as AI slop to strip. Solution: Remove the explanatory meta-paragraph at line 28; retain only the operational Early-exit rule. The cross-references are already self-evident from the section names. |
| ⚪ Low | Cognitive Budget | Problem: The new <execute_documentation_mcp> block packs a meta-paragraph, an early-exit rule, three nested sub-blocks (resolve/harvest_and_collect/verify), a verify_remediation block, and an output_contract table into one fragment with heavy cross-jumps (each branch references <output_contract> by name plus an inlined parenthetical copy of the same outcome line). One agent must hold many interacting branch rules at once for a single artifact write. Reason: Duplicated outcome strings in both the table and inline parentheticals double the surface area for one decision and raise drift risk if the table later changes. Solution: Keep the single source of truth (the <output_contract> table) but drop the inlined parenthetical duplicate outcome lines at each trigger site introduced in /<harvest_and_collect>; let the table be the only place outcome strings live, reducing the parallel rule-set the agent juggles. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Single Responsibility | Problem: <execute_gap_analysis> chains three analysis skills in sequence (lines 30-32): qa-gap-analysis, gap-and-contradiction-analysis, and aqa-requirements-elicitation. The first two have overlapping responsibility (gaps vs contradictions) and the file does not state how their outputs combine or whether one supersedes the other. Reason: Two similarly-scoped skills run back-to-back without a stated division of labor can produce duplicated or contradictory entries in the gaps/contradictions sections. Solution: Add a one-line note stating each skill's distinct contribution to analysis.md (e.g. gaps vs contradictions vs elicited requirements) so the agent does not double-count or produce conflicting sections. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/research-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/self-help-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The new <prerequisites phase="0", applies="ALL"> block is opened twice and never closed. The added closing line is <prerequisites phase="0", applies="ALL"> (a second opening tag) instead of `</prerequisites>`. The block bleeds into the following <list_capabilities phase="1" ...> section with no clear boundary. Reason: An unclosed/duplicated XML-style tag makes the section boundary ambiguous; an agent parsing the workflow can mis-attribute Phase-0 prerequisite rules to Phase 1 or lose the boundary entirely. The sibling research-flow.md edit closed the block correctly, so this is an isolated typo in this diff. Solution: Change the second added <prerequisites phase="0", applies="ALL"> line to a closing tag `</prerequisites>`, matching the pattern used correctly in research-flow.md. |
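
An opened-twice, never-closed block of this kind can be caught with a simple stack-based balance check. The sketch below assumes the workflow files use XML-style tags whose attributes contain no `<` or `>` characters; `unbalanced_tags` is a hypothetical helper, the sample body text is invented, and self-closing tags are not handled.

```python
import re
from typing import List

# Matches <name ...> and </name>; attributes may contain commas, as in the workflow files.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*>")


def unbalanced_tags(text: str) -> List[str]:
    """Report closing tags that do not match the innermost open tag, and tags left open."""
    stack: List[str] = []
    problems: List[str] = []
    for match in TAG.finditer(text):
        is_closing, name = match.group(1) == "/", match.group(2)
        if not is_closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}>")
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems


broken = (
    '<prerequisites phase="0", applies="ALL">\n'
    "1. some prerequisite\n"
    '<prerequisites phase="0", applies="ALL">\n'
    '<list_capabilities phase="1">\n'
    "</list_capabilities>\n"
)
print(unbalanced_tags(broken))
```

Running the same check on the text with the second opening tag replaced by a closing tag returns an empty list.
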
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
📄 instructions/r2/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: The rewritten <update_state step="0.6"> keeps the user-facing question "Ready to proceed to Phase 1 (Data Collection)?" but the base file's explicit step "3. Wait for confirmation" before loading Phase 1 was deleted and not replaced with a STOP/WAIT directive. Reason: Asking a question without an explicit wait instruction lets an agent ask and immediately proceed, weakening the phase-transition HITL gate that the deleted line provided. The parent testgen-flow may still gate the transition, but the per-phase explicitness was reduced by this diff. Solution: Add an explicit STOP-and-wait directive after the Phase-1 readiness question in step 0.6 (e.g. "STOP and wait for user confirmation before the parent flow advances to Phase 1"), matching the deterministic gate wording used elsewhere in the qa/testgen flows. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Output Contract | Problem: The base file specified concrete Jira retrieval (exact fields= list and comment_limit=10) and detailed Confluence capture instructions (CQL query template, parent/child traversal limits, per-page capture fields) inline. The diff deletes these and delegates to mcp-jira-data-collection, mcp-confluence-data-collection, and confluence-source-harvesting. The phase's own retrieval output contract is now thinner and depends entirely on those skills defining the field set and capture shape. Reason: Verified the three delegated skills exist in r2, so Reference Integrity holds; the residual risk is that the phase no longer states what raw-data.md must contain, so a skill change could silently degrade the artifact without the phase catching it. Low severity because the validation_checklist still requires key Jira fields. Solution: Keep the delegation (correct per progressive disclosure), but add a one-line minimum-output assertion in <create_raw_data> naming the required captured fields (summary, description, status, priority, labels, components, comments, Confluence page title/url/content) so the phase still asserts its raw-data.md contract independent of skill internals. |
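
The minimum-output assertion proposed above amounts to a required-field check that does not depend on skill internals. The sketch below takes the field names from the finding; the dictionary shape of a captured record, the `missing_fields` helper, and the sample values are assumptions, since the real structure of raw-data.md is not shown here.

```python
from typing import Iterable, List, Mapping

# Field names as listed in the finding.
REQUIRED_JIRA_FIELDS = (
    "summary", "description", "status", "priority", "labels", "components", "comments",
)
REQUIRED_CONFLUENCE_FIELDS = ("title", "url", "content")


def missing_fields(record: Mapping[str, object], required: Iterable[str]) -> List[str]:
    """Return required field names that are absent or None in a captured record."""
    return [name for name in required if record.get(name) is None]


# Hypothetical captured issue that lacks the priority field.
issue = {
    "summary": "Example issue",
    "description": "Example description",
    "status": "Open",
    "labels": [],
    "components": [],
    "comments": [],
}
print(missing_fields(issue, REQUIRED_JIRA_FIELDS))
```

An empty list such as `labels: []` counts as present in this sketch, because an issue can legitimately have no labels; only an absent or `None` field is reported.
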
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Output Contract | Problem: 328 lines were deleted: the full contradiction/gap/ambiguity taxonomy, per-entry C/G/A field templates, and the analysis.md document skeleton sections 1-6 are removed and delegated to the gap-and-contradiction-analysis skill and its references/entry-templates-and-document-skeleton.md. The phase now only owns the appended sections 7-8 and relies on the skill emitting sections 1-6 with exact numbering for the section-7 append to attach correctly. Reason: Verified the skill and its entry-templates-and-document-skeleton.md exist in r2 and define sections 1-6 ending at section 6, so the append is currently consistent and Reference Integrity holds. The residual risk is a cross-file coupling: the phase's section-7 append silently breaks if the skill's section numbering changes. Low severity because the coupling is currently correct and the diff explicitly documents the canonical home. Solution: Keep the delegation. Optionally add a one-line guard in <create_analysis_document> instructing the agent to verify the skill output ended at a ## 6. section before appending section 7, so a skill drift in section numbering is caught rather than producing a misaligned document. |
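
The optional guard suggested above (verify the skill output ended at a `## 6.` section before appending section 7) could be sketched as follows. The heading pattern `## N.` comes from the finding; the function names and the section title used in the example are hypothetical.

```python
import re
from typing import Optional


def last_section_number(markdown: str) -> Optional[int]:
    """Return the number of the last '## N.' heading in the document, or None if there is none."""
    numbers = [int(n) for n in re.findall(r"^## (\d+)\.", markdown, flags=re.MULTILINE)]
    return numbers[-1] if numbers else None


def append_section_7(markdown: str, section_block: str) -> str:
    """Append the section-7 block only if the existing document ends at section 6."""
    last = last_section_number(markdown)
    if last != 6:
        raise ValueError(f"expected skill output to end at section 6, found {last}")
    return markdown.rstrip("\n") + "\n\n" + section_block.rstrip("\n") + "\n"


skill_output = "\n\n".join(f"## {i}. Section {i}\n\nBody {i}" for i in range(1, 7))
result = append_section_7(skill_output, "## 7. Appended section\n\nBody 7")
print(last_section_number(result))
```

Raising instead of appending anyway turns a silent misalignment into a visible failure, which is the behaviour the finding asks for.
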
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Example Grounding | Problem: The new version deleted the inline worked examples (US-1 User Login, FR-1 Password Validation, NFR-1 API Response Time with concrete 200ms/95%/1000-user values). These showed the agent exactly what a filled-in entry looks like. The new file replaces per-entry shapes with a pointer to the requirements-synthesis skill's references/output-schemas.md and keeps only an abstract section table, so a concrete positive example no longer survives in this phase file. Reason: Without a concrete filled example the agent may emit vague or under-specified requirement entries, especially when the skill fails to load. Solution: Keep one short concrete example entry (e.g., one filled FR or NFR with real threshold values) inline, or confirm the cited requirements-synthesis output-schemas.md contains equivalent worked examples so grounding is preserved one hop away. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Cognitive Budget | Problem: The new step 5.2 (identify_test_types) and step 5.3 (generate_test_cases) pack a large amount of content into single steps: full test-type taxonomy plus CRUD/Auth/API coverage patterns in 5.2, and in 5.3 the inline TC schema, the dual-path format constraint, the forbidden-fields list, the good/bad title table, and the merge anti-pattern note. Step 5.3 in particular carries 6+ distinct sub-rules in one directive block, near the ~5-directive reliability ceiling for a single read. Reason: Dense single steps raise the chance the agent drops one sub-rule (e.g., a forbidden field check) when executing under load. Solution: Consider splitting the 5.3 inline template concerns (field schema vs format-prohibitions vs title-quality) into clearly separated sub-blocks or moving the redundant prose into the testrail-test-case-authoring skill, keeping only the self-contained fallback schema inline. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Safety Boundaries | Problem: The new <orchestration_and_escalation> block introduces a 'verification-failure unilateral-start override' that authorizes the agent to start a phase in the same turn WITHOUT asking the user, deviating from the per-phase USER CONFIRMATION rule and session-wide hitl defaults. Although it is scope-locked with three preconditions and explicit carve-outs for Phase 3/Phase 6 HITL and destructive actions, it is the only place in the family that sanctions a no-ask deviation from HITL, so it expands the no-ask surface relative to base (which had no such override). Reason: Any sanctioned no-ask path is a HITL relaxation; even tightly scoped, it must be auditable and clearly subordinate to bootstrap HITL policy to avoid an agent over-generalizing the carve-out. Solution: Keep the carve-outs but consider tightening the trigger to also require the agent to log the override decision into testgen-state.md (not just print one line), so the relaxation leaves an auditable trail; and re-confirm this matches bootstrap-hitl-questioning policy precedence. |
| ⚪ Low | Bloat Control | Problem: The <orchestration_and_escalation> and surrounding workflow_phases bullets are heavily prose-dense for a router that is supposed to stay thin (the file itself repeatedly says 'router stays thin'). The single override rule spans ~10 nested bullets with repeated restatements of the same carve-outs (Phase 3, Phase 6, safety) in both <workflow_phases> and <orchestration_and_escalation>. Reason: Repetition of the same carve-outs in two blocks adds reading cost to the top-level router and slightly raises the chance of inconsistent edits later. Solution: Compress the duplicated carve-out lists into one canonical list referenced from both places; reduce restated rationale to a single line. |
| ⚪ Low | Conflict Resolution | Problem: The override rule and the happy-path USER CONFIRMATION rule govern overlapping territory (phase transitions). The new text works hard to disambiguate them ('does NOT generalize', 'Ambiguity defaults to ASK'), but the resolution is spread across <workflow_phases> bullets and the <orchestration_and_escalation> block rather than a single priority hierarchy, leaving the reader to reconcile two competing transition rules. Reason: Two transition rules with cross-references increase the chance an agent applies the no-ask override outside its intended single gate. Solution: State the precedence once as an explicit ordered hierarchy (e.g., 'safety/HITL gates > per-phase USER CONFIRMATION > verification-failure override') in one location and reference it, instead of repeating the carve-out conditions in both blocks. |
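
The single ordered hierarchy proposed in the last finding ("safety/HITL gates > per-phase USER CONFIRMATION > verification-failure override", with ambiguity defaulting to ASK) can be written as one decision function. This is a sketch only: the three preconditions are not spelled out in this report, so they appear as opaque booleans, and `transition_decision` and `Decision` are hypothetical names.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    ASK = "ask"
    START_WITHOUT_ASKING = "start_without_asking"


def transition_decision(
    is_safety_or_hitl_gate: bool, preconditions: Sequence[bool], ambiguous: bool
) -> Decision:
    """Resolve a phase transition using one precedence order, checked top-down."""
    # 1. Safety and HITL gates always win: ask.
    if is_safety_or_hitl_gate:
        return Decision.ASK
    # 2. Ambiguity defaults to ASK.
    if ambiguous:
        return Decision.ASK
    # 3. The override applies only when every precondition holds.
    if preconditions and all(preconditions):
        return Decision.START_WITHOUT_ASKING
    # 4. Otherwise the per-phase USER CONFIRMATION rule applies: ask.
    return Decision.ASK


print(transition_decision(False, [True, True, True], ambiguous=False))
```

Keeping the order in one function means the carve-outs are stated once, so the two blocks named in the findings would only need to reference it.
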
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/api-test-spec-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-selector-management/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/automation-test-implementation-handoff/references/templates.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/confluence-source-harvesting/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The redaction guidance and the cross-domain/permission-error rules are restated several times across <safety_boundaries>, <failure_handling>, <validation_checklist>, and <pitfalls> (e.g. "Permission errors are not empty content" appears in step 4, safety_boundaries, the failure_handling per-page case, validation_checklist, and pitfalls). The same "redact BEFORE writing" instruction repeats in process step 8, safety_boundaries, validation_checklist, and pitfalls. Reason: Repetition inflates the always-loaded SKILL.md and raises read cost on every invocation without adding new behavior; the rule is already enforced by the validation checklist pointer. Solution: Keep the operational rule once in <safety_boundaries> and have <validation_checklist> and <pitfalls> reference it by name rather than restating the full rule, reducing the ~14KB file size. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The redaction-before-writing rule and the "permission-restricted is not empty content" rule are each stated 3-4 times across process step 3, safety_boundaries, failure_handling, validation_checklist, and pitfalls (e.g. the assignee/reporter/description restriction rule appears in step 3, safety_boundaries, failure_handling, and pitfalls). Reason: The duplication enlarges the always-loaded ~12KB SKILL.md and raises per-invocation read cost without changing behavior. Solution: State each operational rule once in <safety_boundaries> and have <validation_checklist> / <pitfalls> reference it by name instead of restating the full rule. |
| 🔵 Medium | Reference Integrity | Problem: Sibling reference references/vendor-swap.md (line 13) maps jira_search_fields to "step 6 fallback + pitfalls", but SKILL.md has no step 6 — the process ends at step 5, and jira_search_fields is actually invoked in step 3 (custom-fields branch). The step-number citation is stale. Reason: A maintainer forking the skill follows the cited step number to find the call site; pointing at a non-existent step 6 sends them to the wrong place and erodes trust in the rebind list. Solution: In references/vendor-swap.md line 13 change "step 6 fallback + pitfalls" to "step 3 custom-fields branch + pitfalls" to match the actual SKILL.md process numbering. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-jira-data-collection/references/redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Line 13 cites the jira_search_fields call site as "(step 6 fallback + pitfalls)", but the parent SKILL.md has no step 6 — the call lives in step 3 (custom-fields branch). The cross-reference into the parent skill is stale and will misdirect a maintainer. Reason: vendor-swap.md is the maintainer's authoritative rebind map; a wrong step number defeats its purpose and could cause the call site to be missed during a fork. Solution: Change "(step 6 fallback + pitfalls)" on line 13 to "(step 3 custom-fields branch + pitfalls)" so the rebind list points at the real process step. |
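
A stale step citation like the one above could be caught mechanically. The following is a minimal sketch, assuming SKILL.md numbers its process steps as lines starting with `N.` and that reference files cite them as "step N"; both conventions, and the function names `declared_steps` and `stale_step_citations`, are illustrative assumptions rather than part of the skill.

```python
import re


def declared_steps(skill_text):
    """Collect step numbers from lines that start with 'N.' (assumed numbering style)."""
    pattern = re.compile(r"^\s*(\d+)\.\s", flags=re.MULTILINE)
    return {int(match.group(1)) for match in pattern.finditer(skill_text)}


def stale_step_citations(reference_text, skill_text):
    """Return (line_number, cited_step) for each 'step N' citation with no declared step N."""
    steps = declared_steps(skill_text)
    stale = []
    for lineno, line in enumerate(reference_text.splitlines(), start=1):
        for match in re.finditer(r"\bstep\s+(\d+)\b", line, flags=re.IGNORECASE):
            cited = int(match.group(1))
            if cited not in steps:
                stale.append((lineno, cited))
    return stale
```

This only flags citations to steps that do not exist (such as "step 6" in a five-step process); a citation to an existing but wrong step would still need manual review.
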
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/references/redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/mcp-testrail-data-collection/references/vendor-swap.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-data-collection/references/output-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-gap-analysis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-project-config/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/references/failure-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-debugging/references/part-b-mechanics.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/qa-test-implementation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/repository-implementation-standards/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-synthesis/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/sequential-workflow-execution/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: The new SKILL.md is ~15.9K chars (10K-20K band). The 5-step process plus the inline <output_format> template, <validation_checklist>, <safety_boundaries>, <success_criteria>, <failure_handling>, and <pitfalls> all live in the always-loaded entry file, so a single load carries a large directive surface even though three reference files already use progressive disclosure. Reason: Large always-loaded entry files raise per-invocation cognitive load and token cost; the skill already established a references/ lazy-load pattern that the inline template does not yet exploit. Solution: Move the full per-endpoint markdown template body (lines 116-176) out to references/ (the canonical-example already lives there) and keep only the field-name list inline, trimming the entry file toward the <300-line / sub-10K target. |
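
The size target named in this finding can be checked with a few lines of code. This is a rough sketch, assuming "sub-10K" means fewer than 10,000 characters and "<300-line" means fewer than 300 lines; the function name `budget_report` and the exact thresholds are assumptions for illustration, not the validator's actual logic.

```python
def budget_report(text, max_chars=10_000, max_lines=300):
    """Measure an entry file against the assumed character and line targets."""
    chars = len(text)
    lines = len(text.splitlines())
    return {
        "chars": chars,
        "lines": lines,
        # Both limits are treated as strict upper bounds ("sub-10K", "<300-line").
        "within_budget": chars < max_chars and lines < max_lines,
    }
```

Running this on the entry file before and after moving the template body to references/ would show whether the trim reaches the target.
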
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testrail-test-case-export/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Self-Validation | Problem: The new file has no self-check step for the maintainer to confirm no TestRail-specific token survived the fork (tool names, 'section_id', 'C12345', 'custom_steps_separated'). Reason: A rebind that misses one TestRail-specific token leaves a broken or mis-targeted call in the forked skill; a quick self-check prevents it. Solution: Add a final 'grep for residual TestRail tokens (mcp_testrail_, section_id, custom_preconds, custom_steps_separated, C-prefix) and confirm none remain in the forked file' verification step. |
| 🔵 Medium | Failure Handling | Problem: The guide does not tell the maintainer what to do when a target vendor lacks a TestRail concept (e.g. no dedup list call, or no step/expected split) beyond the single note that some vendors store the test as one body — there is no general fallback rule for missing capabilities. Reason: Silently dropping a capability during a fork can remove a safety step (such as the dedup pre-scan) without the maintainer noticing. Solution: Add a brief fallback rule: when the target vendor lacks an equivalent for a step (dedup list, container auto-create, separated steps), document the gap explicitly in the forked SKILL.md and degrade safely (e.g. skip dedup pre-scan but keep the confirmation gate) rather than silently dropping the safety step. |
| ⚪ Low | Success Criteria | Problem: The new file lists items to rebind but gives no testable done-when criteria for a completed fork (e.g. 'every mcp_testrail_* call replaced', 'no TestRail-specific term remains', 'priority/type tables match vendor enum'). Reason: Without explicit done-when conditions the maintainer cannot confirm the fork is complete and may ship a skill with leftover TestRail bindings. Solution: Add a 'fork is complete when' list enumerating verifiable conditions: zero residual mcp_testrail_ references, container term replaced everywhere, ID-format check rebound, pitfalls rebound, user-prompt template re-branded. |
| ⚪ Low | Output Contract | Problem: The guide tells the maintainer to copy the file to '-test-case-export/SKILL.md' and 'edit only the items above' but does not define the expected end-state shape (which sections must change vs stay verbatim is described prose-style, with no concrete checklist or example of one rebound item). Reason: A porting guide with no concrete output exemplar leaves the rebind quality to interpretation, raising the chance of an incompletely-ported skill. Solution: Add a single concrete before/after example for one rebind item (e.g. the priority table for Xray) and a short closing checklist of the sections that must end up vendor-specific, so the output of a fork is verifiable. |
| ⚪ Low | Input Contract | Problem: The new reference is a maintainer-facing porting guide but states no input contract for the forking task it drives: it does not say which file the maintainer starts from (the sibling SKILL.md), what target-vendor facts must be gathered first (vendor MCP tool names, priority/type enums, container API capability), or what the maintainer must have on hand before editing. Reason: Without naming the required inputs, a maintainer can begin a fork missing the vendor facts the rebind steps depend on, producing a partially-rebound skill. Solution: Add a short 'before you start' list at the top naming the required inputs for a fork: source = sibling SKILL.md, plus the target vendor's MCP create/list/probe tool names, priority enum, type taxonomy, container auto-create capability, and case-ID shape. |
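The "fork is complete when" conditions suggested above could be checked mechanically. The following is a minimal sketch, assuming the forked SKILL.md is available as a string; the function names and the pattern list are hypothetical illustrations, not part of the reviewed skill.

```python
import re

# Hypothetical done-when checks for a forked skill; extend per target vendor.
RESIDUAL_PATTERNS = {
    # Any leftover TestRail MCP tool call, e.g. a name starting with mcp_testrail_
    "testrail_mcp_call": re.compile(r"mcp_testrail_\w*"),
    # Any leftover TestRail-specific term in prose
    "testrail_term": re.compile(r"\bTestRail\b", re.IGNORECASE),
}


def find_residual_bindings(skill_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, check_name, matched_text) for each leftover binding."""
    hits = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        for name, pattern in RESIDUAL_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, name, match.group(0)))
    return hits


def fork_is_complete(skill_text: str) -> bool:
    """A fork passes this check only when no residual binding is found."""
    return not find_residual_bindings(skill_text)
```

This covers only the textual conditions (residual references and terms); conditions such as "priority/type tables match vendor enum" would still need a vendor-specific check.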
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/user-approved-code-changes/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/adhoc-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: Step 6.1 (<execute_authoring>) packs sub-steps 1a-1d, 2, 3, 4, 5 plus a refusal clause and two embedded narrative paragraphs (Routing, authoring-decision-ownership) into one step. The single step carries the full handoff-contract reasoning, exceeding the ~5-actions-at-once reliable budget for one directive. Reason: A step with ~9 nested actions plus prose increases the chance the agent skips or merges sub-steps; smaller numbered steps execute more reliably. Solution: Split 6.1 into two steps: 6.1a 'load foundational + domain skills (1a-1d with zero-doc stop rule)' and 6.1b 'invoke handoff and verify (current 2-5)'. Move the Routing/ownership narrative into <workflow_context> so the step body is just numbered actions. |
| ⚪ Low | Bloat Control | Problem: The new <skill_handoff> block and step 6.1 sub-step 4 repeat the same handoff contract (foundational skills must be loaded by the caller, the handoff only verify-presence, acceptable vs unacceptable handoff doc) three times across <workflow_context>, <skill_handoff>, and <execute_authoring> step 4. The long prose sentences (e.g. step 4 is a single ~90-word sentence) restate the verify-presence/stale-KB contract already stated in <skill_handoff>. Reason: Repeating the same multi-clause contract in three places adds reading cost without adding behavior, and long single-sentence directives are harder to follow reliably than short numbered ones. Solution: Keep the contract once in <skill_handoff> and have step 6.1 sub-step 4 reference it in one short line (e.g. 'verify handoff doc matches <skill_handoff> acceptance criteria; on mismatch record warning and ask user'). Remove the duplicated acceptable/unacceptable restatement from the step body. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 3 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The added <orchestration_and_escalation> 'Verification-failure unilateral-start override' is a single bullet with six deeply nested sub-bullets (Deference, Precondition a/b/c, If-holds, If-uncertain, Scope, Rationale) defining one conditional rule. This is the densest block in a workflow-level orchestration file that should stay high-level and delegate detail to phase files. Reason: A six-level nested single rule in the top-level workflow forces the orchestrator to hold a lot of conditional state at once, raising the chance it mis-applies the no-ask override outside the intended gate. Solution: Compress the override to a 3-line rule: trigger (all three preconditions), action (print failing line + start earliest incomplete phase same turn, no AskUserQuestion), and default (any uncertainty -> normal HITL ask). Move the rationale and scope-lock wording into one trailing sentence. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The <execute_documentation_mcp> intro paragraph and the early-exit rule explain the cross-referencing strategy meta-narratively ('Branch triggers reference <output_contract> by name; the literal outcome line is inlined parenthetically at each trigger site so the agent does not have to cross-jump...'). This is authoring-rationale about how the doc is organized, not an executable directive, and the same outcome strings are then inlined at every branch AND listed in <output_contract>, duplicating each 'Outcome:' line twice. Reason: Explaining the document's own organization is non-operational filler, and duplicating every outcome string between branches and the table risks the two copies drifting out of sync on a later edit. Solution: Drop the meta-explanation paragraph and keep only the operational early-exit rule. Inline the outcome line at the branch OR list it in <output_contract>, not both; reference the table once from the branches. |
| ⚪ Low | Cognitive Budget | Problem: step 1.2b is realized through four interlocked sub-blocks (, <harvest_and_collect>, , <verify_remediation>) plus <output_contract>, with branch names (SKIPPED_NO_CONFIG, ACQUIRE_FAILED, EMPTY_HARVEST, COMPLETED) referenced across blocks and an early-exit jump rule. To execute one harvest the agent must hold five blocks and four branch identifiers in working memory simultaneously. Reason: Spreading one optional sub-phase across five mutually-referencing blocks increases the chance the agent loses track of which branch it took or skips the verify/remediation loop. Solution: Inline the verify_remediation cases into as numbered fallback steps and fold the early-exit outcome strings directly into each branch so the agent reads one linear block per branch without cross-jumping between five sections. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: Modified by the PR, marked clean for HITL. BASE lines 372-373 had 'Ask: Ready to proceed to Phase 2?' + '3. Wait for confirmation'; NEW step 1.4 (lines 154-178, Ask at line 177) deleted the explicit Wait line with no STOP/WAIT directive. Same per-phase HITL-explicitness reduction the auditors caught on the r2 project-config-loading twin but missed here. Reason: Consistent per-phase HITL weakening across the testgen family; low severity because the parent flow gates the transition, but it should be flagged uniformly. Solution: Add an explicit STOP-and-wait directive after the 'Ready to proceed to Phase 2?' question in step 1.4, matching the deterministic phase-transition gate wording. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 3 | ⬇️ Slightly worse |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: Modified by the PR, marked clean for HITL. BASE lines 361-362 had 'Ask: Ready to proceed to Phase 3?' + '4. Wait for confirmation'; NEW step 2.4 (lines 85-89, Ask at line 88) deleted the explicit Wait line with no STOP/WAIT directive. Same systematic per-phase HITL-explicitness deletion flagged on the r2 project-config twin but missed here. Reason: Part of the same family-wide pattern where every per-phase 'Wait for confirmation' line was removed; low severity given the parent-flow gate, but the loss is real and should be flagged consistently. Solution: Add an explicit STOP-and-wait directive after the 'Ready to proceed to Phase 3?' question in step 2.4, matching the other phase-transition gates. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 3 | ⬇️ Slightly worse |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Conflict Resolution | Problem: Modified by the PR but marked clean. BASE line 157 had '3. Wait for confirmation' after the 'Ready to proceed to Phase 1?' question; NEW step 0.6 (lines 95-99) keeps the Ask but deleted the explicit Wait line with no STOP/WAIT replacement. This is the IDENTICAL per-phase HITL-explicitness deletion the auditors flagged on the r2 twin (testgen-flow-project-config-loading.md, Conflict Resolution sev 2) but missed on r3. Reason: Asking without an explicit wait lets an agent ask and immediately proceed, weakening the per-phase HITL gate. The parent testgen-flow.md may still gate the transition, so severity is low, but the loss should be flagged consistently with its r2 twin. Solution: Add an explicit STOP-and-wait directive after the Phase-1 readiness question in step 0.6 (e.g. 'STOP and wait for user confirmation before advancing to Phase 1'), matching the r2 verdict and the deterministic gate wording used elsewhere. |
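The same missing-wait pattern is flagged on three testgen phase files, so it may be worth checking for it uniformly. Below is a rough heuristic sketch, assuming each phase-transition question uses the "Ready to proceed to Phase N?" wording quoted in the findings; the function name, the regexes, and the lookahead window are assumptions for illustration.

```python
import re

# Wording taken from the findings above; the exact regexes are assumptions.
ASK_PATTERN = re.compile(r"Ready to proceed to Phase \d+\?")
WAIT_PATTERN = re.compile(r"\b(STOP|WAIT)\b|wait for (user )?confirmation", re.IGNORECASE)


def ungated_phase_questions(workflow_text: str, lookahead: int = 3) -> list[int]:
    """Return 1-based line numbers of phase questions with no wait directive nearby.

    A question counts as gated when a STOP/WAIT directive appears on the same
    line or within `lookahead` lines after it.
    """
    lines = workflow_text.splitlines()
    missing = []
    for index, line in enumerate(lines):
        if not ASK_PATTERN.search(line):
            continue
        window = lines[index : index + 1 + lookahead]
        if not any(WAIT_PATTERN.search(candidate) for candidate in window):
            missing.append(index + 1)
    return missing
```

Because the check is purely textual, an unrelated "wait" in the lines that follow a question would mask a missing gate, so a hit list of zero is a hint rather than proof.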
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 3 | ⬇️ Slightly worse |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Safety Boundaries | Problem: New <orchestration_and_escalation> introduces a 'verification-failure unilateral-start override' that authorizes the agent to start a phase in the SAME TURN without asking the user (no AskUserQuestion). BASE had no such no-ask override. It expands the sanctioned no-ask surface relative to base. The same override was added to aqa-flow.md and r2 testgen-flow.md. Reason: Simulated agent triggers the override only when ALL three preconditions hold: user explicitly asserted phases complete in this turn, state file does NOT mark them complete, AND the named artifacts are absent on disk. In that exact case asking again would create a contradictory loop. The rule is scope-locked, defaults any uncertainty to ASK, and explicitly preserves Phase 3/6 HITL and all destructive confirmations. So this is a narrowly justified, defensible relaxation, not a broad HITL weakening. Residual concern is auditability and risk of an agent over-generalizing the carve-out, which the dense wording mitigates. Solution: Keep the carve-outs (they are strong). Optionally require the override decision to be logged into testgen-state.md (not just printed once) so the relaxation leaves an auditable trail, and re-confirm precedence under bootstrap-hitl-questioning. |
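The override rule described in this finding reduces to a small decision function. The sketch below is one possible reading of it, assuming each precondition can be observed as true, false, or uncertain (`None`); the names `Decision`, `decide_override`, and `format_audit_entry`, and the audit-line format, are hypothetical and not taken from testgen-flow.md.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    START_EARLIEST_INCOMPLETE_PHASE = "start"
    ASK_USER = "ask"


def decide_override(
    user_asserted_complete: Optional[bool],
    state_marks_complete: Optional[bool],
    artifacts_present: Optional[bool],
) -> Decision:
    """Apply the no-ask override only when all three preconditions clearly hold.

    None means the observation is uncertain; any uncertainty falls back to asking.
    """
    observations = (user_asserted_complete, state_marks_complete, artifacts_present)
    if any(value is None for value in observations):
        return Decision.ASK_USER
    if user_asserted_complete and not state_marks_complete and not artifacts_present:
        return Decision.START_EARLIEST_INCOMPLETE_PHASE
    return Decision.ASK_USER


def format_audit_entry(decision: Decision, failing_check: str) -> str:
    """Build a one-line audit record (hypothetical format) for the state file."""
    return f"override-decision: {decision.value}; failing-check: {failing_check}"
```

The sketch does not model the carve-outs the finding mentions (Phase 3/6 HITL and destructive confirmations); those would stay as unconditional asks outside this function.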
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 3 | ⬇️ Slightly worse |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📋 Full per-file findings (Problem / Reason / Solution + Gates Comparison) → Workflow run Summary (PR comments are capped at 65,536 chars; details live on the Actions run).
The QA, AQA, and TestGen workflows were transferred from CTO. "QA" is the new name of "API-QA".
10 skills were transferred with them. They passed the independence and reusability check. Each of the 4 least reusable skills now comes with instructions on how to reuse it: to reuse it once, clone and edit; only when it must be reused twice or more, create a more general skill.
I extracted 6 skills from QA, AQA, and TestGen. They also passed the independence and reusability check.
A manual real-life test of the three workflows, followed by bug fixing, was also performed.
plugin_generator regenerates all 6 plugin trees cleanly.
R3 changes were also propagated to R2 (please inform me if this is wrong).
99% of GitHub's suggestions are implemented. The remaining ones appear to be either out of scope for this PR or noise/fluctuation.
Some follow-up items are documented in docs/TODO.md.
Artifacts carried via rebase from v3 (not authored on this branch)
plan-manager occurrences to operation-manager (the skill was renamed in v3).
plugins/**/hooks.json files regenerated by plugin_generator.py to reflect the hooks runtime + templates merged into main via the v3 release line. No changes to the hooks runtime (hooks/) or templates (*.tmpl) on this branch.
gitnexus-cli, gitnexus-setup, gitnexus-tools plus gn-examples asset. No source changes on this branch; plugin-tree copies refreshed by plugin_generator.py post-rebase.
bootstrap-hitl-questioning.md was deleted in the v3 merge by another contributor. HITL enforcement is now in the hitl skill, referenced from bootstrap-guardrails.md. Discoverability verified. Six R2-phrasing gaps are tracked in docs/TODO.md for future review.
Note on r2 vs r3 audit discrepancies
The audit reports more findings in r2 than r3 (~105 r2 files audited vs ~52 r3 files). Two factors drive the delta:
Scope. The r3 audit covers the qa/aqa/testgen surface this PR touched. The r2 audit additionally covers ~50 pre-existing legacy r2 files (gitnexus-*, init-workspace-*, load-context, load-workflow, operation-manager, adhoc-flow, coding-flow, external-lib-flow, modernization-flow, research-flow, self-help-flow, etc.). These weren't touched by this PR or by main's recent commits; their findings are pre-existing and out of scope for this PR.
Release-aware evaluation. ~14 of 15 same-named files are byte-identical between r2 and r3 (verified by diff), yet several evaluate stricter in r2 (e.g. automation-test-implementation-handoff, confluence-source-harvesting, adhoc-flow, qa-flow). The driver appears to be release-side context (different bootstrap rules, different load-context behavior, different surrounding skills), not file content. The single file that does differ between releases is load-context/SKILL.md.