From 354af832d9e90e4e05e00712d8bdbc45abd92af3 Mon Sep 17 00:00:00 2001
From: Alan Jowett
Date: Wed, 3 Jun 2026 09:12:21 -0700
Subject: [PATCH 1/3] Add coverage gap audit workflow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 manifest.yaml                                |  21 ++
 protocols/reasoning/coverage-gap-analysis.md | 197 +++++++++++++++++++
 templates/audit-coverage-gaps.md             | 151 ++++++++++++++
 3 files changed, 369 insertions(+)
 create mode 100644 protocols/reasoning/coverage-gap-analysis.md
 create mode 100644 templates/audit-coverage-gaps.md

diff --git a/manifest.yaml b/manifest.yaml
index 5725b6d..43679cb 100644
--- a/manifest.yaml
+++ b/manifest.yaml
@@ -639,6 +639,14 @@ protocols:
       cross-model semantic matching, and classifies
       consensus levels to identify fragile prompt language.
 
+  - name: coverage-gap-analysis
+    path: protocols/reasoning/coverage-gap-analysis.md
+    description: >
+      Deterministic protocol for turning code coverage gaps into
+      specification drift candidates. Normalizes uncovered regions,
+      filters incidental code, and classifies missing validation
+      versus undocumented behavior.
+
 formats:
   - name: requirements-doc
     path: formats/requirements-doc.md
@@ -1137,6 +1145,19 @@ templates:
     format: investigation-report
     requires: [requirements-document, validation-plan]
 
+  - name: audit-coverage-gaps
+    path: templates/audit-coverage-gaps.md
+    description: >
+      Audit uncovered code regions against requirements, validation
+      artifacts, and tests. Uses coverage data as a deterministic
+      discovery signal for missing validation and undocumented
+      behavior.
+    persona: specification-analyst
+    protocols: [anti-hallucination, self-verification, operational-constraints, coverage-gap-analysis]
+    taxonomies: [specification-drift]
+    format: investigation-report
+    requires: [requirements-document, validation-plan]
+
   - name: audit-integration-compliance
     path: templates/audit-integration-compliance.md
     description: >
diff --git a/protocols/reasoning/coverage-gap-analysis.md b/protocols/reasoning/coverage-gap-analysis.md
new file mode 100644
index 0000000..c4a5c10
--- /dev/null
+++ b/protocols/reasoning/coverage-gap-analysis.md
@@ -0,0 +1,197 @@
+
+
+
+---
+name: coverage-gap-analysis
+type: reasoning
+description: >
+  Deterministic protocol for turning code coverage gaps into specification
+  drift candidates. Normalizes uncovered regions, filters incidental code,
+  traces remaining regions to requirements and validation artifacts, and
+  classifies missing validation versus undocumented behavior.
+applicable_to:
+  - audit-coverage-gaps
+---
+
+# Protocol: Coverage Gap Analysis
+
+Apply this protocol when a coverage report is available and the goal is to
+use uncovered code as a **discovery signal** for specification drift.
+
+Coverage gaps are **candidate generators, not findings by themselves**.
+Covered code is **out of scope** for this protocol and MUST NOT be treated
+as evidence that the behavior is specified or adequately validated.
+
+## Phase 1: Coverage Signal Inventory
+
+Build a reproducible inventory of coverage gaps before tracing them.
+
+1. **Identify the coverage artifact**:
+   - Record the coverage tool or report format if evident.
+   - Record what test scope produced it (unit, integration, mixed) if stated.
+   - Record any stated exclusions, filters, or generated-code suppressions.
+
+2. **Extract uncovered regions**:
+   - Capture every uncovered or partially covered region with file path,
+     line range, and coverage kind (`no hits`, `partial branch`, or
+     equivalent from the report).
+   - If the report only provides function-level or file-level data, keep
+     that granularity. Do NOT invent finer block boundaries.
+
+3. **Normalize regions into reviewable units**:
+   - Merge adjacent uncovered lines only when they are clearly part of the
+     same behavioral unit (same function, branch body, or error path).
+   - Preserve the original report evidence so the normalization can be
+     reproduced.
+
+4. **Create a candidate ledger**:
+   - Assign each normalized region a unique identifier (`CG-001`, `CG-002`, ...).
+   - For each entry record: file path, line range, enclosing symbol,
+     coverage kind, and the raw coverage evidence used to create it.
+
+## Phase 2: Disambiguation Before Drift Classification
+
+Do NOT classify uncovered regions until you determine whether they are
+behaviorally significant.
+
+1. **Exclude clearly non-significant code**:
+   - Logging, metrics, debug strings, tracing hooks, boilerplate
+     serialization, generated code, trivial accessors, and test-only
+     scaffolding are excluded unless the specification explicitly
+     constrains them.
+   - Record excluded regions in the coverage summary with rationale.
+     Do NOT turn them into findings.
+
+2. **Check for inactive or intentionally unreachable paths**:
+   - Feature-flagged code, platform-gated branches, deprecated paths,
+     fault-injection hooks, and known-dead fallback branches may explain
+     missing coverage without implying drift.
+   - If the inactive status is evidenced, exclude the region with the
+     supporting rationale.
+   - If the status is plausible but not evidenced, mark the region as
+     **INCONCLUSIVE** and state what additional context is needed.
+
+3. **Determine behavioral significance**:
+   A region is significant when it affects one or more of:
+   - user-visible behavior
+   - data mutation or persistence
+   - access control or trust boundaries
+   - external communication or side effects
+   - state transitions
+   - error contracts, retry logic, or timeout behavior
+   - resource lifecycle or requirement-bound constraints
+
+4. **Only advance significant, in-scope regions**:
+   - Regions that are excluded or inconclusive stop here.
+   - Regions that are significant proceed to specification tracing.
+
+## Phase 3: Specification Trace for Significant Regions
+
+For each significant uncovered region, determine whether it traces to
+documented intent.
+
+1. **Search requirements and design artifacts**:
+   - Look for explicit REQ-ID references, acceptance criteria,
+     domain terminology, and design mechanisms that match the region's behavior.
+   - If no design document is provided, skip design checks and trace
+     directly from requirements to code.
+
+2. **Record positive traceability**:
+   - When a region maps to one or more REQ-IDs, record the governing
+     requirement(s), acceptance criteria, and any relevant design sections.
+
+3. **Handle absent traceability carefully**:
+   - If the region implements genuine product behavior and no requirement
+     or design trace can be found, classify it as a candidate
+     **D9_UNDOCUMENTED_BEHAVIOR**.
+   - If the region appears to be reasonable infrastructure that supports
+     other requirements indirectly, record it as excluded rather than D9.
+
+4. **Handle ambiguous traceability**:
+   - If multiple REQ-IDs are plausible, carry all plausible mappings
+     forward and mark the finding confidence accordingly.
+   - Do NOT invent a new requirement to resolve the ambiguity.
+
+## Phase 4: Validation Trace for Requirement-Linked Regions
+
+For each significant uncovered region that traces to a requirement,
+determine whether the uncovered status reflects missing validation,
+missing tests, or weak assertions.
+
+1. **Check the validation plan**:
+   - Determine whether the linked REQ-ID has one or more TC-NNN entries
+     in the validation plan or traceability matrix.
+   - If no validation entry exists, classify the gap as
+     **D2_UNTESTED_REQUIREMENT** unless the plan explicitly marks the
+     requirement as manual-only or deferred.
+
+2. **Check test implementation**:
+   - If a TC-NNN exists, search the provided test code for the
+     implementing test.
+   - If no implementing test is found, classify the gap as
+     **D11_UNIMPLEMENTED_TEST_CASE**.
+
+3. **Check assertion sufficiency**:
+   - If tests exist, determine whether the uncovered region corresponds
+     to unexercised acceptance criteria, negative paths, boundary cases,
+     ordering constraints, or semantic assertions that the test does not verify.
+   - Missing required criterion exercise is
+     **D12_UNTESTED_ACCEPTANCE_CRITERION**.
+   - Incorrect or overly coarse assertions that leave the behavior
+     effectively unverified are **D13_ASSERTION_MISMATCH**.
+
+4. **Respect documented manual-only validation**:
+   - If the validation plan explicitly documents that the behavior is
+     validated manually or deferred outside the automated suite,
+     record that rationale and exclude the region from D11-D13 findings.
+
+5. **Handle insufficient evidence**:
+   - If the available test context is insufficient to distinguish D12
+     from D13, mark the region **INCONCLUSIVE** and state the missing
+     evidence instead of guessing.
+
+## Phase 5: Classification and Escalation
+
+Turn only the confirmed regions into findings.
+
+1. **Assign exactly one classification** to each confirmed region:
+   - `D2_UNTESTED_REQUIREMENT`
+   - `D9_UNDOCUMENTED_BEHAVIOR`
+   - `D11_UNIMPLEMENTED_TEST_CASE`
+   - `D12_UNTESTED_ACCEPTANCE_CRITERION`
+   - `D13_ASSERTION_MISMATCH`
+
+2. **For each finding provide**:
+   - the coverage region location
+   - the specification location(s), or `None — no matching requirement identified` for D9
+   - the validation and test location(s), or explicit absence
+   - the disambiguation rationale
+   - the impact of leaving the region uncovered
+   - a concrete recommended next action
+
+3. **Recommended escalation paths**:
+   - D9 findings that appear to describe real behavior with no governing
+     requirement are good candidates for
+     `requirements-from-implementation` or `spec-extraction-workflow`.
+   - D2, D11, D12, and D13 clusters that suggest broader validation drift
+     are good candidates for `audit-traceability` or
+     `audit-test-compliance`.
+
+4. **Do NOT promote excluded or inconclusive regions into findings**.
+
+## Phase 6: Coverage Summary
+
+After individual findings, produce aggregate metrics:
+
+1. **Coverage candidate count**: total normalized regions, excluded regions,
+   inconclusive regions, and classified findings.
+2. **Traceability split**: requirement-linked vs unlinked significant regions.
+3. **Finding distribution**: count by D2, D9, D11, D12, D13.
+4. **Exclusion reasons**: grouped counts for generated code,
+   infrastructure-only code, manual-only validation, inactive paths, and
+   other documented exclusions.
+5. **Overall assessment**: a short judgment of whether the dominant issue
+   appears to be missing validation, undocumented behavior, or mixed drift.
+6. **Scope limitation**: explicitly state that this protocol examined
+   uncovered regions only and did not clear covered code for
+   specification or validation compliance.
diff --git a/templates/audit-coverage-gaps.md b/templates/audit-coverage-gaps.md
new file mode 100644
index 0000000..dbfb2cf
--- /dev/null
+++ b/templates/audit-coverage-gaps.md
@@ -0,0 +1,151 @@
+
+
+
+---
+name: audit-coverage-gaps
+description: >
+  Audit uncovered code regions against requirements, validation artifacts,
+  and tests. Uses coverage data as a deterministic discovery signal for
+  missing validation and undocumented behavior.
+persona: specification-analyst
+protocols:
+  - guardrails/anti-hallucination
+  - guardrails/self-verification
+  - guardrails/operational-constraints
+  - reasoning/coverage-gap-analysis
+taxonomies:
+  - specification-drift
+format: investigation-report
+params:
+  project_name: "Name of the project or feature being audited"
+  coverage_report: "Coverage artifact content or report excerpt showing uncovered or partially covered regions"
+  requirements_doc: "The requirements document content"
+  validation_plan: "The validation plan content"
+  design_doc: "The design document content (optional — omit for a requirements-only audit)"
+  code_context: "Source code to audit — files, modules, or repository path"
+  test_code: "Test source code to inspect for validation coverage"
+  coverage_scope: "Optional narrowing for the coverage signal — e.g., '0-hit regions only', 'include partial branches', 'coverage below 80%'"
+  focus_areas: "Optional narrowing — e.g., 'authentication module', 'retry paths' (default: audit all significant uncovered regions)"
+  audience: "Who will read the audit report — e.g., 'spec owners', 'engineering leads'"
+input_contract:
+  type: validation-plan
+  description: >
+    A validation plan and requirements document, plus a coverage artifact,
+    source code, and test code used to triage uncovered regions against
+    specification intent.
+output_contract:
+  type: investigation-report
+  description: >
+    An investigation report classifying coverage-driven drift findings
+    using the specification-drift taxonomy (D2, D9, D11, D12, D13),
+    with evidence, exclusions, and escalation guidance.
+---
+
+# Task: Audit Coverage Gaps
+
+You are tasked with auditing **uncovered code regions** against the
+requirements, validation plan, and test suite to determine whether low
+coverage signals missing validation or undocumented behavior.
+
+## Inputs
+
+**Project Name**: {{project_name}}
+
+**Coverage Report**:
+{{coverage_report}}
+
+**Requirements Document**:
+{{requirements_doc}}
+
+**Validation Plan**:
+{{validation_plan}}
+
+**Design Document** (if provided):
+{{design_doc}}
+
+**Source Code**:
+{{code_context}}
+
+**Test Code**:
+{{test_code}}
+
+**Coverage Scope**: {{coverage_scope}}
+
+**Focus Areas**: {{focus_areas}}
+
+**Audience**: {{audience}}
+
+## Instructions
+
+1. **Apply the coverage-gap-analysis protocol.** Execute all phases in
+   order. Treat the coverage report as a deterministic source of
+   **candidates**, not as direct proof of drift.
+
+2. **Classify only confirmed findings** using the specification-drift
+   taxonomy. Every reported finding MUST have exactly one of:
+   - `D2_UNTESTED_REQUIREMENT`
+   - `D9_UNDOCUMENTED_BEHAVIOR`
+   - `D11_UNIMPLEMENTED_TEST_CASE`
+   - `D12_UNTESTED_ACCEPTANCE_CRITERION`
+   - `D13_ASSERTION_MISMATCH`
+
+   Excluded or inconclusive regions belong in coverage notes or open
+   questions, not in the findings list.
+
+3. **If the design document is not provided**, skip design-specific
+   tracing. Trace uncovered regions directly from requirements to code.
+   Do NOT fabricate design intent.
+
+4. **If coverage scope or focus areas are specified**, still build the
+   initial candidate ledger from the provided coverage artifact, but
+   restrict detailed tracing and classification to the narrowed scope.
+   Explicitly document which candidate regions were excluded by scope.
+
+5. **Apply the anti-hallucination protocol.** Every finding must cite:
+   - the coverage region location and raw coverage evidence
+   - the requirement or design location, or explicit absence for D9
+   - the validation-plan location
+   - the test-code location, or explicit absence for D11
+
+   Do NOT invent requirements, tests, branch boundaries, or intended
+   behavior that are not evidenced in the provided artifacts.
+
+6. **Apply the operational-constraints protocol.** Do not attempt to
+   ingest the entire codebase or test suite blindly. Use the coverage
+   artifact to identify candidate regions first, then deep-read only the
+   code and tests needed to disambiguate those regions.
+
+7. **Format the output** according to the investigation-report format.
+   Map this task's work products as follows:
+   - Phase 1 candidate ledger and scoping method -> **Investigation Scope**
+   - Phase 2 disambiguation results -> **Investigation Scope** and
+     **Open Questions** for inconclusive regions
+   - Phases 3-5 classified regions -> **Findings**, one F-NNN per finding
+   - Phase 6 metrics -> **Executive Summary** and a coverage subsection
+     in **Root Cause Analysis**
+   - Escalation paths and next actions -> **Remediation Plan**
+
+8. **State the scope boundary explicitly** in the report:
+   - This audit examined uncovered or partially covered regions only.
+   - Covered code was not evaluated for specification alignment by this task.
+
+9. **Quality checklist** — before finalizing, verify:
+   - [ ] Every finding has exactly one drift label from D2, D9, D11, D12, D13
+   - [ ] Every finding cites coverage evidence and concrete artifact locations
+   - [ ] Excluded regions are documented with rationale and are not reported as findings
+   - [ ] Inconclusive regions state what evidence is missing
+   - [ ] The report distinguishes missing validation from undocumented behavior
+   - [ ] The report states that covered code remains out of scope
+   - [ ] Coverage metrics are calculated from actual candidate counts
+   - [ ] Escalation recommendations are concrete and aligned to the finding type
+
+## Non-Goals
+
+- Do NOT treat uncovered code as automatically buggy or drifted.
+- Do NOT clear covered code as specified, correct, or adequately validated.
+- Do NOT execute the code or run the coverage tool — this task analyzes
+  the provided coverage artifact and related source material.
+- Do NOT rewrite requirements, tests, or code — report findings and
+  recommended next actions only.
+- Do NOT expand into a full repository maintenance audit unless the
+  findings explicitly warrant escalation.

From 9ab8885be844419215a7b8d904315069de612f52 Mon Sep 17 00:00:00 2001
From: Alan Jowett
Date: Wed, 3 Jun 2026 09:33:06 -0700
Subject: [PATCH 2/3] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 templates/audit-coverage-gaps.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/templates/audit-coverage-gaps.md b/templates/audit-coverage-gaps.md
index dbfb2cf..f8505c9 100644
--- a/templates/audit-coverage-gaps.md
+++ b/templates/audit-coverage-gaps.md
@@ -104,7 +104,7 @@ coverage signals missing validation or undocumented behavior.
 5. **Apply the anti-hallucination protocol.** Every finding must cite:
    - the coverage region location and raw coverage evidence
    - the requirement or design location, or explicit absence for D9
-   - the validation-plan location
+   - the validation-plan location, or explicit absence for D2
    - the test-code location, or explicit absence for D11
 
    Do NOT invent requirements, tests, branch boundaries, or intended

From 983b2d5aa22f4ab54f7d0e7684e80aa0d135a0e6 Mon Sep 17 00:00:00 2001
From: Alan Jowett
Date: Wed, 3 Jun 2026 11:22:00 -0700
Subject: [PATCH 3/3] Clarify coverage gap audit findings

Address PR feedback by clarifying the allowed drift labels and their
scope in the coverage gap audit workflow. Make excluded and inconclusive
regions map cleanly to the investigation report structure, add
synchronization contracts to behavioral significance, normalize
next-action wording, and state the one-region-one-label rule for
confirmed findings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 protocols/reasoning/coverage-gap-analysis.md | 11 +++++++++--
 templates/audit-coverage-gaps.md             | 10 ++++++++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/protocols/reasoning/coverage-gap-analysis.md b/protocols/reasoning/coverage-gap-analysis.md
index c4a5c10..5be8429 100644
--- a/protocols/reasoning/coverage-gap-analysis.md
+++ b/protocols/reasoning/coverage-gap-analysis.md
@@ -80,6 +80,7 @@ behaviorally significant.
    - state transitions
    - error contracts, retry logic, or timeout behavior
    - resource lifecycle or requirement-bound constraints
+   - synchronization or shared resource access enforcement contracts
 
 4. **Only advance significant, in-scope regions**:
    - Regions that are excluded or inconclusive stop here.
@@ -154,13 +155,19 @@ missing tests, or weak assertions.
 
 Turn only the confirmed regions into findings.
 
-1. **Assign exactly one classification** to each confirmed region:
+1. **Assign exactly one classification from the specification-drift
+   taxonomy** to each confirmed region:
    - `D2_UNTESTED_REQUIREMENT`
    - `D9_UNDOCUMENTED_BEHAVIOR`
    - `D11_UNIMPLEMENTED_TEST_CASE`
    - `D12_UNTESTED_ACCEPTANCE_CRITERION`
    - `D13_ASSERTION_MISMATCH`
 
+   If one source location appears to support multiple labels, split it
+   into separate normalized candidate regions only when the evidence
+   supports distinct behavioral units. Do NOT stack multiple drift
+   labels onto one confirmed region.
+
 2. **For each finding provide**:
    - the coverage region location
    - the specification location(s), or `None — no matching requirement identified` for D9
@@ -169,7 +176,7 @@ Turn only the confirmed regions into findings.
    - the impact of leaving the region uncovered
    - a concrete recommended next action
 
-3. **Recommended escalation paths**:
+3. **Recommended next actions**:
    - D9 findings that appear to describe real behavior with no governing
      requirement are good candidates for
      `requirements-from-implementation` or `spec-extraction-workflow`.
diff --git a/templates/audit-coverage-gaps.md b/templates/audit-coverage-gaps.md
index f8505c9..2751b12 100644
--- a/templates/audit-coverage-gaps.md
+++ b/templates/audit-coverage-gaps.md
@@ -89,8 +89,13 @@ coverage signals missing validation or undocumented behavior.
    - `D12_UNTESTED_ACCEPTANCE_CRITERION`
    - `D13_ASSERTION_MISMATCH`
 
-   Excluded or inconclusive regions belong in coverage notes or open
-   questions, not in the findings list.
+   `D8_UNIMPLEMENTED_REQUIREMENT` is intentionally out of scope for this
+   workflow: this audit starts from uncovered implemented regions in a
+   coverage artifact, so requirements with no implementation at all are
+   better handled by `audit-code-compliance`.
+
+   Excluded regions belong in **Investigation Scope** and inconclusive
+   regions belong in **Open Questions**, not in the findings list.
 
 3. **If the design document is not provided**, skip design-specific
    tracing. Trace uncovered regions directly from requirements to code.
@@ -131,6 +136,7 @@ coverage signals missing validation or undocumented behavior.
 
 9. **Quality checklist** — before finalizing, verify:
    - [ ] Every finding has exactly one drift label from D2, D9, D11, D12, D13
+   - [ ] Each normalized candidate region maps to at most one finding; split distinct behavioral units instead of stacking labels
    - [ ] Every finding cites coverage evidence and concrete artifact locations
    - [ ] Excluded regions are documented with rationale and are not reported as findings
    - [ ] Inconclusive regions state what evidence is missing
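
For readers tracing how Phases 2 through 6 of `coverage-gap-analysis.md` fit together, the decision order can be sketched as a small Python function. This is an illustration under stated assumptions, not part of the patch series: `Candidate`, `Outcome`, `classify`, and `summarize` are hypothetical names, each boolean field stands in for an evidence judgment that the protocol leaves to the reviewer, and the REQ/TC identifiers in the sample ledger are invented.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    """Drift labels allowed by the protocol, plus the two non-finding states."""
    D2 = "D2_UNTESTED_REQUIREMENT"
    D9 = "D9_UNDOCUMENTED_BEHAVIOR"
    D11 = "D11_UNIMPLEMENTED_TEST_CASE"
    D12 = "D12_UNTESTED_ACCEPTANCE_CRITERION"
    D13 = "D13_ASSERTION_MISMATCH"
    EXCLUDED = "EXCLUDED"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass
class Candidate:
    """One normalized ledger entry (CG-NNN) with the judgments from Phases 2-4.

    Hypothetical schema: the protocol does not prescribe these field names."""
    cg_id: str
    significant: bool
    exclusion_reason: Optional[str] = None  # evidenced Phase 2 exclusion
    inactive_unevidenced: bool = False      # plausible but unproven inactive path
    req_ids: tuple = ()                     # Phase 3 trace; may hold several plausible IDs
    infrastructure_only: bool = False       # unlinked code that only supports other requirements
    manual_or_deferred: bool = False        # plan documents manual-only or deferred validation
    tc_ids: tuple = ()                      # TC-NNN entries found in the validation plan
    test_found: bool = False                # implementing test located in the test code
    assertion_gap: Optional[str] = None     # "criterion", "assertion", or None if undetermined


def classify(c: Candidate) -> Outcome:
    # Phase 2: excluded and inconclusive regions stop here.
    if c.exclusion_reason is not None:
        return Outcome.EXCLUDED
    if c.inactive_unevidenced:
        return Outcome.INCONCLUSIVE
    if not c.significant:
        return Outcome.EXCLUDED  # treated as excluded in this sketch
    # Phase 3: unlinked behavior is D9 unless it is indirect infrastructure.
    if not c.req_ids:
        return Outcome.EXCLUDED if c.infrastructure_only else Outcome.D9
    # Phase 4: documented manual-only or deferred validation is excluded.
    if c.manual_or_deferred:
        return Outcome.EXCLUDED
    if not c.tc_ids:
        return Outcome.D2
    if not c.test_found:
        return Outcome.D11
    if c.assertion_gap == "criterion":
        return Outcome.D12
    if c.assertion_gap == "assertion":
        return Outcome.D13
    # Phase 4, step 5: evidence cannot separate D12 from D13.
    return Outcome.INCONCLUSIVE


def summarize(ledger: list) -> dict:
    """Phase 6 candidate counts and finding distribution (traceability split omitted)."""
    counts = Counter(classify(c) for c in ledger)
    distribution = {o.value: counts[o] for o in Outcome if o.name.startswith("D")}
    return {
        "candidates": len(ledger),
        "excluded": counts[Outcome.EXCLUDED],
        "inconclusive": counts[Outcome.INCONCLUSIVE],
        "findings": sum(distribution.values()),
        "distribution": distribution,
    }


# Hypothetical ledger; every identifier below is invented for illustration.
LEDGER = [
    Candidate("CG-001", significant=True, req_ids=("REQ-AUTH-003",)),
    Candidate("CG-002", significant=True),
    Candidate("CG-003", significant=False, exclusion_reason="logging only"),
    Candidate("CG-004", significant=True, req_ids=("REQ-RETRY-001",),
              tc_ids=("TC-014",), test_found=True, assertion_gap="criterion"),
]

print(summarize(LEDGER))
```

The early returns mirror the protocol's rule that excluded and inconclusive regions stop before classification, and returning a single `Outcome` per candidate reflects the one-region-one-label rule added in patch 3/3.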