Add coverage gap audit workflow #258
Merged: Alan-Jowett merged 3 commits into microsoft:main from Alan-Jowett:add-coverage-gap-audit on Jun 3, 2026 (+382 −0)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,204 @@ | ||
| <!-- SPDX-License-Identifier: MIT --> | ||
| <!-- Copyright (c) PromptKit Contributors --> | ||
|
|
||
| --- | ||
| name: coverage-gap-analysis | ||
| type: reasoning | ||
| description: > | ||
| Deterministic protocol for turning code coverage gaps into specification | ||
| drift candidates. Normalizes uncovered regions, filters incidental code, | ||
| traces remaining regions to requirements and validation artifacts, and | ||
| classifies missing validation versus undocumented behavior. | ||
| applicable_to: | ||
| - audit-coverage-gaps | ||
| --- | ||
|
|
||
| # Protocol: Coverage Gap Analysis | ||
|
|
||
| Apply this protocol when a coverage report is available and the goal is to | ||
| use uncovered code as a **discovery signal** for specification drift. | ||
|
|
||
| Coverage gaps are **candidate generators, not findings by themselves**. | ||
| Covered code is **out of scope** for this protocol and MUST NOT be treated | ||
| as evidence that the behavior is specified or adequately validated. | ||
|
|
||
| ## Phase 1: Coverage Signal Inventory | ||
|
|
||
| Build a reproducible inventory of coverage gaps before tracing them. | ||
|
|
||
| 1. **Identify the coverage artifact**: | ||
| - Record the coverage tool or report format if evident. | ||
| - Record what test scope produced it (unit, integration, mixed) if stated. | ||
| - Record any stated exclusions, filters, or generated-code suppressions. | ||
|
|
||
| 2. **Extract uncovered regions**: | ||
| - Capture every uncovered or partially covered region with file path, | ||
| line range, and coverage kind (`no hits`, `partial branch`, or | ||
| equivalent from the report). | ||
| - If the report only provides function-level or file-level data, keep | ||
| that granularity. Do NOT invent finer block boundaries. | ||
|
|
||
| 3. **Normalize regions into reviewable units**: | ||
| - Merge adjacent uncovered lines only when they are clearly part of the | ||
| same behavioral unit (same function, branch body, or error path). | ||
| - Preserve the original report evidence so the normalization can be | ||
| reproduced. | ||
|
|
||
| 4. **Create a candidate ledger**: | ||
| - Assign each normalized region a unique identifier (`CG-001`, `CG-002`, ...). | ||
| - For each entry record: file path, line range, enclosing symbol, | ||
| coverage kind, and the raw coverage evidence used to create it. | ||
|
|
||
| ## Phase 2: Disambiguation Before Drift Classification | ||
|
|
||
| Do NOT classify uncovered regions until you determine whether they are | ||
| behaviorally significant. | ||
|
|
||
| 1. **Exclude clearly non-significant code**: | ||
| - Logging, metrics, debug strings, tracing hooks, boilerplate | ||
| serialization, generated code, trivial accessors, and test-only | ||
| scaffolding are excluded unless the specification explicitly | ||
| constrains them. | ||
| - Record excluded regions in the coverage summary with rationale. | ||
| Do NOT turn them into findings. | ||
|
|
||
| 2. **Check for inactive or intentionally unreachable paths**: | ||
| - Feature-flagged code, platform-gated branches, deprecated paths, | ||
| fault-injection hooks, and known-dead fallback branches may explain | ||
| missing coverage without implying drift. | ||
| - If the inactive status is evidenced, exclude the region with the | ||
| supporting rationale. | ||
| - If the status is plausible but not evidenced, mark the region as | ||
| **INCONCLUSIVE** and state what additional context is needed. | ||
|
|
||
| 3. **Determine behavioral significance**: | ||
| A region is significant when it affects one or more of: | ||
| - user-visible behavior | ||
| - data mutation or persistence | ||
| - access control or trust boundaries | ||
| - external communication or side effects | ||
| - state transitions | ||
| - error contracts, retry logic, or timeout behavior | ||
| - resource lifecycle or requirement-bound constraints | ||
| - synchronization or shared resource access enforcement contracts | ||
|
|
||
| 4. **Only advance significant, in-scope regions**: | ||
| - Regions that are excluded or inconclusive stop here. | ||
| - Regions that are significant proceed to specification tracing. | ||
|
|
||
| ## Phase 3: Specification Trace for Significant Regions | ||
|
|
||
| For each significant uncovered region, determine whether it traces to | ||
| documented intent. | ||
|
|
||
| 1. **Search requirements and design artifacts**: | ||
| - Look for explicit REQ-ID references, acceptance criteria, | ||
| domain terminology, and design mechanisms that match the region's behavior. | ||
| - If no design document is provided, skip design checks and trace | ||
| directly from requirements to code. | ||
|
|
||
| 2. **Record positive traceability**: | ||
| - When a region maps to one or more REQ-IDs, record the governing | ||
| requirement(s), acceptance criteria, and any relevant design sections. | ||
|
|
||
| 3. **Handle absent traceability carefully**: | ||
| - If the region implements genuine product behavior and no requirement | ||
| or design trace can be found, classify it as a candidate | ||
| **D9_UNDOCUMENTED_BEHAVIOR**. | ||
| - If the region appears to be reasonable infrastructure that supports | ||
| other requirements indirectly, record it as excluded rather than D9. | ||
|
|
||
| 4. **Handle ambiguous traceability**: | ||
| - If multiple REQ-IDs are plausible, carry all plausible mappings | ||
| forward and mark the finding confidence accordingly. | ||
| - Do NOT invent a new requirement to resolve the ambiguity. | ||
|
|
||
| ## Phase 4: Validation Trace for Requirement-Linked Regions | ||
|
|
||
| For each significant uncovered region that traces to a requirement, | ||
| determine whether the uncovered status reflects missing validation, | ||
| missing tests, or weak assertions. | ||
|
|
||
| 1. **Check the validation plan**: | ||
| - Determine whether the linked REQ-ID has one or more TC-NNN entries | ||
| in the validation plan or traceability matrix. | ||
| - If no validation entry exists, classify the gap as | ||
| **D2_UNTESTED_REQUIREMENT** unless the plan explicitly marks the | ||
| requirement as manual-only or deferred. | ||
|
|
||
| 2. **Check test implementation**: | ||
| - If a TC-NNN exists, search the provided test code for the | ||
| implementing test. | ||
| - If no implementing test is found, classify the gap as | ||
| **D11_UNIMPLEMENTED_TEST_CASE**. | ||
|
|
||
| 3. **Check assertion sufficiency**: | ||
| - If tests exist, determine whether the uncovered region corresponds | ||
| to unexercised acceptance criteria, negative paths, boundary cases, | ||
| ordering constraints, or semantic assertions that the test does not verify. | ||
| - Missing required criterion exercise is | ||
| **D12_UNTESTED_ACCEPTANCE_CRITERION**. | ||
| - Incorrect or overly coarse assertions that leave the behavior | ||
| effectively unverified are **D13_ASSERTION_MISMATCH**. | ||
|
|
||
| 4. **Respect documented manual-only validation**: | ||
| - If the validation plan explicitly documents that the behavior is | ||
| validated manually or deferred outside the automated suite, | ||
| record that rationale and exclude the region from D11-D13 findings. | ||
|
|
||
| 5. **Handle insufficient evidence**: | ||
| - If the available test context is insufficient to distinguish D12 | ||
| from D13, mark the region **INCONCLUSIVE** and state the missing | ||
| evidence instead of guessing. | ||
|
|
||
| ## Phase 5: Classification and Escalation | ||
|
|
||
| Turn only the confirmed regions into findings. | ||
|
|
||
| 1. **Assign exactly one classification from the specification-drift | ||
| taxonomy** to each confirmed region: | ||
| - `D2_UNTESTED_REQUIREMENT` | ||
| - `D9_UNDOCUMENTED_BEHAVIOR` | ||
| - `D11_UNIMPLEMENTED_TEST_CASE` | ||
| - `D12_UNTESTED_ACCEPTANCE_CRITERION` | ||
| - `D13_ASSERTION_MISMATCH` | ||
|
|
||
| If one source location appears to support multiple labels, split it | ||
| into separate normalized candidate regions only when the evidence | ||
| supports distinct behavioral units. Do NOT stack multiple drift | ||
| labels onto one confirmed region. | ||
|
|
||
| 2. **For each finding provide**: | ||
| - the coverage region location | ||
| - the specification location(s), or `None — no matching requirement identified` for D9 | ||
| - the validation and test location(s), or explicit absence | ||
| - the disambiguation rationale | ||
| - the impact of leaving the region uncovered | ||
| - a concrete recommended next action | ||
|
|
||
| 3. **Recommended next actions**: | ||
| - D9 findings that appear to describe real behavior with no governing | ||
| requirement are good candidates for | ||
| `requirements-from-implementation` or `spec-extraction-workflow`. | ||
| - D2, D11, D12, and D13 clusters that suggest broader validation drift | ||
| are good candidates for `audit-traceability` or | ||
| `audit-test-compliance`. | ||
|
|
||
| 4. **Do NOT promote excluded or inconclusive regions into findings**. | ||
|
|
||
| ## Phase 6: Coverage Summary | ||
|
|
||
| After individual findings, produce aggregate metrics: | ||
|
|
||
| 1. **Coverage candidate count**: total normalized regions, excluded regions, | ||
| inconclusive regions, and classified findings. | ||
| 2. **Traceability split**: requirement-linked vs unlinked significant regions. | ||
| 3. **Finding distribution**: count by D2, D9, D11, D12, D13. | ||
| 4. **Exclusion reasons**: grouped counts for generated code, | ||
| infrastructure-only code, manual-only validation, inactive paths, and | ||
| other documented exclusions. | ||
| 5. **Overall assessment**: a short judgment of whether the dominant issue | ||
| appears to be missing validation, undocumented behavior, or mixed drift. | ||
| 6. **Scope limitation**: explicitly state that this protocol examined | ||
| uncovered regions only and did not clear covered code for | ||
| specification or validation compliance. | ||
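The Phase 4 decision order in the protocol above can be read as a short decision function. The following is a minimal sketch, not part of the PR: the `ValidationEvidence` type, its field names, and the `classify_validation_gap` function are hypothetical, and the final fallback (tests exist and appear adequate, yet the region is uncovered) is an assumption, since the protocol does not state an outcome for that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationEvidence:
    """Hypothetical evidence record for one requirement-linked region."""
    has_tc_entry: bool                    # a TC-NNN entry exists for the linked REQ-ID
    manual_or_deferred: bool              # plan documents manual-only or deferred validation
    test_implemented: bool                # an implementing test was found in the test code
    criterion_exercised: Optional[bool]   # None: evidence is insufficient to decide
    assertions_adequate: Optional[bool]   # None: evidence is insufficient to decide


def classify_validation_gap(ev: ValidationEvidence) -> str:
    """Apply the Phase 4 checks in the order the protocol lists them."""
    # Documented manual-only or deferred validation is recorded and excluded.
    if ev.manual_or_deferred:
        return "EXCLUDED"
    # Step 1: no validation-plan entry for the requirement.
    if not ev.has_tc_entry:
        return "D2_UNTESTED_REQUIREMENT"
    # Step 2: a TC-NNN exists but no implementing test was found.
    if not ev.test_implemented:
        return "D11_UNIMPLEMENTED_TEST_CASE"
    # Step 5: do not guess between D12 and D13 without evidence.
    if ev.criterion_exercised is None:
        return "INCONCLUSIVE"
    # Step 3: a required criterion is not exercised at all.
    if not ev.criterion_exercised:
        return "D12_UNTESTED_ACCEPTANCE_CRITERION"
    if ev.assertions_adequate is None:
        return "INCONCLUSIVE"
    # Step 3: the criterion is exercised but the assertions are too weak.
    if not ev.assertions_adequate:
        return "D13_ASSERTION_MISMATCH"
    # Assumption: the protocol defines no label here, so ask for more context.
    return "INCONCLUSIVE"
```

Checking the manual-only flag first mirrors the "unless the plan explicitly marks the requirement as manual-only or deferred" clause in step 1, so a documented exclusion never turns into a D2 finding.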
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,157 @@ | ||
| <!-- SPDX-License-Identifier: MIT --> | ||
| <!-- Copyright (c) PromptKit Contributors --> | ||
|
|
||
| --- | ||
| name: audit-coverage-gaps | ||
| description: > | ||
| Audit uncovered code regions against requirements, validation artifacts, | ||
| and tests. Uses coverage data as a deterministic discovery signal for | ||
| missing validation and undocumented behavior. | ||
| persona: specification-analyst | ||
| protocols: | ||
| - guardrails/anti-hallucination | ||
| - guardrails/self-verification | ||
| - guardrails/operational-constraints | ||
| - reasoning/coverage-gap-analysis | ||
| taxonomies: | ||
| - specification-drift | ||
| format: investigation-report | ||
| params: | ||
| project_name: "Name of the project or feature being audited" | ||
| coverage_report: "Coverage artifact content or report excerpt showing uncovered or partially covered regions" | ||
| requirements_doc: "The requirements document content" | ||
| validation_plan: "The validation plan content" | ||
| design_doc: "The design document content (optional — omit for a requirements-only audit)" | ||
| code_context: "Source code to audit — files, modules, or repository path" | ||
| test_code: "Test source code to inspect for validation coverage" | ||
| coverage_scope: "Optional narrowing for the coverage signal — e.g., '0-hit regions only', 'include partial branches', 'coverage below 80%'" | ||
| focus_areas: "Optional narrowing — e.g., 'authentication module', 'retry paths' (default: audit all significant uncovered regions)" | ||
| audience: "Who will read the audit report — e.g., 'spec owners', 'engineering leads'" | ||
| input_contract: | ||
| type: validation-plan | ||
| description: > | ||
| A validation plan and requirements document, plus a coverage artifact, | ||
| source code, and test code used to triage uncovered regions against | ||
| specification intent. | ||
| output_contract: | ||
| type: investigation-report | ||
| description: > | ||
| An investigation report classifying coverage-driven drift findings | ||
| using the specification-drift taxonomy (D2, D9, D11, D12, D13), | ||
| with evidence, exclusions, and escalation guidance. | ||
| --- | ||
|
|
||
| # Task: Audit Coverage Gaps | ||
|
|
||
| You are tasked with auditing **uncovered code regions** against the | ||
| requirements, validation plan, and test suite to determine whether low | ||
| coverage signals missing validation or undocumented behavior. | ||
|
|
||
| ## Inputs | ||
|
|
||
| **Project Name**: {{project_name}} | ||
|
|
||
| **Coverage Report**: | ||
| {{coverage_report}} | ||
|
|
||
| **Requirements Document**: | ||
| {{requirements_doc}} | ||
|
|
||
| **Validation Plan**: | ||
| {{validation_plan}} | ||
|
|
||
| **Design Document** (if provided): | ||
| {{design_doc}} | ||
|
|
||
| **Source Code**: | ||
| {{code_context}} | ||
|
|
||
| **Test Code**: | ||
| {{test_code}} | ||
|
|
||
| **Coverage Scope**: {{coverage_scope}} | ||
|
|
||
| **Focus Areas**: {{focus_areas}} | ||
|
|
||
| **Audience**: {{audience}} | ||
|
|
||
| ## Instructions | ||
|
|
||
| 1. **Apply the coverage-gap-analysis protocol.** Execute all phases in | ||
| order. Treat the coverage report as a deterministic source of | ||
| **candidates**, not as direct proof of drift. | ||
|
|
||
| 2. **Classify only confirmed findings** using the specification-drift | ||
| taxonomy. Every reported finding MUST have exactly one of: | ||
| - `D2_UNTESTED_REQUIREMENT` | ||
| - `D9_UNDOCUMENTED_BEHAVIOR` | ||
| - `D11_UNIMPLEMENTED_TEST_CASE` | ||
| - `D12_UNTESTED_ACCEPTANCE_CRITERION` | ||
| - `D13_ASSERTION_MISMATCH` | ||
|
|
||
| `D8_UNIMPLEMENTED_REQUIREMENT` is intentionally out of scope for this | ||
| workflow: this audit starts from uncovered implemented regions in a | ||
| coverage artifact, so requirements with no implementation at all are | ||
| better handled by `audit-code-compliance`. | ||
|
|
||
| Excluded regions belong in **Investigation Scope** and inconclusive | ||
| regions belong in **Open Questions**, not in the findings list. | ||
|
|
||
| 3. **If the design document is not provided**, skip design-specific | ||
| tracing. Trace uncovered regions directly from requirements to code. | ||
| Do NOT fabricate design intent. | ||
|
|
||
| 4. **If coverage scope or focus areas are specified**, still build the | ||
| initial candidate ledger from the provided coverage artifact, but | ||
| restrict detailed tracing and classification to the narrowed scope. | ||
| Explicitly document which candidate regions were excluded by scope. | ||
|
|
||
| 5. **Apply the anti-hallucination protocol.** Every finding must cite: | ||
| - the coverage region location and raw coverage evidence | ||
| - the requirement or design location, or explicit absence for D9 | ||
| - the validation-plan location, or explicit absence for D2 | ||
| - the test-code location, or explicit absence for D11 | ||
|
|
||
| Do NOT invent requirements, tests, branch boundaries, or intended | ||
| behavior that are not evidenced in the provided artifacts. | ||
|
|
||
| 6. **Apply the operational-constraints protocol.** Do not attempt to | ||
| ingest the entire codebase or test suite blindly. Use the coverage | ||
| artifact to identify candidate regions first, then deep-read only the | ||
| code and tests needed to disambiguate those regions. | ||
|
|
||
| 7. **Format the output** according to the investigation-report format. | ||
| Map this task's work products as follows: | ||
| - Phase 1 candidate ledger and scoping method -> **Investigation Scope** | ||
| - Phase 2 disambiguation results -> **Investigation Scope** and | ||
| **Open Questions** for inconclusive regions | ||
| - Phases 3-5 classified regions -> **Findings**, one F-NNN per finding | ||
| - Phase 6 metrics -> **Executive Summary** and a coverage subsection | ||
| in **Root Cause Analysis** | ||
| - Escalation paths and next actions -> **Remediation Plan** | ||
|
|
||
| 8. **State the scope boundary explicitly** in the report: | ||
| - This audit examined uncovered or partially covered regions only. | ||
| - Covered code was not evaluated for specification alignment by this task. | ||
|
|
||
| 9. **Quality checklist** — before finalizing, verify: | ||
| - [ ] Every finding has exactly one drift label from D2, D9, D11, D12, D13 | ||
| - [ ] Each normalized candidate region maps to at most one finding; split distinct behavioral units instead of stacking labels | ||
| - [ ] Every finding cites coverage evidence and concrete artifact locations | ||
| - [ ] Excluded regions are documented with rationale and are not reported as findings | ||
| - [ ] Inconclusive regions state what evidence is missing | ||
| - [ ] The report distinguishes missing validation from undocumented behavior | ||
| - [ ] The report states that covered code remains out of scope | ||
| - [ ] Coverage metrics are calculated from actual candidate counts | ||
| - [ ] Escalation recommendations are concrete and aligned to the finding type | ||
|
|
||
| ## Non-Goals | ||
|
|
||
| - Do NOT treat uncovered code as automatically buggy or drifted. | ||
| - Do NOT clear covered code as specified, correct, or adequately validated. | ||
| - Do NOT execute the code or run the coverage tool — this task analyzes | ||
| the provided coverage artifact and related source material. | ||
| - Do NOT rewrite requirements, tests, or code — report findings and | ||
| recommended next actions only. | ||
| - Do NOT expand into a full repository maintenance audit unless the | ||
| findings explicitly warrant escalation. | ||
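Two items in the quality checklist above are mechanical enough to check in code: every finding carries exactly one drift label, and each normalized candidate region maps to at most one finding. The sketch below is illustrative only. The finding shape (a dict with `id`, `region`, and `labels` keys) and the `check_findings` helper are hypothetical; the PR defines no such data structure.

```python
from collections import Counter

# Labels permitted by the task template; D8 is intentionally out of scope.
ALLOWED_LABELS = {
    "D2_UNTESTED_REQUIREMENT",
    "D9_UNDOCUMENTED_BEHAVIOR",
    "D11_UNIMPLEMENTED_TEST_CASE",
    "D12_UNTESTED_ACCEPTANCE_CRITERION",
    "D13_ASSERTION_MISMATCH",
}


def check_findings(findings):
    """Return a list of checklist violations (empty when both checks pass).

    Each finding is assumed to be a dict such as:
    {"id": "F-001", "region": "CG-003", "labels": ["D2_UNTESTED_REQUIREMENT"]}
    """
    problems = []
    # Count how many findings reference each candidate region.
    per_region = Counter(f["region"] for f in findings)

    for f in findings:
        labels = f["labels"]
        # Exactly one label, and it must come from the allowed set.
        if len(labels) != 1 or labels[0] not in ALLOWED_LABELS:
            problems.append(f"{f['id']}: needs exactly one label from D2, D9, D11, D12, D13")

    for region, count in sorted(per_region.items()):
        # A region with several findings should have been split into distinct units.
        if count > 1:
            problems.append(f"{region}: maps to {count} findings; split it or merge them")

    return problems
```

The remaining checklist items (evidence citations, exclusion rationale, scope statements) depend on reading the report text and are not covered by this sketch.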