nullhack · nullhack · May 21, 2026 · May 21, 2026 · May 21, 2026
diff --git a/.opencode/knowledge/requirements/gherkin.md b/.opencode/knowledge/requirements/gherkin.md
@@ -195,5 +195,5 @@ Test path conventions (`tests/features/<feature_slug>/`), the feature-test vs un
 - [[requirements/decomposition]]: splitting Rules with too many Examples
 - [[requirements/pre-mortem]]: finding hidden failure modes in rules
 - [[software-craft/test-design]]: property-based testing for invariant rules
-- [[software-craft/test-stubs]]: how beehave generates test stubs from feature files
+- [[software-craft/test-stubs]]: how beehave generates test stubs from feature files, development stage tracking via `beehave status`
 - [[software-craft/external-fixtures]]: real data fixtures for external adapter mocking
diff --git a/.opencode/knowledge/software-craft/source-stubs.md b/.opencode/knowledge/software-craft/source-stubs.md
@@ -13,7 +13,7 @@ last-updated: 2026-05-08
 - Source stubs contain the absolute minimum to compile and trace: Protocol signatures with `raise NotImplementedError` bodies, no docstrings, no type hints beyond the contract.
 - Package structure mirrors the module structure from technical design; the domain package depends on nothing.
 - Feature branches are created from the latest main.
-- Create artifacts in this order: branch, directories, port interfaces, Protocol stubs, run beehave generate to create test stubs, run beehave check.
+- Create artifacts in this order: branch, directories, port interfaces, Protocol stubs, run beehave generate to create test stubs, verify with `beehave check` and `beehave status` for development stage confirmation per [[software-craft/test-stubs#concepts]].
 
 ## Concepts
 

diff --git a/.opencode/knowledge/software-craft/test-stubs.md b/.opencode/knowledge/software-craft/test-stubs.md
@@ -1,74 +1,73 @@
 ---
 domain: software-craft
 tags: [test-stubs, traceability, pytest-beehave, scenario-outline, hypothesis]
-last-updated: 2026-05-19
+last-updated: 2026-05-20
 ---
 
 # Test Stubs
 
 ## Key Takeaways
 
-- Test stubs are auto-generated by `beehave generate <feature_id>` from the feature file; no manual stub creation is needed.
-- pytest-beehave uses title-based mapping: each Example title becomes a test function named `test_<example_title_slug>` (e.g., `test_VAT_is_applied_at_the_correct_rate`).
-- Structural traceability (every Example has a test, no orphan tests, placeholders present, literals present) is verified by `beehave check`, not manually.
-- Scenario Outline produces parameterized stubs with Hypothesis `@given` decorators (inferred strategies) and `@example` decorators (one per Examples table row). Plain Examples produce bare function stubs.
-- `beehave check` verifies literal values from steps appear in test function bodies — tests must use the exact quoted strings and numbers from the spec.
+- Test stubs are auto-generated by `beehave generate <feature_id>` from the feature file; no manual stub creation is needed. pytest-beehave uses title-based mapping: each Example title becomes a test function named `test_<example_title_slug>`.
+- `beehave check` verifies structural traceability (every Example has a test, no orphan tests, placeholders present, literals present). Scenario Outline produces parameterized stubs with Hypothesis `@given` decorators (inferred strategies) and `@example` decorators (one per Examples table row).
+- Literals from Given/When/Then steps (quoted strings, bare numbers) must appear verbatim in test function bodies — `beehave check` enforces this. Stubs (functions with `...` body) are exempt from literal and placeholder checks.
+- `beehave status` reports development stage per feature across 6 stages (ok, broken, needs scenarios, needs tests, needs bodies, needs fixes) with tree or `--json` output. Use `beehave status --json` for project-wide overview and `beehave status` for per-feature tree view including inline violation codes.
+- `beehave list` lists feature slugs and titles for features with Examples; `list -v` adds path, scenario count, stub/impl counts. `beehave clean <feature> --force` removes unmapped test functions from feature-paired test directories.
+- Feature file stem MUST match the Feature title slug (e.g., Feature "CLI Entrypoint" → `cli_entrypoint.feature`). Title violations anywhere in the project block `beehave generate` (pre-flight validates all titles project-wide). `beehave check <single-feature>` skips global title validation — only `beehave check` (no argument) runs `validate_all_titles`.
 
 ## Concepts
 
-**Title-Based Mapping**. pytest-beehave maps each Example or Scenario Outline in the feature file to a test function by title. The function name is derived from the Example or Scenario Outline title as a slug (e.g., Example: "VAT is applied at the correct rate" → `test_vat_is_applied_at_the_correct_rate`). This replaces the previous `@id` tag system. Titles must be unique within a feature file and 2–6 words per [[requirements/gherkin#concepts]].
+**Title-Based Mapping and Auto-Generated Stubs**. pytest-beehave maps each Example or Scenario Outline in the feature file to a test function by title. The function name is derived from the Example or Scenario Outline title as a slug (e.g., Example: "VAT is applied at the correct rate" → `test_vat_is_applied_at_the_correct_rate`). Titles must be unique within a feature file and 2–6 words per [[requirements/gherkin#concepts]]. When `beehave generate <feature_id>` runs, pytest-beehave reads the feature files and creates test stubs automatically. Each stub has an `...` (Ellipsis) body. During pytest collection, pytest-beehave auto-skips any test function with an `...` body — no `@pytest.mark.skip` decorator is needed. The SE replaces the `...` body with the test implementation during the RED phase.
 
-**Auto-Generated Stubs**. When `beehave generate <feature_id>` runs, pytest-beehave reads the feature files and creates test stubs automatically. Each stub has an `...` (Ellipsis) body. During pytest collection, pytest-beehave auto-skips any test function with an `...` body — no `@pytest.mark.skip` decorator is needed. The SE replaces the `...` body with the test implementation during the RED phase.
+**Traceability Verification and Scenario Outline Stubs**. `beehave check` enforces structural traceability with 6 violation types: `unmapped-scenario` (Example with no test), `unmapped-test` (test with no Example), `misplaced-test` (test in wrong file), `missing-placeholder` (placeholder not in test body), `missing-literal` (literal not in test body), `example-mismatch` (Examples table row lacking `@example()` decorator). Scenario Outline stubs include `@given(placeholder_name=strategy)` decorators with inferred Hypothesis strategies plus `@example()` for each Examples table row. Strategy is inferred from column values: all integers → `st.integers()`, all floats → `st.floats()`, all booleans → `st.booleans()`, otherwise `st.text()`. Override by defining a strategy variable in the test file.
 
-**Scenario Outline Stubs**. When a feature uses `Scenario Outline:` with `<placeholder>` syntax, the generated stub includes:
-- `@given(placeholder_name=st.text())` (or inferred strategy) for each placeholder
-- `@example(col1="val1", col2="val2")` for each row in the Examples table
-- Function parameters matching placeholder names
+**Literal and Placeholder Verification**. `beehave check` extracts quoted strings (`"value"`) and bare numbers (`42`, `-3`) from Given/When/Then steps and verifies they appear in the test function body. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), every literal and placeholder must carry domain meaning in the test — identifiers identify entities, boundaries bound, configurations configure. Never satisfy traceability with noise: assigning to `_`, stuffing strings into assert messages, or helper functions whose sole purpose is consuming a literal. Placeholder names become Python function parameters and must be valid Python identifiers (not keywords, not builtins like `sum`, `list`).
 
-Example stub generated from a Scenario Outline:
+**Development Stage Tracking with beehave status**. `beehave status` computes a development stage for each feature file. JSON output (`--json`) provides per-feature stages, per-scenario status with violation types, summary counts, `unmapped_directories`, and `collisions`. Tree output shows Rule → Scenario hierarchy with inline violation codes. `--include-unmapped` finds orphan test directories. A feature is `ok` even with mixed stubs and implementations — `needs bodies` fires only when ALL scenarios are stubs.
 
-```python
-from hypothesis import example, given, strategies as st
+**Project Overview with beehave list and Cleanup with beehave clean**. `beehave list` shows feature slugs and titles for features with at least one Example. `beehave list -v` adds: path, scenario count (total + top-level vs rule breakdown), stub/impl counts (e.g., "stubs: 1/2 (1 implemented)"). `beehave clean <feature> [--force]` removes unmapped test functions from that feature's test directory and reports what was removed.
 
-@given(qty=st.integers())
-@example(qty=1)
-@example(qty=5)
-@example(qty=100)
-def test_quantity_rejected_when_negative(qty):
-    ...
-```
+**Feature File Stem and Title Consistency**. The feature filename stem MUST match the Feature title slug. Feature "CLI Entrypoint" (slug `cli_entrypoint`) → file `cli_entrypoint.feature` → test directory `tests/features/cli_entrypoint/`. Mismatch causes unmapped directories and test mapping failures. `beehave generate` runs a project-wide `validate_all_titles` pre-flight that blocks all generation if ANY feature has bad titles. `beehave check <feature>` skips global title validation — only `beehave check` with no argument runs `validate_all_titles`. When a Gherkin parse error exists, `validate_all_titles` raises an exception rather than handling gracefully.
 
-The Hypothesis strategy is inferred from Examples table column values: all integers → `st.integers()`, all floats → `st.floats()`, all booleans → `st.booleans()`, otherwise `st.text()`. To use a custom strategy, define a variable with the placeholder name in the test file before the function.
+## Content
 
-**Literal Verification**. `beehave check` extracts quoted strings (`"value"`) and bare numbers (`42`, `-3`) from Given/When/Then steps and verifies they appear in the test function body. This means the test implementation MUST use the exact literal values from the spec — no paraphrasing. For example, if the step says `Then the output contains "temple8"`, the test body must contain the string `"temple8"`.
+### Development Stages (beehave status)
 
-**Strategy Inference from Examples Table**. When Scenario Outline has an Examples table, beehave infers Hypothesis strategies per [[requirements/gherkin#concepts]]:
+| Stage | Condition |
+|-------|-----------|
+| `ok` | All Examples have implemented tests with no violations |
+| `broken` | Feature file has Gherkin parse errors |
+| `needs scenarios` | Has Rules but no Examples |
+| `needs tests` | Has Examples but some lack test functions |
+| `needs bodies` | All Examples have test functions but all bodies are `...` stubs |
+| `needs fixes` | Tests exist with bodies but have violations (missing literals, missing placeholders, example mismatch, etc.) |
 
-| Column Values | Inferred Strategy |
-|---------------|-------------------|
-| All integers (e.g. `1`, `5`, `100`) | `st.integers()` |
-| All floats (e.g. `1.0`, `3.14`) | `st.floats()` |
-| All booleans (`true`, `false`) | `st.booleans()` |
-| Mixed or text (e.g. `Widget`, `Gadget`) | `st.text()` |
+### beehave status --json Structure
 
-Override by defining a variable with the placeholder name as a Hypothesis strategy in the test file.
+| Key | Content |
+|-----|---------|
+| `features` | Per-feature: `slug`, `stage`, `scenarios[]` with `title`, `status`, `violations[]` |
+| `summary` | Counts per stage: `ok`, `broken`, `needs_scenarios`, `needs_tests`, `needs_bodies`, `needs_fixes` |
+| `unmapped_directories` | Test directories with no matching feature file |
+| `collisions` | Test function names appearing in multiple feature test dirs |
 
-**beehave check**. The `beehave check` command (from the `beehave` library, not to be confused with the `behave` BDD framework) verifies structural traceability. It enforces:
+### beehave list -v Output
 
-| Violation Type | What It Detects |
-|----------------|-----------------|
-| `unmapped-scenario` | Example/Scenario Outline in .feature with no test function (note: "scenario" in the beehave error name refers to Examples) |
-| `unmapped-test` | Test function with no matching Example/Scenario Outline |
-| `misplaced-test` | Test in wrong file (rule path mismatch) |
-| `missing-placeholder` | Placeholder `<name>` from step not used in test body |
-| `missing-literal` | Literal value from step not found in test body |
-| `example-mismatch` | Examples table row has no matching `@example()` decorator |
+| Field | Description |
+|-------|-------------|
+| `slug` | Feature title slugified |
+| `title` | Feature title |
+| `path` | Path to the .feature file |
+| `scenarios` | Total scenario count |
+| `top_level_scenarios` | Scenarios not under a Rule |
+| `rules` | Count of Rule blocks |
+| `rule_scenarios` | Scenarios under Rules |
+| `stubs` | Stub count (including `...` body count) |
+| `implemented` | Implemented scenario count |
 
-Stubs (functions with `...` body) are exempt from placeholder and literal checks — these only apply once the `...` is replaced with implementation.
+### Test File Layout
 
-**Test File Layout**. pytest-beehave organizes tests as: Feature title → directory, Rule → test file, Example/Scenario Outline → function name. Test files are placed in `tests/features/<feature_slug>/<rule_slug>_test.py`.
-
-**Spec Value Fidelity in Tests**. Every literal and placeholder from the spec must appear in the test body — `beehave check` verifies this. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), the test must use each value in a way that reflects its domain purpose. If `"BTC/USD"` represents a trading pair, use it to construct or identify one. If `42` is a boundary value, use it at the boundary. Never satisfy traceability with noise: assigning to `_`, stuffing strings into assert messages, or writing helper functions whose sole purpose is consuming a literal. These mask the real issue — the value's domain purpose is not reflected in the test.
+pytest-beehave organizes tests as: Feature title → directory, Rule → test file, Example/Scenario Outline → function name. Test files are placed in `tests/features/<feature_slug>/<rule_slug>_test.py`.
 
 ## Related
 

diff --git a/.opencode/skills/accept-feature/SKILL.md b/.opencode/skills/accept-feature/SKILL.md
@@ -5,12 +5,12 @@ description: "Validate business behavior against BDD examples from the end user'
 
 # Accept Feature
 
-Available knowledge: [[requirements/gherkin#key-takeaways]], [[software-craft/test-design#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[requirements/gherkin#key-takeaways]], [[software-craft/test-design#key-takeaways]], [[software-craft/test-stubs#concepts]]. `in` artifacts: read all before starting work.
 
 1. Run `task test-build` to verify all tests pass with coverage.
 2. Verify all BDD examples pass from the end user's perspective, not the test harness, per [[software-craft/test-design#key-takeaways]].
 3. IF an example passes in the test harness but fails from the user's perspective → flag it as a semantic alignment gap per [[software-craft/test-design#concepts]].
-4. Verify structural traceability via `beehave check`: every Example in the feature file must have exactly one corresponding test function, and every test function must trace back to an Example. pytest-beehave enforces this via title-based mapping. Any violations reported by `beehave check` mean the feature is not done.
+4. Verify structural traceability: run `beehave status --json` for stage overview, then `beehave check` for detailed violations per [[software-craft/test-stubs#concepts]]. Every Example in the feature file must have exactly one corresponding test function, and every test function must trace back to an Example. The feature stage must be `ok` and `beehave check` must produce no output.
 5. Verify semantic depth per [[software-craft/test-design#concepts]].
 6. Verify quality attributes are met.
 7. Verify definition of done criteria are satisfied.

diff --git a/.opencode/skills/review-gate/SKILL.md b/.opencode/skills/review-gate/SKILL.md
@@ -5,7 +5,7 @@ description: "Two-tier review with fail-fast: design -> structure"
 
 # Review Gate
 
-Available knowledge: [[software-craft/code-review]], [[software-craft/test-design]], [[software-craft/smell-catalogue]], [[architecture/reconciliation#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[software-craft/code-review]], [[software-craft/test-design]], [[software-craft/test-stubs]], [[software-craft/smell-catalogue]], [[architecture/reconciliation#key-takeaways]]. `in` artifacts: read all before starting work.
 
 **Fail-fast rule**: Stop at first failure in any tier. Do NOT proceed to next tier if current tier fails.
 

diff --git a/.opencode/skills/select-feature/SKILL.md b/.opencode/skills/select-feature/SKILL.md
@@ -5,24 +5,16 @@ description: "Select the next feature to develop by detecting delivery status fr
 
 # Select Feature
 
-Available knowledge: [[requirements/wsjf#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[requirements/wsjf#key-takeaways]], [[software-craft/test-stubs#key-takeaways]]. `in` artifacts: read all before starting work.
 
 1. List available feature files in `docs/features/`.
 2. IF no feature files exist → exit via `no-features`; features need discovery first.
-3. For each feature, determine delivery status — do NOT open or read individual feature or test files:
-
-   a. Check if the feature file has Example blocks (any line starting with `Example:`).
-      If none, the feature has not been broken down into BDD examples yet → feature is incomplete.
-
-   b. Run `beehave check <slug>` to verify structural traceability:
-      - Any output (errors) → some Examples lack matching test functions or there are orphan tests → feature is incomplete.
-      - No output (clean) → all Examples have matching test functions.
-
-   c. If beehave check is clean, run `task test-fast` scoped to that feature's test directory.
-      - Any failures → feature is incomplete.
-      - All pass → feature is delivered (skip).
-
-   d. If the test directory does not exist, beehave check will report errors → feature is incomplete.
+3. Run `beehave status --json` for project-wide overview per [[software-craft/test-stubs#concepts]]. For each feature, determine delivery status from its stage:
+   - Any stage other than `ok` → feature is incomplete.
+   - Stage `ok` → all Examples have implemented tests with no structural violations, but functional correctness must still be verified.
+   For features at `ok` stage, run `task test-fast` scoped to that feature's test directory:
+   - Any failures → feature is incomplete.
+   - All pass → feature is delivered (skip).
 
 4. IF every feature is delivered → exit via `no-features`.
 5. Collect all incomplete features. Derive dependency count for each from `domain_spec.md` context map:

diff --git a/.opencode/skills/verify-traceability/SKILL.md b/.opencode/skills/verify-traceability/SKILL.md
@@ -5,7 +5,7 @@ description: "Verify example-to-test traceability via beehave check and semantic
 
 # Verify Traceability
 
-Available knowledge: [[software-craft/test-design#key-takeaways]], [[requirements/gherkin#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[software-craft/test-design#key-takeaways]], [[requirements/gherkin#key-takeaways]], [[software-craft/test-stubs#concepts]]. `in` artifacts: read all before starting work.
 
-1. Run `beehave check` and verify all violations resolved per [[software-craft/test-stubs#concepts]].
+1. Run `beehave check` and verify all violations resolved per [[software-craft/test-stubs#concepts]]. Optionally use `beehave status --json` for a summary view of the feature's development stage.
 2. Verify semantic depth per [[software-craft/test-design#concepts]]: for each Example that describes a user-facing command or API invocation, verify the corresponding test exercises the entry point described in the acceptance criterion (e.g., command handler, API endpoint), not just the domain logic in isolation. A test that calls domain methods directly when the AC describes a user-facing command is a semantic alignment gap: it has structural traceability but wrong semantic depth.