Merged
2 changes: 1 addition & 1 deletion .opencode/knowledge/requirements/gherkin.md
@@ -195,5 +195,5 @@ Test path conventions (`tests/features/<feature_slug>/`), the feature-test vs un
 - [[requirements/decomposition]]: splitting Rules with too many Examples
 - [[requirements/pre-mortem]]: finding hidden failure modes in rules
 - [[software-craft/test-design]]: property-based testing for invariant rules
-- [[software-craft/test-stubs]]: how beehave generates test stubs from feature files
+- [[software-craft/test-stubs]]: how beehave generates test stubs from feature files, development stage tracking via `beehave status`
 - [[software-craft/external-fixtures]]: real data fixtures for external adapter mocking
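The linked test-stubs note covers how beehave turns feature files into test stubs, which hinges on Example titles: each title becomes `test_<example_title_slug>`, and titles must be unique within a feature and 2–6 words long. A minimal sketch of those two rules follows. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to `_`) is an assumption for illustration, not pytest-beehave's implementation, and `title_to_test_name` and `check_titles` are hypothetical helpers.

```python
import re


def title_to_test_name(title: str) -> str:
    # Assumed slug rule: lowercase, collapse every run of non-alphanumeric
    # characters into "_", then trim leading/trailing underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"test_{slug}"


def check_titles(titles: list[str]) -> list[str]:
    # Report violations of the two title rules: 2-6 words per title,
    # and no two titles in one feature mapping to the same test name.
    problems: list[str] = []
    seen: dict[str, str] = {}
    for title in titles:
        word_count = len(title.split())
        if not 2 <= word_count <= 6:
            problems.append(f"{title!r}: {word_count} words (expected 2-6)")
        name = title_to_test_name(title)
        if name in seen:
            problems.append(f"{title!r}: collides with {seen[name]!r}")
        else:
            seen[name] = title
    return problems
```

Under this sketch, `title_to_test_name("VAT is applied at the correct rate")` yields `test_vat_is_applied_at_the_correct_rate`, matching the example used in the test-stubs note.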
2 changes: 1 addition & 1 deletion .opencode/knowledge/software-craft/source-stubs.md
@@ -13,7 +13,7 @@ last-updated: 2026-05-08
 - Source stubs contain the absolute minimum to compile and trace: Protocol signatures with `raise NotImplementedError` bodies, no docstrings, no type hints beyond the contract.
 - Package structure mirrors the module structure from technical design; the domain package depends on nothing.
 - Feature branches are created from the latest main.
-- Create artifacts in this order: branch, directories, port interfaces, Protocol stubs, run beehave generate to create test stubs, run beehave check.
+- Create artifacts in this order: branch, directories, port interfaces, Protocol stubs, run beehave generate to create test stubs, verify with `beehave check` and `beehave status` for development stage confirmation per [[software-craft/test-stubs#concepts]].
 
 ## Concepts
 
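The context lines of the source-stubs hunk describe a source stub as a Protocol signature whose body only raises `NotImplementedError`, with no docstrings. A minimal sketch, assuming a hypothetical `PriceFeed` port and a hypothetical `UnimplementedFeed` adapter (neither comes from this repository):

```python
from typing import Protocol


# Hypothetical port: the domain package depends only on this contract.
class PriceFeed(Protocol):
    def latest_price(self, pair: str) -> float:
        raise NotImplementedError  # stub body: signature only, no behaviour


# Hypothetical adapter that explicitly subclasses the port but adds nothing,
# so calling the method still reaches the stub body.
class UnimplementedFeed(PriceFeed):
    pass


def stub_is_unimplemented() -> bool:
    try:
        UnimplementedFeed().latest_price("BTC/USD")
    except NotImplementedError:
        return True
    return False
```

The stub compiles and can be referenced by test stubs and adapters, yet any premature call fails loudly rather than returning a silent default.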
89 changes: 44 additions & 45 deletions .opencode/knowledge/software-craft/test-stubs.md
@@ -1,74 +1,73 @@
 ---
 domain: software-craft
 tags: [test-stubs, traceability, pytest-beehave, scenario-outline, hypothesis]
-last-updated: 2026-05-19
+last-updated: 2026-05-20
 ---
 
 # Test Stubs
 
 ## Key Takeaways
 
-- Test stubs are auto-generated by `beehave generate <feature_id>` from the feature file; no manual stub creation is needed.
-- pytest-beehave uses title-based mapping: each Example title becomes a test function named `test_<example_title_slug>` (e.g., `test_VAT_is_applied_at_the_correct_rate`).
-- Structural traceability (every Example has a test, no orphan tests, placeholders present, literals present) is verified by `beehave check`, not manually.
-- Scenario Outline produces parameterized stubs with Hypothesis `@given` decorators (inferred strategies) and `@example` decorators (one per Examples table row). Plain Examples produce bare function stubs.
-- `beehave check` verifies literal values from steps appear in test function bodies — tests must use the exact quoted strings and numbers from the spec.
+- Test stubs are auto-generated by `beehave generate <feature_id>` from the feature file; no manual stub creation is needed. pytest-beehave uses title-based mapping: each Example title becomes a test function named `test_<example_title_slug>`.
+- `beehave check` verifies structural traceability (every Example has a test, no orphan tests, placeholders present, literals present). Scenario Outline produces parameterized stubs with Hypothesis `@given` decorators (inferred strategies) and `@example` decorators (one per Examples table row).
+- Literals from Given/When/Then steps (quoted strings, bare numbers) must appear verbatim in test function bodies — `beehave check` enforces this. Stubs (functions with `...` body) are exempt from literal and placeholder checks.
+- `beehave status` reports development stage per feature across 6 stages (ok, broken, needs scenarios, needs tests, needs bodies, needs fixes) with tree or `--json` output. Use `beehave status --json` for project-wide overview and `beehave status` for per-feature tree view including inline violation codes.
+- `beehave list` lists feature slugs and titles for features with Examples; `list -v` adds path, scenario count, stub/impl counts. `beehave clean <feature> --force` removes unmapped test functions from feature-paired test directories.
+- Feature file stem MUST match the Feature title slug (e.g., Feature "CLI Entrypoint" → `cli_entrypoint.feature`). Title violations anywhere in the project block `beehave generate` (pre-flight validates all titles project-wide). `beehave check <single-feature>` skips global title validation — only `beehave check` (no argument) runs `validate_all_titles`.
 
 ## Concepts
 
-**Title-Based Mapping**. pytest-beehave maps each Example or Scenario Outline in the feature file to a test function by title. The function name is derived from the Example or Scenario Outline title as a slug (e.g., Example: "VAT is applied at the correct rate" → `test_vat_is_applied_at_the_correct_rate`). This replaces the previous `@id` tag system. Titles must be unique within a feature file and 2–6 words per [[requirements/gherkin#concepts]].
+**Title-Based Mapping and Auto-Generated Stubs**. pytest-beehave maps each Example or Scenario Outline in the feature file to a test function by title. The function name is derived from the Example or Scenario Outline title as a slug (e.g., Example: "VAT is applied at the correct rate" → `test_vat_is_applied_at_the_correct_rate`). Titles must be unique within a feature file and 2–6 words per [[requirements/gherkin#concepts]]. When `beehave generate <feature_id>` runs, pytest-beehave reads the feature files and creates test stubs automatically. Each stub has an `...` (Ellipsis) body. During pytest collection, pytest-beehave auto-skips any test function with an `...` body — no `@pytest.mark.skip` decorator is needed. The SE replaces the `...` body with the test implementation during the RED phase.
 
-**Auto-Generated Stubs**. When `beehave generate <feature_id>` runs, pytest-beehave reads the feature files and creates test stubs automatically. Each stub has an `...` (Ellipsis) body. During pytest collection, pytest-beehave auto-skips any test function with an `...` body — no `@pytest.mark.skip` decorator is needed. The SE replaces the `...` body with the test implementation during the RED phase.
+**Traceability Verification and Scenario Outline Stubs**. `beehave check` enforces structural traceability with 6 violation types: `unmapped-scenario` (Example with no test), `unmapped-test` (test with no Example), `misplaced-test` (test in wrong file), `missing-placeholder` (placeholder not in test body), `missing-literal` (literal not in test body), `example-mismatch` (Examples table row lacking `@example()` decorator). Scenario Outline stubs include `@given(placeholder_name=strategy)` decorators with inferred Hypothesis strategies plus `@example()` for each Examples table row. Strategy is inferred from column values: all integers → `st.integers()`, all floats → `st.floats()`, all booleans → `st.booleans()`, otherwise `st.text()`. Override by defining a strategy variable in the test file.
 
-**Scenario Outline Stubs**. When a feature uses `Scenario Outline:` with `<placeholder>` syntax, the generated stub includes:
-- `@given(placeholder_name=st.text())` (or inferred strategy) for each placeholder
-- `@example(col1="val1", col2="val2")` for each row in the Examples table
-- Function parameters matching placeholder names
+**Literal and Placeholder Verification**. `beehave check` extracts quoted strings (`"value"`) and bare numbers (`42`, `-3`) from Given/When/Then steps and verifies they appear in the test function body. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), every literal and placeholder must carry domain meaning in the test — identifiers identify entities, boundaries bound, configurations configure. Never satisfy traceability with noise: assigning to `_`, stuffing strings into assert messages, or helper functions whose sole purpose is consuming a literal. Placeholder names become Python function parameters and must be valid Python identifiers (not keywords, not builtins like `sum`, `list`).
 
-Example stub generated from a Scenario Outline:
+**Development Stage Tracking with beehave status**. `beehave status` computes a development stage for each feature file. JSON output (`--json`) provides per-feature stages, per-scenario status with violation types, summary counts, `unmapped_directories`, and `collisions`. Tree output shows Rule → Scenario hierarchy with inline violation codes. `--include-unmapped` finds orphan test directories. A feature is `ok` even with mixed stubs and implementations — `needs bodies` fires only when ALL scenarios are stubs.
 
-```python
-from hypothesis import example, given, strategies as st
+**Project Overview with beehave list and Cleanup with beehave clean**. `beehave list` shows feature slugs and titles for features with at least one Example. `beehave list -v` adds: path, scenario count (total + top-level vs rule breakdown), stub/impl counts (e.g., "stubs: 1/2 (1 implemented)"). `beehave clean <feature> [--force]` removes unmapped test functions from that feature's test directory and reports what was removed.
 
-@given(qty=st.integers())
-@example(qty=1)
-@example(qty=5)
-@example(qty=100)
-def test_quantity_rejected_when_negative(qty):
-    ...
-```
+**Feature File Stem and Title Consistency**. The feature filename stem MUST match the Feature title slug. Feature "CLI Entrypoint" (slug `cli_entrypoint`) → file `cli_entrypoint.feature` → test directory `tests/features/cli_entrypoint/`. Mismatch causes unmapped directories and test mapping failures. `beehave generate` runs a project-wide `validate_all_titles` pre-flight that blocks all generation if ANY feature has bad titles. `beehave check <feature>` skips global title validation — only `beehave check` with no argument runs `validate_all_titles`. When a Gherkin parse error exists, `validate_all_titles` raises an exception rather than handling gracefully.
 
-The Hypothesis strategy is inferred from Examples table column values: all integers → `st.integers()`, all floats → `st.floats()`, all booleans → `st.booleans()`, otherwise `st.text()`. To use a custom strategy, define a variable with the placeholder name in the test file before the function.
+## Content
 
-**Literal Verification**. `beehave check` extracts quoted strings (`"value"`) and bare numbers (`42`, `-3`) from Given/When/Then steps and verifies they appear in the test function body. This means the test implementation MUST use the exact literal values from the spec — no paraphrasing. For example, if the step says `Then the output contains "temple8"`, the test body must contain the string `"temple8"`.
+### Development Stages (beehave status)
 
-**Strategy Inference from Examples Table**. When Scenario Outline has an Examples table, beehave infers Hypothesis strategies per [[requirements/gherkin#concepts]]:
+| Stage | Condition |
+|-------|-----------|
+| `ok` | All Examples have implemented tests with no violations |
+| `broken` | Feature file has Gherkin parse errors |
+| `needs scenarios` | Has Rules but no Examples |
+| `needs tests` | Has Examples but some lack test functions |
+| `needs bodies` | All Examples have test functions but all bodies are `...` stubs |
+| `needs fixes` | Tests exist with bodies but have violations (missing literals, missing placeholders, example mismatch, etc.) |
 
-| Column Values | Inferred Strategy |
-|---------------|-------------------|
-| All integers (e.g. `1`, `5`, `100`) | `st.integers()` |
-| All floats (e.g. `1.0`, `3.14`) | `st.floats()` |
-| All booleans (`true`, `false`) | `st.booleans()` |
-| Mixed or text (e.g. `Widget`, `Gadget`) | `st.text()` |
+### beehave status --json Structure
 
-Override by defining a variable with the placeholder name as a Hypothesis strategy in the test file.
+| Key | Content |
+|-----|---------|
+| `features` | Per-feature: `slug`, `stage`, `scenarios[]` with `title`, `status`, `violations[]` |
+| `summary` | Counts per stage: `ok`, `broken`, `needs_scenarios`, `needs_tests`, `needs_bodies`, `needs_fixes` |
+| `unmapped_directories` | Test directories with no matching feature file |
+| `collisions` | Test function names appearing in multiple feature test dirs |
 
-**beehave check**. The `beehave check` command (from the `beehave` library, not to be confused with the `behave` BDD framework) verifies structural traceability. It enforces:
+### beehave list -v Output
 
-| Violation Type | What It Detects |
-|----------------|-----------------|
-| `unmapped-scenario` | Example/Scenario Outline in .feature with no test function (note: "scenario" in the beehave error name refers to Examples) |
-| `unmapped-test` | Test function with no matching Example/Scenario Outline |
-| `misplaced-test` | Test in wrong file (rule path mismatch) |
-| `missing-placeholder` | Placeholder `<name>` from step not used in test body |
-| `missing-literal` | Literal value from step not found in test body |
-| `example-mismatch` | Examples table row has no matching `@example()` decorator |
+| Field | Description |
+|-------|-------------|
+| `slug` | Feature title slugified |
+| `title` | Feature title |
+| `path` | Path to the .feature file |
+| `scenarios` | Total scenario count |
+| `top_level_scenarios` | Scenarios not under a Rule |
+| `rules` | Count of Rule blocks |
+| `rule_scenarios` | Scenarios under Rules |
+| `stubs` | Stub count (including `...` body count) |
+| `implemented` | Implemented scenario count |
 
-Stubs (functions with `...` body) are exempt from placeholder and literal checks — these only apply once the `...` is replaced with implementation.
+### Test File Layout
 
-**Test File Layout**. pytest-beehave organizes tests as: Feature title → directory, Rule → test file, Example/Scenario Outline → function name. Test files are placed in `tests/features/<feature_slug>/<rule_slug>_test.py`.
-
-**Spec Value Fidelity in Tests**. Every literal and placeholder from the spec must appear in the test body — `beehave check` verifies this. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), the test must use each value in a way that reflects its domain purpose. If `"BTC/USD"` represents a trading pair, use it to construct or identify one. If `42` is a boundary value, use it at the boundary. Never satisfy traceability with noise: assigning to `_`, stuffing strings into assert messages, or writing helper functions whose sole purpose is consuming a literal. These mask the real issue — the value's domain purpose is not reflected in the test.
+pytest-beehave organizes tests as: Feature title → directory, Rule → test file, Example/Scenario Outline → function name. Test files are placed in `tests/features/<feature_slug>/<rule_slug>_test.py`.
 
 ## Related
 
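The test-stubs hunk states one strategy-inference rule for Scenario Outline placeholders: all integers → `st.integers()`, all floats → `st.floats()`, all booleans → `st.booleans()`, otherwise `st.text()`. The sketch applies that rule to a single Examples-table column and returns the strategy expression as a string, so it runs without Hypothesis installed. How a cell is classified (the regular expressions, lowercase `true`/`false` only, and a column mixing `1` and `1.5` falling through to `st.text()`) is an assumption, not documented beehave behaviour; `infer_strategy` is a hypothetical name.

```python
import re

# Assumed classifiers for Examples-table cell values.
_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+")
_BOOLS = {"true", "false"}


def infer_strategy(column_values: list[str]) -> str:
    # Each check must hold for every cell in the column; otherwise fall through.
    cells = [v.strip() for v in column_values]
    if not cells:
        return "st.text()"
    if all(_INT.fullmatch(v) for v in cells):
        return "st.integers()"
    if all(_FLOAT.fullmatch(v) for v in cells):
        return "st.floats()"
    if all(v in _BOOLS for v in cells):
        return "st.booleans()"
    return "st.text()"
```

For a `qty` column of `1`, `5`, `100` this yields `st.integers()`, consistent with the `@given(qty=st.integers())` stub shown in the removed lines of the hunk.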
4 changes: 2 additions & 2 deletions .opencode/skills/accept-feature/SKILL.md
@@ -5,12 +5,12 @@ description: "Validate business behavior against BDD examples from the end user'
 
 # Accept Feature
 
-Available knowledge: [[requirements/gherkin#key-takeaways]], [[software-craft/test-design#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[requirements/gherkin#key-takeaways]], [[software-craft/test-design#key-takeaways]], [[software-craft/test-stubs#concepts]]. `in` artifacts: read all before starting work.
 
 1. Run `task test-build` to verify all tests pass with coverage.
 2. Verify all BDD examples pass from the end user's perspective, not the test harness, per [[software-craft/test-design#key-takeaways]].
 3. IF an example passes in the test harness but fails from the user's perspective → flag it as a semantic alignment gap per [[software-craft/test-design#concepts]].
-4. Verify structural traceability via `beehave check`: every Example in the feature file must have exactly one corresponding test function, and every test function must trace back to an Example. pytest-beehave enforces this via title-based mapping. Any violations reported by `beehave check` mean the feature is not done.
+4. Verify structural traceability: run `beehave status --json` for stage overview, then `beehave check` for detailed violations per [[software-craft/test-stubs#concepts]]. Every Example in the feature file must have exactly one corresponding test function, and every test function must trace back to an Example. The feature stage must be `ok` and `beehave check` must produce no output.
 5. Verify semantic depth per [[software-craft/test-design#concepts]].
 6. Verify quality attributes are met.
 7. Verify definition of done criteria are satisfied.
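The new step 4 combines two signals: the feature's stage in `beehave status --json` must be `ok`, and `beehave check` must print nothing. A minimal sketch of that decision, assuming the JSON shape summarized in the test-stubs knowledge file (a top-level `features` list whose entries carry `slug` and `stage`). `feature_is_accepted` is a hypothetical helper, and the sample payload is invented for illustration rather than captured from real `beehave` output.

```python
import json


def feature_is_accepted(status_json: str, check_output: str, slug: str) -> bool:
    # status_json: text printed by `beehave status --json` (shape assumed).
    # check_output: text printed by `beehave check`; any output means violations.
    report = json.loads(status_json)
    stages = {f["slug"]: f["stage"] for f in report.get("features", [])}
    return stages.get(slug) == "ok" and check_output.strip() == ""


# Invented sample payload following the documented keys.
SAMPLE_STATUS = json.dumps({
    "features": [
        {"slug": "cli_entrypoint", "stage": "ok", "scenarios": []},
        {"slug": "checkout", "stage": "needs bodies", "scenarios": []},
    ],
    "summary": {"ok": 1, "needs_bodies": 1},
})
```

Requiring both signals matters because the stage gives a per-feature summary while `beehave check` output is the authoritative list of violations; a missing slug is treated as not accepted.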
2 changes: 1 addition & 1 deletion .opencode/skills/review-gate/SKILL.md
@@ -5,7 +5,7 @@ description: "Two-tier review with fail-fast: design -> structure"
 
 # Review Gate
 
-Available knowledge: [[software-craft/code-review]], [[software-craft/test-design]], [[software-craft/smell-catalogue]], [[architecture/reconciliation#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[software-craft/code-review]], [[software-craft/test-design]], [[software-craft/test-stubs]], [[software-craft/smell-catalogue]], [[architecture/reconciliation#key-takeaways]]. `in` artifacts: read all before starting work.
 
 **Fail-fast rule**: Stop at first failure in any tier. Do NOT proceed to next tier if current tier fails.
 
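The review gate now loads the test-stubs knowledge, which treats a test function whose body is only `...` as a stub: auto-skipped during collection and exempt from literal and placeholder checks. A reviewer-side sketch of that rule using the standard `ast` module; `find_stub_tests` is a hypothetical helper, not part of pytest-beehave, and it only inspects top-level `test_*` functions.

```python
import ast


def _is_ellipsis_body(body: list[ast.stmt]) -> bool:
    # A stub body is exactly one expression statement whose value is `...`.
    return (
        len(body) == 1
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Constant)
        and body[0].value.value is Ellipsis
    )


def find_stub_tests(source: str) -> list[str]:
    # Parse without importing, so decorators such as @given need not resolve.
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
        and _is_ellipsis_body(node.body)
    ]


SAMPLE_TEST_FILE = '''
from hypothesis import example, given, strategies as st


@given(qty=st.integers())
@example(qty=1)
def test_quantity_rejected_when_negative(qty):
    ...


def test_vat_is_applied_at_the_correct_rate():
    assert round(100 * 1.2) == 120
'''
```

Under this sketch, a function with a docstring followed by `...` is not counted as a stub, since its body has two statements.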
22 changes: 7 additions & 15 deletions .opencode/skills/select-feature/SKILL.md
@@ -5,24 +5,16 @@ description: "Select the next feature to develop by detecting delivery status fr
 
 # Select Feature
 
-Available knowledge: [[requirements/wsjf#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[requirements/wsjf#key-takeaways]], [[software-craft/test-stubs#key-takeaways]]. `in` artifacts: read all before starting work.
 
 1. List available feature files in `docs/features/`.
 2. IF no feature files exist → exit via `no-features`; features need discovery first.
-3. For each feature, determine delivery status — do NOT open or read individual feature or test files:
-
-   a. Check if the feature file has Example blocks (any line starting with `Example:`).
-      If none, the feature has not been broken down into BDD examples yet → feature is incomplete.
-
-   b. Run `beehave check <slug>` to verify structural traceability:
-      - Any output (errors) → some Examples lack matching test functions or there are orphan tests → feature is incomplete.
-      - No output (clean) → all Examples have matching test functions.
-
-   c. If beehave check is clean, run `task test-fast` scoped to that feature's test directory.
-      - Any failures → feature is incomplete.
-      - All pass → feature is delivered (skip).
-
-   d. If the test directory does not exist, beehave check will report errors → feature is incomplete.
+3. Run `beehave status --json` for project-wide overview per [[software-craft/test-stubs#concepts]]. For each feature, determine delivery status from its stage:
+   - Any stage other than `ok` → feature is incomplete.
+   - Stage `ok` → all Examples have implemented tests with no structural violations, but functional correctness must still be verified.
+   For features at `ok` stage, run `task test-fast` scoped to that feature's test directory:
+   - Any failures → feature is incomplete.
+   - All pass → feature is delivered (skip).
 
 4. IF every feature is delivered → exit via `no-features`.
 5. Collect all incomplete features. Derive dependency count for each from `domain_spec.md` context map:
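The rewritten step 3 reduces to a pure function over the `beehave status --json` payload: any feature not at stage `ok` is incomplete immediately, and only `ok` features still need a scoped `task test-fast` run. The payload shape (`features[]` with `slug` and `stage`) follows the test-stubs knowledge file; `triage_features` is a hypothetical name, the sample data is invented, and actually running the tests is left to the caller.

```python
import json


def triage_features(status_json: str) -> tuple[list[str], list[str]]:
    # Returns (incomplete, needs_test_run) as lists of feature slugs.
    report = json.loads(status_json)
    incomplete: list[str] = []
    needs_test_run: list[str] = []
    for feature in report.get("features", []):
        if feature.get("stage") == "ok":
            # Structurally clean; functional correctness is still unverified.
            needs_test_run.append(feature["slug"])
        else:
            incomplete.append(feature["slug"])
    return incomplete, needs_test_run


# Invented sample payload following the documented keys.
SAMPLE_STATUS = json.dumps({
    "features": [
        {"slug": "cli_entrypoint", "stage": "ok"},
        {"slug": "checkout", "stage": "broken"},
        {"slug": "pricing", "stage": "needs tests"},
    ]
})
```

Features from `needs_test_run` whose scoped tests fail join the incomplete list before the WSJF ranking in step 5.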
4 changes: 2 additions & 2 deletions .opencode/skills/verify-traceability/SKILL.md
@@ -5,7 +5,7 @@ description: "Verify example-to-test traceability via beehave check and semantic
 
 # Verify Traceability
 
-Available knowledge: [[software-craft/test-design#key-takeaways]], [[requirements/gherkin#key-takeaways]]. `in` artifacts: read all before starting work.
+Available knowledge: [[software-craft/test-design#key-takeaways]], [[requirements/gherkin#key-takeaways]], [[software-craft/test-stubs#concepts]]. `in` artifacts: read all before starting work.
 
-1. Run `beehave check` and verify all violations resolved per [[software-craft/test-stubs#concepts]].
+1. Run `beehave check` and verify all violations resolved per [[software-craft/test-stubs#concepts]]. Optionally use `beehave status --json` for a summary view of the feature's development stage.
 2. Verify semantic depth per [[software-craft/test-design#concepts]]: for each Example that describes a user-facing command or API invocation, verify the corresponding test exercises the entry point described in the acceptance criterion (e.g., command handler, API endpoint), not just the domain logic in isolation. A test that calls domain methods directly when the AC describes a user-facing command is a semantic alignment gap: it has structural traceability but wrong semantic depth.
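Step 1 relies on `beehave check`. To make three of its violation types concrete, the sketch shows how `unmapped-scenario` and `unmapped-test` reduce to set differences between expected and collected test names, plus a naive `missing-literal` check over quoted strings and bare numbers in a step. The slug rule, the regular expressions, and the helper names (`expected_test_name`, `mapping_violations`, `missing_literals`) are assumptions for illustration; this is not how `beehave` is implemented.

```python
import re

# Assumed token rules for step literals.
_QUOTED = re.compile(r'"([^"]*)"')
_NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])")


def expected_test_name(title: str) -> str:
    # Assumed slug rule: lowercase, non-alphanumeric runs become "_".
    return "test_" + re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def mapping_violations(example_titles: list[str], test_names: list[str]) -> dict[str, list[str]]:
    expected = {expected_test_name(t) for t in example_titles}
    found = set(test_names)
    return {
        # Examples with no test function (reported by expected name).
        "unmapped-scenario": sorted(expected - found),
        # Test functions that trace back to no Example.
        "unmapped-test": sorted(found - expected),
    }


def missing_literals(step: str, test_body: str) -> list[str]:
    quoted = _QUOTED.findall(step)
    # Remove quoted text first so digits inside strings are not counted twice.
    numbers = _NUMBER.findall(_QUOTED.sub(" ", step))
    # Naive substring match; a real checker would inspect tokens.
    return [lit for lit in quoted + numbers if lit not in test_body]
```

Passing this structural check says nothing about semantic depth, which is why step 2 still inspects what each test actually exercises.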