Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 11 additions & 7 deletions .opencode/knowledge/requirements/gherkin.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
domain: requirements
tags: [gherkin, acceptance-criteria, specification, examples, bdd, scenario-outline, hypothesis]
last-updated: 2026-05-14
last-updated: 2026-05-19
---

# Gherkin Specification Format
Expand All @@ -10,10 +10,9 @@ last-updated: 2026-05-14

- Write declarative Examples that describe behaviour, not UI steps; use `Example:` not `Scenario:` for single-case examples (BDD, North, 2006).
- Use `Scenario Outline:` with `<placeholder>` syntax and an `Examples:` table when the same behavioural outcome must be verified across 3+ input/output value combinations.
- Feature, Rule, and Example/Scenario Outline titles must be 2–6 words and unique within the feature file — pytest-beehave uses title-based mapping (title → `test_<slug>` function name) for traceability.
- `Then` must be a single, observable, measurable outcome; no "and" combining multiple behaviours in one `Then`.
- Quoted strings (`"value"`) and bare numbers (`42`, `-3`) in steps are extracted by beehave as literals and verified present in test function bodies via `beehave check`.
- `<placeholder>` names in steps become Python function parameters and Hypothesis `@given` strategies in generated stubs. Names must be valid Python identifiers (not keywords, not builtins).
- Feature, Rule, and Example/Scenario Outline titles must be 2–6 words and unique within the feature file — pytest-beehave uses title-based mapping (title → `test_<slug>` function name) for traceability. Titles must contain ONLY Unicode letters, digits, and spaces — no hyphens, periods, underscores, or special characters (they break slug generation).
- Quoted strings (`"value"`) and bare numbers (`42`, `-3`) in steps are extracted by beehave as literals and verified present in test function bodies via `beehave check`. Every literal carries domain meaning in the test — use literals that identify entities, boundaries, configurations, or error cases the reader needs to see.
- `<placeholder>` names in steps become Python function parameters in generated test stubs. Names must be valid Python identifiers (not keywords, not builtins). Every column in an Examples table must be referenced in the scenario steps — columns not referenced in steps are test data, not specification. If the expected outcome varies independently across examples, an output column is appropriate. If it is a deterministic function of inputs, express the relationship in the `Then` step (e.g., `Then the result is the minimum of <a> and <b>`) rather than adding a computed column.
- Bug Examples use `@bug` and require both a specific feature test and a Hypothesis property test.
- After criteria commit, Examples are frozen; changes require `@deprecated` on the old Example and a new Example with a new unique title.
- Two Examples with the same `Then` outcome but different input values test the same behaviour; partition by behaviour outcome, not by input value (Wynne, 2015; Adzic, 2011).
Expand All @@ -24,14 +23,16 @@ last-updated: 2026-05-14

**Example vs Scenario Outline**: Use `Example:` for single-case examples. Use `Scenario Outline:` when the same behavioural outcome must be verified across 3+ different input/output value combinations. Scenario Outline uses `<placeholder>` syntax in Given/When/Then steps and an `Examples:` table with concrete data rows. This avoids repeating identical step structures with different values.

**Title Length Constraint**: Feature, Rule, and Example/Scenario Outline titles must be 2–6 words. Titles become `test_<slug>` function names — too short produces ambiguous identifiers (e.g. `test_stuff`), too long produces unwieldy ones (e.g. `test_when_the_user_submits_a_form_with_invalid_email_the_system_displays_an_error_message`). Count words by splitting on whitespace.
**Title Length and Character Constraint**: Feature, Rule, and Example/Scenario Outline titles must be 2–6 words and unique within the feature file. Titles must contain ONLY Unicode letters, digits, and spaces — no hyphens (`-`), periods (`.`), underscores (`_`), or other special characters. The title is slugified to produce the test function name (`test_<slug>`), and special characters either break slug generation or produce ambiguous identifiers. Too short produces ambiguous identifiers (e.g. `test_stuff`), too long produces unwieldy ones. Count words by splitting on whitespace.

**Placeholder Syntax**: `<variable_name>` in Given/When/Then steps. Beehave extracts these and generates Hypothesis `@given(var_name=strategy)` decorators in test stubs. Placeholder names must be valid Python identifiers, not keywords (`for`, `class`), and not builtins (`list`, `str`). When used with Scenario Outline, the Examples table column headers must match the placeholder names.

**Literal Extraction**: Quoted strings (`"value"`, `'value'`) and bare numbers (`42`, `-3`, `3.14`) in Given/When/Then steps are extracted by beehave as literals. `beehave check` verifies these literals appear in the test function body. This provides structural traceability beyond title mapping — tests must use the exact literal values from the spec.
**Literal Extraction**: Quoted strings (`"value"`, `'value'`) and bare numbers (`42`, `-3`, `3.14`) in Given/When/Then steps are extracted by beehave as literals. `beehave check` verifies these literals appear in the test function body, providing structural traceability. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), every literal in a step must carry domain meaning — it should identify an entity, boundary, configuration, or error case that matters to the reader.

**Hypothesis Integration**: Scenario Outline generates `@given` decorated stubs with inferred Hypothesis strategies (`st.integers()`, `st.floats()`, `st.booleans()`, `st.text()`) plus `@example` decorators for each Examples table row. Plain Examples generate bare function stubs. For tests hitting external services, use `@settings(max_examples=N)` to control load. For unit/domain tests, Hypothesis defaults are fine.

**Meaningful Examples Tables**. Every column in an Examples table must be referenced in at least one step — unreferenced columns are test data, not specification. Output columns (e.g., `<expected>`) are appropriate when the expected value varies independently across rows (e.g., different tax rates per country, different error codes per input). When the expected outcome is a deterministic function of inputs, do not add an output column; express the relationship in the `Then` step. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), every value in the table exists to be used meaningfully in the test.

**Example Format and Title-Based Mapping**: Each Example uses the `Example:` keyword (not `Scenario:`), includes `Given/When/Then` in plain English. pytest-beehave maps Examples to test functions by title: the function name is `test_<example_title_slug>`. Titles must be unique within the feature file. Descriptive titles serve as the traceability link between feature specification and test code — no `@id` tags are needed.

**Single Observable Outcome per Then**: `Then` must be a single, observable, measurable outcome. No "and" combining multiple behaviours in one `Then`. Split into separate Examples instead. Observable means observable by the end user, not by a test harness.
Expand All @@ -58,6 +59,7 @@ last-updated: 2026-05-14

- Feature, Rule, and Example/Scenario Outline titles must be 2–6 words
- Titles must be unique within the feature file
- Titles must contain ONLY Unicode letters, digits, and spaces — no hyphens, periods, underscores, or special characters
- Title becomes the test function name: `test_<example_title_slug>`
- Titles should be descriptive enough to serve as the test identifier
- No `@id` tags — the title is the traceability link
Expand Down Expand Up @@ -175,6 +177,8 @@ Implement both:
- Using `Scenario:` keyword: use `Example:` for single cases or `Scenario Outline:` for parameterized cases
- Placeholder names that are Python keywords or builtins: beehave rejects these at parse time
- Paraphrasing literal values in test code instead of using exact values from the spec: fails `beehave check`
- Titles containing hyphens, periods, underscores, or special characters: only Unicode letters, digits, and spaces are allowed
- Adding literals to steps that lack domain meaning: every literal must identify an entity, boundary, configuration, or error case that justifies its presence in both the specification and the test

### Feature File Path Convention

Expand Down
107 changes: 107 additions & 0 deletions .opencode/knowledge/requirements/property-patterns.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
---
domain: requirements
tags: [property-based-testing, examples, scenario-outline, test-design, bdd, hypothesis]
last-updated: 2026-05-19
---

# Property Patterns for BDD Example Selection

## Key Takeaways

- When writing BDD Examples, use these seven property patterns (Wlaschin, 2014) to decide whether an Example should be a simple `Example:` or a `Scenario Outline:` with multiple input combinations.
- **Simple `Example:`** is appropriate when the behaviour is a single observable outcome with fixed inputs — no interesting property to generalise.
- **`Scenario Outline:`** is appropriate when the same behavioural outcome holds across multiple input/output combinations — the property pattern reveals which combinations matter.
- The seven patterns also surface missing Examples: if a property pattern applies but has no corresponding Example, the specification is incomplete.

## Concepts

**Seven Property Patterns** (Wlaschin, 2014). When choosing what to verify in a specification, these patterns help discover what properties (invariants, relationships) the system should satisfy:

| Pattern | Core Idea | When to use Scenario Outline |
|---------|-----------|------------------------------|
| Different paths, same destination | Two operation sequences produce the same result | When multiple paths exist to the same outcome (e.g., different orderings, different constructors) |
| There and back again | An operation and its inverse return to the starting state | When serialise/deserialise, encode/decode, add/remove pairs exist |
| Some things never change | An invariant is preserved after a transformation | When a transform should preserve size, membership, ordering, or other invariants |
| The more things change, the more they stay the same | Applying an operation twice is the same as applying it once (idempotence) | When operations should be idempotent (e.g., deduplicate, round, normalise) |
| Solve a smaller problem first | A property true for a small case implies truth for a composed case (structural induction) | When recursive or composable structures are involved (lists, trees, nested objects) |
| Hard to prove, easy to verify | Finding the answer is complex, but checking it is simple | When output can be verified by a simpler check (e.g., sort result is a permutation, parse result concatenates to original) |
| The test oracle | An alternate implementation exists to verify results | When a brute-force or simplified reference implementation can validate the optimised version |

**Using Patterns to Choose Example vs Scenario Outline**: During feature example creation (write-bdd-features skill), apply these patterns to each Rule:

1. For each Rule, ask: "Does any of the seven patterns apply to this behaviour?"
2. If **no pattern applies** — the behaviour is a single discrete outcome with fixed inputs — write a simple `Example:`.
3. If a pattern applies — the behaviour holds across a range of inputs — write a `Scenario Outline:` with an `Examples:` table covering the significant input combinations surfaced by the pattern.
4. If a pattern reveals an edge case not covered by existing Examples — add the missing Example.

**Pattern-to-Example Decision Tree**:

```
Does the Rule describe an invariant that holds across inputs?
├─ Yes → Scenario Outline with inputs that exercise the invariant
│ + Hypothesis property test per [[software-craft/test-design#concepts]]
└─ No → Does the Rule have "easy to verify" checkable output?
├─ Yes → Can multiple inputs produce different valid outputs?
│ ├─ Yes → Scenario Outline with representative input/output pairs
│ └─ No → Simple Example with the key input
└─ No → Simple Example (single observable outcome)
```

**Pre-mortem Integration**: During the behavior-level pre-mortem per [[requirements/pre-mortem#concepts]], apply property patterns adversarially: "Given this pattern applies to this Rule, what inputs would break it?" Surface failure modes as additional Examples.

## Content

### Pattern Application Examples

**Different paths, same destination**: A sort function produces the same result regardless of input order. Use Scenario Outline with different input orderings asserting identical sorted output. This also applies to commutative operations: `a + b == b + a`.

**There and back again**: JSON serialisation round-trips: `decode(encode(obj)) == obj`. Use Scenario Outline with different object shapes. HTTP encode/decode, compression/decompress, and format conversions all fit this pattern.

**Some things never change**: A `map` operation preserves list length. A `sort` preserves the multiset of elements. Use Scenario Outline with different input sizes and element values, asserting the invariant holds.

**Idempotence**: Calling `distinct()` twice produces the same result as calling it once. Use Scenario Outline with different input sets, some already distinct, some with duplicates. REST PUT operations are another common case.

**Structural induction**: If a property holds for a base case (empty list) and for appending one element, it holds for all lists. Use Scenario Outline with list sizes 0, 1, 2, N to cover induction steps.

**Hard to prove, easy to verify**: Finding a prime factorisation is hard, but multiplying the factors back is trivial. Tokenising a string is hard, but concatenating tokens should equal the original. Use Scenario Outline with different input strings or numbers, asserting the verification check.

**Test oracle**: A fast sorting algorithm can be verified against a naive bubble sort. A parallel computation can be verified against a sequential version. Use Scenario Outline where each row exercises a different input against both implementations.

### Integration with BDD Workflow

When the PO (or SE) writes Examples during `write-bdd-features`:

1. Write the Rule's declarative behaviour first (Given/When/Then).
2. Check each of the seven patterns against the Rule.
3. For each matching pattern, determine the input combinations that exercise the property.
4. If 1-2 combinations → simple `Example:` per combination.
5. If 3+ combinations with the same step structure → `Scenario Outline:` with `Examples:` table.
6. For invariant/structural Rules → also generate a Hypothesis property test per [[software-craft/test-design#concepts]].

### Hypothesis Property Tests from Patterns

Each invariant/structural Rule should produce both BDD Examples AND a Hypothesis property test. The property pattern guides the Hypothesis strategy:

| Pattern | Hypothesis Strategy |
|---------|-------------------|
| Different paths, same destination | `@given(inputs, order=strategies.permutations)` |
| There and back again | `@given(arbitrary_input)` then round-trip assert |
| Some things never change | `@given(transform_input)` then assert invariant |
| Idempotence | `@given(input)` then `assert f(f(x)) == f(x)` |
| Structural induction | `@given(recursive_strategy)` with base + step |
| Hard to prove, easy to verify | `@given(input)` then verify output with simple check |
| Test oracle | `@given(input)` then `assert fast(input) == oracle(input)` |

## Related

- [[requirements/gherkin]]
- [[requirements/pre-mortem]]
- [[software-craft/test-design]]
- [[software-craft/tdd]]

## Related

- [[software-craft/test-design]]
- [[software-craft/tdd]]
- [[requirements/gherkin]]
- [[requirements/pre-mortem]]
3 changes: 3 additions & 0 deletions .opencode/knowledge/software-craft/test-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ last-updated: 2026-05-14
- Test coupling exists on a spectrum: feature tests (most resilient) > unit contract tests > property-based tests > white-box tests (most brittle, avoid).
- One observable behaviour per test: each test should fail for exactly one reason and pass for exactly one reason.
- Hard-coded values are acceptable when the test only requires that value; parameterising prematurely couples the test to assumptions about future needs.
- Spec Value Fidelity: every value in the specification carries domain intent; the test must use it in a way that reflects that intent — no noise patterns to satisfy traceability.
- Property tests: all invariant/structural rules, not just @bug Examples. Examples alone cannot prove an invariant (MacIver, 2016).

## Concepts
Expand All @@ -27,6 +28,8 @@ last-updated: 2026-05-14

**Semantic Depth**. A test that exists for an Example but exercises domain logic directly instead of through the entry point described in the acceptance criterion has correct structural traceability but wrong semantic depth. Every feature test must exercise the entry point the AC describes: if the AC specifies a command-line invocation, the test must invoke the command handler; if the AC specifies an API call, the test must call the API endpoint. Structural traceability (every Example has a test function) without semantic depth (every test exercises the right entry point) creates a false sense of coverage.

**Spec Value Fidelity**. Every literal, placeholder, and Examples table column in a specification exists because the PO judged it domain-meaningful — an entity identifier, a boundary value, a configuration, a concrete expected outcome. The test must use each value in a way that reflects that domain purpose. Noise patterns that satisfy structural traceability without domain meaning — assigning to `_`, stuffing strings into assert messages, helper functions whose sole purpose is consuming a literal — violate this fidelity. If a spec value does not fit naturally in the test, the mismatch signals a spec clarity issue, not a test workaround opportunity.

**Invariant Property Tests**. Structural (invariant) rules describe properties that must hold across all inputs, not specific behaviours. Examples alone cannot prove an invariant — they only confirm it holds for the selected cases (MacIver, 2016). When a Rule asserts an invariant (e.g., "total must equal sum of parts," "output must be sorted," "balance must never go negative"), the specification pre-mortem and behavior pre-mortem surface candidate counterexamples. These counterexamples become assertions in a Hypothesis property test (`tests/unit/`) that verifies the invariant across a generated range of inputs, catching failure modes that no finite set of hand-picked Examples could have found.

## Content
Expand Down
4 changes: 3 additions & 1 deletion .opencode/knowledge/software-craft/test-stubs.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
domain: software-craft
tags: [test-stubs, traceability, pytest-beehave, scenario-outline, hypothesis]
last-updated: 2026-05-14
last-updated: 2026-05-19
---

# Test Stubs
Expand Down Expand Up @@ -68,6 +68,8 @@ Stubs (functions with `...` body) are exempt from placeholder and literal checks

**Test File Layout**. pytest-beehave organizes tests as: Feature title → directory, Rule → test file, Example/Scenario Outline → function name. Test files are placed in `tests/features/<feature_slug>/<rule_slug>_test.py`.

**Spec Value Fidelity in Tests**. Every literal and placeholder from the spec must appear in the test body — `beehave check` verifies this. Per Spec Value Fidelity ([[software-craft/test-design#concepts]]), the test must use each value in a way that reflects its domain purpose. If `"BTC/USD"` represents a trading pair, use it to construct or identify one. If `42` is a boundary value, use it at the boundary. Never satisfy traceability with noise: assigning to `_`, stuffing strings into assert messages, or writing helper functions whose sole purpose is consuming a literal. These mask the real issue — the value's domain purpose is not reflected in the test.

## Related

- [[requirements/gherkin]]: Example format, title conventions, Scenario Outline syntax, placeholder and literal rules
Expand Down
Loading
Loading