From 853a0020795bdd4c8d9daaa6d8abffdc9eeced29 Mon Sep 17 00:00:00 2001
From: Ubuntu <info@saprolings.com>
Date: Tue, 19 May 2026 08:06:19 +0000
Subject: [PATCH] =?UTF-8?q?fix:=20strengthen=20orchestrator=20discipline?=
 =?UTF-8?q?=20=E2=80=94=20dispatch=20enforcement,=20branch=20verification,?=
 =?UTF-8?q?=20todo=20discipline,=20property=20patterns=20for=20BDD=20examp?=
 =?UTF-8?q?les?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Post-session analysis of cex-mm project revealed five systemic failures:

1. Orchestrator routinely bypasses owner dispatch and does work directly.
   The todo template had no Dispatch step — it jumped from Preparation
   to Load Skills, so the orchestrator never saw the instruction to dispatch.
   Fix: added explicit Dispatch step (#2) in todo template with MUST NOT
   do the work itself constraint and owner mapping table.

2. Branch discipline not enforced at state entry. Agents entered states
   declaring git:dev while on feature branches and vice versa.
   Fix: Preparation step now verifies branch matches attrs.git. Golden
   rule 7 now says 'Verify before starting'. New golden rule 8: feature
   branches must be merged back to dev before new work starts.

3. Todo list goes stale or disappears mid-state as agents focus on work.
   Fix: added Todo discipline paragraph requiring update after every step
   and regeneration if missing.

4. Review-gate skill loaded smell-catalogue at #key-takeaways but
   detecting violations needs the full document (per progressive
   knowledge loading rules in AGENTS.md).
   Fix: step 5 now loads full docs for detection, #key-takeaways only
   for recall.

5. No guidance for choosing Example vs Scenario Outline during BDD
   example creation. Agents either over-used Scenario Outlines or
   under-used them.
   Fix: added property-patterns knowledge file (Wlaschin, 2014) with
   seven patterns and a decision tree. Updated write-bdd-features skill
   step 4 to apply patterns systematically. Added research reference.
---
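Illustrative sketch for the pattern table in property-patterns.md, showing how four of the seven patterns turn into executable checks. It is a sketch under assumptions, not the project's test code: `dedupe`, `bubble_sort`, `random_int_lists`, and `check_properties` are hypothetical names, and a hand-rolled generator built on the standard-library `random` module stands in for the Hypothesis `@given` strategies that the knowledge file calls for.

```python
import json
import random


def random_int_lists(rng, cases=200, max_len=8):
    """Yield small random integer lists, always starting with the empty list."""
    yield []
    for _ in range(cases):
        yield [rng.randint(-5, 5) for _ in range(rng.randint(0, max_len))]


def dedupe(xs):
    """Hypothetical function under test: drop duplicates, keep first-seen order."""
    return list(dict.fromkeys(xs))


def bubble_sort(xs):
    """Naive reference implementation used as the test oracle."""
    out = list(xs)
    for i in range(len(out)):
        for j in range(len(out) - 1 - i):
            if out[j] > out[j + 1]:
                out[j], out[j + 1] = out[j + 1], out[j]
    return out


def check_properties(seed=0):
    """Check four of the seven patterns; return the number of inputs checked."""
    rng = random.Random(seed)
    checked = 0
    for xs in random_int_lists(rng):
        # There and back again: decoding the encoded value gives the original.
        assert json.loads(json.dumps(xs)) == xs
        # Idempotence: applying dedupe twice equals applying it once.
        assert dedupe(dedupe(xs)) == dedupe(xs)
        # Different paths, same destination: shuffling first does not change the sort.
        shuffled = xs[:]
        rng.shuffle(shuffled)
        assert sorted(shuffled) == sorted(xs)
        # Test oracle: the built-in sort agrees with the naive reference.
        assert sorted(xs) == bubble_sort(xs)
        checked += 1
    return checked
```

Calling `check_properties()` runs every assertion over the empty list plus 200 generated lists. With Hypothesis, the loop body would likely move into separate `@given`-decorated tests, which also adds shrinking of failing inputs.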
 .../requirements/property-patterns.md         | 107 ++++++++++++++++++
 .opencode/skills/review-gate/SKILL.md         |   2 +-
 .opencode/skills/write-bdd-features/SKILL.md  |  16 +--
 AGENTS.md                                     |  19 ++--
 .../quality/wlaschin_2014.md                  |  50 ++++++++
 5 files changed, 178 insertions(+), 16 deletions(-)
 create mode 100644 .opencode/knowledge/requirements/property-patterns.md
 create mode 100644 docs/research/software-engineering/quality/wlaschin_2014.md

diff --git a/.opencode/knowledge/requirements/property-patterns.md b/.opencode/knowledge/requirements/property-patterns.md
new file mode 100644
index 0000000..a9d9865
--- /dev/null
+++ b/.opencode/knowledge/requirements/property-patterns.md
@@ -0,0 +1,107 @@
+---
+domain: requirements
+tags: [property-based-testing, examples, scenario-outline, test-design, bdd, hypothesis]
+last-updated: 2026-05-19
+---
+
+# Property Patterns for BDD Example Selection
+
+## Key Takeaways
+
+- When writing BDD Examples, use these seven property patterns (Wlaschin, 2014) to decide whether an Example should be a simple `Example:` or a `Scenario Outline:` with multiple input combinations.
+- **Simple `Example:`** is appropriate when the behaviour is a single observable outcome with fixed inputs — no interesting property to generalise.
+- **`Scenario Outline:`** is appropriate when the same behavioural outcome holds across multiple input/output combinations — the property pattern reveals which combinations matter.
+- The seven patterns also surface missing Examples: if a property pattern applies but has no corresponding Example, the specification is incomplete.
+
+## Concepts
+
+**Seven Property Patterns** (Wlaschin, 2014). When choosing what to verify in a specification, these patterns help discover what properties (invariants, relationships) the system should satisfy:
+
+| Pattern | Core Idea | When to use Scenario Outline |
+|---------|-----------|------------------------------|
+| Different paths, same destination | Two operation sequences produce the same result | When multiple paths exist to the same outcome (e.g., different orderings, different constructors) |
+| There and back again | An operation and its inverse return to the starting state | When serialise/deserialise, encode/decode, add/remove pairs exist |
+| Some things never change | An invariant is preserved after a transformation | When a transform should preserve size, membership, ordering, or other invariants |
+| The more things change, the more they stay the same | Applying an operation twice is the same as applying it once (idempotence) | When operations should be idempotent (e.g., deduplicate, round, normalise) |
+| Solve a smaller problem first | A property true for a small case implies truth for a composed case (structural induction) | When recursive or composable structures are involved (lists, trees, nested objects) |
+| Hard to prove, easy to verify | Finding the answer is complex, but checking it is simple | When output can be verified by a simpler check (e.g., sort result is a permutation, parse result concatenates to original) |
+| The test oracle | An alternate implementation exists to verify results | When a brute-force or simplified reference implementation can validate the optimised version |
+
+**Using Patterns to Choose Example vs Scenario Outline**: During feature example creation (write-bdd-features skill), apply these patterns to each Rule:
+
+1. For each Rule, ask: "Does any of the seven patterns apply to this behaviour?"
+2. If **no pattern applies** — the behaviour is a single discrete outcome with fixed inputs — write a simple `Example:`.
+3. If a pattern applies and surfaces three or more input combinations with the same step structure, write a `Scenario Outline:` with an `Examples:` table covering the significant combinations; if it surfaces only one or two, write a simple `Example:` per combination.
+4. If a pattern reveals an edge case not covered by existing Examples — add the missing Example.
+
+**Pattern-to-Example Decision Tree**:
+
+```
+Does the Rule describe an invariant that holds across inputs?
+├─ Yes → Scenario Outline with inputs that exercise the invariant
+│        + Hypothesis property test per [[software-craft/test-design#concepts]]
+└─ No → Does the Rule have "easy to verify" checkable output?
+    ├─ Yes → Can multiple inputs produce different valid outputs?
+    │        ├─ Yes → Scenario Outline with representative input/output pairs
+    │        └─ No → Simple Example with the key input
+    └─ No → Simple Example (single observable outcome)
+```
+
+**Pre-mortem Integration**: During the behavior-level pre-mortem per [[requirements/pre-mortem#concepts]], apply property patterns adversarially: "Given this pattern applies to this Rule, what inputs would break it?" Surface failure modes as additional Examples.
+
+## Content
+
+### Pattern Application Examples
+
+**Different paths, same destination**: A sort function produces the same result regardless of input order. Use Scenario Outline with different input orderings asserting identical sorted output. This also applies to commutative operations: `a + b == b + a`.
+
+**There and back again**: JSON serialisation round-trips: `decode(encode(obj)) == obj`. Use Scenario Outline with different object shapes. HTTP encode/decode, compress/decompress, and format conversions all fit this pattern.
+
+**Some things never change**: A `map` operation preserves list length. A `sort` preserves the multiset of elements. Use Scenario Outline with different input sizes and element values, asserting the invariant holds.
+
+**Idempotence**: Calling `distinct()` twice produces the same result as calling it once. Use Scenario Outline with different input sets, some already distinct, some with duplicates. REST PUT operations are another common case.
+
+**Structural induction**: If a property holds for a base case (the empty list) and is preserved when one element is appended, it holds for all lists. Use Scenario Outline with list sizes 0, 1, 2, N to cover the base case and the induction step.
+
+**Hard to prove, easy to verify**: Finding a prime factorisation is hard, but multiplying the factors back is trivial. Tokenising a string is hard, but checking that the concatenated tokens equal the original is easy. Use Scenario Outline with different input strings or numbers, asserting the verification check.
+
+**Test oracle**: A fast sorting algorithm can be verified against a naive bubble sort. A parallel computation can be verified against a sequential version. Use Scenario Outline where each row exercises a different input against both implementations.
+
+### Integration with BDD Workflow
+
+When the PO (or SE) writes Examples during `write-bdd-features`:
+
+1. Write the Rule's declarative behaviour first (Given/When/Then).
+2. Check each of the seven patterns against the Rule.
+3. For each matching pattern, determine the input combinations that exercise the property.
+4. If 1-2 combinations → simple `Example:` per combination.
+5. If 3+ combinations with the same step structure → `Scenario Outline:` with `Examples:` table.
+6. For invariant/structural Rules → also generate a Hypothesis property test per [[software-craft/test-design#concepts]].
+
+### Hypothesis Property Tests from Patterns
+
+Each invariant/structural Rule should produce both BDD Examples AND a Hypothesis property test. The property pattern guides the Hypothesis strategy:
+
+| Pattern | Hypothesis Strategy |
+|---------|-------------------|
+| Different paths, same destination | `@given(strategies.permutations(values))` then assert every ordering gives the same result |
+| There and back again | `@given(arbitrary_input)` then round-trip assert |
+| Some things never change | `@given(transform_input)` then assert invariant |
+| Idempotence | `@given(input)` then `assert f(f(x)) == f(x)` |
+| Structural induction | `@given(strategies.recursive(base, extend))` with base + step |
+| Hard to prove, easy to verify | `@given(input)` then verify output with simple check |
+| Test oracle | `@given(input)` then `assert fast(input) == oracle(input)` |
+
+## Related
+
+- [[requirements/gherkin]]
+- [[requirements/pre-mortem]]
+- [[software-craft/test-design]]
+- [[software-craft/tdd]]
+
+## Related
+
+- [[software-craft/test-design]]
+- [[software-craft/tdd]]
+- [[requirements/gherkin]]
+- [[requirements/pre-mortem]]
diff --git a/.opencode/skills/review-gate/SKILL.md b/.opencode/skills/review-gate/SKILL.md
index c86e28d..8b112e2 100644
--- a/.opencode/skills/review-gate/SKILL.md
+++ b/.opencode/skills/review-gate/SKILL.md
@@ -15,7 +15,7 @@ Available knowledge: [[software-craft/code-review]], [[software-craft/test-desig
 2. Verify implementation aligns with architectural decisions per [[software-craft/code-review#concepts]]: ADR compliance, quality attributes met.
 3. Verify all `# Constraints:` in the .feature file are met in the implementation. For technology constraints, read domain_spec.md `### Technology Requirements` table and execute the Verification instruction for each row (grep imports, check file existence, inspect config). Zero evidence → FAIL. For quality attribute constraints, verify thresholds are enforced.
 4. Verify implementation aligns with feature specification: all Examples have corresponding test implementations, behavior matches Gherkin steps.
-5. Verify design principles adversarially per the priority order in [[software-craft/tdd#content]], loading ObjCal per [[software-craft/object-calisthenics#key-takeaways]], smells per [[software-craft/smell-catalogue#key-takeaways]], and SOLID per [[software-craft/solid#key-takeaways]].
+5. Verify design principles adversarially per the priority order in [[software-craft/tdd#content]], loading the full documents for detection: ObjCal per [[software-craft/object-calisthenics]], smells per [[software-craft/smell-catalogue]], and SOLID per [[software-craft/solid]]. Use `#key-takeaways` only when recalling principles, not when detecting violations.
 6. **FAIL-FAST**: If any design violations found → exit `fail` with specific citations (file:line). Do NOT proceed to structure review.
 
 ## Tier 2: Structure Review
diff --git a/.opencode/skills/write-bdd-features/SKILL.md b/.opencode/skills/write-bdd-features/SKILL.md
index 4c20d06..f3523c1 100644
--- a/.opencode/skills/write-bdd-features/SKILL.md
+++ b/.opencode/skills/write-bdd-features/SKILL.md
@@ -5,22 +5,22 @@ description: "Write concrete Given/When/Then Example blocks for each Rule in the
 
 # Write BDD Features
 
-Available knowledge: [[requirements/gherkin]], [[requirements/moscow]], [[requirements/pre-mortem]], [[requirements/decomposition]]. `in` artifacts: read all before starting work.
+Available knowledge: [[requirements/gherkin]], [[requirements/moscow]], [[requirements/pre-mortem]], [[requirements/decomposition]], [[requirements/property-patterns]]. `in` artifacts: read all before starting work.
 
 1. Discover and read the feature file, product definition, domain spec, and glossary from `in`.
 2. Run a pre-mortem per [[requirements/pre-mortem]] for each Rule before writing any Examples. All Rules must have their pre-mortems completed before any Examples are written.
 3. IF hidden failure modes surface from the pre-mortem → plan Examples to cover them per [[requirements/gherkin#key-takeaways]].
-4. For each Rule, write Example or Scenario Outline blocks directly from the Rule description and domain spec knowledge per [[requirements/gherkin#concepts]]. Do NOT use behavior hints — they have been removed from the flow. Derive Example behavior directly from:
-   - The Rule's behavioral description paragraph
-   - The domain spec's External Contracts, Data Shapes, and Invariants
-   - The feature's `# Constraints:` comments
-   - Quality attributes from product_definition.md
-   Write Examples per format rules in [[requirements/gherkin#concepts]].
+4. For each Rule, apply property patterns per [[requirements/property-patterns#concepts]] to determine Example structure:
+    a) Check each of the seven patterns against the Rule's behaviour.
+    b) If no pattern applies → write a simple `Example:` with fixed inputs.
+    c) If a pattern applies and reveals 3+ input combinations with the same step structure → write a `Scenario Outline:` with an `Examples:` table covering the significant combinations surfaced by the pattern.
+    d) If a pattern applies but only reveals 1-2 combinations → write simple `Example:` per combination.
+    Write Examples per format rules in [[requirements/gherkin#concepts]], deriving behavior from the Rule's description, domain spec External Contracts/Data Shapes/Invariants, the feature's `# Constraints:` comments, and quality attributes from product_definition.md.
 5. For each Rule, verify Examples cover distinct behaviours per [[requirements/gherkin#concepts]]:
    a) Group Examples by `Then` outcome. Same outcome = same behaviour. Keep one representative per outcome. Discard duplicates. Exception: Scenario Outline rows are parameterized variants of the same behaviour — they are NOT duplicates.
    b) For each distinct outcome, run the behavior-level pre-mortem per [[requirements/pre-mortem#concepts]].
    c) Add Examples targeting the failure modes surfaced.
-   d) Structural (invariant) rules: one representative Example suffices. Defer full coverage to a Hypothesis property test per [[software-craft/test-design#concepts]].
+    d) Structural (invariant) rules: one representative Example suffices. Defer full coverage to a Hypothesis property test per [[software-craft/test-design#concepts]], using the pattern-to-strategy mapping in [[requirements/property-patterns#content]].
 6. Classify each Example per [[requirements/moscow#concepts]]; MoSCoW classification is for internal triage only: do NOT add Must/Should/Could tags to Examples in the .feature file.
 7. IF a Rule has more than 8 Must behaviors (after grouping by Then-outcome and collapsing Scenario Outlines) → this is a soft flag for PO review. Do NOT split or modify the Rule — Rule structure is frozen after define-flow. Decomposition was applied during refine-features; this check catches edge cases that slipped through. A Rule with 9+ Must behaviors is acceptable if the behaviour genuinely requires that many distinct cases.
 8. Evaluate each Rule's Examples for quality, checking every criterion per [[requirements/gherkin#concepts]]:
diff --git a/AGENTS.md b/AGENTS.md
index 1539731..b36fc40 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -8,7 +8,8 @@ Post-mortem analysis shows these practices prevent most project failures. Violat
 4. **Never decompose a feature without stakeholder approval.** If a feature is too large for INVEST, propose the split to the stakeholder with rationale. They decide what's core vs. deferred.
 5. **Verify inputs exist before entering a state.** Every state's `in` artifacts must be readable on disk. If they're missing, stop and reconstruct them. Don't proceed with assumed knowledge.
 6. **A feature is not done until every interview requirement is traced.** Every stakeholder Q&A must map to either a passing @id test or an explicit stakeholder deferral. Untraced requirements = incomplete delivery.
-7. **Respect git branch discipline.** Every state declares `git: dev`, `git: feature`, or `git: main` in its attrs. Work on the branch the state declares. Never switch branches mid-state. Before exiting a project-phase flow (discovery, architecture, branding, setup), set `committed-to-dev-locally: ==verified` evidence. Changes must be committed to dev before advancing.
+7. **Respect git branch discipline.** Every state declares `git: dev`, `git: feature`, or `git: main` in its attrs. **Verify the current branch matches `attrs.git` before starting any work.** If the branch is wrong, checkout or create the correct branch before proceeding. Never switch branches mid-state. Before exiting a project-phase flow (discovery, architecture, branding, setup), set `committed-to-dev-locally: ==verified` evidence. Changes must be committed to dev before advancing.
+8. **Every feature branch must be merged back to dev.** A feature is not delivered until its commits are squash-merged into local dev and `task test-fast` passes on dev. The develop-flow exits to deliver-flow which handles the merge, but the orchestrator must never leave a feature branch dangling — if the session ends mid-feature, resume and complete the merge before starting new work.
 
 ## Project Structure
 - `.flowr/flows/`: YAML state machine definitions (source of truth for routing)
@@ -163,16 +164,20 @@ Exception: The polish-code skill explicitly runs convention commands (`task conv
 
 ### Todo-Driven State Execution
 
-At state entry, generate a procedural todo list from the state's metadata using the todowrite tool. Format: `[X]` completed, `[ ]` pending, `[~]` anchor (always last).
+At state entry, generate a procedural todo list using the todowrite tool. Format: `[X]` completed, `[ ]` pending, `[~]` anchor (always last).
 
-1. **Preparation** (`[ ]`): list available `in` artifacts
-2. **Dispatch** (`[ ]`): call the state's owner agent with skills loaded
-3. **Output** (`[ ]`): one per `out` artifact
-4. **Verification** (`[ ]`): check constraints, run tests/lint if applicable
-5. **Anchor** (`[~]`, always last): flowr next → pick transition → flowr transition → rewrite todo
+1. **Preparation** (`[ ]`): verify current branch matches `attrs.git` (checkout or create if wrong). List available `in` artifacts.
+2. **Dispatch** (`[ ]`): dispatch to the owner agent listed in `attrs.owner` as a subagent with skills loaded. The orchestrator MUST NOT do the work itself — only route. Owner mapping: `PO` → product-owner, `DE` → domain-expert, `SE` → software-engineer, `SA` → system-architect, `R` → reviewer, `Design Agent` → design-agent, `Setup Agent` → setup-agent.
+3. **Load skills** (`[ ]`): read every skill file listed in `attrs.skills` from `.opencode/skills/<skill_name>/SKILL.md`. This step is MANDATORY — never skip it.
+4. **Skill-derived work items** (`[ ]`): one todo item per numbered step in the skill, using the skill's own language verbatim. These are the substantive work items. Self-generated items are only permitted for infrastructure (read artifacts, commit) — never for the core procedure.
+5. **Output** (`[ ]`): one per `out` artifact
+6. **Verification** (`[ ]`): check constraints, run tests/lint if applicable
+7. **Anchor** (`[~]`, always last): flowr next → pick transition → flowr transition → rewrite todo
 
 The todo is the execution contract. Every item must be marked `[X]` before the anchor fires. One state per todo; never span multiple states or collapse loop iterations. Full protocol: [[workflow/todo-anchor-protocol]].
 
+**Todo discipline**: After completing ANY step, use the todowrite tool to mark that step `[X]` and set the next `[ ]` step to `in_progress`. If the todo list is empty or missing, regenerate it immediately: working without a todo means working without a contract. Never let the todo go stale between steps.
+
 ### Session Init
 
 Before starting a flow, create a session to track progress:
diff --git a/docs/research/software-engineering/quality/wlaschin_2014.md b/docs/research/software-engineering/quality/wlaschin_2014.md
new file mode 100644
index 0000000..f24b55e
--- /dev/null
+++ b/docs/research/software-engineering/quality/wlaschin_2014.md
@@ -0,0 +1,50 @@
+# Choosing Properties for Property-Based Testing (Wlaschin, 2014)
+
+## Citation
+
+Wlaschin, S. (2014). "Choosing properties for property-based testing." *F# for Fun and Profit*. https://fsharpforfunandprofit.com/posts/property-based-testing-2/
+
+## Source Type
+
+Blog/Article
+
+## Method
+
+Theoretical with practical examples
+
+## Verification Status
+
+Verified
+
+## Confidence
+
+High
+
+## Key Insight
+
+Seven recurring patterns help developers discover testable properties when they cannot think of any: "Different paths, same destination" (commutative diagram), "There and back again" (inverse function), "Some things never change" (invariant under transformation), "The more things change, the more they stay the same" (idempotence), "Solve a smaller problem first" (structural induction), "Hard to prove, easy to verify", and "The test oracle".
+
+## Core Findings
+
+1. The universal problem with property-based testing is not tooling but discovering what properties to test — developers stare at a blank screen unable to think of properties
+2. Seven patterns cover most common cases: commutative operations, inverse pairs, invariants, idempotence, structural induction, easy-verification, and test oracles
+3. "Different paths, same destination" applies when two operation sequences should produce the same result (e.g., addition is commutative)
+4. "There and back again" applies when an operation and its inverse return to the starting state (e.g., serialize/deserialize)
+5. "Some things never change" applies when a transformation preserves an invariant (e.g., sort preserves multiset of elements)
+6. "Hard to prove, easy to verify" applies when finding the answer is complex but checking it is simple (e.g., prime factorisation — hard to find, easy to multiply back)
+7. "The test oracle" applies when an alternate (simpler, slower) implementation exists to verify the production implementation
+8. Model-based testing is a variant of the test oracle pattern: a simplified model runs in parallel with the system under test, and states are compared after each operation
+
+## Mechanism
+
+Rather than trying to enumerate all possible properties of a system, developers apply each of the seven patterns as lenses to examine the system's behaviour. Each pattern asks a different question: "Are there two ways to get the same result?" "Is there an inverse?" "What stays the same?" "What happens if I do it twice?" "Can I break it into smaller parts?" "Is the answer easy to check?" "Is there a reference implementation?" This structured approach overcomes the "blank screen" problem by providing concrete starting points for property discovery.
+
+## Relevance
+
+Essential for BDD example creation: the seven patterns provide a systematic method for deciding whether a Rule should use simple Examples or Scenario Outlines with parameterised inputs. When a pattern applies, it reveals which input combinations matter and whether the behaviour holds across a range of inputs. This directly informs the Example-vs-Scenario-Outline decision in the write-bdd-features skill. The patterns also surface missing Examples: if a pattern applies to a Rule but no Example covers the revealed combination, the specification is incomplete.
+
+## Related Research
+
+- (Claessen & Hughes, 2000) - QuickCheck: automatic testing of Haskell programs
+- (MacIver, 2016) - Hypothesis library extending property-based testing to Python
+- (Tillmann & Schulte, 2005) - PEX team at Microsoft compiled a complementary list of property patterns