Skip to content

fix: prevent beehave noise patterns — title rules, literal guidance, output column patterns#164

Merged
nullhack merged 3 commits into
mainfrom
fix/beehave-test-patterns-post-mortem
May 19, 2026
Merged

fix: prevent beehave noise patterns — title rules, literal guidance, output column patterns#164
nullhack merged 3 commits into
mainfrom
fix/beehave-test-patterns-post-mortem

Conversation

@nullhack
Copy link
Copy Markdown
Owner

Summary

Post-mortem analysis of two features (rate limit buckets, cache orderbook history) revealed recurring patterns where agents add noise to satisfy beehave checks instead of writing natural test code. Six changes across 4 files:

  1. Title special characters (gherkin.md): POs wrote titles with hyphens/periods/underscores which break slug generation. Added explicit "ONLY Unicode letters, digits, and spaces" rule to Key Takeaways, Concepts, Title Conventions, and Common Mistakes.

  2. Literal awareness (gherkin.md): POs wrote display-oriented literals that are awkward for tests. Added guidance: choose short, code-friendly literals.

  3. Output columns and Hypothesis (gherkin.md, test-stubs.md): Scenario Outline output columns like expected=min(a,b) are generated as @given strategies. Documented the reassignment pattern — the @given value is noise, overridden in the test body.

  4. String literal helpers (test-stubs.md): Documented the helper pattern (e.g., base, quote = "FOO/BAR".split("/")) so literals appear naturally instead of being stuffed into assert messages.

  5. write-test skill step 3 (write-test/SKILL.md): Added explicit instructions for output column reassignment and string literal helpers. Added "NEVER stuff literals into assert messages" prohibition.

  6. write-bdd-features skill step 9 (write-bdd-features/SKILL.md): Added beehave check at write time to catch title/placeholder issues before downstream stub generation.

Files Changed

File Change
.opencode/knowledge/requirements/gherkin.md Title chars, literal guidance, output columns, Hypothesis constraint
.opencode/knowledge/software-craft/test-stubs.md Derived output columns, string literal helpers
.opencode/skills/write-test/SKILL.md Step 3: output column reassignment, literal helper patterns
.opencode/skills/write-bdd-features/SKILL.md Step 9: beehave check at write time

Testing

Applied during live cex-mm session. The cache_orderbook_history feature exposed all three failure modes. After fixes, beehave check and all tests pass clean with zero noise assertions.

Ubuntu added 3 commits May 19, 2026 08:06
…h verification, todo discipline, property patterns for BDD examples

Post-session analysis of cex-mm project revealed three systemic failures:

1. Orchestrator routinely bypasses owner dispatch and does work directly.
   The todo template had no Dispatch step — it jumped from Preparation
   to Load Skills, so the orchestrator never saw the instruction to dispatch.
   Fix: added explicit Dispatch step (#2) in todo template with MUST NOT
   do the work itself constraint and owner mapping table.

2. Branch discipline not enforced at state entry. Agents entered states
   declaring git:dev while on feature branches and vice versa.
   Fix: Preparation step now verifies branch matches attrs.git. Golden
   rule 7 now says 'Verify before starting'. New golden rule 8: feature
   branches must be merged back to dev before new work starts.

3. Todo list goes stale or disappears mid-state as agents focus on work.
   Fix: added Todo discipline paragraph requiring update after every step
   and regeneration if missing.

4. Review-gate skill loaded smell-catalogue at #key-takeaways but
   detecting violations needs the full document (per progressive
   knowledge loading rules in AGENTS.md).
   Fix: step 5 now loads full docs for detection, #key-takeaways only
   for recall.

5. No guidance for choosing Example vs Scenario Outline during BDD
   example creation. Agents either over-used Scenario Outlines or
   under-used them.
   Fix: added property-patterns knowledge file (Wlaschin, 2014) with
   seven patterns and a decision tree. Updated write-bdd-features skill
   step 4 to apply patterns systematically. Added research reference.
…output column patterns

Post-mortem analysis of two features (rate limit buckets, cache history)
revealed recurring patterns where agents add noise to satisfy beehave
checks instead of writing natural test code:

1. Title special characters: POs write titles with hyphens, periods,
   underscores which break slug generation. Added explicit 'ONLY Unicode
   letters, digits, and spaces' rule to Key Takeaways, Concepts, Title
   Conventions, and Common Mistakes in gherkin.md.

2. Literal awareness: POs write display-oriented literals (e.g.
   'United States of America') that are awkward for tests. Added guidance
   in Key Takeaways and Concepts: choose short, code-friendly literals.

3. Output columns and Hypothesis: Scenario Outline output columns
   (like expected=min(a,b)) are generated as @given strategies. Agents
   either remove them (breaking example-mismatch) or add noise (_=expected).
   Documented the reassignment pattern in gherkin.md Concepts and
   test-stubs.md Concepts.

4. String literal helpers: Documented the _pair_from('FOO/BAR') helper
   pattern in test-stubs.md. Agents were stuffing literals into assert
   messages or assigning to _.

5. write-test skill step 3: Added explicit instructions for output column
   reassignment and string literal helpers. Added 'NEVER stuff literals
   into assert messages' prohibition.

6. write-bdd-features skill step 9: Added beehave check at write time to
   catch title/placeholder issues before downstream stub generation.
Anti-patterns (output column reassignment, helper functions for literals,
code-friendly literal guidance) treated symptoms — the test working around
awkward spec values. Root cause: every value in a spec should carry domain
meaning, and both PO and SE must preserve that meaning.

Introduces Spec Value Fidelity as a first-class concept in test-design:
values exist with domain intent; tests must reflect that intent; noise
patterns (assigning to _, assert stuffing, helpers for traceability) violate
this fidelity.

Cascades to gherkin.md (PO: meaningful literals and columns), test-stubs.md
(SE: use values for their domain purpose), and both skills (write-bdd-features
adds semantic value check, write-test replaces workaround bullets).
@nullhack nullhack merged commit 33f7b2e into main May 19, 2026
10 checks passed
@nullhack nullhack deleted the fix/beehave-test-patterns-post-mortem branch May 19, 2026 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant