fix: prevent beehave noise patterns — title rules, literal guidance, output column patterns#164
Merged
Merged
Conversation
added 3 commits
May 19, 2026 08:06
…h verification, todo discipline, property patterns for BDD examples Post-session analysis of cex-mm project revealed three systemic failures: 1. Orchestrator routinely bypasses owner dispatch and does work directly. The todo template had no Dispatch step — it jumped from Preparation to Load Skills, so the orchestrator never saw the instruction to dispatch. Fix: added explicit Dispatch step (#2) in todo template with MUST NOT do the work itself constraint and owner mapping table. 2. Branch discipline not enforced at state entry. Agents entered states declaring git:dev while on feature branches and vice versa. Fix: Preparation step now verifies branch matches attrs.git. Golden rule 7 now says 'Verify before starting'. New golden rule 8: feature branches must be merged back to dev before new work starts. 3. Todo list goes stale or disappears mid-state as agents focus on work. Fix: added Todo discipline paragraph requiring update after every step and regeneration if missing. 4. Review-gate skill loaded smell-catalogue at #key-takeaways but detecting violations needs the full document (per progressive knowledge loading rules in AGENTS.md). Fix: step 5 now loads full docs for detection, #key-takeaways only for recall. 5. No guidance for choosing Example vs Scenario Outline during BDD example creation. Agents either over-used Scenario Outlines or under-used them. Fix: added property-patterns knowledge file (Wlaschin, 2014) with seven patterns and a decision tree. Updated write-bdd-features skill step 4 to apply patterns systematically. Added research reference.
…output column patterns Post-mortem analysis of two features (rate limit buckets, cache history) revealed recurring patterns where agents add noise to satisfy beehave checks instead of writing natural test code: 1. Title special characters: POs write titles with hyphens, periods, underscores which break slug generation. Added explicit 'ONLY Unicode letters, digits, and spaces' rule to Key Takeaways, Concepts, Title Conventions, and Common Mistakes in gherkin.md. 2. Literal awareness: POs write display-oriented literals (e.g. 'United States of America') that are awkward for tests. Added guidance in Key Takeaways and Concepts: choose short, code-friendly literals. 3. Output columns and Hypothesis: Scenario Outline output columns (like expected=min(a,b)) are generated as @given strategies. Agents either remove them (breaking example-mismatch) or add noise (_=expected). Documented the reassignment pattern in gherkin.md Concepts and test-stubs.md Concepts. 4. String literal helpers: Documented the _pair_from('FOO/BAR') helper pattern in test-stubs.md. Agents were stuffing literals into assert messages or assigning to _. 5. write-test skill step 3: Added explicit instructions for output column reassignment and string literal helpers. Added 'NEVER stuff literals into assert messages' prohibition. 6. write-bdd-features skill step 9: Added beehave check at write time to catch title/placeholder issues before downstream stub generation.
Anti-patterns (output column reassignment, helper functions for literals, code-friendly literal guidance) treated symptoms — the test working around awkward spec values. Root cause: every value in a spec should carry domain meaning, and both PO and SE must preserve that meaning. Introduces Spec Value Fidelity as a first-class concept in test-design: values exist with domain intent; tests must reflect that intent; noise patterns (assigning to _, assert stuffing, helpers for traceability) violate this fidelity. Cascades to gherkin.md (PO: meaningful literals and columns), test-stubs.md (SE: use values for their domain purpose), and both skills (write-bdd-features adds semantic value check, write-test replaces workaround bullets).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Post-mortem analysis of two features (rate limit buckets, cache orderbook history) revealed recurring patterns where agents add noise to satisfy beehave checks instead of writing natural test code. Six changes across 4 files:
Title special characters (
gherkin.md): POs wrote titles with hyphens/periods/underscores which break slug generation. Added explicit "ONLY Unicode letters, digits, and spaces" rule to Key Takeaways, Concepts, Title Conventions, and Common Mistakes.Literal awareness (
gherkin.md): POs wrote display-oriented literals that are awkward for tests. Added guidance: choose short, code-friendly literals.Output columns and Hypothesis (
gherkin.md,test-stubs.md): Scenario Outline output columns likeexpected=min(a,b)are generated as@givenstrategies. Documented the reassignment pattern — the@givenvalue is noise, overridden in the test body.String literal helpers (
test-stubs.md): Documented the helper pattern (e.g.,base, quote = "FOO/BAR".split("/")) so literals appear naturally instead of being stuffed into assert messages.write-test skill step 3 (
write-test/SKILL.md): Added explicit instructions for output column reassignment and string literal helpers. Added "NEVER stuff literals into assert messages" prohibition.write-bdd-features skill step 9 (
write-bdd-features/SKILL.md): Addedbeehave checkat write time to catch title/placeholder issues before downstream stub generation.Files Changed
.opencode/knowledge/requirements/gherkin.md.opencode/knowledge/software-craft/test-stubs.md.opencode/skills/write-test/SKILL.md.opencode/skills/write-bdd-features/SKILL.mdTesting
Applied during live cex-mm session. The cache_orderbook_history feature exposed all three failure modes. After fixes,
beehave checkand all tests pass clean with zero noise assertions.