nullhack · May 20, 2026 · May 19, 2026
diff --git a/.cache/interview-notes/IN_20260520_case_insensitive_matching.md b/.cache/interview-notes/IN_20260520_case_insensitive_matching.md
@@ -0,0 +1,84 @@
+# Interview Notes: Case-Insensitive Matching
+
+**Session:** IN_20260520_case_insensitive_matching
+**Date:** 2026-05-20
+**Stakeholder:** Product Owner (adversarial code review)
+
+---
+
+## Pain Points
+
+| # | Description | Severity | Location |
+|---|-------------|----------|----------|
+| 18 | Negative numbers invisible — `_extract_body_nodes` misses `UnaryOp(USub(), Constant(n))` | High | `discover.py:44-48` |
+| 19 | Quoted placeholder double-capture — `"<name>"` extracted as both Placeholder and Literal | Medium | `gherkin.py:88-96` |
+| 20 | Quoted bracket notation captured as literal — `"[PHONE]"` becomes Literal when intent is markup | Medium | `gherkin.py:88-96` |
+| 22 | Type mismatch — Gherkin `int(77000)` vs AST `str("77000")` from `Decimal("77000")` | High | `check.py:55` |
+
+## Business Goals
+
+1. **Case-insensitive matching for placeholders and literals.** `<Dog>` in Gherkin must match `dog`, `DOG`, `Dog` in test body. `"Rex"` in Gherkin must match `"rex"`, `"Rex"`, `"REX"` in test body.
+
+## Formal Rules
+
+### R1 — Placeholder Extraction
+A `<token>` in step text is a Placeholder iff `token` is a valid Python identifier and is neither a Python keyword nor a Python builtin. The placeholder regex matches regardless of surrounding quotes. Duplicate placeholders within a step are deduplicated by exact name, so `<ID>` and `<id>` remain distinct.
+
+### R2 — Numeric Literal Extraction
+A bare token in step text is a numeric Literal iff it matches `^-?\d+$` (an integer with an optional leading minus sign).
+
+### R3 — String Literal Extraction
+A quoted segment (`"..."` or `'...'`) is a string Literal with content extracted as-is between quotes. Exception: `<...>` inside quotes is skipped (already captured as Placeholder via R1). `[...]` inside quotes is captured verbatim as a literal value.
+
+### R4 — AST Body Constant Extraction
+`_extract_body_nodes` collects: (a) `ast.Constant` values directly, (b) folded `UnaryOp(USub(), Constant(n))` → `-n`. Leading docstring expression is excluded.
+
+### R5 — Placeholder Comparison (case-insensitive)
+A placeholder `ph` matches iff `ph.name.lower()` is in `{n.lower() for n in ti.body_name_nodes}`.
+
+### R6 — Literal Comparison (string-normalized, case-insensitive)
+A literal `lit` matches iff `str(lit.value).lower()` is in `{str(c).lower() for c in ti.body_constant_nodes}`.
+
+## Domain Terms
+
+| Term | Definition |
+|------|-----------|
+| Placeholder | `<name>` in Gherkin step text, mapped to Hypothesis strategy parameter |
+| Literal | Numeric token or quoted string in Gherkin step text, must appear in test body |
+| body_name_nodes | All `ast.Name` identifiers in test function body (after docstring exclusion) |
+| body_constant_nodes | All `ast.Constant` values in test function body (after docstring exclusion, plus folded UnaryOp) |
+| Case-insensitive matching | Comparison normalizes both sides to lowercase string form |
+
+## Edge Cases
+
+| Case | Expected |
+|------|----------|
+| `-2010` in Gherkin, `x = -2010` in body | Match (#18 fix) |
+| `-3.14` in Gherkin, `x = -3.14` in body | Match |
+| `"<phone>"` in Gherkin step with `Scenario Outline` | Placeholder extracted, literal skipped (#19 fix) |
+| `"[PHONE]"` in Gherkin, body has `"555-1234"` | `[PHONE]` is an ordinary literal that matches only the constant `"[PHONE]"`; the body uses a different value, so a missing-literal violation is reported (correct: dynamic values should use placeholders) |
+| `"Rex"` in Gherkin, `"rex"` in body | Match (case-insensitive) |
+| `<Dog>` in Gherkin, `Dog` class in body | Match (case-insensitive) |
+| `77000` in Gherkin, `Decimal("77000")` in body | Match (#22 fix via string normalization) |
+| `1` in Gherkin, `True` in body | No match — `"1" != "true"` |
+| Leading docstring in test body | Excluded from constant collection (existing behavior, unchanged) |
+| Stub test bodies | Skipped entirely (existing behavior, unchanged) |
+
+## Files Affected
+
+| File | Change |
+|------|--------|
+| `beehave/discover.py` | `_extract_body_nodes`: fold UnaryOp (#18) |
+| `beehave/gherkin.py` | `_extract_literals`: filter `<...>` from quoted captures (#19, #20) |
+| `beehave/check.py` | `_check_placeholders`: case-insensitive (R5); `_check_literals`: string-normalized case-insensitive (R6, fixes #22, hardens #18) |
+| `tests/` | New edge case tests for all 4 bugs + case variations |
+
+## Scope
+
+Single feature. Changes localized to extraction and comparison functions within Feature Parsing, Consistency Checking, and Test Discovery bounded contexts. No new bounded contexts, no cross-cutting concerns, no new dependencies.
+
+## Quality Attributes
+
+- **Correctness:** Deterministic comparison — same inputs always yield same result
+- **Reliability:** No false positives (existing test suite guards against regression)
+- **Simplicity:** String-based comparison replaces type-based comparison and multiple special cases
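Review note: rules R4 to R6 can be checked against the Edge Cases table with a small standalone script. This is a minimal sketch, not the code in `beehave/discover.py` or `beehave/check.py`: the function names `extract_body_nodes`, `placeholder_matches`, and `literal_matches`, and the `SOURCE` sample, are hypothetical stand-ins for `_extract_body_nodes`, `_check_placeholders`, and `_check_literals`.

```python
import ast


def extract_body_nodes(source: str) -> tuple[set[str], list[object]]:
    """Collect ast.Name ids and constant values from the first function in source."""
    func = next(n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.FunctionDef))
    body = list(func.body)
    # R4: exclude a leading docstring expression before collecting constants.
    if (
        body
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Constant)
        and isinstance(body[0].value.value, str)
    ):
        body = body[1:]
    names: set[str] = set()
    constants: list[object] = []
    for stmt in body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                names.add(node.id)
            elif isinstance(node, ast.Constant):
                constants.append(node.value)  # R4 (a): plain constant
            elif (
                isinstance(node, ast.UnaryOp)
                and isinstance(node.op, ast.USub)
                and isinstance(node.operand, ast.Constant)
                and isinstance(node.operand.value, (int, float))
            ):
                # R4 (b): fold -n. The inner constant n is still collected by
                # the branch above when ast.walk reaches it.
                constants.append(-node.operand.value)
    return names, constants


def placeholder_matches(name: str, body_names: set[str]) -> bool:
    """R5: case-insensitive comparison of a placeholder name against body names."""
    return name.lower() in {n.lower() for n in body_names}


def literal_matches(value: object, body_constants: list[object]) -> bool:
    """R6: string-normalized, case-insensitive comparison of a literal."""
    return str(value).lower() in {str(c).lower() for c in body_constants}


# Hypothetical test body used only for this illustration.
SOURCE = '''
def test_example():
    """Fido"""
    balance = -2010
    price = Decimal("77000")
    flag = True
    pet = Dog("rex")
'''

names, constants = extract_body_nodes(SOURCE)
```

With this sample, `literal_matches(-2010, constants)`, `literal_matches(77000, constants)`, and `literal_matches("Rex", constants)` all hold, while `literal_matches(1, constants)` does not, because `str(True).lower()` is `"true"`. The docstring value `"Fido"` is never collected.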
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_01_in.json b/.cache/sim/case_insensitive_matching/walkthrough_01_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 1,
+  "type": "happy-path",
+  "rule": "R1 — Placeholder Extraction",
+  "description": "Plain-text token extraction produces valid Placeholder",
+  "input": {
+    "step_text": "Given a dog named <name>",
+    "known_builtins": ["int", "str", "list"],
+    "known_keywords": ["class", "def", "if"]
+  },
+  "initial_state": "parse_feature() processing a Scenario step"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_01_out.json b/.cache/sim/case_insensitive_matching/walkthrough_01_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 1,
+  "expected": {
+    "placeholders": [
+      {"name": "name", "raw": "<name>"}
+    ],
+    "literals": []
+  },
+  "verification": "name is a valid Python identifier, not keyword, not builtin → Placeholder extracted"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_02_in.json b/.cache/sim/case_insensitive_matching/walkthrough_02_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 2,
+  "type": "edge-case",
+  "rule": "R1 — Placeholder inside quotes still extracted",
+  "description": "Placeholder regex matches regardless of surrounding quotes",
+  "input": {
+    "step_text": "Given a user named \"<username>\"",
+    "known_builtins": ["int", "str"],
+    "known_keywords": ["class", "def"]
+  },
+  "initial_state": "Scenario Outline step with quoted placeholder"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_02_out.json b/.cache/sim/case_insensitive_matching/walkthrough_02_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 2,
+  "expected": {
+    "placeholders": [
+      {"name": "username", "raw": "<username>"}
+    ],
+    "literals": []
+  },
+  "verification": "Placeholder regex fires first (matches regardless of quotes); literal extraction skips <...> inside quotes per R3 exception"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_03_in.json b/.cache/sim/case_insensitive_matching/walkthrough_03_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 3,
+  "type": "edge-case",
+  "rule": "R1 — Python keyword rejection",
+  "description": "Token that is a Python keyword is NOT extracted as Placeholder",
+  "input": {
+    "step_text": "Given we use the <class> instance",
+    "known_keywords": ["class", "def", "if", "else", "for", "while"],
+    "known_builtins": ["int", "str"]
+  },
+  "initial_state": "Keyword token in step text"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_03_out.json b/.cache/sim/case_insensitive_matching/walkthrough_03_out.json
@@ -0,0 +1,9 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 3,
+  "expected": {
+    "placeholders": [],
+    "literals": []
+  },
+  "verification": "<class> is a Python keyword → rejected, not a Placeholder"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_04_in.json b/.cache/sim/case_insensitive_matching/walkthrough_04_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 4,
+  "type": "edge-case",
+  "rule": "R1 — Python builtin rejection",
+  "description": "Token that is a Python builtin is NOT extracted as Placeholder",
+  "input": {
+    "step_text": "Given the value is <int>",
+    "known_keywords": ["class", "def"],
+    "known_builtins": ["int", "str", "list", "dict", "float"]
+  },
+  "initial_state": "Builtin token in step text"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_04_out.json b/.cache/sim/case_insensitive_matching/walkthrough_04_out.json
@@ -0,0 +1,9 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 4,
+  "expected": {
+    "placeholders": [],
+    "literals": []
+  },
+  "verification": "<int> is a Python builtin → rejected, not a Placeholder"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_05_in.json b/.cache/sim/case_insensitive_matching/walkthrough_05_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 5,
+  "type": "edge-case",
+  "rule": "R1 — Duplicate placeholder deduplication",
+  "description": "Same <token> twice in one step produces one Placeholder",
+  "input": {
+    "step_text": "Given <name> meets <name>",
+    "known_keywords": ["class", "def"],
+    "known_builtins": ["int", "str"]
+  },
+  "initial_state": "Step with repeated placeholder token"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_05_out.json b/.cache/sim/case_insensitive_matching/walkthrough_05_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 5,
+  "expected": {
+    "placeholders": [
+      {"name": "name", "raw": "<name>"}
+    ],
+    "literals": []
+  },
+  "verification": "Duplicates deduplicated — single Placeholder for <name> despite appearing twice"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_06_in.json b/.cache/sim/case_insensitive_matching/walkthrough_06_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 6,
+  "type": "edge-case",
+  "rule": "R1 — Case-distinct placeholders",
+  "description": "<ID> and <id> are two distinct Placeholders",
+  "input": {
+    "step_text": "Given product <ID> is also known as <id>",
+    "known_keywords": ["class", "def"],
+    "known_builtins": ["int", "str"]
+  },
+  "initial_state": "Step with case-variant placeholder tokens"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_06_out.json b/.cache/sim/case_insensitive_matching/walkthrough_06_out.json
@@ -0,0 +1,12 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 6,
+  "expected": {
+    "placeholders": [
+      {"name": "ID", "raw": "<ID>"},
+      {"name": "id", "raw": "<id>"}
+    ],
+    "literals": []
+  },
+  "verification": "Both extracted under the listed known_builtins, which omit the real Python builtin id; extraction is case-sensitive, and case-insensitive matching at the comparison stage handles both"
+}
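Review note: walkthroughs 01 to 06 can be replayed with a short extraction function. This is a sketch under assumptions, not the code in `beehave/gherkin.py`: the regex `PLACEHOLDER_RE` and the function `extract_placeholders` are hypothetical, and the keyword and builtin lists are passed in explicitly to mirror the `known_keywords` and `known_builtins` inputs of the fixtures.

```python
import builtins
import keyword
import re

# Assumed pattern: any run of non-space characters between angle brackets.
PLACEHOLDER_RE = re.compile(r"<([^<>\s]+)>")


def extract_placeholders(step_text, known_keywords, known_builtins):
    """Return R1 placeholders, deduplicated in order of first appearance."""
    seen = set()
    placeholders = []
    for token in PLACEHOLDER_RE.findall(step_text):
        # str.isidentifier() is also True for keywords, so both checks are needed.
        if not token.isidentifier():
            continue
        if token in known_keywords or token in known_builtins:
            continue
        if token in seen:  # case-sensitive dedup: <ID> and <id> stay distinct
            continue
        seen.add(token)
        placeholders.append({"name": token, "raw": f"<{token}>"})
    return placeholders


# Walkthrough 06 input, first with the fixture's lists, then with Python's real ones.
STEP_06 = "Given product <ID> is also known as <id>"
with_fixture_lists = extract_placeholders(STEP_06, ["class", "def"], ["int", "str"])
with_real_lists = extract_placeholders(STEP_06, keyword.kwlist, dir(builtins))
```

One design point worth noting: `id` is a real Python builtin, so when the lists come from `keyword.kwlist` and `dir(builtins)` only `<ID>` survives. The expected output of walkthrough 06 therefore depends on the reduced `known_builtins` list given in its input.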
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_07_in.json b/.cache/sim/case_insensitive_matching/walkthrough_07_in.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 7,
+  "type": "happy-path",
+  "rule": "R2 — Numeric literal extraction",
+  "description": "Bare integer token extracted as numeric Literal",
+  "input": {
+    "step_text": "Given 3 items in the cart"
+  },
+  "initial_state": "Step with numeric token"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_07_out.json b/.cache/sim/case_insensitive_matching/walkthrough_07_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 7,
+  "expected": {
+    "placeholders": [],
+    "literals": [
+      {"value": 3, "raw": "3", "type": "numeric"}
+    ]
+  },
+  "verification": "Token '3' matches ^-?\\d+$ → numeric Literal with int value"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_08_in.json b/.cache/sim/case_insensitive_matching/walkthrough_08_in.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 8,
+  "type": "edge-case",
+  "rule": "R2 — Negative numeric literal",
+  "description": "Bare negative integer token extracted as numeric Literal",
+  "input": {
+    "step_text": "Given the balance is -2010"
+  },
+  "initial_state": "Step with negative numeric token"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_08_out.json b/.cache/sim/case_insensitive_matching/walkthrough_08_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 8,
+  "expected": {
+    "placeholders": [],
+    "literals": [
+      {"value": -2010, "raw": "-2010", "type": "numeric"}
+    ]
+  },
+  "verification": "Token '-2010' matches ^-?\\d+$ → numeric Literal. Extract step is correct; the matching failure was in Test Discovery (bug #18, no UnaryOp folding)"
+}
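Review note: the R2 walkthroughs (07 and 08) reduce to a single regex test per token. The sketch below assumes that step text is tokenized on whitespace, which the notes do not state; `extract_numeric_literals` is a hypothetical name.

```python
import re

# Pattern taken from R2: an integer with an optional leading minus sign.
NUMERIC_RE = re.compile(r"^-?\d+$")


def extract_numeric_literals(step_text):
    """Return numeric Literals for whitespace-separated tokens matching R2."""
    return [
        {"value": int(token), "raw": token, "type": "numeric"}
        for token in step_text.split()  # assumption: tokens are split on whitespace
        if NUMERIC_RE.match(token)
    ]
```

Under this pattern a token such as `-3.14` is not a numeric Literal, so a decimal edge case would need a broader pattern or separate handling.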
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_09_in.json b/.cache/sim/case_insensitive_matching/walkthrough_09_in.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 9,
+  "type": "happy-path",
+  "rule": "R3 — String literal extraction",
+  "description": "Double-quoted string extracted as string Literal",
+  "input": {
+    "step_text": "Given a dog named \"Rex\""
+  },
+  "initial_state": "Step with quoted string"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_09_out.json b/.cache/sim/case_insensitive_matching/walkthrough_09_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 9,
+  "expected": {
+    "placeholders": [],
+    "literals": [
+      {"value": "Rex", "raw": "\"Rex\"", "type": "string"}
+    ]
+  },
+  "verification": "Content 'Rex' extracted as-is between quotes. Case preserved — matching is case-insensitive at comparison stage"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_10_in.json b/.cache/sim/case_insensitive_matching/walkthrough_10_in.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 10,
+  "type": "edge-case",
+  "rule": "R3 — Quoted bracket notation captured verbatim",
+  "description": "[...] inside quotes captured as literal content verbatim (bug #20 decision)",
+  "input": {
+    "step_text": "Given a phone number \"[PHONE]\""
+  },
+  "initial_state": "Step with bracket-delimited content in quotes"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_10_out.json b/.cache/sim/case_insensitive_matching/walkthrough_10_out.json
@@ -0,0 +1,11 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 10,
+  "expected": {
+    "placeholders": [],
+    "literals": [
+      {"value": "[PHONE]", "raw": "\"[PHONE]\"", "type": "string"}
+    ]
+  },
+  "verification": "[...] is captured verbatim per user decision (not filtered). If user wants dynamic value they should use <phone> placeholder instead. This is NOT a bug."
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_11_in.json b/.cache/sim/case_insensitive_matching/walkthrough_11_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 11,
+  "type": "bug-before",
+  "rule": "Bug #19 — Quoted placeholder double-capture BEFORE fix",
+  "description": "\"<name>\" in step text currently produces both Placeholder AND Literal",
+  "input": {
+    "step_text": "Given a user named \"<name>\" in scenario outline <name>",
+    "known_keywords": [],
+    "known_builtins": []
+  },
+  "initial_state": "Step with placeholder inside double quotes (existing code)"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_11_out.json b/.cache/sim/case_insensitive_matching/walkthrough_11_out.json
@@ -0,0 +1,15 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 11,
+  "bug": true,
+  "bug_id": 19,
+  "expected_before_fix": {
+    "placeholders": [
+      {"name": "name", "raw": "<name>"}
+    ],
+    "literals": [
+      {"value": "<name>", "raw": "\"<name>\"", "type": "string"}
+    ]
+  },
+  "problem": "Literal extraction does not filter <...> from quoted strings. The literal '<name>' should not exist — the placeholder already captures this semantic. This produces a false positive missing-literal violation because '<name>' is not a real constant in the test body."
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_12_in.json b/.cache/sim/case_insensitive_matching/walkthrough_12_in.json
@@ -0,0 +1,13 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 12,
+  "type": "bug-after",
+  "rule": "Bug #19 — Quoted placeholder double-capture AFTER fix",
+  "description": "\"<name>\" in step text: placeholder extracted, literal skipped",
+  "input": {
+    "step_text": "Given a user named \"<name>\" in scenario outline <name>",
+    "known_keywords": [],
+    "known_builtins": []
+  },
+  "initial_state": "Step with placeholder inside double quotes (fixed code)"
+}
diff --git a/.cache/sim/case_insensitive_matching/walkthrough_12_out.json b/.cache/sim/case_insensitive_matching/walkthrough_12_out.json
@@ -0,0 +1,12 @@
+{
+  "context": "Feature Parsing",
+  "walkthrough": 12,
+  "expected_after_fix": {
+    "placeholders": [
+      {"name": "name", "raw": "<name>"}
+    ],
+    "literals": []
+  },
+  "fix_applied": "Literal extraction filters out any quoted content matching the <...> pattern before creating Literal objects. The placeholder regex already captures it, so the double-capture no longer occurs.",
+  "verification": "Literal list is empty — <...> inside quotes is excluded from literal extraction per R3 exception"
+}
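Review note: the R3 behavior exercised by walkthroughs 09 to 12 can be sketched as follows. The regexes and the name `extract_string_literals` are hypothetical, and the sketch assumes the R3 exception applies when the whole quoted content is a single `<...>` token, which is how the fixtures use it.

```python
import re

# Assumed patterns: a double- or single-quoted segment, and content that is
# exactly one <...> token.
QUOTED_RE = re.compile(r'"([^"]*)"|\'([^\']*)\'')
PLACEHOLDER_ONLY_RE = re.compile(r"<[^<>]+>")


def extract_string_literals(step_text):
    """Return string Literals per R3, skipping quoted <...> placeholders."""
    literals = []
    for match in QUOTED_RE.finditer(step_text):
        content = match.group(1) if match.group(1) is not None else match.group(2)
        if PLACEHOLDER_ONLY_RE.fullmatch(content):
            continue  # R3 exception: already captured as a Placeholder via R1
        # Everything else, including [...] content, is kept verbatim.
        literals.append({"value": content, "raw": match.group(0), "type": "string"})
    return literals
```

Bracket notation such as `"[PHONE]"` passes through unchanged, matching the decision recorded in walkthrough 10, while `"<name>"` yields no literal, matching walkthrough 12.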