feat(0.30.0): FindingSubject — typed grammar + parser + Zod boundary by drewstone · Pull Request #61 · tangle-network/agent-eval

drewstone · 2026-05-20T09:24:59Z

Summary

Closes the substrate gap that made every per-vertical `ImprovementAdapter` dead code: the analyst kinds' actor prompts documented a subject grammar (`agent-knowledge:wiki:<slug>`, `system-prompt:<section>`, etc.) but `RawAnalystFindingSchema.subject` was an unvalidated optional string. The LLM could emit prose like `subject: "fix the prompt"`, and downstream `startsWith(...)` routing in consumers silently dropped it.

This PR makes the grammar load-bearing: every emitted subject is parsed at the schema boundary, and non-conforming rows fail loudly (logged + skipped) instead of being lifted with a free-form subject string.
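A minimal sketch of "parse at the boundary, log and skip on failure", assuming a simplified row shape (`RawFindingRow`) and a regex stand-in (`SUBJECT_RE`) that accepts only a few illustrative prefixes. Both names are hypothetical; per the PR, the real check runs inside a Zod `.refine` on `RawAnalystFindingSchema.subject` rather than in a hand-written loop.

```typescript
// Hypothetical row shape; the real RawAnalystFindingSchema carries more fields.
interface RawFindingRow {
  summary: string;
  subject?: unknown;
}

// Stand-in for the real parser: a few illustrative prefixes only.
// Slugs and tool ids use [a-z0-9-]+; free-form tails must be non-empty.
const SUBJECT_RE =
  /^(agent-knowledge:wiki:[a-z0-9-]+|tool-doc:[a-z0-9-]+|system-prompt:.+|cluster:.+)$/;

function isWellFormedSubject(raw: unknown): boolean {
  // Trimming first means "system-prompt:   " collapses to "system-prompt:" and fails.
  return typeof raw === "string" && SUBJECT_RE.test(raw.trim());
}

// Validate at the boundary: a malformed subject is logged and its row skipped,
// so the free-form string never reaches downstream routing.
function liftRows(rows: RawFindingRow[]): { kept: RawFindingRow[]; skipped: number } {
  const kept: RawFindingRow[] = [];
  let skipped = 0;
  for (const row of rows) {
    // Assumption: a row with no subject at all is still accepted.
    if (row.subject === undefined || isWellFormedSubject(row.subject)) {
      kept.push(row);
    } else {
      skipped++;
      console.warn(`skipping finding with malformed subject: ${JSON.stringify(row.subject)}`);
    }
  }
  return { kept, skipped };
}
```

Skipping with a log line, rather than keeping the row and coercing its subject, is what turns a silent routing miss into something an operator can see.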

What changes

- `src/analyst/finding-subject.ts`: discriminated-union `FindingSubject`, a `parseFindingSubject(raw)` parser, its `renderFindingSubject(s)` inverse, and a `FINDING_SUBJECT_GRAMMAR_PROMPT` constant that kinds embed as the single source of truth. 14 variants cover every locus the substrate routes on (knowledge `{wiki,claim,raw,stale}`, prompt/tool/scaffolding surfaces, stale signals, cluster labels).
- `KIND_EXPECTED_SUBJECTS`: per-kind allow-list. failure-mode emits ONLY `cluster`; knowledge-gap can't sneak in a `system-prompt:*` (that's the improvement analyst's job); improvement can't emit stale signals.
- `RawAnalystFindingSchema.subject`: a Zod `.refine` that runs the parser at parse time.
- `kind-factory.ts`: after `parseRawFinding`, the factory enforces the per-kind allow-list. Wrong-kind subjects are logged and counted in `rejected_wrong_subject`.
- Existing `'tool:foo'` test fixtures updated to canonical `'tool-doc:foo'`.
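To make the discriminated-union idea concrete, here is a hypothetical, reduced version of the grammar: four of the 14 variants, with invented payload field names (`slug`, `toolId`, `section`, `label`). It is a sketch, not the contents of `finding-subject.ts`. It follows only the rules the PR states: slugs and tool ids match `[a-z0-9-]+`, free-form payloads are trimmed, and anything else (prose, the legacy `tool:` prefix) parses to `null`.

```typescript
// Hypothetical 4-variant subset of FindingSubject; field names are illustrative.
type FindingSubject =
  | { kind: "agent-knowledge:wiki"; slug: string }
  | { kind: "tool-doc"; toolId: string }
  | { kind: "system-prompt"; section: string }
  | { kind: "cluster"; label: string };

// Slugs and tool ids are limited to [a-z0-9-]+.
const SLUG = /^[a-z0-9-]+$/;

// Returns the text after `prefix`, or null when `s` does not start with it.
function afterPrefix(s: string, prefix: string): string | null {
  return s.startsWith(prefix) ? s.slice(prefix.length) : null;
}

function parseFindingSubject(raw: unknown): FindingSubject | null {
  if (typeof raw !== "string") return null;
  const s = raw.trim();

  let rest = afterPrefix(s, "agent-knowledge:wiki:");
  if (rest !== null) return SLUG.test(rest) ? { kind: "agent-knowledge:wiki", slug: rest } : null;

  rest = afterPrefix(s, "tool-doc:");
  if (rest !== null) return SLUG.test(rest) ? { kind: "tool-doc", toolId: rest } : null;

  rest = afterPrefix(s, "system-prompt:");
  if (rest !== null) {
    const section = rest.trim(); // free-form, trimmed
    return section ? { kind: "system-prompt", section } : null;
  }

  rest = afterPrefix(s, "cluster:");
  if (rest !== null) {
    const label = rest.trim(); // free-form, trimmed
    return label ? { kind: "cluster", label } : null;
  }

  // Prose ("fix the prompt") and unknown prefixes ("tool:foo") match nothing.
  return null;
}

// Inverse of the parser: for any parsed subject x, parse(render(x)) yields x.
function renderFindingSubject(sub: FindingSubject): string {
  switch (sub.kind) {
    case "agent-knowledge:wiki":
      return `agent-knowledge:wiki:${sub.slug}`;
    case "tool-doc":
      return `tool-doc:${sub.toolId}`;
    case "system-prompt":
      return `system-prompt:${sub.section}`;
    case "cluster":
      return `cluster:${sub.label}`;
  }
}
```

Returning `null` instead of throwing lets a schema-level check (such as a Zod `.refine`) decide how a rejected row is reported.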

Why this matters

Downstream substrate work in agent-runtime (`defineAgent` + manifest-driven `ImprovementAdapter`) and per-vertical wiring (tax / legal / gtm / creative / N future verticals) all narrow on `FindingSubject['kind']` instead of fragile prefix matching. No silent skips. No fabricated paths. No theater.
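Narrowing on `kind` instead of prefix matching can look like the sketch below, which uses a hypothetical four-variant subset of `FindingSubject`. The `Destination` names and the `route` function are illustrative, not agent-runtime's API; the `never`-typed default branch is the standard TypeScript exhaustiveness check.

```typescript
// Hypothetical 4-variant subset of FindingSubject.
type FindingSubject =
  | { kind: "agent-knowledge:wiki"; slug: string }
  | { kind: "tool-doc"; toolId: string }
  | { kind: "system-prompt"; section: string }
  | { kind: "cluster"; label: string };

// Illustrative destinations; the real adapters live in agent-runtime.
type Destination = "KnowledgeAdapter" | "ImprovementAdapter" | "failure-mode-report";

// Only reachable at runtime if a variant slipped past the type checker.
function assertNever(x: never): never {
  throw new Error(`unhandled subject: ${JSON.stringify(x)}`);
}

function route(sub: FindingSubject): Destination {
  switch (sub.kind) {
    case "agent-knowledge:wiki":
      return "KnowledgeAdapter";
    case "tool-doc":
    case "system-prompt":
      return "ImprovementAdapter";
    case "cluster":
      return "failure-mode-report";
    default:
      // Adding a variant to FindingSubject without a case here fails to compile.
      return assertNever(sub);
  }
}
```

Compared with `startsWith('agent-knowledge:wiki:')`, a misspelled `case` label is a compile error against the literal union, and an unhandled new variant is caught at `assertNever` instead of falling through silently.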

Test plan

- `pnpm test`: 1196/1196 pass (38 new cases)
- `pnpm typecheck`
- Bumps npm + pypi to 0.30.0

Closes the substrate gap that turned every per-vertical `ImprovementAdapter` into dead code: the analyst kinds' actor prompts documented a subject grammar (`agent-knowledge:wiki:<slug>`, `system-prompt:<section>`, ...) but `subject` was an unvalidated `z.string().optional()`, so the LLM could emit prose like `subject: "fix the prompt"`, which downstream `startsWith(...)` routing silently dropped. This PR makes the grammar load-bearing:

1. **`src/analyst/finding-subject.ts`**: discriminated-union `FindingSubject`, `parseFindingSubject(raw)` parser, `renderFindingSubject(s)` inverse, and a `FINDING_SUBJECT_GRAMMAR_PROMPT` constant that kinds can embed as the single source of truth. Variants cover every locus the substrate routes on:
   - `agent-knowledge:{wiki,claim,raw,stale}:<locus>` → `KnowledgeAdapter`
   - `system-prompt`, `tool-doc`, `new-tool`, `rag`, `memory`, `scaffolding`, `output-schema` → `ImprovementAdapter`
   - `websearch.outdated`, `prior-run-summary` → stale signals
   - `cluster` → failure-mode-only free-form labels

   Slugs and tool ids are constrained to `[a-z0-9-]+`; topics, sections, and keys allow free-form text (trimmed).
2. **`KIND_EXPECTED_SUBJECTS`**: per-kind allow-list. failure-mode emits ONLY `cluster`; knowledge-gap can't sneak in a `system-prompt:*` (that's the improvement analyst's job); improvement can't emit stale signals. Enforced at the kind factory boundary.
3. **`RawAnalystFindingSchema.subject`**: Zod `.refine` that runs the parser. Malformed subjects fail the row at Zod parse time with a clear log message instead of being silently lifted as a free-form string.
4. **`kind-factory.ts`**: after `parseRawFinding`, the factory checks the parsed subject against the kind's allow-list. Wrong-kind subjects (e.g. an improvement finding pointing at `cluster:foo`) are logged, counted in `rejected_wrong_subject`, and excluded from `out`. These rejections are visible to operators in the `analyst.kind <id> done` log line.
5. **Tests**: 38 new cases on `parseFindingSubject` cover every variant (positive + malformed), boundary inputs (null / empty / whitespace / prose), round-trip via `renderFindingSubject`, and the `KIND_EXPECTED_SUBJECTS` truth table (failure-mode is the ONLY kind that emits `cluster`; improvement excludes stale signals; etc.). Updated the legacy `'tool:foo'` fixtures in `kinds.test.ts` to canonical `'tool-doc:foo'`.

Result: every downstream consumer (agent-runtime's `KnowledgeAdapter` / `ImprovementAdapter`, per-vertical wiring) can now narrow on `FindingSubject['kind']` instead of `startsWith('agent-knowledge:wiki:')`. No more silent skips, no more fabricated paths, no more theater.

Tests: 1196/1196 pass (38 new). Typecheck clean. Bumps to 0.30.0 (npm + pypi + python `__version__`).
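The allow-list check in `kind-factory.ts` might be sketched as a set lookup per analyst kind. The subject-kind spellings, the `ParsedFinding` shape, and `enforceAllowList` are hypothetical. The PR pins down only some memberships (failure-mode emits only `cluster`, knowledge-gap excludes `system-prompt`, improvement excludes stale signals and `cluster`); the rest is assumed, and the stale-signal kinds are left out of all three sets because the PR does not say which kind emits them.

```typescript
// Hypothetical spellings of the 14 subject kinds the PR lists.
type SubjectKind =
  | "agent-knowledge:wiki" | "agent-knowledge:claim" | "agent-knowledge:raw" | "agent-knowledge:stale"
  | "system-prompt" | "tool-doc" | "new-tool" | "rag" | "memory" | "scaffolding" | "output-schema"
  | "websearch.outdated" | "prior-run-summary"
  | "cluster";

type AnalystKind = "failure-mode" | "knowledge-gap" | "improvement";

// Illustrative allow-lists; memberships beyond what the PR states are assumed.
// Stale-signal kinds appear in no set because their emitting kind is unstated.
const KIND_EXPECTED_SUBJECTS: Record<AnalystKind, ReadonlySet<SubjectKind>> = {
  "failure-mode": new Set<SubjectKind>(["cluster"]),
  "knowledge-gap": new Set<SubjectKind>([
    "agent-knowledge:wiki", "agent-knowledge:claim", "agent-knowledge:raw", "agent-knowledge:stale",
  ]),
  improvement: new Set<SubjectKind>([
    "system-prompt", "tool-doc", "new-tool", "rag", "memory", "scaffolding", "output-schema",
  ]),
};

// Hypothetical shape of a finding after its subject has already been parsed.
interface ParsedFinding {
  subject: { kind: SubjectKind };
  summary: string;
}

// Post-parse check: wrong-kind subjects are logged, counted, and excluded from `out`.
function enforceAllowList(kind: AnalystKind, findings: ParsedFinding[]) {
  const allowed = KIND_EXPECTED_SUBJECTS[kind];
  const out: ParsedFinding[] = [];
  let rejected_wrong_subject = 0;
  for (const f of findings) {
    if (allowed.has(f.subject.kind)) {
      out.push(f);
    } else {
      rejected_wrong_subject++;
      console.warn(`analyst.kind ${kind}: rejecting subject kind ${f.subject.kind}`);
    }
  }
  return { out, rejected_wrong_subject };
}
```

Running this check after grammar parsing keeps the two failure modes separate: a subject can be well-formed yet still belong to another analyst's lane, and the counter makes that case visible instead of folding it into generic parse errors.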

drewstone merged commit 29ca3d2 into main May 20, 2026
1 check failed

drewstone mentioned this pull request May 20, 2026 in #62, "ci: self-hosted runner + biome fix for finding-subject" (merged)

drewstone added a commit that referenced this pull request May 20, 2026: e1a7ece "style: biome fix for finding-subject.ts (post-#61 cleanup)"

drewstone merged 1 commit into `main` from `feat/finding-subject-enforcement`
