test(cli): Add route honesty eval fixtures and finalize change #28

Open
thecodedrift wants to merge 6 commits into feat/local-rule-skill from feat/local-rule-eval

@thecodedrift thecodedrift commented Jun 11, 2026

Add the calibration dataset for the route recipe's decisions and finalize the change at the tip of the stack.

test/fixtures/route-eval.json is a labeled request→route dataset covering the three failure directions the routing contract must defend:

  • over-claim: Taskless grabs a job that a packaged rule/formatter already does → must stay existing;
  • over-escalate: a locally solvable request is wrongly sent to the login-gated service → must stay static;
  • under-engage: naming a tool suppresses the skill instead of engaging routing → must stay existing;
  • plus clean remote cases that genuinely need the service.

The coverage test guards the dataset's balance — every route represented, every declared trap populated and correctly routed, remote cases clean — so it stays usable for a future eval harness. The route decision itself is agent-made (recipe-followed), not run against a code classifier.
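A minimal sketch of what such a balance guard might check. The fixture shape here (`request`, `expected`, optional `trap`) and the names `RouteCase` and `checkBalance` are assumptions for illustration; the actual schema of route-eval.json and the real test are not shown in this conversation. Only the route names and the trap-to-route mapping come from the description above.

```typescript
// Hypothetical shapes: the real fixture schema is not shown in this PR conversation.
type Route = "existing" | "static" | "remote";
type Trap = "over-claim" | "over-escalate" | "under-engage";

interface RouteCase {
  request: string; // natural-language request (illustrative field name)
  expected: Route; // labeled route
  trap?: Trap; // absent for clean cases
}

// Route each trap must resolve to, per the PR description.
const TRAP_ROUTE: Record<Trap, Route> = {
  "over-claim": "existing",
  "over-escalate": "static",
  "under-engage": "existing",
};

const ROUTES: Route[] = ["existing", "static", "remote"];

// Returns a list of human-readable problems; an empty list means balanced.
function checkBalance(cases: RouteCase[]): string[] {
  const problems: string[] = [];

  // Every route must be represented by at least one case.
  for (const route of ROUTES) {
    if (!cases.some((c) => c.expected === route)) {
      problems.push(`no case for route "${route}"`);
    }
  }

  // Every trap must be populated, and each of its cases correctly routed.
  for (const trap of Object.keys(TRAP_ROUTE) as Trap[]) {
    const hits = cases.filter((c) => c.trap === trap);
    if (hits.length === 0) {
      problems.push(`trap "${trap}" has no cases`);
    }
    for (const c of hits) {
      if (c.expected !== TRAP_ROUTE[trap]) {
        problems.push(`trap "${trap}" case routed to "${c.expected}"`);
      }
    }
  }

  // Remote cases must be clean: no trap label.
  for (const c of cases) {
    if (c.expected === "remote" && c.trap !== undefined) {
      problems.push(`remote case carries trap "${c.trap}"`);
    }
  }
  return problems;
}
```

In a Vitest suite, a guard like this would likely reduce to asserting that the returned list is empty. Returning messages rather than a boolean keeps a failure readable when the dataset drifts.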

Finalization: since all 25 tasks are complete, this PR also archives the OpenSpec change — applying the delta specs into the main specs (new cli-detect and cli-rule-routing capabilities; cli-help and skill-taskless updates) and moving the change to openspec/changes/archive/2026-06-12-local-rule-routing/. With no unarchived change remaining, the OpenSpec Archive Check passes for the final merged state.

Tip of the stack on #27. Merge the stack bottom-up; this PR lands last.

Refs TSKL Runtime Rules


Stack generated by Git Town

Add a labeled request→route calibration dataset for the route recipe
covering both failure directions: over-claim (Taskless grabbing a
packaged/formatter job that should stay `existing`) and over-escalate (a
locally-solvable request wrongly sent to the login-gated service that
should stay `static`), plus genuine `remote` cases. The route decision is
agent-made by following help/route.txt, so the coverage test guards the
dataset's balance across routes and traps rather than running a code
classifier. Closes out the change: openspec validate passes, full CLI
suite green, detect + routing recipes smoke-tested end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thecodedrift thecodedrift marked this pull request as ready for review June 11, 2026 23:48
@thecodedrift thecodedrift requested a review from Copilot June 12, 2026 00:07

Copilot AI left a comment


Pull request overview

Adds a labeled “route honesty” evaluation dataset for the CLI’s route recipe and a Vitest guard that keeps the dataset structurally valid and balanced over time. Also marks the “validation + quality gate” checklist items complete in the openspec change plan.

Changes:

  • Add packages/cli/test/fixtures/route-eval.json request→expected-route fixtures with trap labels.
  • Add packages/cli/test/route-eval.test.ts to validate fixture structure and ensure coverage across routes/traps.
  • Update openspec/changes/local-rule-routing/tasks.md to check off validation steps and document the fixture location/use.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/cli/test/route-eval.test.ts | New Vitest suite that sanity-checks fixture shape and enforces balance across routes/traps. |
| packages/cli/test/fixtures/route-eval.json | New labeled calibration dataset for routing decisions (routes + trap cases + clean remote cases). |
| openspec/changes/local-rule-routing/tasks.md | Checks off the validation/quality gate steps and references the eval dataset. |

Comment thread openspec/changes/local-rule-routing/tasks.md Outdated
Comment thread packages/cli/test/route-eval.test.ts Outdated
thecodedrift and others added 4 commits June 11, 2026 17:36
* feat/local-rule-skill:
  docs(cli): Align route recipe with the three-state contract
  docs(cli): Show detect output shape in the route recipe
  docs(openspec): Clarify detect output schema is internal, not published
  docs(openspec): Address review feedback on local-rule-routing contract
* feat/local-rule-skill:
  fix(skill): Target @taskless/cli in the changeset; align tool lists
  docs(cli): Address review on routing recipes
  fix(cli): Harden detect against malformed manifests and false positives
PR #28 review:
- The trap-coverage test asserted the traps were declared but never
  verified each trap has at least one case, so the dataset could silently
  stop covering a failure direction (e.g. under-engage) while the test
  stayed green. Now every declared trap must have >= 1 case, and
  under-engage cases are asserted to route to `existing`.
- Fix the tasks.md note to point at the real fixture/test paths under
  packages/cli/test/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
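The gap described in the PR #28 review commit above, where traps were declared but never verified to be populated, can be sketched as follows. This is an illustration under assumptions, not the code from route-eval.test.ts: the `Fixture` shape and all function names are hypothetical.

```typescript
// Hypothetical fixture shape; field names are assumptions, not the real schema.
interface Fixture {
  traps: string[]; // declared trap names
  cases: { expected: string; trap?: string }[];
}

// Weak check: only confirms the required trap names are declared.
// It stays true even when no case carries a given trap label.
function trapsDeclared(f: Fixture, required: string[]): boolean {
  return required.every((t) => f.traps.includes(t));
}

// Stronger check: every declared trap must label at least one case.
// Returns the declared traps that have zero cases.
function unpopulatedTraps(f: Fixture): string[] {
  const counts = new Map<string, number>();
  for (const t of f.traps) counts.set(t, 0);
  for (const c of f.cases) {
    if (c.trap !== undefined && counts.has(c.trap)) {
      counts.set(c.trap, (counts.get(c.trap) ?? 0) + 1);
    }
  }
  return f.traps.filter((t) => counts.get(t) === 0);
}

// Counts under-engage cases that do not route to `existing`.
function misroutedUnderEngage(f: Fixture): number {
  return f.cases.filter(
    (c) => c.trap === "under-engage" && c.expected !== "existing",
  ).length;
}
```

With only the weak check, removing every under-engage case leaves the suite green; the per-trap count is what turns that silent loss of coverage into a failure.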
All 25 tasks are complete, so finalize the change on the tip of the
stack: apply the delta specs into the main specs (new cli-detect and
cli-rule-routing capabilities; cli-help and skill-taskless updates) and
move the change to openspec/changes/archive/2026-06-12-local-rule-routing/.
With no unarchived change remaining, the PR OpenSpec Archive Check passes
for the final merged state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat/local-rule-skill:
  ci(openspec): Skip the archive check on non-tip stacked PRs (#31)
  docs(openspec): Clarify detect output schema is internal, not published
  docs(openspec): Address review feedback on local-rule-routing contract
  docs(openspec): Propose local-rule-routing change
  chore(skill): Refine the stacked-PR archive-check guidance
  chore(skill): Scope iterate-pr archive check to the stack tip
  chore(config): Allow git-town in project settings