test(cli): Add route honesty eval fixtures and finalize change #28
Open
thecodedrift wants to merge 6 commits into
Add a labeled request→route calibration dataset for the route recipe covering both failure directions: over-claim (Taskless grabbing a packaged/formatter job that should stay `existing`) and over-escalate (a locally-solvable request wrongly sent to the login-gated service that should stay `static`), plus genuine `remote` cases.

The route decision is agent-made by following help/route.txt, so the coverage test guards the dataset's balance across routes and traps rather than running a code classifier.

Closes out the change: openspec validate passes, full CLI suite green, detect + routing recipes smoke-tested end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
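As a rough illustration of the dataset and the route-balance guard described in this commit, the sketch below assumes a hypothetical case shape (`request`, `expected`, optional `trap`); the real field names in `route-eval.json` are not shown here, and the sample requests are invented for the example. `missingRoutes` is likewise a hypothetical helper, not the function used in the actual test.

```typescript
// Hypothetical fixture shape: field names are assumptions, not taken from route-eval.json.
type Route = "existing" | "static" | "remote";
type Trap = "over-claim" | "over-escalate";

interface RouteCase {
  request: string; // what the user asked for
  expected: Route; // the route the agent should pick by following the recipe
  trap?: Trap; // set only on cases built to tempt a wrong route
}

// Invented sample entries, one per route.
const sampleCases: RouteCase[] = [
  { request: "Format my TypeScript files consistently", expected: "existing", trap: "over-claim" },
  { request: "Flag console.log calls under src/", expected: "static", trap: "over-escalate" },
  { request: "A request that genuinely needs the login-gated service", expected: "remote" },
];

const ALL_ROUTES: readonly Route[] = ["existing", "static", "remote"];

// Returns every route that has no case in the dataset; an empty array means balanced coverage.
function missingRoutes(cases: readonly RouteCase[]): Route[] {
  const seen = new Set<Route>(cases.map((c) => c.expected));
  return ALL_ROUTES.filter((route) => !seen.has(route));
}
```

A guard like this only checks that the labels stay balanced; it says nothing about whether an agent following the recipe actually reaches the expected route, which matches the commit's point that no code classifier is being run.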
Pull request overview
Adds a labeled “route honesty” evaluation dataset for the CLI’s route recipe and a Vitest guard that keeps the dataset structurally valid and balanced over time. Also marks the “validation + quality gate” checklist items complete in the openspec change plan.
Changes:
- Add `packages/cli/test/fixtures/route-eval.json` request→expected-route fixtures with trap labels.
- Add `packages/cli/test/route-eval.test.ts` to validate fixture structure and ensure coverage across routes/traps.
- Update `openspec/changes/local-rule-routing/tasks.md` to check off validation steps and document the fixture location/use.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/cli/test/route-eval.test.ts | New Vitest suite that sanity-checks fixture shape and enforces balance across routes/traps. |
| packages/cli/test/fixtures/route-eval.json | New labeled calibration dataset for routing decisions (routes + trap cases + clean remote cases). |
| openspec/changes/local-rule-routing/tasks.md | Checks off the validation/quality gate steps and references the eval dataset. |
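To make "sanity-checks fixture shape" concrete, here is a minimal sketch of a structural check written as a type guard. The field names (`request`, `expected`, `trap`) and the helper `isRouteCase` are assumptions for illustration; the actual suite in `route-eval.test.ts` may validate different fields or use Vitest matchers directly.

```typescript
// Hypothetical fixture shape: field names are assumptions, not taken from route-eval.json.
type Route = "existing" | "static" | "remote";

interface RouteCase {
  request: string;
  expected: Route;
  trap?: string;
}

const ROUTES: readonly string[] = ["existing", "static", "remote"];

// Narrows an unknown JSON value to RouteCase when it has the assumed shape.
function isRouteCase(value: unknown): value is RouteCase {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const request = record["request"];
  const expected = record["expected"];
  const trap = record["trap"];
  return (
    typeof request === "string" &&
    request.trim().length > 0 && // reject blank requests
    typeof expected === "string" &&
    ROUTES.includes(expected) && // only the three known routes
    (trap === undefined || typeof trap === "string")
  );
}
```

Running a guard like this over every parsed entry before the balance checks keeps a malformed case from silently counting toward coverage.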
* feat/local-rule-skill:
  docs(cli): Align route recipe with the three-state contract
  docs(cli): Show detect output shape in the route recipe
  docs(openspec): Clarify detect output schema is internal, not published
  docs(openspec): Address review feedback on local-rule-routing contract

* feat/local-rule-skill:
  fix(skill): Target @taskless/cli in the changeset; align tool lists
  docs(cli): Address review on routing recipes
  fix(cli): Harden detect against malformed manifests and false positives
PR #28 review:
- The trap-coverage test asserted the traps were declared but never verified each trap has at least one case, so the dataset could silently stop covering a failure direction (e.g. under-engage) while the test stayed green. Now every declared trap must have >= 1 case, and under-engage cases are asserted to route to `existing`.
- Fix the tasks.md note to point at the real fixture/test paths under packages/cli/test/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
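The tightened trap check from this review fix could look roughly like the sketch below. It assumes a hypothetical fixture layout with a top-level `traps` list and per-case `trap` labels, and the helper `trapCoverageProblems` is invented for illustration. The trap-to-route pairs come from the commit messages in this PR (over-claim and under-engage to `existing`, over-escalate to `static`).

```typescript
// Hypothetical fixture layout: field names are assumptions, not taken from route-eval.json.
type Route = "existing" | "static" | "remote";

interface RouteCase {
  request: string;
  expected: Route;
  trap?: string;
}

interface RouteEval {
  traps: string[]; // declared trap labels
  cases: RouteCase[];
}

// Route each trap's cases are expected to resolve to, per the PR's commit messages.
const TRAP_ROUTE = new Map<string, Route>([
  ["over-claim", "existing"],
  ["over-escalate", "static"],
  ["under-engage", "existing"],
]);

// Lists every coverage problem; an empty array means the traps are populated and correctly routed.
function trapCoverageProblems(data: RouteEval): string[] {
  const problems: string[] = [];
  for (const trap of data.traps) {
    const hits = data.cases.filter((c) => c.trap === trap);
    if (hits.length === 0) {
      // The gap the review caught: declared, but no case exercises it.
      problems.push(`trap "${trap}" is declared but has no cases`);
    }
    const want = TRAP_ROUTE.get(trap);
    for (const c of hits) {
      if (want !== undefined && c.expected !== want) {
        problems.push(`"${c.request}" is labeled ${trap} but expects ${c.expected}, not ${want}`);
      }
    }
  }
  return problems;
}
```

Collecting problems into a list rather than failing on the first one means a single test run reports every uncovered or mis-routed trap at once.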
All 25 tasks are complete, so finalize the change on the tip of the stack: apply the delta specs into the main specs (new cli-detect and cli-rule-routing capabilities; cli-help and skill-taskless updates) and move the change to openspec/changes/archive/2026-06-12-local-rule-routing/. With no unarchived change remaining, the PR OpenSpec Archive Check passes for the final merged state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 12, 2026
* feat/local-rule-skill:
  ci(openspec): Skip the archive check on non-tip stacked PRs (#31)
  docs(openspec): Clarify detect output schema is internal, not published
  docs(openspec): Address review feedback on local-rule-routing contract
  docs(openspec): Propose local-rule-routing change
  chore(skill): Refine the stacked-PR archive-check guidance
  chore(skill): Scope iterate-pr archive check to the stack tip
  chore(config): Allow git-town in project settings
Add the calibration dataset for the `route` recipe's decisions and finalize the change at the tip of the stack.

`test/fixtures/route-eval.json` is a labeled request→route dataset covering both failure directions the routing contract must defend:

- over-claim cases (Taskless grabbing a packaged/formatter job), which should stay `existing`;
- over-escalate cases (a locally-solvable request sent to the login-gated service), which should stay `static`;
- under-engage cases, which should route to `existing`;
- `remote` cases that genuinely need the service.

The coverage test guards the dataset's balance (every route represented, every declared trap populated and correctly routed, remote cases clean) so it stays usable for a future eval harness. The route decision itself is agent-made (recipe-followed), not run against a code classifier.

Finalization: since all 25 tasks are complete, this PR also archives the OpenSpec change, applying the delta specs into the main specs (new `cli-detect` and `cli-rule-routing` capabilities; `cli-help` and `skill-taskless` updates) and moving the change to `openspec/changes/archive/2026-06-12-local-rule-routing/`. With no unarchived change remaining, the OpenSpec Archive Check passes for the final merged state.

Tip of the stack on #27. Merge the stack bottom-up; this PR lands last.
Refs TSKL Runtime Rules
Stack generated by Git Town