
test(cli): simplify churny local coverage #1289

Merged
christso merged 1 commit into main from feature/simplify-tests-docs-main
Jun 3, 2026

@christso christso commented Jun 3, 2026

Summary

Local test runs now spend less time re-running the same CLI subprocess fixtures for pipeline artifacts. The pipeline command tests still protect the stable contracts users and downstream tooling depend on: manifest fields, code-grader execution, builtin contains/regex/negate grading, grading.json, index.jsonl, and benchmark.json output.

This also moves Dashboard launch-mode and threshold behavior into the docs site, so end-user semantics are documented there instead of being pinned by as many duplicate migration and presentation tests.
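For reviewers who want a picture of the threshold semantics the remaining tests still cover (canonical dashboard.threshold, legacy fallback, clamping), here is a minimal sketch. It is not the project's implementation. The `legacyThreshold` key, the 0..1 clamp range, and the `DEFAULT_THRESHOLD` value are all assumptions made for illustration; only `dashboard.threshold` is named in this PR.

```typescript
// Hypothetical config shape. Only `dashboard.threshold` comes from the PR text;
// `legacyThreshold` stands in for whatever older key the fallback reads.
interface ConfigSketch {
  dashboard?: { threshold?: number };
  legacyThreshold?: number;
}

// Assumed default, not taken from the source.
const DEFAULT_THRESHOLD = 0.8;

// Assumed valid range is 0..1; values outside it are pulled to the nearest bound.
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Prefer the canonical key, then the legacy key, then the default.
function resolveThreshold(config: ConfigSketch): number {
  const raw =
    config.dashboard?.threshold ?? config.legacyThreshold ?? DEFAULT_THRESHOLD;
  return Number.isFinite(raw) ? clamp01(raw) : DEFAULT_THRESHOLD;
}
```

Under these assumptions, a config that sets both keys resolves to the canonical value, and an out-of-range value is clamped rather than rejected.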

What changed

  • Collapsed repeated pipeline input, pipeline grade, and pipeline bench subprocess tests so each fixture run asserts all artifacts from that run.
  • Kept the full input -> grade -> bench smoke test intact.
  • Moved duplicated navigation helper coverage out of the CLI package and kept it beside the Dashboard route helper.
  • Reduced duplicated Dashboard threshold/config test cases while preserving canonical dashboard.threshold, legacy fallback, clamping, and save/migration coverage.
  • Updated apps/web/src/content/docs/docs/tools/dashboard.mdx with the current Projects-dashboard default and threshold config behavior.
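
The first bullet's pattern, one fixture run that asserts every artifact from that run, can be sketched as follows. This is an illustration only: `runFixture` is a synchronous stand-in for the real `await execa(...)` subprocess call, and the `RunArtifacts` shape, its field names, and the check names are hypothetical.

```typescript
// Hypothetical shape of the artifacts one fixture run leaves behind.
interface RunArtifacts {
  manifest: { test_ids: string[] };
  files: string[];
}

let invocations = 0;

// Stand-in for the expensive CLI subprocess; the real tests await execa here.
function runFixture(): RunArtifacts {
  invocations += 1;
  return {
    manifest: { test_ids: ["example-1"] },
    files: ["manifest.json", "input.json"],
  };
}

// Each former test case becomes a named check against the same run.
interface Check {
  name: string;
  ok: (run: RunArtifacts) => boolean;
}

const checks: Check[] = [
  { name: "manifest lists test ids", ok: (r) => r.manifest.test_ids.length > 0 },
  { name: "manifest.json written", ok: (r) => r.files.includes("manifest.json") },
  { name: "input.json written", ok: (r) => r.files.includes("input.json") },
];

// Return the names of failing checks so one run reports every broken artifact.
function failedChecks(run: RunArtifacts): string[] {
  return checks.filter((c) => !c.ok(run)).map((c) => c.name);
}

const run = runFixture(); // one subprocess instead of one per check
const failures = failedChecks(run);
```

Collecting failures by name keeps the diagnostic value of separate test cases while paying for the subprocess once.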

Before/after evidence

Pipeline subprocess test reduction against origin/main:

| File | Tests before | Tests after | await execa before | await execa after |
| --- | --- | --- | --- | --- |
| apps/cli/test/commands/eval/pipeline/input.test.ts | 10 | 4 | 10 | 4 |
| apps/cli/test/commands/eval/pipeline/grade.test.ts | 6 | 3 | 6 | 3 |
| apps/cli/test/commands/eval/pipeline/bench.test.ts | 5 | 3 | 5 | 3 |
| apps/cli/test/commands/eval/pipeline/pipeline-e2e.test.ts | 1 | 1 | 3 | 3 |

Net for these files: 22 -> 11 test cases, and 24 -> 13 subprocess invocations.
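
The net figures can be re-derived from the per-file counts in the table. The short tally below uses only those numbers; the `Row` type and variable names are illustrative.

```typescript
// Per-file counts copied from the before/after table in this PR.
interface Row {
  file: string;
  testsBefore: number;
  testsAfter: number;
  execaBefore: number;
  execaAfter: number;
}

const rows: Row[] = [
  { file: "input.test.ts", testsBefore: 10, testsAfter: 4, execaBefore: 10, execaAfter: 4 },
  { file: "grade.test.ts", testsBefore: 6, testsAfter: 3, execaBefore: 6, execaAfter: 3 },
  { file: "bench.test.ts", testsBefore: 5, testsAfter: 3, execaBefore: 5, execaAfter: 3 },
  { file: "pipeline-e2e.test.ts", testsBefore: 1, testsAfter: 1, execaBefore: 3, execaAfter: 3 },
];

// Sum each column across the four files.
const totals = rows.reduce(
  (acc, r) => ({
    testsBefore: acc.testsBefore + r.testsBefore,
    testsAfter: acc.testsAfter + r.testsAfter,
    execaBefore: acc.execaBefore + r.execaBefore,
    execaAfter: acc.execaAfter + r.execaAfter,
  }),
  { testsBefore: 0, testsAfter: 0, execaBefore: 0, execaAfter: 0 },
);

console.log(
  `${totals.testsBefore} -> ${totals.testsAfter} test cases, ` +
    `${totals.execaBefore} -> ${totals.execaAfter} subprocess invocations`,
);
// prints "22 -> 11 test cases, 24 -> 13 subprocess invocations"
```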

Timing report for next review

I also timed every test file individually after this change. This is not the same as the full package-level pre-push run, but it identifies where local time is concentrated.

  • Files timed: 175
  • Elapsed wall time for the per-file timing pass: 2.0 min
  • Group totals from individual file runs:

| Group | Files | Tests | Time |
| --- | --- | --- | --- |
| packages/core | 112 | 1787 | 64.9s |
| apps/cli | 46 | 543 | 51.9s |
| packages/phoenix-adapter | 5 | 21 | 1.8s |
| apps/dashboard | 6 | 39 | 1.1s |
| packages/eval | 5 | 67 | 0.9s |
| plugins | 1 | 7 | 0.2s |
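
A per-file timing pass like the one described above could look roughly like the sketch below. It is not the script used for this report. `timeFiles`, `FileTiming`, and the injected `runFile` and `now` parameters are hypothetical names; in a real pass, `runFile` would presumably spawn `bun test <file>` synchronously and `now` would be the wall clock.

```typescript
interface FileTiming {
  file: string;
  ms: number;
}

// Run each file once through the injected runner and record elapsed time.
// The runner and clock are parameters so the timing logic can be checked
// without spawning any subprocess.
function timeFiles(
  files: string[],
  runFile: (file: string) => void,
  now: () => number = () => Date.now(),
): FileTiming[] {
  const timings = files.map((file) => {
    const start = now();
    runFile(file);
    return { file, ms: now() - start };
  });
  // Slowest first, matching the ranking used in the report.
  return timings.sort((a, b) => b.ms - a.ms);
}

// Usage with a fake clock: each "run" advances time by a fixed cost.
const fakeCosts: Record<string, number> = { "a.test.ts": 5, "b.test.ts": 20, "c.test.ts": 1 };
let fakeClock = 0;
const ranked = timeFiles(
  Object.keys(fakeCosts),
  (file) => {
    fakeClock += fakeCosts[file];
  },
  () => fakeClock,
);
```

Timing files one at a time adds per-process startup cost to every measurement, which is one reason these totals differ from a full package-level run.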

Slowest files after this PR:

| Rank | File | Time | Tests | Notes |
| --- | --- | --- | --- | --- |
| 1 | packages/core/test/evaluation/orchestrator.test.ts | 9.9s | 87 | likely biggest remaining core target |
| 2 | packages/core/test/evaluation/workspace/pool-manager.test.ts | 6.6s | 31 | real workspace/repo materialization |
| 3 | apps/cli/test/eval.integration.test.ts | 6.1s | 4 | full CLI integration subprocesses |
| 4 | apps/cli/test/commands/eval/pipeline/input.test.ts | 4.7s | 4 | still subprocess-heavy after reduction |
| 5 | apps/cli/test/commands/eval/assert.test.ts | 3.9s | 4 | CLI subprocess assertions |
| 6 | apps/cli/test/commands/trend/trend.test.ts | 3.9s | 10 | mix of pure and CLI coverage |
| 7 | apps/cli/test/commands/eval/pipeline/grade.test.ts | 3.2s | 3 | still subprocess-heavy after reduction |
| 8 | apps/cli/test/commands/eval/pipeline/pipeline-e2e.test.ts | 2.9s | 1 | full pipeline smoke |
| 9 | apps/cli/test/commands/eval/pipeline/bench.test.ts | 2.9s | 3 | still subprocess-heavy after reduction |
| 10 | apps/cli/test/commands/results/serve.test.ts | 2.7s | 59 | includes git remote/API integration cases |

Timing limitation: packages/phoenix-adapter/test/agentv-normalize.test.ts failed only in direct single-file timing because one test resolved an example path outside this worktree. The full repo pre-push test hook passed.

Verification

  • bun run build passed. Existing Dashboard bundle-size warning remains.
  • bun test apps/cli/test/commands/eval/pipeline/input.test.ts apps/cli/test/commands/eval/pipeline/grade.test.ts apps/cli/test/commands/eval/pipeline/bench.test.ts apps/cli/test/commands/eval/pipeline/pipeline-e2e.test.ts passed: 11 tests.
  • bun test apps/cli/test/commands/results/studio-config.test.ts apps/cli/test/commands/results/serve.test.ts apps/dashboard/src/lib/navigation.test.ts passed: 86 tests.
  • bun --filter agentv typecheck passed.
  • bunx biome check <changed files> passed.
  • Pre-push hook passed: Build, Typecheck, Lint, Test, Validate eval YAML files.

Compound Engineering
Codex

@christso christso marked this pull request as ready for review June 3, 2026 07:02
@christso christso merged commit 221bdea into main Jun 3, 2026
4 checks passed
@christso christso deleted the feature/simplify-tests-docs-main branch June 3, 2026 07:02