evalbuff: carve-based eval pipeline (delete & rebuild) by jahooma · Pull Request #487 · CodebuffAI/codebuff

jahooma · 2026-03-30T23:56:51Z

Summary

- Adds a new eval approach that carves features out of the current codebase (using gpt-5.4 via the OpenAI SDK) and has agents rebuild them from natural prompts, instead of replaying git commits
- `carve-features.ts`: two-phase pipeline that plans carveable features across the codebase, then surgically removes each one, producing diffs and ground truth
- `run-carve-eval.ts`: runs N agents in parallel on carved repos, judges against the original code, and iterates on docs using the existing doc-optimizer loop
- Tested end-to-end on this repo: carved `cli-init-command`, agents scored 5.0 baseline → 5.5 after doc improvement, generated `patterns/discover-before-implement.md`
- Also includes the doc and test artifacts from the trial run (AGENTS.md update, generated doc, carve/eval result JSONs)
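
For readers skimming the summary, the flow it describes (carve a feature, run several agents on the carved repo, judge each rebuild against the original code, then decide whether a doc change helped) can be sketched roughly as below. This is an illustration only: every type and function name here (`CarvedFeature`, `Agent`, `Judge`, `exactMatchJudge`, `evalFeature`, `keepDocChange`) is hypothetical and not taken from the PR, the 0-10 score scale is an assumption, and the exact-match judge is a toy stand-in for whatever judging the real pipeline does.

```typescript
// All names below are hypothetical; the PR does not show the real types.
type FileMap = Record<string, string>; // path -> file contents

interface CarvedFeature {
  id: string;           // e.g. "cli-init-command"
  prompt: string;       // natural-language request handed to the agent
  removalDiff: string;  // diff that deletes the feature from the repo
  groundTruth: FileMap; // original code the judge compares against
}

type Agent = (feature: CarvedFeature) => Promise<FileMap>;
type Judge = (rebuilt: FileMap, truth: FileMap) => number;

// Average a non-empty list of scores.
function mean(scores: number[]): number {
  if (scores.length === 0) throw new Error("no scores to average");
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Toy judge (assumption): share of ground-truth files reproduced exactly,
// scaled to an assumed 0-10 range.
const exactMatchJudge: Judge = (rebuilt, truth) => {
  const paths = Object.keys(truth);
  if (paths.length === 0) return 0;
  const hits = paths.filter((p) => rebuilt[p] === truth[p]).length;
  return (10 * hits) / paths.length;
};

// Run `parallelism` independent agent attempts on one carved feature
// and return the mean judged score.
async function evalFeature(
  feature: CarvedFeature,
  agent: Agent,
  judge: Judge,
  parallelism: number,
): Promise<number> {
  const runs = await Promise.all(
    Array.from({ length: parallelism }, () => agent(feature)),
  );
  return mean(runs.map((rebuilt) => judge(rebuilt, feature.groundTruth)));
}

// Assumed acceptance rule: keep a doc change only if the mean score goes up.
function keepDocChange(baseline: number, afterDocs: number): boolean {
  return afterDocs > baseline;
}
```

Under these assumptions, the trial reported above (5.0 baseline, 5.5 after the doc improvement) would count as a doc change worth keeping.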

Test plan

- Typecheck passes (`npx tsc --noEmit`)
- End-to-end test: `bun run evalbuff/src/carve-features.ts --repo . --count 3` produced 3 carved features
- End-to-end test: `bun run evalbuff/src/run-carve-eval.ts --repo . --carve-file carve-2026-03-30.json --feature cli-init-command --parallelism 2` ran the full loop successfully
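
The commands above pass `--name value` flags such as `--repo`, `--count`, `--carve-file`, `--feature`, and `--parallelism`. A minimal sketch of how flags in that shape might be read is shown below. The helper names `parseFlags` and `parsePositiveInt` are hypothetical, and the PR does not show how the scripts actually parse their arguments.

```typescript
// Hypothetical helper: collect `--name value` pairs from an argv-style array.
function parseFlags(argv: string[]): Record<string, string> {
  const flags: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i] ?? "";
    if (!arg.startsWith("--")) continue; // skip positional arguments
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`Missing value for ${arg}`);
    }
    flags[arg.slice(2)] = value;
    i++; // skip the value that was just consumed
  }
  return flags;
}

// Hypothetical helper: validate numeric flags such as --count or --parallelism.
function parsePositiveInt(raw: string, name: string): number {
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`--${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

const flags = parseFlags([
  "--repo", ".",
  "--carve-file", "carve-2026-03-30.json",
  "--feature", "cli-init-command",
  "--parallelism", "2",
]);
const parallelism = parsePositiveInt(flags["parallelism"] ?? "", "parallelism");
console.log(flags["feature"], parallelism);
```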

🤖 Generated with Claude Code

…command)

New approach to evals that carves features out of the current codebase and has agents rebuild them, instead of replaying git commits. Uses OpenAI SDK (gpt-5.4) to identify and surgically remove features, then runs agents in parallel to rebuild from a natural prompt, judges against original code, and iterates on docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…command)

- Switch carve eval inner agents to Claude SDK (sonnet) with 3 parallel runs
- Update carve-features to use gpt-5.4 model
- Remove auto-generated discover-before-implement.md (test artifact)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jahooma and others added 2 commits March 30, 2026 16:36

evalbuff: add patterns/discover-before-implement.md (carve: cli-init-…

d1bd453

…command)

jahooma requested review from brandonkachen and charleslien as code owners March 30, 2026 23:56

jahooma and others added 2 commits March 30, 2026 17:43

evalbuff: add patterns/discover-before-implement.md (carve: cli-init-…

e4376f9

…command)

jahooma merged commit 869f5c4 into main Mar 31, 2026
34 checks passed

jahooma deleted the jahooma/evalbuff-delete-rebuild branch March 31, 2026 17:43

jahooma merged 4 commits into main from jahooma/evalbuff-delete-rebuild