EntityProcess · christso · Jun 2, 2026
diff --git a/.agents/skills/README.md b/.agents/skills/README.md
@@ -0,0 +1,15 @@
+# AgentV Coding Agent Skills
+
+This directory contains repo-local skills that teach coding agents how to work with AgentV. They are shared across compatible tools through `.agents/skills`, with `.claude/skills` symlinked here for Claude compatibility.
+
+## Skills
+
+| Skill | Description |
+| ----- | ----------- |
+| [agentv-core-development](agentv-core-development/) | Core design principles, TypeScript conventions, naming, wire-format rules, docs expectations, and project structure. |
+| [agentv-testing-verification](agentv-testing-verification/) | AgentV test strategy, CLI verification, grader e2e checks, browser verification, and pre-push behavior. |
+| [agentv-git-workflow](agentv-git-workflow/) | Beads-first decentralized orchestration, worktrees, existing PR takeover, draft PRs, and merge cleanup. |
+| [beads-execplan-issue-creator](beads-execplan-issue-creator/) | Convert approved plans into dependency-aware bead epics/tasks with acceptance criteria, verification, and invariants. |
+| [beads-epic-delivery-loop](beads-epic-delivery-loop/) | Execute a bead epic end-to-end with select, claim, implement, verify, review, commit, close, and repeat loops. |
+| [agentv-grader-changes](agentv-grader-changes/) | Grader type conventions, live eval verification, baseline updates, and score-range checks. |
+| [agentv-release-publishing](agentv-release-publishing/) | Versioning, release workflow, and package publishing. |
diff --git a/.agents/skills/agentv-core-development/SKILL.md b/.agents/skills/agentv-core-development/SKILL.md
@@ -0,0 +1,85 @@
+---
+name: agentv-core-development
+description: Use when changing AgentV core, SDK, CLI, Studio APIs, config schemas, docs, examples, or any cross-process wire format. Covers design principles, TypeScript conventions, naming, snake_case boundaries, and documentation updates.
+---
+
+# AgentV Core Development
+
+AgentV is a TypeScript monorepo for a declarative AI agent evaluation framework.
+
+## Goals
+
+- Declarative YAML eval definitions.
+- Structured, type-safe grading.
+- Multi-objective scoring for correctness, latency, cost, and safety.
+- Optimization-ready primitives without speculative built-ins.
+
+## Design Principles
+
+- Keep core lightweight and extensible through plugins.
+- Built-ins should be universal primitives: deterministic, stateless, single-purpose, and broadly useful.
+- Prefer composition over new features. If existing primitives cover a need, document the pattern instead of adding code.
+- Research peer frameworks before adding a new capability, and choose the lowest common denominator.
+- Apply YAGNI to implementation size, not just feature selection. Audit existing primitives before adding knobs, modes, precedence rules, or new invariants.
+- New fields must be optional and non-breaking.
+- Design for AI agents: intuitive primitives, self-documenting modules, concise extension recipes in file headers, and no dead speculative infrastructure.
+
+If you notice existing overengineering while working, create a Beads issue titled `cleanup: simplify X` that describes the current behavior, the simpler model, migration notes, and code links. Do not widen the current PR unless asked.
+
+## Stack
+
+- TypeScript 5.x targeting ES2022 and Node 20+.
+- Bun for all package and script operations.
+- Bun workspaces, tsup, Biome, Vitest, Vercel AI SDK, Zod.
+
+## Project Structure
+
+- `packages/core/`: evaluation engine, providers, grading, registry, programmatic API.
+- `packages/eval/`: lightweight assertion SDK.
+- `apps/cli/`: command-line interface published as `agentv`.
+- `apps/studio/`: Studio frontend.
+- `apps/web/`: documentation site.
+- `examples/`: documentation and integration coverage.
+
+## Code Editing Discipline
+
+- Revise existing files in place when the feature belongs there; avoid creating `*-v2`, `*-new`, `*-improved`, or similarly duplicative files.
+- New files are appropriate for genuinely new modules, skills, examples, or docs, but do not create throwaway variants as a substitute for understanding the existing code.
+- Avoid broad script-based rewrites of source code. For code changes, prefer targeted edits after reading enough context; scripts are acceptable for mechanical verification, generated outputs, or narrow non-code maintenance where risk is low.
+- Do not delete files or folders without explicit permission. If cleanup is needed, ask or use a reversible alternative.
+- If using a third-party library/API and you are not sure about current usage, consult current official docs before changing the integration.
+
+## TypeScript
+
+- Prefer type inference over explicit annotations when the inferred type is clear.
+- Use `async`/`await`.
+- Prefer named exports.
+- Keep modules cohesive.
+- Update stale file headers when behavior changes.
+
+## Project vs Benchmark
+
+- `Project`: top-level Studio container around a registered workspace directory. Modelled by `ProjectEntry` / `ProjectRegistry` and stored in `~/.agentv/projects.yaml`.
+- `Benchmark`: curated eval suite designed to measure a capability. Example benchmark directories should keep that name.
+- Legacy `~/.agentv/benchmarks.yaml` migration and per-run `benchmark.json` artifacts are separate concepts.
+
+When in doubt: if it holds runs/traces/experiments, it is a project. If it is a curated eval suite, it is a benchmark.
+
+## Wire Format
+
+Everything crossing a process boundary uses `snake_case`. Internal TypeScript uses `camelCase`. Translate at the boundary only.
+
+Snake case surfaces include YAML, JSONL result files, artifact output, HTTP responses, CLI JSON, and anything consumed by non-TS tooling. Camel case surfaces are TypeScript variables, parameters, type members, and in-memory shapes.
+
+Use paired wire/internal interfaces and converters, following `packages/core/src/projects.ts`. Do not dump TS objects directly to YAML or JSON responses.
+
+Treat existing camelCase on disk or in responses as a bug when touching that path.
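+
+A minimal sketch of the paired wire/internal pattern described above. The names `ExampleEntryWire`, `ExampleEntry`, `fromWire`, and `toWire` are hypothetical and chosen only for illustration; they are not the actual contents of `packages/core/src/projects.ts`.
+
+```typescript
+// Hypothetical wire shape: snake_case, as it would appear in YAML or JSON.
+interface ExampleEntryWire {
+  id: string;
+  display_name: string;
+  added_at: string;
+}
+
+// Hypothetical internal shape: camelCase, used only inside TypeScript.
+interface ExampleEntry {
+  id: string;
+  displayName: string;
+  addedAt: string;
+}
+
+// Translate at the boundary when reading from disk or a response.
+function fromWire(wire: ExampleEntryWire): ExampleEntry {
+  return { id: wire.id, displayName: wire.display_name, addedAt: wire.added_at };
+}
+
+// Translate at the boundary when writing; the internal object is never serialized directly.
+function toWire(entry: ExampleEntry): ExampleEntryWire {
+  return { id: entry.id, display_name: entry.displayName, added_at: entry.addedAt };
+}
+```
+
+Writing the converters out by hand keeps every renamed field visible in one place, so a stray camelCase key on the wire shows up as a type error instead of leaking to disk.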
+
+## Documentation
+
+When functionality changes, update:
+
+- Docs site under `apps/web/src/content/docs/`.
+- Skills if YAML schema, grader types, or CLI commands changed.
+- Examples that exercise changed behavior.
+- README only when the high-level pointer changes.
diff --git a/.agents/skills/agentv-git-workflow/SKILL.md b/.agents/skills/agentv-git-workflow/SKILL.md
@@ -0,0 +1,155 @@
+---
+name: agentv-git-workflow
+description: Use when starting, claiming, committing, pushing, opening, updating, reviewing, merging, or cleaning up AgentV work. Covers Beads as decentralized orchestration, GitHub as collaboration surface, worktrees, draft PRs, existing PR takeover, and merge cleanup.
+---
+
+# AgentV Git Workflow
+
+## Tracking Model
+
+- Beads is the decentralized orchestration layer: task state, ownership, dependencies, discoveries, and durable project knowledge live in the bead graph.
+- GitHub is the collaboration surface: draft PRs, reviews, CI, merge coordination, and communication with other parties.
+- Interpret "do not use external issue trackers" as "do not create a second private task brain." GitHub PRs still handle code review and merge state.
+- Runtime stays lightweight: Beads tracks durable coordination state, the repo-standard bead launcher creates disposable worktree sessions, and git worktrees provide isolation. Use manual worktree setup only as a fallback when the launcher is unavailable or broken.
+
+Use Beads instead of markdown TODO lists:
+
+```bash
+bd ready --json
+bd show <id> --json
+bd create "Issue title" --description="Detailed context" -t <bug|feature|task|chore|epic> -p <0-4> --json
+bd update <id> --claim --json
+bd update <id> --status in_progress --json
+bd close <id> --reason "Completed" --json
+bd remember "durable project insight"
+bd dolt push
+```
+
+## Starting New Bead Work
+
+Use the repo-standard bead launcher:
+
+```bash
+ep-spawn-agent <bead-id>
+```
+
+Until a dedicated `bead-start` wrapper exists, `ep-spawn-agent <bead-id>` is the default launch path. Do not choose between multiple launch modes during normal work.
+
+The launcher should:
+
+1. read the bead with `bd show <bead-id> --json`;
+2. claim or mark it in progress;
+3. create a fresh sibling worktree from latest `origin/main`;
+4. launch the agent with bead context;
+5. write the session/worktree/branch note back to the bead.
+
+Manual fallback only when the launcher is unavailable or broken:
+
+```bash
+bd show <id> --json
+bd update <id> --claim --json
+bd update <id> --status in_progress --json
+git fetch origin
+git worktree add ../agentv.worktrees/<id> -b work/<id> origin/main
+cd ../agentv.worktrees/<id>
+bun install
+cp "$(git worktree list --porcelain | head -1 | sed 's/worktree //')/.env" .env
+codex-eng
+```
+
+## Beads Viewer
+
+`bv` is optional graph/kanban visibility for the Beads graph. For agents, never run bare `bv` because it opens the interactive TUI and blocks the session. Use robot-mode commands only:
+
+```bash
+bv --robot-next
+bv --robot-triage
+bv --robot-plan
+bv --robot-graph
+```
+
+In worktrees where `.beads` is not present, point `bv` at the canonical project Beads directory:
+
+```bash
+bv --db /home/entity/projects/EntityProcess/agentv/.beads --robot-triage
+```
+
+## Worktrees
+
+For feature, bug fix, or non-trivial repo changes, work from a dedicated sibling worktree based on latest `origin/main`. Keep the primary checkout clean; do not do feature work in the main folder.
+
+AgentV worktrees live in sibling `../agentv.worktrees/`, not `.worktrees/` inside the repo and not the primary checkout.
+
+After checking out a branch or PR, run `bun install` if `package.json` or `bun.lock` may have changed.
+
+## Existing PR Takeover
+
+When continuing an existing PR, keep the PR branch as the source of truth for code and use Beads for durable task state/handoff.
+
+1. Inspect the PR first:
+
+   ```bash
+   gh pr view <number> --json number,title,state,isDraft,headRefName,headRefOid,baseRefName,mergeStateStatus,reviewDecision,statusCheckRollup,url
+   gh pr checks <number> --watch=false
+   ```
+
+2. Check out the PR branch. If Git reports the branch is already used by another worktree, do not force it; `cd` into that existing worktree instead.
+
+   ```bash
+   gh pr checkout <number>
+   # or: cd /path/to/existing/worktree
+   ```
+
+3. Make or update a bead for the continuation if one is not already provided. Reference the PR number in the bead description or notes.
+
+   ```bash
+   bd create "Continue PR <number>: <summary>" --description="Current state, requested changes, and handoff context" -t task -p 1 --json
+   bd note <id> "Working tree: <path>; PR: https://github.com/EntityProcess/agentv/pull/<number>"
+   ```
+
+4. Push focused commits to the existing PR branch. Do not create a second PR for the same work.
+
+## Draft PRs
+
+After the first meaningful commit, push and open a draft PR. Continue pushing meaningful checkpoints.
+
+```bash
+git push -u origin HEAD
+gh pr create --draft --title "<type>(scope): summary" --body "Refs <bead-id>"
+bd note <bead-id> "Draft PR: <url>"
+```
+
+Do not push directly to `main`. The default branch is `main`; do not use or document `master` for AgentV workflows.
+
+## PR Readiness
+
+Keep draft until verification evidence is complete: unit tests, test plan evidence, manual red/green UAT for user-facing changes, CI green, no conflicts, and final review pass when warranted.
+
+Before marking ready:
+
+```bash
+gh pr checks <number> --watch=false
+gh pr view <number> --json isDraft,mergeStateStatus,reviewDecision,statusCheckRollup
+bd note <bead-id> "Verification complete: <summary>"
+```
+
+## Merge and Cleanup
+
+Use squash merge only:
+
+```bash
+gh pr merge <PR_NUMBER> --squash --delete-branch
+```
+
+After squash merge, do not continue pushing to the old branch. Start follow-up fixes from fresh `main`.
+
+Before ending a session:
+
+```bash
+git status
+bd dolt push
+git push
+git status
+```
+
+Work is not complete until both Beads state and git commits are pushed.
diff --git a/.agents/skills/agentv-grader-changes/SKILL.md b/.agents/skills/agentv-grader-changes/SKILL.md
@@ -0,0 +1,51 @@
+---
+name: agentv-grader-changes
+description: Use when adding, modifying, renaming, parsing, or verifying AgentV graders/evaluators, assertion types, scoring behavior, thresholds, baseline files, or eval output shape.
+---
+
+# AgentV Grader Changes
+
+## Type System
+
+Grader types are kebab-case everywhere:
+
+- YAML config: `llm-grader`, `is-json`, `execution-metrics`.
+- Internal `EvaluatorKind`.
+- Output `scores[].type`.
+- Registry keys.
+
+Source of truth: `EVALUATOR_KIND_VALUES` in `packages/core/src/evaluation/types.ts`.
+
+Snake_case aliases can be accepted for backward compatibility through `normalizeGraderType()` in `grader-parser.ts`. SDK-facing `AssertionType` in `packages/eval/src/assertion.ts` must stay in sync.
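+
+As a rough illustration of alias handling, the sketch below maps a snake_case alias onto its kebab-case kind. `EXAMPLE_KINDS` and `normalizeExampleKind` are hypothetical names, and the list holds only the three kinds mentioned above; the real `normalizeGraderType()` and `EVALUATOR_KIND_VALUES` may behave differently.
+
+```typescript
+// Hypothetical subset of kinds for illustration; the real list lives in EVALUATOR_KIND_VALUES.
+const EXAMPLE_KINDS = ["llm-grader", "is-json", "execution-metrics"] as const;
+type ExampleKind = (typeof EXAMPLE_KINDS)[number];
+
+// Accept a snake_case alias such as "llm_grader" and return the kebab-case kind.
+function normalizeExampleKind(raw: string): ExampleKind {
+  const candidate = raw.trim().replace(/_/g, "-");
+  const match = EXAMPLE_KINDS.find((kind) => kind === candidate);
+  if (match === undefined) {
+    throw new Error(`Unknown grader type: ${raw}`);
+  }
+  return match;
+}
+```
+
+Normalizing once at parse time means everything downstream only ever sees the kebab-case form.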
+
+## Verification
+
+Unit tests are not enough for grader changes.
+
+1. Ensure `.env` exists in the worktree.
+2. Run an actual eval with a real example file:
+
+   ```bash
+   bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id <test-id>
+   ```
+
+3. Inspect JSONL output:
+   - correct `scores[].type`
+   - expected score calculation
+   - assertions have `text`, `passed`, and optional `evidence`
+
+4. Update `*.baseline.jsonl` files when output format changes.
+
+`--dry-run` is useful for harness plumbing but returns mock scores and cannot validate grading quality.
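+
+The inspection checklist above could be partly automated along these lines. This is a sketch under assumptions: `isKebabCase` and `checkAssertion` are hypothetical helpers, the field types (`text` and `evidence` as strings, `passed` as a boolean) are assumed, and the checks run on one value at a time because the position of assertions within a result record is not specified here.
+
+```typescript
+// Kebab-case: lowercase alphanumeric words joined by single hyphens.
+const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;
+
+// Check a single `scores[].type` value.
+function isKebabCase(type: string): boolean {
+  return KEBAB_CASE.test(type);
+}
+
+// Return a list of problems for one assertion object; an empty list means it looks fine.
+function checkAssertion(value: unknown): string[] {
+  if (typeof value !== "object" || value === null) {
+    return ["assertion is not an object"];
+  }
+  const record = value as Record<string, unknown>;
+  const problems: string[] = [];
+  if (typeof record.text !== "string") problems.push("text must be a string");
+  if (typeof record.passed !== "boolean") problems.push("passed must be a boolean");
+  if (record.evidence !== undefined && typeof record.evidence !== "string") {
+    problems.push("evidence, when present, must be a string");
+  }
+  return problems;
+}
+```
+
+A check like this only covers output shape; the expected score calculation still needs a manual read of the real eval run.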
+
+## Score Range Checks
+
+For manual e2e score guardrails:
+
+```bash
+bun apps/cli/src/cli.ts eval examples/path/to/suite.eval.yaml --target azure \
+  --out examples/path/to/suite.results.jsonl
+bun scripts/check-grader-scores.ts
+```
+
+Add `<eval-stem>.grader-scores.yaml` next to an eval when a new suite needs score-range assertions.
diff --git a/.agents/skills/agentv-release-publishing/SKILL.md b/.agents/skills/agentv-release-publishing/SKILL.md
@@ -0,0 +1,31 @@
+---
+name: agentv-release-publishing
+description: Use when changing AgentV versioning, release automation, package publishing, npm package configuration, or release docs.
+---
+
+# AgentV Release and Publishing
+
+## Versioning
+
+Git commit history is the changelog. Use GitHub Actions for releases; do not publish manually from a local machine.
+
+## Standard Release Flow
+
+1. Run the Release workflow with `channel=next` and the desired bump. It creates `x.y.z-next.1`, commits, tags, and pushes.
+2. The Publish workflow publishes to npm `next`.
+3. Run the Release workflow with `channel=finalize`. It strips the prerelease suffix.
+4. The Publish workflow publishes to npm `latest`.
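+
+The version strings in this flow can be pictured with a small sketch. `toNextPrerelease` and `stripPrerelease` are hypothetical helpers for illustration only; they do not describe how the Release workflow or `scripts/release.ts` is actually implemented.
+
+```typescript
+// Form the first prerelease for an already-bumped base version, e.g. "x.y.z" -> "x.y.z-next.1".
+function toNextPrerelease(baseVersion: string): string {
+  return `${baseVersion}-next.1`;
+}
+
+// Strip a prerelease suffix such as "-next.1", as the finalize step is described as doing.
+function stripPrerelease(version: string): string {
+  const dash = version.indexOf("-");
+  return dash === -1 ? version : version.slice(0, dash);
+}
+```
+
+Under these assumptions, finalizing a `next` build gives back the same base version that was bumped in step 1.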
+
+## Direct Stable Release
+
+Run the Release workflow with `channel=stable` and the desired bump. The Publish workflow then publishes to npm `latest`.
+
+## Local Scripts
+
+`bun scripts/release.ts` can inspect version state locally, but do not run `bun run publish` or `bun run publish:next` locally. npm publish uses OIDC trusted publishing from GitHub Actions.
+
+## Packages
+
+- `packages/core/` publishes `@agentv/core`.
+- `apps/cli/` publishes `agentv`.
+- tsup bundles workspace dependencies with `noExternal: ["@agentv/core"]`.