Objective
Replace synthetic Phoenix adapter task outputs with actual AgentV target execution so Phoenix experiments represent real AgentV behavior.
Acceptance Signals
- Phoenix experiment tasks invoke AgentV execution for the corresponding normalized test case.
- Phoenix task output matches the actual AgentV target output, not synthesized expected-output content.
- AgentV scores, assertions, verdicts, duration, cost, token usage, target, and stable agentv_test_id metadata are preserved in Phoenix run/evaluation metadata.
- Missing target/configuration produces a clear run error and is not reported as an evaluator failure.
- Dry-run/reference mode remains network-free and clearly separated from live execution.
- At least one live Phoenix smoke test against a deterministic example records real task runs and evaluator runs.
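The signals above can be sketched as a single task function. This is only an illustration under stated assumptions: every name here (AgentvResult, TaskOutcome, Executor, TaskInput, runTask) is hypothetical and not taken from the adapter's code, the executor is synchronous for brevity where a real one would most likely be async, and only a subset of the metadata fields is shown.

```typescript
// All names below are hypothetical illustrations, not the adapter's real API.
interface AgentvResult {
  testId: string;
  target: string;
  output: string;
  score: number;
  verdict: "pass" | "fail";
  durationMs: number;
}

type TaskOutcome =
  | { kind: "live"; output: string; metadata: Record<string, string | number> }
  | { kind: "dry_run"; output: string }
  | { kind: "run_error"; message: string };

// Injected so the sketch stays network-free; assumed to wrap AgentV execution.
type Executor = (testId: string, target: string) => AgentvResult;

interface TaskInput {
  testId: string;
  target?: string;
  expectedOutput: string;
  mode: "live" | "dry_run";
}

function runTask(input: TaskInput, execute: Executor): TaskOutcome {
  if (input.mode === "dry_run") {
    // Dry-run path: never calls the executor and is labeled separately.
    return { kind: "dry_run", output: input.expectedOutput };
  }
  if (!input.target) {
    // Missing configuration is a run error, not an evaluator failure.
    return {
      kind: "run_error",
      message: `No target configured for test ${input.testId}`,
    };
  }
  // Live path: the task output is whatever the target actually produced.
  const result = execute(input.testId, input.target);
  return {
    kind: "live",
    output: result.output,
    metadata: {
      agentv_test_id: result.testId,
      target: result.target,
      score: result.score,
      verdict: result.verdict,
      duration_ms: result.durationMs,
    },
  };
}
```

Modeling the outcome as a discriminated union keeps the three cases from being confused downstream: a caller has to check kind before reading output, so a dry-run value cannot be silently reported as a live result.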
Implementation Notes
Keep AgentV eval YAML, target execution, workspace lifecycle, and scoring authoritative. Phoenix should host experiment artifacts, not become a parallel AgentV runtime.
Relevant files:
packages/phoenix-adapter/src/phoenix/run-experiment.ts
packages/phoenix-adapter/src/run/options.ts
packages/phoenix-adapter/src/run/run-suite.ts
packages/phoenix-adapter/src/agentv/load-spec.ts
packages/phoenix-adapter/src/phoenix/types.ts
packages/phoenix-adapter/test/phoenix-run-experiment.test.ts
packages/phoenix-adapter/test/agentv-execution.test.ts
Non-goals
- Do not reimplement workspace pooling, Docker lifecycle, matrices, or trials inside the adapter.
- Do not make Phoenix required for normal agentv eval execution.