feat(phoenix): run real AgentV targets inside Phoenix experiments #1284

@christso

Objective

Replace synthetic Phoenix adapter task outputs with actual AgentV target execution so Phoenix experiments represent real AgentV behavior.

Acceptance Signals

  • Phoenix experiment tasks invoke AgentV execution for the corresponding normalized test case.
  • Phoenix task output matches the actual AgentV target output, not synthesized expected-output content.
  • AgentV scores, assertions, verdicts, duration, cost, token usage, target, and stable agentv_test_id metadata are preserved in Phoenix run/evaluation metadata.
  • Missing target/configuration produces a clear run error and is not reported as an evaluator failure.
  • Dry-run/reference mode remains network-free and clearly separated from live execution.
  • At least one live Phoenix smoke test against a deterministic example records real task runs and evaluator runs.
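
A minimal sketch of how a task wrapper could satisfy the output, metadata, and run-error signals above. Every name here (`AgentVResult`, `PhoenixTaskRecord`, `AgentVRunError`, `Executor`, `runPhoenixTask`) is hypothetical and is not taken from the AgentV or Phoenix APIs. The executor is injected and kept synchronous for brevity; a real one would most likely be async.

```typescript
// Hypothetical result shape; the real AgentV result type may differ.
interface AgentVResult {
  testId: string;
  target: string;
  output: string;
  score: number;
  verdict: "pass" | "fail";
  assertions: { text: string; passed: boolean }[];
  durationMs: number;
  costUsd?: number;
  tokenUsage?: { input: number; output: number };
}

// Hypothetical record handed back to the Phoenix experiment runner.
interface PhoenixTaskRecord {
  output: string;
  metadata: Record<string, unknown>;
}

// Raised when a case cannot run at all, so callers can tell it apart
// from an evaluator failure (a case that ran and scored badly).
class AgentVRunError extends Error {
  readonly kind = "run_error";
  constructor(message: string) {
    super(message);
    this.name = "AgentVRunError";
  }
}

// Assumed executor signature: runs one normalized test case against a target.
type Executor = (testId: string, target: string) => AgentVResult;

function runPhoenixTask(
  testId: string,
  target: string | undefined,
  execute: Executor,
): PhoenixTaskRecord {
  if (!target) {
    // Missing configuration is a run error, not a failed evaluation.
    throw new AgentVRunError(`No target configured for test "${testId}"`);
  }
  const result = execute(testId, target);
  return {
    // The task output is the real target output, never the expected output.
    output: result.output,
    metadata: {
      agentv_test_id: result.testId,
      target: result.target,
      score: result.score,
      verdict: result.verdict,
      assertions: result.assertions,
      duration_ms: result.durationMs,
      cost_usd: result.costUsd ?? null,
      token_usage: result.tokenUsage ?? null,
    },
  };
}
```

Keeping the run error as a distinct class means the code that reports to Phoenix can branch on `kind` instead of inspecting scores, which keeps evaluator failures reserved for cases that actually executed.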

Implementation Notes

Keep AgentV eval YAML, target execution, workspace lifecycle, and scoring authoritative. Phoenix should host experiment artifacts, not become a parallel AgentV runtime.
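
One way to keep the adapter thin is to inject the AgentV executor and select the task by mode, so the adapter itself never runs targets. The sketch below is an illustration under assumptions: `RunMode`, `NormalizedCase`, `TaskOutput`, `LiveExecutor`, and `selectTask` are hypothetical names, and echoing the expected output in dry-run mode is an assumed stand-in for the existing reference behavior.

```typescript
type RunMode = "dry-run" | "live";

// Hypothetical normalized test case as the adapter might see it.
interface NormalizedCase {
  id: string;
  input: string;
  expectedOutput: string;
}

interface TaskOutput {
  mode: RunMode;
  testId: string;
  output: string;
}

// Assumed hook into AgentV; AgentV owns execution, the adapter only calls it.
type LiveExecutor = (testCase: NormalizedCase) => string;

function selectTask(
  mode: RunMode,
  liveExecutor?: LiveExecutor,
): (testCase: NormalizedCase) => TaskOutput {
  if (mode === "dry-run") {
    // Reference mode: no executor call, so no network access is possible here.
    return (testCase) => ({
      mode: "dry-run",
      testId: testCase.id,
      output: testCase.expectedOutput,
    });
  }
  if (!liveExecutor) {
    throw new Error("Live mode requires an AgentV executor");
  }
  // Live mode: delegate to AgentV and tag the output with its mode.
  return (testCase) => ({
    mode: "live",
    testId: testCase.id,
    output: liveExecutor(testCase),
  });
}
```

Tagging each output with its mode makes it hard to mistake a reference run for a live one when reading experiment artifacts.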

Relevant files:

  • packages/phoenix-adapter/src/phoenix/run-experiment.ts
  • packages/phoenix-adapter/src/run/options.ts
  • packages/phoenix-adapter/src/run/run-suite.ts
  • packages/phoenix-adapter/src/agentv/load-spec.ts
  • packages/phoenix-adapter/src/phoenix/types.ts
  • packages/phoenix-adapter/test/phoenix-run-experiment.test.ts
  • packages/phoenix-adapter/test/agentv-execution.test.ts

Non-goals

  • Do not reimplement workspace pooling, Docker lifecycle, matrices, or trials inside the adapter.
  • Do not make Phoenix required for normal agentv eval execution.

Metadata

Labels: phoenix (Phoenix integration and observability work)
Status: Backlog
