Objective
Enable trace-derived AgentV graders in Phoenix experiments once Phoenix spans can be associated with each AgentV test case.
Acceptance Signals
- Phoenix task/evaluator context exposes a stable trace ID or span lookup path for each test case.
- Phoenix spans can be translated into AgentV trace summary or metric inputs.
- At least one tool-trajectory happy path and one execution-metrics threshold are supported.
- Missing trace IDs produce clear unsupported/failed explanations rather than empty passes.
- Phoenix API lookup failures are surfaced as evaluation errors with clear guidance.
- Live verification demonstrates trace-derived scores from Phoenix-ingested span data.
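
The translation and grading signals above could be sketched roughly as follows. This is a minimal illustration, not the adapter's implementation: `SpanLike`, `TraceSummarySketch`, `GradeSketch`, and all three functions are hypothetical names, and the span fields are assumptions rather than the real Phoenix span schema or AgentV trace summary type.

```typescript
// Hypothetical span shape for illustration; the real Phoenix span schema may differ.
interface SpanLike {
  name: string;
  kind: "TOOL" | "LLM" | "CHAIN";
  startMs: number;
  endMs: number;
  totalTokens?: number;
}

// Hypothetical summary shape; not AgentV's actual trace summary type.
interface TraceSummarySketch {
  toolCalls: string[];
  durationMs: number;
  totalTokens: number | undefined;
}

interface GradeSketch {
  status: "pass" | "fail" | "unsupported";
  explanation: string;
}

// Collapse a flat list of spans into the inputs the two graders need.
function summarizeSpans(spans: SpanLike[]): TraceSummarySketch {
  const ordered = [...spans].sort((a, b) => a.startMs - b.startMs);
  const tokens = ordered
    .map((s) => s.totalTokens)
    .filter((t): t is number => t !== undefined);
  const durationMs =
    ordered.length === 0
      ? 0
      : Math.max(...ordered.map((s) => s.endMs)) -
        Math.min(...ordered.map((s) => s.startMs));
  return {
    toolCalls: ordered.filter((s) => s.kind === "TOOL").map((s) => s.name),
    durationMs,
    // Stay undefined when no span reported tokens, so the grader cannot pass by default.
    totalTokens: tokens.length === 0 ? undefined : tokens.reduce((a, b) => a + b, 0),
  };
}

// Tool-trajectory happy path: exact, ordered match of tool names.
function gradeToolTrajectory(summary: TraceSummarySketch, expected: string[]): GradeSketch {
  const matches =
    summary.toolCalls.length === expected.length &&
    summary.toolCalls.every((name, i) => name === expected[i]);
  return {
    status: matches ? "pass" : "fail",
    explanation: `expected [${expected.join(", ")}], observed [${summary.toolCalls.join(", ")}]`,
  };
}

// Execution-metrics threshold: a missing metric is reported, never passed.
function gradeTokenBudget(summary: TraceSummarySketch, maxTokens: number): GradeSketch {
  if (summary.totalTokens === undefined) {
    return { status: "unsupported", explanation: "no token counts found on any span" };
  }
  const ok = summary.totalTokens <= maxTokens;
  return {
    status: ok ? "pass" : "fail",
    explanation: `total tokens ${summary.totalTokens}, budget ${maxTokens}`,
  };
}
```

Keeping `totalTokens` as `undefined` instead of defaulting it to zero is the design choice that matters here: a zero would satisfy any budget and produce exactly the empty pass the acceptance signals rule out.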
Implementation Notes
Build this after real AgentV execution is wired into Phoenix experiments. Start narrow with one tool trajectory and one metrics case before broadening.
Relevant files:
- packages/phoenix-adapter/src/phoenix/run-experiment.ts
- packages/phoenix-adapter/src/phoenix/types.ts
- packages/phoenix-adapter/src/evaluators/registry.ts
- packages/phoenix-adapter/test/evaluators/trace-metrics.test.ts
- packages/core/src/evaluation/trace.ts
- apps/cli/src/commands/inspect/utils.ts
Non-goals
- Do not support trace graders without a reliable trace ID/span association.
- Do not silently pass when metrics are missing.
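
One way to enforce the trace-association non-goal is a gate that runs before any trace grader and maps each lookup outcome to an explicit result. The sketch below is illustrative only: `TraceLookup`, `GateOutcome`, and `gateTraceGrader` are hypothetical names, and how a lookup failure is detected depends on the Phoenix client, which is not modeled here.

```typescript
// Hypothetical result of trying to associate a test case with a Phoenix trace.
type TraceLookup =
  | { kind: "found"; traceId: string }
  | { kind: "missing" }
  | { kind: "lookup_error"; message: string };

// Hypothetical gate outcome: only "ready" lets a trace grader run.
type GateOutcome =
  | { status: "ready"; traceId: string }
  | { status: "unsupported"; explanation: string }
  | { status: "error"; explanation: string };

function gateTraceGrader(testCaseId: string, lookup: TraceLookup): GateOutcome {
  switch (lookup.kind) {
    case "found":
      return { status: "ready", traceId: lookup.traceId };
    case "missing":
      // No association means the grader is unsupported for this case, not passed.
      return {
        status: "unsupported",
        explanation: `test case ${testCaseId} has no associated trace ID; trace graders were skipped`,
      };
    case "lookup_error":
      // API failures are evaluation errors, kept distinct from grading failures.
      return {
        status: "error",
        explanation: `trace lookup failed for test case ${testCaseId}: ${lookup.message}`,
      };
  }
}
```

Modeling the lookup as a discriminated union makes the compiler insist that every outcome is handled, so a new failure mode cannot fall through to a default pass.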