
feat(phoenix): support trace and metric graders through Phoenix trace IDs #1286

@christso

Objective

Enable trace-derived AgentV graders in Phoenix experiments once Phoenix spans can be associated with each AgentV test case.

Acceptance Signals

  • Phoenix task/evaluator context exposes a stable trace ID or span lookup path for each test case.
  • Phoenix spans can be translated into AgentV trace summary or metric inputs.
  • At least one tool-trajectory happy path and one execution-metrics threshold are supported.
  • Missing trace IDs produce an explicit unsupported or failed result with an explanation, never an empty pass.
  • Phoenix API lookup failures are surfaced as evaluation errors with clear guidance.
  • Live verification demonstrates trace-derived scores from Phoenix-ingested span data.
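
A minimal sketch of how the span-to-summary translation and the two narrow graders named above (one tool trajectory, one execution-metrics threshold) could fit together. Every name here (`SpanLike`, `TraceSummary`, `summarizeSpans`, `gradeToolTrajectory`, `gradeMaxTokens`) is hypothetical; the issue does not specify the Phoenix span schema or the AgentV trace summary type, so the field names are assumptions for illustration only.

```typescript
// Hypothetical span shape: NOT the real Phoenix schema, just enough fields
// to illustrate the translation step.
interface SpanLike {
  name: string;
  kind: "TOOL" | "LLM" | "CHAIN";
  startTimeMs: number;
  endTimeMs: number;
  tokenCount?: number;
}

// Hypothetical stand-in for an AgentV trace summary / metric input.
interface TraceSummary {
  toolCalls: string[]; // tool span names, ordered by start time
  totalTokens: number;
  durationMs: number;
}

interface GradeResult {
  pass: boolean;
  explanation: string;
}

// Translate raw spans into the summary that the graders consume.
function summarizeSpans(spans: SpanLike[]): TraceSummary {
  const ordered = [...spans].sort((a, b) => a.startTimeMs - b.startTimeMs);
  const toolCalls = ordered.filter((s) => s.kind === "TOOL").map((s) => s.name);
  const totalTokens = ordered.reduce((sum, s) => sum + (s.tokenCount ?? 0), 0);
  const durationMs =
    ordered.length === 0
      ? 0
      : Math.max(...ordered.map((s) => s.endTimeMs)) -
        Math.min(...ordered.map((s) => s.startTimeMs));
  return { toolCalls, totalTokens, durationMs };
}

// Tool-trajectory happy path: exact match of tool names, in order.
function gradeToolTrajectory(summary: TraceSummary, expected: string[]): GradeResult {
  const actual = summary.toolCalls;
  const pass =
    actual.length === expected.length &&
    actual.every((name, i) => name === expected[i]);
  return {
    pass,
    explanation: pass
      ? `Tool calls matched: ${actual.join(" -> ")}`
      : `Expected [${expected.join(", ")}] but saw [${actual.join(", ")}]`,
  };
}

// Execution-metrics threshold: total tokens must not exceed the limit.
function gradeMaxTokens(summary: TraceSummary, maxTokens: number): GradeResult {
  const pass = summary.totalTokens <= maxTokens;
  return {
    pass,
    explanation: `totalTokens=${summary.totalTokens}, threshold=${maxTokens}`,
  };
}

// Usage with made-up spans, deliberately given out of order.
const exampleSummary = summarizeSpans([
  { name: "calculator", kind: "TOOL", startTimeMs: 300, endTimeMs: 350 },
  { name: "search", kind: "TOOL", startTimeMs: 0, endTimeMs: 100 },
  { name: "answer", kind: "LLM", startTimeMs: 100, endTimeMs: 300, tokenCount: 50 },
]);
const trajectoryResult = gradeToolTrajectory(exampleSummary, ["search", "calculator"]);
const tokenResult = gradeMaxTokens(exampleSummary, 40);
```

Keeping the translation separate from the graders means the graders never see Phoenix-specific fields, which may make it easier to start with these two cases and broaden later.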

Implementation Notes

Build this after real AgentV execution is wired into Phoenix experiments. Start narrow with one tool trajectory and one metrics case before broadening.

Relevant files:

  • packages/phoenix-adapter/src/phoenix/run-experiment.ts
  • packages/phoenix-adapter/src/phoenix/types.ts
  • packages/phoenix-adapter/src/evaluators/registry.ts
  • packages/phoenix-adapter/test/evaluators/trace-metrics.test.ts
  • packages/core/src/evaluation/trace.ts
  • apps/cli/src/commands/inspect/utils.ts

Non-goals

  • Do not support trace graders without a reliable trace ID/span association.
  • Do not silently pass when metrics are missing.
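
One way to make these non-goals hard to violate is to give trace graders a result type with no implicit "pass" state. The sketch below is an assumption-laden illustration: `TraceGradeOutcome`, `LookupResult`, and `resolveTraceOutcome` are hypothetical names, and the Phoenix lookup is modelled as a plain callback returning a result value rather than a real Phoenix API call.

```typescript
// Hypothetical result of asking Phoenix for the spans of one trace.
type LookupResult =
  | { ok: true; spans: readonly unknown[] }
  | { ok: false; message: string };

// Three explicit outcomes; a score exists only in the "graded" branch.
type TraceGradeOutcome =
  | { kind: "graded"; score: number; explanation: string }
  | { kind: "unsupported"; explanation: string }
  | { kind: "error"; explanation: string };

function resolveTraceOutcome(
  traceId: string | undefined,
  lookup: (traceId: string) => LookupResult,
  grade: (spans: readonly unknown[]) => number,
): TraceGradeOutcome {
  // No trace ID: the grader cannot run, so say so instead of passing.
  if (traceId === undefined || traceId === "") {
    return {
      kind: "unsupported",
      explanation: "No Phoenix trace ID is associated with this test case.",
    };
  }

  // Lookup failure: surface as an evaluation error with guidance.
  const result = lookup(traceId);
  if (!result.ok) {
    return {
      kind: "error",
      explanation:
        `Phoenix span lookup failed for trace ${traceId}: ${result.message}. ` +
        "Check that Phoenix is reachable and the trace has been ingested.",
    };
  }

  // Trace exists but carries no spans: metrics are missing, so do not score.
  if (result.spans.length === 0) {
    return {
      kind: "unsupported",
      explanation: `Trace ${traceId} has no spans; metrics are missing.`,
    };
  }

  return {
    kind: "graded",
    score: grade(result.spans),
    explanation: `Graded ${result.spans.length} span(s) from trace ${traceId}.`,
  };
}
```

Because the lookup is only invoked after the trace ID check, a missing ID never triggers a Phoenix request, and callers must handle all three `kind` values before they can read a score.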

Metadata

Assignees

No one assigned

    Labels

    phoenix: Phoenix integration and observability work

    Projects

    Status
    Backlog

    Milestone

    No milestone
