feat(phoenix): complete AgentV Phoenix integration #1281

@christso

Objective

Track the remaining work to complete the AgentV Phoenix integration after the initial adapter and Phase A OTel preset work.

Context

Draft PR #1280 is the seed/handoff PR. It adds the planning document and the first supported user-facing slice: agentv eval --export-otel --otel-backend phoenix, plus documentation. It does not close this parent issue.

This parent issue covers the follow-up implementation issues needed before the Phoenix integration can be called complete.

Scope

Complete the integration as two intentionally bounded surfaces:

  • Phoenix OTLP observability for normal AgentV eval runs.
  • A Phoenix dataset/experiment adapter that keeps AgentV eval YAML and AgentV scoring semantics authoritative.

Subissues

Recommended sequencing:

  1. feat(phoenix): complete deterministic adapter parity #1282 - Deterministic adapter parity.
  2. fix(phoenix): make full dry-run structural parity actionable #1283 - Full dry-run structural parity.
  3. feat(phoenix): run real AgentV targets inside Phoenix experiments #1284 - Real AgentV target execution inside Phoenix experiments.
  4. feat(phoenix): support AgentV-authoritative LLM graders and rubrics #1285 - AgentV-authoritative LLM graders and rubrics.
  5. feat(phoenix): support trace and metric graders through Phoenix trace IDs #1286 - Trace and metric graders through Phoenix trace IDs.
  6. chore(phoenix): decide and implement adapter package publishing posture #1287 - Adapter package publishing posture.

Acceptance Signals

  • Users can use the documented Phoenix OTel preset for normal eval observability.
  • Adapter support gaps are explicit and conservative.
  • Phoenix experiments run real AgentV targets before claiming target parity.
  • AgentV scoring semantics remain authoritative for AgentV-authored evals.
  • Full dry-run parity is green or has documented exclusions.
  • Live Phoenix verification covers both OTLP export and at least one experiment path.
  • Maintainers can tell whether the adapter is intentionally private or publishable.
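
One way to read the "explicit and conservative" signal above: the adapter classifies every evaluator type before a run and reports anything it cannot map, instead of silently skipping it. A minimal Python sketch, assuming hypothetical evaluator type names, a hypothetical SUPPORTED set, and a hypothetical check_support helper (none of these come from AgentV or Phoenix):

```python
from dataclasses import dataclass, field

# Hypothetical evaluator type names, for illustration only.
SUPPORTED = {"exact_match", "contains", "regex"}


@dataclass
class SupportReport:
    supported: list[str] = field(default_factory=list)
    unsupported: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Conservative: a single unmapped evaluator fails the whole check.
        return not self.unsupported


def check_support(evaluator_types: list[str]) -> SupportReport:
    """Sort evaluator types into supported and unsupported, preserving order."""
    report = SupportReport()
    for name in evaluator_types:
        if name in SUPPORTED:
            report.supported.append(name)
        else:
            # Anything not explicitly listed is treated as a gap.
            report.unsupported.append(name)
    return report


report = check_support(["exact_match", "llm_grader", "regex"])
```

Under these assumptions, a caller would refuse to claim parity whenever report.ok is false and would surface report.unsupported to the user.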

Non-goals

  • Do not replace AgentV's local result JSONL/artifact model with Phoenix.
  • Do not make Phoenix a required dependency for normal eval execution.
  • Do not reimplement workspace lifecycle, Docker sandboxing, target matrices, trials, or custom assertion discovery inside Phoenix unless a later concrete issue proves it is required.

Metadata

Labels: phoenix (Phoenix integration and observability work)

Status: Backlog