feat(phoenix): support AgentV-authoritative LLM graders and rubrics #1285

@christso

Objective

Support llm-grader and rubrics in Phoenix experiments while preserving AgentV prompt/schema/scoring semantics.

Acceptance Signals

  • Phoenix adapter runs AgentV's LLM/rubric grading path and logs the resulting score, verdict, assertions, evidence, and details into Phoenix evaluation metadata.
  • Checklist rubric results preserve per-rubric assertions and evidence.
  • Score-range rubric results preserve score, verdict, and details.
  • LLM grader provider failures surface as clear evaluation errors.
  • Unsupported custom prompt modes remain visible when they cannot safely run in adapter context.
  • Support matrix and e2e notes are updated.
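To make the first four signals concrete, here is a minimal sketch of how an AgentV grading result might be mapped onto a Phoenix evaluation record. Every name in it (`AgentVGradeResult`, `PhoenixEvaluation`, `toPhoenixEvaluation`, `toPhoenixIssue`) is hypothetical and chosen for illustration; the real AgentV result shape and the Phoenix client types may differ.

```typescript
// Hypothetical shapes for illustration only; not the actual AgentV or Phoenix types.
interface AgentVAssertion {
  text: string;
  passed: boolean;
  evidence?: string;
}

interface AgentVGradeResult {
  score: number;
  verdict: string;
  assertions: AgentVAssertion[];
  details?: Record<string, unknown>;
}

interface PhoenixEvaluation {
  score: number | null;
  label: string | null;
  explanation: string | null;
  metadata: Record<string, unknown>;
}

type IssueKind = "grader_error" | "unsupported_prompt_mode";

// Copy the AgentV score and verdict through unchanged so AgentV stays authoritative,
// and keep per-rubric assertions, evidence, and details in metadata.
function toPhoenixEvaluation(result: AgentVGradeResult): PhoenixEvaluation {
  return {
    score: result.score,
    label: result.verdict,
    explanation: null,
    metadata: {
      source: "agentv",
      assertions: result.assertions,
      details: result.details ?? {},
    },
  };
}

// Represent a provider failure or an unsupported prompt mode as a visible,
// score-less evaluation instead of silently dropping it.
function toPhoenixIssue(kind: IssueKind, message: string): PhoenixEvaluation {
  return {
    score: null,
    label: kind,
    explanation: message,
    metadata: { source: "agentv", error: { kind, message } },
  };
}
```

Returning a score-less record is only one way to surface failures; rethrowing so that the experiment run records an evaluator error would also satisfy the signal, depending on how `run-experiment.ts` handles exceptions.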

Implementation Notes

AgentV scoring should be authoritative first. Phoenix-native model evaluator templates can be considered later only after semantic differences are understood and documented.

Relevant files:

  • packages/phoenix-adapter/src/evaluators/registry.ts
  • packages/phoenix-adapter/src/evaluators/types.ts
  • packages/phoenix-adapter/src/phoenix/run-experiment.ts
  • packages/phoenix-adapter/test/evaluators/llm-grader.test.ts
  • packages/phoenix-adapter/docs/support-matrix.md
  • packages/phoenix-adapter/docs/e2e-verification.md

Non-goals

  • Do not make Phoenix-native model evaluator scores the primary score for AgentV-authored rubrics in this issue.
  • Do not add provider-specific config knobs for one-off grading behavior.

Metadata

    Labels

    phoenix: Phoenix integration and observability work

    Projects

    Status
    Backlog
