Objective
Resolve or explicitly exclude the current failures in the Phoenix adapter full dry-run, so that the dry-run report becomes a reliable regression signal.
Acceptance Signals
- Full dry-run exits successfully, or reports only explicitly documented non-blocking exclusions.
- The four known failures are investigated and either fixed at source/baseline level or excluded with clear reasons:
  - examples/features/matrix-evaluation/evals/dataset.eval.yaml
  - examples/features/prompt-template-sdk/evals/dataset.eval.yaml
  - examples/features/tool-trajectory-simple/evals/dataset.eval.yaml
  - examples/features/weighted-graders/evals/dataset.eval.yaml
- Parity report output distinguishes conversion crashes from source/baseline drift and documented exclusions.
- packages/phoenix-adapter/docs/e2e-verification.md reflects the current expected full dry-run result.
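One way to make the first and third signals concrete is to give each eval exactly one outcome category and derive the exit code from the categories. The sketch below is illustrative only: the `Outcome` and `Summary` types and the `summarize` function are hypothetical names, not the existing API of `report.ts`.

```typescript
// Hypothetical outcome model; the real parity report may be shaped differently.
type Outcome =
  | { kind: "pass"; evalPath: string }
  | { kind: "conversion_crash"; evalPath: string; error: string }
  | { kind: "drift"; evalPath: string; detail: string }
  | { kind: "excluded"; evalPath: string; reason: string };

interface Summary {
  counts: Record<Outcome["kind"], number>;
  blocking: Outcome[];
  exitCode: 0 | 1;
}

// Count every category separately and treat only crashes and drift as blocking.
function summarize(outcomes: Outcome[]): Summary {
  const counts: Record<Outcome["kind"], number> = {
    pass: 0,
    conversion_crash: 0,
    drift: 0,
    excluded: 0,
  };
  const blocking: Outcome[] = [];
  for (const outcome of outcomes) {
    counts[outcome.kind] += 1;
    if (outcome.kind === "conversion_crash" || outcome.kind === "drift") {
      blocking.push(outcome);
    }
  }
  return { counts, blocking, exitCode: blocking.length === 0 ? 0 : 1 };
}
```

Under this model a run containing only passes and documented exclusions exits with 0, while a single crash or drift result makes the run fail and shows up under its own category in the counts.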
Implementation Notes
Prefer fixing stale baselines or source references over adding new mechanisms. If an eval intentionally diverges, encode a small documented exclusion in adapter parity reporting.
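A documented exclusion can stay small if it is just a list of eval paths with mandatory reasons, plus a check that the list itself does not go stale. The following is a minimal sketch under assumptions: `Exclusion`, `findExclusion`, and `validateExclusions` are hypothetical names, and the path in the usage example is a placeholder rather than one of the affected evals.

```typescript
// Hypothetical exclusion record: every entry must explain why the eval diverges.
interface Exclusion {
  evalPath: string;
  reason: string;
}

// Look up the exclusion for an eval, if any.
function findExclusion(
  evalPath: string,
  exclusions: readonly Exclusion[],
): Exclusion | undefined {
  return exclusions.find((e) => e.evalPath === evalPath);
}

// Report entries that are undocumented, duplicated, or point at evals that no longer exist.
function validateExclusions(
  exclusions: readonly Exclusion[],
  knownEvalPaths: ReadonlySet<string>,
): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const { evalPath, reason } of exclusions) {
    if (reason.trim() === "") problems.push(`${evalPath}: missing reason`);
    if (!knownEvalPaths.has(evalPath)) problems.push(`${evalPath}: unknown eval path`);
    if (seen.has(evalPath)) problems.push(`${evalPath}: duplicate exclusion`);
    seen.add(evalPath);
  }
  return problems;
}

// Usage with a placeholder path (not a real eval in this repository).
const exclusions: Exclusion[] = [
  { evalPath: "examples/features/placeholder/evals/dataset.eval.yaml", reason: "placeholder reason" },
];
const known = new Set(["examples/features/placeholder/evals/dataset.eval.yaml"]);
console.log(validateExclusions(exclusions, known)); // prints []
```

Running a check like `validateExclusions` in the parity tests would keep the exclusion list from silently outliving the evals it refers to.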
Relevant files:
- packages/phoenix-adapter/src/parity/compare.ts
- packages/phoenix-adapter/src/parity/report.ts
- packages/phoenix-adapter/test/parity.test.ts
- packages/phoenix-adapter/docs/e2e-verification.md
- The affected example evals/baselines listed above
Non-goals
- Do not accept both old and new wire key names unless the old form has shipped and needs compatibility.
- Do not turn dry-run into a live Phoenix or live provider test.