Handle QUEST judge structured parse errors by samsja · Pull Request #1620 · PrimeIntellect-ai/verifiers

samsja · 2026-06-11T02:23:30Z

Summary

Wrap pydantic structured-output parse failures from the QUEST judge as vf.InvalidModelResponseError
Add a regression test for malformed judge JSON so one bad structured response does not crash the full eval

Test

uv run pytest tests/test_quest_taskset.py
uv run ruff check verifiers/envs/experimental/composable/tasksets/search/quest/taskset.py tests/test_quest_taskset.py

Note

Low Risk
Narrow error-handling change on the QUEST judge client path with a focused unit test; no auth, data, or API contract changes.

Overview
QUEST judge structured outputs now treat Pydantic validation failures as first-class model errors instead of bubbling raw ValidationError.

In QuestOpenAIClient.async_response, when beta.chat.completions.parse succeeds at the HTTP/SDK layer but structured parsing fails with a ValidationError, the code raises vf.InvalidModelResponseError with a clear message and chains the original exception. This matches how other judge response failures (empty parsed, SDK validation, etc.) are handled.
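A minimal sketch of the wrapping pattern described above, not the code from the PR. To stay standard-library only, `ValidationError` and `InvalidModelResponseError` are local stand-ins for `pydantic.ValidationError` and `vf.InvalidModelResponseError`, and `fake_parse` is a hypothetical replacement for the SDK's structured parse call. The default model name is a placeholder.

```python
import asyncio
import json


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (assumption, keeps the sketch stdlib-only)."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError."""


async def fake_parse(raw: str) -> dict:
    """Hypothetical stand-in for beta.chat.completions.parse.

    The "request" itself succeeds; only structured parsing of the body can fail.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"invalid JSON: {exc.msg}") from exc
    if "score" not in payload:
        raise ValidationError("missing field: score")
    return payload


async def async_response(raw: str, model: str = "judge-model") -> dict:
    try:
        return await fake_parse(raw)
    except ValidationError as exc:
        # Convert the parse failure into a model-level error and keep the
        # original exception reachable through __cause__.
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc
```

The `from exc` clause is what makes the original error available as `__cause__`, so callers can treat the failure as a bad model response while the underlying parse error stays visible in tracebacks.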

A new async test uses a fake OpenAI client that returns malformed JSON for a BaseModel response format and asserts the raised error is InvalidModelResponseError with ValidationError as __cause__, guarding against a single bad judge response taking down a full eval.
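The shape of such a regression test could look roughly like the sketch below. It is an illustration under assumptions, not the test added in the PR: the exception classes are local stand-ins, `make_fake_client` and `judge` are hypothetical names, and the fake client only mirrors the `beta.chat.completions.parse` attribute path. The real test presumably runs under pytest; here the check is driven with `asyncio.run` so the block runs on its own.

```python
import asyncio
from types import SimpleNamespace


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (assumption)."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError."""


async def _malformed_parse(**kwargs):
    # Simulates structured parsing failing on malformed judge JSON.
    raise ValidationError("malformed judge JSON")


def make_fake_client():
    # Mirrors the attribute path beta.chat.completions.parse named in the PR.
    completions = SimpleNamespace(parse=_malformed_parse)
    return SimpleNamespace(
        beta=SimpleNamespace(chat=SimpleNamespace(completions=completions))
    )


async def judge(client, model="judge-model"):
    # Hypothetical reduction of the structured branch of async_response.
    try:
        return await client.beta.chat.completions.parse(model=model, messages=[])
    except ValidationError as exc:
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc


async def check_malformed_judge_response():
    try:
        await judge(make_fake_client())
    except InvalidModelResponseError as err:
        return err
    raise AssertionError("expected InvalidModelResponseError")


err = asyncio.run(check_malformed_judge_response())
# The wrapped error must keep the original validation failure as its cause.
assert isinstance(err.__cause__, ValidationError)
```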

Reviewed by Cursor Bugbot for commit ef2f00d.

Note

Convert Pydantic `ValidationError` to `InvalidModelResponseError` in `QuestOpenAIClient.async_response`

When the QUEST judge's structured response branch calls _client.beta.chat.completions.parse(...) and Pydantic raises a ValidationError, async_response now catches it and re-raises it as vf.InvalidModelResponseError, preserving the original error as __cause__. A new async test in test_quest_taskset.py verifies this behavior using a fake OpenAI client that returns invalid JSON.

Macroscope summarized ef2f00d.

macroscopeapp · 2026-06-11T02:25:13Z

Approvability

Verdict: Approved

This is a straightforward error handling fix that catches ValidationError during QUEST judge structured parsing and converts it to a more appropriate InvalidModelResponseError. The change is small, self-contained, and includes test coverage.

rasdani · 2026-06-11T23:19:52Z

+            except ValidationError as exc:
+                raise vf.InvalidModelResponseError(
+                    f"QUEST judge returned invalid structured response for {model}: {exc}"
+                ) from exc


can we put this into _raise_quest_judge_error?
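One way to read this suggestion is sketched below. The real `_raise_quest_judge_error` is not shown in this PR, so its signature and behavior are unknown here. `raise_judge_error` and `call_judge` are hypothetical names, and the exception classes are local stand-ins for `pydantic.ValidationError` and `vf.InvalidModelResponseError`.

```python
from typing import Callable, NoReturn


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (assumption)."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError."""


def raise_judge_error(exc: Exception, model: str) -> NoReturn:
    """Hypothetical central helper that maps judge failures to one error type."""
    if isinstance(exc, ValidationError):
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc
    # Anything this helper does not recognize is re-raised unchanged.
    raise exc


def call_judge(parse: Callable[[], dict], model: str) -> dict:
    try:
        return parse()
    except Exception as exc:
        raise_judge_error(exc, model)
```

Keeping the mapping in a single helper would leave the call site with one `except` clause, so any later failure modes get translated in the same place.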

Handle QUEST judge structured parse errors

ef2f00d

macroscopeapp Bot approved these changes Jun 11, 2026

rasdani reviewed Jun 11, 2026

rasdani approved these changes Jun 11, 2026

samsja wants to merge 1 commit into main from codex/quest-judge-parse-error
