Handle QUEST judge structured parse errors #1620

Open
samsja wants to merge 1 commit into main from codex/quest-judge-parse-error
Conversation

samsja (Member) commented Jun 11, 2026

Summary

  • Wrap pydantic structured-output parse failures from the QUEST judge as vf.InvalidModelResponseError
  • Add a regression test for malformed judge JSON so one bad structured response does not crash the full eval

Test

  • uv run pytest tests/test_quest_taskset.py
  • uv run ruff check verifiers/envs/experimental/composable/tasksets/search/quest/taskset.py tests/test_quest_taskset.py

Note

Low Risk
Narrow error-handling change on the QUEST judge client path with a focused unit test; no auth, data, or API contract changes.

Overview
QUEST judge structured outputs now treat Pydantic validation failures as first-class model errors instead of bubbling raw ValidationError.

In QuestOpenAIClient.async_response, when beta.chat.completions.parse succeeds at the HTTP/SDK layer but structured parsing fails (ValidationError), the code raises vf.InvalidModelResponseError with a clear message and chains the original exception—aligned with other judge response failures (empty parsed, SDK validation, etc.).
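The translation described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the code in this PR: `ValidationError` and `InvalidModelResponseError` are local stand-ins for `pydantic.ValidationError` and `vf.InvalidModelResponseError`, and `parse_structured` is a hypothetical substitute for the SDK parse step.

```python
import asyncio
import json


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (assumption: keeps the sketch stdlib-only)."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError."""


async def parse_structured(raw: str) -> dict:
    """Hypothetical parse step: the request succeeded, but the payload may not validate."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict) or "score" not in data:
        raise ValidationError("missing required field 'score'")
    return data


async def async_response(raw: str, model: str) -> dict:
    try:
        return await parse_structured(raw)
    except ValidationError as exc:
        # Translate the parse failure into the model-error type; `from exc`
        # keeps the original exception reachable as __cause__.
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc
```

Because the original exception is chained, a caller that only handles the model-error type can still inspect `__cause__` to see which validation step failed.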

A new async test uses a fake OpenAI client that returns malformed JSON for a BaseModel response format and asserts the raised error is InvalidModelResponseError with ValidationError as __cause__, guarding against a single bad judge response taking down a full eval.
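The shape of such a regression test might look like the sketch below. It is an assumption-laden illustration rather than the test in `tests/test_quest_taskset.py`: the fake client exposes only a `beta.chat.completions.parse` attribute path, the exception classes are local stand-ins, and `judge_call` is a hypothetical placeholder for the code under test.

```python
import asyncio
import json
from types import SimpleNamespace


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError."""


def make_fake_client(raw_content: str) -> SimpleNamespace:
    """Build an object that exposes only beta.chat.completions.parse."""

    async def parse(**kwargs):
        # Mimic structured parsing of a canned judge output.
        try:
            return json.loads(raw_content)
        except json.JSONDecodeError as exc:
            raise ValidationError(str(exc)) from exc

    completions = SimpleNamespace(parse=parse)
    return SimpleNamespace(beta=SimpleNamespace(chat=SimpleNamespace(completions=completions)))


async def judge_call(client, model: str):
    """Hypothetical code under test: wraps parse failures as model errors."""
    try:
        return await client.beta.chat.completions.parse(model=model, messages=[])
    except ValidationError as exc:
        raise InvalidModelResponseError(f"invalid structured response for {model}") from exc


def test_malformed_judge_json_is_wrapped() -> None:
    client = make_fake_client("{not valid json")
    try:
        asyncio.run(judge_call(client, "judge-model"))
    except InvalidModelResponseError as err:
        # The original parse failure must survive as the chained cause.
        assert isinstance(err.__cause__, ValidationError)
    else:
        raise AssertionError("expected InvalidModelResponseError")
```

Checking `__cause__` as well as the exception type guards against a later refactor that drops the `from exc` chaining.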

Reviewed by Cursor Bugbot for commit ef2f00d.

Note

Convert Pydantic ValidationError to InvalidModelResponseError in QuestOpenAIClient.async_response

When the QUEST judge's structured response branch calls _client.beta.chat.completions.parse(...) and Pydantic raises a ValidationError, async_response now catches it and re-raises it as vf.InvalidModelResponseError, preserving the original error as __cause__. A new async test in test_quest_taskset.py verifies this behavior using a fake OpenAI client that returns invalid JSON.

Macroscope summarized ef2f00d.

macroscopeapp Bot commented Jun 11, 2026

Approvability

Verdict: Approved

This is a straightforward error handling fix that catches ValidationError during QUEST judge structured parsing and converts it to a more appropriate InvalidModelResponseError. The change is small, self-contained, and includes test coverage.

Comment on lines +191 to +194
        except ValidationError as exc:
            raise vf.InvalidModelResponseError(
                f"QUEST judge returned invalid structured response for {model}: {exc}"
            ) from exc

Contributor

can we put this into _raise_quest_judge_error?
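
For illustration only, centralizing the translation in one helper could look something like the sketch below. The real `_raise_quest_judge_error` is not shown in this conversation, so its signature and behavior here are assumptions; `raise_judge_error` and `call_judge` are hypothetical names, and the exception classes are local stand-ins for the pydantic and `vf` types.

```python
from typing import NoReturn


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError."""


def raise_judge_error(exc: Exception, model: str) -> NoReturn:
    """Hypothetical helper; the real _raise_quest_judge_error may differ."""
    if isinstance(exc, ValidationError):
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc
    # Anything the helper does not recognize propagates unchanged.
    raise exc


def call_judge(parse, model: str):
    """Call site stays small: every failure is routed through one helper."""
    try:
        return parse()
    except Exception as exc:
        raise_judge_error(exc, model)
```

Keeping the mapping in one place would mean each new failure mode needs only one extra branch in the helper rather than another `except` clause at the call site.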
