Handle QUEST judge structured parse errors #1620
Open
samsja wants to merge 1 commit into
Approvability Verdict: Approved. This is a straightforward error-handling fix that catches ValidationError during QUEST judge structured parsing and converts it to a more appropriate InvalidModelResponseError. The change is small, self-contained, and includes test coverage.
rasdani
reviewed
Jun 11, 2026
Comment on lines +191 to +194

    except ValidationError as exc:
        raise vf.InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc

Contributor
can we put this into _raise_quest_judge_error?
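One way to read this suggestion is to fold the ValidationError branch into the shared helper, so that every judge failure is translated in one place. The following is a minimal sketch only: the real signature of `_raise_quest_judge_error` is not shown in this PR, so the `(model, exc)` signature here is an assumption, and the two exception classes are stdlib stand-ins for `pydantic.ValidationError` and `vf.InvalidModelResponseError`. `parse_judge_response` is a hypothetical caller added for illustration.

```python
from typing import NoReturn


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (assumption, keeps the sketch stdlib-only)."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError (assumption)."""


def _raise_quest_judge_error(model: str, exc: Exception) -> NoReturn:
    """Hypothetical helper: translate a low-level judge failure into a model error.

    Only the ValidationError branch is sketched; whatever other cases the real
    helper handles are not shown in the PR and are not reproduced here.
    """
    if isinstance(exc, ValidationError):
        # "from exc" keeps the original error reachable as __cause__.
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc
    # Anything unrecognized propagates unchanged.
    raise exc


def parse_judge_response(model: str, raw: str) -> dict:
    """Toy structured-parse step (hypothetical) that fails like a schema check."""
    try:
        if not raw.startswith("{"):
            raise ValidationError("expected a JSON object")
        return {"raw": raw}
    except ValidationError as exc:
        _raise_quest_judge_error(model, exc)
```

With this shape, the call site keeps a single `except` clause and the error message format lives in one function, at the cost of the helper needing to know about each exception type it translates.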
rasdani
approved these changes
Jun 11, 2026
Summary
Convert QUEST judge structured parse failures to vf.InvalidModelResponseError
Test
uv run pytest tests/test_quest_taskset.py
uv run ruff check verifiers/envs/experimental/composable/tasksets/search/quest/taskset.py tests/test_quest_taskset.py
Note
Low Risk
Narrow error-handling change on the QUEST judge client path with a focused unit test; no auth, data, or API contract changes.
Overview
QUEST judge structured outputs now treat Pydantic validation failures as first-class model errors instead of bubbling raw ValidationError.
In QuestOpenAIClient.async_response, when beta.chat.completions.parse succeeds at the HTTP/SDK layer but structured parsing fails (ValidationError), the code raises vf.InvalidModelResponseError with a clear message and chains the original exception, in line with other judge response failures (empty parsed, SDK validation, etc.).
A new async test uses a fake OpenAI client that returns malformed JSON for a BaseModel response format and asserts that the raised error is InvalidModelResponseError with ValidationError as __cause__, guarding against a single bad judge response taking down a full eval.
Reviewed by Cursor Bugbot for commit ef2f00d.
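The test described above can be sketched roughly as follows. This is not the test from the PR: the exception classes, the `Verdict` schema, `FakeJudgeClient`, and the simplified `async_response` are all hypothetical stdlib stand-ins (the real test uses a Pydantic BaseModel and a fake OpenAI client), chosen so the sketch runs without third-party packages.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Optional, Type


class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (assumption)."""


class InvalidModelResponseError(Exception):
    """Stand-in for vf.InvalidModelResponseError (assumption)."""


@dataclass
class Verdict:
    """Toy response schema; the real test uses a Pydantic BaseModel."""

    score: int

    @classmethod
    def from_json(cls, raw: str) -> "Verdict":
        try:
            return cls(score=int(json.loads(raw)["score"]))
        except (ValueError, KeyError, TypeError) as exc:
            raise ValidationError(f"invalid Verdict payload: {raw!r}") from exc


class FakeJudgeClient:
    """Hypothetical fake: the call itself succeeds, but the body is malformed."""

    async def parse(self, response_format: Type[Verdict]) -> Verdict:
        return response_format.from_json("{not valid json")


async def async_response(client: FakeJudgeClient, model: str) -> Verdict:
    """Simplified stand-in for the structured branch of the judge client."""
    try:
        return await client.parse(response_format=Verdict)
    except ValidationError as exc:
        raise InvalidModelResponseError(
            f"QUEST judge returned invalid structured response for {model}: {exc}"
        ) from exc


async def check_validation_error_is_wrapped() -> Optional[BaseException]:
    """Return the chained cause if the expected wrapper error is raised."""
    try:
        await async_response(FakeJudgeClient(), "judge-model")
    except InvalidModelResponseError as err:
        return err.__cause__
    raise AssertionError("expected InvalidModelResponseError")


cause = asyncio.run(check_validation_error_is_wrapped())
print(type(cause).__name__)  # prints "ValidationError"
```

Checking `__cause__` rather than only the exception type confirms that the original validation details stay available for debugging after the translation.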
Note
Convert Pydantic ValidationError to InvalidModelResponseError in QuestOpenAIClient.async_response
When the QUEST judge's structured response branch calls _client.beta.chat.completions.parse(...) and Pydantic raises a ValidationError, async_response now catches it and re-raises it as vf.InvalidModelResponseError, preserving the original error as __cause__. A new async test in test_quest_taskset.py verifies this behavior using a fake OpenAI client that returns invalid JSON.
Macroscope summarized ef2f00d.