DO NOT MWERGE#1458
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 913032a. Configure here.
| return await _post_chat_completion_with_routed_experts_sidecar( | ||
| self.client, | ||
| "/chat/completions", | ||
| body=body, |
There was a problem hiding this comment.
Nested extra_body on post path
Medium Severity
get_native_response now sends chat requests through client.post whenever post exists (including standard AsyncOpenAI), but the body still includes a nested extra_body object. The SDK’s chat.completions.create merges extra_body into the top-level JSON, so OpenRouter/Pinference Anthropic reasoning_effort and other extra_body fields may never reach the API.
Reviewed by Cursor Bugbot for commit 913032a. Configure here.
| completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len] | ||
| routed_experts = truncate_routed_experts( | ||
| routed_experts, prompt_len + len(completion_ids) | ||
| ) |
There was a problem hiding this comment.
Routed experts not truncated
Medium Severity
When parse_response_tokens applies max_seq_len and shortens prompt or completion token lists, it still copies routed_experts unchanged. The removed truncate_routed_experts logic used to align expert routing data with the truncated sequence length.
Reviewed by Cursor Bugbot for commit 913032a. Configure here.
| overlong_prompt: bool | ||
| is_truncated: bool | ||
| routed_experts: str | None # base64 NumPy [seq_len, layers, topk] | ||
| routed_experts: RoutedExpertsPayload | None |
There was a problem hiding this comment.
Reference docs stale routed_experts
Low Severity
This PR changes routed_experts from a base64 string to a RoutedExpertsPayload with data and shape, but docs/reference.md still documents TrajectoryStepTokens.routed_experts as list[list[list[int]]] | None.
Triggered by project rule: BugBot Instructions
Reviewed by Cursor Bugbot for commit 913032a. Configure here.
There was a problem hiding this comment.
🟡 Medium
verifiers/verifiers/utils/response_utils.py
Lines 37 to 70 in 913032a
When max_seq_len truncation occurs, routed_experts is not truncated to match the shortened token arrays, so the returned TrajectoryStepTokens has mismatched dimensions between routed_experts and the actual token count. This breaks downstream consumers that expect routed_experts length to equal total sequence length. Consider restoring truncate_routed_experts() calls for both truncation branches, or if removal was intentional, document why the length mismatch is acceptable.
@@ -59,6 +59,7 @@
prompt_ids = prompt_ids[:max_seq_len]
prompt_mask = prompt_mask[:max_seq_len]
completion_ids = []
completion_mask = []
completion_logprobs = []
+ routed_experts = truncate_routed_experts(routed_experts, len(prompt_ids))
elif prompt_len + completion_len > max_seq_len:
is_truncated = True
completion_ids = tokens.completion_ids[: max_seq_len - prompt_len]
completion_mask = tokens.completion_mask[: max_seq_len - prompt_len]
completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
+ routed_experts = truncate_routed_experts(
+ routed_experts, prompt_len + len(completion_ids)
+ )🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file verifiers/utils/response_utils.py around lines 37-70:
When `max_seq_len` truncation occurs, `routed_experts` is not truncated to match the shortened token arrays, so the returned `TrajectoryStepTokens` has mismatched dimensions between `routed_experts` and the actual token count. This breaks downstream consumers that expect `routed_experts` length to equal total sequence length. Consider restoring `truncate_routed_experts()` calls for both truncation branches, or if removal was intentional, document why the length mismatch is acceptable.
Evidence trail:
verifiers/utils/response_utils.py lines 37-94 (REVIEWED_COMMIT): token arrays truncated at lines 58-69 but routed_experts unchanged at line 84. Commit 3708ede3 diff shows intentional removal of truncate_routed_experts() calls. docs/reference.md line 195: documents routed_experts shape as [seq_len, layers, topk]. verifiers/types.py lines 235-248: TrajectoryStepTokens TypedDict definition with routed_experts: RoutedExpertsPayload | None.
ApprovabilityVerdict: Needs human review 1 blocking correctness issue found. PR title explicitly indicates 'DO NOT MERGE', significant runtime behavior changes to chat completion handling and routed_experts parsing, and multiple unresolved medium-severity review comments identifying potential bugs with extra_body propagation and routed_experts truncation mismatches. You can customize Macroscope's approvability policy. Learn more. |


DO NOT KMERGE
Note
Medium Risk
Touches token/chat response parsing and RL trajectory token plumbing; behavior changes for any consumer that still expected base85 numpy
routed_expertsstrings or truncation in verifiers.Overview
This PR aligns routed experts handling with PrimeRL’s compact
{data, shape}sidecar (base64 uint8) instead of the old base85-encoded NumPy string, and threads that payload through chat, token, and renderer inference paths without re-decoding it in verifiers.Chat completions clients that expose
postnow fetch rawhttpxbodies, strip the largerouted_experts.dataJSON string before Pydantic validation, and reattach the bytes as amemoryviewinto the choice extras—same path for/chat/completionsand/chat/completions/tokens. Renderer inference bypassesrenderers.client.generatein favor of a local_generate_with_rendererthat passesrouted_expertsthrough unchanged, usesRendererPool.checkout()for pool work, and mergesextra_headersfrom sampling args and kwargs.Types now use
RoutedExpertsPayload;parse_routed_expertsonly normalizes dicts (numpy truncate/decode removed).parse_response_tokensmovesrouted_expertsonto the trajectory step only, like multimodal sidecars. Addspybase64and tests for sidecar parsing, msgpack round-trip, generate forwarding, and trajectory parsing.Reviewed by Cursor Bugbot for commit 913032a. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Preserve
routed_expertssidecar as a zero-copy memoryview through chat completion and renderer pipelines_parse_chat_completion_with_routed_experts_sidecarand_post_chat_completion_with_routed_experts_sidecarhelpers in openai_chat_completions_client.py that POST to the chat completions endpoint viacast_to=httpx.Responseand return aChatCompletionwithrouted_experts.dataas amemoryviewinto the raw response bytes (zero-copy).OpenAIChatCompletionsTokenClient.get_native_responseandOpenAIChatCompletionsClient.get_native_responseto use this helper, forwardingextra_headersthrough both paths.RendererClient.get_native_responsein renderer_client.py to forwardextra_headersto the generate endpoint and parserouted_expertsviaparse_routed_expertsinstead of passing through raw values.RoutedExpertsPayloadin types.py from a base64 string to aTypedDictwithdata(str | bytes | bytearray | memoryview) andshapefields;CustomBaseModelgainsarbitrary_types_allowed=True.parse_response_tokensin response_utils.py to moverouted_expertsoffresponse.message.tokensonto the returnedTrajectoryStepTokens(mirroringmulti_modal_data) and removestruncate_routed_experts.routed_expertsis no longer base64-encoded or truncated during token truncation; consumers expecting a base64 string will receive aRoutedExpertsPayloaddict instead.📊 Macroscope summarized 913032a. 6 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted
🗂️ Filtered Issues