Skip to content

DO NOT MWERGE#1458

Open
samsja wants to merge 8 commits into
mainfrom
sami/fix-smth
Open

DO NOT MWERGE#1458
samsja wants to merge 8 commits into
mainfrom
sami/fix-smth

Conversation

@samsja

@samsja samsja commented May 25, 2026

Copy link
Copy Markdown
Member

DO NOT KMERGE


Note

Medium Risk
Touches token/chat response parsing and RL trajectory token plumbing; behavior changes for any consumer that still expected base85 numpy routed_experts strings or truncation in verifiers.

Overview
This PR aligns routed experts handling with PrimeRL’s compact {data, shape} sidecar (base64 uint8) instead of the old base85-encoded NumPy string, and threads that payload through chat, token, and renderer inference paths without re-decoding it in verifiers.

Chat completions clients that expose post now fetch raw httpx bodies, strip the large routed_experts.data JSON string before Pydantic validation, and reattach the bytes as a memoryview into the choice extras—same path for /chat/completions and /chat/completions/tokens. Renderer inference bypasses renderers.client.generate in favor of a local _generate_with_renderer that passes routed_experts through unchanged, uses RendererPool.checkout() for pool work, and merges extra_headers from sampling args and kwargs.

Types now use RoutedExpertsPayload; parse_routed_experts only normalizes dicts (numpy truncate/decode removed). parse_response_tokens moves routed_experts onto the trajectory step only, like multimodal sidecars. Adds pybase64 and tests for sidecar parsing, msgpack round-trip, generate forwarding, and trajectory parsing.

Reviewed by Cursor Bugbot for commit 913032a. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Preserve routed_experts sidecar as a zero-copy memoryview through chat completion and renderer pipelines

  • Adds _parse_chat_completion_with_routed_experts_sidecar and _post_chat_completion_with_routed_experts_sidecar helpers in openai_chat_completions_client.py that POST to the chat completions endpoint via cast_to=httpx.Response and return a ChatCompletion with routed_experts.data as a memoryview into the raw response bytes (zero-copy).
  • Updates OpenAIChatCompletionsTokenClient.get_native_response and OpenAIChatCompletionsClient.get_native_response to use this helper, forwarding extra_headers through both paths.
  • Updates RendererClient.get_native_response in renderer_client.py to forward extra_headers to the generate endpoint and parse routed_experts via parse_routed_experts instead of passing through raw values.
  • Changes RoutedExpertsPayload in types.py from a base64 string to a TypedDict with data (str | bytes | bytearray | memoryview) and shape fields; CustomBaseModel gains arbitrary_types_allowed=True.
  • Updates parse_response_tokens in response_utils.py to move routed_experts off response.message.tokens onto the returned TrajectoryStepTokens (mirroring multi_modal_data) and removes truncate_routed_experts.
  • Behavioral Change: routed_experts is no longer base64-encoded or truncated during token truncation; consumers expecting a base64 string will receive a RoutedExpertsPayload dict instead.
📊 Macroscope summarized 913032a. 6 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted

🗂️ Filtered Issues

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 913032a. Configure here.

return await _post_chat_completion_with_routed_experts_sidecar(
self.client,
"/chat/completions",
body=body,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested extra_body on post path

Medium Severity

get_native_response now sends chat requests through client.post whenever post exists (including standard AsyncOpenAI), but the body still includes a nested extra_body object. The SDK’s chat.completions.create merges extra_body into the top-level JSON, so OpenRouter/Pinference Anthropic reasoning_effort and other extra_body fields may never reach the API.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 913032a. Configure here.

Comment thread verifiers/utils/response_utils.py Outdated
completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
routed_experts = truncate_routed_experts(
routed_experts, prompt_len + len(completion_ids)
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routed experts not truncated

Medium Severity

When parse_response_tokens applies max_seq_len and shortens prompt or completion token lists, it still copies routed_experts unchanged. The removed truncate_routed_experts logic used to align expert routing data with the truncated sequence length.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 913032a. Configure here.

Comment thread verifiers/types.py
overlong_prompt: bool
is_truncated: bool
routed_experts: str | None # base64 NumPy [seq_len, layers, topk]
routed_experts: RoutedExpertsPayload | None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference docs stale routed_experts

Low Severity

This PR changes routed_experts from a base64 string to a RoutedExpertsPayload with data and shape, but docs/reference.md still documents TrajectoryStepTokens.routed_experts as list[list[list[int]]] | None.

Fix in Cursor Fix in Web

Triggered by project rule: BugBot Instructions

Reviewed by Cursor Bugbot for commit 913032a. Configure here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium

async def parse_response_tokens(
response: Response, max_seq_len: int | None = None
) -> TrajectoryStepTokens | None:
"""Parse token data from a vf.Response."""
if response is None:
return None
tokens = response.message.tokens
if tokens is None:
return None
prompt_ids = tokens.prompt_ids
prompt_mask = tokens.prompt_mask
completion_ids = tokens.completion_ids
completion_mask = tokens.completion_mask
completion_logprobs = tokens.completion_logprobs
routed_experts = tokens.routed_experts
multi_modal_data = tokens.multi_modal_data
if max_seq_len is not None:
prompt_len = len(prompt_ids)
completion_len = len(completion_ids)
overlong_prompt = prompt_len > max_seq_len
if overlong_prompt:
is_truncated = True
prompt_ids = prompt_ids[:max_seq_len]
prompt_mask = prompt_mask[:max_seq_len]
completion_ids = []
completion_mask = []
completion_logprobs = []
elif prompt_len + completion_len > max_seq_len:
is_truncated = True
completion_ids = tokens.completion_ids[: max_seq_len - prompt_len]
completion_mask = tokens.completion_mask[: max_seq_len - prompt_len]
completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
else:

When max_seq_len truncation occurs, routed_experts is not truncated to match the shortened token arrays, so the returned TrajectoryStepTokens has mismatched dimensions between routed_experts and the actual token count. This breaks downstream consumers that expect routed_experts length to equal total sequence length. Consider restoring truncate_routed_experts() calls for both truncation branches, or if removal was intentional, document why the length mismatch is acceptable.

@@ -59,6 +59,7 @@
             prompt_ids = prompt_ids[:max_seq_len]
             prompt_mask = prompt_mask[:max_seq_len]
             completion_ids = []
             completion_mask = []
             completion_logprobs = []
+            routed_experts = truncate_routed_experts(routed_experts, len(prompt_ids))
         elif prompt_len + completion_len > max_seq_len:
             is_truncated = True
             completion_ids = tokens.completion_ids[: max_seq_len - prompt_len]
             completion_mask = tokens.completion_mask[: max_seq_len - prompt_len]
             completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
+            routed_experts = truncate_routed_experts(
+                routed_experts, prompt_len + len(completion_ids)
+            )
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file verifiers/utils/response_utils.py around lines 37-70:

When `max_seq_len` truncation occurs, `routed_experts` is not truncated to match the shortened token arrays, so the returned `TrajectoryStepTokens` has mismatched dimensions between `routed_experts` and the actual token count. This breaks downstream consumers that expect `routed_experts` length to equal total sequence length. Consider restoring `truncate_routed_experts()` calls for both truncation branches, or if removal was intentional, document why the length mismatch is acceptable.

Evidence trail:
verifiers/utils/response_utils.py lines 37-94 (REVIEWED_COMMIT): token arrays truncated at lines 58-69 but routed_experts unchanged at line 84. Commit 3708ede3 diff shows intentional removal of truncate_routed_experts() calls. docs/reference.md line 195: documents routed_experts shape as [seq_len, layers, topk]. verifiers/types.py lines 235-248: TrajectoryStepTokens TypedDict definition with routed_experts: RoutedExpertsPayload | None.

@macroscopeapp

macroscopeapp Bot commented May 25, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

1 blocking correctness issue found. PR title explicitly indicates 'DO NOT MERGE', significant runtime behavior changes to chat completion handling and routed_experts parsing, and multiple unresolved medium-severity review comments identifying potential bugs with extra_body propagation and routed_experts truncation mismatches.

You can customize Macroscope's approvability policy. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants