DO NOT MWERGE by samsja · Pull Request #1458 · PrimeIntellect-ai/verifiers

samsja · 2026-05-25T01:38:04Z

DO NOT KMERGE

Note

Medium Risk
Touches token/chat response parsing and RL trajectory token plumbing; behavior changes for any consumer that still expected base85 numpy routed_experts strings or truncation in verifiers.

Overview
This PR aligns routed experts handling with PrimeRL’s compact {data, shape} sidecar (base64 uint8) instead of the old base85-encoded NumPy string, and threads that payload through chat, token, and renderer inference paths without re-decoding it in verifiers.

Chat completions clients that expose post now fetch raw httpx bodies, strip the large routed_experts.data JSON string before Pydantic validation, and reattach the bytes as a memoryview into the choice extras—same path for /chat/completions and /chat/completions/tokens. Renderer inference bypasses renderers.client.generate in favor of a local _generate_with_renderer that passes routed_experts through unchanged, uses RendererPool.checkout() for pool work, and merges extra_headers from sampling args and kwargs.

Types now use RoutedExpertsPayload; parse_routed_experts only normalizes dicts (numpy truncate/decode removed). parse_response_tokens moves routed_experts onto the trajectory step only, like multimodal sidecars. Adds pybase64 and tests for sidecar parsing, msgpack round-trip, generate forwarding, and trajectory parsing.

^{Reviewed by Cursor Bugbot for commit 913032a. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Preserve `routed_experts` sidecar as a zero-copy memoryview through chat completion and renderer pipelines

Adds _parse_chat_completion_with_routed_experts_sidecar and _post_chat_completion_with_routed_experts_sidecar helpers in openai_chat_completions_client.py that POST to the chat completions endpoint via cast_to=httpx.Response and return a ChatCompletion with routed_experts.data as a memoryview into the raw response bytes (zero-copy).
Updates OpenAIChatCompletionsTokenClient.get_native_response and OpenAIChatCompletionsClient.get_native_response to use this helper, forwarding extra_headers through both paths.
Updates RendererClient.get_native_response in renderer_client.py to forward extra_headers to the generate endpoint and parse routed_experts via parse_routed_experts instead of passing through raw values.
Changes RoutedExpertsPayload in types.py from a base64 string to a TypedDict with data (str | bytes | bytearray | memoryview) and shape fields; CustomBaseModel gains arbitrary_types_allowed=True.
Updates parse_response_tokens in response_utils.py to move routed_experts off response.message.tokens onto the returned TrajectoryStepTokens (mirroring multi_modal_data) and removes truncate_routed_experts.
Behavioral Change: routed_experts is no longer base64-encoded or truncated during token truncation; consumers expecting a base64 string will receive a RoutedExpertsPayload dict instead.

📊 Macroscope summarized 913032a. 6 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted

🗂️ Filtered Issues

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 913032a. Configure here.}

cursor · 2026-05-25T01:38:57Z

+            return await _post_chat_completion_with_routed_experts_sidecar(
+                self.client,
+                "/chat/completions",
+                body=body,


Nested extra_body on post path

Medium Severity

get_native_response now sends chat requests through client.post whenever post exists (including standard AsyncOpenAI), but the body still includes a nested extra_body object. The SDK’s chat.completions.create merges extra_body into the top-level JSON, so OpenRouter/Pinference Anthropic reasoning_effort and other extra_body fields may never reach the API.

^{Reviewed by Cursor Bugbot for commit 913032a. Configure here.}

cursor · 2026-05-25T01:38:57Z

            completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
-            routed_experts = truncate_routed_experts(
-                routed_experts, prompt_len + len(completion_ids)
-            )


Routed experts not truncated

Medium Severity

When parse_response_tokens applies max_seq_len and shortens prompt or completion token lists, it still copies routed_experts unchanged. The removed truncate_routed_experts logic used to align expert routing data with the truncated sequence length.

^{Reviewed by Cursor Bugbot for commit 913032a. Configure here.}

cursor · 2026-05-25T01:38:57Z

    overlong_prompt: bool
    is_truncated: bool
-    routed_experts: str | None  # base64 NumPy [seq_len, layers, topk]
+    routed_experts: RoutedExpertsPayload | None


Reference docs stale routed_experts

Low Severity

This PR changes routed_experts from a base64 string to a RoutedExpertsPayload with data and shape, but docs/reference.md still documents TrajectoryStepTokens.routed_experts as list[list[list[int]]] | None.

^{Triggered by project rule: BugBot Instructions}

^{Reviewed by Cursor Bugbot for commit 913032a. Configure here.}

macroscopeapp · 2026-05-25T01:45:25Z

🟡 Medium

verifiers/verifiers/utils/response_utils.py

Lines 37 to 70 in 913032a

async def parse_response_tokens(

response: Response, max_seq_len: int | None = None

) -> TrajectoryStepTokens | None:

"""Parse token data from a vf.Response."""

if response is None:

return None

tokens = response.message.tokens

if tokens is None:

return None

prompt_ids = tokens.prompt_ids

prompt_mask = tokens.prompt_mask

completion_ids = tokens.completion_ids

completion_mask = tokens.completion_mask

completion_logprobs = tokens.completion_logprobs

routed_experts = tokens.routed_experts

multi_modal_data = tokens.multi_modal_data

if max_seq_len is not None:

prompt_len = len(prompt_ids)

completion_len = len(completion_ids)

overlong_prompt = prompt_len > max_seq_len

if overlong_prompt:

is_truncated = True

prompt_ids = prompt_ids[:max_seq_len]

prompt_mask = prompt_mask[:max_seq_len]

completion_ids = []

completion_mask = []

completion_logprobs = []

elif prompt_len + completion_len > max_seq_len:

is_truncated = True

completion_ids = tokens.completion_ids[: max_seq_len - prompt_len]

completion_mask = tokens.completion_mask[: max_seq_len - prompt_len]

completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]

else:

When max_seq_len truncation occurs, routed_experts is not truncated to match the shortened token arrays, so the returned TrajectoryStepTokens has mismatched dimensions between routed_experts and the actual token count. This breaks downstream consumers that expect routed_experts length to equal total sequence length. Consider restoring truncate_routed_experts() calls for both truncation branches, or if removal was intentional, document why the length mismatch is acceptable.

@@ -59,6 +59,7 @@ prompt_ids = prompt_ids[:max_seq_len] prompt_mask = prompt_mask[:max_seq_len] completion_ids = [] completion_mask = [] completion_logprobs = [] + routed_experts = truncate_routed_experts(routed_experts, len(prompt_ids)) elif prompt_len + completion_len > max_seq_len: is_truncated = True completion_ids = tokens.completion_ids[: max_seq_len - prompt_len] completion_mask = tokens.completion_mask[: max_seq_len - prompt_len] completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len] + routed_experts = truncate_routed_experts( + routed_experts, prompt_len + len(completion_ids) + )

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file verifiers/utils/response_utils.py around lines 37-70: When `max_seq_len` truncation occurs, `routed_experts` is not truncated to match the shortened token arrays, so the returned `TrajectoryStepTokens` has mismatched dimensions between `routed_experts` and the actual token count. This breaks downstream consumers that expect `routed_experts` length to equal total sequence length. Consider restoring `truncate_routed_experts()` calls for both truncation branches, or if removal was intentional, document why the length mismatch is acceptable. Evidence trail: verifiers/utils/response_utils.py lines 37-94 (REVIEWED_COMMIT): token arrays truncated at lines 58-69 but routed_experts unchanged at line 84. Commit 3708ede3 diff shows intentional removal of truncate_routed_experts() calls. docs/reference.md line 195: documents routed_experts shape as [seq_len, layers, topk]. verifiers/types.py lines 235-248: TrajectoryStepTokens TypedDict definition with routed_experts: RoutedExpertsPayload | None.

macroscopeapp · 2026-05-25T01:45:52Z

Approvability

Verdict: Needs human review

1 blocking correctness issue found. PR title explicitly indicates 'DO NOT MERGE', significant runtime behavior changes to chat completion handling and routed_experts parsing, and multiple unresolved medium-severity review comments identifying potential bugs with extra_body propagation and routed_experts truncation mismatches.

^{You can customize Macroscope's approvability policy. Learn more.}

S1ro1 and others added 8 commits May 16, 2026 02:45

Use raw uint8 routed experts payloads

461a730

Keep routed experts opaque during token truncation

3708ede

Speed up routed experts response parsing

9d546bd

Support older renderer bridge APIs

438c48e

Gate renderer multimodal generate kwargs

9a36320

Preserve compact routed experts in renderer client

9e68dcc

Move routed experts out of response tokens

a3edb70

Forward renderer generate request headers

913032a

cursor Bot reviewed May 25, 2026

View reviewed changes

macroscopeapp Bot reviewed May 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DO NOT MWERGE#1458

DO NOT MWERGE#1458
samsja wants to merge 8 commits into
mainfrom
sami/fix-smth

samsja commented May 25, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 25, 2026

Uh oh!

cursor Bot May 25, 2026

Uh oh!

cursor Bot May 25, 2026

Uh oh!

macroscopeapp Bot May 25, 2026

Uh oh!

macroscopeapp Bot commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	async def parse_response_tokens(
	response: Response, max_seq_len: int \| None = None
	) -> TrajectoryStepTokens \| None:
	"""Parse token data from a vf.Response."""
	if response is None:
	return None
	tokens = response.message.tokens
	if tokens is None:
	return None
	prompt_ids = tokens.prompt_ids
	prompt_mask = tokens.prompt_mask
	completion_ids = tokens.completion_ids
	completion_mask = tokens.completion_mask
	completion_logprobs = tokens.completion_logprobs
	routed_experts = tokens.routed_experts
	multi_modal_data = tokens.multi_modal_data

	if max_seq_len is not None:
	prompt_len = len(prompt_ids)
	completion_len = len(completion_ids)
	overlong_prompt = prompt_len > max_seq_len
	if overlong_prompt:
	is_truncated = True
	prompt_ids = prompt_ids[:max_seq_len]
	prompt_mask = prompt_mask[:max_seq_len]
	completion_ids = []
	completion_mask = []
	completion_logprobs = []
	elif prompt_len + completion_len > max_seq_len:
	is_truncated = True
	completion_ids = tokens.completion_ids[: max_seq_len - prompt_len]
	completion_mask = tokens.completion_mask[: max_seq_len - prompt_len]
	completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
	else:

Conversation

samsja commented May 25, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preserve routed_experts sidecar as a zero-copy memoryview through chat completion and renderer pipelines

🗂️ Filtered Issues

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

Nested extra_body on post path

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

Routed experts not truncated

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

Reference docs stale routed_experts

Uh oh!

macroscopeapp Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot commented May 25, 2026

Approvability

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

samsja commented May 25, 2026 •

edited by macroscopeapp Bot

Loading

Preserve `routed_experts` sidecar as a zero-copy memoryview through chat completion and renderer pipelines