feat(client): add dynamo_chat transport + routed_experts to renderer generate by biswapanda · Pull Request #79 · PrimeIntellect-ai/renderers

biswapanda · 2026-06-09T00:19:26Z

Description

Adds a dynamo_chat transport to the renderer-based generate() client so it can run against NVIDIA Dynamo, which serves no /inference/v1/generate route. Selected per-call via transport=; defaults to the existing vLLM path, so behavior is unchanged unless opted in.

Two transports:

vllm_generate (default): unchanged — messages → render_ids() → POST /inference/v1/generate → parse_response() (vLLM TITO surface).
dynamo_chat: messages → render_ids() → POST /v1/chat/completions with nvext.token_data (pre-tokenized prompt) + nvext.extra_fields=["engine_data"]. Completion token IDs and logprobs are read back from nvext.engine_data.

Dynamo wire shape (`_post_dynamo_chat`)

Mirrors the verifiers token client so the payload is identical whether a rollout goes through the token client or the renderer client. nvext.token_data (Dynamo skips tokenization when present); cache_salt → nvext.cache_salt, priority → nvext.agent_hints.priority; a single placeholder user message; sampling remap (max_tokens → max_completion_tokens, logprobs=N → logprobs=true + top_logprobs=N); passthrough fields ride the Dynamo allowlist. Tools are baked into token_data by the renderer (not sent on the wire).

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

(Supersedes the earlier "routed_experts intentionally NOT surfaced" note — it now is.) parse reads routed_experts from nvext.routed_experts (or nvext.engine_data.routed_experts) and maps it to the downstream RoutedExpertsPayload {data, shape, start, dtype}. The Dynamo worker returns full-sequence routing with start=0; the renderer row-trims the leading prompt rows only when the caller explicitly sets routed_experts_prompt_start — a first-turn request with no caller start stays full-sequence with start=0 (no phantom prefix). Completion logprobs prefer nvext.engine_data.completion_logprobs (the same authoritative source as the engine token IDs) over the chat echo; a present-but-empty engine list is authoritative and does not fall back to chat.

Other

Public RendererTransport = Literal["vllm_generate", "dynamo_chat"] alias. A present-but-empty completion_token_ids is a valid zero-token completion; only a fully absent field raises. Multimodal renderers raise NotImplementedError on dynamo_chat (vLLM path / token-client TITO remain available for VLMs).

Type of Change

New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (F1/F2/F3 + the N1 logprob-presence finding resolved; head 5f2a914). All review threads resolved.

Testing

tests/test_client.py covers the Dynamo request body shape (priority/detokenize/sampling remap), routed_experts parse + row-trim (explicit prompt_start vs first-turn full-sequence), engine-logprob preference incl. present-but-empty, and missing/empty completion IDs.

Note

Medium Risk
New Dynamo wire/parse path affects completion IDs, logprobs, and MoE routed_experts for opt-in callers; default vLLM behavior is unchanged but misconfigured Dynamo responses fail at runtime.

Overview
Adds an opt-in dynamo backend to renderer generate() via a new transport argument (default vllm), so rollouts can target NVIDIA Dynamo without changing existing vLLM TITO behavior.

Wire/parse logic moves into _VllmGenerateTransport and _DynamoChatTransport, with responses normalized through _WireResult. The Dynamo path posts to /v1/chat/completions with nvext.token_data and reads nvext.engine_data.completion_token_ids / logprobs (engine channel wins over chat echo). It maps cache_salt, priority, and routed_experts_prompt_start into nvext, forwards sampling params via a denylist, and surfaces routed_experts via _normalize_routed_experts plus optional client-side _trim_dynamo_routed_experts. Large expert blobs use the same zero-copy JSON strip as vLLM (_parse_dynamo_response). Multimodal on Dynamo raises NotImplementedError.

tests/test_client.py adds broad Dynamo coverage (body shape, nvext merge, errors, logprob alignment, routing trim).

^{Reviewed by Cursor Bugbot for commit b9d25b1. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add Dynamo chat-completions transport and `routed_experts` support to `generate()`

Adds a pluggable transport architecture to renderers/client.py with _VllmGenerateTransport (existing behavior) and _DynamoChatTransport (new) strategies selected via a transport parameter on generate().
The Dynamo transport posts to /chat/completions with nvext.token_data, routes cache_salt and priority into nvext, maps logprobs to OpenAI-style fields, and drops vLLM-only keys via a denylist.
Response parsing prefers nvext.engine_data for completion_token_ids and completion_logprobs, normalizes routed_experts into a typed struct, and preserves large base64 blobs as zero-copy memoryview objects.
Adds client-side trimming of routed_experts prompt rows when the worker did not apply routed_experts_prompt_start.
Risk: Dynamo transport raises NotImplementedError for multimodal inputs, RuntimeError when completion IDs are absent or logprobs length mismatches — these are new runtime failure modes with no fallback.

^{Macroscope summarized b9d25b1.}

…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat

…d, drop routed_experts on dynamo (codex round 2)

…ake); docstring fix

…ached endpoints

…erge nvext, canonical completion-ids, logprobs alignment)

… on dynamo path

…payload to contract

…only

…fset

…ing only)

…first-turn stays full)

…ith engine ids

… chat fallback)

…trim is now a back-compat fallback

…omments

…oid event-loop json.loads)

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit fec0a81. Configure here.}

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

Comment thread renderers/client.py Outdated

Comment thread renderers/client.py Outdated

biswapanda mentioned this pull request Jun 9, 2026

feat: dynamo inference backend integration PrimeIntellect-ai/prime-rl#2737

Open

1 task

biswapanda changed the title ~~feat(client): add dynamo_chat_nvext transport to renderer generate()~~ feat(client): add dynamo_chat_nvext transport to renderer Jun 9, 2026

biswapanda added 2 commits June 8, 2026 19:11

feat(client): add transport selector + dynamo_chat_nvext branch

334e496

feat: forward Dynamo nvext TITO fields

6a21574

biswapanda force-pushed the rl-sdk-4 branch from 268e16b to 6a21574 Compare June 9, 2026 02:13

fix(client): address codex review — revert default vLLM path, drop to…

a35e023

…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat

biswapanda mentioned this pull request Jun 9, 2026

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) PrimeIntellect-ai/verifiers#1574

Open

1 task

biswapanda added 2 commits June 9, 2026 01:21

fix(client): gate nvext fallbacks to dynamo path, fix zero-token guar…

b6f50d0

…d, drop routed_experts on dynamo (codex round 2)

test(client): prove routed_experts dropped on dynamo (Dynamo-shaped f…

5dbf494

…ake); docstring fix

biswapanda changed the title ~~feat(client): add dynamo_chat_nvext transport to renderer~~ feat(client): add dynamo_chat transport to renderer generate() Jun 9, 2026

biswapanda added 4 commits June 9, 2026 15:31

style: apply ruff format to client + tests (fix CI)

287871c

docs(client): trim verbose comments in dynamo_chat path

6041134

refactor(client): replace transport if/else with strategy classes + c…

503846c

…ached endpoints

fix(client): address codex F1-F4 on dynamo_chat (denylist sampling, m…

ed03eaa

…erge nvext, canonical completion-ids, logprobs alignment)

cursor Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread renderers/client.py

biswapanda changed the title ~~feat(client): add dynamo_chat transport to renderer generate()~~ feat(client): add dynamo_chat transport to renderer generate Jun 10, 2026

biswapanda added 2 commits June 9, 2026 20:05

fix(client): route sampling_params cache_salt and priority into nvext…

eb0bdb2

… on dynamo path

feat(client): surface routed_experts on dynamo_chat transport

30c01b6

cursor Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

biswapanda added 8 commits June 10, 2026 11:30

fix(client): drop duplicate routed_experts request; normalize parsed …

28e3d02

…payload to contract

test(client): update dynamo extra_fields expectations to engine_data …

c31854f

…only

fix(client): stamp routed_experts.start on dynamo_chat from prompt of…

59553da

…fset

test(client): expect stamped routed_experts.start on dynamo_chat

b7927f8

fix(client): trim dynamo_chat routed_experts rows to start (was stamp…

b554520

…ing only)

fix(client): only trim routed_experts when caller sets prompt_start (…

010c894

…first-turn stays full)

fix(client): prefer engine_data.completion_logprobs to stay aligned w…

51d9154

…ith engine ids

fix(client): treat present-empty engine logprobs as authoritative (no…

5f2a914

… chat fallback)

biswapanda changed the title ~~feat(client): add dynamo_chat transport to renderer generate~~ feat(client): add dynamo_chat transport + routed_experts to renderer generate Jun 10, 2026

feat(client): send routed_experts_prompt_start in nvext; client-side …

7567377

…trim is now a back-compat fallback

biswapanda mentioned this pull request Jun 11, 2026

feat(RL): forward routed_experts_prompt_start via nvext ai-dynamo/dynamo#10562

Open

3 tasks

biswapanda added 2 commits June 10, 2026 17:30

docs(client): drop PR-number references and stale vLLM version from c…

f5c480d

…omments

perf(client): zero-copy routed_experts on dynamo_chat (strip blob, av…

b62aabf

…oid event-loop json.loads)

biswapanda mentioned this pull request Jun 11, 2026

feat(client): add Dynamo inference backend PrimeIntellect-ai/prime-rl#2773

Open

biswapanda added 2 commits June 12, 2026 02:20

fix(dynamo): require aligned renderer logprobs

fe93638

chore(client): rename renderer transport values

fec0a81

cursor Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread renderers/client.py

fix(client): fall back to engine routed experts

b9d25b1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79
biswapanda wants to merge 25 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

biswapanda commented Jun 9, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Dynamo wire shape (_post_dynamo_chat)

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

Other

Type of Change

Review

Testing

Add Dynamo chat-completions transport and routed_experts support to generate()

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Dynamo wire shape (`_post_dynamo_chat`)

Add Dynamo chat-completions transport and `routed_experts` support to `generate()`