[NV] update B300 disagg recipes by biswapanda · Pull Request #1887 · SemiAnalysisAI/InferenceX

biswapanda · 2026-06-22T20:40:12Z

Note

Medium Risk
The change has a large surface area but is benchmark-only. The launcher mutates the cloned srt-slurm checkout and patches the installed vLLM at runtime, so failures would affect CI and cluster jobs rather than app users.

Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM fixed-sequence benchmarks on B300, including a new minimaxm3-fp8-b300-dynamo-vllm entry in nvidia-master.yaml with 1k/1k and 8k/1k search spaces (prefill DEP2, varied decode topologies and concurrencies).
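The recipe files named in this PR's review threads (for example `2p1d-dep2-tep4-8k1k.yaml`) appear to encode the topology and sequence shape in the file name. The sketch below parses such a name under an assumed reading: `NpMd` as prefill and decode worker counts, then a parallelism mode and size for prefill and for decode, then the input/output lengths. That reading, the `RecipeShape` type, and `parse_recipe_name` are illustrative assumptions, not part of the PR.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipeShape:
    """Fields recovered from a recipe file name (assumed naming scheme)."""
    prefill_workers: int
    decode_workers: int
    prefill_mode: str   # e.g. "dep"
    prefill_size: int
    decode_mode: str    # e.g. "tep"
    decode_size: int
    seq: str            # e.g. "8k1k"


# Assumed pattern: <N>p<M>d-<mode><size>-<mode><size>-<isl>k<osl>k[.yaml]
_NAME = re.compile(
    r"^(\d+)p(\d+)d-([a-z]+)(\d+)-([a-z]+)(\d+)-(\d+k\d+k)(?:\.yaml)?$"
)


def parse_recipe_name(name: str) -> RecipeShape:
    """Split a recipe file name into its topology fields, or raise ValueError."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized recipe name: {name!r}")
    p, d, p_mode, p_size, d_mode, d_size, seq = match.groups()
    return RecipeShape(int(p), int(d), p_mode, int(p_size), d_mode, int(d_size), seq)
```

A parser like this could help cross-check that each recipe on disk has a matching search-space entry, though the PR does not say such a check exists.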

Introduces local srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ and wires launch_b300-nv.sh to overlay them on sa-submission-q2-2026, set model paths, run minimax-m3-vllm-fixes.sh via srtctl --setup-script, apply the srt-slurm node-IP patch, optionally exclude b300-018, and verify #SBATCH --exclude in the generated script.
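The last step above, verifying `#SBATCH --exclude` in the generated script, can be sketched as a small text check. This is a minimal illustration rather than the launcher's actual logic: the function names are hypothetical, and it only handles plain comma-separated node names, not Slurm's bracketed hostlist ranges.

```python
import re

# Matches "#SBATCH --exclude=<nodes>" or "#SBATCH --exclude <nodes>".
_EXCLUDE = re.compile(r"^#SBATCH\s+--exclude(?:=|\s+)(\S+)")


def excluded_nodes(sbatch_text: str) -> set[str]:
    """Collect node names from every `#SBATCH --exclude` directive."""
    nodes: set[str] = set()
    for line in sbatch_text.splitlines():
        match = _EXCLUDE.match(line.strip())
        if match:
            nodes.update(name for name in match.group(1).split(",") if name)
    return nodes


def verify_exclusion(sbatch_text: str, node: str) -> None:
    """Raise if the generated script does not exclude the given node."""
    if node not in excluded_nodes(sbatch_text):
        raise RuntimeError(f"{node} is not excluded in the generated sbatch script")
```

Failing before submission in this way would surface a dropped exclusion early instead of after the job lands on the node.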

The setup script patches the installed vLLM image at job time: contiguous MSA prefill top-k for CSR, and NIXL KV block-length checks using GQA head ratios (heterogeneous TP). Recipes add TP4 + Marlin decode options, colocated 6-GPU pairs with CUDA IPC for NIXL on selected shapes, and 8k1k fp8 KV / attention settings aligned with 1k1k.
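To illustrate why a KV block-length check under heterogeneous TP might use GQA head ratios rather than raw TP ratios, here is a minimal sketch. It assumes that a rank's KV block size scales with the number of KV heads it holds and that heads are replicated (one per rank) once the TP size exceeds the KV head count. The function names and the head counts in the usage note are hypothetical and are not taken from vLLM, NIXL, or the MiniMax-M3 configuration.

```python
def kv_heads_per_rank(total_kv_heads: int, tp_size: int) -> int:
    """KV heads held by one TP rank (assumed: replicate when tp_size > heads)."""
    if total_kv_heads <= 0 or tp_size <= 0:
        raise ValueError("head count and TP size must be positive")
    if tp_size <= total_kv_heads:
        if total_kv_heads % tp_size:
            raise ValueError("KV heads must divide evenly across TP ranks")
        return total_kv_heads // tp_size
    if tp_size % total_kv_heads:
        raise ValueError("TP size must be a multiple of the KV head count")
    return 1  # each rank holds a replica of a single KV head


def expected_remote_block_len(
    local_block_len: int, total_kv_heads: int, local_tp: int, remote_tp: int
) -> int:
    """Block length a remote rank should report, scaled by the KV head ratio."""
    local_heads = kv_heads_per_rank(total_kv_heads, local_tp)
    remote_heads = kv_heads_per_rank(total_kv_heads, remote_tp)
    scaled = local_block_len * remote_heads
    if scaled % local_heads:
        raise ValueError("block length is not divisible by the local head count")
    return scaled // local_heads
```

With 4 KV heads, a local TP of 8 and a remote TP of 2, the head ratio is 2 (one head per local rank, two per remote rank), whereas the raw TP ratio is 4. A check based on the TP ratio would expect the wrong block length in that case.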

Documents the work in perf-changelog.yaml (PR #1863).

Reviewed by Cursor Bugbot for commit b2e71c8. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d08cc43.

Oseltamivir added 21 commits June 20, 2026 05:25

[NV] Add MiniMax M3 B300 Dynamo vLLM recipes

b506cd4

chore: update MiniMax M3 B300 container

84a023a

chore: update changelog PR link

b09bc78

Update perf-changelog.yaml

86da150

Update perf-changelog.yaml

f5727c2

fix(vllm): patch MiniMax M3 MSA contiguity

3b6dad4

fix(recipes): align MiniMax M3 parallel settings

71ba2ea

fix(vllm): backport MiniMax M3 eval fixes

b859a0b

ci(sweep): enable full MiniMax M3 validation

2d408e4

perf(vllm): right-size MiniMax M3 low concurrency

3956aee

Merge remote-tracking branch 'origin/main' into pr-1787-latest

33fe6a9

# Conflicts: # perf-changelog.yaml

Merge branch 'main' into pr-1787-latest

77c6391

perf(vllm): colocate MiniMax M3 TP4 workers

b99d3c9

fix(runner): exclude faulty B300 RDMA node

d2347aa

fix(runner): verify B300 node exclusion

8ace2e9

fix(runner): check generated B300 sbatch script

884ff12

ci(sweep): validate B300 node exclusion

3ae240b

Merge remote-tracking branch 'origin/main' into pr-1787-latest

9751d93

# Conflicts: # .github/configs/nvidia-master.yaml

refactor(vllm): trim MiniMax M3 runtime patches

03d27e7

Merge branch 'main' into pr-1787-latest

826a64e

Merge branch 'main' into pr-1787-latest

aec850f

biswapanda requested review from jgangani and kedarpotdar-nv as code owners June 22, 2026 20:40

github-project-automation Bot added this to InferenceMAX Board Jun 22, 2026

cursor Bot reviewed Jun 22, 2026

Comment thread benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/8k1k/2p1d-dep2-tep4-8k1k.yaml

Comment thread benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/1k1k/1p1d-dep2-tep8-1k1k.yaml Outdated

biswapanda force-pushed the pr-1787-latest--update branch from e981e26 to ce6b59d Compare June 22, 2026 20:44

Update MiniMax M3 B300 Dynamo vLLM recipes

37d5e2c

biswapanda force-pushed the pr-1787-latest--update branch from ce6b59d to 37d5e2c Compare June 22, 2026 20:45

cursor Bot reviewed Jun 22, 2026

Comment thread benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/8k1k/2p1d-dep2-tep4-8k1k.yaml Outdated

Comment thread benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/8k1k/4p2d-dep2-tep4-8k1k.yaml

biswapanda changed the title ~~update B300 disagg recipes~~ [NV] update B300 disagg recipes Jun 22, 2026

fix

adbe614

cursor Bot reviewed Jun 22, 2026

Comment thread .github/configs/nvidia-master.yaml

Comment thread .github/configs/nvidia-master.yaml Outdated

Comment thread .github/configs/nvidia-master.yaml

jasonlizhengjian added the full-sweep-enabled label Jun 22, 2026

biswapanda added 2 commits June 22, 2026 17:11

update to flashinfer

fe0eda5

prune non-pareto

0a751a7

biswapanda changed the base branch from pr-1787-latest to main June 23, 2026 01:00

biswapanda requested a review from Ankur-singh as a code owner June 23, 2026 01:00

biswapanda requested a review from a team June 23, 2026 01:00

Merge branch 'main' into pr-1787-latest--update

d08cc43

cursor Bot reviewed Jun 23, 2026

Comment thread .github/configs/nvidia-master.yaml Outdated

jasonlizhengjian mentioned this pull request Jun 23, 2026

[NV] update B300 disagg recipes (same-repo sweep copy) #1891

Open

clean up nvidia-master

b2e71c8

biswapanda wants to merge 27 commits into SemiAnalysisAI:main from biswapanda:pr-1787-latest--update

4 participants