[NV] Add MiniMax M3 B300 Dynamo vLLM recipes with performance image by Oseltamivir · Pull Request #1890 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-23T02:46:53Z

Summary

Reproduce the MiniMax-M3 MXFP8 B300 Dynamo-vLLM configuration from PR #1863 ([NV] Add MiniMax M3 B300 Dynamo vLLM recipes), including its 16 srt-slurm recipes, runtime fixes, and B300 launcher integration, on current main.
Keep the topology, concurrency, parallelism, CUDA graph, KV-transfer, colocation, and node-exclusion settings unchanged.
Use vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-7a67223 in the master config and every recipe.
Retain the MSA top-k contiguity runtime fix, but do not reapply the NIXL heterogeneous-TP patch because vLLM commit 7a67223 already includes vLLM #45879.
The Docker Hub manifest for this image is active and targets Linux amd64.

Validation

Generated all 16 matching B300 sweep entries with the requested image.
Confirmed all recipe containers match the master configuration.
Tested the MSA setup script twice against vLLM 7a67223 source to verify patching and idempotence.
bash -n runners/launch_b300-nv.sh
bash -n benchmarks/multi_node/srt-slurm-recipes/configs/minimax-m3-vllm-fixes.sh
python3 utils/validate_perf_changelog.py --base-ref origin/main --head-ref HEAD
uv run --with pytest --with pydantic --with pyyaml python -m pytest utils/matrix_logic/ utils/changelog_gate_tests/test_validate_perf_changelog.py -q (200 passed)
git diff --check origin/main...HEAD
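
The container consistency check listed above could be scripted roughly as follows. This is a minimal sketch, not the check the author ran: `recipes_missing_image` is a hypothetical helper, and it assumes each recipe YAML contains the image reference as a literal string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every recipe YAML under a directory that does
# NOT mention the expected container image. Empty output means all match.
recipes_missing_image() {
    local dir="$1" image="$2"
    # -r recurse, -L list files without a match, -F treat image as a literal.
    # The exit status of `grep -L` differs between grep versions, so only
    # the printed file list is used.
    grep -rLF --include='*.yaml' -- "$image" "$dir" || true
}
```

For example, `recipes_missing_image benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3 vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-7a67223` should print nothing if every recipe is pinned to the requested image.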

Note

Low Risk
Benchmark and launcher configuration only; no production service code. Main operational risk is Slurm job misconfiguration or failed setup-script patching on cluster runs.

Overview
Adds minimaxm3-fp8-b300-dynamo-vllm to the NVIDIA master matrix with multinode disaggregated fixed-seq-len sweeps for 1k/1k and 8k/1k, wired to 16 local srt-slurm recipe YAMLs (DEP2 prefill, TEP8 / DEP8 / DEP4 / TP4+Marlin decode topologies).

launch_b300-nv.sh gains a minimaxm3 dynamo-vLLM path: overlay recipes/vllm/minimax-m3, pin sa-submission-q2-2026, apply the NVIDIA/srt-slurm#38 node-IP fix, run minimax-m3-vllm-fixes.sh via srtctl --setup-script, and inject Slurm exclude: b300-018 (overridable via env) with a post-submit sanity check on the rendered sbatch script.
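
The overridable exclude and the post-submit sanity check described above might look roughly like this. It is a sketch under stated assumptions: the variable name `B300_EXCLUDE_NODES` and both helper functions are hypothetical, and the rendered script is assumed to express the exclusion as an `#SBATCH --exclude` (or `-x`) directive.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout: B300_EXCLUDE_NODES and both helpers are
# illustrative and are not taken from launch_b300-nv.sh.

# Resolve the node list to exclude, letting the environment override the default.
resolve_exclude_nodes() {
    printf '%s\n' "${B300_EXCLUDE_NODES:-b300-018}"
}

# Succeed only if the rendered sbatch script carries an exclude directive
# that names the given node as a whole word.
sbatch_excludes_node() {
    local script="$1" node="$2"
    grep -E '^#SBATCH[[:space:]]+(--exclude|-x)' "$script" | grep -qwF -- "$node"
}
```

A launcher could call `sbatch_excludes_node "$rendered_script" "$(resolve_exclude_nodes)" || exit 1` after submission, so a silently dropped exclusion fails loudly instead of landing a job on the excluded node.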

The runtime fix script is narrowed to a single idempotent MSA prefill_topk.contiguous() patch; obsolete NIXL string patches are dropped for image vllm-minimax-m3-perf-x86_64-13.0.1-7a67223 (upstream #45879). KLAUD_DEBUG.md documents that failure mode. perf-changelog.yaml records the new config key and operational notes (colocated TP4 + CUDA IPC, Marlin decode variants).
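
An idempotent source patch of the kind described above can be sketched as a guarded literal substitution. This is not the contents of minimax-m3-vllm-fixes.sh: `patch_once` is a hypothetical helper, and the old and new strings passed to it are whatever the caller supplies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a literal string in a file exactly once.
# Running it again after a successful patch is a no-op.
patch_once() {
    local file="$1" old="$2" new="$3"
    [ -n "$old" ] || return 2
    # Check for the patched form first, since `new` may contain `old`.
    if grep -qF -- "$new" "$file"; then
        echo "already-patched"
        return 0
    fi
    if ! grep -qF -- "$old" "$file"; then
        echo "anchor-missing" >&2
        return 1
    fi
    # Literal (non-regex) replacement via awk index/substr, so characters
    # such as '.', '(' and ')' in the strings need no escaping.
    OLD="$old" NEW="$new" awk '{
        out = ""; line = $0
        while ((i = index(line, ENVIRON["OLD"])) > 0) {
            out = out substr(line, 1, i - 1) ENVIRON["NEW"]
            line = substr(line, i + length(ENVIRON["OLD"]))
        }
        print out line
    }' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    echo "patched"
}
```

Checking for the patched form before the anchor is what makes a second run safe when the replacement text contains the original text, as it would when appending `.contiguous()` to an existing expression.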

Reviewed by Cursor Bugbot for commit 7903b1f. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-23T02:47:00Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 03ce791.

cursor · 2026-06-23T02:49:23Z

+    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3" recipes/vllm/minimax-m3
+    SRTCTL_SETUP_SCRIPT="minimax-m3-vllm-fixes.sh"
+    # NVIDIA/srt-slurm#38
+    git show 22d46ba9971615016d2339c9ffbc7b4597accfad --format= -- src/srtctl/core/ip_utils/get_node_ip.sh | git apply - || exit 1


Hard fail on git apply

Medium Severity

The MiniMax M3 clone path pipes the get_node_ip.sh backport through git apply and exits the whole launcher on any apply failure. Once sa-submission-q2-2026 already contains that change (or the file diverges), apply fails and no srtctl job is submitted even though the fix is already present.
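
One way to address this, sketched under assumptions rather than taken from the PR, is to test whether the patch is already present before applying it. `apply_patch_once` is a hypothetical helper, and it assumes the backport has first been written to a file instead of being piped straight into `git apply`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a patch only when it is not already present.
# Must be run from the root of the working tree the patch targets.
apply_patch_once() {
    local patch_file="$1"
    # If the patch reverses cleanly, its changes are already in the tree.
    if git apply --reverse --check "$patch_file" 2>/dev/null; then
        echo "already-applied"
        return 0
    fi
    # Otherwise apply it only if it applies cleanly.
    if git apply --check "$patch_file" 2>/dev/null; then
        git apply "$patch_file" || return 1
        echo "applied"
        return 0
    fi
    # Neither direction applies: the file has diverged, so fail loudly.
    echo "conflict" >&2
    return 1
}
```

With this shape the launcher would still exit on a genuine conflict, but a branch that already contains the get_node_ip.sh change would proceed to job submission.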


github-actions · 2026-06-23T02:58:07Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27998660868
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27998660868

github-actions · 2026-06-23T05:34:22Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27998992389
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27998992389

Oseltamivir requested a review from a team June 23, 2026 02:46

Oseltamivir requested review from Ankur-singh, jgangani and kedarpotdar-nv as code owners June 23, 2026 02:46

github-project-automation Bot added this to InferenceMAX Board Jun 23, 2026

feat: add MiniMax M3 B300 Dynamo vLLM sweep

03ce791

Oseltamivir force-pushed the update/minimax-m3-b300-perf-image branch from 832e0d8 to 03ce791 Compare June 23, 2026 02:47

Oseltamivir added the full-sweep-enabled label Jun 23, 2026

cursor Bot reviewed Jun 23, 2026

View reviewed changes

fix(vllm): skip upstream MiniMax M3 NIXL patch

7903b1f

Oseltamivir wants to merge 2 commits into main from update/minimax-m3-b300-perf-image
