[NV] update B300 disagg recipes #1887

Open: biswapanda wants to merge 27 commits into SemiAnalysisAI:main from biswapanda:pr-1787-latest--update

biswapanda commented on Jun 22, 2026

Note

Medium Risk
This is a large, benchmark-only change, but the launcher mutates a cloned srt-slurm checkout and patches the installed vLLM at runtime, so failures would affect CI and cluster jobs rather than application users.

Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM fixed-sequence benchmarks on B300, including a new minimaxm3-fp8-b300-dynamo-vllm entry in nvidia-master.yaml with 1k/1k and 8k/1k search spaces (prefill DEP2, varied decode topologies and concurrencies).

Introduces local srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ and wires launch_b300-nv.sh to overlay them on sa-submission-q2-2026, set the model paths, run minimax-m3-vllm-fixes.sh through srtctl --setup-script, apply the srt-slurm node-IP patch, optionally exclude node b300-018, and verify that the generated sbatch script contains the #SBATCH --exclude directive.
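The final launcher step (confirming that the generated script really excludes b300-018) can be sketched as below. This is a minimal Python sketch, not the PR's shell code: the function names excluded_nodes and verify_exclude are hypothetical, only the long --exclude form is handled, and the exclude list is assumed to be plain comma-separated node names rather than Slurm hostlist ranges such as b300-[018-020].

```python
import re

# Matches "#SBATCH --exclude=..." and "#SBATCH --exclude ..." directives.
# Assumption: plain comma-separated node names; the short "-x" form and
# hostlist ranges like b300-[018-020] are not handled in this sketch.
_EXCLUDE_RE = re.compile(r"^#SBATCH\s+--exclude(?:=|\s+)(\S+)")


def excluded_nodes(script_text: str) -> set[str]:
    """Collect node names from every #SBATCH --exclude directive in a batch script."""
    nodes: set[str] = set()
    for line in script_text.splitlines():
        stripped = line.strip()
        # sbatch stops reading #SBATCH directives at the first non-comment,
        # non-blank line, so later "#SBATCH" text is ignored here as well.
        if stripped and not stripped.startswith("#"):
            break
        match = _EXCLUDE_RE.match(line)
        if match:
            nodes.update(name for name in match.group(1).split(",") if name)
    return nodes


def verify_exclude(script_text: str, node: str) -> None:
    """Raise if the generated script does not exclude `node` (hypothetical helper)."""
    if node not in excluded_nodes(script_text):
        raise RuntimeError(f"generated sbatch script does not exclude {node}")


if __name__ == "__main__":
    generated = "#!/bin/bash\n#SBATCH --nodes=4\n#SBATCH --exclude=b300-018\nsrun hostname\n"
    verify_exclude(generated, "b300-018")
    print(sorted(excluded_nodes(generated)))  # prints ['b300-018']
```

Parsing the directives, rather than grepping for the node name anywhere in the file, avoids a false pass when the name only appears in a comment or after the first command, where sbatch no longer honors it.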

The setup script patches the installed vLLM at job time in two places: a contiguous MSA prefill top-k for CSR, and NIXL KV block-length checks that use GQA head ratios so that prefill and decode can run with different TP sizes (heterogeneous TP). The recipes also add TP4 + Marlin decode options, colocated 6-GPU pairs that use CUDA IPC for NIXL on selected shapes, and 8k/1k fp8 KV-cache and attention settings aligned with the 1k/1k recipes.
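To illustrate why a KV block-length check needs the GQA head ratio under heterogeneous TP, here is a minimal Python sketch. It is not the vLLM NIXL connector's code or the PR's patch; the KVLayout class, block_len_compatible, and all numbers (8 KV heads, head_dim 128, 16-token blocks, 1-byte fp8 KV, prefill TP2 vs decode TP4) are hypothetical, and KV heads are assumed to divide evenly across TP ranks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVLayout:
    """Hypothetical per-instance KV-cache layout; all values are illustrative."""
    num_kv_heads: int  # total KV heads in the model (GQA groups)
    head_dim: int
    block_size: int    # tokens per KV block
    dtype_bytes: int   # 1 for an fp8 KV cache
    tp_size: int

    def kv_heads_per_rank(self) -> int:
        # KV heads are split across TP ranks; assumed behavior: when tp_size
        # exceeds the head count, each rank keeps a replica of one head.
        return max(1, self.num_kv_heads // self.tp_size)

    def block_len(self) -> int:
        # Bytes in one K (or V) block for one layer on one rank.
        return self.block_size * self.kv_heads_per_rank() * self.head_dim * self.dtype_bytes


def block_len_compatible(prefill: KVLayout, decode: KVLayout) -> bool:
    """Require block lengths to scale with the KV-head ratio instead of being equal."""
    return (prefill.block_len() * decode.kv_heads_per_rank()
            == decode.block_len() * prefill.kv_heads_per_rank())


prefill = KVLayout(num_kv_heads=8, head_dim=128, block_size=16, dtype_bytes=1, tp_size=2)
decode = KVLayout(num_kv_heads=8, head_dim=128, block_size=16, dtype_bytes=1, tp_size=4)

if __name__ == "__main__":
    print(prefill.block_len(), decode.block_len())    # prints 8192 4096
    print(prefill.block_len() == decode.block_len())  # prints False (naive equality check)
    print(block_len_compatible(prefill, decode))      # prints True (ratio-aware check)
```

In this sketch a strict equality check rejects a TP2 prefill / TP4 decode pair even though each head's block geometry matches; the ratio-aware check still rejects genuine mismatches such as different block sizes.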

Documents the work in perf-changelog.yaml (PR #1863).

Reviewed by Cursor Bugbot for commit b2e71c8. Bugbot is set up for automated code reviews on this repo.

biswapanda force-pushed the pr-1787-latest--update branch from e981e26 to ce6b59d on June 22, 2026 at 20:44
biswapanda force-pushed the pr-1787-latest--update branch from ce6b59d to 37d5e2c on June 22, 2026 at 20:45
biswapanda changed the title from "update B300 disagg recipes" to "[NV] update B300 disagg recipes" on Jun 22, 2026
Review comment thread on .github/configs/nvidia-master.yaml
Review comment thread on .github/configs/nvidia-master.yaml (outdated)
Review comment thread on .github/configs/nvidia-master.yaml
biswapanda changed the base branch from pr-1787-latest to main on June 23, 2026 at 01:00
biswapanda requested a review from Ankur-singh as a code owner on June 23, 2026 at 01:00
biswapanda requested a review from a team on June 23, 2026 at 01:00

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d08cc43.

Review comment thread on .github/configs/nvidia-master.yaml (outdated)