[AMD] Add DSv4-FP4-MI355X ATOMMESH MTP by seungrokj · Pull Request #1855 · SemiAnalysisAI/InferenceX

seungrokj · 2026-06-19T08:04:23Z

Summary

- Add dsv4-fp4-mi355x-atom-disagg-mtp recipe to amd-master.yaml: multi-node disaggregated prefill+decode on MI355X via ATOM with MTP speculative decoding (2P1D DPA+TBO+MTP1 at ISL8192, 1P1D TP8+MTP3, 1P1D DPA+MTP1)
- Improve server_atom.sh config print block: replace individual echo lines with a cat <<INFO heredoc that shows EP/DP flags, KV cache settings (dtype, block size, mem fraction), xP/yD topology, and all parallel/spec/opt args
- Minor fix in dsv4_fp4_mi355x_atom-disagg.sh
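A minimal sketch of the heredoc-style summary described above, assuming the settings live in plain shell variables. MEM_FRAC_STATIC, DECODE_NODES and DECODE_MTP_SIZE appear elsewhere in this PR. The other names (PREFILL_NODES, TP_SIZE, ENABLE_EP, ENABLE_DP_ATTN, KV_CACHE_DTYPE, KV_BLOCK_SIZE) and all the defaults are hypothetical, not the actual server_atom.sh contents.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print one config summary block instead of many echo lines.
# Variable names and defaults are assumptions, not taken from server_atom.sh.
print_config() {
  # Unquoted delimiter (INFO, not 'INFO') so ${...} expansions run inside the heredoc.
  cat <<INFO
==== ATOM server config ====
Topology        : ${PREFILL_NODES:-1}P${DECODE_NODES:-1}D
TP size         : ${TP_SIZE:-8}
EP enabled      : ${ENABLE_EP:-false}
DP attention    : ${ENABLE_DP_ATTN:-false}
KV cache dtype  : ${KV_CACHE_DTYPE:-auto}
KV block size   : ${KV_BLOCK_SIZE:-16}
Mem frac static : ${MEM_FRAC_STATIC:-0.9}
MTP size        : ${DECODE_MTP_SIZE:-0}
============================
INFO
}

PREFILL_NODES=2 DECODE_NODES=1 print_config
```

A single heredoc keeps the layout in one place, so adding a field is a one-line change rather than another echo.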

Test plan

- Verify dsv4-fp4-mi355x-atom-disagg-mtp sweep launches correctly on MI355X
- Confirm config print output is correct on a live run

🤖 Generated with Claude Code

Note

Medium Risk
Touches multi-node ATOM server command construction and Slurm/Docker env plumbing; mistakes could break disagg+MTP sweeps or change serving flags, but the scope is limited to benchmark infrastructure.

Overview
Adds dsv4-fp4-mi355x-atom-disagg-mtp to amd-master.yaml and documents it in perf-changelog.yaml: multi-node disaggregated prefill/decode on MI355X with ATOM + MTP, covering 2P1D DPA/TBO sweeps at ISL 8192 and 1P1D TP8 / DPA+MTP sweeps at 1k and 8k sequence lengths, using the rocm/atom-dev:nightly_202606181332 image.

The ATOM disagg launch path is extended so that recipe knobs reach the cluster. dsv4_fp4_mi355x_atom-disagg.sh exports SPEC_DECODING and DECODE_MTP_SIZE. job.slurm passes BENCH_REQUEST_RATE into the containers and renames the static GPU memory env var to MEM_FRAC_STATIC.
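SPEC_DECODING and DECODE_MTP_SIZE must be exported because of standard shell behavior: only exported variables are copied into the environment of child processes. That assumes the submit.sh → job.slurm → server_atom.sh chain propagates the submitting environment. The demonstration below uses bash -c as a stand-in for the child script.

```bash
#!/usr/bin/env bash
# Demonstrates export vs. plain assignment; `bash -c` stands in for a child script.
unset SPEC_DECODING DECODE_MTP_SIZE

SPEC_DECODING=mtp          # plain shell variable: not passed to children
export DECODE_MTP_SIZE=3   # exported: inherited by every child process

child_view() {
  bash -c 'echo "SPEC_DECODING=${SPEC_DECODING:-<unset>} DECODE_MTP_SIZE=${DECODE_MTP_SIZE:-<unset>}"'
}

child_view    # prints: SPEC_DECODING=<unset> DECODE_MTP_SIZE=3
export SPEC_DECODING
child_view    # prints: SPEC_DECODING=mtp DECODE_MTP_SIZE=3
```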

server_atom.sh makes the following changes:
- Wires up MTP (--method mtp --num-speculative-tokens).
- Enables TBO on DP-attention prefill, with the related GPU_MAX_HW_QUEUES / ATOM_CPU_AFFINITY settings.
- Applies --hf-overrides for the index cache on DeepSeek-V4-Pro only.
- Sets IS_MTP for benchmarks.
- Replaces scattered echo logging with a single config summary block.

Prefill and decode server commands now pass parallel/spec/HF flags through eval as expanded argument strings.
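A minimal sketch of deriving the MTP flags from the recipe knobs. It assumes SPEC_DECODING and DECODE_MTP_SIZE arrive as environment variables and that a size of 0 means MTP is off. The array name SPEC_ARGS appears in this PR's commits; the function itself is hypothetical, not the actual server_atom.sh code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive speculative-decoding flags and IS_MTP from recipe env vars.
build_spec_args() {
  SPEC_ARGS=()
  if [[ "${SPEC_DECODING:-}" == "mtp" && "${DECODE_MTP_SIZE:-0}" -gt 0 ]]; then
    # Flags as named in the PR summary: --method mtp --num-speculative-tokens N
    SPEC_ARGS=(--method mtp --num-speculative-tokens "${DECODE_MTP_SIZE}")
    export IS_MTP=true   # read by the benchmark script
  else
    unset IS_MTP
  fi
}

SPEC_DECODING=mtp DECODE_MTP_SIZE=3
build_spec_args
echo "SPEC_ARGS: ${SPEC_ARGS[*]}"   # SPEC_ARGS: --method mtp --num-speculative-tokens 3
```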

bench.sh uses --dsv4 (instead of --use-chat-template) when MTP benchmarking DeepSeek-V4-Pro.
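Within the MTP path, this selection reduces to a check on the model name. The sketch below assumes the model is identified by its Hugging Face name (deepseek-ai/DeepSeek-V4-Pro, as in the recipe) and that other models keep --use-chat-template on MTP runs. The function name is hypothetical, and bench.sh may structure this differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose the prompt-format flag for an MTP benchmark run.
# Assumption: this is only consulted when IS_MTP=true.
mtp_prompt_flag() {
  local model="$1"
  if [[ "$model" == *DeepSeek-V4-Pro* ]]; then
    echo "--dsv4"
  else
    echo "--use-chat-template"
  fi
}

mtp_prompt_flag deepseek-ai/DeepSeek-V4-Pro   # --dsv4
```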

Reviewed by Cursor Bugbot for commit 8b4a94c. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-19T08:04:32Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review from, and get approval from, the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


…disagg
- Export DECODE_MTP_SIZE and SPEC_DECODING in dsv4_fp4_mi355x_atom-disagg.sh so they reach server_atom.sh via submit.sh → job.slurm
- Add DECODE_MTP_SIZE to check_env_vars in dsv4_fp4_mi355x_atom-disagg.sh
- Pass BENCH_REQUEST_RATE into Docker container in job.slurm DOCKER_ENV_COMMON

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
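Forwarding a variable such as BENCH_REQUEST_RATE into the container amounts to adding another -e NAME=value pair to the docker run arguments. The sketch below builds such a list from variable names. DOCKER_ENV_COMMON is named in the commit above; the helper and the way job.slurm actually assembles the list are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn variable names into `docker run -e NAME=value` arguments,
# skipping names that are unset so the container falls back to its own defaults.
build_docker_env() {
  DOCKER_ENV_COMMON=()
  local name
  for name in "$@"; do
    # ${!name+set} expands to "set" only if the variable named by $name is set.
    if [[ -n "${!name+set}" ]]; then
      DOCKER_ENV_COMMON+=(-e "${name}=${!name}")
    fi
  done
}

SPEC_DECODING=mtp DECODE_MTP_SIZE=3 BENCH_REQUEST_RATE=4
build_docker_env SPEC_DECODING DECODE_MTP_SIZE BENCH_REQUEST_RATE
# Usage: docker run "${DOCKER_ENV_COMMON[@]}" <image> ...
printf '%s\n' "${DOCKER_ENV_COMMON[@]}"
```

Keeping the flags in an array, rather than a string, means values containing spaces survive as single arguments.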

OPT_ARGS array expansion inside eval'd string caused bash word-splitting, breaking the --hf-overrides JSON argument. Inline the flag directly in all three server commands and remove the now-unused OPT_ARGS definition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
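The failure mode in this commit can be reproduced in a few lines. Once an array is flattened into a string and re-parsed by eval, element boundaries are lost, and any element containing a space splits apart. The sketch uses a hypothetical OPT_ARGS payload and a hypothetical argc helper. It also shows, for comparison only, keeping the command as an array and running it without eval; that is not what the PR does (it inlines the flag instead).

```bash
#!/usr/bin/env bash
# argc: hypothetical stand-in for the server binary; reports how many args it got.
argc() { echo "$#"; }

# Hypothetical two-element array: a flag plus a JSON value containing a space.
OPT_ARGS=(--hf-overrides '{"some_key": 1}')

# Flattened into a string and eval'd: the JSON element is re-split into two words.
cmd_str="argc ${OPT_ARGS[*]}"
eval "$cmd_str"      # prints 3

# Kept as an array and run directly: each element stays exactly one argument.
cmd=(argc "${OPT_ARGS[@]}")
"${cmd[@]}"          # prints 2
```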

"${ARRAY[@]}" inside a double-quoted assignment breaks bash -n's quote parser. Since all three CMD strings are passed to eval, ${ARRAY[*]} is equivalent — eval handles word splitting. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

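The equivalence claimed in this commit holds only while no array element contains whitespace or shell metacharacters. "${ARRAY[*]}" joins elements with single spaces, and eval then splits them back at those spaces. The check below uses hypothetical MTP flags as the array contents.

```bash
#!/usr/bin/env bash
# print_argv: show each received argument wrapped in <...> to expose word boundaries.
print_argv() { printf '<%s>' "$@"; printf '\n'; }

SPEC_ARGS=(--method mtp --num-speculative-tokens 3)   # no element contains spaces

direct=$(print_argv "${SPEC_ARGS[@]}")                # array passed directly
CMD="print_argv ${SPEC_ARGS[*]}"                      # flattened, as in the CMD strings
via_eval=$(eval "$CMD")

echo "$direct"       # <--method><mtp><--num-speculative-tokens><3>
[[ "$direct" == "$via_eval" ]] && echo "same argv"
```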
github-actions · 2026-06-19T14:19:36Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27822678019
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27822678019

github-actions · 2026-06-19T15:14:06Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27833371284
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27833371284

functionstackx · 2026-06-19T17:16:28Z

requeueing this workflow to make room for m3 disagg

functionstackx

requeueing this workflow

functionstackx · 2026-06-19T17:17:21Z

+  image: rocm/atom-dev:nightly_202606181332
+  model: deepseek-ai/DeepSeek-V4-Pro
+  model-prefix: dsv4
+  runner: mi355x


plz use runner mi355x-disagg

github-actions · 2026-06-19T17:17:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27833825585
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27833825585

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-06-22T13:27:04Z

+          - "DECODE_NODES=1"
+          - "DECODE_MTP_SIZE=3"
+      # 1P1D TP8+DPA+TBO+MTP1
+    - isl: 1024


Missing ISL8192 DPA MTP sweep

Medium Severity

The 1P1D TP8+DPA+TBO+MTP1 comment at ISL8192 is not followed by a search-space entry, so that configuration never runs. The ISL8192 block ends after 1P1D TP8+MTP3, unlike the ISL1024 block where the matching DPA+MTP1 sweep is defined.

cursor · 2026-06-22T13:27:04Z

+  image: rocm/atom-dev:nightly_202606181332
+  model: deepseek-ai/DeepSeek-V4-Pro
+  model-prefix: dsv4
+  runner: mi355x


Wrong CI runner for disagg

Medium Severity

The new multinode disaggregated recipe sets runner: mi355x, but disagg benchmarks on MI355X are scheduled on the mi355x-disagg pool per runners.yaml and other disagg recipes; PR review also requested mi355x-disagg.
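The sketch below is a lint-style guard for this class of mistake. It assumes the rule stated in the review (disaggregated MI355X recipes run on the mi355x-disagg pool) and that a recipe's name contains "disagg" when it is disaggregated. The function is hypothetical and not part of the repo's tooling.

```bash
#!/usr/bin/env bash
# Hypothetical guard: disagg recipes on MI355X must target the mi355x-disagg runner pool.
check_disagg_runner() {
  local recipe="$1" runner="$2"
  if [[ "$recipe" == *mi355x*disagg* && "$runner" != "mi355x-disagg" ]]; then
    echo "ERROR: $recipe uses runner '$runner'; expected 'mi355x-disagg'" >&2
    return 1
  fi
  return 0
}

check_disagg_runner dsv4-fp4-mi355x-atom-disagg-mtp mi355x || echo "runner check failed"
```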

github-actions · 2026-06-22T13:27:58Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27956132438
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27956132438

[AMD] server_atom: improve config print and cleanup

ea33910

- Replace individual echo lines with cat <<INFO heredoc showing EP/DP flags, KV cache settings alongside TP/port info
- Minor cleanup in parallel args setup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team June 19, 2026 08:04

github-project-automation Bot added this to InferenceMAX Board Jun 19, 2026

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners June 19, 2026 08:04

update perf-changelog for dsv4-fp4-mi355x-atom-disagg-mtp

027f3f1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/dsv4_fp4_mi355x_atom-disagg.sh Outdated

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated

seungrokj changed the title from "[AMD] Add DSv4-FP4-MI355X atom-disagg MTP recipe and improve server_atom config print" to "[AMD] Add DSv4-FP4-MI355X atom-disagg MTP" Jun 19, 2026

seungrokj and others added 2 commits June 19, 2026 17:22

[AMD] server_atom: pass SPEC_ARGS to prefill server

cd745fa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/dsv4_fp4_mi355x_atom-disagg.sh Outdated

seungrokj and others added 2 commits June 19, 2026 17:27

[AMD] amd-master: fix comment for 1P1D TP8+DPA+TBO+MTP1 config

baf0e06

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] dsv4_atom-disagg: remove DECODE_MTP_SIZE from check_env_vars

1485744

DECODE_MTP_SIZE comes from additional-settings and has a default of 0, so it should not be required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
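This commit separates required variables from knobs that have a default, which can be sketched as follows. check_env_vars is the helper named in the commit, but this body is a hypothetical re-implementation. Applying the default of 0 with ${VAR:=0} is an assumption about how the script handles it, and MODEL is an illustrative variable name.

```bash
#!/usr/bin/env bash
# Hypothetical re-implementation: fail if any named variable is unset or empty.
check_env_vars() {
  local name missing=0
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "ERROR: required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Optional knob: fall back to 0 (MTP off) instead of making it required.
: "${DECODE_MTP_SIZE:=0}"

MODEL=deepseek-ai/DeepSeek-V4-Pro   # hypothetical required variable
check_env_vars MODEL && echo "required vars present; DECODE_MTP_SIZE=$DECODE_MTP_SIZE"
```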

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh

seungrokj and others added 2 commits June 19, 2026 17:37

[AMD] bench: use --dsv4 flag for DeepSeek-V4-Pro MTP benchmarks

4e039bc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] server_atom: export IS_MTP=true when SPEC_DECODING=mtp for bench.sh

0868467

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated

seungrokj and others added 9 commits June 19, 2026 18:44

[AMD] server_atom: fix hf-overrides JSON quoting

c7d48b0

Remove spaces from JSON value so it doesn't get word-split when expanded inside the eval'd command string. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
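Removing spaces avoids the split. However, if the JSON is spliced into the string raw, eval also performs quote removal, so its inner double quotes are stripped. A general way to make an arbitrary value survive eval intact is to quote it with printf %q before splicing it in. The sketch uses a hypothetical payload and hypothetical argc and last_arg helpers; the PR's actual fix was to remove the spaces.

```bash
#!/usr/bin/env bash
argc() { echo "$#"; }                    # how many arguments arrived
last_arg() { printf '%s\n' "${!#}"; }    # the final argument, verbatim

HF_JSON='{"some_key": 1}'                # hypothetical override payload

# Raw splice: eval splits at the space and strips the double quotes.
naive_cmd="argc --hf-overrides ${HF_JSON}"
eval "$naive_cmd"                        # prints 3

# printf %q escapes the value so eval re-parses it as exactly one word.
printf -v quoted_json '%q' "$HF_JSON"
eval "argc --hf-overrides ${quoted_json}"       # prints 2
eval "last_arg --hf-overrides ${quoted_json}"   # prints {"some_key": 1}
```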

update perf-changelog for minimaxm3-fp4-mi355x-atom

39e62eb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

update perf-changelog for dsv4-fp4-mi355x-atom-disagg-mtp

ba37d04

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into amd/atom_mesh_0619_mtp

eb7179f

refactor: extract --hf-overrides into HF_OVERRIDES_ARG variable

5106002

Define once near SPEC_ARGS and reference in all three server commands (prefill node 0, additional prefill nodes, decode nodes). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: enable --hf-overrides only for DeepSeek-V4-Pro

55c810d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
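Gating the override on the model, as in the two commits above, can be sketched as a small function. It leaves HF_OVERRIDES_ARG empty for other models, so the eval'd command strings can reference it unconditionally. HF_OVERRIDES_ARG is the variable name from commit 5106002; the JSON payload, the function, and the argc helper are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: only DeepSeek-V4-Pro gets --hf-overrides; other models get "".
OVERRIDE_JSON='{"some_key":1}'   # hypothetical payload, no spaces

set_hf_overrides_arg() {
  local model="$1"
  HF_OVERRIDES_ARG=""
  if [[ "$model" == *DeepSeek-V4-Pro* ]]; then
    # %q keeps the JSON as a single word when the command string is later eval'd.
    printf -v HF_OVERRIDES_ARG '%s %q' --hf-overrides "$OVERRIDE_JSON"
  fi
}

argc() { echo "$#"; }
set_hf_overrides_arg deepseek-ai/DeepSeek-V4-Pro
eval "argc serve ${HF_OVERRIDES_ARG}"   # prints 3: serve, --hf-overrides, JSON
set_hf_overrides_arg some-org/other-model
eval "argc serve ${HF_OVERRIDES_ARG}"   # prints 1: no override flags
```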

fix: add HF_OVERRIDES_ARG to INFO config print block

6386657

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated

fix: remove ${CUDAGRAPH_OPT} from decode CMD

97f0cab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj added AMD full-sweep-enabled labels Jun 19, 2026

seungrokj changed the title from "[AMD] Add DSv4-FP4-MI355X atom-disagg MTP" to "[AMD] Add DSv4-FP4-MI355X ATOMMESH MTP" Jun 19, 2026

feat: add 2P1D DPA+MTP3 search space to dsv4-fp4-mi355x-atom-disagg-mtp; add printenv dump and cudagraph-capture-sizes to server_atom.sh

f9a93c4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 19, 2026

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh

Merge branch 'main' into amd/atom_mesh_0619_mtp

4d4fe2b

functionstackx requested changes Jun 19, 2026

Merge branch 'main' into amd/atom_mesh_0619_mtp

8b4a94c

seungrokj removed the full-sweep-enabled label Jun 22, 2026

cursor Bot reviewed Jun 22, 2026

seungrokj mentioned this pull request Jun 23, 2026 in [AMD] Add MiniMax-M3-FP4 MI355X ATOMMESH #1856 (Open, 4 tasks)
