[AMD] add dsv4 sglang disagg#1818
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c22652b.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27731746053
# Conflicts: # perf-changelog.yaml
description:
  - "init submission of dsv4 sglang disagg "
can you also have your AI agent include descriptions + links of some of the bug fix PRs in here like
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27893589025
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27896968169
hi @billishyahao, there seems to be an accuracy issue with TP8+TP8. codex has narrowed it down to conc=4. Here is the bug report for when you wake up, please take a look: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27896968169/job/82550287079?pr=1818
functionstackx left a comment
can you remove https://github.com/SemiAnalysisAI/InferenceX/blob/main/benchmarks/multi_node/amd_utils/patches/mori_conn.py too?
It is no longer needed now that sgl-project/sglang#26525 is fixed by sgl-project/sglang#26539.
InferenceX/benchmarks/multi_node/amd_utils/job.slurm
Lines 73 to 80 in 6a07901

cc @Duyi-Wang
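Note
Medium Risk
Touches shared disagg launch paths (`server_sglang.sh`, `models.yaml`) for all models, not only DSv4; behavior changes when EP is disabled and MoE auto-sizing is partially commented out.

Overview
Adds `dsv4-fp4-mi355x-sglang-disagg` to the AMD master benchmark matrix (8k/1k, non-MTP) with sweeps over pure TP8, DEP8 (MoRI KV + MoE a2a), and dp-attention + TP-MoE, plus a new workflow runner `dsv4_fp4_mi355x_sglang-disagg.sh` and a perf-changelog entry.

The multi-node harness is extended for DSv4 PD: a `DeepSeek-V4-Pro` block in `models.yaml` (dsv4 attention backend, mori disagg, prefill `disable_cuda_graph`) and matching MoRI/kernel env overrides in `env.sh`; the bench client uses `--dsv4` framing instead of chat templates.

`server_sglang.sh`/`models.yaml` refactor the MoE CLI so `ep_flags` (mori a2a, deepep, fake dispatch) apply only when EP is on (`ep=1` stays TP-MoE even with dp-attention), and prefill can honor per-model `disable_cuda_graph`, `context_length`, and optional `MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK_*` overrides.

`submit.sh` threads `DRY_RUN` for previewing composed launch commands on a real allocation.

Reviewed by Cursor Bugbot for commit f56f8de.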
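The EP gating and `DRY_RUN` preview described in the overview could be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not the PR's shell code: the function names, the placeholder flag strings, and the idea of passing `DRY_RUN` as a boolean are all hypothetical, and the real flags live in `server_sglang.sh` and `models.yaml`.

```python
import shlex
import subprocess
from typing import List


def compose_moe_flags(
    ep_size: int,
    dp_attention: bool,
    ep_flags: List[str],
    dp_attention_flags: List[str],
) -> List[str]:
    """Return the extra CLI flags for one server launch.

    Hypothetical helper: ep_flags stands in for the per-model EP options
    (mori a2a, deepep, fake dispatch); the actual strings are not shown here.
    """
    flags: List[str] = []
    if dp_attention:
        # dp-attention is independent of EP, so its flags are always kept.
        flags.extend(dp_attention_flags)
    if ep_size > 1:
        # EP-specific flags apply only when expert parallelism is on.
        flags.extend(ep_flags)
    # With ep_size == 1 nothing EP-related is added: MoE stays TP-MoE.
    return flags


def preview_or_run(cmd: List[str], dry_run: bool) -> str:
    """Print the composed command, and execute it only when dry_run is False."""
    line = shlex.join(cmd)
    if dry_run:
        line = f"[DRY_RUN] {line}"
        print(line)
        return line
    print(line)
    subprocess.run(cmd, check=True)
    return line


# Placeholder flag strings, assumptions for illustration only.
EP_FLAGS = ["--EP-FLAG-PLACEHOLDER"]
DP_FLAGS = ["--DP-ATTENTION-FLAG-PLACEHOLDER"]

# ep=1 with dp-attention: EP flags are dropped, dp-attention flags stay.
tp_moe = compose_moe_flags(1, True, EP_FLAGS, DP_FLAGS)
# ep=8 with dp-attention: both sets of flags are present.
dep8 = compose_moe_flags(8, True, EP_FLAGS, DP_FLAGS)

preview = preview_or_run(["launch-server-placeholder", *tp_moe], dry_run=True)
```

Keeping the flag lists as inputs rather than hard-coding them mirrors the idea of per-model configuration: the gating logic decides only whether a group of flags applies, not what the flags are.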
DeepSeek-V4-Problock inmodels.yaml(dsv4 attention backend, mori disagg, prefilldisable_cuda_graph) and matching MoRI/kernel env overrides inenv.sh; the bench client uses--dsv4framing instead of chat templates.server_sglang.sh/models.yamlrefactor MoE CLI soep_flags(mori a2a, deepep, fake dispatch) apply only when EP is on—ep=1stays TP-MoE even with dp-attention—and prefill can honor per-modeldisable_cuda_graph,context_length, and optionalMORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK_*overrides.submit.shthreadsDRY_RUNfor previewing composed launch commands on a real allocation.Reviewed by Cursor Bugbot for commit f56f8de. Bugbot is set up for automated code reviews on this repo. Configure here.