[DNM][AMD] agentx-v0.4 rebased from commit chore/agentx-v0.4 commit 823189e by seungrokj · Pull Request #1709 · SemiAnalysisAI/InferenceX

seungrokj · 2026-06-11T06:08:14Z

Summary

Add qwen3.5-fp4-mi355x-sglang-agentic-hicache config: SGLang agentic-coding sweep with and without hicache offloading (TP2, EP1)
Add minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache config: vLLM agentic-coding sweep with lmcache
Add new agentic benchmark scripts: minimaxm2.5_fp4_mi355x.sh, qwen3.5_fp4_mi355x.sh
Update existing agentic scripts: glm5.1_fp4_mi355x.sh, kimik2.5_fp4_mi355x.sh, minimaxm2.5_fp8_mi355x.sh, qwen3.5_fp8_mi355x.sh
Update launch_mi355x-amds.sh

Test plan

Verify hicache/lmcache agentic configs run correctly on MI355X
Confirm new agentic scripts launch without errors

🤖 Generated with Claude Code

Note

Medium Risk
Changes are confined to benchmark CI and launch scripts but touch multi-node SLURM, container image pins, and KV offload paths where misconfiguration could waste cluster time or skew agentic results; no application auth or user-data handling.

Overview
AMD master sweep config (amd-master.yaml) bumps SGLang/vLLM/Atom images and retunes several fixed-seq search spaces (e.g. Qwen3.5 FP8 TP4 sweeps, FP4 MTP conc caps). Agentic workloads are split into -agentic sibling entries (often on older or nightly images with cpu/none offload grids) so existing main recipes stay aligned with origin/main. New or expanded coverage includes DSv4 (single-node image bump, new dsv4-fp4-mi355x-sglang-disagg 8k/1k PD topologies, vLLM agentic grids), HiCache agentic configs (Qwen3.5, DSv4 single- and disagg-node), and MiniMax-M3 vLLM agentic; Kimi/MiniMax int4–fp8 vLLM images move to v0.21.0 or ROCm nightlies for offload-capable agentic runs.

CI passes offloading from the matrix into multi-node agentic sweeps (run-sweep.yml). Default agentic trace loaders switch to semianalysis_cc_traces_weka_061526 (+ 256k variant) in benchmark_lib.sh.

Multi-node SGLang disagg (amd_utils) gains end-to-end agentic support: trace_replay.sh, IS_AGENTIC routing, OFFLOADING=hicache with Mooncake/L2 flags via a bind-mounted hicache_mc.env, EP vs DP flag decoupling in models.yaml, a DeepSeek-V4-Pro PD recipe block, DSv4 bench --dsv4 framing, router circuit-breaker disables for long prefills, and startup patches (prefill bootstrap desync, optional host KV assert). New entry scripts dsr1_fp4 / dsv4_fp4 disagg agentic wire YAML to SLURM submit.

Single-node agentic scripts are updated or added for DSv4 (SGLang/Atom), GLM5.1 (HiCache on MI355X, new MI300X FP8), and Kimi FP4 (simplified LMCache install path, trace override, vLLM FP8 KV).

^{Reviewed by Cursor Bugbot for commit 4f8123d. Bugbot is set up for automated code reviews on this repo. Configure here.}

…r mi355x models Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-06-11T06:08:23Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs usually fixes them. If you re-run failed jobs, you remain responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from their own company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor · 2026-06-11T06:09:45Z

+    $ASYNC_SCHEDULING_ARGS 
+    "${PREFIX_CACHE_ARGS[@]}"
+    "${OFFLOAD_ARGS[@]}"
+)


vLLM uses wrong model

High Severity

The vLLM command serves "$MODEL" and omits --served-model-name, while the script downloads weights into MODEL_PATH and build_replay_cmd sends --model $MODEL to aiperf. That breaks the usual MODEL_PATH + served-name pairing used by sibling agentic scripts and can fail when MODEL is a Hub id but weights live under MODEL_PATH.
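
One possible shape for the fix, as a minimal sketch: load from the pre-staged directory when `MODEL_PATH` is set, and keep `MODEL` as the served name so the name sent by aiperf still matches. The helper `resolve_serve_target` and the placeholder model id are hypothetical and do not come from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): pick the path the server should load.
# Prefers the pre-staged MODEL_PATH and falls back to the Hub id in MODEL.
resolve_serve_target() {
    local model="$1" model_path="${2:-}"
    if [ -n "$model_path" ]; then
        printf '%s\n' "$model_path"
    else
        printf '%s\n' "$model"
    fi
}

MODEL="${MODEL:-example-org/example-model}"   # placeholder Hub id (assumption)
MODEL_PATH="${MODEL_PATH:-}"
SERVE_TARGET="$(resolve_serve_target "$MODEL" "$MODEL_PATH")"

# Print the command instead of running it; the served name stays equal to MODEL.
echo "vllm serve $SERVE_TARGET --served-model-name $MODEL"
```

With this pairing the client can keep sending `--model $MODEL` regardless of where the weights were staged.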

^{Reviewed by Cursor Bugbot for commit 01cc2af. Configure here.}

cursor · 2026-06-11T06:09:45Z

    --mem-fraction-static 0.8 \
-    --context-length $MAX_MODEL_LEN \
+    "${CACHE_ARGS[@]}" \
+    "${WARMUP_ARGS[@]}" \


SGLang ignores MODEL_PATH

Medium Severity

SGLang is started with --model-path $MODEL and no --served-model-name, after the script may download into MODEL_PATH. Matrix jobs that set a local MODEL_PATH can still point the server at the Hub id, and the OpenAI model name may not match MODEL used by aiperf.

Additional Locations (1)

benchmarks/single_node/agentic/qwen3.5_fp4_mi355x.sh#L123-L141

^{Reviewed by Cursor Bugbot for commit 01cc2af. Configure here.}

cursor · 2026-06-11T06:09:45Z

+        cd LMCache
+        pip install -r requirements/build.txt 
+        CXX=hipcc BUILD_WITH_HIP=1 pip install -e .   --no-build-isolation
+        cd ..


LMCache clone not idempotent

Medium Severity

The lmcache path runs git clone https://github.com/LMCache/LMCache.git unconditionally. With set -e, a second run in the same working directory exits when LMCache already exists, so lmcache agentic jobs fail on retry or reuse of the job cwd.

Additional Locations (1)

benchmarks/single_node/agentic/minimaxm2.5_fp4_mi355x.sh#L149-L154
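
A guard along these lines would make the clone safe to re-run. This is only a sketch: `clone_once` is a hypothetical helper, and it assumes that reusing an existing checkout is acceptable rather than re-fetching it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): clone only when no checkout exists,
# so a retry in the same working directory does not abort under `set -e`.
clone_once() {
    local url="$1" dir="$2"
    if [ -d "$dir/.git" ]; then
        echo "reusing existing checkout: $dir"
    else
        git clone "$url" "$dir"
    fi
}

# Intended call site, left commented so the sketch has no network side effects:
# clone_once https://github.com/LMCache/LMCache.git LMCache
```

If a stale checkout is a concern, removing the directory before cloning is an alternative, at the cost of a fresh download on every retry.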

^{Reviewed by Cursor Bugbot for commit 01cc2af. Configure here.}

Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…onfig Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-06-12T06:15:23Z

+
+python3 -m sglang.launch_server \
+    --attention-backend aiter \
+    --model-path $MODEL \


Server ignores MODEL_PATH

Medium Severity

Weights are downloaded into MODEL_PATH when the workflow sets that directory, but SGLang is started with --model-path $MODEL (Hub id) instead of MODEL_PATH. The server may load a different cache path than the one prepared for the job.

^{Reviewed by Cursor Bugbot for commit 32f5007. Configure here.}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-06-12T08:06:30Z

+
+# ---- Resolve traces and install deps ----------------------------------------
+# https://huggingface.co/datasets/semianalysisai/cc-traces-weka-with-subagents-060826
+export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_with_subagents_060826


DSv4 atom uncapped traces

Medium Severity

This new DSv4 ATOM agentic script sets WEKA_LOADER_OVERRIDE to the uncapped 060826 trace set, while peer MI355X agentic scripts in the same PR use 060226_256k to avoid ~1M-token traces that are rejected and skew sweeps.

^{Reviewed by Cursor Bugbot for commit 351e729. Configure here.}

Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>

…nalysisAI/InferenceX into amd/agentx-v0.4_rebase0611

cursor · 2026-06-12T08:45:54Z

+    $ASYNC_SCHEDULING_ARGS 
+    "${PREFIX_CACHE_ARGS[@]}"
+    "${OFFLOAD_ARGS[@]}"
+)


MiniMax FP8 launcher regressed

High Severity

The MI355X MiniMax FP8 agentic launcher was replaced with a Kimi-style vLLM recipe. Existing minimaxm2.5-fp8-mi355x-vllm-agentic jobs (TP4/EP4, offloading=cpu) lose the prior --max-model-len, ROCM_AITER_UNIFIED_ATTN backend, MODEL_PATH-based serve, and SimpleCPU offload wiring they depended on.

^{Reviewed by Cursor Bugbot for commit faba18f. Configure here.}

Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>

… config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…cripts and master yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-06-15T00:22:06Z

-    --cuda-graph-max-bs "$PER_ENGINE_MAX_RUNNING" \
+    --disable-radix-cache \
+    --attention-backend dsv4 \
+    --max-running-requests ${CONC} \


DP max-running requests wrong

Medium Severity

When DP_ATTENTION=true, the script computes PER_ENGINE_MAX_RUNNING as CONC/TP for per-engine limits, but the server is started with --max-running-requests ${CONC}. Each DP engine may accept too many sequences versus the harness load-balancing assumption.
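
A sketch of how the limit could follow the DP setting. `per_engine_max_running` is a hypothetical helper, and the ceiling division is an assumption added so that a small `CONC` never yields a limit of 0; the script under review is described as using plain `CONC/TP`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): choose the --max-running-requests value.
# With DP attention each engine gets CONC/TP (rounded up); otherwise CONC.
per_engine_max_running() {
    local conc="$1" tp="$2" dp_attention="$3"
    if [ "$dp_attention" = "true" ]; then
        echo $(( (conc + tp - 1) / tp ))   # ceiling division (assumption)
    else
        echo "$conc"
    fi
}

# Example values only.
echo "--max-running-requests $(per_engine_max_running 64 8 true)"
```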

^{Reviewed by Cursor Bugbot for commit 76d90e0. Configure here.}

…ript Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-06-15T02:56:39Z

-python3 -m sglang.launch_server \
-    --model-path "$MODEL_PATH" --served-model-name "$MODEL" \
+sglang serve \
+    --model-path $MODEL \


Wrong model path for serve

Medium Severity

The script downloads weights into MODEL_PATH when set, but sglang serve uses --model-path $MODEL (Hub id) instead of "$MODEL_PATH". Runs that pre-stage a local directory can ignore the prepared path and rely on a different cache location.

^{Reviewed by Cursor Bugbot for commit 4ebc4e2. Configure here.}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…c script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-06-16T06:58:57Z

    decode_dp_ranks=$DECODE_TP_SIZE
    MORI_MAX_DISPATCH_TOKENS_DECODE=$((BENCH_MAX_CONC_VALUE / decode_dp_ranks))
-    MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10))
+    # MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10))


Disagg MoE token overrides removed

Medium Severity

This change comments out assignments that set MORI_MOE_MAX_INPUT_TOKENS_PREFILL and MORI_MOE_MAX_INPUT_TOKENS_DECODE for DP+EP and MTP decode paths, while launch commands still conditionally export those variables. Disagg sweeps that relied on the computed caps may run with unset MoE input limits.
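
For reference, the removed computation can be restated as a small function. This sketch only reproduces the arithmetic from the diff above for the decode path; `mori_decode_caps` and the example inputs are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): recompute the decode-side caps using
# the formulas shown in the diff. Prints "<dispatch_tokens> <moe_input_cap>".
mori_decode_caps() {
    local bench_max_conc="$1" decode_tp_size="$2"
    local decode_dp_ranks="$decode_tp_size"
    local dispatch=$(( bench_max_conc / decode_dp_ranks ))
    local moe_cap=$(( dispatch * decode_dp_ranks * 7 / 10 ))
    echo "$dispatch $moe_cap"
}

# Example inputs only: max concurrency 512, decode TP size 8.
mori_decode_caps 512 8
```

With these example inputs the dispatch limit is 512 / 8 = 64 and the MoE input cap is 64 * 8 * 7 / 10 = 358 under integer division.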

^{Reviewed by Cursor Bugbot for commit c21ad06. Configure here.}

cursor · 2026-06-16T13:58:08Z

    fi
    set +x
-    PREFILL_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} ${PREFILL_SDMA_ENV} ${PREFILL_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_PREFILL} python3 -m sglang.launch_server \
+    PREFILL_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} ${PREFILL_SDMA_ENV} ${PREFILL_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK_PREFILL:-${MORI_MAX_DISPATCH_TOKENS_PREFILL}} MORI_IO_SQ_BACKOFF_TIMEOUT_US=${MORI_IO_SQ_BACKOFF_TIMEOUT_US} MORI_IO_QP_MAX_SEND_WR=${MORI_IO_QP_MAX_SEND_WR} ${LAUNCH_PREFIX:-} python3 -m sglang.launch_server \


Server ignores resolved MODEL_PATH

Medium Severity

job.slurm now resolves and exports a canonical MODEL_PATH (caller path, hf_dir, or MODEL_DIR/MODEL_NAME), but server_sglang.sh still launches with --model-path $MODEL_DIR/$MODEL_NAME. When the resolved path differs from that join, prefill/decode can fail to load weights or load from the wrong directory.

Additional Locations (1)

benchmarks/multi_node/amd_utils/job.slurm#L209-L242

^{Reviewed by Cursor Bugbot for commit c7f269e. Configure here.}

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-06-17T00:23:37Z

    fi
    set +x
-    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \
+    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK_DECODE:-${MORI_MAX_DISPATCH_TOKENS_DECODE}} MORI_IO_SQ_BACKOFF_TIMEOUT_US=${MORI_IO_SQ_BACKOFF_TIMEOUT_US} MORI_IO_QP_MAX_SEND_WR=${MORI_IO_QP_MAX_SEND_WR} ${LAUNCH_PREFIX:-} python3 -m sglang.launch_server \


Custom all-reduce flag unused

Medium Severity

DISABLE_CUSTOM_ALL_REDUCE is threaded into the container from job.slurm, and the DSR1 disagg agentic recipe defaults it to 1 for an Aiter fault workaround, but prefill/decode launch commands never append --disable-custom-all-reduce.
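
A minimal sketch of wiring the variable to the flag. `custom_all_reduce_flag` is a hypothetical helper; how the result is spliced into the prefill and decode command strings is left to the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): translate DISABLE_CUSTOM_ALL_REDUCE
# into the server flag. Prints nothing unless the value is exactly 1.
custom_all_reduce_flag() {
    if [ "${1:-0}" = "1" ]; then
        printf '%s' "--disable-custom-all-reduce"
    fi
}

EXTRA_FLAGS="$(custom_all_reduce_flag "${DISABLE_CUSTOM_ALL_REDUCE:-0}")"
echo "extra server flags: ${EXTRA_FLAGS:-<none>}"
```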

^{Reviewed by Cursor Bugbot for commit b5626fb. Configure here.}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor · 2026-06-17T03:29:42Z

-  multinode: false
+  framework: sglang-disagg
+  multinode: true
+  disagg: true


Multinode agentic scripts not selected

High Severity

New disaggregated agentic YAML entries and benchmarks/multi_node/agentic/* wrappers are added, but launch_mi355x-amds.sh still invokes benchmarks/multi_node/${SCRIPT_NAME} and never agentic/ or IS_AGENTIC. Those jobs run the fixed-seq disagg script without trace replay, HiCache env, or DURATION/OFFLOADING wiring.

Additional Locations (1)

benchmarks/multi_node/agentic/dsr1_fp4_mi355x_sglang-disagg.sh#L1-L34
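
The launcher could branch on `IS_AGENTIC` when building the script path. A sketch under that assumption; `select_multinode_script` is a hypothetical helper and the launcher's real variable handling may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): pick the agentic wrapper when
# IS_AGENTIC=1, otherwise the fixed-seq multi-node script.
select_multinode_script() {
    local script_name="$1" is_agentic="${2:-0}"
    if [ "$is_agentic" = "1" ]; then
        echo "benchmarks/multi_node/agentic/${script_name}"
    else
        echo "benchmarks/multi_node/${script_name}"
    fi
}

select_multinode_script "dsr1_fp4_mi355x_sglang-disagg.sh" "${IS_AGENTIC:-0}"
```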

^{Reviewed by Cursor Bugbot for commit f10f456. Configure here.}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor · 2026-06-19T04:20:53Z

        # LMCache backend.
-        TOTAL_CPU_DRAM_GB=2500
+        #TODO: fix
+        TOTAL_CPU_DRAM_GB=3000


Ignores TOTAL_CPU_DRAM_GB env

Medium Severity

The scripts require TOTAL_CPU_DRAM_GB via check_env_vars, then hardcode TOTAL_CPU_DRAM_GB=3000 inside the cpu and lmcache branches. Workflow-supplied offload memory sizing is discarded and partition math uses the fixed constant instead.

Additional Locations (1)

benchmarks/single_node/agentic/dsv4_fp4_mi355x_atom.sh#L113-L155
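
One way to keep the workflow-supplied value is to treat the constant as a fallback only. A sketch; `resolve_cpu_dram_gb` is a hypothetical helper, and the fallback of 3000 simply mirrors the constant in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): keep a numeric workflow-supplied
# value and use the fallback only when it is empty or non-numeric.
resolve_cpu_dram_gb() {
    local value="${1:-}" fallback="${2:-3000}"
    case "$value" in
        ''|*[!0-9]*) echo "$fallback" ;;   # unset or non-numeric
        *)           echo "$value" ;;
    esac
}

TOTAL_CPU_DRAM_GB="$(resolve_cpu_dram_gb "${TOTAL_CPU_DRAM_GB:-}")"
echo "TOTAL_CPU_DRAM_GB=$TOTAL_CPU_DRAM_GB"
```

Because the scripts already require the variable through `check_env_vars`, dropping the hardcoded assignment entirely may be the simpler change.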

^{Reviewed by Cursor Bugbot for commit 73756ab. Configure here.}

…ker for agentic disagg Reconstructs three fixes for DeepSeek-V4-Pro 1M-context agentic disaggregated runs: - server_sglang.sh: patch disaggregation/prefill.py resolve_waiting_queue_bootstrap() to narrow the all_reduce candidate set to pending_bootstrap requests, eliminating the TP-rank collective desync deadlock. - models.yaml: cap chunked_prefill_size at 65280 (dsv4 compressor kernel uint16 token limit, 255*256) to avoid the c_plan.cuh:508 runtime crash at 131072; set context_length/max_total_tokens to 1048576; drop verbose --decode-log-interval 1 / --log-level info so default logging is used. - server_sglang.sh: add router resilience flags (disable circuit breaker, relax health checks) so a busy single prefill worker is not ejected, which previously made the aiperf profiling burst fail 100% with "No available prefill workers". Overridable via ROUTER_RESILIENCE_FLAGS. Co-authored-by: Cursor <cursoragent@cursor.com>

…HiCache Relocate the disagg prefill bootstrap-desync fix and the memory_pool_host assert softening from inline blocks in server_sglang.sh into dedicated, idempotent patch functions in setup_deps.sh. Also set MORI_IO_QP_MAX_SGE=1 and default HICACHE_TIER=L2. Co-authored-by: Cursor <cursoragent@cursor.com>

…061526 Register the new semianalysis_cc_traces_weka_061326{,_256k} and 061526{,_256k} loader keys in resolve_trace_source, and point the dsv4 default + trace_replay override at the 061526 corpus. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-06-23T03:34:25Z

+mkdir -p "$RESULT_DIR"
+
+export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_061526
+resolve_trace_source


Disagg trace corpus not 256k capped

Medium Severity

Multinode agentic replay hardcodes WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_061526 (full corpus). Single-node agentic scripts use _256k variants for non-DSv4 models, while benchmark_lib.sh now defaults non-dsv4 recipes to 061526_256k. Disagg agentic jobs can load traces that exceed max_model_len and fail or skew results.
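
A sketch of choosing the corpus per model family instead of hardcoding it, following the split described above (full corpus for DSv4, `_256k` otherwise). `default_trace_loader` and `MODEL_PREFIX` are hypothetical names; only the two loader keys come from this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): DSv4 recipes get the full corpus,
# everything else gets the 256k-capped variant.
default_trace_loader() {
    case "${1:-}" in
        dsv4*) echo "semianalysis_cc_traces_weka_061526" ;;
        *)     echo "semianalysis_cc_traces_weka_061526_256k" ;;
    esac
}

# An explicit override from the caller still wins over the computed default.
WEKA_LOADER_OVERRIDE="${WEKA_LOADER_OVERRIDE:-$(default_trace_loader "${MODEL_PREFIX:-}")}"
export WEKA_LOADER_OVERRIDE
echo "trace loader: $WEKA_LOADER_OVERRIDE"
```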

^{Reviewed by Cursor Bugbot for commit 219bcc9. Configure here.}

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor · 2026-06-23T03:48:33Z

+
+if [ "$ANY_FAILED" -ne 0 ]; then
+    echo "WARNING: at least one conc had a non-zero exit; per-conc result files were still written when possible." >&2
+fi


Replay failures do not fail job

Medium Severity

When any concurrency sweep in trace_replay.sh fails, the script only prints a warning and exits 0, so SLURM and CI can treat a broken agentic disagg run as success without a valid aggregate result.
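
A sketch of propagating the failure. `finish_sweep` is a hypothetical helper; it assumes `ANY_FAILED` is an integer, as in the diff above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): turn the ANY_FAILED flag into a
# non-zero status so SLURM and CI see the run as failed.
finish_sweep() {
    local any_failed="$1"
    if [ "$any_failed" -ne 0 ]; then
        echo "ERROR: at least one conc had a non-zero exit." >&2
        return 1
    fi
    return 0
}

# Intended call site at the end of the replay script:
# finish_sweep "$ANY_FAILED" || exit 1
finish_sweep 0 && echo "sweep ok"
```

Per-conc result files can still be written before this check, so partial data is kept while the job status stays accurate.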

^{Reviewed by Cursor Bugbot for commit 7c2bcbc. Configure here.}

cursor · 2026-06-23T03:48:33Z

+        --gpu-memory-utilization 0.85 \
+    "${PREFIX_CACHE_ARGS[@]}"
+    "${OFFLOAD_ARGS[@]}"
+)


DSv4 atom script is Kimi

High Severity

The added dsv4_fp4_mi355x_atom.sh is a Kimi-K2.5 ATOM/vLLM offload recipe (comments, LMCache clone, --kv_offloading_backend, atom.entrypoints.openai_server) and does not implement DeepSeek-V4-Pro ATOM serving despite the filename and PR intent.

^{Reviewed by Cursor Bugbot for commit 7c2bcbc. Configure here.}

cursor · 2026-06-23T03:48:33Z

+    -e DISABLE_CUSTOM_ALL_REDUCE=\${DISABLE_CUSTOM_ALL_REDUCE:-0}
+    -e MAX_MODEL_LEN=\${MAX_MODEL_LEN:-}
+    -e DURATION=\${DURATION:-1800}
+    -e IS_AGENTIC=\${IS_AGENTIC:-0}


Agentic zero max model len

High Severity

job.slurm now forwards MAX_MODEL_LEN from the workflow into containers while multinode agentic sweeps pass max-model-len: '0', and the SLURM entry script still used for disagg does not apply the 163840/1M defaults that the new multi_node/agentic/* recipes define.
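
A sketch of treating `0` or an empty value as "not set" before the server starts. `effective_max_model_len` is a hypothetical helper; the 163840 default in the example is one of the two recipe defaults mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): fall back to the recipe default when
# the workflow passes an empty value or the placeholder 0.
effective_max_model_len() {
    local requested="${1:-}" recipe_default="$2"
    if [ -z "$requested" ] || [ "$requested" = "0" ]; then
        echo "$recipe_default"
    else
        echo "$requested"
    fi
}

MAX_MODEL_LEN="$(effective_max_model_len "${MAX_MODEL_LEN:-}" 163840)"
echo "MAX_MODEL_LEN=$MAX_MODEL_LEN"
```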

^{Reviewed by Cursor Bugbot for commit 7c2bcbc. Configure here.}

…ache Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 22 total unresolved issues (including 21 from previous reviews).

^{Reviewed by Cursor Bugbot for commit 4f8123d. Configure here.}

cursor · 2026-06-23T04:06:35Z

+# HiCache-offload configuration is ported from local_test_dsr1_agentic_offload.sh
+# and is fully env-overridable so a YAML config can tune it.
+
+source "$(dirname "$0")/../benchmark_lib.sh"


Wrong benchmark_lib source path

High Severity

These recipes live under benchmarks/multi_node/agentic/ but source ../benchmark_lib.sh, which resolves to benchmarks/multi_node/benchmark_lib.sh. The library actually lives at benchmarks/benchmark_lib.sh, so the scripts fail on the first source line.

Additional Locations (1)

benchmarks/multi_node/agentic/dsv4_fp4_mi355x_sglang-disagg.sh#L10-L11
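
A sketch of deriving the library path from the script location. `benchmark_lib_for` is a hypothetical helper; it assumes the layout described above, with the recipes two directories below `benchmarks/`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): benchmark_lib.sh sits two levels
# above the agentic recipes (agentic/ -> multi_node/ -> benchmarks/).
benchmark_lib_for() {
    local script_dir="$1"
    echo "$(dirname "$(dirname "$script_dir")")/benchmark_lib.sh"
}

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
echo "would source: $(benchmark_lib_for "$SCRIPT_DIR")"
# In the recipe itself (left commented so the sketch runs outside the repo):
# source "$(benchmark_lib_for "$SCRIPT_DIR")"
```

The shorter equivalent is `source "$(dirname "$0")/../../benchmark_lib.sh"`.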

^{Reviewed by Cursor Bugbot for commit 4f8123d. Configure here.}

[AMD] agentic: add hicache/lmcache configs, update agentic scripts fo…

01cc2af

…r mi355x models Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners June 11, 2026 06:08

github-project-automation Bot added this to InferenceMAX Board Jun 11, 2026

seungrokj mentioned this pull request Jun 11, 2026

[DNM][AMD] agentx-v0.4 #1654

Closed

cursor Bot reviewed Jun 11, 2026

seungrokj changed the title ~~[AMD] agentic: add hicache/lmcache configs, update agentic scripts for mi355x models~~ [DNM][AMD] agentx-v0.4 rebased from commit chore/agentx-v0.4 commit 7f61 Jun 11, 2026

ajith-sirra-amd and others added 3 commits June 11, 2026 12:54

Add GLM5.1 & Qwen3.5 MI300 Agentic Scripts

ba1bb37

Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>

[AMD] add DSV4-FP4-MI355x atom agentic benchmark and master yaml config

eba4233

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] update DSV4-FP4-MI355x atom agentic benchmark and master yaml c…

32f5007

…onfig Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 12, 2026

[AMD] dsv4_fp4_mi355x_atom.sh: update agentic benchmark script

351e729

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 12, 2026

ajith-sirra-amd added 2 commits June 12, 2026 14:12

Add DSV4 MI355X Agentic Scripts

64ce90c

Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>

Merge branch 'amd/agentx-v0.4_rebase0611' of https://github.com/SemiA…

faba18f

…nalysisAI/InferenceX into amd/agentx-v0.4_rebase0611

cursor Bot reviewed Jun 12, 2026

Add DSV4 MI355X Agentic Scripts

8ca4bc1

Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>

cursor Bot reviewed Jun 12, 2026

Comment thread benchmarks/single_node/agentic/kimik2.5_fp4_mi355x.sh

[AMD] update DSV4-FP4-MI355X SGLang agentic benchmark and master yaml…

37f57a7

… config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 14, 2026

Comment thread benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_sglang.sh Outdated

Comment thread benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_sglang.sh Outdated

[AMD] update DSV4-FP4-MI355X SGLang agentic/fixed-seq-len benchmark s…

76d90e0

…cripts and master yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 15, 2026

[AMD] remove unused CACHE_ARGS from dsv4_fp4_mi355x_sglang agentic sc…

4ebc4e2

…ript Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 15, 2026

seungrokj and others added 2 commits June 15, 2026 11:56

[AMD] tune hicache ratio and disable none-offloading in agentic config

735e9a3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] remove --disable-radix-cache from dsv4_fp4_mi355x_sglang agenti…

d3caa2b

…c script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 16, 2026

update dsv4 recipe

e37fbc2

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

ichbinblau force-pushed the amd/agentx-v0.4_rebase0611 branch from c7f269e to e37fbc2 Compare June 16, 2026 14:35

seungrokj and others added 2 commits June 17, 2026 09:14

update dsv4 agentic config and benchmark script

72bff2c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bump sglang image for dsv4-fp4-mi355x-agentic-hicache

b5626fb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 17, 2026

seungrokj and others added 3 commits June 17, 2026 10:05

revert sglang image, comment out blk size 1 variant

085049a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

switch to sglang v0.5.13 image with page-size 1

b79b098

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test high con only in SA

f10f456

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor Bot reviewed Jun 17, 2026

seungrokj and others added 2 commits June 17, 2026 13:03

revert to v0.5.12 image with page-size 256

34dba4d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

update dsr1 agentic con=1

e753830

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor Bot reviewed Jun 17, 2026

Comment thread benchmarks/multi_node/agentic/dsr1_fp4_mi355x_sglang-disagg.sh

ichbinblau added 4 commits June 17, 2026 12:06

add hicache L3 support

de52bf1

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

dump commands in log files

c348dbb

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

bump to latest image and fix

e36bf75

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

fix

f10841f

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor Bot reviewed Jun 18, 2026

Comment thread benchmarks/single_node/agentic/glm5.1_fp4_mi355x.sh

seungrokj changed the title ~~[DNM][AMD] agentx-v0.4 rebased from commit chore/agentx-v0.4 commit 7f61~~ [DNM][AMD] agentx-v0.4 rebased from commit chore/agentx-v0.4 commit 823189e Jun 19, 2026

Adding MINIMAX-M3 MI355X MXFP8 VLLM Agentic support

73756ab

cursor Bot reviewed Jun 19, 2026

ichbinblau and others added 3 commits June 22, 2026 14:44

cursor Bot reviewed Jun 23, 2026

add more con list to dsv4 disagg agentic

7c2bcbc

Signed-off-by: Theresa Shan <theresa.shan@amd.com>

cursor Bot reviewed Jun 23, 2026

dsv4: bump sglang-rocm image to mi35x-20260618 for disagg-agentic-hic…

4f8123d

…ache Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 23, 2026
