[AMD] Add MiniMax-M3-FP4 MI355X ATOM EAGLE3 only #1866
Open
seungrokj wants to merge 21 commits into main from amd/m3_atom_mtp
+115
−0
21 commits
c3221b2 feat: add minimaxm3-fp4-mi355x-atom-mtp benchmark script and CI config (seungrokj)
d6426c4 chore: add perf-changelog entry for minimaxm3-fp4-mi355x-atom-mtp (seungrokj)
a1b42c4 fix: update minimaxm3-fp4-mi355x-atom-mtp image to MiniMax-M3-20260619 (seungrokj)
ae97ec6 fix: add spec-decoding: mtp to minimaxm3-fp4-mi355x-atom-mtp search s… (seungrokj)
4fcaa1e fix: remove duplicate -tp and --max-model-len from minimaxm3_fp4_mi35… (seungrokj)
d806d96 fix: trim minimaxm3-fp4-mi355x-atom-mtp ISL=8192 search space; remove… (seungrokj)
ae68465 Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
9f780ff Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
db0bc1f Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
86308ff fix: bump minimaxm3-fp4-mi355x-atom-mtp image to MiniMax-M3-20260622 (seungrokj)
9284de7 Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
5317967 Merge branch 'amd/m3_atom_mtp' of https://github.com/SemiAnalysisAI/I… (seungrokj)
4458bbe fix: disable prefix caching for minimaxm3-fp4-mi355x-atom-mtp (seungrokj)
bb638f9 fix: update minimaxm3-fp4-mi355x-atom scripts and image bump (seungrokj)
30a04d1 Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
a592385 fix: add minimaxm3-fp4-mi355x-atom to perf-changelog entry (seungrokj)
182decd fix: revert minimaxm3-fp4-mi355x-atom image/search-space, delete scri… (seungrokj)
e69b5f2 Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
e8b2aba fix: restore minimaxm3_fp4_mi355x_atom.sh (seungrokj)
c6e8d15 fix: bump MAX_NUM_SEQS to 256 for minimaxm3-fp4-mi355x-atom-mtp (seungrokj)
65a9d7d Merge branch 'main' into amd/m3_atom_mtp (seungrokj)
87 changes: 87 additions & 0 deletions
benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_atom_mtp.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| #!/usr/bin/env bash | ||
| | ||
| source "$(dirname "$0")/../../benchmark_lib.sh" | ||
| | ||
| check_env_vars \ | ||
| MODEL \ | ||
| TP \ | ||
| CONC \ | ||
| ISL \ | ||
| OSL \ | ||
| RANDOM_RANGE_RATIO \ | ||
| RESULT_FILENAME \ | ||
| EP_SIZE \ | ||
| DP_ATTENTION | ||
| | ||
| if [[ -n "$SLURM_JOB_ID" ]]; then | ||
| echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME" | ||
| fi | ||
| | ||
| echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL, EP_SIZE: $EP_SIZE, DP_ATTENTION: $DP_ATTENTION" | ||
| | ||
| SERVER_LOG=/workspace/server.log | ||
| | ||
| PARALLEL_ARGS=(-tp "$TP") #TP | ||
| if [ "$DP_ATTENTION" = "true" ]; then | ||
| if [ "$EP_SIZE" -gt 1 ]; then #DP+EP | ||
| PARALLEL_ARGS=(-tp "$TP" --enable-expert-parallel --enable-dp-attention ) | ||
| else #DP+TP | ||
| PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention ) | ||
| fi | ||
| fi | ||
| | ||
| SPEC_ARGS=(--method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 --num-speculative-tokens 3 ) | ||
| | ||
| # Start GPU monitoring (power, temperature, clocks every second) | ||
| start_gpu_monitor | ||
| MEM_FRAC_STATIC=0.8 | ||
| | ||
| set -x | ||
| export AITER_QUICK_REDUCE_QUANTIZATION=INT4 | ||
| export MAX_MODEL_LEN=32768 | ||
| export MAX_NUM_BATCHED_TOKENS=32768 | ||
| export MAX_NUM_SEQS=256 | ||
| # (srok), not yet | ||
| # --kv_cache_dtype fp8 \ | ||
| python3 -m atom.entrypoints.openai_server \ | ||
| --model $MODEL \ | ||
| --server-port $PORT \ | ||
| "${PARALLEL_ARGS[@]}" \ | ||
| "${SPEC_ARGS[@]}" \ | ||
| --block-size 128 \ | ||
| --gpu-memory-utilization $MEM_FRAC_STATIC \ | ||
| --max-model-len $MAX_MODEL_LEN \ | ||
| --max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS \ | ||
| --max-num-seqs $MAX_NUM_SEQS \ | ||
| --trust-remote-code \ | ||
| --no-enable_prefix_caching \ | ||
| > $SERVER_LOG 2>&1 & | ||
| | ||
| SERVER_PID=$! | ||
| | ||
| # Wait for server to be ready | ||
| wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID" | ||
| | ||
| export PYTHONDONTWRITEBYTECODE=1 | ||
| run_benchmark_serving \ | ||
| --model "$MODEL" \ | ||
| --port "$PORT" \ | ||
| --backend vllm \ | ||
| --input-len "$ISL" \ | ||
| --output-len "$OSL" \ | ||
| --random-range-ratio "$RANDOM_RANGE_RATIO" \ | ||
| --num-prompts "$((CONC * 10))" \ | ||
| --max-concurrency "$CONC" \ | ||
| --result-filename "$RESULT_FILENAME" \ | ||
| --result-dir /workspace/ \ | ||
| --trust-remote-code $( [[ ${#SPEC_ARGS[@]} -gt 0 ]] && echo "--use-chat-template" ) | ||
| | ||
| # After throughput, run evaluation only if RUN_EVAL is true | ||
| if [ "${RUN_EVAL}" = "true" ]; then | ||
| run_eval --framework lm-eval --port "$PORT" | ||
| append_lm_eval_summary | ||
| fi | ||
| | ||
| # Stop GPU monitoring | ||
| stop_gpu_monitor | ||
| set +x | ||
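For review purposes, the parallelism selection in the script above can be read as a small pure function: plain tensor parallelism by default, `--enable-dp-attention` when `DP_ATTENTION` is `true`, and `--enable-expert-parallel` added only when `EP_SIZE` is also greater than 1. The sketch below is a restatement under that reading, not code from the PR; `build_parallel_args` is a hypothetical helper name, and only the three flags already present in the diff are used.

```shell
#!/usr/bin/env bash
# Sketch only: build_parallel_args is a hypothetical helper, not part of
# benchmark_lib.sh or this PR. It mirrors the if/else block in the diff.
build_parallel_args() {
    local tp="$1" dp_attention="$2" ep_size="$3"
    local args=(-tp "$tp")                    # default: tensor parallelism only
    if [ "$dp_attention" = "true" ]; then
        if [ "$ep_size" -gt 1 ]; then
            args+=(--enable-expert-parallel)  # DP attention + expert parallelism
        fi
        args+=(--enable-dp-attention)         # DP attention in both DP branches
    fi
    printf '%s\n' "${args[@]}"                # one flag or value per line
}

# Usage: read the flags back into an array, as the script does with PARALLEL_ARGS.
mapfile -t PARALLEL_ARGS < <(build_parallel_args 8 true 8)
echo "${PARALLEL_ARGS[*]}"
```

Keeping the flags in an array, as the script already does, means each flag reaches the server as its own argument without relying on word splitting.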
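On the client side, the script sends ten prompts per unit of concurrency (`--num-prompts "$((CONC * 10))"`) and appends `--use-chat-template` through an unquoted command substitution that tests whether `SPEC_ARGS` is non-empty. Since `SPEC_ARGS` is always set in this script, the flag is always passed. A possible alternative, shown here only as a sketch, builds the same arguments in an array; `build_client_args` and its `use_spec` parameter are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: build_client_args is a hypothetical helper, not part of the PR.
# It reproduces the prompt-count arithmetic and the optional chat-template flag.
build_client_args() {
    local conc="$1" use_spec="$2"
    # Ten prompts per concurrent request slot, as in the diff.
    local args=(--num-prompts "$((conc * 10))" --max-concurrency "$conc")
    if [ "$use_spec" = "true" ]; then
        args+=(--use-chat-template)           # only when speculative decoding is on
    fi
    printf '%s\n' "${args[@]}"                # one flag or value per line
}

# Usage: with CONC=64 and speculative decoding enabled.
build_client_args 64 true
```

With an array, an empty optional flag simply adds nothing, so there is no dependence on how the shell splits an empty substitution.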
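The script validates nine variables with `check_env_vars` before doing any work, but it also reads `PORT`, which is not in that list and is presumably set elsewhere. The implementation of `check_env_vars` lives in `benchmark_lib.sh` and is not shown in this diff, so the following is only a guess at the general shape of such a check; `require_env` is a hypothetical stand-in and may differ from the real helper.

```shell
#!/usr/bin/env bash
# Sketch only: require_env is a hypothetical stand-in for check_env_vars,
# whose real implementation is in benchmark_lib.sh and is not shown here.
require_env() {
    local missing=() name
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            missing+=("$name")
        fi
    done
    if [ "${#missing[@]}" -gt 0 ]; then
        echo "missing required env vars: ${missing[*]}" >&2
        return 1
    fi
}

# Usage: fail early if the server port is unset as well.
MODEL=example-model TP=8 PORT=8888
require_env MODEL TP PORT && echo "environment ok"
```

The values in the usage lines are placeholders for illustration, not settings taken from the PR.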