Add mmq device table for RDNA3.5 by Annieren · Pull Request #25 · ROCm/llama.cpp

Annieren · 2026-06-17T06:20:06Z

Overview

Add mmq device table for RDNA3.5.

get_mmq_y_host — returns 64 for RDNA3.5 (host).
get_mmq_y_device — returns 64 under #if defined(RDNA3_5) (device).
mmq_get_nwarps_host — returns 4 for RDNA3.5 (host).
mmq_get_nwarps_device — returns 4 under #if defined(RDNA3_5) (device).
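
As a rough illustration of the four functions listed above, the sketch below models the host lookups as runtime functions and the device lookups as compile-time constants. The `Arch` enum, the `sketch_*` names, and the template parameter standing in for `#if defined(RDNA3_5)` are hypothetical. The non-RDNA3.5 values (128, 8, and `256 / warp_size`) come from the review comment in this thread rather than from the diff itself.

```cpp
#include <cassert>

// Hypothetical stand-in for the real architecture checks. The PR uses
// GGML_CUDA_CC_IS_RDNA3_5(cc) on the host and #if defined(RDNA3_5) on the
// device; this enum only models "RDNA3.5" versus "some other AMD GPU".
enum class Arch { RDNA3_5, OtherAMD };

// Host side: evaluated at runtime, once the device is known.
constexpr int sketch_get_mmq_y_host(Arch arch) {
    return arch == Arch::RDNA3_5 ? 64 : 128;
}

constexpr int sketch_mmq_get_nwarps_host(Arch arch, int warp_size) {
    return arch == Arch::RDNA3_5 ? 4 : 256 / warp_size;
}

// Device side: fixed at compile time for the target architecture. The
// template parameter plays the role of the preprocessor guard.
template <Arch arch>
constexpr int sketch_get_mmq_y_device() {
    return arch == Arch::RDNA3_5 ? 64 : 128;
}

template <Arch arch>
constexpr int sketch_mmq_get_nwarps_device() {
    return arch == Arch::RDNA3_5 ? 4 : 8;
}

// Small self-check: both sides must return the same pair for RDNA3.5.
inline void sketch_check_rdna3_5_table() {
    assert(sketch_get_mmq_y_host(Arch::RDNA3_5) ==
           sketch_get_mmq_y_device<Arch::RDNA3_5>());
    assert(sketch_mmq_get_nwarps_host(Arch::RDNA3_5, 32) ==
           sketch_mmq_get_nwarps_device<Arch::RDNA3_5>());
}
```

Calling `sketch_check_rdna3_5_table()` from a test passes only while the host and device entries stay in sync.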

Additional information

27 Q4_K_M models were tested on gfx1151, covering both dense and MoE (gemma4_26b_a4b, qwen35_35b_a3b, qwen3_30b_a3b) architectures and ranging from small (qwen25_05b, smollm2_17b, gemma2_2b) to large (qwen3_17b). Prefill at n=128 shows the largest gain: many models improve by +14% to +18%. At longer sequences (512–4096), most models see a consistent +2% to +8% prefill improvement. The change only touches the mmq nwarps and mmq_y_max values, which do not affect mmvq performance. Decode is essentially neutral, with no regression.

A PPL check has been performed: all 27 models are bit-identical.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: I used AI for the investigation.

jimw567 · 2026-06-17T16:41:29Z

Code Review

Summary of Changes

Adds RDNA3.5 (gfx115x) specific tuning to the CUDA/HIP MMQ (quantized matmul) path in ggml/src/ggml-cuda/mmq.cuh:

get_mmq_y_host / get_mmq_y_device: tile height mmq_y set to 64 for RDNA3.5 (was 128 via the generic AMD/else paths).
mmq_get_nwarps_host / mmq_get_nwarps_device: warps-per-block set to 4 for RDNA3.5 (was 8 — host: 256/warp_size with warp_size 32; device: the AMD_WMMA_AVAILABLE branch).
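
To make those numbers concrete, here is a small arithmetic sketch. It assumes, for illustration only, that a block runs `nwarps * warp_size` threads and that each tile covers `mmq_y` output rows. The `sketch_*` helpers and the 4096-row example are hypothetical and not taken from the PR.

```cpp
#include <cassert>

// Hypothetical helpers, not part of mmq.cuh.
constexpr int sketch_threads_per_block(int nwarps, int warp_size) {
    return nwarps * warp_size;
}

// Number of row tiles needed to cover nrows rows (rounded up).
constexpr int sketch_row_tiles(int nrows, int mmq_y) {
    return (nrows + mmq_y - 1) / mmq_y;
}

constexpr int sketch_warp_size  = 32;                     // warp size named above
constexpr int sketch_old_nwarps = 256 / sketch_warp_size; // generic host formula
constexpr int sketch_new_nwarps = 4;                      // RDNA3.5 value from this PR

static_assert(sketch_old_nwarps == 8, "256 / 32 warps before the change");
static_assert(sketch_threads_per_block(sketch_old_nwarps, sketch_warp_size) == 256,
              "threads per block before");
static_assert(sketch_threads_per_block(sketch_new_nwarps, sketch_warp_size) == 128,
              "threads per block after");

// Halving mmq_y doubles the number of row tiles for the same matrix.
static_assert(sketch_row_tiles(4096, 128) == 32, "row tiles before");
static_assert(sketch_row_tiles(4096, 64) == 64, "row tiles after");
```

Under these assumptions the change trades larger, heavier blocks for twice as many smaller ones. Whether that helps is an empirical question, which is what the benchmark data in the description addresses.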

Checklist: summary present ✅, unit tests ⏭️ (perf-tuning constants only, no testable logic — validation is the benchmark + PPL data in the description).

Potential Issues

Host/device agreement (the thing that matters here): MMQ requires the host (grid/shared-mem sizing) and device (kernel) to compute identical mmq_y and nwarps, or launches break. Both pairs match for RDNA3.5 — 64/64 and 4/4 — so this is correct. ✅
Branch ordering: the new #if defined(RDNA3_5) is placed before the RDNA1 and AMD_MFMA/WMMA branches, and the host GGML_CUDA_CC_IS_RDNA3_5(cc) guard sits before the generic AMD ternary. Correct specific-before-general ordering. ✅
Macro assumptions: correctness depends on RDNA3_5 (device) being defined for gfx115x builds and GGML_CUDA_CC_IS_RDNA3_5(cc) (host) existing. Both are pre-existing in the tree, so this is consistent with how RDNA1 is already handled — just flagging it as the load-bearing assumption.
No regression risk to the mmvq path: these constants only feed MMQ, consistent with the description's "neutral decode" claim.
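
The first two points can be illustrated with a small preprocessor sketch. It pretends to be a gfx115x device build by defining `RDNA3_5` and `AMD_WMMA_AVAILABLE` by hand; in the real tree these come from the build and headers. The `sketch_*` names and the integer `cc` value are hypothetical, and the `#else` branches are placeholders rather than the actual fallback logic.

```cpp
#include <cassert>

// Assumption: simulate a gfx115x device build where both macros are defined.
#define RDNA3_5 1
#define AMD_WMMA_AVAILABLE 1

// Hypothetical host-side predicate standing in for GGML_CUDA_CC_IS_RDNA3_5(cc).
constexpr int SKETCH_CC_RDNA3_5 = 1150; // made-up identifier value
constexpr bool sketch_cc_is_rdna3_5(int cc) { return cc == SKETCH_CC_RDNA3_5; }

// Host: the specific check comes before the generic fallback.
constexpr int sketch_nwarps_host(int cc, int warp_size) {
    return sketch_cc_is_rdna3_5(cc) ? 4 : 256 / warp_size;
}

// Device: RDNA3_5 is tested first, so it wins on this build.
constexpr int sketch_nwarps_device() {
#if defined(RDNA3_5)
    return 4;
#elif defined(AMD_WMMA_AVAILABLE)
    return 8;
#else
    return 8; // placeholder for the remaining paths
#endif
}

// Same chain with the order reversed: the WMMA branch shadows RDNA3_5.
constexpr int sketch_nwarps_device_wrong_order() {
#if defined(AMD_WMMA_AVAILABLE)
    return 8;
#elif defined(RDNA3_5)
    return 4;
#else
    return 8; // placeholder for the remaining paths
#endif
}

static_assert(sketch_nwarps_host(SKETCH_CC_RDNA3_5, 32) == sketch_nwarps_device(),
              "host and device agree when the specific branch comes first");
static_assert(sketch_nwarps_host(SKETCH_CC_RDNA3_5, 32) != sketch_nwarps_device_wrong_order(),
              "reversed ordering would make host and device disagree");
```

A compile-time check of this kind is one way to catch a host/device mismatch early, although the PR itself relies on review and benchmarking rather than on such an assertion.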

Suggestions for Improvement

Consider a one-line comment on the 64 / 4 constants noting they're empirically tuned for RDNA3.5 (prefill +14–18% at n=128), so a future reader doesn't "simplify" them back into the generic AMD path.
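
One possible shape for such a comment is sketched below on stand-alone constants rather than on the actual return statements in `mmq.cuh`. The constant names are hypothetical and the wording is only a suggestion.

```cpp
#include <cassert>

// RDNA3.5 (gfx115x): empirically tuned. On gfx1151 with Q4_K_M models,
// 64 / 4 gave many models +14% to +18% prefill at n=128 compared with the
// generic AMD values (128 / 8). Re-benchmark before folding these back
// into the generic path.
constexpr int SKETCH_MMQ_Y_RDNA3_5  = 64; // tile height
constexpr int SKETCH_NWARPS_RDNA3_5 = 4;  // warps per block
```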

Overall: small, low-risk, well-validated tuning change — 27 models benchmarked on gfx1151 with bit-identical PPL and no decode regression. LGTM.

jimw567

looks good

Add mmq device table for RDNA3.5

e109b83

Annieren requested a review from jimw567 June 17, 2026 06:21

jimw567 approved these changes Jun 17, 2026


Merge branch 'gfx11' into annier.mmq-device-table

a18e333

Annieren wants to merge 2 commits into gfx11 from annier.mmq-device-table
