[CUDA] Faster compilation and batch support in QMV by zcbenz · Pull Request #3213 · ml-explore/mlx

zcbenz · 2026-03-06T08:13:03Z

Refs #2536.

Add support for batched QMV (assuming the matrices are contiguous).
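
For readers unfamiliar with the operation, a CPU reference of what a batched QMV (quantized matrix-vector product) computes might look like the sketch below. It is not the CUDA kernel from this PR: the function name `batched_qmv_ref`, the 8-bit affine scheme (`w = scale * q + bias` per group), and the `[batch, N, K]` layout are all assumptions chosen for illustration. The contiguity assumption shows up as a single fixed offset per batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not the CUDA kernel from the PR.
// Assumptions: 8-bit affine quantization (w = scale * q + bias per group),
// row-major weights of shape [batch, N, K], inputs of shape [batch, K],
// and fully contiguous storage so each batch starts at a fixed stride.
std::vector<float> batched_qmv_ref(
    const std::vector<std::uint8_t>& q,  // [batch, N, K] quantized weights
    const std::vector<float>& scales,    // [batch, N, K / group_size]
    const std::vector<float>& biases,    // [batch, N, K / group_size]
    const std::vector<float>& x,         // [batch, K]
    std::size_t batch, std::size_t N, std::size_t K, std::size_t group_size) {
  assert(group_size > 0 && K % group_size == 0);
  const std::size_t groups = K / group_size;
  assert(q.size() == batch * N * K);
  assert(scales.size() == batch * N * groups);
  assert(biases.size() == batch * N * groups);
  assert(x.size() == batch * K);

  std::vector<float> y(batch * N, 0.0f);
  for (std::size_t b = 0; b < batch; ++b) {
    // Contiguity lets each batch be addressed with one offset per tensor.
    const std::uint8_t* qb = q.data() + b * N * K;
    const float* sb = scales.data() + b * N * groups;
    const float* bb = biases.data() + b * N * groups;
    const float* xb = x.data() + b * K;
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        const std::size_t g = n * groups + k / group_size;
        // Dequantize on the fly, then accumulate the dot product.
        const float w = sb[g] * static_cast<float>(qb[n * K + k]) + bb[g];
        acc += w * xb[k];
      }
      y[b * N + n] = acc;  // output has shape [batch, N]
    }
  }
  return y;
}
```

A non-contiguous batch would need per-tensor strides instead of the `b * N * K` style offsets, which is presumably why the description limits batch support to contiguous matrices.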

Speed up compilation by removing impossible combinations (for example, mxfp4 with bias and group size other than 16).
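
One common way to avoid compiling impossible template combinations is a `constexpr` validity check combined with `if constexpr`, sketched below in plain C++17. Everything here is hypothetical: the `Mode` enum, the `is_valid_combo` rule (loosely modeled on the example above), and the `kernel_id` stand-in are illustrative assumptions, not MLX's actual dispatch code or the approach taken in this PR.

```cpp
// Hypothetical quantization modes; names and rules are illustrative
// assumptions, not MLX's actual dispatch table.
enum class Mode { Affine, Fp4Like };

// Placeholder rule: the Fp4Like mode never carries a bias and supports
// only one group size; the affine mode accepts every listed combination.
template <Mode M, int GroupSize, bool HasBias>
constexpr bool is_valid_combo() {
  if (M == Mode::Fp4Like) {
    return !HasBias && GroupSize == 16;
  }
  return true;
}

// Stand-in for an expensive kernel template; returns a tag identifying
// which specialization ran.
template <Mode M, int GroupSize, bool HasBias>
int kernel_id() {
  return static_cast<int>(M) * 1000 + GroupSize * 2 + (HasBias ? 1 : 0);
}

// Only valid combinations instantiate kernel_id. The discarded branch of
// `if constexpr` inside a template is not instantiated.
template <Mode M, int GroupSize, bool HasBias>
int launch() {
  if constexpr (is_valid_combo<M, GroupSize, HasBias>()) {
    return kernel_id<M, GroupSize, HasBias>();
  } else {
    return -1;  // unsupported combination, reported at run time
  }
}

template <Mode M, int GroupSize>
int dispatch_bias(bool has_bias) {
  return has_bias ? launch<M, GroupSize, true>()
                  : launch<M, GroupSize, false>();
}

template <Mode M>
int dispatch_group(int group_size, bool has_bias) {
  switch (group_size) {
    case 16: return dispatch_bias<M, 16>(has_bias);
    case 32: return dispatch_bias<M, 32>(has_bias);
    case 64: return dispatch_bias<M, 64>(has_bias);
    default: return -1;  // group size not in the supported list
  }
}

// Run-time entry point that maps dynamic parameters to a specialization.
int dispatch(Mode mode, int group_size, bool has_bias) {
  return mode == Mode::Affine
      ? dispatch_group<Mode::Affine>(group_size, has_bias)
      : dispatch_group<Mode::Fp4Like>(group_size, has_bias);
}
```

In this sketch the full cross product is 2 modes × 3 group sizes × 2 bias settings = 12 `launch` specializations, but only 7 of them instantiate `kernel_id` (6 affine plus 1 Fp4Like). With real kernels, each pruned combination is one fewer kernel for the compiler to generate.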

Commit 2fca716: [CUDA] Faster compilation and batch support in QMV

angeloskath approved these changes Mar 10, 2026

zcbenz merged commit 5a347b2 into ml-explore:main Mar 10, 2026
43 of 48 checks passed

zcbenz deleted the qmv-faster-compile branch March 10, 2026 04:45
