
[CUDA] Faster compilation and batch support in QMV #3213

Merged: zcbenz merged 1 commit into ml-explore:main from zcbenz:qmv-faster-compile on Mar 10, 2026

zcbenz (Collaborator) commented on Mar 6, 2026

Refs #2536.

Add support for batched QMV (quantized matrix-vector multiplication), assuming the matrices are contiguous.
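To make "batched QMV on contiguous matrices" concrete, here is a minimal CPU reference sketch in C++. It is not the CUDA kernel from this PR. It assumes 4-bit affine quantization with eight values packed per `uint32_t`, one scale and one bias per group of `group_size` weights along K, and weights, scales, biases and inputs all stored contiguously per batch, so batch `b` is found with a single fixed stride. The function name `batched_qmv` and this memory layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for batched quantized matrix-vector multiply.
// Assumptions (not taken from the MLX source): 4-bit affine quantization,
// 8 values packed per uint32 (lowest nibble first), one scale and one bias
// per group of `group_size` consecutive weights along K, and every operand
// stored contiguously so batch b starts at offset b * (per-batch size).
std::vector<float> batched_qmv(const std::vector<uint32_t>& w_packed,  // [B, M, K/8]
                               const std::vector<float>& scales,       // [B, M, K/group_size]
                               const std::vector<float>& biases,       // [B, M, K/group_size]
                               const std::vector<float>& x,            // [B, K]
                               int B, int M, int K, int group_size) {
  assert(K % 8 == 0 && K % group_size == 0);
  const int packed_k = K / 8;
  const int groups_k = K / group_size;
  std::vector<float> y(static_cast<size_t>(B) * M, 0.0f);  // [B, M]
  for (int b = 0; b < B; ++b) {
    // Contiguity lets each batch be addressed with a single stride per operand.
    const uint32_t* wb = w_packed.data() + static_cast<size_t>(b) * M * packed_k;
    const float* sb = scales.data() + static_cast<size_t>(b) * M * groups_k;
    const float* bb = biases.data() + static_cast<size_t>(b) * M * groups_k;
    const float* xb = x.data() + static_cast<size_t>(b) * K;
    for (int m = 0; m < M; ++m) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        uint32_t word = wb[static_cast<size_t>(m) * packed_k + k / 8];
        uint32_t q = (word >> (4 * (k % 8))) & 0xFu;       // unpack one 4-bit value
        int g = m * groups_k + k / group_size;              // group index for (m, k)
        float w = sb[g] * static_cast<float>(q) + bb[g];    // affine dequantization
        acc += w * xb[k];
      }
      y[static_cast<size_t>(b) * M + m] = acc;
    }
  }
  return y;
}
```

With contiguous operands, each batch needs only its index and one stride per operand; non-contiguous batches would presumably need per-dimension strides, which is the case the PR description explicitly sets aside.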

Speed up compilation by removing impossible combinations (for example, mxfp4 with bias and a group size other than 16).
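One common way to stop the compiler from generating impossible combinations is to guard template instantiation with a compile-time validity check. The C++17 sketch below illustrates that idea only; the `Mode` enum, the `is_valid` rules, the group-size list and `dispatch_qmv` are all hypothetical. The "mxfp4 takes no bias" rule is modeled loosely on the PR's example, and the {32, 64, 128} group-size whitelist is an arbitrary assumption, not MLX's actual rule set.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical quantization modes (not MLX's real enum).
enum class Mode { Affine = 0, Mxfp4 = 1 };

// Compile-time validity check. "Mxfp4 has no bias" is modeled on the PR's
// example; the {32, 64, 128} group-size whitelist is an illustrative assumption.
constexpr bool is_valid(Mode mode, bool has_bias, int group_size) {
  if (mode == Mode::Mxfp4 && has_bias) return false;
  return group_size == 32 || group_size == 64 || group_size == 128;
}

// Stand-in for a kernel template: returns an id encoding its parameters.
// The static_assert would fire if an invalid combination were ever instantiated.
template <Mode M, bool Bias, int GS>
int qmv_kernel_id() {
  static_assert(is_valid(M, Bias, GS), "invalid combination instantiated");
  return static_cast<int>(M) * 1000 + (Bias ? 100 : 0) + GS;
}

// Tries one candidate. For invalid candidates the inner block is discarded by
// if constexpr, so qmv_kernel_id<M, Bias, GS> is never instantiated.
template <Mode M, bool Bias, int GS>
bool try_dispatch([[maybe_unused]] Mode m, [[maybe_unused]] bool bias,
                  [[maybe_unused]] int gs, [[maybe_unused]] int& out) {
  if constexpr (is_valid(M, Bias, GS)) {
    if (m == M && bias == Bias && gs == GS) {
      out = qmv_kernel_id<M, Bias, GS>();
      return true;
    }
  }
  return false;
}

// Expands over a pack of group sizes for a fixed (mode, bias) pair.
template <Mode M, bool Bias, int... GS>
bool dispatch_gs(Mode m, bool bias, int gs, int& out) {
  return (try_dispatch<M, Bias, GS>(m, bias, gs, out) || ...);
}

// Runtime entry point; returns -1 when no kernel exists for the combination.
int dispatch_qmv(Mode m, bool bias, int gs) {
  int out = -1;
  bool found = dispatch_gs<Mode::Affine, false, 32, 64, 128>(m, bias, gs, out) ||
               dispatch_gs<Mode::Affine, true, 32, 64, 128>(m, bias, gs, out) ||
               dispatch_gs<Mode::Mxfp4, false, 32, 64, 128>(m, bias, gs, out) ||
               dispatch_gs<Mode::Mxfp4, true, 32, 64, 128>(m, bias, gs, out);
  return found ? out : -1;
}

// Counts how many of the 2 x 2 x 3 candidate combinations are actually compiled.
constexpr int count_valid() {
  int n = 0;
  for (Mode m : {Mode::Affine, Mode::Mxfp4})
    for (bool b : {false, true})
      for (int gs : {32, 64, 128})
        n += is_valid(m, b, gs) ? 1 : 0;
  return n;
}
static_assert(count_valid() == 9, "9 of 12 combinations are valid");
```

Under these assumed rules, 9 of the 12 candidate kernels are instantiated; the three mxfp4-with-bias variants are never compiled, and a request for one of them simply returns -1 at runtime.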

angeloskath (Member) left a comment

👍

zcbenz merged commit 5a347b2 into ml-explore:main on Mar 10, 2026
43 of 48 checks passed
zcbenz deleted the qmv-faster-compile branch on March 10, 2026 04:45


2 participants