Documentation for linalg.softmax lowering in lighthouse + IREE attention lowering walkthrough #2
Open
charithaintc wants to merge 42 commits into main
linalg.softmax lowering in lighthouse.
Jianhui-Li
reviewed
Apr 9, 2026
- **Vectorization** via IREE's vector distribution pipeline
- **Mapping to MMA intrinsics** (e.g., MFMA on MI300X) for the two matmuls (Steps 1 and 5)
- **Register-level tiling** and shared memory promotion for GPU targets
- The `scf.for` loop around these ops implements the streaming/online iteration over K/V chunks
| 3a | `P = exp(S - new_max)` | Elementwise | `[16, 16]` |
| 3b | `alpha = exp(old_max - new_max)` | Elementwise | `[16]` |
| 4 | `new_sum = alpha * old_sum + Σ P` | Scale + row reduction | `[16]` |
| 5 | `new_acc = alpha * old_acc + P @ V` | Scale + matmul | `[16, 64]` ← `[16, 16] × [16, 64]` |
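The update steps in the rows above can be sketched in plain Python as a reference model. This is only an illustration, not the IREE lowering: the function names are hypothetical, Step 1 is assumed to be `S = Q @ K^T`, Step 2 is assumed to be `new_max = max(old_max, rowmax(S))` (neither row is shown in the quoted hunk), and no attention scale factor is applied.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def online_softmax_step(q, k_chunk, v_chunk, old_max, old_sum, old_acc):
    """Process one K/V chunk; returns the updated (max, sum, acc) state."""
    # Step 1 (assumed): S = Q @ K_chunk^T
    s = [[dot(q_row, k_row) for k_row in k_chunk] for q_row in q]
    # Step 2 (assumed): new_max = max(old_max, rowmax(S))
    new_max = [max(m, max(row)) for m, row in zip(old_max, s)]
    # Step 3a: P = exp(S - new_max)
    p = [[math.exp(x - m) for x in row] for row, m in zip(s, new_max)]
    # Step 3b: alpha = exp(old_max - new_max); equals 0.0 on the first chunk
    alpha = [math.exp(o - n) for o, n in zip(old_max, new_max)]
    # Step 4: new_sum = alpha * old_sum + row-sum of P
    new_sum = [a * o + sum(p_row) for a, o, p_row in zip(alpha, old_sum, p)]
    # Step 5: new_acc = alpha * old_acc + P @ V_chunk
    v_cols = list(zip(*v_chunk))
    new_acc = [
        [a * x + dot(p_row, v_col) for x, v_col in zip(acc_row, v_cols)]
        for a, acc_row, p_row in zip(alpha, old_acc, p)
    ]
    return new_max, new_sum, new_acc

def streaming_attention(q, k, v, chunk):
    """Loop over K/V chunks (the role played by the scf.for loop)."""
    run_max = [float("-inf")] * len(q)
    run_sum = [0.0] * len(q)
    acc = [[0.0] * len(v[0]) for _ in q]
    for start in range(0, len(k), chunk):
        run_max, run_sum, acc = online_softmax_step(
            q, k[start:start + chunk], v[start:start + chunk],
            run_max, run_sum, acc)
    # Final normalization by the accumulated row sums
    return [[x / s for x in row] for row, s in zip(acc, run_sum)]

def reference_attention(q, k, v):
    """Unchunked softmax(Q @ K^T) @ V for comparison."""
    out = []
    for q_row in q:
        scores = [dot(q_row, k_row) for k_row in k]
        top = max(scores)
        w = [math.exp(x - top) for x in scores]
        total = sum(w)
        out.append([dot(w, v_col) / total for v_col in zip(*v)])
    return out
```

Because each chunk rescales the previous state by `alpha`, the chunked result should match the unchunked one up to floating-point rounding, regardless of the chunk size.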
[MLIR] Fusible Softmax with Following Matrix Multiplication · Issue #1617 · intel-innersource/frame…
describes a high-level idea that tries to decompose softmax into steps 2/3a/3b/4/5' (with V replaced by I, so using P@I instead of P@V), which allows P@V to be fused. Since the last step has the same loop structure, the second GEMM loop could be fused into the softmax. However, I am not sure how linalg tiling/fusion can be enhanced to support this fusion.
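One possible reading of this idea, sketched in plain Python: if the streaming loop computes `softmax(S) @ V`, then passing the identity matrix `I` as `V` yields `softmax(S)` itself, so standalone softmax and the fused softmax + GEMM share one loop structure. The function names are hypothetical and the interpretation of step 5' is an assumption; the linked issue is not quoted here.

```python
import math

def streaming_softmax_matmul(s, v, chunk):
    """softmax(s) @ v, visiting columns of s (rows of v) chunk by chunk."""
    rows, n, d = len(s), len(v), len(v[0])
    run_max = [float("-inf")] * rows
    run_sum = [0.0] * rows
    acc = [[0.0] * d for _ in range(rows)]
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        for i in range(rows):
            s_blk = s[i][start:stop]
            new_max = max(run_max[i], max(s_blk))        # step 2
            p = [math.exp(x - new_max) for x in s_blk]   # step 3a
            alpha = math.exp(run_max[i] - new_max)       # step 3b
            run_sum[i] = alpha * run_sum[i] + sum(p)     # step 4
            for j in range(d):                           # step 5 (5' when v is I)
                pv = sum(p[t] * v[start + t][j] for t in range(stop - start))
                acc[i][j] = alpha * acc[i][j] + pv
            run_max[i] = new_max
    return [[acc[i][j] / run_sum[i] for j in range(d)] for i in range(rows)]

def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def softmax_rows(s):
    """Row-wise softmax computed in one pass, for comparison."""
    out = []
    for row in s:
        top = max(row)
        e = [math.exp(x - top) for x in row]
        total = sum(e)
        out.append([x / total for x in e])
    return out

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

With `v = identity(n)` the loop reproduces `softmax_rows(s)`; with a real `V` the same loop produces `softmax(S) @ V` without materializing the full softmax matrix. Whether linalg tiling and fusion can derive this form automatically is the open question raised in the comment.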
docs/softmax_lowering.md
Outdated
**Notes**
- Sets the layout for anchor xegpu ops. Each Wg consists of [8, 1] subgroups
doing an 8x64 softmax slice.
- Only sets the layout for `store_nd`. Layout propagation does the rest.