Added planar types to speed up complex half precision GEMMs #1142
cliffburdick wants to merge 7 commits into main
Greptile Summary

This PR introduces planar types to speed up complex half-precision GEMMs. The three previously flagged blocking issues are all addressed.

Confidence Score: 5/5

Safe to merge; all three previously blocking issues are resolved and only minor P2 style/performance findings remain.

The three P0/P1 issues flagged in earlier review rounds are all closed:

- EPT vectorization regression in SetOp
- TotalSize correctness for non-contiguous planar views
- c_adj leading-dimension mismatch

The two remaining findings are P2:

- A performance concern: unconditional scalar EPT in ReshapeOp for non-planar types (include/matx/operators/reshape.h, unnecessary vectorization disable for non-planar types)
- A cosmetic issue: dead code in the sparse2dense path (include/matx/transforms/convert/sparse2dense_cusparse.h, dead-code fallback block)

Neither affects correctness of the new planar GEMM feature.
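For readers unfamiliar with the layout this review refers to, here is a minimal host-side sketch of the difference between interleaved and planar complex storage. It assumes "planar" means all real parts stored contiguously, followed by all imaginary parts, and that the plane offset is the distance from the real plane to the imaginary plane. The helper names `to_planar`, `to_interleaved`, and `plane_offset_bytes` are hypothetical and are not MatX APIs; `float` stands in for the fp16/bf16 element types so the example builds with a plain C++ compiler.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: split interleaved values (re, im, re, im, ...) into a
// single buffer holding all real parts followed by all imaginary parts.
std::vector<float> to_planar(const std::vector<std::complex<float>>& in) {
  const std::size_t n = in.size();
  std::vector<float> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();      // real plane: indices [0, n)
    out[n + i] = in[i].imag();  // imaginary plane: indices [n, 2n)
  }
  return out;
}

// Hypothetical helper: inverse of to_planar.
std::vector<std::complex<float>> to_interleaved(const std::vector<float>& planar) {
  assert(planar.size() % 2 == 0);
  const std::size_t n = planar.size() / 2;
  std::vector<std::complex<float>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = std::complex<float>(planar[i], planar[n + i]);
  }
  return out;
}

// Byte distance from the real plane to the imaginary plane for n elements,
// under the contiguous two-plane assumption stated above.
std::size_t plane_offset_bytes(std::size_t n) { return n * sizeof(float); }
```

Under this assumption, a GEMM backend that understands planar storage only needs the base pointer and the plane offset to find both planes, which is why a non-contiguous view makes the size and offset computation more delicate.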
Sequence Diagram

    sequenceDiagram
        participant User
        participant MatMulLaunch
        participant PlanarConv as Planar Conversion
        participant cuBLASLt
        participant CopyBack
        User->>MatMulLaunch: matmul(c, a, b) with fp16/bf16 complex
        MatMulLaunch->>MatMulLaunch: detect a_is_planar / b_is_planar / c_is_planar
        alt A not planar
            MatMulLaunch->>PlanarConv: alloc a_hp, (a_planar = planar(a)).run()
            PlanarConv-->>MatMulLaunch: a_adj.Reset(a_hp)
        else A already planar
            MatMulLaunch->>MatMulLaunch: a_adj unchanged (uses a.Data())
        end
        alt B not planar
            MatMulLaunch->>PlanarConv: alloc b_hp, (b_planar = planar(b)).run()
            PlanarConv-->>MatMulLaunch: b_adj.Reset(b_hp)
        else B already planar
            MatMulLaunch->>MatMulLaunch: b_adj unchanged (uses b.Data())
        end
        alt C not planar
            MatMulLaunch->>PlanarConv: alloc c_hp as planar buffer
            PlanarConv-->>MatMulLaunch: c_adj.Reset(c_hp)
        else C already planar
            MatMulLaunch->>MatMulLaunch: c_adj.Reset(c.Data())
        end
        MatMulLaunch->>cuBLASLt: cublasLtMatmul(a_adj, b_adj, c_adj) with PLANE_OFFSET
        cuBLASLt-->>MatMulLaunch: result in c_adj
        alt C was not planar (c_hp allocated)
            MatMulLaunch->>CopyBack: interleaved(c_hp) → c
        end
        MatMulLaunch-->>User: done
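The flow in the diagram can be approximated on the host as follows. This is only a sketch under stated assumptions: it covers the case where A, B, and C all start out interleaved, it uses `float` instead of fp16/bf16, and a naive triple loop stands in for the `cublasLtMatmul` call. `PlanarMatrix`, `to_planar`, `to_interleaved`, `planar_gemm`, and `matmul` are hypothetical names written for this illustration, not the MatX implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical row-major planar matrix: real plane first, imaginary plane second.
struct PlanarMatrix {
  std::size_t rows, cols;
  std::vector<float> data;  // 2 * rows * cols floats

  PlanarMatrix(std::size_t r, std::size_t c)
      : rows(r), cols(c), data(2 * r * c, 0.0f) {}

  // Element offset from the real plane to the imaginary plane.
  std::size_t plane_offset() const { return rows * cols; }

  float& re(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  float& im(std::size_t i, std::size_t j) { return data[plane_offset() + i * cols + j]; }
  float re(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
  float im(std::size_t i, std::size_t j) const { return data[plane_offset() + i * cols + j]; }
};

// Interleaved -> planar (the "A not planar" / "B not planar" branches).
PlanarMatrix to_planar(const std::vector<cfloat>& m, std::size_t rows, std::size_t cols) {
  assert(m.size() == rows * cols);
  PlanarMatrix p(rows, cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      p.re(i, j) = m[i * cols + j].real();
      p.im(i, j) = m[i * cols + j].imag();
    }
  }
  return p;
}

// Planar -> interleaved (the copy-back step when C was not planar).
std::vector<cfloat> to_interleaved(const PlanarMatrix& p) {
  std::vector<cfloat> m(p.rows * p.cols);
  for (std::size_t i = 0; i < p.rows; ++i) {
    for (std::size_t j = 0; j < p.cols; ++j) {
      m[i * p.cols + j] = cfloat(p.re(i, j), p.im(i, j));
    }
  }
  return m;
}

// Naive planar complex GEMM; stands in for the library call in the diagram.
PlanarMatrix planar_gemm(const PlanarMatrix& a, const PlanarMatrix& b) {
  assert(a.cols == b.rows);
  PlanarMatrix c(a.rows, b.cols);
  for (std::size_t i = 0; i < a.rows; ++i) {
    for (std::size_t j = 0; j < b.cols; ++j) {
      for (std::size_t k = 0; k < a.cols; ++k) {
        // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
        c.re(i, j) += a.re(i, k) * b.re(k, j) - a.im(i, k) * b.im(k, j);
        c.im(i, j) += a.re(i, k) * b.im(k, j) + a.im(i, k) * b.re(k, j);
      }
    }
  }
  return c;
}

// Interleaved in, interleaved out: convert, multiply in planar form, copy back.
// a is m x k, b is k x n, both row-major.
std::vector<cfloat> matmul(const std::vector<cfloat>& a, const std::vector<cfloat>& b,
                           std::size_t m, std::size_t k, std::size_t n) {
  const PlanarMatrix a_planar = to_planar(a, m, k);
  const PlanarMatrix b_planar = to_planar(b, k, n);
  const PlanarMatrix c_planar = planar_gemm(a_planar, b_planar);
  return to_interleaved(c_planar);
}
```

The point of the "already planar" branches in the diagram is that a caller who keeps operands in planar form can skip both conversions and the copy-back, leaving only the GEMM itself.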
Reviews (6): Last reviewed commit: "More changes for affine indexing"
/build
No description provided.