
Added planar types to speed up complex half precision GEMMs #1142

Open
cliffburdick wants to merge 7 commits into main from planar_tensor


Conversation

@cliffburdick
Collaborator

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR introduces matxFp16ComplexPlanar / matxBf16ComplexPlanar marker types and plumbs them through the tensor, GEMM, FFT, and operator layers so that cuBLASLt can directly consume the planar (split real/imag) layout it requires for complex half-precision GEMMs, without a mandatory runtime conversion when the caller pre-allocates planar storage.
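For context on the two layouts: interleaved storage keeps each element's real and imaginary parts adjacent (re0, im0, re1, im1, ...), while planar storage keeps all real parts in one plane followed by all imaginary parts in a second plane. The sketch below shows the conversion on the host. It uses std::complex<float> as a stand-in for MatX's half-precision complex types, and the helper names to_planar / to_interleaved are hypothetical, not part of MatX.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Sketch only: std::complex<T> stands in for matxFp16Complex / matxBf16Complex.
// Planar layout: all n real parts first, then all n imaginary parts, so the
// imaginary plane starts at a fixed offset of n elements from the real plane.
template <typename T>
std::vector<T> to_planar(const std::vector<std::complex<T>>& in) {
  const std::size_t n = in.size();
  std::vector<T> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();      // real plane
    out[n + i] = in[i].imag();  // imaginary plane, offset by n elements
  }
  return out;
}

// Inverse conversion: rebuild interleaved complex values from the two planes.
template <typename T>
std::vector<std::complex<T>> to_interleaved(const std::vector<T>& planar) {
  const std::size_t n = planar.size() / 2;
  std::vector<std::complex<T>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = std::complex<T>(planar[i], planar[n + i]);
  }
  return out;
}
```

A round trip through both helpers returns the original values; only the memory order changes.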

The three previously flagged blocking issues are all addressed: SetOp EPT is now only forced scalar for planar output types, LoadPlanarComplex/StorePlanarComplex are correct because constructor validation enforces contiguity, and c_adj is reset to c.Data() for the planar-C path. Two minor P2 style issues remain.
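The contiguity argument can be illustrated with a small 2D planar view. Assuming, as the references to a TotalSize offset bug suggest, that the imaginary plane sits TotalSize() elements after the real plane, a load or store at element offset `off` touches `base[off]` and `base[TotalSize() + off]`. For a strided slice of a larger buffer, the slice's TotalSize() is not the distance between the planes of the underlying allocation, so such views must be rejected. PlanarView2D, validate, load, store and Proxy are hypothetical names loosely modelled on ValidatePlanarLayoutOnCreate_, LoadPlanarComplex/StorePlanarComplex and PlanarComplexProxy; this is not MatX's code.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <stdexcept>

// Hypothetical 2D planar view (not MatX's implementation).
template <typename T>
struct PlanarView2D {
  T* base;                            // start of the real plane
  std::array<std::size_t, 2> shape;   // {rows, cols}
  std::array<std::size_t, 2> stride;  // element strides {row, col}

  std::size_t total_size() const { return shape[0] * shape[1]; }

  // The imaginary plane is only at base + total_size() if the view is packed:
  // unit innermost stride and rows laid out back to back.
  void validate() const {
    if (stride[1] != 1 || stride[0] != shape[1]) {
      throw std::invalid_argument("planar view must be contiguous");
    }
  }

  std::size_t offset(std::size_t r, std::size_t c) const {
    return r * stride[0] + c * stride[1];
  }

  std::complex<T> load(std::size_t r, std::size_t c) const {
    const std::size_t off = offset(r, c);
    return {base[off], base[total_size() + off]};
  }

  void store(std::size_t r, std::size_t c, std::complex<T> v) const {
    const std::size_t off = offset(r, c);
    base[off] = v.real();
    base[total_size() + off] = v.imag();
  }

  // Reference-like proxy: planar storage has no single complex object to
  // hand out a reference to, so writes are routed through store().
  struct Proxy {
    const PlanarView2D* view;
    std::size_t r, c;
    Proxy& operator=(std::complex<T> v) { view->store(r, c, v); return *this; }
    operator std::complex<T>() const { return view->load(r, c); }
  };

  Proxy operator()(std::size_t r, std::size_t c) const { return {this, r, c}; }
};
```

With a 2x3 view over a 12-element buffer, writing `v(1, 2)` updates element 5 of the real plane and element 11 (6 + 5) of the imaginary plane, while a 2x2 slice with row stride 3 fails validation.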

Confidence Score: 5/5

Safe to merge: all three previously blocking issues are resolved, and only minor P2 style/performance findings remain.

The three P0/P1 issues flagged in earlier review rounds (EPT vectorization regression in SetOp, TotalSize correctness for non-contiguous planar views, c_adj leading-dimension mismatch) are all closed. The two remaining findings are P2: one is a performance concern (unconditional scalar EPT in ReshapeOp for non-planar types), the other is cosmetic (dead code in sparse2dense path). Neither affects correctness of the new planar GEMM feature.

Files needing attention: include/matx/operators/reshape.h (vectorization disabled unnecessarily for non-planar types) and include/matx/transforms/convert/sparse2dense_cusparse.h (dead-code fallback block).
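A minimal sketch of the difference between the SetOp fix and the ReshapeOp finding, using a hypothetical EPT enum and EPTRange struct in place of MatX's elements-per-thread capability query. The planar flag is a template parameter here; in the PR it would presumably come from a trait such as is_planar_complex_v applied to the output value type. The assumption behind forcing scalar access for planar output is that a planar element's two halves live in separate planes, so they cannot be moved with one vector load or store.

```cpp
#include <cassert>

// Hypothetical stand-ins for MatX's elements-per-thread capability.
enum class EPT { ONE = 1, TWO = 2, FOUR = 4, EIGHT = 8 };

struct EPTRange {
  EPT min;
  EPT max;
};

// SetOp-style (after the fix): force {ONE, ONE} only for planar output.
// Assumption: real and imaginary parts sit in different planes, so a
// vectorized access would not cover whole complex elements.
template <bool OutIsPlanar>
constexpr EPTRange set_op_ept(EPTRange from_input) {
  if constexpr (OutIsPlanar) {
    return {EPT::ONE, EPT::ONE};
  } else {
    return from_input;  // keep whatever vector width the input supports
  }
}

// ReshapeOp-style as described in the review: {ONE, ONE} for every type,
// which also turns off vectorization for float / complex float reshapes.
constexpr EPTRange reshape_op_ept_unconditional(EPTRange /*from_input*/) {
  return {EPT::ONE, EPT::ONE};
}
```

Applying the same compile-time condition used by set_op_ept to the reshape path is the kind of change the P2 finding points toward.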

Important Files Changed

| Filename | Overview |
|---|---|
| include/matx/core/half_complex.h | Adds matxFp16ComplexPlanar and matxBf16ComplexPlanar as thin marker types inheriting from their interleaved counterparts for tag-based dispatch |
| include/matx/core/tensor_impl.h | Adds PlanarComplexProxy for reference-like write semantics and LoadPlanarComplex/StorePlanarComplex; correctness relies on contiguity enforced by new constructor validation |
| include/matx/core/tensor.h | Adds ValidatePlanarLayoutOnCreate_() in all constructors and Reset overloads to enforce unit innermost stride and contiguity for planar types, closing the TotalSize offset bug |
| include/matx/transforms/matmul/matmul_cuda.h | Detects pre-allocated planar A/B/C inputs; skips conversion for already-planar tensors; c_adj.Reset(c.Data()) fixes leading-dimension mismatch for planar-C path |
| include/matx/operators/set.h | EPT capability now forces {ONE,ONE} only when the output value type is a planar complex type, preserving vectorization for all other SetOp uses |
| include/matx/operators/reshape.h | Forces {ONE,ONE} ELEMENTS_PER_THREAD unconditionally for all ReshapeOp, disabling vectorization even for non-planar types; may regress performance for common float/complex float reshape kernels |
| include/matx/transforms/convert/sparse2dense_cusparse.h | Replaces hard assert with unreachable fallback block; getS2DSupportedTensor() already guarantees unit innermost stride so the new runtime branch is dead code |
| include/matx/transforms/fft/fft_cuda.h | Fixes copy-paste bug: is_complex_half_v |
| include/matx/core/type_utils_both.h | Extends is_complex_v, is_complex_half_v, is_bf16_type_v, is_fp16_type_v traits to include planar types; adds is_planar_complex_v |
| include/matx/operators/interleaved.h | Adds InnerOp() accessor and round-trip convenience overloads so interleaved(planar(x)) and planar(interleaved(x)) collapse back to the original operator |
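The marker-type, trait, and round-trip changes listed for half_complex.h, type_utils_both.h and interleaved.h can be sketched together. Fp16Complex and Fp16ComplexPlanar are simplified stand-ins for matxFp16Complex and matxFp16ComplexPlanar (the real types hold __half values). PlanarOp, InterleavedOp, GemmPath and gemm_path() are hypothetical, and planar()/interleaved() here are toy stand-ins for MatX's operators, shown only to illustrate tag-based dispatch and the round-trip collapse.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins: the planar type adds no data, it only changes the type (the tag).
struct Fp16Complex { unsigned short re{}, im{}; };
struct Fp16ComplexPlanar : Fp16Complex {};

// New trait: true only for the planar marker type.
template <typename T> struct is_planar_complex : std::false_type {};
template <> struct is_planar_complex<Fp16ComplexPlanar> : std::true_type {};
template <typename T>
inline constexpr bool is_planar_complex_v = is_planar_complex<T>::value;

// Existing-style trait extended so the planar marker still counts as complex half.
template <typename T> struct is_complex_half : std::false_type {};
template <> struct is_complex_half<Fp16Complex> : std::true_type {};
template <> struct is_complex_half<Fp16ComplexPlanar> : std::true_type {};
template <typename T>
inline constexpr bool is_complex_half_v = is_complex_half<T>::value;

// Tag dispatch on the element type alone.
enum class GemmPath { UsePlanarDirectly, ConvertToPlanar, NotComplexHalf };

template <typename T>
constexpr GemmPath gemm_path() {
  if constexpr (is_planar_complex_v<T>) {
    return GemmPath::UsePlanarDirectly;
  } else if constexpr (is_complex_half_v<T>) {
    return GemmPath::ConvertToPlanar;
  } else {
    return GemmPath::NotComplexHalf;
  }
}

// Toy wrappers with an InnerOp() accessor.
template <typename Op> struct PlanarOp {
  Op inner;
  const Op& InnerOp() const { return inner; }
};
template <typename Op> struct InterleavedOp {
  Op inner;
  const Op& InnerOp() const { return inner; }
};

template <typename Op> PlanarOp<Op> planar(const Op& op) { return {op}; }
template <typename Op> InterleavedOp<Op> interleaved(const Op& op) { return {op}; }

// More specialized overloads: a round trip unwraps instead of nesting.
template <typename Op> Op interleaved(const PlanarOp<Op>& op) { return op.InnerOp(); }
template <typename Op> Op planar(const InterleavedOp<Op>& op) { return op.InnerOp(); }
```

Because the planar marker inherits from the interleaved type, code that only needs the element's data can treat both alike, while the separate trait lets the GEMM and FFT paths branch on layout at compile time.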

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant MatMulLaunch
    participant PlanarConv as Planar Conversion
    participant cuBLASLt
    participant CopyBack

    User->>MatMulLaunch: matmul(c, a, b) with fp16/bf16 complex
    MatMulLaunch->>MatMulLaunch: detect a_is_planar / b_is_planar / c_is_planar
    alt A not planar
        MatMulLaunch->>PlanarConv: alloc a_hp, (a_planar = planar(a)).run()
        PlanarConv-->>MatMulLaunch: a_adj.Reset(a_hp)
    else A already planar
        MatMulLaunch->>MatMulLaunch: a_adj unchanged (uses a.Data())
    end
    alt B not planar
        MatMulLaunch->>PlanarConv: alloc b_hp, (b_planar = planar(b)).run()
        PlanarConv-->>MatMulLaunch: b_adj.Reset(b_hp)
    else B already planar
        MatMulLaunch->>MatMulLaunch: b_adj unchanged (uses b.Data())
    end
    alt C not planar
        MatMulLaunch->>PlanarConv: alloc c_hp as planar buffer
        PlanarConv-->>MatMulLaunch: c_adj.Reset(c_hp)
    else C already planar
        MatMulLaunch->>MatMulLaunch: c_adj.Reset(c.Data())
    end
    MatMulLaunch->>cuBLASLt: cublasLtMatmul(a_adj, b_adj, c_adj) with PLANE_OFFSET
    cuBLASLt-->>MatMulLaunch: result in c_adj
    alt C was not planar (c_hp allocated)
        MatMulLaunch->>CopyBack: interleaved(c_hp) → c
    end
    MatMulLaunch-->>User: done
```
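The control flow in the diagram can be mirrored in a host-only sketch. A naive triple loop stands in for cublasLtMatmul, and each operand's imaginary plane is assumed to start rows*cols elements after its real plane, playing the role of the plane offset the diagram passes to cuBLASLt. Matrix, planar_ptr, planar_gemm and matmul are hypothetical names, and float replaces half precision so the example runs without CUDA.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Host-only sketch of the diagram's flow; not MatX's code.
using cf = std::complex<float>;

// Hypothetical row-major matrix that is either interleaved or planar.
// Planar storage: real plane [0, rows*cols), imaginary plane right after it.
struct Matrix {
  std::size_t rows, cols;
  bool is_planar;
  std::vector<float> planar;  // used when is_planar
  std::vector<cf> inter;      // used otherwise
};

// Already planar: return the matrix's own storage (no conversion).
// Otherwise convert into `scratch` (the a_hp / b_hp buffers in the diagram).
const float* planar_ptr(const Matrix& m, std::vector<float>& scratch) {
  if (m.is_planar) return m.planar.data();
  const std::size_t n = m.rows * m.cols;
  scratch.resize(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    scratch[i] = m.inter[i].real();
    scratch[n + i] = m.inter[i].imag();
  }
  return scratch.data();
}

// Naive planar complex GEMM: C (m x n) = A (m x k) * B (k x n).
void planar_gemm(const float* a, const float* b, float* c,
                 std::size_t m, std::size_t k, std::size_t n) {
  const std::size_t a_off = m * k, b_off = k * n, c_off = m * n;  // plane offsets
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float re = 0.f, im = 0.f;
      for (std::size_t p = 0; p < k; ++p) {
        const float ar = a[i * k + p], ai = a[a_off + i * k + p];
        const float br = b[p * n + j], bi = b[b_off + p * n + j];
        re += ar * br - ai * bi;
        im += ar * bi + ai * br;
      }
      c[i * n + j] = re;
      c[c_off + i * n + j] = im;
    }
  }
}

void matmul(Matrix& c, const Matrix& a, const Matrix& b) {
  std::vector<float> a_hp, b_hp, c_hp;  // filled only when conversion is needed
  const float* a_adj = planar_ptr(a, a_hp);
  const float* b_adj = planar_ptr(b, b_hp);
  // Planar C is written in place; otherwise a temporary planar buffer is used.
  std::vector<float>& c_buf = c.is_planar ? c.planar : c_hp;
  c_buf.resize(2 * c.rows * c.cols);
  planar_gemm(a_adj, b_adj, c_buf.data(), a.rows, a.cols, b.cols);
  if (!c.is_planar) {  // copy back: interleaved(c_hp) -> c
    const std::size_t n = c.rows * c.cols;
    c.inter.resize(n);
    for (std::size_t i = 0; i < n; ++i) c.inter[i] = cf(c_hp[i], c_hp[n + i]);
  }
}
```

For example, with A = [1+1i, 2] stored interleaved and B = [1; 1i] stored planar, both an interleaved and a planar C end up holding 1+3i; only A and an interleaved C go through conversion.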


Reviews (6): Last reviewed commit: "More changes for affine indexing"

@cliffburdick
Collaborator Author

/build

1 similar comment
@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

1 similar comment
@cliffburdick
Collaborator Author

/build


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant