Added planar types to speed up complex half precision GEMMs by cliffburdick · Pull Request #1142 · NVIDIA/MatX

cliffburdick · 2026-03-19T20:08:30Z

No description provided.

copy-pr-bot · 2026-03-19T20:08:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-03-19T20:15:43Z

Greptile Summary

This PR introduces matxFp16ComplexPlanar / matxBf16ComplexPlanar marker types and plumbs them through the tensor, GEMM, FFT, and operator layers so that cuBLASLt can consume the planar (split real/imag) layout it requires for complex half-precision GEMMs directly, without a mandatory runtime conversion when the caller pre-allocates planar storage.
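To make the layout difference concrete, here is a minimal host-side C++ sketch (not MatX code) of the interleaved-to-planar conversion that the GEMM path otherwise has to perform at runtime. It uses `float` in place of fp16/bf16 because standard C++ has no portable half type. `to_planar`, `to_interleaved`, and `plane_offset_bytes` are hypothetical helpers, and the imaginary plane is assumed to start exactly N elements after the real plane, expressed as a byte offset in the spirit of cuBLASLt's `CUBLASLT_MATRIX_LAYOUT_PLANE_OFFSET` attribute.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Interleaved: re0, im0, re1, im1, ...
// Planar:      re0, re1, ..., re(n-1), im0, im1, ..., im(n-1)
// float stands in for fp16/bf16 here; the layout logic is the same.

// Hypothetical helper: split an interleaved buffer into one allocation
// holding the real plane followed by the imaginary plane.
std::vector<float> to_planar(const std::vector<std::complex<float>>& in) {
  const std::size_t n = in.size();
  std::vector<float> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();      // real plane
    out[n + i] = in[i].imag();  // imaginary plane starts n elements later
  }
  return out;
}

// Hypothetical helper: inverse of to_planar (the "copy back" step for C).
std::vector<std::complex<float>> to_interleaved(const std::vector<float>& planar) {
  const std::size_t n = planar.size() / 2;
  std::vector<std::complex<float>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = {planar[i], planar[n + i]};
  }
  return out;
}

// Byte distance from the start of the real plane to the start of the
// imaginary plane for an n-element planar buffer of scalar type T.
template <typename T>
constexpr std::size_t plane_offset_bytes(std::size_t n) { return n * sizeof(T); }
```

A round trip through both helpers returns the original values, which mirrors the `interleaved(planar(x))` round-trip described for interleaved.h.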

The three previously flagged blocking issues are all addressed: SetOp's elements-per-thread (EPT) capability is now forced to scalar only for planar output types; LoadPlanarComplex/StorePlanarComplex are correct because constructor validation enforces contiguity; and c_adj is reset to c.Data() for the planar-C path. Two minor P2 issues remain (one performance, one style).
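The conditional EPT fix follows a tag-dispatch pattern that can be sketched in isolation. This is not the MatX implementation: `Fp16ComplexStandIn`, `Fp16ComplexPlanarStandIn`, `is_planar_complex_v`, `kMaxVectorWidth`, and `elements_per_thread` are hypothetical stand-ins, and the sketch assumes (as the summary implies) that planar elements should not be read as one contiguous vector because each value is gathered from two separate planes.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for an interleaved complex-half type (the real MatX types differ).
struct Fp16ComplexStandIn { unsigned short re, im; };

// Planar marker: same value representation, distinct type for dispatch.
struct Fp16ComplexPlanarStandIn : Fp16ComplexStandIn {};

// Trait identifying planar marker types (cv-qualifiers ignored).
template <typename T> struct is_planar_complex : std::false_type {};
template <> struct is_planar_complex<Fp16ComplexPlanarStandIn> : std::true_type {};
template <typename T>
inline constexpr bool is_planar_complex_v =
    is_planar_complex<std::remove_cv_t<T>>::value;

// Hypothetical vector width used when vectorization is allowed.
inline constexpr std::size_t kMaxVectorWidth = 4;

// Only planar output types fall back to one element per thread;
// every other type keeps the vectorized width.
template <typename OutT>
constexpr std::size_t elements_per_thread() {
  return is_planar_complex_v<OutT> ? 1 : kMaxVectorWidth;
}
```

Because a template specialization written for a base class does not match a derived class, traits such as `is_complex_v` have to be extended explicitly for the marker types, which is consistent with the change listed for type_utils_both.h.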

Confidence Score: 5/5

Safe to merge: all three previously blocking issues are resolved, and only minor P2 style/performance findings remain.

The three P0/P1 issues flagged in earlier review rounds (EPT vectorization regression in SetOp, TotalSize correctness for non-contiguous planar views, c_adj leading-dimension mismatch) are all closed. The two remaining findings are P2: one is a performance concern (unconditional scalar EPT in ReshapeOp for non-planar types), the other is cosmetic (dead code in sparse2dense path). Neither affects correctness of the new planar GEMM feature.

Files worth a closer look: include/matx/operators/reshape.h (unnecessary vectorization disable for non-planar types) and include/matx/transforms/convert/sparse2dense_cusparse.h (dead-code fallback block).

Important Files Changed

Filename	Overview
include/matx/core/half_complex.h	Adds matxFp16ComplexPlanar and matxBf16ComplexPlanar as thin marker types inheriting from their interleaved counterparts for tag-based dispatch
include/matx/core/tensor_impl.h	Adds PlanarComplexProxy for reference-like write semantics and LoadPlanarComplex/StorePlanarComplex; correctness relies on contiguity enforced by new constructor validation
include/matx/core/tensor.h	Adds ValidatePlanarLayoutOnCreate_() in all constructors and Reset overloads to enforce unit innermost stride and contiguity for planar types, closing the TotalSize offset bug
include/matx/transforms/matmul/matmul_cuda.h	Detects pre-allocated planar A/B/C inputs; skips conversion for already-planar tensors; c_adj.Reset(c.Data()) fixes leading-dimension mismatch for planar-C path
include/matx/operators/set.h	EPT capability now conditionally forces {ONE,ONE} only when the output value type is a planar complex type, preserving vectorization for all other SetOp uses
include/matx/operators/reshape.h	Forces {ONE,ONE} ELEMENTS_PER_THREAD unconditionally for every ReshapeOp, disabling vectorization even for non-planar types; this may regress performance for common float/complex float reshape kernels
include/matx/transforms/convert/sparse2dense_cusparse.h	Replaces hard assert with unreachable fallback block; getS2DSupportedTensor() already guarantees unit innermost stride, so the new runtime branch is dead code
include/matx/transforms/fft/fft_cuda.h	Fixes copy-paste bug: is_complex_half_v
include/matx/core/type_utils_both.h	Extends is_complex_v, is_complex_half_v, is_bf16_type_v, is_fp16_type_v traits to include planar types; adds is_planar_complex_v
include/matx/operators/interleaved.h	Adds InnerOp() accessor and round-trip convenience overloads so interleaved(planar(x)) and planar(interleaved(x)) collapse back to the original operator
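Two ideas from the table, the constructor-time layout check and proxy-based planar access, can be sketched together. Everything here is hypothetical: `is_contiguous_row_major`, `PlanarRef`, and `PlanarView` are illustrative names, `float` stands in for half precision, and the imaginary plane is assumed to sit `total_size()` elements after the real plane. That assumption is why contiguity has to be validated: for a strided view the element count no longer matches the span of the real plane, so `base + total_size()` would point at the wrong memory.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <stdexcept>

// Row-major contiguity: innermost stride 1, and each outer stride equals the
// product of the inner extents.
template <std::size_t R>
bool is_contiguous_row_major(const std::array<std::size_t, R>& shape,
                             const std::array<std::size_t, R>& strides) {
  std::size_t expected = 1;
  for (std::size_t d = R; d-- > 0;) {  // innermost dimension first
    if (strides[d] != expected) return false;
    expected *= shape[d];
  }
  return true;
}

// Reference-like proxy: reads combine the two planes, writes split into them.
class PlanarRef {
 public:
  PlanarRef(float* re, float* im) : re_(re), im_(im) {}
  operator std::complex<float>() const { return {*re_, *im_}; }
  PlanarRef& operator=(std::complex<float> v) {
    *re_ = v.real();
    *im_ = v.imag();
    return *this;
  }
 private:
  float* re_;
  float* im_;
};

// Minimal 2-D planar view over one buffer of 2 * rows * cols floats.
class PlanarView {
 public:
  PlanarView(float* data, std::array<std::size_t, 2> shape,
             std::array<std::size_t, 2> strides)
      : data_(data), shape_(shape), strides_(strides) {
    // Reject layouts where "imaginary plane = base + total_size()" is wrong.
    if (!is_contiguous_row_major(shape, strides))
      throw std::invalid_argument("planar view must be contiguous");
  }
  std::size_t total_size() const { return shape_[0] * shape_[1]; }
  PlanarRef operator()(std::size_t i, std::size_t j) {
    const std::size_t off = i * strides_[0] + j * strides_[1];
    return PlanarRef(data_ + off, data_ + total_size() + off);
  }
 private:
  float* data_;
  std::array<std::size_t, 2> shape_;
  std::array<std::size_t, 2> strides_;
};
```

Writing `v(i, j) = z` through the proxy updates both planes at once, which is the reference-like write semantics the table attributes to PlanarComplexProxy.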

Sequence Diagram

sequenceDiagram
    participant User
    participant MatMulLaunch
    participant PlanarConv as Planar Conversion
    participant cuBLASLt
    participant CopyBack

    User->>MatMulLaunch: matmul(c, a, b) with fp16/bf16 complex
    MatMulLaunch->>MatMulLaunch: detect a_is_planar / b_is_planar / c_is_planar
    alt A not planar
        MatMulLaunch->>PlanarConv: alloc a_hp, (a_planar = planar(a)).run()
        PlanarConv-->>MatMulLaunch: a_adj.Reset(a_hp)
    else A already planar
        MatMulLaunch->>MatMulLaunch: a_adj unchanged (uses a.Data())
    end
    alt B not planar
        MatMulLaunch->>PlanarConv: alloc b_hp, (b_planar = planar(b)).run()
        PlanarConv-->>MatMulLaunch: b_adj.Reset(b_hp)
    else B already planar
        MatMulLaunch->>MatMulLaunch: b_adj unchanged (uses b.Data())
    end
    alt C not planar
        MatMulLaunch->>PlanarConv: alloc c_hp as planar buffer
        PlanarConv-->>MatMulLaunch: c_adj.Reset(c_hp)
    else C already planar
        MatMulLaunch->>MatMulLaunch: c_adj.Reset(c.Data())
    end
    MatMulLaunch->>cuBLASLt: cublasLtMatmul(a_adj, b_adj, c_adj) with PLANE_OFFSET
    cuBLASLt-->>MatMulLaunch: result in c_adj
    alt C was not planar (c_hp allocated)
        MatMulLaunch->>CopyBack: interleaved(c_hp) → c
    end
    MatMulLaunch-->>User: done
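The reason planar storage suits a GEMM library is that each plane is an ordinary real matrix, so the complex product decomposes into real products: (Ar + iAi)(Br + iBi) = (Ar·Br − Ai·Bi) + i(Ar·Bi + Ai·Br). The host-side reference below shows only that arithmetic. It is a sketch under stated assumptions, not cuBLASLt's kernel: `planar_gemm` is a hypothetical name, `float` replaces fp16/bf16, matrices are row-major, and accumulation precision is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Planar buffers: real plane first, imaginary plane immediately after.
// A is m x k, B is k x n, C is m x n, all row-major.
// Computes C = A * B as
//   Cr = Ar*Br - Ai*Bi
//   Ci = Ar*Bi + Ai*Br
void planar_gemm(std::size_t m, std::size_t n, std::size_t k,
                 const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c) {
  const float* ar = a.data();
  const float* ai = a.data() + m * k;  // plane offset of A
  const float* br = b.data();
  const float* bi = b.data() + k * n;  // plane offset of B
  c.assign(2 * m * n, 0.0f);
  float* cr = c.data();
  float* ci = c.data() + m * n;        // plane offset of C
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float re = 0.0f, im = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        const float xr = ar[i * k + p], xi = ai[i * k + p];
        const float yr = br[p * n + j], yi = bi[p * n + j];
        re += xr * yr - xi * yi;
        im += xr * yi + xi * yr;
      }
      cr[i * n + j] = re;
      ci[i * n + j] = im;
    }
  }
}
```

In the flow above, the "Planar Conversion" and "CopyBack" steps exist only to produce and consume buffers in this layout; when A, B, or C is already planar, the corresponding step is skipped.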

Reviews (6): Last reviewed commit: "More changes for affine indexing"

cliffburdick · 2026-03-19T21:04:14Z

/build

cliffburdick · 2026-03-20T15:41:57Z

/build

cliffburdick · 2026-03-20T21:05:22Z

/build

cliffburdick · 2026-04-03T16:16:17Z

/build

cliffburdick · 2026-04-06T22:43:02Z

/build

cliffburdick · 2026-04-08T16:35:55Z

/build

cliffburdick added 2 commits March 19, 2026 13:04

Added planar types to speed up complex half precision GEMMs

33ec90f

Cleanup

2507608

greptile-apps bot reviewed Mar 19, 2026

include/matx/operators/set.h (resolved)

include/matx/core/tensor_impl.h (resolved)

include/matx/transforms/matmul/matmul_cuda.h (resolved)

cliffburdick added 2 commits March 19, 2026 13:29

Code review updates

c47a6cc

Code review updates

59d5320

Compilation error

de287c9

Fix failing sparse and reshape unit tests

4da48da

More changes for affine indexing

10902a4

cliffburdick wants to merge 7 commits into main from planar_tensor

1 participant
