LLVM and SPIRV-LLVM-Translator pulldown (WW15 2026) #21723

Draft
iclsrc wants to merge 3218 commits into sycl from llvmspirv_pulldown

@iclsrc iclsrc commented Apr 10, 2026

rampitec and others added 30 commits March 27, 2026 15:20
These are previously covered by AMDGPUWmmaIntrinsicModsAllReuse.
  CONFLICT (content): Merge conflict in clang/include/clang/Basic/DiagnosticSemaKinds.td
As proposed in
riscv-non-isa/riscv-c-api-doc#110.

No real compiler-rt implementation as Linux does not list these
extensions in hwprobe.

Signed-off-by: Luke Wren <wren6991@gmail.com>
…yout (#188139)

fixes #188131

This change addresses the stylistic changes @bogners requested in
llvm/llvm-project#186215. It also adds
`storeMatrixArrayFromVector` to SPIRVLegalizePointerCast.cpp for the
case where we detect the matrix array of vector memory layout.
The changes to storeArrayFromVector are cleanup only.

Assisted-by: GitHub Copilot for test case check lines
…#188896)

When SPIRV-LLVM-Translator is built in-tree (i.e., placed in
llvm/projects folder), llvm-spirv target exists.

Drop legacy llvm-spirv_target dependency (was for non-runtime build) and
add llvm-spirv to runtimes dependencies.
Get rid of several .h.def files which were used to ensure that the
macro definitions from llvm-libc-macro would be included in the public
header. Replace this logic with YAML instead - add entries to the
"macros" list that point to the correct "macro_header" to ensure it
would be included.

For C standard library headers, list several standard-defined macros
to document their availability. For POSIX/Linux headers, only reference
a handful of macros, since more planning is needed to decide how to
represent platform-specific macros in YAML.
…123)

Use the generic switch rather than encoding the version number it
currently corresponds to.
… for risc-v (#110690)

The code generated for calls with FPCC-eligible structs as arguments
does not account for bitfields, which results in a store that crosses
the boundary of the memory allocated by alloca.
For example, for the code:
```
struct __attribute__((packed, aligned(1))) S {
   const float  f0;
   unsigned f1 : 1;
};
unsigned  func(struct S  arg)
{
    return arg.f1;
} 
```
The generated IR is:
```
 define dso_local signext i32 @func(
 float [[TMP0:%.*]], i32 [[TMP1:%.*]]) #[[ATTR0:[0-9]+]] {
  [[ENTRY:.*:]]
    [[ARG:%.*]] = alloca [[STRUCT_S:%.*]], align 1
    [[TMP2:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 0
    store float [[TMP0]], ptr [[TMP2]], align 1
    [[TMP3:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 1
    store i32 [[TMP1]], ptr [[TMP3]], align 1
    [[F1:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[ARG]], i32 0, i32 1
    [[BF_LOAD:%.*]] = load i8, ptr [[F1]], align 1
    [[BF_CLEAR:%.*]] = and i8 [[BF_LOAD]], 1
    [[BF_CAST:%.*]] = zext i8 [[BF_CLEAR]] to i32
    ret i32 [[BF_CAST]]
```
Here, `store i32 [[TMP1]], ptr [[TMP3]], align 1` can be seen crossing
the boundary of the allocated memory. After optimizations
(EarlyCSEPass), the remaining IR is:
```
 define dso_local noundef signext i32 @func(
 float [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
  [[ENTRY:.*:]]
    ret i32 0
```
The patch trims the second member of the struct after taking into
consideration the bitwidth to decide the appropriate integer type and
the test shows the results of this patch.

Note that the bug is seen only when `f` extension is enabled for FPCC
eligibility.

Co-authored-by: muhammad.kamran4 <muhammad.kamran@esperantotech.com>
…697)

Device libs has a fast sqrt macro implemented this way.
Add tests targeting assembly printing and miscellaneous CodeGen areas
with low coverage:

- asm-printer-cpool.ll: HexagonAsmPrinter exercising constant pool entry
emission.

- asm-operand-modifiers.ll: Inline asm operand modifier printing paths
(lo/hi/mem).

- target-objfile-sdata.ll, split-double-volatile.ll, reg-info-types.ll:
Miscellaneous CodeGen coverage for HexagonTargetObjectFile small data
classification, HexagonSplitDouble volatile load handling, and
HexagonRegisterInfo register class queries.

- constext-store-imm.ll: HexagonConstExtenders store-immediate
optimization paths.
This removes dyn_cast invocations where the argument is already of the
target type (including through subtyping). This was created by adding a
static assert in dyn_cast and letting an LLM iterate until the code base
compiled. I then went through each example and cleaned it up. This does
not commit the static assert in dyn_cast, because it would prevent a lot
of uses in templated code. To prevent backsliding we should instead add
an LLVM aware version of
https://clang.llvm.org/extra/clang-tidy/checks/readability/redundant-casting.html
(or expand the existing one).
  CONFLICT (content): Merge conflict in llvm/lib/IR/DiagnosticInfo.cpp
The test used to look fine, but it was not. A WeakVH simply nulls itself
once the value it points to is replaced, so a zero value was used because
VarIndex became null, and the test checks still appeared to pass.

Only WeakTrackingVH can be updated to follow the new value.

Change the test slightly so that using a zero index is detectably wrong.
Previously, extra single quote marks were generated around the outer
braces (i.e., `'{'` `6442:\220,1\22` `'}'`). The SPIR-V backend does not
expect that; it expects `{6442:\220,1\22}`.
… device (#189140)

[Driver][HIP] Fix bundled -S emitting bitcode instead of assembly for
device

PR #188262 added support for bundling HIP -S output under the new
offload driver, but the device backend still entered the
bitcode-emitting path in ConstructPhaseAction. The condition at the
Backend phase checked for the new offload driver and directed device
code to emit TY_LLVM_BC, without excluding the -S case. This caused
the device section in the bundled .s to contain LLVM bitcode instead
of textual AMDGPU assembly.

This broke the HIP UT CheckCodeObjAttr test which greps
copyKernel.s for "uniform_work_group_size" — a string that only
appears in textual assembly, not in bitcode.

Fix by excluding -S (without -emit-llvm) from the new-driver
bitcode path, so the device backend falls through to emit TY_PP_Asm
(textual assembly). Also add a missing lit test check that the
device backend produces assembler output for the bundled -S case.
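
The condition described above can be modelled as a small predicate. This is only a sketch of the decision for device code under the new offload driver; the function name and its boolean parameters are hypothetical, and the real logic in ConstructPhaseAction handles many more cases.

```python
def device_backend_output(dash_s: bool, emit_llvm: bool) -> str:
    """Hypothetical model of the device output type at the Backend phase.

    Assumes the new offload driver is in use. Only the branch discussed in
    the commit message is modelled here.
    """
    # -S without -emit-llvm asks for textual assembly, so it must not
    # take the bitcode path.
    wants_textual_asm = dash_s and not emit_llvm
    if wants_textual_asm:
        return "TY_PP_Asm"
    return "TY_LLVM_BC"
```

Before the fix, the `wants_textual_asm` exclusion was effectively missing, so the bundled `-S` case also produced `TY_LLVM_BC`.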

Fixes: LCOMPILER-553
…aries (#189044)

We only did this for local variables but were missing it for
globals.
…ardOperands API to BranchOpInterface (#187864)

To simplify the output of the reduction-tree pass, this PR introduces
eraseRedundantBlocksInRegion. For regions containing multiple
execution paths, this functionality selects the shortest 'interesting'
path. Additionally, this PR adds the getSuccessorForwardOperands API to
BranchOpInterface. This allows us to extract the ForwardOperands for a
specific path chosen from multiple alternatives, enabling the creation
of a cf.br operation for the redirected jump.
…on index (#188508)

When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into
the static position of a vector.insert op,
foldDenseElementsAttrDestInsertOp would proceed to call
calculateInsertPosition, which returned -1. The subsequent iterator
arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing
an assertion in DenseElementsAttr::get.

Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any
static position equals kPoisonIndex, consistent with how
InsertChainFullyInitialized already guards this case.
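
A minimal sketch of the early bail-out, written in Python rather than the C++ of the actual fold. `fold_insert` and its parameters are hypothetical stand-ins; only the sentinel value -1 and the idea of refusing to fold come from the description above.

```python
from typing import List, Optional, Sequence

K_POISON_INDEX = -1  # mirrors the kPoisonIndex sentinel named above


def fold_insert(dest: Sequence[int], shape: Sequence[int],
                position: Sequence[int], value: int) -> Optional[List[int]]:
    """Hypothetical fold: return the updated flat constant, or None to bail out."""
    # Guard: a poison index has no valid linear offset, so do not fold.
    if any(p == K_POISON_INDEX for p in position):
        return None
    # Row-major linearization of the static position.
    offset = 0
    for p, dim in zip(position, shape):
        offset = offset * dim + p
    folded = list(dest)
    folded[offset] = value
    return folded
```

For a 2x3 constant of zeros, `fold_insert([0] * 6, (2, 3), (1, 2), 7)` writes to flat offset 5, while a position of `(1, -1)` returns `None` instead of computing an offset from the sentinel.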

Fixes #188404

Assisted-by: Claude Code
…nt (#189163)

When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.

When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.

A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.

Fixes #128321
Fixes #128325

Assisted-by: Claude Code
… (#188064)

This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).

```python
def result(
    *,
    infer_type: bool = False,
    default_factory: Optional[Callable[[], Any]] = None,
    kw_only: bool = False,
) -> Any: ...


def operand(
    *,
    kw_only: bool = False,
) -> Any: ...


def attribute(
    *,
    default_factory: Optional[Callable[[], Any]] = None,
    kw_only: bool = False,
) -> Any: ...
```

Examples of how to use them:
```python
class OperandSpecifierOp(TestFieldSpecifiers.Operation, name="operand_specifier"):
    a: Operand[IntegerType[32]] = operand()
    b: Optional[Operand[IntegerType[32]]] = None
    c: Operand[IntegerType[32]] = operand(kw_only=True)

class ResultSpecifierOp(TestFieldSpecifiers.Operation, name="result_specifier"):
    a: Result[IntegerType[32]] = result()
    b: Result[IntegerType[16]] = result(infer_type=True)
    c: Result[IntegerType] = result(
        default_factory=lambda: IntegerType.get_signless(8)
    )
    d: Sequence[Result[IntegerType]] = result(default_factory=list)
    e: Result[IntegerType[32]] = result(kw_only=True)

class AttributeSpecifierOp(
    TestFieldSpecifiers.Operation, name="attribute_specifier"
):
    a: IntegerAttr = attribute()
    b: IntegerAttr = attribute(
        default_factory=lambda: IntegerAttr.get(IntegerType.get_signless(32), 42)
    )
    c: StringAttr["a"] | StringAttr["b"] = attribute(
        default_factory=lambda: StringAttr.get("a")
    )
    d: IntegerAttr = attribute(kw_only=True)
```

---------

Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
jhuber6 and others added 27 commits March 30, 2026 14:32
This fixes 04785ad.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
Before the algorithm in the weak crossing SIV test starts, we need to
ensure that both addrecs are `nsw`.
If the trimming candidate subtree is rooted at an alternate-shuffle node
with binary ops and has the same cost as the buildvector node, it is
better to stick with the buildvector node to avoid runtime perf
regressions from shuffle/extra-operation overhead that the cost model may
underestimate. Skip trimming if the subtree contains ExtractElement
nodes, since those operate on already-materialized vectors, which may
reduce vector-to-scalar code movement and give better perf.

Reviewers: hiraditya, bababuck, fhahn, RKSimon

Pull Request: llvm/llvm-project#188272
Implement non-negative value tracking for SUB-CTLZ chains in GlobalISel,
matching the behavior previously added to SelectionDAG.

Additionally, refactor the SelectionDAG implementation from the previous
patch to improve performance and code density.

Related to llvm/llvm-project#136516 and
llvm/llvm-project#186338 (comment)
…ace (#188514)

The `PromotableRegionOpInterface` implementations use two helpers that
are likely useful for other dialects implementing this interface as
well:
- `updateTerminator`: Appends the reaching definition as an operand to a
block's terminator, falling back to a default when the block has no
entry (e.g. dead code).
- `replaceWithNewResults`: Clones an operation with additional result
types while preserving its regions, then replaces the original.

This PR extracts them into a common utility header so that downstream
dialects can reuse them directly.
I'm open to discussion about the location of these utilities.
This implements handling for throwing calls inside an EH cleanup
handler. When such a call occurs, the CFG flattening pass replaces it
with a cir.try_call op that unwinds to a terminate block.

A new CIR operation, cir.eh.terminate, is added to facilitate this
handling, and the design document is updated to describe the new
behavior.

Assisted-by: Cursor / claude-4.6-opus-high
…320)

We had an errorNYI diagnostic to trigger when we generated an alias for
a ctor or dtor that had an existing declaration. Because functions are
used via flat symbol references, all that is needed is to erase the old
declaration. This change does that.
Move some functions around so that the CallBrInst processing is
contained. The 'static' functions don't need to be declared at the top;
just place them before the calls. Fix the naming to use lower-case for
the first letter of function names.
This fixes b6e4d27.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
…t & mask ops in sg to wi pass (#187392)

This PR adds patterns for following vector ops in the new sg-to-wi pass

1. Transpose
2. BitCast
3. CreateMask
4. ConstantMask
…(#189279)

This patch introduces an amdgpu wrapper for
`rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in
gfx1250.

Assisted-by: Claude

---------

Signed-off-by: Eric Feng <Eric.Feng@amd.com>
…e.delinearize_index (#188369)

Allow `affine.delinearize_index` and `affine.linearize_index` to operate
on `vector<...x index>` types in addition to scalar indices.
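
As a rough illustration of what vector operands could mean here, the sketch below delinearizes and linearizes scalar indices and then applies the scalar rule to each lane. The elementwise interpretation and all function names are assumptions made for illustration; they are not taken from the patch.

```python
from typing import List, Sequence, Tuple


def delinearize_index(linear: int, basis: Sequence[int]) -> Tuple[int, ...]:
    """Split a linear index into one index per basis entry (last entry varies fastest)."""
    digits = []
    for bound in reversed(basis[1:]):
        linear, digit = divmod(linear, bound)
        digits.append(digit)
    digits.append(linear)  # the outermost index is not reduced modulo its bound
    return tuple(reversed(digits))


def linearize_index(multi: Sequence[int], basis: Sequence[int]) -> int:
    """Inverse of delinearize_index for in-range indices."""
    linear = 0
    for digit, bound in zip(multi, basis):
        linear = linear * bound + digit
    return linear


def delinearize_vector(linears: Sequence[int], basis: Sequence[int]) -> List[Tuple[int, ...]]:
    """Assumed elementwise form: one result vector per basis entry."""
    per_lane = [delinearize_index(i, basis) for i in linears]
    return list(zip(*per_lane))
```

With basis `(2, 3, 4)`, the scalar index 13 maps to `(1, 0, 1)`, and the vector `[0, 13, 23]` maps to the three result vectors `(0, 1, 1)`, `(0, 0, 2)`, and `(0, 1, 3)`.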

---------

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This implements handling of cleanup scopes in cases where a flag is
needed to indicate whether or not the cleanup is active. This happens in
cases where a cleanup is no longer required, but it isn't at the top of
the cleanup stack so it can't be popped. A temporary variable is used to
set the cleanup to an inactive state when it is no longer needed.

Assisted-by: Cursor / claude-4.6-opus-high (implementation)
Assisted-by: Cursor / gpt-5.3-codex (tests)
…sts (#3660)

The round trip for the corresponding CHECK-LLVM lines already works for
some tests, so they can be enabled.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@3f5257681447f4c
Update after llvm-project commit 8e1e371 ("[IR][NFC] Mark
BranchInst as deprecated (#187314)", 2026-03-19).

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@6b5f17f12b4be00
After llvm-project commit cf92512 ("[DebugInfo] Add Verifier
check for local imports in CU's imports field (#187118)", 2026-03-19),
DebugInfo got lost for these tests.  Ensure the metadata follows the
expected format.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@9691713f67ce02c
The tests started to fail with "Unable to meet SPIR-V requirements for
this target" after upstream commit llvm/llvm-project@85049fc357ac
("[HLSL][SPIRV] Add support for -g to generate NonSemantic Debug Info
(#187051)", 2026-03-25).

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@40ce6c71d8d5b56
Replace manual save/set/restore of `SPIRVUseTextFormat` with
`llvm::SaveAndRestore` to guarantee restoration on all exit paths,
including the early return on write error.
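
The save/set/restore idea can be sketched with a Python context manager, which plays a role similar to the C++ RAII helper. Everything named here (`Options`, `save_and_restore`, `write_text`) is a hypothetical stand-in; the actual change uses `llvm::SaveAndRestore` on `SPIRVUseTextFormat`.

```python
from contextlib import contextmanager


class Options:
    use_text_format = False  # hypothetical stand-in for the global flag


@contextmanager
def save_and_restore(obj, attr, new_value):
    """Set obj.attr to new_value and restore the old value on every exit path."""
    old_value = getattr(obj, attr)
    setattr(obj, attr, new_value)
    try:
        yield
    finally:
        setattr(obj, attr, old_value)


def write_text(fail: bool) -> bool:
    with save_and_restore(Options, "use_text_format", True):
        if fail:
            return False  # early return on write error; the flag is still restored
        return True
```

With a manual restore at the end of the function, the early return would skip the restore and leave the flag set; the scoped helper removes that path.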

Fixes Coverity CID 546125.
Resolves
KhronosGroup/SPIRV-LLVM-Translator#3414

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@01ee67ccc9a2c61
Move annotation strings created from UserSemantic decorations to the
constant address space. Even though these strings should disappear
before instruction selection, we ought to avoid globals in the private
addrspace.

Also set the source file and auxiliary data arguments to `null` instead
of poison/undef, which seems to be more common in LLVM.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@8f16307ff9dbe9e
A recent version of SPIRV-Tools found several issues with the test, such
as `DebugTypeFunction` having the wrong return type operand and
`DebugTypeBasic` missing the flags operand.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@bf469923a25d484
)

A malformed SPIR-V binary can contain an instruction WordCount below the
instruction's minimum, causing wraparound in `resize(WordCount -
FixedWC)` and a ~17 GB allocation that can result in `std::bad_alloc`
when VA space is limited (32-bit systems, ulimit) or process hang on
memory access.

Fix by rejecting the malformed input early.
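
The wraparound can be reproduced with plain integer arithmetic. The sketch below assumes the subtraction is done in unsigned 32-bit arithmetic and that each resized element is a 4-byte SPIR-V word, which is consistent with the ~17 GB figure above; the function names are hypothetical.

```python
UINT32_MASK = 0xFFFFFFFF
BYTES_PER_WORD = 4  # SPIR-V words are 32 bits wide


def operand_words_unchecked(word_count: int, fixed_wc: int) -> int:
    """Emulate unsigned 32-bit `WordCount - FixedWC` with no validation."""
    return (word_count - fixed_wc) & UINT32_MASK


def operand_words_checked(word_count: int, fixed_wc: int) -> int:
    """Reject instructions whose word count is below the fixed minimum."""
    if word_count < fixed_wc:
        raise ValueError("malformed SPIR-V: WordCount below instruction minimum")
    return word_count - fixed_wc


# One word short of the minimum wraps to 2**32 - 1 elements.
wrapped_bytes = operand_words_unchecked(2, 3) * BYTES_PER_WORD
```

Here `wrapped_bytes` is 17179869180, roughly 17 GB, whereas the checked variant raises before any allocation is attempted.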

AI-assisted: Claude Sonnet 4.6 (commercial SaaS)

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@5adf335eedd8ba0
As in the title. The problem was exposed while enabling
`sanitize_overflow` in the Triton compiler:
intel/intel-xpu-backend-for-triton#6533

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@b2410000b1ff3c9
@iclsrc iclsrc added the disable-lint Skip linter check step and proceed with build jobs label Apr 10, 2026
 Conflicts:
	clang/test/lit.site.cfg.py.in
	libclc/clc/lib/amdgpu/workitem/clc_get_local_id.cl
	libclc/libspirv/lib/amdgcn-amdhsa/SOURCES

@github-advanced-security github-advanced-security bot left a comment


zizmor found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

