LLVM and SPIRV-LLVM-Translator pulldown (WW15 2026)#21723
Draft
These were previously covered by AMDGPUWmmaIntrinsicModsAllReuse.
CONFLICT (content): Merge conflict in clang/include/clang/Basic/DiagnosticSemaKinds.td
As proposed in riscv-non-isa/riscv-c-api-doc#110. There is no real compiler-rt implementation, as Linux does not list these extensions in hwprobe. Signed-off-by: Luke Wren <wren6991@gmail.com>
…yout (#188139) Fixes #188131. This change addresses the stylistic changes @bogners requested in llvm/llvm-project#186215. It also adds `storeMatrixArrayFromVector` to SPIRVLegalizePointerCast.cpp for when we detect the matrix-array-of-vector memory layout. The changes to storeArrayFromVector are cleanup. Assisted-by: GitHub Copilot for test case check lines
…#188896) When SPIRV-LLVM-Translator is built in-tree (i.e., placed in llvm/projects folder), llvm-spirv target exists. Drop legacy llvm-spirv_target dependency (was for non-runtime build) and add llvm-spirv to runtimes dependencies.
Get rid of several .h.def files which were used to ensure that the macro definitions from llvm-libc-macro would be included in the public header. Replace this logic with YAML instead: add entries to the "macros" list that point to the correct "macro_header" to ensure it is included. For C standard library headers, list several standard-defined macros to document their availability. For POSIX/Linux headers, only reference a handful of macros, since more planning is needed to decide how to represent platform-specific macros in YAML.
…123) Use the generic switch rather than encoding the version number it currently corresponds to.
… for risc-v (#110690)
The code generated for calls with FPCC eligible structs as arguments
doesn't consider the bitfield, which results in a store crossing the
boundary of the memory allocated using alloca, e.g.
For the code:
```
struct __attribute__((packed, aligned(1))) S {
const float f0;
unsigned f1 : 1;
};
unsigned func(struct S arg)
{
return arg.f1;
}
```
The generated IR is:
```
define dso_local signext i32 @func(
float [[TMP0:%.*]], i32 [[TMP1:%.*]]) #[[ATTR0:[0-9]+]] {
[[ENTRY:.*:]]
[[ARG:%.*]] = alloca [[STRUCT_S:%.*]], align 1
[[TMP2:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 0
store float [[TMP0]], ptr [[TMP2]], align 1
[[TMP3:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 1
store i32 [[TMP1]], ptr [[TMP3]], align 1
[[F1:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[ARG]], i32 0, i32 1
[[BF_LOAD:%.*]] = load i8, ptr [[F1]], align 1
[[BF_CLEAR:%.*]] = and i8 [[BF_LOAD]], 1
[[BF_CAST:%.*]] = zext i8 [[BF_CLEAR]] to i32
ret i32 [[BF_CAST]]
}
```
Here, `store i32 [[TMP1]], ptr [[TMP3]], align 1` can be seen crossing
the boundary of the allocated memory. After optimizations (EarlyCSEPass),
the only IR left is:
```
define dso_local noundef signext i32 @func(
float [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
[[ENTRY:.*:]]
ret i32 0
}
```
The patch trims the second member of the struct, taking the bit width
into account to decide the appropriate integer type; the test shows the
results of this patch.
Note that the bug is seen only when the `f` extension is enabled, since
that is what makes the struct FPCC eligible.
Co-authored-by: muhammad.kamran4 <muhammad.kamran@esperantotech.com>
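The out-of-bounds store above comes down to simple byte arithmetic. The sketch below assumes the layout from the example (a packed, `aligned(1)` struct holding a 4-byte float followed by one byte of bitfield storage, so the alloca is 5 bytes). `trimmed_int_bits` is a hypothetical, simplified trimming rule (cap the integer width at the bytes left in the struct); it is not Clang's actual implementation.

```python
# Layout from the example: packed struct S = 4-byte float + 1 byte for `f1 : 1`.
SIZEOF_S = 4 + 1          # bytes reserved by `alloca %struct.S, align 1`
SECOND_MEMBER_OFFSET = 4  # offset of the bitfield storage (second FPCC element)

def store_fits(offset: int, store_bytes: int, alloca_bytes: int) -> bool:
    """True if a store of `store_bytes` at `offset` stays inside the alloca."""
    return offset + store_bytes <= alloca_bytes

def trimmed_int_bits(offset: int, alloca_bytes: int, orig_bits: int = 32) -> int:
    """Hypothetical rule: never use more bits than the bytes remaining in the struct."""
    return min(orig_bits, (alloca_bytes - offset) * 8)

# Unpatched lowering stores the second element as a full i32 (bytes 4..7).
print(store_fits(SECOND_MEMBER_OFFSET, 4, SIZEOF_S))  # False
bits = trimmed_int_bits(SECOND_MEMBER_OFFSET, SIZEOF_S)
print(bits)                                           # 8
print(store_fits(SECOND_MEMBER_OFFSET, bits // 8, SIZEOF_S))  # True
```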
…697) Device libs have a fast sqrt macro implemented this way.
Add tests targeting assembly printing and miscellaneous CodeGen areas with low coverage:
- asm-printer-cpool.ll: HexagonAsmPrinter exercising constant pool entry emission.
- asm-operand-modifiers.ll: Inline asm operand modifier printing paths (lo/hi/mem).
- target-objfile-sdata.ll, split-double-volatile.ll, reg-info-types.ll: Miscellaneous CodeGen coverage for HexagonTargetObjectFile small data classification, HexagonSplitDouble volatile load handling, and HexagonRegisterInfo register class queries.
- constext-store-imm.ll: HexagonConstExtenders store-immediate optimization paths.
This removes dyn_cast invocations where the argument is already of the target type (including through subtyping). This was created by adding a static assert in dyn_cast and letting an LLM iterate until the code base compiled. I then went through each example and cleaned it up. This does not commit the static assert in dyn_cast, because it would prevent a lot of uses in templated code. To prevent backsliding we should instead add an LLVM aware version of https://clang.llvm.org/extra/clang-tidy/checks/readability/redundant-casting.html (or expand the existing one).
CONFLICT (content): Merge conflict in llvm/lib/IR/DiagnosticInfo.cpp
The test used to look correct, but it was not. A WeakVH simply becomes null after the value it points to is replaced, so VarIndex became null and a zero value was used, yet the test checks still looked correct. Only a WeakTrackingVH is updated to point to the new value. Change the test slightly so that using a zero index is detectably wrong.
Previously, it generated extra single quote marks around the outer
braces (i.e., `'{'` `6442:\220,1\22` `'}'`). The SPIR-V backend does not
expect that; it expects `{6442:\220,1\22}`.
… device (#189140) [Driver][HIP] Fix bundled -S emitting bitcode instead of assembly for device PR #188262 added support for bundling HIP -S output under the new offload driver, but the device backend still entered the bitcode-emitting path in ConstructPhaseAction. The condition at the Backend phase checked for the new offload driver and directed device code to emit TY_LLVM_BC, without excluding the -S case. This caused the device section in the bundled .s to contain LLVM bitcode instead of textual AMDGPU assembly. This broke the HIP UT CheckCodeObjAttr test which greps copyKernel.s for "uniform_work_group_size" — a string that only appears in textual assembly, not in bitcode. Fix by excluding -S (without -emit-llvm) from the new-driver bitcode path, so the device backend falls through to emit TY_PP_Asm (textual assembly). Also add a missing lit test check that the device backend produces assembler output for the bundled -S case. Fixes: LCOMPILER-553
…aries (#189044) We only did this for local variables but were missing it for globals.
…ardOperands API to BranchOpInterface (#187864) To simplify the output of the reduction-tree pass, this PR introduces eraseRedundantBlocksInRegion. For regions containing multiple execution paths, this functionality selects the shortest 'interesting' path. Additionally, this PR adds the getSuccessorForwardOperands API to BranchOpInterface. This allows us to extract the ForwardOperands for a specific path chosen from multiple alternatives, enabling the creation of a cf.br operation for the redirected jump.
…tions (#189113) Fixes llvm/llvm-project#187716.
…ssorForwardOperands API to BranchOpInterface" (#189150) Reverts llvm/llvm-project#187864, because it is causing some build bot failures. See https://lab.llvm.org/buildbot/#/builders/138/builds/27662 and https://lab.llvm.org/buildbot/#/builders/169/builds/21376/steps/11/logs/stdio for memory leak issues.
…on index (#188508) When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into the static position of a vector.insert op, foldDenseElementsAttrDestInsertOp would proceed to call calculateInsertPosition, which returned -1. The subsequent iterator arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing an assertion in DenseElementsAttr::get. Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any static position equals kPoisonIndex, consistent with how InsertChainFullyInitialized already guards this case. Fixes #188404 Assisted-by: Claude Code
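A toy model of the guard: a scalar insert into a constant vector stored as a flat row-major list, declining to fold when any position is the poison sentinel. The function name and flat-list representation are hypothetical, not the MLIR code. Without the guard, Python would silently write to the wrong element (positions `[-1, 0]` in a 2x2 vector give offset -2), whereas in C++ `begin() + (-1)` is undefined behaviour.

```python
K_POISON_INDEX = -1  # sentinel value for a poison position, as described above

def fold_scalar_insert(dest_values, dest_shape, positions, value):
    """Fold inserting a scalar into a flat row-major constant vector.

    Returns the folded list, or None (decline to fold) when any position is
    the poison sentinel.
    """
    assert len(positions) == len(dest_shape), "scalar insert: one position per dim"
    if any(p == K_POISON_INDEX for p in positions):
        return None  # bail out before computing an offset from -1
    offset = 0
    for p, dim in zip(positions, dest_shape):
        offset = offset * dim + p  # row-major linearization
    folded = list(dest_values)
    folded[offset] = value
    return folded

print(fold_scalar_insert([0, 0, 0, 0], [2, 2], [1, 0], 7))               # [0, 0, 7, 0]
print(fold_scalar_insert([0, 0, 0, 0], [2, 2], [K_POISON_INDEX, 0], 7))  # None
```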
…nt (#189163) When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a module that contains no test dialect operations, the reader type callback in `runTest0` called `reader.getDialectVersion<test::TestDialect>()` and then immediately asserted that it succeeded. However, if the test dialect was never referenced in the bytecode (because no test dialect types appear in the module), the dialect's version information is not stored in the bytecode, so `getDialectVersion` legitimately returns failure. When the test dialect version is unavailable in the bytecode being read, the module contains no test dialect types, so no "funky"-group overrides are needed and the callback can safely skip by returning `success()`. A regression test is added with a module that has no test dialect ops, exercising the `test-dialect-version=2.0` path that previously crashed. Fixes #128321 Fixes #128325 Assisted-by: Claude Code
… (#188064)
This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).
```python
def result(
*,
infer_type: bool = False,
default_factory: Optional[Callable[[], Any]] = None,
kw_only: bool = False,
) -> Any: ...
def operand(
*,
kw_only: bool = False,
) -> Any: ...
def attribute(
*,
default_factory: Optional[Callable[[], Any]] = None,
kw_only: bool = False,
) -> Any: ...
```
Examples about how to use them:
```python
class OperandSpecifierOp(TestFieldSpecifiers.Operation, name="operand_specifier"):
a: Operand[IntegerType[32]] = operand()
b: Optional[Operand[IntegerType[32]]] = None
c: Operand[IntegerType[32]] = operand(kw_only=True)
class ResultSpecifierOp(TestFieldSpecifiers.Operation, name="result_specifier"):
a: Result[IntegerType[32]] = result()
b: Result[IntegerType[16]] = result(infer_type=True)
c: Result[IntegerType] = result(
default_factory=lambda: IntegerType.get_signless(8)
)
d: Sequence[Result[IntegerType]] = result(default_factory=list)
e: Result[IntegerType[32]] = result(kw_only=True)
class AttributeSpecifierOp(
TestFieldSpecifiers.Operation, name="attribute_specifier"
):
a: IntegerAttr = attribute()
b: IntegerAttr = attribute(
default_factory=lambda: IntegerAttr.get(IntegerType.get_signless(32), 42)
)
c: StringAttr["a"] | StringAttr["b"] = attribute(
default_factory=lambda: StringAttr.get("a")
)
d: IntegerAttr = attribute(kw_only=True)
```
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
This fixes 04785ad. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
Before starting the algorithm in the weak crossing SIV test, we need to ensure both addrecs are `nsw`.
If the trimming candidate subtree is rooted at an alternate-shuffle node with binary ops, and this subtree has the same cost as the buildvector node, it is better to stick with the buildvector node to avoid runtime perf regressions from shuffle/extra operation overhead that the cost model may underestimate. Skip trimming if the subtree contains ExtractElement nodes, since those operate on already-materialized vectors, which may reduce vector-to-scalar code movement and give better perf. Reviewers: hiraditya, bababuck, fhahn, RKSimon Pull Request: llvm/llvm-project#188272
Implement non-negative value tracking for SUB-CTLZ chains in GlobalISel, matching the behavior previously added to SelectionDAG. Additionally, refactor the SelectionDAG implementation from the previous patch to improve performance and code density. Related to llvm/llvm-project#136516 and llvm/llvm-project#186338 (comment)
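One range fact that can make a SUB-CTLZ chain provably non-negative: `ctlz` of a w-bit value lies in [0, w], so `c - ctlz(x)` lies in [c - w, c]. The sketch checks this exhaustively for 8-bit values. It illustrates the arithmetic only, under our own assumptions (non-zero-poison `ctlz`, result read as a signed w-bit value); it is not the GlobalISel or SelectionDAG implementation, whose exact matching conditions are not described here.

```python
def ctlz(x: int, bit_width: int) -> int:
    """Leading-zero count of an unsigned bit_width-bit value; ctlz(0) == bit_width."""
    assert 0 <= x < (1 << bit_width)
    return bit_width - x.bit_length()

def sub_ctlz_is_non_negative(c: int, bit_width: int) -> bool:
    """c - ctlz(x) lies in [c - bit_width, c]; it is non-negative as a signed
    bit_width-bit value when that whole interval is in [0, 2**(bit_width - 1))."""
    return c - bit_width >= 0 and c < (1 << (bit_width - 1))

# Exhaustive cross-check against brute force for 8-bit values.
for c in range(128):
    brute = all(0 <= c - ctlz(x, 8) < 128 for x in range(256))
    assert brute == sub_ctlz_is_non_negative(c, 8)
```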
…ace (#188514) The `PromotableRegionOpInterface` implementations use two helpers that are likely useful for other dialects implementing this interface as well: - `updateTerminator`: Appends the reaching definition as an operand to a block's terminator, falling back to a default when the block has no entry (e.g. dead code). - `replaceWithNewResults`: Clones an operation with additional result types while preserving its regions, then replaces the original. This PR extracts them into a common utility header so that downstream dialects can reuse them directly. I'm open to discussion about the location of these utilities.
This implements handling for throwing calls inside an EH cleanup handler. When such a call occurs, the CFG flattening pass replaces it with a cir.try_call op that unwinds to a terminate block. A new CIR operation, cir.eh.terminate, is added to facilitate this handling, and the design document is updated to describe the new behavior. Assisted-by: Cursor / claude-4.6-opus-high
…320) We had an errorNYI diagnostic to trigger when we generated an alias for a ctor or dtor that had an existing declaration. Because functions are used via flat symbol references, all that is needed is to erase the old declaration. This change does that.
Move some functions around so that the CallBrInst processing is contained. The 'static' functions don't need to be declared at the top; just place them before the calls. Fix the naming to use lower-case for the first letter of function names.
This fixes b6e4d27. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
…t & mask ops in sg to wi pass (#187392) This PR adds patterns for the following vector ops in the new sg-to-wi pass:
1. Transpose
2. BitCast
3. CreateMask
4. ConstantMask
…6 (#189468) Fixes: LCOMPILER-1673
…ol-conversion (#189149) Fixes llvm/llvm-project#176889.
…(#189279) This patch introduces an amdgpu wrapper for `rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in gfx1250. Assisted-by: Claude --------- Signed-off-by: Eric Feng <Eric.Feng@amd.com>
…e.delinearize_index (#188369) Allow `affine.delinearize_index` and `affine.linearize_index` to operate on `vector<...x index>` types in addition to scalar indices. --------- Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
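A scalar model of the index arithmetic, assuming the vector form applies the scalar semantics lane by lane and returns one vector per result (that assumption and all function names here are ours). The basis is listed outermost first; the outermost coordinate takes whatever remains after the inner divisions.

```python
def delinearize(linear: int, basis):
    """Split `linear` into row-major coordinates for `basis` (outermost first)."""
    coords = []
    for b in reversed(basis[1:]):
        coords.append(linear % b)
        linear //= b
    coords.append(linear)  # outermost coordinate
    return coords[::-1]

def linearize(coords, basis):
    """Inverse of delinearize for in-bounds coordinates."""
    linear = 0
    for c, b in zip(coords, basis):
        linear = linear * b + c
    return linear

def delinearize_vector(lanes, basis):
    """Apply delinearize per lane, then regroup into one vector per result."""
    per_lane = [delinearize(v, basis) for v in lanes]
    return [list(r) for r in zip(*per_lane)]

print(delinearize(23, [2, 3, 4]))                # [1, 2, 3]
print(delinearize_vector([0, 5, 23], [2, 3, 4]))  # [[0, 0, 1], [0, 1, 2], [0, 1, 3]]
```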
This implements handling of cleanup scopes in cases where a flag is needed to indicate whether or not the cleanup is active. This happens in cases where a cleanup is no longer required, but it isn't at the top of the cleanup stack so it can't be popped. A temporary variable is used to set the cleanup to an inactive state when it is no longer needed. Assisted-by: Cursor / claude-4.6-opus-high (implementation) Assisted-by: Cursor / gpt-5.3-codex (tests)
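A toy model of the idea, not the CIR implementation: each cleanup carries an "active" flag, standing in for the temporary variable described above. A cleanup at the top of the stack can simply be popped, but one buried under later cleanups is deactivated by flipping its flag, and inactive cleanups are skipped when the scope unwinds. `CleanupStack` and its methods are hypothetical names.

```python
from typing import Callable, List

class CleanupStack:
    """Toy cleanup stack: each entry pairs an action with an 'active' flag."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def push(self, action: Callable[[], None]) -> dict:
        entry = {"action": action, "active": True}
        self._entries.append(entry)
        return entry

    def deactivate(self, entry: dict) -> None:
        if self._entries and self._entries[-1] is entry:
            self._entries.pop()      # on top of the stack: simply pop it
        else:
            entry["active"] = False  # buried: flip the flag instead

    def pop_all(self) -> None:
        while self._entries:
            entry = self._entries.pop()
            if entry["active"]:      # inactive cleanups are skipped
                entry["action"]()

log = []
s = CleanupStack()
a = s.push(lambda: log.append("a"))
s.push(lambda: log.append("b"))
s.deactivate(a)  # `a` is no longer needed but is not on top
s.pop_all()
print(log)       # ['b']
```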
…sts (#3660) The round trip for the corresponding CHECK-LLVM checks already works for some tests, so they can be enabled. Original commit: KhronosGroup/SPIRV-LLVM-Translator@3f5257681447f4c
Update after llvm-project commit 8e1e371 ("[IR][NFC] Mark BranchInst as deprecated (#187314)", 2026-03-19). Original commit: KhronosGroup/SPIRV-LLVM-Translator@6b5f17f12b4be00
After llvm-project commit cf92512 ("[DebugInfo] Add Verifier check for local imports in CU's imports field (#187118)", 2026-03-19), DebugInfo got lost for these tests. Ensure the metadata follows the expected format. Original commit: KhronosGroup/SPIRV-LLVM-Translator@9691713f67ce02c
The tests started to fail with "Unable to meet SPIR-V requirements for this target" after upstream commit llvm/llvm-project@85049fc357ac ("[HLSL][SPIRV] Add support for -g to generate NonSemantic Debug Info (#187051)", 2026-03-25). Original commit: KhronosGroup/SPIRV-LLVM-Translator@40ce6c71d8d5b56
Replace manual save/set/restore of `SPIRVUseTextFormat` with `llvm::SaveAndRestore` to guarantee restoration on all exit paths, including the early return on write error. Fixes Coverity CID 546125. Resolves KhronosGroup/SPIRV-LLVM-Translator#3414 Original commit: KhronosGroup/SPIRV-LLVM-Translator@01ee67ccc9a2c61
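`llvm::SaveAndRestore` is a C++ RAII helper; the Python sketch below is only an analogy using a context manager, to show why the restore now happens on every exit path. `Options`, `save_and_restore`, and `write_spirv` are hypothetical stand-ins for the translator's flag and writer.

```python
from contextlib import contextmanager

class Options:  # hypothetical holder for a SPIRVUseTextFormat-like flag
    use_text_format = False

@contextmanager
def save_and_restore(obj, attr, new_value):
    old = getattr(obj, attr)
    setattr(obj, attr, new_value)
    try:
        yield
    finally:
        setattr(obj, attr, old)  # runs on normal exit, early return, or exception

def write_spirv(opts: Options, fail: bool) -> bool:
    with save_and_restore(opts, "use_text_format", True):
        if fail:
            return False  # early return on write error: flag is still restored
        return True

opts = Options()
print(write_spirv(opts, fail=True), opts.use_text_format)  # False False
```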
Move annotation strings created from UserSemantic decorations to the constant address space. Even though these strings should disappear before instruction selection, we ought to avoid globals in the private addrspace. Also set the source file and auxiliary data arguments to `null` instead of poison/undef, which seems to be more common in LLVM. Original commit: KhronosGroup/SPIRV-LLVM-Translator@8f16307ff9dbe9e
A recent version of SPIRV-Tools found several issues with the test, such as `DebugTypeFunction` having the wrong return type operand and `DebugTypeBasic` missing the flags operand. Original commit: KhronosGroup/SPIRV-LLVM-Translator@bf469923a25d484
) A malformed SPIR-V binary can contain an instruction WordCount below the instruction's minimum, causing wraparound in `resize(WordCount - FixedWC)` and a ~17 GB allocation that can result in `std::bad_alloc` when VA space is limited (32-bit systems, ulimit) or in a process hang on memory access. Fix by rejecting the malformed input early. AI-assisted: Claude Sonnet 4.6 (commercial SaaS) Original commit: KhronosGroup/SPIRV-LLVM-Translator@5adf335eedd8ba0
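SPIR-V words are 32 bits wide, and the ~17 GB figure is consistent with the subtraction being evaluated in 32-bit unsigned arithmetic, which this sketch assumes. A WordCount just one below the minimum wraps to 2^32 - 1 words, or about 17.2 GB. The function names are hypothetical; the second function shows the "reject early" shape of the fix.

```python
UINT32_MASK = 0xFFFFFFFF
BYTES_PER_WORD = 4  # SPIR-V words are 32-bit

def words_to_allocate(word_count: int, fixed_wc: int) -> int:
    """Emulate `WordCount - FixedWC` evaluated in 32-bit unsigned arithmetic."""
    return (word_count - fixed_wc) & UINT32_MASK

def checked_words_to_allocate(word_count: int, fixed_wc: int) -> int:
    """Reject malformed input before the subtraction can wrap."""
    if word_count < fixed_wc:
        raise ValueError("instruction WordCount is below its minimum")
    return word_count - fixed_wc

wrapped = words_to_allocate(2, 3)
print(wrapped, wrapped * BYTES_PER_WORD)  # 4294967295 17179869180
```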
As in title, problem exposed during `sanitize_overflow` enablement in triton compiler: intel/intel-xpu-backend-for-triton#6533 Original commit: KhronosGroup/SPIRV-LLVM-Translator@b2410000b1ff3c9
Conflicts: clang/test/lit.site.cfg.py.in libclc/clc/lib/amdgpu/workitem/clc_get_local_id.cl libclc/libspirv/lib/amdgcn-amdhsa/SOURCES
zizmor found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
LLVM: llvm/llvm-project@7a3b7f1
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@b241000