Skip to content

Signal when SPIR-V legalization needs targeted cleanup#8283

Open
AnastaZIuk wants to merge 2 commits intomicrosoft:mainfrom
Devsh-Graphics-Programming:unroll
Open

Signal when SPIR-V legalization needs targeted cleanup#8283
AnastaZIuk wants to merge 2 commits intomicrosoft:mainfrom
Devsh-Graphics-Programming:unroll

Conversation

@AnastaZIuk
Copy link
Contributor

@AnastaZIuk AnastaZIuk commented Mar 20, 2026

Summary

  • add dedicated SpirvEmitter bits for legalization-time loop unroll and legalization-time SSA rewrite
  • set those bits only for the lowering paths that still need those transforms for correctness
  • pass those signals into the narrowed SPIR-V legalization overload
  • update external/SPIRV-Headers and external/SPIRV-Tools submodule pointers to the matching branch state
  • adapt DXC to the current SPIRV-Tools legalization API and the current MSVC getFunctionType signature
  • fix vector splat of spec constants so specialization-constant composites emit the legal instruction form
  • update 7 CodeGenSPIRV checks to match the accepted SPIR-V shape

Root cause

The expensive blanket policy lives in the SPIR-V optimizer recipe, not in HLSL parsing or the DXC CLI front-end.

DXC is still the producer that knows when a specific lowering path genuinely needs those expensive legalization transforms for correctness, so the optimizer should not have to guess that globally.

Two confirmed correctness-sensitive cases are:

  • variable image sample offsets, where a producer-side signal is still needed for lowerings that must end in constant-image-operand forms such as ConstOffset
  • combined image sampler conversion, which does not need loop unroll but still needs legalize-time SSA rewrite cleanup to avoid invalid SPIR-V such as storing sampled image types

This branch also fixes one separate producer-side correctness bug: vector splat of bool spec constants was being emitted as OpConstantComposite instead of the legal OpSpecConstantComposite.

The same narrowing also exposed one existing image-atomic cleanup dependency. The matching LocalSingleStoreElim follow-up lives in the companion KhronosGroup/SPIRV-Tools#6612 branch and is rolled here through the updated submodule pointer.

Spec basis

The hard legality rules here are much narrower than the old blanket cleanup.

Image, sampler, and sampled image objects must not appear as operands to OpPhi instructions, or OpSelect instructions, or any instructions other than the image or sampler instructions specified to operate on them.

All OpSampledImage instructions must be in the same block in which their Result <id> are consumed.

Spec:

The involved types are explicitly opaque:

This type is opaque: values of this type have no defined physical size or bit pattern.

Spec:

For the image-offset case, the relevant SPIR-V rule is also explicit:

ConstOffset ... It must be an <id> of a constant instruction

Spec:

For the retained-loop case, OpSelect is also explicitly allowed on pointer-typed objects:

Before version 1.4, Result Type must be a pointer, scalar, or vector.

The types of Object 1 and Object 2 must be the same as Result Type.

Spec:

For the image-atomic follow-up, OpImageTexelPointer is also explicit:

Image must have a type of OpTypePointer with Type OpTypeImage.

Spec:

For the spec-constant splat fix, the composite rules are also explicit:

The Constituents must all be <id>s of non-specialization constant-instruction declarations or an OpUndef.

The Constituents must be the <id> of other specialization constants, constant declarations, or an OpUndef.

Spec:

Vulkan tightens this further:

Objects of types OpTypeImage, OpTypeSampler, OpTypeSampledImage ... must not be stored to or modified

If the DescriptorHeapEXT capability is not declared, structure types must not contain opaque types

Spec:

Benchmark

Validation

  • fresh full local CodeGenSPIRV on this branch pair passes with 1438 expected passes, 2 expected failures, and 0 unexpected failures

Updated test notes

  • vk.spec-constant.reuse.hlsl: the old check expected OpConstantComposite from a bool spec-constant splat. That shape is invalid. This branch now emits the legal OpSpecConstantComposite.
  • 15-loop-var-unroll-ok.hlsl: the old check implicitly depended on eager optimizer-side full unroll. This branch keeps LoopControl::Unroll in the IR and uses a legal pointer OpSelect path instead.
  • cast.flat-conversion.matrix.hlsl: the old check assumed the matrix-to-struct path stayed as pre-folded constant composites. This branch rebuilds the target values from the loaded matrix via OpCompositeExtract and OpCompositeConstruct. The stored value is the same and the row-major or column-major decoration still carries the memory layout rule.
  • type.rwstructured-buffer.array.counter.indirect.hlsl and type.rwstructured-buffer.array.counter.indirect2.hlsl: the old checks assumed a single flattened access chain directly to the counter or data element. This branch keeps an intermediate structured access chain and then indexes the member. The resource, binding, atomic, and data access semantics are unchanged.
  • vk.binding.global-struct-of-resources.optimized.hlsl: the old regex overconstrained generated ids. This branch preserves the same resource names and bindings but uses different legal ids, so the test now checks the stable semantic parts instead.
  • max_id.hlsl: the old check overconstrained an exact Bound value. This branch still exercises the same larger-limit success path, but the exact bound is no longer stable after the accepted optimizer recipe changes.

Companion SPIR-V optimizer PR:
KhronosGroup/SPIRV-Tools#6612

@github-actions
Copy link
Contributor

github-actions bot commented Mar 20, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 54afd2c6a5f1a5bcbef66967515fa69f5a3650fd 9ac57982725c4264348b20ad3fb271acc0993754 -- tools/clang/lib/Frontend/Rewrite/RewriteObjC.cpp tools/clang/lib/SPIRV/SpirvEmitter.cpp tools/clang/lib/SPIRV/SpirvEmitter.h
View the diff from clang-format here.
diff --git a/tools/clang/lib/SPIRV/SpirvEmitter.cpp b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
index 4a616a00..0920fc1c 100644
--- a/tools/clang/lib/SPIRV/SpirvEmitter.cpp
+++ b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
@@ -593,8 +593,7 @@ SpirvEmitter::SpirvEmitter(CompilerInstance &ci)
       constEvaluator(astContext, spvBuilder), entryFunction(nullptr),
       curFunction(nullptr), curThis(nullptr), seenPushConstantAt(),
       isSpecConstantMode(false), needsLegalization(false),
-      needsLegalizationLoopUnroll(false),
-      needsLegalizationSsaRewrite(false),
+      needsLegalizationLoopUnroll(false), needsLegalizationSsaRewrite(false),
       beforeHlslLegalization(false), mainSourceFile(nullptr) {
 
   // Get ShaderModel from command line hlsl profile option.
@@ -957,8 +956,7 @@ void SpirvEmitter::HandleTranslationUnit(ASTContext &context) {
       !dsetbindingsToCombineImageSampler.empty() ||
       spirvOptions.signaturePacking;
   needsLegalizationSsaRewrite =
-      needsLegalizationSsaRewrite ||
-      !dsetbindingsToCombineImageSampler.empty();
+      needsLegalizationSsaRewrite || !dsetbindingsToCombineImageSampler.empty();
 
   // Run legalization passes
   if (spirvOptions.codeGenHighLevel) {
@@ -16679,9 +16677,9 @@ bool SpirvEmitter::spirvToolsLegalize(std::vector<uint32_t> *mod,
   } else if (needsLegalizationSsaRewrite) {
     legalizationSsaRewriteMode = spvtools::SSARewriteMode::OpaqueOnly;
   }
-  optimizer.RegisterLegalizationPasses(
-      spirvOptions.preserveInterface, needsLegalizationLoopUnroll,
-      legalizationSsaRewriteMode);
+  optimizer.RegisterLegalizationPasses(spirvOptions.preserveInterface,
+                                       needsLegalizationLoopUnroll,
+                                       legalizationSsaRewriteMode);
   // Add flattening of resources if needed.
   if (spirvOptions.flattenResourceArrays) {
     optimizer.RegisterPass(
  • Check this box to apply formatting changes to this branch.

@AnastaZIuk AnastaZIuk marked this pull request as ready for review March 20, 2026 18:18
@AnastaZIuk AnastaZIuk changed the title Signal when SPIR-V legalization needs loop unroll Signal when SPIR-V legalization needs targeted cleanup Mar 20, 2026
Copy link
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we will be able to accept these changes. I understand the place this is coming from. Optimization take time, and if we do lots of unrolling it creates lots of extra code that needs to be optimized.

The problem is that we have to generate legal code, and there are cases where these optimizations have proven necessary.

As a simple example, this PR changes the expected behaviour of on of our example from the SPIR-V cookbook.

We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.

Things that cannot happen:

  1. The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.

  2. Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.

Potential ideas:

  1. We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.
  2. You can disable the performance passs, or replace them with your own set of passes to run for the optimization phase. See here.
  3. If you want even more control, you can dump the code before legalization (-fcgl), and call the optimizer yourself with the passes you want.

These are ideas for limiting the unrolling. I don't think there is a way to limit the ssa rewrite pass the way you want. However, I don't think that is a good idea. It could impact too many other passes.

On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in sirv-opt?

// CHECK-NEXT: [[ptr:%[0-9]+]] = OpAccessChain %_ptr_Uniform_S %gRWSBuffer
// CHECK-NEXT: OpStore [[ptr]] [[val]]
// CHECK: OpLoopMerge {{%[0-9]+}} {{%[0-9]+}} Unroll
// CHECK: [[which:%[0-9]+]] = OpSelect %_ptr_Uniform_type_StructuredBuffer_S {{%[0-9]+}} %gSBuffer1 %gSBuffer2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not legal. We have to unroll in this case.

Copy link

@devshgraphicsprogramming devshgraphicsprogramming Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it would only be legal with SPV_KHR_variable_pointers if the gSBuffer pointers were of Shader Storage Class.

However I could go out on a limb and claim that it was never supposed to be legal without [[unroll]] on the loop ( I know the example has unroll so in this case SPIR-V codegen shouldn't change)

@github-project-automation github-project-automation bot moved this from New to In progress in HLSL Roadmap Mar 26, 2026
@devshgraphicsprogramming
Copy link

devshgraphicsprogramming commented Mar 26, 2026

We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.

Yes my first suggestion was probably do it as a flag like the #7996 PR which also changes previous behaviour, but takes care to avoid changing the old default behaviour.

Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.

We know that not every driver is Mesa and doesn't optimize fully and are currently running benchmarks for performance regressions.

However I am of the opinion that:

  1. if something is marked up explicitly as [loop] or the equivalent SPIR-V hint it should never unroll even with -O3 or if that breaks legalization
  2. things without [unroll] shouldn't unroll unless: their body codegen is illegal or that SPIR-V optimizer pass has been explicitly called for

TL;DR the performance passes should only be allowed to unroll loops without [loop], but thats a thing to address in SPIR-V Tools.

There's obviously the problem of if-statements, switches, constrant propagation, etc. which can allow turning illegal code legal and they have no attributes to control that.

There's a few examples which in my option are a bit messed up and wouldn't make people cry if a flag disabled them
https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/SPIRV-Cookbook.rst#multiple-calls-to-a-function

We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.

Love the idea, I guess that moves the problem and our impl into SPIR-V Tools ?

The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.

Actually if we step back and look at your idea above, an attribute like [[vk::allow_legalization]] and [[vk::disallow_legalization]] plus a flag to control the default for anything unannotated could be a neat design.

If this could percolate to the SPIR-V Spec as an OpDecorate or similar it could be a really powerful hint what to skip analyzing or speculatively unrolling, constant propagate, etc. for legalization within SPIR-V Tools

At the end of the day analyzing whether to unroll or legalize might be more expensive than just unrolling and legalizing, since you can't exaclty and in leess time, know if all the transformations will end up making the code legal before you try them.

My intuition tells me Halting Problem will rear its ugly head at some point if we try this without annotations and some help from the frontend.

DXC is still the producer that knows when a specific lowering path genuinely needs those expensive legalization transforms for correctness, so the optimizer should not have to guess that globally.

@AnastaZIuk is correct here, unrolling for legalization should not kick in without [unroll]

Stuff like

  [unroll]
  for( int j = 0; j < 2; j++ ) {
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }

should legalize fine, but things like

  [loop]
for( int j = 0; j < 2; j++ ) {
  if (constant > j) {
    lSBuffer = gSBuffer1;
  } else {
    lSBuffer = gSBuffer2;
  }
  gRWSBuffer[j] = lSBuffer[j];
}

really shouldn't IMHO

On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in spirv-opt?

We'll take our worst shader and let you know.

The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema.

Mind you the difference between SPIR-V ResultID upper bound is only 10k vs 16k for the -fcgl - Vd case and -O3, and the SPIR-V is 12k vs 20k opcodes but the compile time difference is an order of magnitude.

We're open for ideas

#s-perron how would you have us solve the problem of "too much SPIR-V spam".

Sidequests

This branch also fixes one separate producer-side correctness bug: vector splat of bool spec constants was being emitted as OpConstantComposite instead of the legal OpSpecConstantComposite.

@s-perron was that done correctly, or is that something you'd want fixed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

3 participants