Narrow blanket SPIR-V loop unroll in optimizer recipes by AnastaZIuk · Pull Request #2 · Devsh-Graphics-Programming/SPIRV-Tools

AnastaZIuk · 2026-03-20T17:53:38Z

Summary

narrow legalization-time full loop unroll behind an explicit opt-in overload
narrow legalization-time SSA rewrite behind an explicit opt-in overload
stop materializing blanket full loop unroll in the default performance recipe
replace the two heavy global redundancy elimination passes in the default performance recipe with local redundancy elimination
remove several additional blanket cleanup steps from the legalization tail when the generic path does not need them
add a dedicated -O1experimental fast-path cleanup that trims stale VariablePointers / VariablePointersStorageBuffer capabilities without changing the shared default trim path
use the DXC trunk Godbolt reproducer, which is the preprocessed output of our path tracer at about 58k LoC

Root cause

The current SPIR-V optimizer recipes still carry two old blanket unroll decisions:

9fbcce4ca17d added full loop unroll to legalization passes on 2018-09-19
3c47dac28208 added full loop unroll to performance passes on 2020-05-20

On a large preprocessed HLSL payload with many small [unroll] loops, this inflates the SPIR-V module far more than necessary, and the optimizer then pays for expensive cleanup over that self-inflated IR.

LoopControl::Unroll as an IR hint is not the problem. The expensive part is treating that hint as a blanket request to immediately materialize full unroll in the generic optimizer path even when legality does not require it.
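To make the inflation concrete, here is a back-of-the-envelope sketch. The module size, loop count, body size and trip count are made-up numbers, not measurements from the path tracer payload, and the model ignores nested loops and any loop-overhead instructions that unrolling removes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    body_insts: int  # instructions in the loop body (hypothetical count)
    trip_count: int  # statically known iteration count (hypothetical)


def size_after_full_unroll(base_insts, loops):
    """Instruction count if every loop body is replicated trip_count times.

    base_insts already contains each loop body once, so a fully unrolled
    loop adds (trip_count - 1) extra copies of its body.
    """
    extra = sum(loop.body_insts * (loop.trip_count - 1) for loop in loops)
    return base_insts + extra


# Assumed module: 10,000 instructions with 500 small [unroll] loops,
# each 12 instructions long and running 8 iterations.
loops = [Loop(body_insts=12, trip_count=8)] * 500
rolled = 10_000
unrolled = size_after_full_unroll(rolled, loops)
growth = unrolled / rolled
print(rolled, unrolled, growth)
```

Every later pass in the recipe then runs over the larger module, which is where the cleanup cost comes from.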

A similar issue existed in the legalization tail. Some cleanup passes were effectively historical safety hammers rather than semantically required defaults. Narrowing them keeps the generic path correct while removing a large amount of unnecessary work.

A separate follow-up issue showed up in the dedicated -O1experimental fast path: the final module could still carry explicit VariablePointers / VariablePointersStorageBuffer declarations even after the optimized IR no longer contained the pointer forms that require them. In the failing EX37 sampler shader the final SPIR-V still had only scalar OpSelect %float, with no pointer OpSelect and no pointer OpPhi, so the module remained validator-legal. Removing only those stale capability lines fixed the downstream runtime regression. This follow-up is intentionally isolated to a dedicated fast-path pass so the shared default trim behavior remains unchanged.
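The idea behind the fast-path trim can be sketched on a toy IR. This is not the code of the dedicated pass: the tuple layout, the `%_ptr_` type-name check and the function name are assumptions for illustration, and only the two pointer forms named above (pointer OpSelect, pointer OpPhi) are checked. Other pointer uses that the real capabilities also cover are ignored here.

```python
VARIABLE_POINTER_CAPS = {"VariablePointers", "VariablePointersStorageBuffer"}
POINTER_FORMING_OPS = {"OpSelect", "OpPhi"}


def trim_stale_variable_pointer_caps(module):
    """Drop variable-pointer capabilities when no pointer OpSelect/OpPhi remains.

    `module` is a toy list of (opcode, operand) pairs. For OpCapability the
    operand is the capability name; for other opcodes it is the result type.
    """
    needs_caps = any(
        opcode in POINTER_FORMING_OPS and operand.startswith("%_ptr_")
        for opcode, operand in module
    )
    if needs_caps:
        return list(module)  # a real variable-pointer use: keep everything
    return [
        (opcode, operand)
        for opcode, operand in module
        if not (opcode == "OpCapability" and operand in VARIABLE_POINTER_CAPS)
    ]


# Shape of the EX37 case: only scalar OpSelect %float survives optimization.
stale = [
    ("OpCapability", "Shader"),
    ("OpCapability", "VariablePointers"),
    ("OpCapability", "VariablePointersStorageBuffer"),
    ("OpSelect", "%float"),
]
# A module that still selects between StorageBuffer pointers must be preserved.
live = stale + [("OpSelect", "%_ptr_StorageBuffer_float")]

trimmed = trim_stale_variable_pointer_caps(stale)
kept = trim_stale_variable_pointer_caps(live)
print(trimmed)
```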

DXC has the producer-side lowering context and knows when a specific HLSL pattern still requires materialized loop unroll or legalize-time SSA rewrite for correctness. The companion DXC patch in microsoft/DirectXShaderCompiler#8283 supplies that narrower signal, and its current branch head also materializes the companion SPIR-V submodule pointers.
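A rough model of the opt-in shape follows. The function name, parameter names and pass names are placeholders chosen for illustration, not the SPIRV-Tools API and not the actual recipe contents. The only point is that the heavy steps are appended when the producer asks for them, and skipped otherwise.

```python
def legalization_recipe(needs_loop_unroll=False, needs_ssa_rewrite=False):
    """Return an ordered list of (placeholder) pass names.

    Both flags default to False, so callers that never opt in get the
    narrow recipe. A producer such as DXC would set a flag only when it
    knows the lowered pattern needs that step for correctness.
    """
    passes = ["inline", "local-cleanup"]
    if needs_ssa_rewrite:
        passes.append("ssa-rewrite")
    if needs_loop_unroll:
        passes.append("full-loop-unroll")
    passes.append("dead-code-elimination")
    return passes


default_recipe = legalization_recipe()
opted_in_recipe = legalization_recipe(needs_loop_unroll=True, needs_ssa_rewrite=True)
print(default_recipe)
print(opted_in_recipe)
```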

Validation

reproducer: godbolt.org/z/o5xf1hq36 (note: Compiler Explorer cache can make repeated runs look much faster than a cold compile)
shader payload: preprocessed output of our path tracer at about 58k LoC
local machine: AMD Ryzen 5 5600G with Radeon Graphics, 6 physical cores, 12 logical processors, Windows-reported max clock 3901 MHz
on the same payload and the same machine, SPIRV-Tools@487ff843bd8a + DXC@bd9a8b1c5365 reduced the workload from 19.161 s to 6.042 s
with SPIRV-Tools@57007cf46bb4 + DXC@b02b772e0b50, the same payload measured 4.702 s
with the current branch pair SPIRV-Tools@f5339a9dd2e2 + DXC@4c5fbdc9c1b9, the same payload still keeps the reduced hot-path compile cost while fixing the EX37 -O1experimental sampler regression
local EX37 validation under source-built DXC on RelWithDebInfo now reports both "All sampling concept tests passed." and "All sampling tests PASSED."
full local CodeGenSPIRV lit/FileCheck passes with the companion DXC branch: 1403 expected passes, 2 expected failures, 0 unexpected
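For reference, the relative improvement implied by the timings above can be derived as below. The only inputs are the three measurements quoted in the list; nothing else is measured here.

```python
baseline_s = 19.161  # same payload, same machine, before the change

measured_s = {
    "SPIRV-Tools@487ff843bd8a + DXC@bd9a8b1c5365": 6.042,
    "SPIRV-Tools@57007cf46bb4 + DXC@b02b772e0b50": 4.702,
}

# Speedup factor and percentage of wall-clock time removed, per revision pair.
speedups = {label: baseline_s / s for label, s in measured_s.items()}
reductions = {label: 100.0 * (1.0 - s / baseline_s) for label, s in measured_s.items()}

for label in measured_s:
    print(f"{label}: {speedups[label]:.2f}x faster, {reductions[label]:.1f}% less time")
```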

Companion DXC PR:
microsoft/DirectXShaderCompiler#8283

devshgraphicsprogramming · 2026-03-26T19:04:01Z

Here's my $0.02: instead of deciding wholesale whether we're going to unroll and which version of SSA rewrite we'll perform...

We could track this in a bitfield per OpBranch and OpFunctionCall and propagate those down the IR, so that for every control-flow construct, loop and function call you'd know what sort of inlining, SSA rewriting and unrolling you're allowed to try.

A lot of this we can already infer from the Loop Control and Function Control enums of the parent blocks and the opcodes contained within, e.g. Const and Pure would propagate up, while Unroll, Inline and DontInline would be rigidly honoured.

For example I would not want a legalization pass to inline a function so that a loop can be unrolled because the invariant is a function parameter.

It could be that an additional decoration is needed for switches, other conditionals (aside from OpSelectionMerge control) and functions to control whether legalization should be attempted or not.

97% of the codegen shouldn't be made to pay for the 3% that needs full inlining, unroll, optimization and constant propagation to become legal.
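One possible reading of the bitfield idea, sketched as a toy. The `Allowed` flags and both helper functions are hypothetical names, and the propagation rule (an explicit hint overrides whatever the parent construct allowed) is just one interpretation of "rigidly honoured". The Loop Control and Function Control mask values are the ones defined by the SPIR-V specification.

```python
from enum import IntFlag


class Allowed(IntFlag):
    """Hypothetical per-construct permissions handed down the IR."""
    NONE = 0
    INLINE = 1
    UNROLL = 2
    SSA_REWRITE = 4


# Mask bits from the SPIR-V specification.
LOOP_CONTROL_UNROLL = 0x1
LOOP_CONTROL_DONT_UNROLL = 0x2
FUNCTION_CONTROL_INLINE = 0x1
FUNCTION_CONTROL_DONT_INLINE = 0x2


def allowed_for_loop(loop_control, inherited):
    """Explicit loop hints override the inherited mask; None inherits it."""
    if loop_control & LOOP_CONTROL_DONT_UNROLL:
        return inherited & ~Allowed.UNROLL
    if loop_control & LOOP_CONTROL_UNROLL:
        return inherited | Allowed.UNROLL
    return inherited


def allowed_for_call(function_control, inherited):
    """Explicit function hints override the inherited mask; None inherits it."""
    if function_control & FUNCTION_CONTROL_DONT_INLINE:
        return inherited & ~Allowed.INLINE
    if function_control & FUNCTION_CONTROL_INLINE:
        return inherited | Allowed.INLINE
    return inherited


# A DontInline callee whose body has an Unroll loop: the loop may be unrolled
# in place, but nothing is allowed to inline the callee to make that possible.
at_call = allowed_for_call(FUNCTION_CONTROL_DONT_INLINE, Allowed.INLINE | Allowed.SSA_REWRITE)
at_loop = allowed_for_loop(LOOP_CONTROL_UNROLL, at_call)
print(at_call, at_loop)
```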

devshgraphicsprogramming · 2026-03-26T19:09:59Z

I haven't read the code that closely but it seems like loop-unroll is not touching Loop controlled loops, only Unroll and None. Please correct me if I'm wrong.

Then the battle would be to make legalization passes that run without performance passes (so O0 instead of O3 in DXC, i.e. debug builds of shaders) not unroll loops whose Loop Control is None.

And similar behaviour w.r.t. None for Inline Function Control and None for Flatten Selection Control.

The O1experimental fast performance path can leave explicit VariablePointers / VariablePointersStorageBuffer declarations in the final module even after the final IR no longer contains the pointer forms that require them. In our EX37 sampler workload the resulting SPIR-V remained legal and the failing shader contained only scalar OpSelect %float instructions, with no pointer OpSelect or pointer OpPhi. Removing only the stale capability lines fixed the downstream runtime corruption. Keep the shared TrimCapabilitiesPass and the default optimizer paths untouched by adding a dedicated TrimVariablePointersCapabilitiesPass and invoking it only at the end of the fast performance recipe. Preserve real Workgroup and StorageBuffer variable-pointer cases with focused tests.

AnastaZIuk mentioned this pull request Mar 20, 2026

Signal when SPIR-V legalization needs loop unroll Devsh-Graphics-Programming/DirectXShaderCompiler#15

Closed

AnastaZIuk force-pushed the unroll branch from 1eb05e5 to 487ff84 Compare March 20, 2026 17:58

AnastaZIuk mentioned this pull request Mar 20, 2026

Signal when SPIR-V legalization needs targeted cleanup microsoft/DirectXShaderCompiler#8283

Open

Narrow blanket SPIR-V legalization work in optimizer recipes

7c4d322

AnastaZIuk force-pushed the unroll branch from 7134be5 to 7c4d322 Compare March 22, 2026 07:12

Handle image texel pointers in local single-store elim

2a730e1

AnastaZIuk added 6 commits March 28, 2026 23:04

Add O1experimental fast compile recipe

0ecbcc9

Restore default performance recipe

4fce38c

Split fast compile legalization from defaults

5bc9ddf

Limit O1experimental SSA rewrite scope

f5e5c73

Tighten O1experimental legality cleanup

9d209ea

AnastaZIuk closed this Mar 31, 2026

AnastaZIuk wants to merge 8 commits into main from unroll