Signal when SPIR-V legalization needs targeted cleanup#8283
Signal when SPIR-V legalization needs targeted cleanup#8283AnastaZIuk wants to merge 2 commits intomicrosoft:mainfrom
Conversation
You can test this locally with the following command:git-clang-format --diff 54afd2c6a5f1a5bcbef66967515fa69f5a3650fd 9ac57982725c4264348b20ad3fb271acc0993754 -- tools/clang/lib/Frontend/Rewrite/RewriteObjC.cpp tools/clang/lib/SPIRV/SpirvEmitter.cpp tools/clang/lib/SPIRV/SpirvEmitter.hView the diff from clang-format here.diff --git a/tools/clang/lib/SPIRV/SpirvEmitter.cpp b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
index 4a616a00..0920fc1c 100644
--- a/tools/clang/lib/SPIRV/SpirvEmitter.cpp
+++ b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
@@ -593,8 +593,7 @@ SpirvEmitter::SpirvEmitter(CompilerInstance &ci)
constEvaluator(astContext, spvBuilder), entryFunction(nullptr),
curFunction(nullptr), curThis(nullptr), seenPushConstantAt(),
isSpecConstantMode(false), needsLegalization(false),
- needsLegalizationLoopUnroll(false),
- needsLegalizationSsaRewrite(false),
+ needsLegalizationLoopUnroll(false), needsLegalizationSsaRewrite(false),
beforeHlslLegalization(false), mainSourceFile(nullptr) {
// Get ShaderModel from command line hlsl profile option.
@@ -957,8 +956,7 @@ void SpirvEmitter::HandleTranslationUnit(ASTContext &context) {
!dsetbindingsToCombineImageSampler.empty() ||
spirvOptions.signaturePacking;
needsLegalizationSsaRewrite =
- needsLegalizationSsaRewrite ||
- !dsetbindingsToCombineImageSampler.empty();
+ needsLegalizationSsaRewrite || !dsetbindingsToCombineImageSampler.empty();
// Run legalization passes
if (spirvOptions.codeGenHighLevel) {
@@ -16679,9 +16677,9 @@ bool SpirvEmitter::spirvToolsLegalize(std::vector<uint32_t> *mod,
} else if (needsLegalizationSsaRewrite) {
legalizationSsaRewriteMode = spvtools::SSARewriteMode::OpaqueOnly;
}
- optimizer.RegisterLegalizationPasses(
- spirvOptions.preserveInterface, needsLegalizationLoopUnroll,
- legalizationSsaRewriteMode);
+ optimizer.RegisterLegalizationPasses(spirvOptions.preserveInterface,
+ needsLegalizationLoopUnroll,
+ legalizationSsaRewriteMode);
// Add flattening of resources if needed.
if (spirvOptions.flattenResourceArrays) {
optimizer.RegisterPass(
|
s-perron
left a comment
There was a problem hiding this comment.
I don't think we will be able to accept these changes. I understand the place this is coming from. Optimization take time, and if we do lots of unrolling it creates lots of extra code that needs to be optimized.
The problem is that we have to generate legal code, and there are cases where these optimizations have proven necessary.
As a simple example, this PR changes the expected behaviour of on of our example from the SPIR-V cookbook.
We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.
Things that cannot happen:
-
The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.
-
Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.
Potential ideas:
- We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.
- You can disable the performance passs, or replace them with your own set of passes to run for the optimization phase. See here.
- If you want even more control, you can dump the code before legalization (-fcgl), and call the optimizer yourself with the passes you want.
These are ideas for limiting the unrolling. I don't think there is a way to limit the ssa rewrite pass the way you want. However, I don't think that is a good idea. It could impact too many other passes.
On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in sirv-opt?
| // CHECK-NEXT: [[ptr:%[0-9]+]] = OpAccessChain %_ptr_Uniform_S %gRWSBuffer | ||
| // CHECK-NEXT: OpStore [[ptr]] [[val]] | ||
| // CHECK: OpLoopMerge {{%[0-9]+}} {{%[0-9]+}} Unroll | ||
| // CHECK: [[which:%[0-9]+]] = OpSelect %_ptr_Uniform_type_StructuredBuffer_S {{%[0-9]+}} %gSBuffer1 %gSBuffer2 |
There was a problem hiding this comment.
This is not legal. We have to unroll in this case.
There was a problem hiding this comment.
yep, it would only be legal with SPV_KHR_variable_pointers if the gSBuffer pointers were of Shader Storage Class.
However I could go out on a limb and claim that it was never supposed to be legal without [[unroll]] on the loop ( I know the example has unroll so in this case SPIR-V codegen shouldn't change)
Yes my first suggestion was probably do it as a flag like the #7996 PR which also changes previous behaviour, but takes care to avoid changing the old default behaviour.
We know that not every driver is Mesa and doesn't optimize fully and are currently running benchmarks for performance regressions. However I am of the opinion that:
TL;DR the performance passes should only be allowed to unroll loops without There's obviously the problem of if-statements, switches, constrant propagation, etc. which can allow turning illegal code legal and they have no attributes to control that. There's a few examples which in my option are a bit messed up and wouldn't make people cry if a flag disabled them
Love the idea, I guess that moves the problem and our impl into SPIR-V Tools ?
Actually if we step back and look at your idea above, an attribute like If this could percolate to the SPIR-V Spec as an At the end of the day analyzing whether to unroll or legalize might be more expensive than just unrolling and legalizing, since you can't exaclty and in leess time, know if all the transformations will end up making the code legal before you try them. My intuition tells me Halting Problem will rear its ugly head at some point if we try this without annotations and some help from the frontend.
@AnastaZIuk is correct here, unrolling for legalization should not kick in without Stuff like [unroll]
for( int j = 0; j < 2; j++ ) {
if (constant > j) {
lSBuffer = gSBuffer1;
} else {
lSBuffer = gSBuffer2;
}
gRWSBuffer[j] = lSBuffer[j];
}should legalize fine, but things like [loop]
for( int j = 0; j < 2; j++ ) {
if (constant > j) {
lSBuffer = gSBuffer1;
} else {
lSBuffer = gSBuffer2;
}
gRWSBuffer[j] = lSBuffer[j];
}really shouldn't IMHO
We'll take our worst shader and let you know. The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema. Mind you the difference between SPIR-V ResultID upper bound is only 10k vs 16k for the We're open for ideas#s-perron how would you have us solve the problem of "too much SPIR-V spam". Sidequests
@s-perron was that done correctly, or is that something you'd want fixed? |
Summary
SpirvEmitterbits for legalization-time loop unroll and legalization-time SSA rewriteexternal/SPIRV-Headersandexternal/SPIRV-Toolssubmodule pointers to the matching branch stateSPIRV-Toolslegalization API and the current MSVCgetFunctionTypesignatureCodeGenSPIRVchecks to match the accepted SPIR-V shapeRoot cause
The expensive blanket policy lives in the SPIR-V optimizer recipe, not in HLSL parsing or the DXC CLI front-end.
DXC is still the producer that knows when a specific lowering path genuinely needs those expensive legalization transforms for correctness, so the optimizer should not have to guess that globally.
Two confirmed correctness-sensitive cases are:
ConstOffsetThis branch also fixes one separate producer-side correctness bug: vector splat of bool spec constants was being emitted as
OpConstantCompositeinstead of the legalOpSpecConstantComposite.The same narrowing also exposed one existing image-atomic cleanup dependency. The matching
LocalSingleStoreElimfollow-up lives in the companion KhronosGroup/SPIRV-Tools#6612 branch and is rolled here through the updated submodule pointer.Spec basis
The hard legality rules here are much narrower than the old blanket cleanup.
Spec:
The involved types are explicitly opaque:
Spec:
For the image-offset case, the relevant SPIR-V rule is also explicit:
Spec:
For the retained-loop case,
OpSelectis also explicitly allowed on pointer-typed objects:Spec:
For the image-atomic follow-up,
OpImageTexelPointeris also explicit:Spec:
For the spec-constant splat fix, the composite rules are also explicit:
Spec:
Vulkan tightens this further:
Spec:
Benchmark
19.161 s -> 3.02282 s(median of 3 runs)Validation
CodeGenSPIRVon this branch pair passes with1438expected passes,2expected failures, and0unexpected failuresUpdated test notes
vk.spec-constant.reuse.hlsl: the old check expectedOpConstantCompositefrom a bool spec-constant splat. That shape is invalid. This branch now emits the legalOpSpecConstantComposite.15-loop-var-unroll-ok.hlsl: the old check implicitly depended on eager optimizer-side full unroll. This branch keepsLoopControl::Unrollin the IR and uses a legal pointerOpSelectpath instead.cast.flat-conversion.matrix.hlsl: the old check assumed the matrix-to-struct path stayed as pre-folded constant composites. This branch rebuilds the target values from the loaded matrix viaOpCompositeExtractandOpCompositeConstruct. The stored value is the same and the row-major or column-major decoration still carries the memory layout rule.type.rwstructured-buffer.array.counter.indirect.hlslandtype.rwstructured-buffer.array.counter.indirect2.hlsl: the old checks assumed a single flattened access chain directly to the counter or data element. This branch keeps an intermediate structured access chain and then indexes the member. The resource, binding, atomic, and data access semantics are unchanged.vk.binding.global-struct-of-resources.optimized.hlsl: the old regex overconstrained generated ids. This branch preserves the same resource names and bindings but uses different legal ids, so the test now checks the stable semantic parts instead.max_id.hlsl: the old check overconstrained an exactBoundvalue. This branch still exercises the same larger-limit success path, but the exact bound is no longer stable after the accepted optimizer recipe changes.Companion SPIR-V optimizer PR:
KhronosGroup/SPIRV-Tools#6612