JIT: extend OptimizeConstCompare to remove TYP_BYTE casts#129361
JIT: extend OptimizeConstCompare to remove TYP_BYTE casts#129361AndyAyersMS wants to merge 1 commit into
Conversation
For xarch, mirror the existing TYP_UBYTE branch when the const fits in INT8. Lets `(sbyte)x < -64` lower to `cmp cl, 0xC0; setl` instead of `movsx rax, cl; cmp eax, -64; setl`. Skip the transform for op2 == 0 with GT_LT/GT_GE: codegen's sign-bit-shift trick assumes the operand is sign-extended to full register width. Fixes dotnet#10337.
|
@dotnet/jit-contrib PTAL |
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR extends CoreCLR xarch JIT lowering so Lowering::OptimizeConstCompare can remove TYP_BYTE casts when comparing against constants that fit in INT8, enabling byte-sized compares (avoiding an extra sign-extension) while preserving correctness for the x < 0 / x >= 0 zero-compare codegen special-case. It also adds a JIT regression test covering the relevant signed-byte comparison patterns.
Changes:
- Extend
Lowering::OptimizeConstCompare(xarch) with a newTYP_BYTEcast-removal path gated onFitsIn<INT8>(const)and excludingconst == 0withLT/GE. - Add a new JIT regression test (
Runtime_10337) that validates semantics for various signed-byte constant compare patterns (including memory and truncating sources). - Wire the new test into the merged regression test project (
Regression_ro_2.csproj).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/lower.cpp | Adds an xarch-only TYP_BYTE cast-removal case in OptimizeConstCompare, with a correctness bailout for the LT/GE compare-to-zero sign-bit optimization. |
| src/tests/JIT/Regression/Regression_ro_2.csproj | Includes the new Runtime_10337 test source in the regression project. |
| src/tests/JIT/Regression/JitBlue/Runtime_10337/Runtime_10337.cs | New regression test validating behavior for signed-byte constant compares (including truncation and memory-load shapes). |
| [MethodImpl(MethodImplOptions.NoInlining)] | ||
| private static bool Lt_And(int x) => ((sbyte)(x & 0xF0)) < -16; | ||
|
|
||
| // Memory operand: the comparison should contain the load. |
|
I'm not actually sure this is better here, due to how Intel and AMD tend to treat operations on partial registers. This is particularly true if the small value comes from memory. I think it would be good to benchmark this and/or consult our friends at Intel to determine what the recommended codegen here is. |
tannergooding
left a comment
There was a problem hiding this comment.
The changes look correct to me, but it's probably goodness to dig a little deeper into whether or not this is actually an optimization or not.
Right. The question is really whether or not using a small register as part of a test/compare is ok (I think it is) or if it has a penalty as compared to just doing the sign extension in the first place. The rules have changed here quite a lot over time and while I think modern processors are fine for this case since it doesn't use the upper bits and isn't further modifying the register, there are many documented cases where using the 8/16-bit form can and will force a delay because the CPU functionally ends up having to insert the "merge with upper 24/16-bits" logic, which it elides if the instruction which writes the small register immediately zero/sign extends the value instead. -- This is notably a feature of APX as well, allowing the extension to be part of the original instruction and fully removing the merge semantics. |

For xarch, mirror the existing TYP_UBYTE branch when the const fits in INT8. Lets
(sbyte)x < -64lower tocmp cl, 0xC0; setlinstead ofmovsx rax, cl; cmp eax, -64; setl. Skip the transform for op2 == 0 with GT_LT/GT_GE: codegen's sign-bit-shift trick assumes the operand is sign-extended to full register width.Fixes #10337.