Skip to content

JIT: extend OptimizeConstCompare to remove TYP_BYTE casts#129361

Open
AndyAyersMS wants to merge 1 commit into
dotnet:mainfrom
AndyAyersMS:fix-10337-typ-byte-compare
Open

JIT: extend OptimizeConstCompare to remove TYP_BYTE casts#129361
AndyAyersMS wants to merge 1 commit into
dotnet:mainfrom
AndyAyersMS:fix-10337-typ-byte-compare

Conversation

@AndyAyersMS

Copy link
Copy Markdown
Member

For xarch, mirror the existing TYP_UBYTE branch when the const fits in INT8. Lets (sbyte)x < -64 lower to cmp cl, 0xC0; setl instead of movsx rax, cl; cmp eax, -64; setl. Skip the transform for op2 == 0 with GT_LT/GT_GE: codegen's sign-bit-shift trick assumes the operand is sign-extended to full register width.

Fixes #10337.

For xarch, mirror the existing TYP_UBYTE branch when the const fits in
INT8. Lets `(sbyte)x < -64` lower to `cmp cl, 0xC0; setl` instead of
`movsx rax, cl; cmp eax, -64; setl`. Skip the transform for op2 == 0
with GT_LT/GT_GE: codegen's sign-bit-shift trick assumes the operand
is sign-extended to full register width.

Fixes dotnet#10337.
Copilot AI review requested due to automatic review settings June 13, 2026 02:07
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 13, 2026
@AndyAyersMS

Copy link
Copy Markdown
Member Author

@dotnet/jit-contrib PTAL

@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends CoreCLR xarch JIT lowering so Lowering::OptimizeConstCompare can remove TYP_BYTE casts when comparing against constants that fit in INT8, enabling byte-sized compares (avoiding an extra sign-extension) while preserving correctness for the x < 0 / x >= 0 zero-compare codegen special-case. It also adds a JIT regression test covering the relevant signed-byte comparison patterns.

Changes:

  • Extend Lowering::OptimizeConstCompare (xarch) with a new TYP_BYTE cast-removal path gated on FitsIn<INT8>(const) and excluding const == 0 with LT/GE.
  • Add a new JIT regression test (Runtime_10337) that validates semantics for various signed-byte constant compare patterns (including memory and truncating sources).
  • Wire the new test into the merged regression test project (Regression_ro_2.csproj).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/jit/lower.cpp Adds an xarch-only TYP_BYTE cast-removal case in OptimizeConstCompare, with a correctness bailout for the LT/GE compare-to-zero sign-bit optimization.
src/tests/JIT/Regression/Regression_ro_2.csproj Includes the new Runtime_10337 test source in the regression project.
src/tests/JIT/Regression/JitBlue/Runtime_10337/Runtime_10337.cs New regression test validating behavior for signed-byte constant compares (including truncation and memory-load shapes).

[MethodImpl(MethodImplOptions.NoInlining)]
private static bool Lt_And(int x) => ((sbyte)(x & 0xF0)) < -16;

// Memory operand: the comparison should contain the load.
@tannergooding

Copy link
Copy Markdown
Member

I'm not actually sure this is better here, due to how Intel and AMD tend to treat operations on partial registers. This is particularly true if the small value comes from memory.

I think it would be good to benchmark this and/or consult our friends at Intel to determine what the recommended codegen here is.

@tannergooding tannergooding left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look correct to me, but it's probably goodness to dig a little deeper into whether or not this is actually an optimization or not.

@AndyAyersMS

Copy link
Copy Markdown
Member Author

The changes look correct to me, but it's probably goodness to dig a little deeper into whether or not this is actually an optimization or not.

All the diffs are basically variants of

image

@tannergooding

Copy link
Copy Markdown
Member

All the diffs are basically variants of

Right. The question is really whether or not using a small register as part of a test/compare is ok (I think it is) or if it has a penalty as compared to just doing the sign extension in the first place.

The rules have changed here quite a lot over time and while I think modern processors are fine for this case since it doesn't use the upper bits and isn't further modifying the register, there are many documented cases where using the 8/16-bit form can and will force a delay because the CPU functionally ends up having to insert the "merge with upper 24/16-bits" logic, which it elides if the instruction which writes the small register immediately zero/sign extends the value instead. -- This is notably a feature of APX as well, allowing the extension to be part of the original instruction and fully removing the merge semantics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Suboptimal codegen when comparing sbyte and 16-bit values

3 participants