JIT: opportunistically lower `value & ((1 << k) - 1)` to BMI2 BZHI #129368

> [!NOTE]
> This issue was drafted with Copilot CLI assistance.

The JIT already recognizes most BMI1/BMI2 scalar patterns opportunistically (`andn`, `blsi`, `blsr`, `blsmsk`, `shlx`/`shrx`/`sarx`, `rorx`, `mulx`, `tzcnt`, `lzcnt`, `popcnt`), but the BMI2 `bzhi` instruction is only emitted when user code explicitly calls `Bmi2.ZeroHighBits`. The natural C# expression `value & ((1 << k) - 1)` — used wherever code needs to keep the low `k` bits of a value with a variable count — still lowers to a 4-instruction sequence even on hardware that has BZHI.

Current codegen for `value & ((1u << k) - 1)`:
```asm
mov      r,  1
shlx     r,  r,  k        ; (or shl r, cl without BMI2)
dec      r
and      value, r
```

Possible BZHI codegen (with a safety mask to preserve IL shift semantics when `k >= operandSize`):
```asm
and      k,  31           ; (or 63 for 64-bit; skippable if IntegralRange proves k in range)
bzhi     value, value, k
```

### Semantic note

BZHI reads only bits [7:0] of its index operand. When that value is `>= operandSize`, BZHI returns the source unchanged. The IL shift, however, masks the count with `operandSize - 1` (31 or 63), so on a 32-bit shift `1 << 32` is `1 << 0 = 1` and `(1 << 32) - 1` evaluates to `0`. As a result, `value & ((1 << k) - 1)` is `0` for `k = 32`, while `bzhi(value, 32)` is `value`. To preserve semantics we need to either mask `k` explicitly or prove from range analysis that `k` is in `[0, operandSize)`.
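
The mismatch can be illustrated with a small software model. This is only a sketch: `MaskedShl32`, `LowBitsViaMask`, `Bzhi32Model`, and `LowBitsViaGuardedBzhi` are hypothetical helper names, not JIT or runtime APIs, and the model is written in C++ with an explicit count mask because a plain C++ shift by 32 or more is undefined.

```cpp
#include <cstdint>

// Models the masked-count shift described above for a 32-bit operand:
// the count is ANDed with 31 before shifting. (Hypothetical helper.)
constexpr uint32_t MaskedShl32(uint32_t x, uint32_t count)
{
    return x << (count & 31u);
}

// The source expression: value & ((1 << k) - 1).
constexpr uint32_t LowBitsViaMask(uint32_t value, uint32_t k)
{
    return value & (MaskedShl32(1u, k) - 1u);
}

// Software model of 32-bit BZHI: only bits [7:0] of the index are read,
// and an index >= 32 leaves the source unchanged.
constexpr uint32_t Bzhi32Model(uint32_t src, uint32_t index)
{
    return ((index & 0xFFu) >= 32u)
               ? src
               : (src & ((1u << (index & 0xFFu)) - 1u));
}

// The guarded lowering: mask the count first, then apply BZHI.
constexpr uint32_t LowBitsViaGuardedBzhi(uint32_t value, uint32_t k)
{
    return Bzhi32Model(value, k & 31u);
}

static_assert(LowBitsViaMask(0xDEADBEEFu, 8) == 0xEFu, "keeps the low 8 bits");
static_assert(LowBitsViaMask(0xDEADBEEFu, 32) == 0u, "count wraps to 0, mask is 0");
static_assert(Bzhi32Model(0xDEADBEEFu, 32) == 0xDEADBEEFu, "raw BZHI keeps the source");
static_assert(LowBitsViaGuardedBzhi(0xDEADBEEFu, 32) == 0u, "guarded BZHI matches the mask");
```

In this model the unguarded BZHI and the mask expression disagree only when the masked and unmasked counts differ, which is why the extra `and k, 31` (or a range proof) is enough to make the two agree.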

### Prototype

Branch: https://github.com/AndyAyersMS/runtime/tree/fix-bzhi-recognition

Adds `Lowering::TryLowerAndOpToZeroHighBits` in `lowerxarch.cpp` that:
- Recognizes `AND(value, SUB(LSH(1, k), 1))` and the morph-canonicalized `ADD(..., -1)` form (commutatively).
- Inserts an explicit `AND(k, operandSize - 1)` unless the IR pattern `AND(k_inner, const)` with `const < operandSize` proves the count is bounded (covers the common case where `LowerShift` has already stripped morph's `& 31`/`& 63` from the shift count).
- Bails out if anything depends on flags from the original `AND`/`SUB`/`LSH`.
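
The recognition steps above can be sketched on a toy expression tree. Everything here is an assumption made for illustration: `Node`, `Op`, `BzhiMatch`, and the function names are hypothetical and do not reflect the JIT's `GenTree` types or the actual code on the prototype branch, and the flags bail-out is not modeled.

```cpp
#include <cstdint>

// Hypothetical toy IR; the real JIT operates on GenTree nodes instead.
enum class Op { Var, Const, And, Add, Sub, Lsh };

struct Node
{
    Op          op;
    int64_t     value = 0;       // meaningful only for Op::Const
    const Node* lhs   = nullptr;
    const Node* rhs   = nullptr;
};

inline bool IsConst(const Node* n, int64_t v)
{
    return n != nullptr && n->op == Op::Const && n->value == v;
}

// Matches LSH(1, k) and returns k, or nullptr if the shape differs.
inline const Node* MatchOneShiftedLeft(const Node* n)
{
    if (n != nullptr && n->op == Op::Lsh && IsConst(n->lhs, 1))
        return n->rhs;
    return nullptr;
}

// Matches SUB(LSH(1, k), 1) and ADD(LSH(1, k), -1) in either operand order.
inline const Node* MatchLowBitsMask(const Node* n)
{
    if (n == nullptr)
        return nullptr;
    if (n->op == Op::Sub && IsConst(n->rhs, 1))
        return MatchOneShiftedLeft(n->lhs);
    if (n->op == Op::Add)
    {
        if (IsConst(n->rhs, -1)) return MatchOneShiftedLeft(n->lhs);
        if (IsConst(n->lhs, -1)) return MatchOneShiftedLeft(n->rhs);
    }
    return nullptr;
}

struct BzhiMatch
{
    const Node* value;
    const Node* count;
};

// Matches AND(value, mask) or AND(mask, value).
inline bool TryMatchZeroHighBits(const Node* n, BzhiMatch* out)
{
    if (n == nullptr || n->op != Op::And)
        return false;
    if (const Node* k = MatchLowBitsMask(n->rhs)) { *out = BzhiMatch{n->lhs, k}; return true; }
    if (const Node* k = MatchLowBitsMask(n->lhs)) { *out = BzhiMatch{n->rhs, k}; return true; }
    return false;
}

inline bool IsConstBelow(const Node* n, int64_t limit)
{
    return n != nullptr && n->op == Op::Const && n->value >= 0 && n->value < limit;
}

// True when the count has the shape AND(k_inner, const) with const < operandSize,
// in which case the extra safety mask would be redundant.
inline bool CountIsProvablyBounded(const Node* count, int64_t operandSize)
{
    if (count == nullptr || count->op != Op::And)
        return false;
    return IsConstBelow(count->rhs, operandSize) || IsConstBelow(count->lhs, operandSize);
}
```

A caller in this sketch would emit `bzhi value, value, count` directly when `CountIsProvablyBounded` returns true, and otherwise wrap the count in `AND(count, operandSize - 1)` first.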

### SPMI x64 results (805,338 contexts; collections: aspnet2, benchmarks.run, benchmarks.run_pgo, realworld.run, libraries.pmi, libraries.crossgen2)

| Collection | Contexts | Δ bytes | Δ PerfScore in diffs |
|---|--:|--:|--:|
| benchmarks.run | 7 | −46 | +0.13% |
| realworld.run | 4 | −32 | +0.05% |
| libraries.pmi | 16 | −104 | **−3.07%** |
| libraries.crossgen2 | 9 | −56 | −0.10% |
| **total** | **36** | **−238** | net win |

All 36 diff contexts are size improvements; **0 size regressions**. Of the 36 affected contexts, 33 are PerfScore improvements (as large as −25%, and −8.93% on hot methods) and 3 are small PerfScore regressions (≤ +1.36%). All three regressions are attributable to the JIT perf model preferring `and mem, reg` (RMW) over `bzhi mem; mov mem` (load + bzhi + store); actual µarch behavior is essentially identical.

### Sample diffs

**`System.Collections.BitArray.ClearHighExtraBits` (realworld):**
```diff
-       mov      edx, 1
-       shlx     eax, edx, eax
-       dec      eax
-       and      dword ptr [rcx], eax
+       and      eax, 31
+       bzhi     eax, dword ptr [rcx], eax
+       mov      dword ptr [rcx], eax
```

**`RealParser.AssembleFloatingPointValue` (libraries.pmi, 64-bit BZHI):**
```diff
-       mov      ecx, 1
-       shlx     rcx, rcx, rax
-       dec      rcx
-       and      rcx, rsi
-       mov      rsi, rcx
+       and      eax, 63
+       bzhi     rsi, rsi, rax
```

**`BFloat16.RoundMidpointToEven<int>` (libraries.pmi, −8.93% PerfScore on this method):**
```diff
-       mov      r8d, 1
-       shlx     r8d, r8d, eax
-       dec      r8d
-       and      r8d, ecx
+       mov      r8d, eax
+       and      r8d, 31
+       bzhi     r8d, ecx, r8d
```

**`V8.Crypto.BigInteger.fromString` (benchmarks.run):**
```diff
-       mov      r9d, 1
-       shlx     r10d, r9d, r10d
-       dec      r10d
-       and      r10d, eax
+       and      r10d, 31
+       bzhi     r10d, eax, r10d
```

### Real-world hits
`BitArray.ClearHighExtraBits`, `BigInteger.{fromString, fromByteArray, toString, modPow, toByteArray}`, `BFloat16.RoundMidpointToEven`, `RealParser.{ConvertDecimalToFloatingPointBits, AssembleFloatingPointValue}`, `InflaterManaged.DecodeBlock`, `InputBuffer.GetBits`, `DeflaterHuffman.CompressBlock`, `RegularExpressions.Symbolic.BitVector.ClearRemainderBits`, `AsnWriter.CheckValidLastByte`, `Microsoft.CodeAnalysis.CSharp` symbol-flag helpers.

### Potential follow-ups
1. Suppress the safety `and k, 31`/`and k, 63` when `IntegralRange` can prove the count is in range (e.g., when `LowerShift` strips a morph-inserted mask that BZHI could have used).
2. Decide whether `bzhi mem; mov mem` vs `and mem, reg` (RMW) should be preferred for the few mem-RMW shapes — possibly skip the transform when the value is a contained memory operand that would otherwise be an `and mem` RMW.
3. Consider extending the recognition to the `~((-1) << k) & value` and `value << (size - k) >> (size - k)` formulations.
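
For follow-up 3, a quick model suggests the two alternative formulations do not behave identically at the edges. The sketch below assumes 32-bit unsigned operands and masked shift counts; `Shl32`, `Shr32`, `MaskForm`, `NotShiftForm`, and `ShiftPairForm` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Masked-count 32-bit shifts (a plain C++ shift by >= 32 is undefined,
// so the count is ANDed with 31 explicitly). Hypothetical helpers.
constexpr uint32_t Shl32(uint32_t x, uint32_t c) { return x << (c & 31u); }
constexpr uint32_t Shr32(uint32_t x, uint32_t c) { return x >> (c & 31u); }

// Baseline: value & ((1 << k) - 1)
constexpr uint32_t MaskForm(uint32_t value, uint32_t k)
{
    return value & (Shl32(1u, k) - 1u);
}

// ~((-1) << k) & value
constexpr uint32_t NotShiftForm(uint32_t value, uint32_t k)
{
    return ~Shl32(~0u, k) & value;
}

// value << (size - k) >> (size - k), with size = 32 and a logical right shift
constexpr uint32_t ShiftPairForm(uint32_t value, uint32_t k)
{
    return Shr32(Shl32(value, 32u - k), 32u - k);
}

static_assert(NotShiftForm(0xCAFEBABEu, 12) == MaskForm(0xCAFEBABEu, 12), "same low 12 bits");
static_assert(ShiftPairForm(0xCAFEBABEu, 12) == MaskForm(0xCAFEBABEu, 12), "same low 12 bits");
static_assert(MaskForm(0xCAFEBABEu, 0) == 0u, "mask form yields 0 for k = 0");
static_assert(ShiftPairForm(0xCAFEBABEu, 0) == 0xCAFEBABEu, "shift pair keeps the value for k = 0");
```

Under these assumptions the `~((-1) << k)` form agrees with the baseline for every count, while the shift-pair form agrees only when `k & 31` is nonzero, so a recognizer for it would likely need its own guard rather than reusing the `and k, 31` one.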

