Skip to content

JIT: SupportsSettingZeroFlag xarch path should query instruction tables instead of hardcoding op lists #129346

@AndyAyersMS

Description

@AndyAyersMS

Background

In #129288 (fixed by the minimal removal of GT_ROL/GT_ROR from the
shift group), @tannergooding pointed out
that hardcoded op→flag lists in GenTree::SupportsSettingZeroFlag()
duplicate information that's already accurately tracked in the
xarch instruction tables, and risk drifting from reality:

We notably have this info in the instruction tables. For example, see
instrsxarch.h#L1258-L1260,
where ror is tracked as Undefined_OF | Writes_CF which means
other flags (like ZF) are left "as is".

It might be goodness to have most of the trivial arithmetic ops like
this rather query the instruction codegen would emit and lookup the
actual flags, rather than hardcode them all. Particularly because
that may get out of sync, be missed when we fixup the general
tables, or in the future if/when we change the emit strategy for a
given node kind.

Codegen itself typically has such a mapping of Oper -> ins already
and while there is some nuance, the places we typically care about
such queries (lowering) should know statically what codegen will
emit (as this is required for containment and other LSRA required
flags to be correct)

This issue tracks doing that refactor for the xarch path of
SupportsSettingZeroFlag().

Current state (after #129288 fix)

#if defined(TARGET_XARCH)
    if (OperIs(GT_LSH, GT_RSH, GT_RSZ))
    {
        // Shift instructions do not update the flags in case of count being zero.
        return gtGetOp2()->IsNeverZero();
    }
    // Note: GT_ROL / GT_ROR are intentionally NOT in this list. On xarch,
    // ROL/ROR/RCL/RCR only update CF (and OF in the 1-bit form); SF/ZF/AF/PF
    // are not affected for any count.

    if (OperIs(GT_AND, GT_OR, GT_XOR, GT_ADD, GT_SUB, GT_NEG))
    {
        return true;
    }

#ifdef FEATURE_HW_INTRINSICS
    if (OperIs(GT_HWINTRINSIC) && emitter::DoesWriteZeroFlag(HWIntrinsicInfo::lookupIns(AsHWIntrinsic(), nullptr)))
    {
        return true;
    }
#endif
#endif

The hardcoded scalar list (GT_AND, GT_OR, GT_XOR, GT_ADD, GT_SUB, GT_NEG) and the separate shift list still mirror data that already
exists in instrsxarch.h. The GT_HWINTRINSIC branch already shows
the cleaner pattern.

Building blocks already in tree

  • instrsxarch.h carries accurate per-instruction insFlags
    (Writes_ZF, Undefined_OF, Resets_OF, etc.).
  • emitter::DoesWriteZeroFlag(instruction ins) in
    emitxarch.cpp
    already queries it.
  • CodeGen::genGetInsForOper(genTreeOps oper, var_types type) in
    codegenxarch.cpp
    already maps GT_ADD, GT_AND, GT_LSH, GT_MUL, GT_NEG, GT_NOT, GT_OR, GT_ROL, GT_ROR, GT_RSH, GT_RSZ, GT_SUB, GT_XOR, ... to the right
    instruction. It uses no instance state, so making it static is
    straightforward.

Proposed shape

#if defined(TARGET_XARCH)
    // For arith opers whose lowering emits a single primary instruction
    // we can name statically, defer to the instruction tables.
    if (OperIs(GT_ADD, GT_AND, GT_OR, GT_XOR, GT_SUB, GT_NEG,
               GT_LSH, GT_RSH, GT_RSZ, GT_ROL, GT_ROR))
    {
        var_types type = TypeGet();
        if (!varTypeIsFloating(type) && !varTypeIsSIMD(type))
        {
            instruction ins = CodeGen::GetIns_FromOper(OperGet(), type);
            if (ins != INS_invalid && !emitter::DoesWriteZeroFlag(ins))
            {
                return false;
            }
            // Shifts only write flags when count != 0 (Intel SDM).
            // The flag tables can't express the count-dependent
            // condition; encode it here.
            if (OperIs(GT_LSH, GT_RSH, GT_RSZ))
            {
                return gtGetOp2()->IsNeverZero();
            }
            return true;
        }
    }
#ifdef FEATURE_HW_INTRINSICS
    if (OperIs(GT_HWINTRINSIC) && emitter::DoesWriteZeroFlag(HWIntrinsicInfo::lookupIns(AsHWIntrinsic(), nullptr)))
    {
        return true;
    }
#endif
#endif

Two variants for getting the Oper -> ins mapping callable from
gentree.cpp:

  • 2a: Refactor genGetInsForOper into a static helper
    (CodeGen::GetIns_FromOper) since it uses no instance state.
  • 2b: Add a small local GetXArchArithIns(genTreeOps, var_types)
    in instr.cpp covering just the arith opers we care about,
    returning INS_invalid otherwise. Avoids the codegen.h API churn at
    the cost of a small duplicated table.

Note on shift semantics

The instruction tables don't model the Intel SDM "if count is 0 the
flags are unchanged" rule for shl/shr/sar — they show
Writes_ZF unconditionally. The existing
gtGetOp2()->IsNeverZero() guard must be preserved for the shift
case; the table query gates on DoesWriteZeroFlag(ins) being true,
but the count-dependent special case still requires the IsNeverZero
check. This is the only non-trivial nuance in the proposed refactor.

Outcome

  • Removes the hardcoded GT_AND/GT_OR/GT_XOR/GT_ADD/GT_SUB/GT_NEG
    scalar list (and the shift list) in favor of a single
    table-driven check.
  • Catches future drift automatically when the instruction tables
    change.
  • Aligns the scalar path with the already-existing GT_HWINTRINSIC
    pattern in the same function.

cc @tannergooding @jakobbotsch @EgorBo (last touched
SupportsSettingZeroFlag in #117960)

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions