Skip to content

fix: harden reassociation barriers for fast-math#1276

Merged
serge-sans-paille merged 1 commit intoxtensor-stack:masterfrom
DiamonDinoia:fix/nearbyint-fastmath
Apr 3, 2026
Merged

fix: harden reassociation barriers for fast-math#1276
serge-sans-paille merged 1 commit intoxtensor-stack:masterfrom
DiamonDinoia:fix/nearbyint-fastmath

Conversation

@DiamonDinoia
Copy link
Copy Markdown
Contributor

@DiamonDinoia DiamonDinoia commented Mar 19, 2026

There is a bug in nearbyint -fassociative-math breaks it as it does not define __FAST_MATH__.
Also the barrier used was causing a stack spill.

I centralized a barrier function that we can use everywhere in the code and used it in the places I know it helps.

Now we can just use reassociation_barrier to avoid compiler reordering of instructions.

Let me know if you like the internal API and if you need changes to it. I find that this is the solution that minimizes ifdef boilerplate and allows to dispatch to all archs. (With c++17 this will be simpler).

Cheers,
Marco

@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 5 times, most recently from f9a9992 to 2f9a431 Compare March 23, 2026 19:55
@DiamonDinoia DiamonDinoia changed the title fix: harden reassociation barriers for fast-math nearbyint fix: harden reassociation barriers for fast-math Mar 23, 2026
@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

@serge-sans-paille this is also ready for review.

@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 5 times, most recently from 3685ff0 to c4dec73 Compare March 27, 2026 19:57
@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

I think I can simplify this a bit more. A is not needed anymore if I we go this route.

@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 2 times, most recently from 6d6bc61 to d571f2f Compare March 28, 2026 22:42
template <class T>
XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
{
#if XSIMD_WITH_INLINE_ASM
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make this empty if we're not under fast-math?

@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

DiamonDinoia commented Mar 29, 2026 via email

@serge-sans-paille
Copy link
Copy Markdown
Contributor

Well, I don't think we can say that

__asm__ volatile("" : : "r"(&x) : "memory");

is a no-op. Should we allow for user-defined XSIMD_REASSOCIATIVE which would be forced by __FAST_MATH__ but could also be set by the user?
I don't want fast-math to impact non-fast math code generation.

@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

DiamonDinoia commented Mar 29, 2026

Let me think about this for a second. MSVC does not allow inline asm. I should update the comment there. There might be better alternatives for scalars, WASM, and RISC-V that don't impact performance.

@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 4 times, most recently from d95d1d3 to edba47b Compare March 30, 2026 16:57
@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

DiamonDinoia commented Mar 30, 2026

I tried to have the no op in most cases and the spill asm is guarded with macros. I think is the best solution as users should not worry about FP reordering in practice, except corner cases. We could document those if needed.

@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

Hi @serge-sans-paille what do you think now?

@DiamonDinoia
Copy link
Copy Markdown
Contributor Author

This also supersedes #550 and #551

void test_exp()
{
value_type val(2);
#ifdef __FAST_MATH__
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or XSIMD_REASSOCIATIVE_MATH ?

// emitted when the compiler can actually reassociate.
template <class T>
XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really would rather have the whole body be guarded by XSIMD_REASSOCIATIVE_MATH.

I would love having a section on fast-math in the manual, where the documentation of that variable would be showcased. Can be in a follow-up commit though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this idea. But okay.

@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from edba47b to 604eb61 Compare April 3, 2026 16:51
- Add zero-cost register constraints for all supported architectures:
  x86 "+x", ARM NEON/SVE "+w", PPC VSX "+wa", RISC-V scalar "+f",
  RISC-V RVV "+vr" (GCC 15+ / Clang 20+).
- Replace old "r"(&x):"memory" fallback with "+m" guarded by new
  XSIMD_REASSOCIATIVE_MATH macro so unknown targets only spill when
  the compiler can actually reassociate.
- Add XSIMD_REASSOCIATIVE_MATH config macro, auto-detected from
  __FAST_MATH__ / __ASSOCIATIVE_MATH__, user-overridable for Clang
  with standalone -fassociative-math.
- Add std::array overload so emulated batches get per-element
  barriers instead of spilling the whole array.
- Add missing barriers in exp/exp2/exp10 range reduction (float and
  double) after nearbyint() to prevent compensated subtraction
  reordering.
- Add missing barriers in log2 (float and double) after to_float(k)
  to protect Kahan compensated summation.
- guard reassociation_barrier entirely by XSIMD_REASSOCIATIVE_MATH
@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from 604eb61 to 70f7493 Compare April 3, 2026 16:53
@serge-sans-paille serge-sans-paille merged commit 4550332 into xtensor-stack:master Apr 3, 2026
70 of 71 checks passed
@serge-sans-paille
Copy link
Copy Markdown
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants