fix: harden reassociation barriers for fast-math by DiamonDinoia · Pull Request #1276 · xtensor-stack/xsimd

DiamonDinoia · 2026-03-19T20:06:39Z

There is a bug in nearbyint: -fassociative-math breaks it, because that flag does not define __FAST_MATH__.
Also, the barrier used previously was causing a stack spill.
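To make the failure mode concrete: a common way to implement nearbyint without a dedicated rounding instruction is to add and then subtract 2^52, so that the FPU rounds in between. Assuming xsimd's generic nearbyint follows this pattern, -fassociative-math lets the compiler fold (v + 2^52) - 2^52 back into v, and no rounding happens at all. The scalar sketch that follows is illustrative only: sketch::reassociation_barrier and nearbyint_magic are hypothetical names, and the barrier assumes GCC/Clang extended asm with the constraints discussed in this PR ("+x" on x86-64, "+w" on AArch64) plus a "+m" fallback elsewhere.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Hypothetical scalar barrier: an empty asm that claims to read and write x,
// so the compiler can no longer see the value it computed before this point.
inline void reassociation_barrier(double& x) noexcept {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("" : "+x"(x)); // keep x in an SSE register
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("" : "+w"(x)); // keep x in an FP/SIMD register
#elif defined(__GNUC__)
    __asm__ volatile("" : "+m"(x)); // generic fallback: round trip through memory
#else
    (void)x; // no inline asm available: no barrier
#endif
}

// Round to nearest integer in the current rounding mode via the 2^52 trick.
inline double nearbyint_magic(double x) noexcept {
    const double two52 = 4503599627370496.0; // 2^52
    const double v = std::fabs(x);
    if (!(v < two52)) {
        return x; // already integral, or inf/NaN
    }
    double t = v + two52;     // the FPU rounds v to an integer here
    reassociation_barrier(t); // stops (v + c) - c from being folded into v
    return std::copysign(t - two52, x);
}

} // namespace sketch
```

In the default round-to-nearest-even mode this should agree with std::nearbyint for |x| < 2^52 (for example 2.5 gives 2.0 and 3.5 gives 4.0); the barrier line is what keeps that true when reassociation is enabled.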

I centralized the barrier in a single function that can be used everywhere in the code, and applied it in the places where I know it helps.

Now we can just call reassociation_barrier to keep the compiler from reordering instructions.
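As an illustration of a call site where this reordering matters (the final commit message lists exp/exp2/exp10 range reduction after nearbyint() as one), here is a minimal scalar sketch of a Cody-Waite style reduction x = k*ln2 + r, with ln2 split into a high and a low part. The constants are the well-known fdlibm split; the function and struct names are hypothetical, and placing the barrier on the partial difference is this sketch's choice, not necessarily where xsimd puts it. Under reassociation a compiler may be allowed to rewrite (x - k*hi) - k*lo as x - k*(hi + lo), discarding the extra precision the split exists to provide.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Hypothetical scalar barrier (GCC/Clang extended asm assumed).
inline void reassociation_barrier(double& x) noexcept {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("" : "+x"(x));
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("" : "+w"(x));
#elif defined(__GNUC__)
    __asm__ volatile("" : "+m"(x));
#else
    (void)x;
#endif
}

struct reduced {
    double k; // integer multiple of ln2
    double r; // remainder, roughly in [-ln2/2, ln2/2]
};

// Cody-Waite reduction with ln2 = ln2_hi + ln2_lo (fdlibm e_exp.c constants).
inline reduced exp_reduce(double x) noexcept {
    const double invln2 = 1.44269504088896338700e+00;
    const double ln2_hi = 6.93147180369123816490e-01; // trailing zero bits: k*ln2_hi exact for moderate |k|
    const double ln2_lo = 1.90821492927058770002e-10;
    const double k = std::nearbyint(x * invln2);
    double hi = x - k * ln2_hi;
    reassociation_barrier(hi); // keep the two-step compensated subtraction intact
    return {k, hi - k * ln2_lo};
}

} // namespace sketch
```

For example, exp_reduce(10.0) yields k == 14, and std::ldexp(std::exp(r), 14) should reproduce std::exp(10.0) to within a few ulps.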

Let me know if you like the internal API and whether you need changes to it. I find this is the solution that minimizes ifdef boilerplate and allows dispatching to all archs. (With C++17 this will be simpler.)
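The two-overload shape described in the header comment quoted later in this thread (a raw-register overload, plus a batch overload that forwards its .data) could look roughly like the following. This is a sketch under assumptions: toy_batch and v4f are hypothetical stand-ins built on the GCC/Clang vector extension, not xsimd types, and only the x86 ("+x") and AArch64 ("+w") constraints from that comment are shown, with "+m" as a generic fallback.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical 4 x float register type (GCC/Clang vector extension).
typedef float v4f __attribute__((vector_size(16)));

// Hypothetical batch wrapper holding one native register in .data.
struct toy_batch {
    v4f data;
};

// Overload 1: raw register or scalar. The reason string only documents the
// call site; it is never read.
template <class T>
inline void reassociation_barrier(T& x, const char* /*reason*/) noexcept {
#if defined(__GNUC__) && (defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__)))
    __asm__ volatile("" : "+x"(x)); // stays in an XMM/YMM/ZMM register
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("" : "+w"(x)); // stays in a NEON/FP register
#elif defined(__GNUC__)
    __asm__ volatile("" : "+m"(x)); // generic fallback: forces a memory round trip
#else
    (void)x; // no inline asm available: no barrier
#endif
}

// Overload 2: batch wrapper. Forwards its register so the constraint applies
// to the register itself rather than to the wrapper object. As a non-template
// exact match, it is preferred over overload 1 for toy_batch arguments.
inline void reassociation_barrier(toy_batch& b, const char* reason) noexcept {
    reassociation_barrier(b.data, reason);
}

} // namespace sketch
```

Routing the batch overload through its register member is what lets the tight register classes avoid a spill; applying "+m" to the wrapper instead would force the whole object to memory.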

Cheers,
Marco

DiamonDinoia · 2026-03-23T20:29:52Z

@serge-sans-paille this is also ready for review.

DiamonDinoia · 2026-03-27T20:33:07Z

I think I can simplify this a bit more. A is not needed anymore if we go this route.

serge-sans-paille · 2026-03-29T06:54:25Z

+            template <class T>
+            XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
+            {
+#if XSIMD_WITH_INLINE_ASM


Shouldn't we make this empty if we're not under fast-math?

DiamonDinoia · 2026-03-29T15:00:47Z

Well, if we are not under fast-math this should essentially be a no-op. Also, checking for fast-math does not detect the case where only associative math is enabled, which also breaks it. At that point it becomes a matter of checking every compiler flag that can reorder floating-point operations.

…

On Sunday, March 29, 2026, serge-sans-paille wrote, commenting on include/xsimd/arch/common/xsimd_common_details.hpp:

> +            // exists at each call site; it is unused at runtime.
> +            //
> +            // Two overloads:
> +            //   reassociation_barrier(reg, reason)   – raw register
> +            //   reassociation_barrier(batch, reason) – extracts .data
> +            //
> +            // Uses the tightest register-class constraint for the target so
> +            // the value stays in its native SIMD register (no spill):
> +            //   x86 (SSE/AVX/AVX-512) : "+x"  – XMM / YMM / ZMM
> +            //   ARM (NEON / SVE)      : "+w"  – vector / SVE Z-reg
> +            //   PPC (VSX)             : "+wa" – VS register
> +            //   other / MSVC          : address + memory clobber (fallback)
> +            template <class T>
> +            XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
> +            {
> +#if XSIMD_WITH_INLINE_ASM
>
> Shouldn't we make this empty if we're not under fast-math?

serge-sans-paille · 2026-03-29T18:12:39Z

Well, I don't think we can say that

__asm__ volatile("" : : "r"(&x) : "memory");

is a no-op. Should we allow a user-defined XSIMD_REASSOCIATIVE macro, which would be forced on by __FAST_MATH__ but could also be set by the user?
I don't want fast-math to impact non-fast math code generation.
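On why the old fallback is not free: passing &x forces x to have a memory location at that point, and the "memory" clobber tells the compiler that any memory may have changed, so x must be stored before the asm and values cached in registers must be reloaded afterwards. One way to get the opt-in behaviour proposed here is a config macro that defaults from __FAST_MATH__ / __ASSOCIATIVE_MATH__ (the two macros the final commit message names) but can be predefined by the user, with the barrier body compiled out otherwise. A minimal sketch, in which MY_REASSOCIATIVE_MATH is a hypothetical name standing in for the macro discussed in this thread:

```cpp
#include <cassert>

// Hypothetical config macro. Predefine it (e.g. -DMY_REASSOCIATIVE_MATH=1) when
// the compiler may reassociate without announcing it, such as Clang with a
// standalone -fassociative-math.
#ifndef MY_REASSOCIATIVE_MATH
#if defined(__FAST_MATH__) || defined(__ASSOCIATIVE_MATH__)
#define MY_REASSOCIATIVE_MATH 1
#else
#define MY_REASSOCIATIVE_MATH 0
#endif
#endif

namespace sketch {

constexpr bool barrier_enabled = (MY_REASSOCIATIVE_MATH != 0);

template <class T>
inline void reassociation_barrier(T& x) noexcept {
#if MY_REASSOCIATIVE_MATH && defined(__GNUC__)
    // Generic "+m" form: valid on any GCC/Clang target, but it forces x
    // through memory, so it is only emitted when reassociation is possible.
    __asm__ volatile("" : "+m"(x));
#else
    (void)x; // strict IEEE build: the barrier compiles to nothing
#endif
}

} // namespace sketch
```

With this layout a build without fast-math sees an empty inline function, so non-fast-math code generation is unaffected, while users of flags the compiler does not advertise can still opt in.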

DiamonDinoia · 2026-03-29T20:47:05Z

Let me think about this for a second. MSVC does not allow inline asm. I should update the comment there. There might be better alternatives for scalars, WASM, and RISC-V that don't impact performance.

DiamonDinoia · 2026-03-30T17:37:43Z

I made the barrier a no-op in most cases, and the spilling asm is guarded by macros. I think this is the best solution, as in practice users should not have to worry about FP reordering, except in corner cases. We could document those if needed.

DiamonDinoia · 2026-04-03T14:16:18Z

Hi @serge-sans-paille what do you think now?

DiamonDinoia · 2026-04-03T14:20:23Z

This also supersedes #550 and #551

serge-sans-paille · 2026-04-03T16:23:47Z

    void test_exp()
    {
        value_type val(2);
+#ifdef __FAST_MATH__


or XSIMD_REASSOCIATIVE_MATH ?

serge-sans-paille · 2026-04-03T16:25:41Z

+            // emitted when the compiler can actually reassociate.
+            template <class T>
+            XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
+            {


I really would rather have the whole body be guarded by XSIMD_REASSOCIATIVE_MATH.

I would love to have a section on fast-math in the manual, where that macro would be documented. It can be in a follow-up commit, though.

I don't love this idea. But okay.

- Add zero-cost register constraints for all supported architectures: x86 "+x", ARM NEON/SVE "+w", PPC VSX "+wa", RISC-V scalar "+f", RISC-V RVV "+vr" (GCC 15+ / Clang 20+).
- Replace old "r"(&x):"memory" fallback with "+m" guarded by new XSIMD_REASSOCIATIVE_MATH macro so unknown targets only spill when the compiler can actually reassociate.
- Add XSIMD_REASSOCIATIVE_MATH config macro, auto-detected from __FAST_MATH__ / __ASSOCIATIVE_MATH__, user-overridable for Clang with standalone -fassociative-math.
- Add std::array overload so emulated batches get per-element barriers instead of spilling the whole array.
- Add missing barriers in exp/exp2/exp10 range reduction (float and double) after nearbyint() to prevent compensated subtraction reordering.
- Add missing barriers in log2 (float and double) after to_float(k) to protect Kahan compensated summation.
- Guard reassociation_barrier entirely by XSIMD_REASSOCIATIVE_MATH.
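The std::array overload and the compensated-summation barriers from the commit message above can be illustrated together: an emulated batch is just a std::array, the array overload applies the scalar barrier lane by lane, and a per-lane Fast2Sum shows the kind of error term that reassociation would otherwise simplify away (b - (s - a) folds to 0). Fast2Sum is used here as a simpler stand-in for the Kahan-style compensation in log2; all names are hypothetical and GCC/Clang extended asm is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical scalar barrier (GCC/Clang extended asm assumed).
template <class T>
inline void reassociation_barrier(T& x, const char*) noexcept {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("" : "+x"(x));
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("" : "+w"(x));
#elif defined(__GNUC__)
    __asm__ volatile("" : "+m"(x));
#else
    (void)x;
#endif
}

// Emulated batch: one barrier per element, so each lane can stay in its own
// register instead of the whole array being forced to memory. Partial
// ordering picks this overload over the generic one for std::array arguments.
template <class T, std::size_t N>
inline void reassociation_barrier(std::array<T, N>& arr, const char* reason) noexcept {
    for (T& e : arr) {
        reassociation_barrier(e, reason);
    }
}

// Per-lane Fast2Sum (requires |a[i]| >= |b[i]|): s[i] + err[i] == a[i] + b[i] exactly.
template <std::size_t N>
inline void fast_two_sum(const std::array<double, N>& a, const std::array<double, N>& b,
                         std::array<double, N>& s, std::array<double, N>& err) noexcept {
    for (std::size_t i = 0; i < N; ++i) {
        s[i] = a[i] + b[i];
    }
    // Without the barrier, reassociation may simplify b - (s - a) to 0.
    reassociation_barrier(s, "protect compensated summation");
    for (std::size_t i = 0; i < N; ++i) {
        err[i] = b[i] - (s[i] - a[i]);
    }
}

} // namespace sketch
```

For instance, with a = {1.0, 1e16} and b = {1e-16, 1.5}, the rounded sums are {1.0, 1e16 + 2} and the recovered errors are {1e-16, -0.5}.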

serge-sans-paille · 2026-04-03T17:33:23Z

Thanks!

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 5 times, most recently from f9a9992 to 2f9a431 March 23, 2026 19:55

DiamonDinoia changed the title from "fix: harden reassociation barriers for fast-math nearbyint" to "fix: harden reassociation barriers for fast-math" Mar 23, 2026

DiamonDinoia mentioned this pull request Mar 23, 2026

test(s) fail for x86_64 musl #1244

Closed

serge-sans-paille requested changes Mar 27, 2026


Comment threads (outdated): .github/workflows/windows.yml (3 threads)

Comment thread (outdated): include/xsimd/arch/xsimd_common_fwd.hpp

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 5 times, most recently from 3685ff0 to c4dec73 March 27, 2026 19:57

DiamonDinoia mentioned this pull request Mar 27, 2026

ci: add clang-cl Windows CI jobs #1283

Merged

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 2 times, most recently from 6d6bc61 to d571f2f March 28, 2026 22:42

serge-sans-paille reviewed Mar 29, 2026


DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 4 times, most recently from d95d1d3 to edba47b March 30, 2026 16:57

DiamonDinoia requested a review from serge-sans-paille March 30, 2026 17:36

serge-sans-paille reviewed Apr 3, 2026


DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from edba47b to 604eb61 April 3, 2026 16:51

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from 604eb61 to 70f7493 April 3, 2026 16:53

serge-sans-paille merged commit 4550332 into xtensor-stack:master Apr 3, 2026
70 of 71 checks passed


2 participants
