fix: make AVX-512 conditional; fix std::round→nearbyint for numpy banker's rounding
- Makefile: -mavx512f -mfma now conditional (make AVX512=1); -msse4.1 added
  unconditionally for the einsum SSE intrinsics
- module.cpp: add _has_avx512_svml() compile-time detection
- test_all.py: use the compile-time flag to skip ALL transcendental tests
  when the SVML bridge is not compiled in, not just for large sizes
- core.h: std::round → std::nearbyint to match numpy's round-half-to-even
  (banker's rounding), fixing float32 mismatches at exact .5 values;
  nearbyint relies on the default FE_TONEAREST rounding mode
- README: update compiler flags section; test count 449→460
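The rounding change in core.h comes down to how ties are broken. `std::round` sends halfway cases away from zero, while `std::nearbyint` follows the current rounding mode, which by default (`FE_TONEAREST`) sends ties to the nearest even integer, as `numpy.round` does with `decimals=0`. This is a minimal sketch; the helper names are hypothetical and are not the actual core.h code:

```cpp
#include <cmath>

// Round half to even, like numpy.round(x) with decimals=0 (and np.rint).
// std::nearbyint follows the current floating-point rounding mode; in the
// default FE_TONEAREST mode, ties go to the nearest even integer.
inline double round_half_even(double x) { return std::nearbyint(x); }

// float overload, so float32 inputs are rounded without promotion to double.
inline float round_half_even(float x) { return std::nearbyint(x); }

// What std::round does instead: ties always move away from zero.
inline double round_half_away(double x) { return std::round(x); }
```

For 2.5 the two rules disagree (2.0 vs 3.0), which is exactly the kind of exact-.5 mismatch the commit describes. Non-tie inputs such as 3.7 round the same either way.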
Fixes CI SIGILL on non-AVX-512 runners (GitHub Actions ubuntu-22.04).
Without -mavx512f, __AVX512F__ is not defined → SVML bridge uses std::
fallbacks → no AVX-512 intrinsics → safe on any x86_64 CPU with SSE4.1
(now required unconditionally by -msse4.1).
Transcendental tests auto-skip when SVML is unavailable.
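The compile-time detection works because GCC and Clang predefine `__AVX512F__` only when AVX-512F code generation is enabled (by `-mavx512f`, or by an `-march` value that implies it), so the answer is fixed when the module is built. The sketch below shows one way such a check could look; it is not the actual `_has_avx512_svml()` from module.cpp, and the function name here is hypothetical:

```cpp
// Hypothetical sketch of a compile-time AVX-512/SVML check. The result is
// baked in at build time; it does not probe the CPU at runtime.
constexpr bool has_avx512_svml() noexcept {
#if defined(__AVX512F__)
    return true;   // built with make AVX512=1: SVML bridge compiled in
#else
    return false;  // default build: std:: fallbacks, no AVX-512 instructions
#endif
}

// In a pybind11 module, a function like this could be exposed with e.g.
//   m.def("_has_avx512_svml", &has_avx512_svml);
// so the Python tests can decide whether to skip transcendental cases.
```

Because it is a compile-time property, a module built with `AVX512=1` still reports `true` on a machine without AVX-512; that is why the Makefile comment notes that the optional flags require AVX-512 hardware.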
README.md: 17 additions, 8 deletions
@@ -15,7 +15,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++
 
 `numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.
 
-All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (449 tests, float64 + float32).
+All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (460 tests, float64 + float32).
 
 **Bit-exact math** is achieved via an SVML bridge that resolves numpy's own transcendental functions (`__svml_exp8`, `__svml_sin8`, etc.) from the loaded `_multiarray_umath.so` at runtime. This guarantees that `exp`, `log`, `sin`, `cos`, `tan`, and all other transcendental functions produce the exact same bits as numpy. On platforms without AVX-512, the bridge falls back to `std::` (1‑ULP).
 
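The runtime SVML bridge described above can be sketched with `dlopen`/`dlsym`. This is an illustration under several assumptions, not numpycpp's implementation: the names `resolve_symbol` and `ExpBridge` are hypothetical, the caller must supply the path of the loaded `_multiarray_umath.so`, and the `__m512d (*)(__m512d)` signature for `__svml_exp8` is assumed. Without `-mavx512f` the SVML path is compiled out entirely and only `std::exp` is used.

```cpp
#include <cmath>
#include <dlfcn.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Resolve `name` from the shared object at `lib_path` (nullptr searches the
// program's global symbol scope). Returns nullptr if the library or symbol is
// missing. The handle is intentionally never dlclose()d, so a resolved
// function pointer stays valid for the life of the process.
inline void* resolve_symbol(const char* lib_path, const char* name) {
    void* handle = dlopen(lib_path, RTLD_LAZY);
    return handle ? dlsym(handle, name) : nullptr;
}

// Hypothetical scalar exp() bridge: calls an 8-lane __svml_exp8 when the
// build targets AVX-512 and the symbol resolves; otherwise uses std::exp.
class ExpBridge {
#if defined(__AVX512F__)
    using Exp8Fn = __m512d (*)(__m512d);  // assumed SVML signature
    Exp8Fn fn_ = nullptr;
#endif

public:
    explicit ExpBridge(const char* lib_path) {
#if defined(__AVX512F__)
        fn_ = reinterpret_cast<Exp8Fn>(resolve_symbol(lib_path, "__svml_exp8"));
#else
        (void)lib_path;  // non-AVX-512 build: SVML is never looked up or called
#endif
    }

    bool uses_svml() const {
#if defined(__AVX512F__)
        return fn_ != nullptr;
#else
        return false;
#endif
    }

    double operator()(double x) const {
#if defined(__AVX512F__)
        if (fn_) {
            // Broadcast x to all 8 lanes, run the SVML kernel, keep lane 0.
            alignas(64) double lanes[8];
            _mm512_store_pd(lanes, fn_(_mm512_set1_pd(x)));
            return lanes[0];
        }
#endif
        return std::exp(x);  // fallback; may differ from numpy in the last bit
    }
};
```

Resolving the symbol once, in the constructor, keeps the `dlsym` lookup out of the per-call path. On older glibc versions this needs `-ldl` at link time, matching the Makefile's `LDFLAGS`.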
@@ -80,12 +80,12 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
 ### Testing
 
 The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
-No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 449 tests, float64 + float32.
+No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 460 tests, float64 + float32.
 
 ```bash
 cd tests
 make        # compile C++ test module
-make test   # run all 449 tests (silent mode: only failures print)
+make test   # run all 460 tests (silent mode: only failures print)
 ```
 
 To run with verbose output:
@@ -101,34 +101,43 @@ The Makefile applies the following flags:
 
 ```makefile
 CXXFLAGS ?= -std=c++17 -O2 -fPIC -fopenmp \
-    -ffp-contract=off -ffloat-store \
-    -mavx512f -mfma \
+    -ffp-contract=off -ffloat-store -msse4.1 \
     -fno-builtin-exp -fno-builtin-log \
     -fno-builtin-sin -fno-builtin-cos \
     -fno-builtin-tan -fno-builtin-pow \
     -fno-builtin-sqrt -fno-builtin-atan2 \
     -fno-builtin-log2 -fno-builtin-log10 \
     -fno-builtin-asin -fno-builtin-acos \
     -fno-builtin-atan -fno-builtin-exp2
+# Optional: enable AVX-512 for bit-exact transcendental math.
+# Requires AVX-512 hardware. Usage: make AVX512=1
+ifdef AVX512
+CXXFLAGS += -mavx512f -mfma
+endif
 LDFLAGS = -shared -ldl
 ```
 
 | Flag | Purpose |
 |------|---------|
 |`-ffp-contract=off`| Disable FMA contraction — numpy does not contract |
 |`-ffloat-store`| Prevent excess x87 precision in registers |
-|`-mavx512f -mfma`| Enable AVX-512 so the SVML bridge resolves numpy's own vector math library |
+|`-msse4.1`| Required for einsum SSE intrinsics (`_mm_hadd_pd`, `_mm_insert_epi32`) |
+|`-mavx512f -mfma`|**Optional** (`make AVX512=1`): enable SVML bridge for bit-exact transcendental math. Requires AVX-512 hardware. Without this, transcendental functions fall back to `std::` (1‑ULP difference) |
 |`-fno-builtin-<func>`|**Critical**: prevent GCC from replacing math calls with its built-in implementations. Without these, `exp`/`log`/`sin`/etc. will use libm instead of the SVML bridge, breaking bit-exact alignment |
 |`-ldl`| Required for `dlsym` at runtime to resolve SVML symbols from `_multiarray_umath.so`|
 
 > **Why `-fno-builtin` matters**: GCC's built-in math functions produce different last-bits than numpy's SVML.
 > Even `std::exp` vs `__svml_exp8` differ by 1‑2 ULP for some inputs.
 > These flags ensure the SVML bridge intercepts every transcendental call, guaranteeing bit-identical output.
+>
+> **AVX-512 is optional**: The test suite auto-detects whether the module was compiled with `-mavx512f`
+> and skips transcendental tests when it was not. Non-AVX-512 builds are safe for CI and machines
+> without AVX-512 hardware — only the transcendental tests are skipped; all other 350+ tests still run.
 
 ### Alignment status
 
 The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
-All 449 tests pass under strict IEEE 754 bit comparison (float64 + float32).
+All 460 tests pass under strict IEEE 754 bit comparison (float64 + float32).
 
 ✅ = bit-exact on AVX-512 (SVML bridge active).
 🔶 = 1-ULP on non-AVX-512 (falls back to `std::` math).
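The ✅/🔶 legend distinguishes identical bit patterns from results one unit in the last place (ULP) apart. The test suite makes this comparison in Python; purely as a C++ illustration of the same criterion, here is a minimal sketch with hypothetical helper names. The ULP count remaps the raw bits so that integer order matches numeric order.

```cpp
#include <cstdint>
#include <cstring>

// Raw IEEE 754 bit pattern of a double (memcpy avoids strict-aliasing UB).
inline std::uint64_t bits_of(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Strict criterion: identical bit patterns. Unlike ==, this tells +0.0 and
// -0.0 apart and treats a NaN as equal to a NaN with the same bits.
inline bool bit_exact(double a, double b) {
    return bits_of(a) == bits_of(b);
}

// Map a double's bits to a signed integer whose order matches numeric order
// (+0.0 and -0.0 both map to 0). Intended for non-NaN inputs.
inline std::int64_t ordered_bits(double x) {
    std::int64_t i;
    std::memcpy(&i, &x, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

// Number of representable doubles separating a and b (0 means equal values).
inline std::uint64_t ulp_distance(double a, double b) {
    const std::int64_t oa = ordered_bits(a), ob = ordered_bits(b);
    // Subtract in unsigned arithmetic so large gaps cannot overflow.
    const auto ua = static_cast<std::uint64_t>(oa);
    const auto ub = static_cast<std::uint64_t>(ob);
    return oa >= ob ? ua - ub : ub - ua;
}
```

For example, `0.1 + 0.2` and `0.3` are 1 ULP apart: a 🔶-style difference that a strict bit comparison rejects.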
@@ -176,7 +185,7 @@ numpycpp/
 │ └── einsum_py.h
 ├── tests/ # bit-level precision tests + test module
 │ ├── module.cpp # pybind11 module for testing
-│ ├── test_all.py # single entry — all APIs, 449 tests, float64+float32
+│ ├── test_all.py # single entry — all APIs, 460 tests, float64+float32
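The `-ffp-contract=off` row in the README's flags table matters because `a*b + c` rounds twice (once after the multiply, once after the add), whereas a contracted FMA rounds only once. The sketch below shows the two results diverging. The helper names are hypothetical, and the `volatile` store is an extra safeguard so the two-rounding path holds even if this snippet is built without `-ffp-contract=off`.

```cpp
#include <cmath>

// a*b + c with two roundings: the product is rounded to double first. This is
// the result -ffp-contract=off preserves; the volatile store additionally
// forces it here even if this file is built with contraction enabled.
inline double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // rounded product
    return product + c;
}

// a*b + c with a single rounding, i.e. what an FMA instruction computes when
// the compiler is allowed to contract the expression.
inline double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = 1 + 2^-30`, `b = 1 - 2^-30` and `c = -1`, the exact product `1 - 2^-60` rounds to `1.0`, so `mul_then_add` returns `0.0` while `fused_mul_add` returns `-2^-60`. A single contracted expression like this is enough to break bit-level parity with numpy.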