docs: emphasize compiler flags for bit-exact alignment; update test count 336→449
Add dedicated 'Compiler flags for bit-exact alignment' section:
- Document -ffp-contract=off, -ffloat-store, -mavx512f -mfma
- Explain -fno-builtin-* flags per transcendental function
- Emphasize why -fno-builtin is critical for SVML bridge
- Show Makefile snippet with all flags

README.md: 36 additions, 5 deletions
@@ -15,7 +15,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++

`numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. It exposes a raw pointer + size interface and has zero external dependencies: pure C++17 standard library.

All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (449 tests, float64 + float32).

**Bit-exact math** is achieved via an SVML bridge that resolves numpy's own transcendental functions (`__svml_exp8`, `__svml_sin8`, etc.) from the loaded `_multiarray_umath.so` at runtime. This guarantees that `exp`, `log`, `sin`, `cos`, `tan`, and all other transcendental functions produce exactly the same bits as numpy. On platforms without AVX-512, the bridge falls back to `std::` math functions (results within 1 ULP of numpy).
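The runtime-resolution idea behind the bridge can be sketched with POSIX `dlopen`/`dlsym`. This is a minimal sketch under stated assumptions, not numpycpp's implementation: the `ExpBridge` class is hypothetical, and it resolves the scalar libm `exp` (soname `libm.so.6` on glibc Linux) as a stand-in for the vectorized `__svml_exp8` exported by `_multiarray_umath.so`. When the library or symbol cannot be found, it falls back to `std::exp`, mirroring the non-AVX-512 path described above. On older glibc, link with `-ldl`.

```cpp
#include <cmath>
#include <dlfcn.h>

// Hypothetical sketch of a runtime math bridge, NOT numpycpp's implementation.
// The real bridge resolves vectorized SVML symbols (e.g. __svml_exp8) from
// numpy's _multiarray_umath.so; here a scalar double(double) symbol stands in.
using ScalarMathFn = double (*)(double);

class ExpBridge {
public:
    // library: path or soname passed to dlopen; symbol: name passed to dlsym.
    ExpBridge(const char* library, const char* symbol) {
        handle_ = dlopen(library, RTLD_NOW | RTLD_LOCAL);
        if (handle_ != nullptr) {
            // POSIX requires that dlsym results can be converted to function pointers.
            fn_ = reinterpret_cast<ScalarMathFn>(dlsym(handle_, symbol));
        }
    }

    ~ExpBridge() {
        if (handle_ != nullptr) {
            dlclose(handle_);
        }
    }

    ExpBridge(const ExpBridge&) = delete;
    ExpBridge& operator=(const ExpBridge&) = delete;

    // True when the symbol was found at runtime.
    bool resolved() const { return fn_ != nullptr; }

    // Call the resolved symbol when available, otherwise fall back to std::exp.
    double operator()(double x) const {
        return fn_ != nullptr ? fn_(x) : std::exp(x);
    }

private:
    void* handle_ = nullptr;
    ScalarMathFn fn_ = nullptr;
};
```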
@@ -80,12 +80,12 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly

### Testing

The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
No tolerance, no `atol`/`rtol`: raw IEEE 754 bits must match exactly. 449 tests, float64 + float32.

```bash
cd tests
make        # compile C++ test module
make test   # run all 449 tests (silent mode: only failures print)
```

### Compiler flags for bit-exact alignment

Achieving bit-identical results with numpy requires strict control over floating-point code generation.
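A single multiply-add shows why code generation matters. In this hedged sketch, `mul_then_add` and `fused_mul_add` are illustrative names, not numpycpp functions. With `a = 1 + 2^-30` and `c = -(1 + 2^-29)`, the exact product `a * a` is `1 + 2^-29 + 2^-60`; rounding it to double drops the `2^-60` term, so the two-rounding result is `0.0`, while a fused multiply-add keeps that term and returns `2^-60`. A compiler that contracts `a * b + c` into an FMA instruction would therefore change the answer, which is what `-ffp-contract=off` rules out.

```cpp
#include <cmath>

// Two roundings: the product is rounded to double, then the sum is rounded.
// This is what plain `a * b + c` means under -ffp-contract=off. The volatile
// temporary forces that rounding even if this snippet is compiled without
// the flag, so the sketch behaves the same under any build settings.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// One rounding: the exact a*b + c is rounded once. This is what a contracted
// `a * b + c` computes when the compiler emits an FMA instruction.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```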
The Makefile applies the following flags:

```makefile
CXXFLAGS ?= -std=c++17 -O2 -fPIC -fopenmp \
    -ffp-contract=off -ffloat-store \
    -mavx512f -mfma \
    -fno-builtin-exp -fno-builtin-log \
    -fno-builtin-sin -fno-builtin-cos \
    -fno-builtin-tan -fno-builtin-pow \
    -fno-builtin-sqrt -fno-builtin-atan2 \
    -fno-builtin-log2 -fno-builtin-log10 \
    -fno-builtin-asin -fno-builtin-acos \
    -fno-builtin-atan -fno-builtin-exp2
LDFLAGS = -shared -ldl
```

| Flag | Purpose |
|------|---------|
| `-ffp-contract=off` | Disable FMA contraction so `a * b + c` is rounded twice; numpy does not contract |
| `-ffloat-store` | Prevent excess x87 precision by not keeping floating-point variables in registers |
| `-mavx512f -mfma` | Enable AVX-512 (and FMA) code generation so the SVML bridge can resolve numpy's own vector math library |
| `-fno-builtin-<func>` | **Critical**: prevent GCC from replacing math calls with its built-in implementations. Without these, `exp`/`log`/`sin`/etc. use libm instead of the SVML bridge, breaking bit-exact alignment |
| `-ldl` | Required for `dlsym` to resolve SVML symbols from `_multiarray_umath.so` at runtime |
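Whether some of these flags actually reached a translation unit can be checked from inside C++ with predefined macros: GCC and Clang define `__AVX512F__` under `-mavx512f` and `__FMA__` under `-mfma`, and `<cfloat>` provides `FLT_EVAL_METHOD`, which is `0` on typical x86-64 SSE2 builds and `2` when x87 excess precision is in effect (the case `-ffloat-store` guards against; the flag itself does not change the macro). This is a hedged sketch; the `BuildFlags` struct and `build_flags()` function are hypothetical and not part of numpycpp.

```cpp
#include <cfloat>

// Hypothetical compile-time report of code-generation settings.
struct BuildFlags {
    bool avx512f;         // true when __AVX512F__ is defined (e.g. -mavx512f)
    bool fma;             // true when __FMA__ is defined (e.g. -mfma)
    int flt_eval_method;  // FLT_EVAL_METHOD: 0 = no excess precision, 2 = x87 long double
};

constexpr BuildFlags build_flags() {
    return BuildFlags{
#if defined(__AVX512F__)
        true,
#else
        false,
#endif
#if defined(__FMA__)
        true,
#else
        false,
#endif
        FLT_EVAL_METHOD};
}
```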

> **Why `-fno-builtin` matters**: GCC's built-in math functions can produce different last bits than numpy's SVML.
> Even `std::exp` and `__svml_exp8` differ by 1 to 2 ULP for some inputs.
> These flags ensure that the SVML bridge intercepts every transcendental call, guaranteeing bit-identical output.

### Alignment status

The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
All 449 tests pass under strict IEEE 754 bit comparison (float64 + float32).

✅ = bit-exact on AVX-512 (SVML bridge active).
🔶 = within 1 ULP on non-AVX-512 (falls back to `std::` math).
@@ -145,7 +176,7 @@ numpycpp/
│   └── einsum_py.h
├── tests/                  # bit-level precision tests + test module
│   ├── module.cpp          # pybind11 module for testing
│   ├── test_all.py         # single entry: all APIs, 449 tests, float64+float32