README.md: 90 additions & 60 deletions
@@ -15,7 +15,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++

 `numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.

-All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (500 tests, float64 + float32).
+All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (754 tests, float64 + float32, including NaN passthrough, signed-zero, ±∞, and domain-error cases).
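
What a strict bit-level comparison means in practice can be sketched as follows. This is an illustrative helper, not code from the numpycpp test suite, and the names `bits_of`, `bit_equal`, and `bit_equal_demo` are hypothetical. It compares raw 64-bit patterns, so `0.0` and `-0.0` count as different and a NaN matches only a NaN with the same bit pattern, which is stricter than `==` or an `atol`/`rtol` check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: return the raw IEEE 754 bit pattern of a double.
inline std::uint64_t bits_of(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);  // well-defined way to reinterpret the bytes
    return u;
}

// Hypothetical helper: true only if every bit of a and b matches.
inline bool bit_equal(double a, double b) {
    return bits_of(a) == bits_of(b);
}

// Small self-check covering signed zero and NaN.
inline void bit_equal_demo() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(bit_equal(1.5, 1.5));
    assert(!bit_equal(0.0, -0.0));  // equal under ==, different bit patterns
    assert(bit_equal(nan, nan));    // unequal under ==, identical bit patterns
}
```

A float32 variant would follow the same shape with `std::uint32_t`.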

 **Bit-exact math** is achieved by resolving numpy's own math functions from `_multiarray_umath.so` at runtime. The SVML bridge auto-detects your CPU and selects the same path numpy uses: AVX‑512 SVML (`__svml_exp8`) when available, or scalar `npy_exp`/`npy_log`/etc. otherwise. AVX‑512 intrinsics are isolated behind `__attribute__((target))` — the binary is safe on any x86_64 CPU (no SIGILL). Every transcendental function produces the exact same IEEE 754 bits as numpy on **all architectures**.
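
The runtime symbol lookup described above can be illustrated with a minimal POSIX sketch. It is not the SVML bridge itself: `resolve_or` and `unary_fn` are hypothetical names, and how numpycpp locates and opens `_multiarray_umath.so` is not shown. The sketch only demonstrates the `dlsym` step with a fallback when a symbol is absent.

```cpp
#include <cassert>
#include <dlfcn.h>

using unary_fn = double (*)(double);

// Hypothetical helper: look up `name` in an already-opened library handle and
// return it as a double(double) function. Returns `fallback` if the handle is
// null or the symbol is not found.
inline unary_fn resolve_or(void* handle, const char* name, unary_fn fallback) {
    assert(name != nullptr && fallback != nullptr);
    if (handle == nullptr) return fallback;
    void* sym = dlsym(handle, name);
    if (sym == nullptr) return fallback;
    // POSIX allows converting the result of dlsym to a function pointer.
    return reinterpret_cast<unary_fn>(sym);
}
```

A caller would pass a handle obtained from `dlopen` and a symbol name such as `npy_exp`. On older glibc versions the program must be linked with `-ldl`.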
@@ -35,8 +35,9 @@ All APIs are tested against Python numpy under strict bit-level comparison: ever
 #include "numpy/einsum.h"   // numpy.einsum
 ```

-> `numpy/svml_bridge.h` and `numpy/npy_math_float.h` are **internal** — they are
-> automatically pulled in by `core.h`. Do not include them directly.
+> `numpy/detail/` headers are **internal** — automatically pulled in by the
+> public headers. Do not include them directly; a compile-time `#error` fires
+> if you try.

 ```cpp
 std::vector<double> data = {1.0, 4.0, 9.0};
@@ -89,62 +90,86 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
 ### Testing

 The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
-No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 500 tests, float64 + float32.
+No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly.
+754 tests: float64 + float32, including NaN passthrough, signed-zero, ±∞, domain errors, and AVX-512 boundary sizes.

 ```bash
-cd tests
-make          # compile C++ test module
-make test     # run all 500 tests (silent mode: only failures print)
+# build
+cmake -S tests -B tests/build
+cmake --build tests/build -j$(nproc)
+
+# run (silent on pass — failures print hex diff)
+cd tests && python3 -m pytest test_all.py -q --tb=short --no-header
+|`-ffp-contract=off`| Prevents the compiler from silently fusing `a*b + c` into a single FMA instruction. numpycpp's einsum accumulation loops must use the same multiply-then-add order as numpy's BLAS kernels. | 36 einsum tests fail with ±1 ULP differences. |
+|`-mavx512f -mfma`| The SVML bridge declares fast scalar wrappers (`exp_svml_f64`, etc.) inside `#ifdef __AVX512F__`. Without this flag the preprocessor omits those declarations and the dispatcher fails to compile. AVX-512 intrinsics are runtime-guarded via `__builtin_cpu_supports` — the binary is safe on non-AVX-512 CPUs. | Hard compile error: `'exp_svml_f64' was not declared in this scope`. |
+|`-ldl`|`dlsym` / `dlopen` are used at startup to locate numpy's `_multiarray_umath.so` and resolve `npy_exp`, `__svml_exp8`, etc. | Link error: `undefined reference to 'dlsym'`. |
+
+#### Recommended (defensive) flags
+
+These flags produced **no test failures** when removed individually (all 754
+tests still passed), but are kept in `tests/CMakeLists.txt` as a safety net:

-Achieving bit-identical results with numpy requires strict control over floating-point code generation.
-The Makefile applies the following flags:
-
-```makefile
-CXXFLAGS ?= -std=c++17 -O2 -fPIC -fopenmp \
-            -ffp-contract=off -ffloat-store -msse4.1 \
-            -mavx512f -mfma \
-            -fno-builtin-exp -fno-builtin-log \
-            -fno-builtin-sin -fno-builtin-cos \
-            -fno-builtin-tan -fno-builtin-pow \
-            -fno-builtin-sqrt -fno-builtin-atan2 \
-            -fno-builtin-log2 -fno-builtin-log10 \
-            -fno-builtin-asin -fno-builtin-acos \
-            -fno-builtin-atan -fno-builtin-exp2 \
-            -fno-builtin-cbrt -fno-builtin-expm1 \
-            -fno-builtin-log1p
-LDFLAGS = -shared -ldl
+```cmake
+target_compile_options(<target> PRIVATE
+    -msse4.1            # baseline SSE4.1 (good practice; not currently needed)
+    -fno-builtin-exp    # \
+    -fno-builtin-log    # |
+    -fno-builtin-sin    # | prevent GCC from replacing direct math calls
+    -fno-builtin-cos    # | with builtins — numpycpp never calls exp()/sin()
+    -fno-builtin-tan    # | directly, so these have no measurable effect
+    -fno-builtin-pow    # | today, but guard against accidental future regressions
+    -fno-builtin-sqrt   # |
+    -fno-builtin-atan2  # |
+    -fno-builtin-log2   # |
+    -fno-builtin-log10  # |
+    -fno-builtin-asin   # |
+    -fno-builtin-acos   # |
+    -fno-builtin-atan   # |
+    -fno-builtin-exp2   # |
+    -fno-builtin-cbrt   # |
+    -fno-builtin-expm1  # |
+    -fno-builtin-log1p  # /
+)
 ```

-| Flag | Purpose |
-|------|---------|
-|`-ffp-contract=off`| Disable FMA contraction — numpy does not contract |
-|`-ffloat-store`| Prevent excess x87 precision in registers |
-|`-msse4.1`| Required for einsum SSE intrinsics (`_mm_hadd_pd`, `_mm_insert_epi32`) |
-|`-mavx512f -mfma`| Enable AVX‑512 compilation for SVML bridge. Intrinsics are runtime‑guarded via `__attribute__((target))` — safe on any x86_64 CPU (no SIGILL) |
-|`-fno-builtin-<func>`| Prevent GCC from replacing math calls with built‑ins, ensuring the SVML bridge intercepts every call |
-|`-ldl`| Required for `dlsym` at runtime to resolve numpy's math functions from `_multiarray_umath.so`|
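
Why FMA contraction (the `-ffp-contract=off` flag) can change result bits is easiest to see on a small example. The sketch below is illustrative only and is not taken from numpycpp; `mul_then_add`, `fused_mul_add`, and `fma_demo` are hypothetical names. The inputs are chosen so that rounding the product before the addition discards a term that a fused multiply-add keeps.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round to double, then add and round again. The volatile store
// forces the product to be rounded before the addition, even if the compiler
// would otherwise contract the expression into a single FMA.
inline double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a*b + c computed with a single rounding at the end.
inline double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

inline void fma_demo() {
    const double a = 1.0 + std::ldexp(1.0, -27);     // 1 + 2^-27
    const double c = -(1.0 + std::ldexp(1.0, -26));  // -(1 + 2^-26)
    // Exactly, a*a = 1 + 2^-26 + 2^-54. Rounding the product to double drops
    // the 2^-54 term, so the two-step result cancels to zero.
    assert(mul_then_add(a, a, c) == 0.0);
    // The fused version keeps the 2^-54 term until the final rounding.
    assert(fused_mul_add(a, a, c) == std::ldexp(1.0, -54));
}
```

In an accumulation loop the discrepancy is usually far smaller than in this cancellation case, but any difference in the last bit is enough to fail a bit-exact comparison.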
+> **Why `-fno-builtin-*` doesn't matter today**: numpycpp never calls `exp()`,
+> `sin()`, etc. from `<cmath>` directly. Every transcendental is routed
+> through the SVML bridge's custom-named wrappers (`exp_npy_f64`,
+> `exp_svml_f64`, …) so GCC has no opportunity to substitute its own builtin.
+> The flags are retained for defensive clarity.

 > **Runtime CPU dispatch**: The SVML bridge auto‑detects AVX‑512 at runtime
-> (`__builtin_cpu_supports`). On AVX‑512 hardware it calls numpy's SVML vector functions
-> (`__svml_exp8`, etc.); otherwise it falls back to numpy's scalar math functions
-> (`npy_exp`, `npy_log`, etc.). Both paths are resolved from the loaded
-> `_multiarray_umath.so` via `dlsym`. AVX‑512 intrinsics are isolated behind
-> `__attribute__((target("avx512f")))` so the binary runs safely on ANY
-> x86_64 CPU — no SIGILL.
+> (`__builtin_cpu_supports("avx512f")`). On AVX‑512 hardware it calls numpy's
+> SVML vector functions (`__svml_exp8`, etc.); otherwise it falls back to
+> numpy's scalar math functions (`npy_exp`, `npy_log`, etc.). Both paths are
+> resolved from the loaded `_multiarray_umath.so` via `dlsym`. AVX‑512
+> intrinsics are isolated behind `__attribute__((target("avx512f")))` — the
+> binary compiles and runs safely on **any** x86_64 CPU without SIGILL.
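
The dispatch pattern in the note above can be sketched with a toy kernel. This is a hedged illustration for GCC or Clang on x86_64, not the SVML bridge: `scale`, `scale_avx512`, and `scale_scalar` are hypothetical names, and the kernel multiplies by a constant instead of calling SVML. The AVX-512 code lives in a function carrying the `target("avx512f")` attribute and is reached only after the runtime check succeeds.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

// AVX-512 path. The target attribute lets this one function use AVX-512F
// instructions even when the rest of the file is built for baseline x86_64.
__attribute__((target("avx512f")))
static void scale_avx512(const double* in, double* out, std::size_t n, double k) {
    const __m512d vk = _mm512_set1_pd(k);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        _mm512_storeu_pd(out + i, _mm512_mul_pd(_mm512_loadu_pd(in + i), vk));
    }
    for (; i < n; ++i) out[i] = in[i] * k;  // remainder elements
}
#endif

// Portable scalar path.
static void scale_scalar(const double* in, double* out, std::size_t n, double k) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * k;
}

// Dispatcher: choose the path at runtime so the AVX-512 code never executes
// on a CPU that lacks it.
void scale(const double* in, double* out, std::size_t n, double k) {
    assert((in != nullptr && out != nullptr) || n == 0);
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(in, out, n, k);
        return;
    }
#endif
    scale_scalar(in, out, n, k);
}
```

Both paths perform the same IEEE 754 multiplication per element, so the output bits do not depend on which branch runs.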

 ### Alignment status

 The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
-All 500 tests pass under strict IEEE 754 bit comparison (float64 + float32).
+All 754 tests pass under strict IEEE 754 bit comparison (float64 + float32).

 ✅ = bit-exact on ALL architectures (SVML bridge with runtime CPU dispatch).

@@ -167,35 +192,40 @@ All 500 tests pass under strict IEEE 754 bit comparison (float64 + float32).
> **SVML bridge**: At runtime, `numpycpp` detects CPU features (`__builtin_cpu_supports("avx512f")`) and selects the same math path numpy uses — AVX‑512 SVML vector functions (`__svml_exp8`, etc.) on supported hardware, or scalar `npy_exp`/`npy_log`/etc. otherwise. Both are resolved from the loaded `_multiarray_umath.so` via `dlsym`. AVX‑512 intrinsics are isolated behind `__attribute__((target("avx512f")))` — the binary compiles and runs safely on ANY x86_64 CPU without SIGILL.
 >
-> **Reductions**: All reductions use numpy's pairwise summation algorithm (recursive split, 8-accumulator unrolled). This matches `np.sum` exactly. Dot products and norms build on pairwise_sum, not BLAS — matching `np.sum(a*b)` and `np.sqrt(np.sum(a*a))` respectively.
+> **Reductions**: All reductions use numpy's pairwise summation algorithm (recursive split, 8-accumulator unrolled). This matches `np.sum` exactly.
+>
+> **Dot / Norm / Einsum**: Use BLAS (`cblas_sdot`, `cblas_sgemv`, `cblas_sgemm`) — the same kernels numpy delegates to — so results are bit-identical.
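
The pairwise summation mentioned under **Reductions** (recursive split with an 8-accumulator unrolled base case) can be sketched as below. This is a reading of the algorithm as described, not numpycpp's implementation: the name `pairwise_sum` and the block size of 128 are assumptions of the sketch, and it covers only contiguous `double` input.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of pairwise summation over a contiguous array of doubles.
inline double pairwise_sum(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    constexpr std::size_t kBlock = 128;  // assumed block size for the unrolled case
    if (n < 8) {
        // Tiny inputs: plain left-to-right loop.
        double res = 0.0;
        for (std::size_t i = 0; i < n; ++i) res += a[i];
        return res;
    }
    if (n <= kBlock) {
        // Eight independent accumulators, combined pairwise at the end.
        double r[8];
        for (std::size_t k = 0; k < 8; ++k) r[k] = a[k];
        std::size_t i = 8;
        for (; i + 8 <= n; i += 8) {
            for (std::size_t k = 0; k < 8; ++k) r[k] += a[i + k];
        }
        double res = ((r[0] + r[1]) + (r[2] + r[3])) +
                     ((r[4] + r[5]) + (r[6] + r[7]));
        for (; i < n; ++i) res += a[i];  // leftover elements
        return res;
    }
    // Larger inputs: split roughly in half, keeping the left half a multiple
    // of 8 so the unrolled loop stays aligned to its accumulators.
    std::size_t half = n / 2;
    half -= half % 8;
    return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}
```

The order of additions is what determines the final bits, so a bit-exact port has to reproduce the split points and the accumulator layout, not only the mathematical sum.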

 ## Project Structure

 ```
 numpycpp/
-├── numpy/                  # native C++ headers
-│   ├── core.h              # [PUBLIC] numpy.* equivalents
-│   ├── linalg.h            # [PUBLIC] numpy.linalg.*
-│   ├── einsum.h            # [PUBLIC] numpy.einsum
-│   ├── svml_bridge.h       # [INTERNAL] do not include directly
-│   └── npy_math_float.h    # [INTERNAL] do not include directly
-├── pycpp/                  # pybind11 wrappers (optional)
+├── numpy/                  # native C++ headers
+│   ├── core.h              # [PUBLIC] numpy.* equivalents
+│   ├── linalg.h            # [PUBLIC] numpy.linalg.*
+│   ├── einsum.h            # [PUBLIC] numpy.einsum
+│   └── detail/             # [INTERNAL] do not include directly — #error guard
+│       ├── svml_bridge.h   # SVML / npy_* scalar math dispatch