44[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
55[![C++17](https://img.shields.io/badge/C%2B%2B-17-blue.svg)](https://en.cppreference.com/w/cpp/17)
66[![CMake](https://img.shields.io/badge/CMake-%3E%3D3.16-green.svg)](https://cmake.org/)
7- [![Tests](https://img.shields.io/badge/tests-900%20bit--exact-brightgreen.svg)](tests/test_all.py)
7+ [![Tests](https://img.shields.io/badge/tests-961%20bit--exact-brightgreen.svg)](tests/test_all.py)
88[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
99
1010## Background
@@ -17,7 +17,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++
1717
1818`numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.
1919
20- All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (900 tests, float64 + float32, including NaN passthrough, signed-zero, ±∞, domain-error cases, and advanced indexing).
20+ All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (961 tests, float64 + float32, including NaN passthrough, signed-zero, ±∞, domain-error cases, and advanced indexing).
2121
2222**Bit-exact math** is achieved by resolving numpy's own math functions from `_multiarray_umath.so` at runtime. The SVML bridge auto-detects your CPU and selects the same path numpy uses: AVX‑512 SVML (`__svml_exp8`) when available, or scalar `npy_exp`/`npy_log`/etc. otherwise. AVX‑512 intrinsics are isolated behind `__attribute__((target))` — the binary is safe on any x86_64 CPU (no SIGILL). Every transcendental function produces the exact same IEEE 754 bits as numpy on **all architectures**.
2323
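The isolation described above can be illustrated with a minimal sketch. It assumes GCC or Clang on x86_64, and every function name in it is hypothetical rather than taken from `numpycpp`. Only the function carrying the `target` attribute is compiled with AVX-512 enabled, and it is called only after a runtime CPU check, so a CPU without AVX-512 never executes those instructions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is not numpycpp's actual code.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_HAS_AVX512_PATH 1
// Only this function is compiled with AVX-512F enabled. The rest of the
// translation unit is built for the baseline x86_64 instruction set.
__attribute__((target("avx512f")))
static void scale_avx512(const double* in, double* out, std::size_t n, double k) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * k;  // the compiler is free to vectorise this loop
    }
}
#endif

// Portable fallback, valid on every CPU.
static void scale_scalar(const double* in, double* out, std::size_t n, double k) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * k;
    }
}

// Dispatcher: picks the AVX-512 path only when the running CPU reports support.
inline void scale(const double* in, double* out, std::size_t n, double k) {
#ifdef SKETCH_HAS_AVX512_PATH
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(in, out, n, k);
        return;
    }
#endif
    scale_scalar(in, out, n, k);
}
```

A single IEEE 754 multiplication is correctly rounded on both paths, so this toy example gives identical bits either way. Transcendental functions do not have that property, which is presumably why the library resolves numpy's own implementations instead.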
@@ -32,23 +32,22 @@ All APIs are tested against Python numpy under strict bit-level comparison: ever
3232**Public headers** — include the umbrella or individual modules:
3333
3434``` cpp
35- #include "numpy/numpy.h" // ← single entry point (recommended)
35+ #include <numpycpp/numpy.h> // ← single entry point (recommended)
3636
3737// or include only what you need:
38- #include "numpy/init.h" // zeros_like, ones_like, full
39- #include "numpy/elementwise.h" // sqrt, exp, sin, astype, …
40- #include "numpy/reduce.h" // sum, mean, std, var, cumsum, …
41- #include "numpy/manipulation.h" // transpose, take, slice, putmask, …
42- #include "numpy/io.h" // isin, interp, unwrap, …
43- #include "numpy/linalg.h" // dot, norm, matmul, einsum
38+ #include <numpycpp/init.h> // zeros_like, ones_like, full, arange, …
39+ #include <numpycpp/elementwise.h> // sqrt, exp, sin, astype, …
40+ #include <numpycpp/reduce.h> // sum, mean, std, var, cumsum, …
41+ #include <numpycpp/manipulation.h> // transpose, take, slice, putmask, …
42+ #include <numpycpp/io.h> // isin, interp, unwrap, …
43+ #include <numpycpp/linalg.h> // dot, norm, matmul, einsum
4444```
4545
46- > `numpy/detail/` headers are **internal** — automatically pulled in by the
47- > public headers. Do not include them directly; a compile-time `#error` fires
48- > if you try.
46+ > `numpycpp/detail/` headers are **internal** — automatically pulled in by the
47+ > public headers. Do not include them directly.
4948>
50- > Legacy single-file headers `numpy/core.h` and `numpy/einsum.h` are kept as
51- > backward-compatible shims that simply `#include "numpy/numpy.h"`.
48+ > **pybind11 users** — include `<numpycpp/numpy_py.h>` instead to get the full
49+ > set of pybind11 wrapper functions (`numpy::sum(py::array_t<T>)` etc.).
5251
5352``` cpp
5453std::vector<double> data = {1.0, 4.0, 9.0};
@@ -118,8 +117,7 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
118117
119118The test suite verifies ** bit-level precision alignment** between every C++ function and Python numpy.
120119No tolerance, no ` atol ` /` rtol ` — raw IEEE 754 bits must match exactly.
121- 900 tests: float64 + float32, including NaN passthrough, signed-zero, ±∞, domain errors, advanced indexing, and AVX-512 boundary sizes.
122- In std mode ~399 precision-independent tests run (structural, reduction, manipulation, io, comparison, astype, advanced indexing).
120+ 961 tests: float64 + float32, including NaN passthrough, signed-zero, ±∞, domain errors, advanced indexing, and AVX-512 boundary sizes.
123121
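To make "raw IEEE 754 bits must match exactly" concrete, here is a minimal sketch of a bit-level comparison in C++. It is not the project's test code (the suite itself lives in `tests/test_all.py`), and the helper names `bits_of` and `bit_equal` are hypothetical. The point is that comparing bit patterns separates `+0.0` from `-0.0` and treats identical NaN payloads as equal, which `==` on doubles does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");

// Hypothetical helper: return the raw 64-bit pattern of a double.
// std::memcpy is the well-defined way to reinterpret the bytes in C++17.
inline std::uint64_t bits_of(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Hypothetical helper: true only if every element has identical bits.
// No tolerance is applied; a difference of one ULP is a failure.
inline bool bit_equal(const double* a, const double* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (bits_of(a[i]) != bits_of(b[i])) {
            return false;
        }
    }
    return true;
}
```

With these helpers, `0.0 == -0.0` is true while `bits_of(0.0) == bits_of(-0.0)` is false, so a signed-zero mismatch that a tolerance-based check would accept is reported as a failure.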
124122``` bash
125123# build
@@ -157,7 +155,7 @@ cmake -DNUMPYCPP_STD_ONLY=ON .. # std / performance-first backend
157155#### Compiler flags — bitexact backend (` NUMPYCPP_STD_ONLY=OFF ` )
158156
159157The minimum set was determined empirically: each flag was removed in isolation
160- and the full 900-test suite was re-run. Only flags whose removal caused at
158+ and the full 961-test suite was re-run. Only flags whose removal caused at
161159least one test failure are marked ** required** .
162160
163161``` cmake
@@ -251,41 +249,49 @@ Two backends, same API — choose with `cmake -DNUMPYCPP_STD_ONLY=ON/OFF`.
251249
252250```
253251numpycpp/
254- ├── numpy/ # native C++ headers
255- │ ├── numpy.h # [PUBLIC] umbrella — #includes everything below
256- │ ├── init.h # [PUBLIC] zeros_like, ones_like, full
252+ ├── numpycpp/ # header-only library (all public + internal headers)
253+ │ ├── numpy.h # [PUBLIC] umbrella — includes all core modules below
254+ │ ├── numpy_py.h # [PUBLIC] umbrella — includes all pybind11 wrappers below
255+ │ ├── init.h # [PUBLIC] zeros_like, ones_like, full, arange, linspace, …
256+ │ ├── init_py.h # [PUBLIC] pybind11 wrappers for init.h
257257│ ├── elementwise.h # [PUBLIC] sqrt/exp/sin/…, comparison, logical, astype
258+ │ ├── elementwise_py.h # [PUBLIC] pybind11 wrappers for elementwise.h
258259│ ├── reduce.h # [PUBLIC] sum/mean/std/var/cumsum, axis reductions
260+ │ ├── reduce_py.h # [PUBLIC] pybind11 wrappers for reduce.h
259261│ ├── manipulation.h # [PUBLIC] transpose/take/slice/put/putmask/argsort/…
260- │ ├── io.h # [PUBLIC] isin, interp, unwrap, safe_divide
262+ │ ├── manipulation_py.h # [PUBLIC] pybind11 wrappers for manipulation.h
263+ │ ├── io.h # [PUBLIC] isin, interp, unwrap, safe_divide, …
264+ │ ├── io_py.h # [PUBLIC] pybind11 wrappers for io.h
261265│ ├── linalg.h # [PUBLIC] dot, norm, matmul, einsum
262- │ ├── core.h # [SHIM] backward-compat → #include "numpy.h"
263- │ ├── einsum.h # [SHIM] backward-compat → #include "numpy.h"
264- │ └── detail/ # [INTERNAL] do not include directly — #error guard
266+ │ ├── linalg_py.h # [PUBLIC] pybind11 wrappers for linalg.h
267+ │ └── detail/ # [INTERNAL] do not include directly
265268│ ├── macros.h # NUMPY_UNROLL4, NUMPY_SMALL_STACK
266- │ ├── math_backend.h # selector: STD_ONLY → std_math_backend, else svml_bridge
267269│ ├── svml_bridge.h # bitexact: SVML / npy_* scalar math (dlsym)
268270│ ├── std_math_backend.h # std: pure <cmath> std::exp/log/sin/… (no deps)
269- │ ├── npy_math_float.h # bitexact: npy_* float32 wrappers
270- │ ├── linalg_backend.h # selector: STD_ONLY → std_linalg_backend, else blas_bridge
271271│ ├── blas_bridge.h # bitexact: OpenBLAS ILP64 cblas wrappers (dlsym)
272272│ ├── std_linalg_backend.h# std: pure C++ loop dot/gemm (no deps)
273- │ └── avx512_loops.h # bitexact: AVX-512 vectorised exp/sin/cos loops
274- ├── pycpp/ # pybind11 wrappers (optional)
275- │ ├── pycpp.h # [PUBLIC] umbrella — #includes everything below
276- │ ├── init_py.h # [PUBLIC] zeros_like, ones_like, full
277- │ ├── elementwise_py.h # [PUBLIC] sqrt/exp/sin/…, comparison, logical, astype
278- │ ├── reduce_py.h # [PUBLIC] sum/mean/std/var/cumsum
279- │ ├── manipulation_py.h # [PUBLIC] transpose/take/slice/put/putmask/…
280- │ ├── io_py.h # [PUBLIC] isin, interp, unwrap, asarray, …
281- │ ├── linalg_py.h # [PUBLIC] dot, norm, matmul, einsum
282- │ ├── core_py.h # [SHIM] backward-compat → #include "pycpp.h"
283- │ └── einsum_py.h # [SHIM] backward-compat → #include "pycpp.h"
273+ │ ├── avx512_loops.h # bitexact: AVX-512 vectorised exp/sin/cos loops
274+ │ └── npy_math_float.h # bitexact: npy_* float32 wrappers
275+ ├── bench/ # performance benchmarks
276+ │ ├── CMakeLists.txt
277+ │ ├── bench_core.cpp # C++ benchmark driver
278+ │ ├── bench.py # pybind11-based benchmark runner
279+ │ └── bench_numpy.py # pure-numpy baseline
284280├── tests/ # bit-level precision tests + test module
285281│ ├── module.cpp # pybind11 module for testing
286- │ ├── test_all.py # single entry — all APIs, 900 tests, float64+float32
282+ │ ├── test_all.py # single entry — all APIs, 961 tests, float64+float32
287283│ ├── conftest.py # silent-mode output suppression
284+ │ ├── make_csv.py # ULP precision CSV generator
285+ │ ├── diagnose_numpy.py # numpy internal diagnostic tool
286+ │ ├── ulp_precision.csv # per-function ULP comparison data
288287│ └── CMakeLists.txt # test-module build
288+ ├── example/ # minimal usage examples
289+ │ ├── CMakeLists.txt
290+ │ └── main.cpp
291+ ├── cmake/
292+ │ └── preinst # DEB pre-install script (clean old headers)
293+ ├── issue/ # issue tracking & root-cause analysis
294+ │ └── 001-mean_pairwise_sum_vs_sequential.md
289295├── CMakeLists.txt # build & .deb packaging
290296└── README.md
291297```