@@ -184,41 +184,47 @@ target_compile_options(<target> PRIVATE
 
 ### Alignment status
 
-The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
-All 900 tests pass under strict IEEE 754 bit comparison (float64 + float32).
-
-✅ = bit-exact on ALL architectures (SVML bridge with runtime CPU dispatch).
-
-| API group | float64 | float32 | Notes |
-| -------------------| :-------:| :-------:| -------|
-| Creation | ✅ | ✅ | zeros_like, ones_like, full_like, zeros, ones |
-| Astype | ✅ | ✅ | astype int/bool, truncate float32 |
-| Comparison | ✅ | ✅ | greater, less, equal, not_equal, etc. |
-| Logical | ✅ | ✅ | bool-only (and/or/not/xor) |
-| Special values | ✅ | ✅ | isnan, isinf, isfinite |
-| Manipulation | ✅ | ✅ | diff, stack, concatenate, transpose, slice, roll, flip, repeat, tile, where |
-| **Advanced indexing** | ✅ | ✅ | take (any axis), compress (bool mask gather), nd_slice (step/reverse), put, boolean_assign, nd_slice_assign |
-| Sorting | ✅ | ✅ | argsort, argmax, argmin |
-| Setops / interp | ✅ | ✅ | isin, intersect1d, interp, safe_divide |
-| Access / convert | ✅ | ✅ | array_get, asarray, to_vector |
-| **Math — element-wise** (sqrt, abs, sign, clip, round, floor, ceil, degrees, radians) | ✅ | ✅ | Pure C++, no libm dependency |
-| **Math — transcendental** (exp, log, sin, cos, tan, asin, acos, atan, log10, log2, exp2, cbrt, expm1, log1p) | ✅ | ✅ | dlsym npy_* or SVML via bridge, bit-exact on all archs |
-| **Math — power** | ✅ | ✅ | npy_pow / npy_powf via SVML bridge |
-| **Math — hypot** | ✅ | ✅ | std::hypot — bit-exact (numpy matches libm) |
-| **Math — atan2** | ✅ | ✅ | npy_atan2 / npy_atan2f via SVML bridge |
-| **Reduction** (sum, mean, max, min, any, all) | ✅ | ✅ | pairwise_sum matches numpy exactly |
-| Statistical (std, var) | ✅ | ✅ | pairwise_sum + sqrt |
-| Binary (maximum, minimum) | ✅ | ✅ | std::max/min, deterministic |
-| **Dot product** | ✅ | ✅ | BLAS (`cblas_sdot`/`cblas_ddot`) — matches `np.dot` |
-| **Norm** | ✅ | ✅ | BLAS dot + sqrt — matches `np.linalg.norm` |
-| **Norm (axis)** | ✅ | ✅ | BLAS dot per fiber + sqrt |
-| **Einsum** | ✅ | ✅ | All patterns (ij,ij→i, ij,jk→ik, bij,bjk→bik, etc.) |
-
-> **SVML bridge**: At runtime, `numpycpp` detects CPU features (`__builtin_cpu_supports("avx512f")`) and selects the same math path numpy uses — AVX‑512 SVML vector functions (`__svml_exp8`, etc.) on supported hardware, or scalar `npy_exp`/`npy_log`/etc. otherwise. Both are resolved from the loaded `_multiarray_umath.so` via `dlsym`. AVX‑512 intrinsics are isolated behind `__attribute__((target("avx512f")))` — the binary compiles and runs safely on ANY x86_64 CPU without SIGILL.
+Two backends, same API — choose with `cmake -DNUMPYCPP_STD_ONLY=ON/OFF`.
+
+| Legend | Meaning |
+| --------| ---------|
+| ✅ | IEEE 754 bit-identical to numpy (float64 + float32) |
+| 〜 | Correct result, 0–2 ULP from numpy (not bit-exact) |
+
+| Category | Functions | `bitexact` (`STD_ONLY=OFF`) | `std` (`STD_ONLY=ON`) |
+| ----------| -----------| :---------------------------:| :---------------------:|
+| **Creation** | `zeros_like` `ones_like` `full_like` `empty_like` `zeros` `ones` `full` | ✅ | ✅ |
+| **Type conversion** | `astype` (int/float/bool/int64) `truncate_to_float32` | ✅ | ✅ |
+| **Comparison** | `greater` `less` `equal` `not_equal` `greater_equal` `less_equal` | ✅ | ✅ |
+| **Logical** | `logical_and` `logical_or` `logical_not` `logical_xor` | ✅ | ✅ |
+| **Special values** | `isnan` `isinf` `isfinite` | ✅ | ✅ |
+| **Manipulation** | `diff` `stack` `vstack` `hstack` `concatenate` `transpose` `flatten` `squeeze` `roll` `flip` `repeat` `tile` `where` | ✅ | ✅ |
+| **Advanced indexing** | `take` `compress` `slice` (N-D + step) `put` `putmask` `slice_assign` | ✅ | ✅ |
+| **Sorting** | `argsort` `argmax` `argmin` | ✅ | ✅ |
+| **Set / interp** | `isin` `intersect1d` `interp` `unwrap` `flatnonzero` `safe_divide` | ✅ | ✅ |
+| **Reduction** | `sum` `mean` `max` `min` `any` `all` `std` `var` `cumsum` `mean` (axis) | ✅ | ✅ |
+| **Math — pure C++** | `sqrt` `abs` `sign` `clip` `round` `floor` `ceil` `degrees` `radians` `maximum` `minimum` | ✅ | ✅ |
+| **Math — transcendental** | `exp` `log` `sin` `cos` `tan` `arcsin` `arccos` `arctan` `log10` `log2` `exp2` `cbrt` `expm1` `log1p` | ✅ | 〜 0–1 ULP |
+| **Math — power / atan2** | `power` `arctan2` | ✅ | 〜 0–1 ULP |
+| **Math — hypot** | `hypot` | ✅ | ✅ |
+| **Dot product** | `numpy.dot` (1-D) | ✅ | 〜 0–1 ULP |
+| **Norm** | `numpy.linalg.norm` (scalar + axis) | ✅ | 〜 0–1 ULP |
+| **Matmul** | `numpy.matmul` (2-D, 1-D×2-D, 2-D×1-D, batched 3-D) | ✅ | 〜 0–2 ULP |
+| **Einsum** | `ij,ij→i` `ij,jk→ik` `bij,bjk→bik` and all 2-operand patterns | ✅ | 〜 0–2 ULP |
+
+> **bitexact backend**: transcendentals resolved via `dlsym` from numpy's
+> `_multiarray_umath.so` — same `npy_exp`/`npy_log` kernels numpy uses, with
+> AVX‑512 SVML vector path (`__svml_exp8` etc.) when available.
+> Dot/matmul/einsum use OpenBLAS ILP64 (`cblas_sgemm64_`) — the same BLAS
+> numpy delegates to. Results are IEEE 754 bit-identical on **all architectures**.
 >
-> **Reductions**: All reductions use numpy's pairwise summation algorithm (recursive split, 8-accumulator unrolled). This matches `np.sum` exactly.
+> **std backend**: transcendentals use `std::exp`/`std::sin`/… from `<cmath>`
+> (glibc, typically 0–1 ULP). Dot/matmul/einsum use plain C++ loops
+> (compiler auto-vectorises with `-O3 -march=native`). No external dependencies.
 >
-> **Dot / Norm / Einsum**: Use BLAS (`cblas_sdot`, `cblas_sgemv`, `cblas_sgemm`) — the same kernels numpy delegates to — so results are bit-identical.
+> **Reductions** (both backends): pairwise summation algorithm (recursive split,
+> 8-accumulator unrolled) — matches `np.sum` exactly.
+> **hypot** (both backends): `std::hypot` — numpy delegates to the same libm call.
 
 ## Project Structure
 