Commit 00aeb78

docs: update alignment table — 4 columns (category, functions, bitexact, std)

Authored and committed by peng.li24
Parent: e2e0c39
1 file changed: 39 additions, 33 deletions

README.md

@@ -184,41 +184,47 @@ target_compile_options(<target> PRIVATE

### Alignment status

Removed (old lines 187-217):

The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
All 900 tests pass under strict IEEE 754 bit comparison (float64 + float32).

✅ = bit-exact on ALL architectures (SVML bridge with runtime CPU dispatch).

| API group | float64 | float32 | Notes |
|-------------------|:-------:|:-------:|-------|
| Creation | ✅ | ✅ | zeros_like, ones_like, full_like, zeros, ones |
| Astype | ✅ | ✅ | astype int/bool, truncate float32 |
| Comparison | ✅ | ✅ | greater, less, equal, not_equal, etc. |
| Logical | ✅ | ✅ | bool-only (and/or/not/xor) |
| Special values | ✅ | ✅ | isnan, isinf, isfinite |
| Manipulation | ✅ | ✅ | diff, stack, concatenate, transpose, slice, roll, flip, repeat, tile, where |
| **Advanced indexing** | ✅ | ✅ | take (any axis), compress (bool mask gather), nd_slice (step/reverse), put, boolean_assign, nd_slice_assign |
| Sorting | ✅ | ✅ | argsort, argmax, argmin |
| Setops / interp | ✅ | ✅ | isin, intersect1d, interp, safe_divide |
| Access / convert | ✅ | ✅ | array_get, asarray, to_vector |
| **Math — element-wise** (sqrt, abs, sign, clip, round, floor, ceil, degrees, radians) | ✅ | ✅ | Pure C++, no libm dependency |
| **Math — transcendental** (exp, log, sin, cos, tan, asin, acos, atan, log10, log2, exp2, cbrt, expm1, log1p) | ✅ | ✅ | dlsym npy_* or SVML via bridge, bit-exact on all archs |
| **Math — power** | ✅ | ✅ | npy_pow / npy_powf via SVML bridge |
| **Math — hypot** | ✅ | ✅ | std::hypot — bit-exact (numpy matches libm) |
| **Math — atan2** | ✅ | ✅ | npy_atan2 / npy_atan2f via SVML bridge |
| **Reduction** (sum, mean, max, min, any, all) | ✅ | ✅ | pairwise_sum matches numpy exactly |
| Statistical (std, var) | ✅ | ✅ | pairwise_sum + sqrt |
| Binary (maximum, minimum) | ✅ | ✅ | std::max/min, deterministic |
| **Dot product** | ✅ | ✅ | BLAS (`cblas_sdot`/`cblas_ddot`) — matches `np.dot` |
| **Norm** | ✅ | ✅ | BLAS dot + sqrt — matches `np.linalg.norm` |
| **Norm (axis)** | ✅ | ✅ | BLAS dot per fiber + sqrt |
| **Einsum** | ✅ | ✅ | All patterns (ij,ij→i, ij,jk→ik, bij,bjk→bik, etc.) |

> **SVML bridge**: At runtime, `numpycpp` detects CPU features (`__builtin_cpu_supports("avx512f")`) and selects the same math path numpy uses — AVX‑512 SVML vector functions (`__svml_exp8`, etc.) on supported hardware, or scalar `npy_exp`/`npy_log`/etc. otherwise. Both are resolved from the loaded `_multiarray_umath.so` via `dlsym`. AVX‑512 intrinsics are isolated behind `__attribute__((target("avx512f")))` — the binary compiles and runs safely on ANY x86_64 CPU without SIGILL.
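The runtime dispatch described in the removed **SVML bridge** note can be illustrated with a minimal C++ sketch, assuming GCC or Clang targeting x86-64. A trivial multiply kernel stands in for the SVML routines (no `__svml_*` function is called), and the names `scale`, `scale_scalar` and `scale_avx512` are hypothetical, not numpycpp's code. Only the function carrying `__attribute__((target("avx512f")))` contains AVX-512 instructions, and it is reached only after `__builtin_cpu_supports("avx512f")` succeeds on the running CPU:

```cpp
#include <cassert>
#include <cstddef>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#endif

// Portable scalar kernel: safe on every CPU.
static void scale_scalar(const double* in, double* out, std::size_t n, double k) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * k;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Only this function is compiled for AVX-512. The rest of the binary is not,
// so it can be loaded and run on an older CPU without SIGILL.
__attribute__((target("avx512f")))
static void scale_avx512(const double* in, double* out, std::size_t n, double k) {
    const __m512d vk = _mm512_set1_pd(k);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 doubles per 512-bit register
        __m512d v = _mm512_loadu_pd(in + i);
        _mm512_storeu_pd(out + i, _mm512_mul_pd(v, vk));
    }
    for (; i < n; ++i) out[i] = in[i] * k;  // scalar tail
}
#endif

// Picks a kernel based on the CPU the program is running on,
// not the machine it was compiled on.
void scale(const double* in, double* out, std::size_t n, double k) {
    assert(n == 0 || (in != nullptr && out != nullptr));
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(in, out, n, k);
        return;
    }
#endif
    scale_scalar(in, out, n, k);
}
```

Because IEEE 754 multiplication is correctly rounded, both paths return identical bits for this toy kernel; that property says nothing about the accuracy of vectorised transcendental functions.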
Added (new lines 187-219):

Two backends, same API — choose with `cmake -DNUMPYCPP_STD_ONLY=ON/OFF`.

| Legend | Meaning |
|--------|---------|
| ✅ | IEEE 754 bit-identical to numpy (float64 + float32) |
| 〜 | Correct result, 0–2 ULP from numpy (not bit-exact) |

| Category | Functions | `bitexact` (`STD_ONLY=OFF`) | `std` (`STD_ONLY=ON`) |
|----------|-----------|:---------------------------:|:---------------------:|
| **Creation** | `zeros_like` `ones_like` `full_like` `empty_like` `zeros` `ones` `full` | ✅ | ✅ |
| **Type conversion** | `astype` (int/float/bool/int64) `truncate_to_float32` | ✅ | ✅ |
| **Comparison** | `greater` `less` `equal` `not_equal` `greater_equal` `less_equal` | ✅ | ✅ |
| **Logical** | `logical_and` `logical_or` `logical_not` `logical_xor` | ✅ | ✅ |
| **Special values** | `isnan` `isinf` `isfinite` | ✅ | ✅ |
| **Manipulation** | `diff` `stack` `vstack` `hstack` `concatenate` `transpose` `flatten` `squeeze` `roll` `flip` `repeat` `tile` `where` | ✅ | ✅ |
| **Advanced indexing** | `take` `compress` `slice` (N-D + step) `put` `putmask` `slice_assign` | ✅ | ✅ |
| **Sorting** | `argsort` `argmax` `argmin` | ✅ | ✅ |
| **Set / interp** | `isin` `intersect1d` `interp` `unwrap` `flatnonzero` `safe_divide` | ✅ | ✅ |
| **Reduction** | `sum` `mean` `max` `min` `any` `all` `std` `var` `cumsum` `mean` (axis) | ✅ | ✅ |
| **Math — pure C++** | `sqrt` `abs` `sign` `clip` `round` `floor` `ceil` `degrees` `radians` `maximum` `minimum` | ✅ | ✅ |
| **Math — transcendental** | `exp` `log` `sin` `cos` `tan` `arcsin` `arccos` `arctan` `log10` `log2` `exp2` `cbrt` `expm1` `log1p` | ✅ | 〜 0–1 ULP |
| **Math — power / atan2** | `power` `arctan2` | ✅ | 〜 0–1 ULP |
| **Math — hypot** | `hypot` | ✅ | ✅ |
| **Dot product** | `numpy.dot` (1-D) | ✅ | 〜 0–1 ULP |
| **Norm** | `numpy.linalg.norm` (scalar + axis) | ✅ | 〜 0–1 ULP |
| **Matmul** | `numpy.matmul` (2-D, 1-D×2-D, 2-D×1-D, batched 3-D) | ✅ | 〜 0–2 ULP |
| **Einsum** | `ij,ij→i` `ij,jk→ik` `bij,bjk→bik` and all 2-operand patterns | ✅ | 〜 0–2 ULP |

> **bitexact backend**: transcendentals resolved via `dlsym` from numpy's
> `_multiarray_umath.so` — same `npy_exp`/`npy_log` kernels numpy uses, with
> AVX‑512 SVML vector path (`__svml_exp8` etc.) when available.
> Dot/matmul/einsum use OpenBLAS ILP64 (`cblas_sgemm64_`) — the same BLAS
> numpy delegates to. Results are IEEE 754 bit-identical on **all architectures**.
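The `dlsym` lookup that the **bitexact backend** note describes follows the usual POSIX pattern: resolve a symbol once, cast it to a typed function pointer, and fall back when it is missing. The sketch below is an illustration under stated assumptions, not numpycpp's loader: it searches already-loaded objects through the `RTLD_DEFAULT` pseudo-handle (on glibc this needs `_GNU_SOURCE`, which g++ defines by default; glibc before 2.34 also needs `-ldl`) instead of a `dlopen` handle for `_multiarray_umath.so`, and it uses libm's `exp` as a stand-in for `npy_exp`. `resolve_unary` and `fallback_exp` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <dlfcn.h>

using unary_fn = double (*)(double);

// Looks up a scalar kernel by name among the shared objects already loaded
// into the process; returns `fallback` if the symbol is not found.
unary_fn resolve_unary(const char* name, unary_fn fallback) {
    void* sym = dlsym(RTLD_DEFAULT, name);
    // POSIX guarantees that a dlsym result can be converted to a function pointer.
    unary_fn fn = sym != nullptr ? reinterpret_cast<unary_fn>(sym) : fallback;
    assert(fn != nullptr);
    return fn;
}

// <cmath> fallback, comparable in spirit to the `std` backend.
double fallback_exp(double x) { return std::exp(x); }
```

In a typical glibc build, `resolve_unary("exp", fallback_exp)` returns the same libm `exp` that `std::exp` calls, so both give identical bits; an unknown name silently yields the fallback.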
>

Removed (old line 219):
> **Reductions**: All reductions use numpy's pairwise summation algorithm (recursive split, 8-accumulator unrolled). This matches `np.sum` exactly.

Added (new lines 221-223):
> **std backend**: transcendentals use `std::exp`/`std::sin`/… from `<cmath>`
> (glibc, typically 0–1 ULP). Dot/matmul/einsum use plain C++ loops
> (compiler auto-vectorises with `-O3 -march=native`). No external dependencies.
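The "0–1 ULP" and "0–2 ULP" figures quoted for the `std` backend count units in the last place, i.e. how many representable floating-point values separate two results. A common way to measure this for float64 is to map each value's bits onto an ordered integer and subtract, as in the sketch below. It is not taken from numpycpp's test suite (the README does not say how ULP error is measured), and `ulp_distance` is a hypothetical helper; a float32 version works the same way with 32-bit integers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterprets a double's bits as a signed integer whose order matches the
// order of the doubles, so neighbouring representable values differ by 1.
// Both +0.0 and -0.0 map to 0.
static std::int64_t ordered_bits(double x) {
    std::int64_t i;
    std::memcpy(&i, &x, sizeof i);  // well-defined type punning
    return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of ULP steps between a and b; 0 means bit-identical
// (or +0.0 versus -0.0). NaN inputs are not supported.
std::uint64_t ulp_distance(double a, double b) {
    assert(!std::isnan(a) && !std::isnan(b));
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    // Unsigned subtraction avoids overflow when a and b have opposite signs.
    return ia >= ib ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
                    : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}
```

For example, `1.0` and `std::nextafter(1.0, 2.0)` are 1 ULP apart.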
>

Removed (old line 221):
> **Dot / Norm / Einsum**: Use BLAS (`cblas_sdot`, `cblas_sgemv`, `cblas_sgemm`) — the same kernels numpy delegates to — so results are bit-identical.

Added (new lines 225-227):
> **Reductions** (both backends): pairwise summation algorithm (recursive split,
> 8-accumulator unrolled) — matches `np.sum` exactly.
> **hypot** (both backends): `std::hypot` — numpy delegates to the same libm call.
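The pairwise summation described in the **Reductions** note can be sketched as follows. The structure (a direct sum below 8 elements, eight interleaved accumulators for blocks of up to 128 elements, otherwise a recursive split at a multiple of 8) follows numpy's `pairwise_sum` loop and its block size `PW_BLOCKSIZE` of 128, but this is an illustrative re-implementation, not numpycpp's code. The numpy reduction machinery around that loop (initial value, buffering of large or strided inputs) is not modelled, so bit-for-bit agreement with `np.sum` is not claimed for this function on its own.

```cpp
#include <cassert>
#include <cstddef>

// Pairwise summation of a contiguous float64 range.
double pairwise_sum(const double* a, std::size_t n) {
    assert(n == 0 || a != nullptr);
    constexpr std::size_t kBlock = 128;  // block size, as in numpy's PW_BLOCKSIZE
    if (n < 8) {
        // Too short to unroll: plain left-to-right sum.
        double res = 0.0;
        for (std::size_t i = 0; i < n; ++i) res += a[i];
        return res;
    }
    if (n <= kBlock) {
        // Eight independent accumulators over interleaved elements.
        double r[8];
        for (int j = 0; j < 8; ++j) r[j] = a[j];
        std::size_t i = 8;
        for (; i < n - (n % 8); i += 8)
            for (int j = 0; j < 8; ++j) r[j] += a[i + j];
        // Combine the accumulators as a balanced tree.
        double res = ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7]));
        for (; i < n; ++i) res += a[i];  // leftover n % 8 elements
        return res;
    }
    // Split roughly in half, keeping the first half a multiple of 8.
    std::size_t n2 = n / 2;
    n2 -= n2 % 8;
    return pairwise_sum(a, n2) + pairwise_sum(a + n2, n - n2);
}
```

Splitting the work this way keeps rounding-error growth roughly logarithmic in the length rather than linear as in a single running sum, while the 8-wide inner loop stays easy for the compiler to vectorise.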

## Project Structure
