Commit ccebb6a

peng.li24 committed
docs: emphasize compiler flags for bit-exact alignment; update test count 336→449
Add dedicated 'Compiler flags for bit-exact alignment' section:
- Document -ffp-contract=off, -ffloat-store, -mavx512f -mfma
- Explain -fno-builtin-* flags per transcendental function
- Emphasize why -fno-builtin is critical for SVML bridge
- Show Makefile snippet with all flags
1 parent 8e94e90 commit ccebb6a

1 file changed

Lines changed: 36 additions & 5 deletions

README.md

@@ -15,7 +15,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++
1515

1616
`numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.
1717

18-
All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (336 tests, float64 + float32).
18+
All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (449 tests, float64 + float32).
1919

2020
**Bit-exact math** is achieved via an SVML bridge that resolves numpy's own transcendental functions (`__svml_exp8`, `__svml_sin8`, etc.) from the loaded `_multiarray_umath.so` at runtime. This guarantees that `exp`, `log`, `sin`, `cos`, `tan`, and all other transcendental functions produce the exact same bits as numpy. On platforms without AVX-512, the bridge falls back to `std::` (1‑ULP).
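
The runtime lookup described above could look roughly like the following sketch. It assumes a POSIX/glibc system with `dlopen`/`dlsym`; the symbol name `__svml_exp8` comes from the README, while `resolve_loaded_symbol`, `exp_with_fallback`, and the library path argument are hypothetical names used only for illustration. The actual bridge in `numpycpp` may be structured differently.

```cpp
#include <dlfcn.h>  // dlopen, dlsym (POSIX; older glibc needs -ldl at link time)
#include <cmath>

// Hypothetical helper: look up `symbol` in a shared object that is ALREADY
// loaded into the process. Returns nullptr if the library or symbol is absent.
void* resolve_loaded_symbol(const char* lib_path, const char* symbol) {
    // RTLD_NOLOAD (glibc extension): succeed only if the library is already
    // mapped, e.g. because Python has imported numpy. The handle is kept open
    // on purpose so that the returned pointer stays valid.
    void* handle = dlopen(lib_path, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == nullptr) return nullptr;
    return dlsym(handle, symbol);
}

// Hypothetical entry point: reports whether the SVML symbol was found.
// A real bridge would cast the pointer to the vector signature and call it;
// this sketch only demonstrates the lookup and always evaluates std::exp.
double exp_with_fallback(double x, const char* umath_path, bool* used_bridge) {
    void* fn = resolve_loaded_symbol(umath_path, "__svml_exp8");
    *used_bridge = (fn != nullptr);
    return std::exp(x);
}
```

When the library is not loaded (or the symbol is not exported), `used_bridge` is set to `false`, which corresponds to the `std::` fallback path mentioned above.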
2121

@@ -80,12 +80,12 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
8080
### Testing
8181

8282
The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
83-
No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 336 tests, float64 + float32.
83+
No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 449 tests, float64 + float32.
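
As an illustration of what "raw IEEE 754 bits must match" means, here is a minimal sketch of a bit-level comparison. The helper names `bits_of` and `bit_equal` are hypothetical and are not taken from the test suite, which performs its comparison from Python.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a double as its raw 64-bit pattern.
std::uint64_t bits_of(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);  // well-defined type punning in C++17
    return u;
}

// Same idea for float32.
std::uint32_t bits_of(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// True only if every bit matches. Unlike operator==, this distinguishes
// +0.0 from -0.0 and treats a NaN as equal to itself.
bool bit_equal(double a, double b) { return bits_of(a) == bits_of(b); }
bool bit_equal(float a, float b) { return bits_of(a) == bits_of(b); }
```

A tolerance-based check would accept a result that is off by one unit in the last place; `bit_equal` rejects it.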
8484

8585
```bash
8686
cd tests
8787
make # compile C++ test module
88-
make test # run all 336 tests (silent mode: only failures print)
88+
make test # run all 449 tests (silent mode: only failures print)
8989
```
9090

9191
To run with verbose output:
@@ -94,10 +94,41 @@ To run with verbose output:
9494
PYTHONPATH=tests:$PYTHONPATH python3 -m pytest tests/test_all.py -v
9595
```
9696

97+
### Compiler flags for bit-exact alignment
98+
99+
Achieving bit-identical results with numpy requires strict control over floating-point code generation.
100+
The Makefile applies the following flags:
101+
102+
```makefile
103+
CXXFLAGS ?= -std=c++17 -O2 -fPIC -fopenmp \
104+
-ffp-contract=off -ffloat-store \
105+
-mavx512f -mfma \
106+
-fno-builtin-exp -fno-builtin-log \
107+
-fno-builtin-sin -fno-builtin-cos \
108+
-fno-builtin-tan -fno-builtin-pow \
109+
-fno-builtin-sqrt -fno-builtin-atan2 \
110+
-fno-builtin-log2 -fno-builtin-log10 \
111+
-fno-builtin-asin -fno-builtin-acos \
112+
-fno-builtin-atan -fno-builtin-exp2
113+
LDFLAGS = -shared -ldl
114+
```
115+
116+
| Flag | Purpose |
117+
|------|---------|
118+
| `-ffp-contract=off` | Disable contraction of `a*b + c` into a fused multiply-add (FMA), which rounds once instead of twice; numpy does not contract |
119+
| `-ffloat-store` | Prevent excess x87 precision in registers |
120+
| `-mavx512f -mfma` | Enable AVX-512 so the SVML bridge resolves numpy's own vector math library |
121+
| `-fno-builtin-<func>` | **Critical**: prevent GCC from replacing math calls with its built-in implementations. Without these, `exp`/`log`/`sin`/etc. will use libm instead of the SVML bridge, breaking bit-exact alignment |
122+
| `-ldl` | Required for `dlsym` at runtime to resolve SVML symbols from `_multiarray_umath.so` |
123+
124+
> **Why `-fno-builtin` matters**: GCC's built-in math functions can produce different final bits than numpy's SVML.
125+
> For some inputs, `std::exp` and `__svml_exp8` differ by 1‑2 ULP.
126+
> These flags ensure the SVML bridge intercepts every transcendental call, guaranteeing bit-identical output.
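
To see why `-ffp-contract=off` is on the list, the sketch below contrasts a multiply-add with two roundings against a fused multiply-add with one. The function names are hypothetical; the `volatile` intermediate is only there so that the product is rounded regardless of the flags this snippet is compiled with.

```cpp
#include <cmath>

// Two roundings: the product is rounded to double, then the sum is rounded.
// This is the behavior -ffp-contract=off preserves for plain `a * b + c`.
double mul_add_unfused(double a, double b, double c) {
    volatile double product = a * b;  // force the intermediate rounding
    return product + c;
}

// One rounding: std::fma evaluates a*b + c exactly and rounds once, which is
// what a contracted expression would compute on FMA-capable hardware.
double mul_add_fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Inputs chosen so the two results differ:
//   a = b = 1 + 2^-27, so a*b = 1 + 2^-26 + 2^-54 exactly,
//   which rounds to 1 + 2^-26 as a double; c = -(1 + 2^-26).
double example_a() { return 1.0 + std::ldexp(1.0, -27); }
double example_c() { return -(1.0 + std::ldexp(1.0, -26)); }
```

With these inputs the unfused version returns `0.0`, while the fused version returns `2^-54`, so a single contracted expression is enough to break bit-level agreement.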
127+
97128
### Alignment status
98129

99130
The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
100-
All 336 tests pass under strict IEEE 754 bit comparison (float64 + float32).
131+
All 449 tests pass under strict IEEE 754 bit comparison (float64 + float32).
101132

102133
✅ = bit-exact on AVX-512 (SVML bridge active).
103134
🔶 = 1-ULP on non-AVX-512 (falls back to `std::` math).
@@ -145,7 +176,7 @@ numpycpp/
145176
│ └── einsum_py.h
146177
├── tests/ # bit-level precision tests + test module
147178
│ ├── module.cpp # pybind11 module for testing
148-
│ ├── test_all.py # single entry — all APIs, 336 tests, float64+float32
179+
│ ├── test_all.py # single entry — all APIs, 449 tests, float64+float32
149180
│ ├── conftest.py # silent-mode output suppression
150181
│ └── Makefile
151182
├── CMakeLists.txt # build & .deb packaging
