Commit 49647b7
peng.li24 committed
feat: add cbrt, expm1, log1p — bit-exact with numpy via SVML bridge
SVML bridge auto-dispatches __svml_cbrt8/__svml_expm18/__svml_log1p8 on AVX-512, or falls back to dlsym npy_* scalars otherwise. Same pattern as exp/log/sin/cos/etc.

- svml_bridge.h: SVML + npy_* fallback + dispatchers + svml_impl
- core.h: cbrt/expm1/log1p array functions (NUMPY_UNROLL4)
- pycpp: DEF_ELEMWISE wrappers
- tests: 24 new tests (3 funcs × 4 sizes × 2 dtypes)
- test count: 476 → 500
- Makefile: -fno-builtin-cbrt -fno-builtin-expm1 -fno-builtin-log1p

Verified: 0/100000 random diffs for all 3 funcs in f64 and f32.
1 parent 9fe0a2b commit 49647b7
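
The dispatch pattern described in the commit message (vector kernel on AVX-512, scalar fallback otherwise) might look roughly like the sketch below. It is not the code in `svml_bridge.h`: the names `cbrt_vector_path`, `cbrt_scalar_path`, `cpu_has_avx512f`, `resolve_cbrt`, and `cbrt_f64` are hypothetical, both paths call `std::cbrt` as stand-ins for `__svml_cbrt8` and `npy_cbrt`, and the CPU check assumes GCC or Clang on x86.

```cpp
#include <cmath>

namespace dispatch_sketch {

using unary_f64 = double (*)(double);

// Hypothetical stand-in for a vector-backed kernel (the commit uses __svml_cbrt8).
inline double cbrt_vector_path(double x) { return std::cbrt(x); }

// Hypothetical stand-in for the scalar fallback (the commit uses npy_cbrt).
inline double cbrt_scalar_path(double x) { return std::cbrt(x); }

// Runtime CPU feature check; returns false on compilers or targets
// where the GCC/Clang x86 builtins are unavailable.
inline bool cpu_has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;
#endif
}

// Pick one implementation based on what the CPU reports.
inline unary_f64 resolve_cbrt() {
    return cpu_has_avx512f() ? cbrt_vector_path : cbrt_scalar_path;
}

// Public entry point: resolves the function pointer once, then reuses it.
inline double cbrt_f64(double x) {
    static const unary_f64 impl = resolve_cbrt();
    return impl(x);
}

}  // namespace dispatch_sketch
```

Because the pointer is held in a function-local static, the feature check runs only on the first call; later calls cost one indirect call.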

8 files changed

Lines changed: 64 additions & 9 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ on:
     branches: [master]

 jobs:
-  # ---- Test: build module + run 476 precision tests --------------------------
+  # ---- Test: build module + run 500 precision tests --------------------------
   test:
     runs-on: ubuntu-22.04
     steps:

README.md

Lines changed: 9 additions & 7 deletions
@@ -15,7 +15,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++

 `numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.

-All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (476 tests, float64 + float32).
+All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (500 tests, float64 + float32).

 **Bit-exact math** is achieved by resolving numpy's own math functions from `_multiarray_umath.so` at runtime. The SVML bridge auto-detects your CPU and selects the same path numpy uses: AVX‑512 SVML (`__svml_exp8`) when available, or scalar `npy_exp`/`npy_log`/etc. otherwise. AVX‑512 intrinsics are isolated behind `__attribute__((target))` — the binary is safe on any x86_64 CPU (no SIGILL). Every transcendental function produces the exact same IEEE 754 bits as numpy on **all architectures**.

@@ -89,12 +89,12 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
 ### Testing

 The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
-No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 476 tests, float64 + float32.
+No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 500 tests, float64 + float32.

 ```bash
 cd tests
 make # compile C++ test module
-make test # run all 476 tests (silent mode: only failures print)
+make test # run all 500 tests (silent mode: only failures print)
 ```

 To run with verbose output:
@@ -118,7 +118,9 @@ CXXFLAGS ?= -std=c++17 -O2 -fPIC -fopenmp \
 -fno-builtin-sqrt -fno-builtin-atan2 \
 -fno-builtin-log2 -fno-builtin-log10 \
 -fno-builtin-asin -fno-builtin-acos \
--fno-builtin-atan -fno-builtin-exp2
+-fno-builtin-atan -fno-builtin-exp2 \
+-fno-builtin-cbrt -fno-builtin-expm1 \
+-fno-builtin-log1p
 LDFLAGS = -shared -ldl
 ```

@@ -142,7 +144,7 @@ LDFLAGS = -shared -ldl
 ### Alignment status

 The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
-All 476 tests pass under strict IEEE 754 bit comparison (float64 + float32).
+All 500 tests pass under strict IEEE 754 bit comparison (float64 + float32).

 ✅ = bit-exact on ALL architectures (SVML bridge with runtime CPU dispatch).

@@ -158,7 +160,7 @@ All 476 tests pass under strict IEEE 754 bit comparison (float64 + float32).
 | Setops / interp ||| isin, intersect1d, interp, safe_divide |
 | Access / convert ||| array_get, asarray, to_vector |
 | **Math — element-wise** (sqrt, abs, sign, clip, round, floor, ceil, degrees, radians) ||| Pure C++, no libm dependency |
-| **Math — transcendental** (exp, log, sin, cos, tan, asin, acos, atan, log10, log2, exp2) ||| npy_* scalar functions via dlsym, bit-exact on all archs |
+| **Math — transcendental** (exp, log, sin, cos, tan, asin, acos, atan, log10, log2, exp2, cbrt, expm1, log1p) ||| dlsym npy_* or SVML via bridge, bit-exact on all archs |
 | **Math — power** ||| npy_pow / npy_powf via SVML bridge |
 | **Math — hypot** ||| std::hypot — bit-exact (numpy matches libm) |
 | **Math — atan2** ||| npy_atan2 / npy_atan2f via SVML bridge |
@@ -190,7 +192,7 @@ numpycpp/
 │ └── einsum_py.h
 ├── tests/ # bit-level precision tests + test module
 │ ├── module.cpp # pybind11 module for testing
-│ ├── test_all.py # single entry — all APIs, 476 tests, float64+float32
+│ ├── test_all.py # single entry — all APIs, 500 tests, float64+float32
 │ ├── conftest.py # silent-mode output suppression
 │ └── Makefile
 ├── CMakeLists.txt # build & .deb packaging

numpy/core.h

Lines changed: 18 additions & 0 deletions
@@ -113,6 +113,24 @@ inline void tan(const T* src, T* dst, size_t n) {
     NUMPY_UNROLL4(i, dst[i] = detail::tan(src[i]));
 }

+/// numpy.cbrt(x, /, out=None, *, where=True, ...)
+template<typename T>
+inline void cbrt(const T* src, T* dst, size_t n) {
+    NUMPY_UNROLL4(i, dst[i] = detail::cbrt(src[i]));
+}
+
+/// numpy.expm1(x, /, out=None, *, where=True, ...)
+template<typename T>
+inline void expm1(const T* src, T* dst, size_t n) {
+    NUMPY_UNROLL4(i, dst[i] = detail::expm1(src[i]));
+}
+
+/// numpy.log1p(x, /, out=None, *, where=True, ...)
+template<typename T>
+inline void log1p(const T* src, T* dst, size_t n) {
+    NUMPY_UNROLL4(i, dst[i] = detail::log1p(src[i]));
+}
+
 /// numpy.power(x1, x2, /, out=None, *, where=True, ...)
 template<typename T>
 inline void power(const T* src, T* dst, size_t n, T exponent) {
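
The array functions added to `numpy/core.h` take a source pointer, a destination pointer, and a length. The definition of `NUMPY_UNROLL4` is not part of this diff, so the sketch below assumes it is a plain four-way unrolled loop with a scalar tail; `map_unroll4` is a hypothetical helper, and `std::cbrt` stands in for `detail::cbrt`.

```cpp
#include <cmath>
#include <cstddef>

namespace unroll_sketch {

// Hypothetical four-way unrolled element-wise loop. Assumption: this is
// only a guess at what NUMPY_UNROLL4 expands to, not its real definition.
template <typename T, typename F>
inline void map_unroll4(const T* src, T* dst, std::size_t n, F f) {
    std::size_t i = 0;
    // Main body: four independent elements per iteration.
    for (; i + 4 <= n; i += 4) {
        dst[i]     = f(src[i]);
        dst[i + 1] = f(src[i + 1]);
        dst[i + 2] = f(src[i + 2]);
        dst[i + 3] = f(src[i + 3]);
    }
    // Tail: the remaining 0 to 3 elements.
    for (; i < n; ++i) dst[i] = f(src[i]);
}

// Same raw pointer + size signature as the functions in the diff;
// std::cbrt is a stand-in for detail::cbrt.
template <typename T>
inline void cbrt(const T* src, T* dst, std::size_t n) {
    map_unroll4(src, dst, n, [](T x) { return std::cbrt(x); });
}

}  // namespace unroll_sketch
```

A caller owns both buffers, for example `double in[6] = {...}; double out[6]; unroll_sketch::cbrt(in, out, 6);`. Each output element depends only on the matching input element, so unrolling does not change the results.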

numpy/svml_bridge.h

Lines changed: 24 additions & 0 deletions
@@ -125,6 +125,9 @@ NUMPY_SVML_F64(atan, "__svml_atan8", "npy_atan")
 NUMPY_SVML_F64(log10, "__svml_log108", "npy_log10")
 NUMPY_SVML_F64(log2, "__svml_log28", "npy_log2")
 NUMPY_SVML_F64(exp2, "__svml_exp28", "npy_exp2")
+NUMPY_SVML_F64(cbrt, "__svml_cbrt8", "npy_cbrt")
+NUMPY_SVML_F64(expm1, "__svml_expm18", "npy_expm1")
+NUMPY_SVML_F64(log1p, "__svml_log1p8", "npy_log1p")

 NUMPY_SVML_F32(tan, "__svml_tanf16", "npy_tanf")
 NUMPY_SVML_F32(asin, "__svml_asinf16", "npy_asinf")
@@ -133,6 +136,9 @@ NUMPY_SVML_F32(atan, "__svml_atanf16", "npy_atanf")
 NUMPY_SVML_F32(log10, "__svml_log10f16","npy_log10f")
 NUMPY_SVML_F32(log2, "__svml_log2f16", "npy_log2f")
 NUMPY_SVML_F32(exp2, "__svml_exp2f16", "npy_exp2f")
+NUMPY_SVML_F32(cbrt, "__svml_cbrtf16", "npy_cbrtf")
+NUMPY_SVML_F32(expm1, "__svml_expm1f16","npy_expm1f")
+NUMPY_SVML_F32(log1p, "__svml_log1pf16","npy_log1pf")

 // pow / atan2 — SVML 2-arg
 __attribute__((target("avx512f")))
@@ -195,6 +201,9 @@ NUMPY_NPY_F64(atan, std::atan(x))
 NUMPY_NPY_F64(log10, std::log10(x))
 NUMPY_NPY_F64(log2, std::log2(x))
 NUMPY_NPY_F64(exp2, std::exp2(x))
+NUMPY_NPY_F64(cbrt, std::cbrt(x))
+NUMPY_NPY_F64(expm1, std::expm1(x))
+NUMPY_NPY_F64(log1p, std::log1p(x))

 // f32: fallback via numpy's own polynomial approximations
 // f32 exp/log/sin/cos: numpy's own polynomial approximations (npy_math_float.h)
@@ -211,6 +220,9 @@ NUMPY_NPY_F32(atan, std::atan(x))
 NUMPY_NPY_F32(log10, std::log10(x))
 NUMPY_NPY_F32(log2, std::log2(x))
 NUMPY_NPY_F32(exp2, std::exp2(x))
+NUMPY_NPY_F32(cbrt, std::cbrt(x))
+NUMPY_NPY_F32(expm1, std::expm1(x))
+NUMPY_NPY_F32(log1p, std::log1p(x))

 // hypot — numpy matches libm bit-exact for both f32 and f64
 inline double hypot_f64(double x, double y) { return std::hypot(x, y); }
@@ -271,13 +283,19 @@ DISPATCH_F64(atan)
 DISPATCH_F64(log10)
 DISPATCH_F64(log2)
 DISPATCH_F64(exp2)
+DISPATCH_F64(cbrt)
+DISPATCH_F64(expm1)
+DISPATCH_F64(log1p)
 DISPATCH_F32(tan)
 DISPATCH_F32(asin)
 DISPATCH_F32(acos)
 DISPATCH_F32(atan)
 DISPATCH_F32(log10)
 DISPATCH_F32(log2)
 DISPATCH_F32(exp2)
+DISPATCH_F32(cbrt)
+DISPATCH_F32(expm1)
+DISPATCH_F32(log1p)

 // f32 exp/log/sin/cos: numpy uses its own polynomial approximations
 // (npy_math_float.h), NOT SVML. These are bit-exact on all architectures.
@@ -328,6 +346,9 @@ template<> struct svml_impl<T> { \
     static T log10(T x){ return log10_##suff(x); } \
     static T log2(T x) { return log2_##suff(x); } \
     static T exp2(T x) { return exp2_##suff(x); } \
+    static T cbrt(T x) { return cbrt_##suff(x); } \
+    static T expm1(T x){ return expm1_##suff(x); } \
+    static T log1p(T x){ return log1p_##suff(x); } \
     static T sqrt(T x) { return sqrt_##suff(x); } \
     static T pow(T x, T e) { return pow_##suff(x, e); } \
     static T atan2(T y, T x) { return atan2_##suff(y, x); } \
@@ -353,6 +374,9 @@ NUMPY_SVML_D1(atan)
 NUMPY_SVML_D1(log10)
 NUMPY_SVML_D1(log2)
 NUMPY_SVML_D1(exp2)
+NUMPY_SVML_D1(cbrt)
+NUMPY_SVML_D1(expm1)
+NUMPY_SVML_D1(log1p)
 NUMPY_SVML_D1(sqrt)
 #undef NUMPY_SVML_D1

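
Each new function in `svml_bridge.h` costs one line per macro because the per-type wrappers and the `svml_impl` trait are generated by macros. The sketch below illustrates that general technique only. The `SKETCH_*` macros and the `impl` struct are hypothetical, and the wrappers call `std::` functions directly, whereas the real macros (whose definitions are not shown in this diff) also handle SVML and `npy_*` symbols.

```cpp
#include <cmath>

namespace macro_sketch {

// Generate one scalar wrapper per function and type suffix.
#define SKETCH_SCALAR_F64(name) \
    inline double name##_f64(double x) { return std::name(x); }
#define SKETCH_SCALAR_F32(name) \
    inline float name##_f32(float x) { return std::name(x); }

SKETCH_SCALAR_F64(cbrt)
SKETCH_SCALAR_F64(expm1)
SKETCH_SCALAR_F64(log1p)
SKETCH_SCALAR_F32(cbrt)
SKETCH_SCALAR_F32(expm1)
SKETCH_SCALAR_F32(log1p)

#undef SKETCH_SCALAR_F64
#undef SKETCH_SCALAR_F32

// Trait that maps an element type to its suffixed wrappers,
// so templated array code can call impl<T>::cbrt(x).
template <typename T> struct impl;

#define SKETCH_IMPL(T, suff)                              \
    template <> struct impl<T> {                          \
        static T cbrt(T x)  { return cbrt_##suff(x); }    \
        static T expm1(T x) { return expm1_##suff(x); }   \
        static T log1p(T x) { return log1p_##suff(x); }   \
    };

SKETCH_IMPL(double, f64)
SKETCH_IMPL(float, f32)

#undef SKETCH_IMPL

}  // namespace macro_sketch
```

With this layout, templated code selects the right precision at compile time, for example `macro_sketch::impl<float>::expm1(x)`.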

pycpp/core_py.h

Lines changed: 6 additions & 0 deletions
@@ -282,6 +282,12 @@ DEF_ELEMWISE(sin)
 DEF_ELEMWISE(cos)
 /// numpy.tan(x, /, out=None, *, where=True, ...)
 DEF_ELEMWISE(tan)
+/// numpy.cbrt(x, /, out=None, *, where=True, ...)
+DEF_ELEMWISE(cbrt)
+/// numpy.expm1(x, /, out=None, *, where=True, ...)
+DEF_ELEMWISE(expm1)
+/// numpy.log1p(x, /, out=None, *, where=True, ...)
+DEF_ELEMWISE(log1p)
 /// numpy.log10(x, /, out=None, *, where=True, ...)
 DEF_ELEMWISE(log10)
 /// numpy.log2(x, /, out=None, *, where=True, ...)

tests/Makefile

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,8 @@ CXXFLAGS ?= -std=c++17 -O2 -fPIC -fopenmp -ffp-contract=off \
 -fno-builtin-cos -fno-builtin-tan -fno-builtin-pow \
 -fno-builtin-sqrt -fno-builtin-atan2 -fno-builtin-log2 \
 -fno-builtin-log10 -fno-builtin-asin -fno-builtin-acos \
--fno-builtin-atan -fno-builtin-exp2
+-fno-builtin-atan -fno-builtin-exp2 \
+-fno-builtin-cbrt -fno-builtin-expm1 -fno-builtin-log1p
 INCLUDES = -I.. -I../pycpp $(shell python3 -m pybind11 --includes) $(shell pkg-config --cflags eigen3 2>/dev/null || echo)
 LDFLAGS = -shared -ldl


tests/module.cpp

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ PYBIND11_MODULE(numpycpp, m) {
     // -- Element-wise math -------------------------------------------------
     BIND_F1(sqrt); BIND_F1(abs); BIND_F1(exp); BIND_F1(log);
     BIND_F1(sin); BIND_F1(cos); BIND_F1(tan);
+    BIND_F1(cbrt); BIND_F1(expm1); BIND_F1(log1p);
     BIND_F1(log10); BIND_F1(log2); BIND_F1(arcsin); BIND_F1(arccos); BIND_F1(arctan);
     BIND_F1(round); BIND_F1(floor); BIND_F1(ceil);
     BIND_F1(degrees); BIND_F1(radians); BIND_F1(sign);

tests/test_all.py

Lines changed: 3 additions & 0 deletions
@@ -155,6 +155,9 @@ def dtype(request):
     ("sin", np.sin, None, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
     ("cos", np.cos, None, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
     ("tan", np.tan, lambda a: a * 0.5, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
+    ("cbrt", np.cbrt, None, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
+    ("expm1", np.expm1, lambda a: a * 2.0, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
+    ("log1p", np.log1p, lambda a: np.abs(a) + 0.1, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
     ("log10", np.log10, lambda a: np.abs(a) + 0.1, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
     ("log2", np.log2, lambda a: np.abs(a) + 0.1, [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
     ("arcsin", np.arcsin, lambda a: np.clip(a * 0.5, -1, 1), [(100, 42), (1000, 7), (10000, 7), (100000, 7)]),
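
The commit message reports "0/100000 random diffs", which suggests a count of inputs where two implementations disagree at the bit level. The sketch below shows one way such a count could be computed in C++. It is an assumption about the method, not the project's script (the real tests compare against numpy from Python), and `to_bits` and `count_bit_diffs` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

namespace diff_sketch {

// Raw bit pattern of a double, via memcpy to stay within defined behavior.
inline std::uint64_t to_bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Evaluate f and g on n seeded random inputs drawn from [lo, hi) and
// count how many results differ in at least one bit.
template <typename F, typename G>
std::size_t count_bit_diffs(F f, G g, std::size_t n, std::uint64_t seed,
                            double lo, double hi) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> dist(lo, hi);
    std::size_t diffs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = dist(rng);
        if (to_bits(f(x)) != to_bits(g(x))) ++diffs;
    }
    return diffs;
}

}  // namespace diff_sketch
```

A fixed seed keeps a failing input reproducible. Passing the candidate and the reference as `f` and `g` yields the numerator of a figure like "0/100000".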
