
Commit 8745fda

peng.li24 committed
feat: add np.cumsum, np.squeeze, dtype-flexible creation templates
- Add cumsum to core.h (native 1D cumulative sum)
- Add cumsum, squeeze pycpp wrappers to core_py.h
- Add zeros_t<T>/ones_t<T>/full_t<T> template creation wrappers for dtype flexibility (pybind11 modules can bind e.g. zeros_f32)
- Wire bindings in module.cpp
- Add tests: test_cumsum (3 cases), test_squeeze (3 cases)
- Test count: 466 → 468
1 parent a508bb4 commit 8745fda

6 files changed

Lines changed: 80 additions & 6 deletions


.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ on:
     branches: [master]
 
 jobs:
-  # ---- Test: build module + run 460 precision tests --------------------------
+  # ---- Test: build module + run 468 precision tests --------------------------
   test:
     runs-on: ubuntu-22.04
     steps:

README.md

Lines changed: 5 additions & 5 deletions
@@ -15,7 +15,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++
 
 `numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.
 
-All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (460 tests, float64 + float32).
+All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (468 tests, float64 + float32).
 
 **Bit-exact math** is achieved by resolving numpy's own math functions from `_multiarray_umath.so` at runtime. The SVML bridge auto-detects your CPU and selects the same path numpy uses: AVX‑512 SVML (`__svml_exp8`) when available, or scalar `npy_exp`/`npy_log`/etc. otherwise. AVX‑512 intrinsics are isolated behind `__attribute__((target))` — the binary is safe on any x86_64 CPU (no SIGILL). Every transcendental function produces the exact same IEEE 754 bits as numpy on **all architectures**.
 
@@ -89,12 +89,12 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
 ### Testing
 
 The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
-No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 460 tests, float64 + float32.
+No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly. 468 tests, float64 + float32.
 
 ```bash
 cd tests
 make # compile C++ test module
-make test # run all 460 tests (silent mode: only failures print)
+make test # run all 468 tests (silent mode: only failures print)
 ```
 
 To run with verbose output:
@@ -142,7 +142,7 @@ LDFLAGS = -shared -ldl
 ### Alignment status
 
 The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
-All 460 tests pass under strict IEEE 754 bit comparison (float64 + float32).
+All 468 tests pass under strict IEEE 754 bit comparison (float64 + float32).
 
 ✅ = bit-exact on ALL architectures (SVML bridge with runtime CPU dispatch).
 
@@ -189,7 +189,7 @@ numpycpp/
 │   └── einsum_py.h
 ├── tests/ # bit-level precision tests + test module
 │   ├── module.cpp # pybind11 module for testing
-│   ├── test_all.py # single entry — all APIs, 460 tests, float64+float32
+│   ├── test_all.py # single entry — all APIs, 468 tests, float64+float32
 │   ├── conftest.py # silent-mode output suppression
 │   └── Makefile
 ├── CMakeLists.txt # build & .deb packaging
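The README's "no tolerance" rule is stricter than `==`. The Python helper `assert_bit_aligned` is not part of this diff, so the C++ sketch below only illustrates the idea of comparing raw IEEE 754 bit patterns; `bits_equal` is a hypothetical name, not numpycpp or test-suite API.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Hypothetical helper (not part of numpycpp): true only if the two float64
// values have identical IEEE 754 bit patterns (sign, exponent, mantissa).
inline bool bits_equal(double a, double b) {
    std::uint64_t ua = 0, ub = 0;
    std::memcpy(&ua, &a, sizeof a);  // memcpy avoids strict-aliasing UB
    std::memcpy(&ub, &b, sizeof b);
    return ua == ub;
}

// float32 overload: compare the 32-bit patterns.
inline bool bits_equal(float a, float b) {
    std::uint32_t ua = 0, ub = 0;
    std::memcpy(&ua, &a, sizeof a);
    std::memcpy(&ub, &b, sizeof b);
    return ua == ub;
}
```

Under this rule `0.0 == -0.0` is true but `bits_equal(0.0, -0.0)` is false, because the sign bit differs; likewise `0.1 + 0.2` fails against `0.3`, although the two differ by only one unit in the last place.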

numpy/core.h

Lines changed: 11 additions & 0 deletions
@@ -738,6 +738,17 @@ inline void unwrap(const T* src, T* dst, size_t n, T discont = T(M_PI)) {
     }
 }
 
+/// numpy.cumsum(a, axis=None, dtype=None, out=None)
+/// 1D cumulative sum: dst[i] = sum_{j=0}^{i} src[j]
+template<typename T>
+inline void cumsum(const T* src, T* dst, size_t n) {
+    if (n == 0) return;
+    dst[0] = src[0];
+    for (size_t i = 1; i < n; ++i) {
+        dst[i] = dst[i-1] + src[i];
+    }
+}
+
 // ============================================================================
 // astype conversions
 // ============================================================================
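The native `cumsum` adds strictly left to right, and because it reads `src[i]` before writing `dst[i]` it also works in place (`dst == src`). The sketch below copies the function into a `cumsum_sketch` namespace so it compiles without numpycpp, and adds a hypothetical `cumsum_flat` helper to show the `axis=None` convention: a C-order buffer of any shape is accumulated as one flat sequence, which is what `np.cumsum` does when no axis is given.

```cpp
#include <cstddef>
#include <vector>

namespace cumsum_sketch {

// Local copy of the 1D cumsum this commit adds to numpy/core.h, so the example
// compiles on its own: dst[i] = src[0] + src[1] + ... + src[i], added left to right.
template <typename T>
inline void cumsum(const T* src, T* dst, std::size_t n) {
    if (n == 0) return;
    dst[0] = src[0];
    for (std::size_t i = 1; i < n; ++i) {
        // src[i] is read before dst[i] is written, so dst == src (in place) is safe.
        dst[i] = dst[i - 1] + src[i];
    }
}

// Hypothetical helper: axis=None semantics. A C-order (row-major) buffer of
// any shape is accumulated as one flat sequence, giving a 1D result.
inline std::vector<double> cumsum_flat(const std::vector<double>& c_order) {
    std::vector<double> out(c_order.size());
    cumsum(c_order.data(), out.data(), c_order.size());
    return out;
}

}  // namespace cumsum_sketch
```

For a 2×3 matrix stored row by row as `{1, 2, 3, 4, 5, 6}`, the flat result is `{1, 3, 6, 10, 15, 21}`, the same values `np.cumsum(np.arange(1.0, 7.0).reshape(2, 3))` returns.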

pycpp/core_py.h

Lines changed: 45 additions & 0 deletions
@@ -54,6 +54,7 @@ py::array_t<T> empty_like(const py::array_t<T>& arr) {
 }
 
 /// numpy.zeros(shape, dtype=float, order='C', *, like=None)
+/// NOTE: convenient double default. For dtype flexibility, use zeros_t<T>(shape).
 inline py::array_t<double> zeros(const std::vector<py::ssize_t>& shape) {
     py::array_t<double> result(shape);
     zeros_like(static_cast<double*>(result.request().ptr), result.request().size);
@@ -74,6 +75,29 @@ inline py::array_t<double> full(const std::vector<py::ssize_t>& shape, double fi
     return result;
 }
 
+// Template counterparts — dtype-flexible creation for pybind11 modules
+// that need float32 or other dtypes. Bind as e.g. m.def("zeros_f32", &numpy::zeros_t<float>);
+template<typename T>
+py::array_t<T> zeros_t(const std::vector<py::ssize_t>& shape) {
+    py::array_t<T> result(shape);
+    zeros_like(static_cast<T*>(result.request().ptr), result.request().size);
+    return result;
+}
+
+template<typename T>
+py::array_t<T> ones_t(const std::vector<py::ssize_t>& shape) {
+    py::array_t<T> result(shape);
+    ones_like(static_cast<T*>(result.request().ptr), result.request().size);
+    return result;
+}
+
+template<typename T>
+py::array_t<T> full_t(const std::vector<py::ssize_t>& shape, T fill_value) {
+    py::array_t<T> result(shape);
+    numpy::full(static_cast<T*>(result.request().ptr), result.request().size, fill_value);
+    return result;
+}
+
 // Bool specializations
 // NOTE: _bool suffix — dtype-specific wrappers; pybind11 cannot deduce template
 // argument from a Python dtype keyword, so each dtype needs its own binding.
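The template wrappers do nothing dtype-specific beyond allocating `prod(shape)` elements and filling them, and a binding such as `m.def("zeros_f32", &numpy::zeros_t<float>)` just takes the address of one explicit instantiation. The sketch below mimics that pattern in plain C++ so it runs without pybind11; `Array`, `element_count`, and the `creation_sketch` versions of `zeros_t`/`ones_t`/`full_t` are illustrative stand-ins built on `std::vector`, not the numpycpp API (the real wrappers return `py::array_t<T>` and fill it through the core.h helpers).

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

namespace creation_sketch {

// Stand-in for py::array_t<T>: a shape plus a flat C-order buffer.
template <typename T>
struct Array {
    std::vector<std::ptrdiff_t> shape;
    std::vector<T> data;
};

// Element count is the product of the extents; an empty (0-d) shape gives 1.
inline std::size_t element_count(const std::vector<std::ptrdiff_t>& shape) {
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           [](std::size_t acc, std::ptrdiff_t extent) {
                               return acc * static_cast<std::size_t>(extent);
                           });
}

// Allocate prod(shape) elements of T, all set to fill_value.
template <typename T>
Array<T> full_t(const std::vector<std::ptrdiff_t>& shape, T fill_value) {
    return Array<T>{shape, std::vector<T>(element_count(shape), fill_value)};
}

template <typename T>
Array<T> zeros_t(const std::vector<std::ptrdiff_t>& shape) {
    return full_t<T>(shape, T(0));
}

template <typename T>
Array<T> ones_t(const std::vector<std::ptrdiff_t>& shape) {
    return full_t<T>(shape, T(1));
}

}  // namespace creation_sketch
```

Writing `auto zeros_f32 = &creation_sketch::zeros_t<float>;` fixes `T` at compile time. That is the same reason the bindings need one named entry per dtype: `T` cannot be deduced from a dtype chosen at runtime in Python.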
@@ -913,6 +937,27 @@ inline py::array_t<double> unwrap(const py::array_t<double>& arr, double discont
     return result;
 }
 
+/// numpy.cumsum(a, axis=None) — 1D cumulative sum
+inline py::array_t<double> cumsum(const py::array_t<double>& arr) {
+    auto buf = arr.request();
+    py::array_t<double> result(buf.size);  // axis=None flattens: numpy returns a 1D array of a.size elements
+    numpy::cumsum(static_cast<const double*>(buf.ptr),
+                  static_cast<double*>(result.request().ptr), buf.size);
+    return result;
+}
+
+/// numpy.squeeze(a, axis=None) — remove axes of length 1
+inline py::array_t<double> squeeze(const py::array_t<double>& arr) {
+    auto buf = arr.request();
+    std::vector<py::ssize_t> new_shape;
+    for (auto s : buf.shape)
+        if (s != 1) new_shape.push_back(s);
+    // An empty new_shape yields a 0-d array, matching numpy's shape () when every axis is 1
+    py::array_t<double> result(new_shape);
+    std::memcpy(result.request().ptr, buf.ptr, buf.size * sizeof(double));
+    return result;
+}
+
 /// numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)
 inline py::array_t<double> intersect1d(const py::array_t<double>& a, const py::array_t<double>& b) {
     auto ba = a.request(), bb = b.request();
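The shape rule behind `squeeze` with `axis=None` is to drop every extent equal to 1; the element count and C-order layout do not change, which is why the wrapper can `memcpy` the buffer as-is. A standalone sketch of just the shape computation, using a hypothetical `squeeze_shape` helper and `std::ptrdiff_t` in place of `py::ssize_t`:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the shape logic of numpy::squeeze(arr) with
// axis=None: keep every extent that is not 1, in the original order.
inline std::vector<std::ptrdiff_t> squeeze_shape(const std::vector<std::ptrdiff_t>& shape) {
    std::vector<std::ptrdiff_t> out;
    for (std::ptrdiff_t s : shape) {
        if (s != 1) out.push_back(s);  // zero-length axes (s == 0) are kept
    }
    return out;  // empty result means a 0-d array, i.e. numpy shape ()
}
```

Two numpy behaviors are worth checking against: an all-ones shape such as `(1, 1)` squeezes to the 0-d shape `()`, not `(1,)`, and a zero-length axis such as the `0` in `(0, 1)` is kept, giving `(0,)`.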

tests/module.cpp

Lines changed: 2 additions & 0 deletions
@@ -218,6 +218,8 @@ PYBIND11_MODULE(numpycpp, m) {
     m.def("intersect1d", static_cast<py::array_t<double>(*)(const py::array_t<double>&, const py::array_t<double>&)>(&numpy::intersect1d));
     m.def("flatnonzero", static_cast<py::array_t<py::ssize_t>(*)(const py::array_t<double>&)>(&numpy::flatnonzero));
     m.def("unwrap", static_cast<py::array_t<double>(*)(const py::array_t<double>&, double)>(&numpy::unwrap), py::arg("arr"), py::arg("discont") = M_PI);
+    m.def("cumsum", static_cast<py::array_t<double>(*)(const py::array_t<double>&)>(&numpy::cumsum));
+    m.def("squeeze", static_cast<py::array_t<double>(*)(const py::array_t<double>&)>(&numpy::squeeze));
 
     // -- Interpolation -----------------------------------------------------
     m.def("interp", static_cast<py::array_t<double>(*)(const py::array_t<double>&, const py::array_t<double>&, const py::array_t<double>&)>(&numpy::interp));
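The `static_cast` in the new `cumsum` binding is required, not stylistic: with core.h and core_py.h both included, `numpy::cumsum` names an overload set (the raw-pointer template and the `py::array_t<double>` wrapper), and `m.def` deduces its callable type from the argument, which fails for an overload set that contains a template. The cast names the exact signature to bind. A pybind11-free sketch of the same rule, where `binding_sketch`, `register_fn`, and `bind_cumsum` are hypothetical stand-ins for the header functions and `m.def`:

```cpp
#include <cstddef>
#include <vector>

namespace binding_sketch {

// Overload 1: raw-pointer template, like numpy::cumsum in core.h.
template <typename T>
void cumsum(const T* src, T* dst, std::size_t n) {
    if (n == 0) return;
    dst[0] = src[0];
    for (std::size_t i = 1; i < n; ++i) dst[i] = dst[i - 1] + src[i];
}

// Overload 2: container wrapper, standing in for the py::array_t<double> version.
inline std::vector<double> cumsum(const std::vector<double>& v) {
    std::vector<double> out(v.size());
    cumsum(v.data(), out.data(), v.size());
    return out;
}

// Stand-in for m.def: like pybind11, it deduces the callable's type.
template <typename Func>
Func register_fn(Func f) { return f; }

using WrapperFn = std::vector<double> (*)(const std::vector<double>&);

inline WrapperFn bind_cumsum() {
    // register_fn(&cumsum) would not compile: Func cannot be deduced from an
    // overload set containing a template. The cast picks the wrapper overload.
    return register_fn(static_cast<WrapperFn>(&cumsum));
}

}  // namespace binding_sketch
```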

tests/test_all.py

Lines changed: 16 additions & 0 deletions
@@ -710,6 +710,22 @@ def test_unwrap(cpp):
     a2 = np.array([0.0, 2.5, 5.0, -2.5, -5.0]) * np.pi
     assert_bit_aligned(cpp.unwrap(a2), np.unwrap(a2), "unwrap_large")
 
+def test_cumsum(cpp):
+    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
+    assert_bit_aligned(cpp.cumsum(a), np.cumsum(a), "cumsum")
+    a2 = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
+    assert_bit_aligned(cpp.cumsum(a2), np.cumsum(a2), "cumsum_frac")
+    a3 = np.array([-1.0, 2.0, -3.0, 4.0])
+    assert_bit_aligned(cpp.cumsum(a3), np.cumsum(a3), "cumsum_neg")
+
+def test_squeeze(cpp):
+    a = np.array([1.0, 2.0, 3.0]).reshape(3, 1)
+    assert_bit_aligned(cpp.squeeze(a), np.squeeze(a), "squeeze_col")
+    a2 = np.array([1.0, 2.0, 3.0]).reshape(1, 3)
+    assert_bit_aligned(cpp.squeeze(a2), np.squeeze(a2), "squeeze_row")
+    a3 = np.array([1.0, 2.0, 3.0, 4.0]).reshape(1, 2, 1, 2, 1)
+    assert_bit_aligned(cpp.squeeze(a3), np.squeeze(a3), "squeeze_multi")
+
 def test_intersect1d(cpp):
     a, b = np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 4.0, 5.0, 6.0])
     cpp_r = np.sort(np.asarray(cpp.intersect1d(a, b)))
