Adaptive Pauli-Lindbladian shim + adaptive-evolution demo#98
Adaptive Pauli-Lindbladian shim + adaptive-evolution demo#98AlexSchuckert wants to merge 26 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an adjoint Pauli-Lindbladian “shim” (Rust + PyO3) and a Python-facing wrapper to enable direct Heisenberg-picture time evolution of observables in an adaptive Pauli-string basis, plus a demo and cross-checking tests.
Changes:
- Introduces
LindbladSpec(Rust) implementingaction,leakage, and sparsegeneratorover bit-packed Pauli words with caching + Rayon parallelism. - Adds
ppvm.Lindbladian(Python) wrapper with both ndarray-native (*_arr) and string-keyed convenience APIs, with lazy SciPy import for generator construction. - Adds an adaptive-evolution demo and a new test suite cross-checking shim results against a dense reference implementation.
Reviewed changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| ppvm-python/uv.lock | Adds NumPy as a declared dependency in the lock. |
| ppvm-python/test/test_lindblad.py | New tests validating action/leakage/generator vs a dense reference (uses SciPy). |
| ppvm-python/src/ppvm/lindblad.py | New Python wrapper for the native Lindbladian shim (string + ndarray APIs; lazy SciPy import). |
| ppvm-python/src/ppvm/init.py | Re-exports Lindbladian from the new module. |
| ppvm-python/pyproject.toml | Declares NumPy as a runtime dependency for the wrapper. |
| ppvm-python/demo/lindblad_adaptive.py | New end-to-end demo of adaptive basis growth + expm-based stepping. |
| crates/ppvm-python-native/src/lindblad.rs | New Rust implementation of the Lindbladian primitives with caching and parallel basis loops. |
| crates/ppvm-python-native/src/lib.rs | Exposes LindbladSpec from the PyO3 module. |
| crates/ppvm-python-native/Cargo.toml | Adds native dependencies (dashmap/fxhash/numpy/rayon) needed by the shim. |
| Cargo.lock | Locks new Rust dependencies introduced by the native shim. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def string_to_codes(s: str, n_qubits: int) -> np.ndarray: | ||
| """Encode a Pauli string ``"IXYZ..."`` as a length-``n_qubits`` uint8 array.""" | ||
| if len(s) != n_qubits: | ||
| raise ValueError(f"Pauli string {s!r} has length {len(s)} != n_qubits {n_qubits}") | ||
| return np.array([_PAULI_CODE[c] for c in s], dtype=np.uint8) |
There was a problem hiding this comment.
Resolved: _string_to_codes now strips _ and raises a ValueError naming the bad character and expected alphabet (matching the Rust parser), instead of letting a KeyError propagate.
| //! Rust shim for direct Pauli-Lindbladian time evolution on an adaptive | ||
| //! Pauli-string basis. See `xy-experiments/main_k_adaptive.py` for the | ||
| //! Python driver that owns the linear algebra and basis adaptation. |
There was a problem hiding this comment.
Resolved: the stale xy-experiments/main_k_adaptive.py reference is gone; the in-tree driver is ppvm-python/demo/lindblad_adaptive.py.
| let local: Vec<FxHashMap<Word, f64>> = basis_words | ||
| .par_iter() | ||
| .zip(coeffs_view.par_iter()) | ||
| .map(|(p, &c)| { | ||
| let mut m: FxHashMap<Word, f64> = FxHashMap::default(); | ||
| let mut s1 = Vec::new(); | ||
| let mut s2 = Vec::new(); | ||
| accumulate_action(self, p, c, &mut m, &mut s1, &mut s2); | ||
| m | ||
| }) | ||
| .collect(); | ||
|
|
||
| let mut merged: FxHashMap<Word, f64> = FxHashMap::default(); | ||
| for m in local { | ||
| for (k, v) in m { | ||
| if in_basis.contains_key(&k) || protected_set.contains_key(&k) { | ||
| continue; | ||
| } | ||
| *merged.entry(k).or_insert(0.0) += v; | ||
| } | ||
| } |
There was a problem hiding this comment.
Addressed in b07f75d: leakage/leakage_with_prune now fold into thread-local FxHashMaps and merge via Rayon reduce, so the Vec<HashMap> of per-element intermediates is no longer materialized. Keys and summed values are identical (FP reduction order can differ at ~1e-16, well within the test tolerances).
| let basis_words = decode_basis(&basis_view, self.n_qubits)?; | ||
| let index: FxHashMap<Word, u32> = basis_words | ||
| .iter() | ||
| .enumerate() | ||
| .map(|(i, w)| (*w, i as u32)) | ||
| .collect(); |
There was a problem hiding this comment.
Resolved: the generator() binding (and the pc_step paths) now call assert_basis_unique, which returns a ValueError naming the duplicate row indices before any matrix is built.
| requires-python = ">=3.10" | ||
| dependencies = [ | ||
| "bloqade-circuit>=0.14.1", | ||
| "kirin>=0.0.15", | ||
| "numpy>=1.26", | ||
| "ppvm-python-native", | ||
| ] |
There was a problem hiding this comment.
this is valid concern, please address
There was a problem hiding this comment.
Resolved: scipy>=1.13 is now an explicit entry in the dev dependency group in ppvm-python/pyproject.toml (runtime ppvm stays scipy-free).
|
|
@copilot fix the license header for Alex :-) |
|
@AlexSchuckert wondering if we can have some information about how you verify the correctness of this to help reviewing? |
Fixed in commit 6209cb8: adjusted |
| /// then iteratively apply a degree-`m` Horner-form Taylor polynomial | ||
| /// `s` times. Inside each Horner sweep we terminate early once two | ||
| /// successive Taylor terms have norm below `opts.tol · ‖F‖`. | ||
| pub fn expm_multiply(a: &CsrMatrix, t: f64, b: &[f64], opts: ExpmOpts) -> Vec<f64> { |
There was a problem hiding this comment.
I'm wondering if this exists in a rust crate already?
There was a problem hiding this comment.
No Rust crate provides expm_multiply in the matrix-free action form (Al-Mohy & Higham, scaling-and-squaring on A · v). The expm crate exists but only handles dense matrices via the Padé approximation — not useful for our sparse Heisenberg-picture generator. SciPy and Julia (KrylovKit / Expokit ports) both have it; we follow SciPy's expm_multiply algorithm and θ_m table directly. It's ~200 LOC of self-contained numerics, verified by expm_serial_matches_parallel and cross-checked against scipy.sparse.linalg.expm_multiply in test_pc_step_rust_matches_python_pc. Happy to swap if a maintained crate shows up.
There was a problem hiding this comment.
Resolved: the matrix-exponential action is now provided by the external quspin-expm crate (QuSpin-rust), used matrix-free. The hand-rolled implementation has been removed; expm.rs is reduced to just the (m, s) selection table.
| //! the SpMV path itself scales before debugging higher-level paths. | ||
|
|
||
| use ppvm_lindblad::CsrMatrix; | ||
| use std::time::Instant; |
There was a problem hiding this comment.
can we actually turn this into benchmark like other crates instead of using std::time to measure the performance? it will be cleaner to use the same setup as other crates like those in ppvm-runtime
There was a problem hiding this comment.
Resolved: the examples/spmv_scaling.rs timing example has been removed -- the real-CSR SpMV it exercised no longer exists (evolution is matrix-free via quspin-expm).
|
|
||
| L_op.clear_cache() # action cache is per-k since each k starts a fresh basis | ||
|
|
||
| for nt in range(c.steps + 1): |
There was a problem hiding this comment.
I understand this might work... but it would be nice to split these nested loops into functions to make things easier to read
There was a problem hiding this comment.
Addressed: the demo is now factored into helper functions -- build_xy_dephasing, init_mode, add_leakage_to_basis, prune_and_cap, and run_mode -- rather than inline nested loops.
| for k in set(got_a) | set(got_b): | ||
| assert got_a.get(k, 0.0) == got_b.get(k, 0.0), ( | ||
| f"lincomb fast path mismatch at p={p!r}: {got_a} vs {got_b}" | ||
| ) |
There was a problem hiding this comment.
a 700 loc python file is a hard reject
There was a problem hiding this comment.
Resolved: the monolithic test_lindblad.py has been split into ppvm-python/test/lindblad/ (test_action_generator.py, test_adaptive_pc.py, test_non_hermitian_jumps.py, test_pc_step_rust.py, plus _helpers.py).
| requires-python = ">=3.10" | ||
| dependencies = [ | ||
| "bloqade-circuit>=0.14.1", | ||
| "kirin>=0.0.15", | ||
| "numpy>=1.26", | ||
| "ppvm-python-native", | ||
| ] |
There was a problem hiding this comment.
this is valid concern, please address
Correctness
- generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words
at the PyO3 boundary with a clear ValueError. The underlying row-index
map silently overwrote earlier rows on collision, producing a wrong
sparse generator. ppvm-lindblad now factors the index-build with
uniqueness debug_assert into `build_basis_index(...)`, called from both
`generator` and `generator_csr`. New regression test
`test_generator_rejects_duplicate_basis`.
- `_string_to_codes`: raise `ValueError` naming the bad character; strip
`_` separators to match the Rust `parse_pauli_string` parser.
API surface
- Rename the four codec helpers (string_to_codes, codes_to_string,
basis_to_codes, codes_to_basis) to `_`-prefixed module-private names.
Dependencies — drop scipy from runtime + tests
- Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals)
instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap;
see docstring.
- Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy
eigendecomposition reference (`expm_mv_dense`).
- pyproject.toml: scipy removed from `dev`; moved to a new `demo` group
(demos still use it for the reference matrix exponential).
Test layout
- Split the 723-line `test/test_lindblad.py` into `test/lindblad/`:
* `_helpers.py` — shared Pauli-table reference, dense Liouvillian
reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers
* `test_action_generator.py` — Hermitian-Pauli action/leakage/generator
* `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps
* `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref
* `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling
Each test file now ≤ 121 lines.
Demos
- Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py`
rewritten as jupytext percent-format notebooks with markdown narrative
cells and end-of-notebook plots, replacing CLI arg parsing with a
top-of-file parameter cell.
- `lindblad_adaptive.py`: split the monolithic per-step body into
`init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`;
dropped the unnecessary `target_arr = basis_arr.copy()` /
`protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr
in place — `vstack`/slice rebind).
- `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location`
bypass that was working around `demo/stim.py` shadowing the real `stim`
package on `python demo/foo.py` invocations.
- Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo
that happened to share a name with the `stim` package).
Benches
- Convert `examples/spmv_scaling.rs` (std::time + manual print) to a
proper criterion `benches/spmv_scaling.rs` with one entry per thread
count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a
`rayon::ThreadPool::install`. Added `criterion` to dev-deps +
`[[bench]]` to `ppvm-lindblad/Cargo.toml`.
CSR + expm_multiply replacement (left as-is)
- Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all
ship CSR + serial SpMV but none offer rayon-parallel SpMV with the
nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham
`expm_multiply` (matrix-free action). Decision documented in PR review
replies; happy to revisit if a suitable crate appears.
Verification
- cargo test -p ppvm-lindblad --lib: 10/10
- pytest test/: 115/115
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…olution Adds a LindbladSpec PyClass that exposes three primitives needed for direct Heisenberg-picture evolution of a generic Pauli Lindbladian (Hermitian Hamiltonian sum + Hermitian Pauli jumps) on an adaptive Pauli-string basis: - action(p): L*(p) for a single Pauli string - leakage(basis, coeffs): off-basis component of L*(Sum c_j p_j) - generator(basis): sparse generator matrix in COO form The hot path stores Pauli words as packed [u8; W] X/Z bit arrays and runs a single fused per-byte multiply that computes both r = h XOR p and the product phase in one pass. Active-site iteration (per-qubit inverted index of which H/jump terms touch each qubit) restricts the inner loop to non-trivially-acting terms; an action cache (DashMap, FxHash) memoises per-input contribution lists; rayon parallelises the basis loops. The numpy boundary takes uint8 code arrays (0=I 1=X 2=Z 3=Y) end-to-end, so the Python driver never builds per-row strings on the hot path. A thin Python wrapper (ppvm.Lindbladian) exposes both ndarray and string-keyed APIs for callers who don't need the ndarray fast path. End-to-end agreement with a pure-Python reference is within 1e-16 in max |Ck|. At L=51, kmax=1, dt=0.2, 20 steps, the shim runs in 17s vs 481s for the pure-Python adaptive driver (~28x), with the action cache and ndarray-native wrapper jointly responsible for the speedup over naive Rust.
- demo/lindblad_adaptive.py: adaptive Pauli-Lindbladian time evolution on a leakage-grown Pauli-string basis (all-to-all XY + single-site Z dephasing), with an optional predictor-corrector (lifts the dt-error from O(dt^2) to O(dt^3)) and a discarded-weight a-posteriori error monitor. Exercises ppvm.Lindbladian (action / leakage / generator) end to end. - test/test_lindblad.py: cross-check action / leakage / generator / the protected set against an independent dense Pauli-multiplication-table reference, on XY+Z-dephasing and TFIM+X-dephasing Lindbladians. - Declare numpy as a runtime dependency (the shim wrapper uses it) and import SciPy lazily inside generator()/generator_arr(); `import ppvm` and the action/leakage primitives no longer require SciPy -- only the sparse-matrix convenience does.
…e-Rust PC step
Split the adjoint Pauli-Lindbladian algorithm out of `ppvm-python-native`
into a standalone `ppvm-lindblad` crate backed by `PauliWord<[u64; 2]>`
from `ppvm-runtime`, and extend it with:
- Non-Hermitian Pauli-sum jumps via `JumpInput` / `JumpKind::{HermitianPauli,
General}`. The general path implements the full GKSL sandwich
`Σ_{a,b} λ_a* λ_b P_a O P_b - 1/2 {L†L, O}` with the `L†L` Pauli expansion
precomputed at construction. Same fast diagonal path for the common
Hermitian-Pauli jump case.
- Pure-Rust matrix-exponential action `expm_multiply` implementing Al-Mohy
& Higham (2011) Algorithm 3.2: degree-`m` Horner-form Taylor with
scaling-and-squaring, `(m, s)` from a precomputed `θ_m` table, optional
early termination. Backed by a minimal CSR matrix type with serial and
rayon-parallel SpMV (parallel above a configurable nnz threshold).
- `LindbladSpec::pc_step` runs the entire predictor-corrector step in Rust
(first-hop leakage expansion → predictor expm → second-hop leakage
expansion → corrector expm), so the Python driver no longer needs scipy
for the inner loop.
- `pc_step_timed` returns a per-phase microsecond breakdown
(leakage1/expand1/gencsr1/expm1/...) for profiling parallel scaling.
- `num_threads: Option<usize>` parameter wraps the entire PC step in a
freshly-built rayon `pool.install`, isolating the requested core count
for benchmarking.
- Performance: parallel-scatter CSR build (skips `from_triplets`'s
sequential count/scatter); `rayon::map_init` thread-local scratch
(hashmap + scratch vectors reused across rayon tasks instead of
per-task allocations); hash-only `in_basis`/`protected_set` membership
tables (`FxHashMap<u64, ()>` instead of `<Word, ()>` shrinks the
working set ~6× and keeps it in L2/L3 past basis 10⁶).
At L=51, long-range XY α=1, γ=1, T=1 (basis grows to 577K), 20 PC steps
runs end-to-end in ~512 s wall on 4 threads — down from ~734 s
pre-optimisation (1.43× faster).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e tests, scaling demo
Wrapper updates and tests for the new `ppvm-lindblad` crate.
API additions on `Lindbladian`:
- `pc_step` / `pc_step_arr`: thin Python wrappers around the pure-Rust
predictor-corrector step. Expose `parallel_threshold`, `num_threads`,
and `expm_tol` knobs for benchmarking parallel scaling without
re-importing scipy.
- Accept jumps as either a Pauli string (Hermitian-Pauli fast path) or a
complex Pauli linear combination `Iterable[(str, complex)]` for
non-Hermitian dissipators.
- `sigma_plus(site, n_qubits)` and `sigma_minus(site, n_qubits)` helpers
produce the `(X ± iY)/2` Pauli sums for amplitude damping / thermal
excitation. Re-exported from the top-level `ppvm` package.
Tests:
- Amplitude damping (`test_amplitude_damping_action`,
`test_amplitude_damping_generator_and_leakage`) and thermal σ± mix
(`test_thermal_excitation_damping_action`) cross-checked against a
dense Liouvillian super-operator reference.
- Adaptive shim convergence vs the closed Jordan-Wigner bilinear ODE for
NN-XY + Z-dephasing on OBC
(`test_adaptive_converges_to_nn_xy_z_dephasing_bilinear`).
- Predictor-corrector raises the dt-scaling from quadratic to cubic
(`test_predictor_corrector_lifts_dt_scaling_to_cubic`).
- Pure-Rust pc_step matches scipy-based PC at FP precision
(`test_pc_step_rust_matches_scipy_pc`) and exhibits the same cubic
dt-scaling against the bilinear reference
(`test_pc_step_rust_dt_scaling_is_cubic`).
- Length-1 real lincomb routes to the Hermitian fast path
(`test_lincomb_single_term_matches_hermitian_fast_path`).
Demo:
- `demo/lindblad_pc_scaling.py`: end-to-end wall-time scaling sweep with
`--max-cores`, `--model {nn,long-range}`, `--alpha`, `--gamma`, `--tau`
CLI for HPC nodes. Reports first-step + steady-state ms per thread
count with the entire PC step pinned to a per-call rayon pool.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness
- generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words
at the PyO3 boundary with a clear ValueError. The underlying row-index
map silently overwrote earlier rows on collision, producing a wrong
sparse generator. ppvm-lindblad now factors the index-build with
uniqueness debug_assert into `build_basis_index(...)`, called from both
`generator` and `generator_csr`. New regression test
`test_generator_rejects_duplicate_basis`.
- `_string_to_codes`: raise `ValueError` naming the bad character; strip
`_` separators to match the Rust `parse_pauli_string` parser.
API surface
- Rename the four codec helpers (string_to_codes, codes_to_string,
basis_to_codes, codes_to_basis) to `_`-prefixed module-private names.
Dependencies — drop scipy from runtime + tests
- Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals)
instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap;
see docstring.
- Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy
eigendecomposition reference (`expm_mv_dense`).
- pyproject.toml: scipy removed from `dev`; moved to a new `demo` group
(demos still use it for the reference matrix exponential).
Test layout
- Split the 723-line `test/test_lindblad.py` into `test/lindblad/`:
* `_helpers.py` — shared Pauli-table reference, dense Liouvillian
reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers
* `test_action_generator.py` — Hermitian-Pauli action/leakage/generator
* `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps
* `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref
* `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling
Each test file now ≤ 121 lines.
Demos
- Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py`
rewritten as jupytext percent-format notebooks with markdown narrative
cells and end-of-notebook plots, replacing CLI arg parsing with a
top-of-file parameter cell.
- `lindblad_adaptive.py`: split the monolithic per-step body into
`init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`;
dropped the unnecessary `target_arr = basis_arr.copy()` /
`protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr
in place — `vstack`/slice rebind).
- `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location`
bypass that was working around `demo/stim.py` shadowing the real `stim`
package on `python demo/foo.py` invocations.
- Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo
that happened to share a name with the `stim` package).
Benches
- Convert `examples/spmv_scaling.rs` (std::time + manual print) to a
proper criterion `benches/spmv_scaling.rs` with one entry per thread
count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a
`rayon::ThreadPool::install`. Added `criterion` to dev-deps +
`[[bench]]` to `ppvm-lindblad/Cargo.toml`.
CSR + expm_multiply replacement (left as-is)
- Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all
ship CSR + serial SpMV but none offer rayon-parallel SpMV with the
nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham
`expm_multiply` (matrix-free action). Decision documented in PR review
replies; happy to revisit if a suitable crate appears.
Verification
- cargo test -p ppvm-lindblad --lib: 10/10
- pytest test/: 115/115
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
8c277a4 to
71caf7e
Compare
Correctness
- generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words
at the PyO3 boundary with a clear ValueError. The underlying row-index
map silently overwrote earlier rows on collision, producing a wrong
sparse generator. ppvm-lindblad now factors the index-build with
uniqueness debug_assert into `build_basis_index(...)`, called from both
`generator` and `generator_csr`. New regression test
`test_generator_rejects_duplicate_basis`.
- `_string_to_codes`: raise `ValueError` naming the bad character; strip
`_` separators to match the Rust `parse_pauli_string` parser.
API surface
- Rename the four codec helpers (string_to_codes, codes_to_string,
basis_to_codes, codes_to_basis) to `_`-prefixed module-private names.
Dependencies — drop scipy from runtime + tests
- Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals)
instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap;
see docstring.
- Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy
eigendecomposition reference (`expm_mv_dense`).
- pyproject.toml: scipy removed from `dev`; moved to a new `demo` group
(demos still use it for the reference matrix exponential).
Test layout
- Split the 723-line `test/test_lindblad.py` into `test/lindblad/`:
* `_helpers.py` — shared Pauli-table reference, dense Liouvillian
reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers
* `test_action_generator.py` — Hermitian-Pauli action/leakage/generator
* `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps
* `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref
* `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling
Each test file now ≤ 121 lines.
Demos
- Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py`
rewritten as jupytext percent-format notebooks with markdown narrative
cells and end-of-notebook plots, replacing CLI arg parsing with a
top-of-file parameter cell.
- `lindblad_adaptive.py`: split the monolithic per-step body into
`init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`;
dropped the unnecessary `target_arr = basis_arr.copy()` /
`protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr
in place — `vstack`/slice rebind).
- `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location`
bypass that was working around `demo/stim.py` shadowing the real `stim`
package on `python demo/foo.py` invocations.
- Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo
that happened to share a name with the `stim` package).
Benches
- Convert `examples/spmv_scaling.rs` (std::time + manual print) to a
proper criterion `benches/spmv_scaling.rs` with one entry per thread
count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a
`rayon::ThreadPool::install`. Added `criterion` to dev-deps +
`[[bench]]` to `ppvm-lindblad/Cargo.toml`.
CSR + expm_multiply replacement (left as-is)
- Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all
ship CSR + serial SpMV but none offer rayon-parallel SpMV with the
nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham
`expm_multiply` (matrix-free action). Decision documented in PR review
replies; happy to revisit if a suitable crate appears.
Verification
- cargo test -p ppvm-lindblad --lib: 10/10
- pytest test/: 115/115
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
71caf7e to
e54b88c
Compare
I'm using several physics examples in test/lindblad. There it essentially builds the exact evolution using dense matrices. Of course I can't fully exclude weird edge cases, but I think these examples are non-trivial. |
`pc_step` now takes an optional `drop_tol` parameter. At the end of the
predictor-corrector step (after the corrector expm_multiply), any basis
entry whose absolute coefficient is below `drop_tol` is removed from the
basis, unless the corresponding Pauli word appears in `protected`.
Implemented as a small in-place compaction (FxHashSet lookup on
`protected`, single pass swap-and-truncate over basis+coeffs). No-op
when `drop_tol ≤ 0`, preserving the existing add-only growth behaviour.
Motivation: at γ = 0 (pure unitary Hamiltonian dynamics) there is no
physical damping of small-coefficient strings, so the basis grows
monotonically with only `tau_add` as a brake. `drop_tol > 0` lets the
caller actively prune entries whose coefficient has decayed, keeping
the basis bounded and making γ-extrapolation experiments feasible.
Plumbed through `ppvm_lindblad::Lindbladian::pc_step{,_timed}`, the
PyO3 binding (new optional kwarg, defaults to 0.0), and the Python
shim `ppvm.lindblad.Lindbladian.pc_step{,_arr}`.
Verified:
- All 10 Rust lib tests still pass.
- All 122 Python tests still pass.
- Smoke test: at L=6, NN XY, γ=0, seed = Z_2:
no drop_tol → basis saturates at 36 strings
drop_tol = 0.05 → basis stays at 9
drop_tol = 0.5 → basis stays at 1 (only protected Z_2)
drop_tol = 1.5 → basis stays at 1 (protected never dropped)
Protected words are never removed regardless of magnitude.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Empirical benchmarks (xy-experiments/bench_cache_*.py, see PR thread) showed that the per-Pauli `L*(p)` memoisation cache: - HURTS wall time for sparse-local Hamiltonians (e.g. XX ladder): hash- map lookup latency at million-entry scale matches the cost of just recomputing the few-bond commutator. cache_ON 195s vs cache_OFF 149s at L=20 ladder, |basis|=3.7M (-24%). - Helps modestly for dense long-range Hamiltonians (LR-XY, α=1): ~25% speedup, BUT at ~5 KB per cached entry. At basis sizes that matter (10⁶+), this consumes tens of GB and pushes runs into OOM well before the basis itself does. In both regimes — and especially for the L=41 sweeps that just OOM'd on Perlmutter at γ=0 / tight add_tol — the cache is a net loss. Pauli propagation is intrinsically memory-limited (basis + transient action lists), so any per-Pauli auxiliary storage just shrinks the reachable problem size. Strips: - `action_cache: DashMap` field + `cache_enabled: AtomicBool` flag - `clear_cache()`, `cache_size()`, `set_cache_enabled()` on LindbladSpec - corresponding PyO3 bindings and Python wrapper methods - `dashmap` Cargo dep (no other code in the crate uses it) - `L_op.clear_cache()` calls in demos and tests (now no-ops anyway) All 10 Rust lib tests + 122 Python tests still pass. End-to-end behaviour unchanged; just lower memory + (often) faster. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… RSS cut
Two memory optimisations on top of the adaptive-Pauli evolution paths:
1. **Switch ppvm-python-native to the mimalloc global allocator.**
The system allocator (macOS / glibc malloc) holds onto freed pages in
arena freelists, so the OS-reported RSS stays high even after Rust
has freed allocations. mimalloc returns pages to the kernel much more
aggressively. Drop-in, ~5 lines.
2. **Chunk `leakage_inner`'s parallel pass into batches of 4096 basis
rows.** Previously `basis.par_iter().map_init(...).collect()`
materialised the full `Vec<Vec<(Word, f64)>>` (~N × A × 24 B) in
memory simultaneously before merging — ~430 MB at L=51 LR-XY, N=14 K,
A≈1275. Now each chunk's `Vec<Vec>` is merged into the global
`FxHashMap` and dropped before the next chunk starts; peak in-flight
is bounded to `CHUNK_SIZE · A · 24 B`. Chunk of 4096 still gives
~400 items per rayon worker so parallelism stays healthy.
Benchmark on macOS at two configs (separate process per measurement so
ru_maxrss is clean):
```
config wall(s) peak RSS (MB)
before after before after
LR-XY L=51 γ=1 add_tol=2e-4 18.51 16.51 3160 1759 (-44%)
LR-XY L=12 γ=0.1 add_tol=2e-4 13.10 11.56 4583 1797 (-61%)
```
Memory ~halved; wall ~10-15 % faster (mimalloc is itself faster than
the system allocator). All 10 Rust lib tests + 122 Python tests pass.
This lets the L=41 / γ=0 / tight-tol sweeps that were OOM-ing at 80 GB
on Perlmutter shared nodes likely fit, and roughly doubles the largest
basis we can reach on a fixed memory budget.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tie the leakage-rate add threshold to the post-step drop threshold via tau_add = K * drop_tol / dt with K=5 as the default — Pareto-optimal in the L=51 LR-XY γ=1 study (K=5, drop=4e-6 matches the K=1, drop=2e-5 accuracy floor at 3× lower RSS+wall, and dominates K=10 at high accuracy). Callers can still pass tau_add explicitly; the new policy only kicks in when tau_add is omitted and drop_tol > 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds three new code paths plus three memory optimizations on the
existing pc_step:
- matrix_free flag on pc_step: skip CSR build, do each Krylov-Taylor
SpMV by recomputing L*(p) on the fly. New spmv_matrix_free,
one_norm_matrix_free, expm_multiply_mf. Trades wall (4× at L=6) for
memory (modest at L=6, expected to dominate at L≥10).
- rk4_step: classical 4th-order Runge-Kutta on the adjoint
Lindbladian, matrix-free, with leakage-driven natural basis growth
and post-step drop_tol prune. New compute_action_sum helper. Doc
flags the dt < 2.78/||L*|| stability boundary with explicit warning
about silent norm-conserving / observable-diverging failure mode.
- leakage_with_prune: streaming Cauchy-Schwarz remaining-budget
prune. Exact (drops only candidates whose true sum cannot exceed
tau_add). Uses new max_action_coef per-operator bound.
- pc_step_inner: collapse coeffs_pre + coeffs_pre_padded clones into
a single in-place coeffs buffer, saving 2 × n × 8B per step.
- ExpmOpts.max_krylov_m: cap on Krylov-Taylor degree m_star. Trades
more outer scaling-and-squaring iterations for less Krylov scratch.
All four exposed through Python pc_step_arr and rk4_step_arr.
Correctness: CSR (new opts) and matrix-free pc_step trajectories are
bit-identical (4e-16 max delta over 50 steps at drop=1e-6, L=6).
| } | ||
|
|
||
| criterion_group!(benches, bench_spmv_scaling); | ||
| criterion_main!(benches); |
There was a problem hiding this comment.
can we post in the PR comment about the result of this benchmark for future reference?
There was a problem hiding this comment.
Obsolete: the spmv_scaling benchmark has been removed -- it benchmarked the now-deleted real-CSR SpMV. The matrix-exponential action is now provided by the external quspin-expm crate.
| for (i, yi) in y.iter_mut().enumerate() { | ||
| let mut sum = 0.0; | ||
| for k in indptr[i]..indptr[i + 1] { | ||
| sum += data[k] * x[indices[k] as usize]; | ||
| } | ||
| *yi = sum; | ||
| } |
There was a problem hiding this comment.
can we verify if this gets correctly vectorized after compilation?
There was a problem hiding this comment.
Obsolete: the hand-rolled scaling-and-squaring inner loop has been removed; the matrix-exponential action is now provided by the external quspin-expm crate.
| // mimalloc returns freed pages to the kernel more aggressively than the | ||
| // default system allocator. This materially reduces peak RSS for the | ||
| // allocation-heavy adaptive-Pauli paths (leakage + generator each | ||
| // allocate hundreds of MB of transient Vec data per pc_step). | ||
| #[global_allocator] | ||
| static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; |
There was a problem hiding this comment.
This looks suspicious. I think we need more explanation of why we are adding this custom allocator and how much improvement this gives compared to not having it.
Ideally, the initial draft PR should not contain this type of optimization; we should add that in a separate PR. I would decline to merge this draft PR as a single unit.
It would be better to start with the basic setup and have that reviewed first, then submit a separate PR focusing on the different optimizations. This avoids forcing the reviewer to go through a giant 10,000-line PR.
| @@ -0,0 +1,400 @@ | |||
| // SPDX-FileCopyrightText: 2026 The PPVM Authors | |||
There was a problem hiding this comment.
If you're adding this as an internal implementation, it would be nice to add this as a separate crate instead of within ppvm-liblblod. I imagine this will be useful for other Rust developers as well, and it will also help clean up the dependency a little bit.
It would be nice to first submit a separate PR that creates a single crate that implements sparse matrix and exponential.
There was a problem hiding this comment.
Addressed: rather than a first-party crate, the matrix-exponential is now provided by the external quspin-expm crate (QuSpin-rust). expm.rs is reduced to just the (m, s) selection table, so the dependency surface is much smaller.
| /// `[u64; 2]` covers up to 128 qubits; the `FxBuildHasher` matches the | ||
| /// hash used by the `FxHashMap` keys we wrap with; `REHASH=true` means | ||
| /// `set()` keeps the cached hash in sync. | ||
| pub type Word = PauliWord<[u64; W_U64], FxBuildHasher, true>; |
There was a problem hiding this comment.
Is there a reason why our implementation has a hard-coded number of qubits here?
There was a problem hiding this comment.
Addressed in b07f75d: added docs explaining the fixed-width [u64; W_U64] Pauli-word storage and MAX_QUBITS = 64 * W_U64 -- chosen for cache-friendly O(1) Pauli ops; construction beyond it errors with Error::TooManyQubits.
| pub type Word = PauliWord<[u64; W_U64], FxBuildHasher, true>; | ||
|
|
||
| /// Errors raised when constructing a [`LindbladSpec`]. | ||
| #[derive(Debug, Clone)] |
There was a problem hiding this comment.
Let's put all the error handling in a separate file, so we have a clean lib.rs file.
There was a problem hiding this comment.
Addressed in b07f75d: the Error enum and its Display/Error impls moved to crates/ppvm-lindblad/src/error.rs; the public path ppvm_lindblad::Error is unchanged.
| /// inner `*mut T` (which is not) — Rust 2021's disjoint-field capture | ||
| /// would otherwise peel the wrapper. | ||
| #[derive(Clone, Copy)] | ||
| struct SendPtr<T>(*mut T); |
There was a problem hiding this comment.
This is an interesting approach. Does this have an existing Rust standard library functionality that we can reuse?
We do not accept any unsafe code in this crate in general. If we have to use unsafe, it must be separated and isolated in a separate crate.
There was a problem hiding this comment.
Resolved: all unsafe has been removed from ppvm-lindblad. The SendPtr parallel-scatter helper is gone along with the CSR generator it served -- evolution is now matrix-free via quspin-expm.
| /// `num_threads`, when set, runs the entire step inside a freshly built | ||
| /// rayon thread pool of that size — useful for benchmarking parallel | ||
| /// scaling. When `None`, the global rayon pool is used. | ||
| #[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
We should not work around Clippy errors like this. If Clippy warns you about too many arguments, this is a code smell of an incorrect abstraction being created here.
Ideally, you would create an object that contains configurations such as:
- Drop tolerance
- Tau add
- Other relevant options
Then, you can feed that object in with the Hamiltonian and Lindblad stuff.
There was a problem hiding this comment.
Addressed in b07f75d: introduced PcStepConfig { tau_add, drop_tol, num_threads }; pc_step/pc_step_timed (and their inner helpers) now take it, and the #[allow(clippy::too_many_arguments)] has been removed. The Python-facing kwargs are unchanged -- the binding builds the config internally.
| /// breakdown (microseconds), for profiling parallel scaling and hot | ||
| /// spots. Output: `(leakage1, expand1, gencsr1, expm1, leakage2, | ||
| /// expand2, gencsr2, expm2)`. | ||
| #[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
Addressed in b07f75d: covered by the same PcStepConfig refactor.
…shim Brings the bytemuck-based weight() speedup + early-return for uncapped MaxPauliWeight (and the rest of #126) into the Lindblad branch. Conflicts resolved: keep stim_demo.py (byte-identical to lindblad-shim's stim_msd.py rename); regenerate uv.lock against the merged pyproject. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The expm serial/parallel SpMV parity test built `ExpmOpts { tol,
parallel_threshold }` without the `max_krylov_m` field, so `cargo test`
failed to compile the ppvm-lindblad test target (pre-existing on
lindblad-shim, surfaced when running the full suite after merging #126).
The whole Rust workspace test suite now passes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ree) Replace the hand-rolled Al-Mohy & Higham scaling-and-squaring engine on the real (f64) `pc_step`/`expm_step` path with the external `quspin-expm` crate, used fully matrix-free: a borrowed `MfOp` implements `quspin_types:: LinearOperator` over the in-basis Lindbladian action (`spmv_matrix_free`), and `ExpmOp::from_parts` supplies the shift mu = tr(M)/n, partition count s, and Taylor order m* directly -- bypassing quspin's adaptive parameter selection so only `dot` runs in the Taylor loop (no estimator, no dot_transpose). mu and the exact column 1-norm of M - mu*I are gathered in one matrix-free pass via `action()`; (m*, s) reuse the existing `select_ms` table. No CSR is materialised for the exponential action (`generator_csr` is kept for the other consumers). The old `expm` engine (expm_multiply / expm_multiply_mf / spmv / Csr) is retained but no longer on the `pc_step` path. The complex orbit-rep path is unaffected here and migrates separately on `symmetry-merging`. Validated: cargo test --workspace --exclude ppvm-python-native (618 passed) and the Python lindblad suite (13 passed, pc_step vs bilinear reference). NOTE: QuSpin-rust currently ships no LICENSE (all-rights-reserved). This pin is a placeholder pending a permissive license being added upstream; the quspin-expm authors are co-authors of the ppvm paper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…path
Now that the real pc_step path runs entirely on quspin-expm (matrix-free,
via mf_expm), delete the dead hand-rolled engine and every remnant of the
CSR-for-real path. This branch has no complex/orbit-rep path, so the whole
old engine is dead:
* expm.rs: reduced to just `select_ms` + its `THETA` table (the only
thing the new engine reuses). Removed the Csr type, csr_from_triplets,
csr_one_norm, spmv / spmv_serial / spmv_parallel, ExpmOpts,
expm_multiply, expm_multiply_mf, and their tests/imports.
* lib.rs: removed LindbladSpec::generator_csr, one_norm_matrix_free, the
now-dead SendPtr scatter helper, and the `expm` re-export. Dropped the
`opts: ExpmOpts` and `matrix_free: bool` parameters from pc_step,
pc_step_inner, pc_step_timed (+ inner) and expm_step — the path is
unconditionally matrix-free now.
* benches/spmv_scaling.rs: removed (benched the deleted real Csr/SpMV);
drop its [[bench]] entry and the now-unused criterion dev-dep.
Python bindings + wrapper: drop the expm_tol / parallel_threshold /
matrix_free / max_krylov_m kwargs (they only configured the removed engine)
from pc_step / pc_step_timed (bindings) and pc_step_arr / pc_step (wrapper);
num_threads is retained. Updated the lindblad_pc_scaling demo.
Net -750 / +30 lines. Validated: cargo test --workspace --exclude
ppvm-python-native (all green, warning-clean) and the Python suite
(136 passed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Maintainer-requested code-quality changes (PR #98): * PcStepConfig (new config.rs): bundle tau_add / drop_tol / num_threads into a config struct passed to pc_step / pc_step_timed (+ inner helpers), replacing the long parameter lists and the `#[allow(clippy::too_many_arguments)]`. The Python-facing kwargs are unchanged — the binding builds the config internally. * error.rs (new): move the `Error` enum + Display/Error impls out of lib.rs; re-exported as `ppvm_lindblad::Error` (public path unchanged). * leakage: merge via Rayon fold/reduce into thread-local maps instead of collecting a Vec<HashMap> and merging sequentially — lower peak memory. Keys and summed values are identical (FP reduction order differs at ~1e-16, well within test tolerances). * docs: explain the fixed-width `[u64; W_U64]` Word storage and MAX_QUBITS = 64 * W_U64. * demo: simplify the redundant position expression to `np.arange(L) - L//2`. Validated: cargo test --workspace --exclude ppvm-python-native (612 passed), cargo clippy clean (no too_many_arguments on the refactored fns), and the full Python suite (136 passed). Python public API unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per PR #98 review (Roger-luo): the global allocator is a peak-RSS optimization that doesn't belong in this feature PR. Removed here and moved to a dedicated PR off `main` so it can be reviewed/benchmarked on its own. Reverts to the system allocator; no behavior change (validated: lindblad Python suite still green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds a Rust shim for direct, Trotterless Heisenberg-picture time evolution of an observable under a generic Pauli-Lindbladian (Hermitian Pauli Hamiltonian
H = Σ c_i P_i+ Hermitian Pauli jump operatorsL_kwith ratesγ_k), plus a Python driver demonstrating an adaptive Pauli-basis integrator built on it.LindbladSpec(Rust,crates/ppvm-python-native/src/lindblad.rs)A precompiled Lindbladian exposing three primitives over bit-packed Pauli words:
action(p)—L*(p)for a single Pauli string;leakage(basis, coeffs[, protected])— the off-basis component ofL*(Σ c_j p_j)(drives adaptive basis growth);generator(basis)— the Lindbladian restricted tobasisas sparse COO triplets (wrapped into a scipy CSC matrix on the Python side).Hot path: a fused per-byte commutator computing both the XOR product and its phase in one pass; per-qubit active-site index so only terms whose support meets
supp(p)are visited; aDashMapaction cache (the analogue of@lru_cachein the prototype);rayon-parallel basis loops. Numpyuint8(0=I,1=X,2=Z,3=Y) I/O end-to-end, so the driver never builds per-row strings on the hot path.ppvm.Lindbladian(Python wrapper,ppvm-python/src/ppvm/lindblad.py)Thin wrapper with both ndarray-native (
action_arr/leakage_arr/generator_arr) and string-keyed (action/leakage/generator) APIs.Demo (
ppvm-python/demo/lindblad_adaptive.py)End-to-end adaptive integrator: at each step it measures the leakage out of the current basis, adds the leaked strings above
add_tol, advancesexp(dt · M)(exact in dt within the basis — no Trotter error) viascipy.sparse.linalg.expm_multiply, and prunes belowdrop_tol(a protected set is never dropped).--predictor_corrector 1enriches the basis with the predictor's second-hop strings and redoes the step, lifting the dt-error fromO(dt²)toO(dt³). ReportsC_k(t)and a discarded-weight a-posteriori error monitor.Packaging
import ppvmno longer requires SciPy: SciPy is imported lazily insidegenerator()/generator_arr()(the only methods that need it).numpyis declared as a runtime dependency (used by the wrapper).Test plan
ppvm-python/test/test_lindblad.pycross-checksaction/leakage/generator/protected-set against an independent dense Pauli-multiplication-table reference (XY+Z-dephasing and TFIM+X-dephasing); 4 tests pass.cargo fmt --checkandcargo clippyclean onppvm-python-native.ruff checkclean on the wrapper and test.uv lock --checkconsistent.--predictor_corrector).🤖 Generated with Claude Code