Adaptive Pauli-Lindbladian shim + adaptive-evolution demo by AlexSchuckert · Pull Request #98 · QuEraComputing/ppvm

AlexSchuckert · 2026-05-27T20:13:11Z

Summary

Adds a Rust shim for direct, Trotterless Heisenberg-picture time evolution of an observable under a generic Pauli-Lindbladian (Hermitian Pauli Hamiltonian H = Σ c_i P_i + Hermitian Pauli jump operators L_k with rates γ_k), plus a Python driver demonstrating an adaptive Pauli-basis integrator built on it.

`LindbladSpec` (Rust, `crates/ppvm-python-native/src/lindblad.rs`)

A precompiled Lindbladian exposing three primitives over bit-packed Pauli words:

action(p) — L*(p) for a single Pauli string;
leakage(basis, coeffs[, protected]) — the off-basis component of L*(Σ c_j p_j) (drives adaptive basis growth);
generator(basis) — the Lindbladian restricted to basis as sparse COO triplets (wrapped into a scipy CSC matrix on the Python side).

Hot path: a fused per-byte commutator computing both the XOR product and its phase in one pass; per-qubit active-site index so only terms whose support meets supp(p) are visited; a DashMap action cache (the analogue of @lru_cache in the prototype); rayon-parallel basis loops. Numpy uint8 (0=I,1=X,2=Z,3=Y) I/O end-to-end, so the driver never builds per-row strings on the hot path.

`ppvm.Lindbladian` (Python wrapper, `ppvm-python/src/ppvm/lindblad.py`)

Thin wrapper with both ndarray-native (action_arr/leakage_arr/generator_arr) and string-keyed (action/leakage/generator) APIs.

Demo (`ppvm-python/demo/lindblad_adaptive.py`)

End-to-end adaptive integrator: at each step it measures the leakage out of the current basis, adds the leaked strings above add_tol, advances exp(dt · M) (exact in dt within the basis — no Trotter error) via scipy.sparse.linalg.expm_multiply, and prunes below drop_tol (a protected set is never dropped). --predictor_corrector 1 enriches the basis with the predictor's second-hop strings and redoes the step, lifting the dt-error from O(dt²) to O(dt³). Reports C_k(t) and a discarded-weight a-posteriori error monitor.

Packaging

import ppvm no longer requires SciPy: SciPy is imported lazily inside generator()/generator_arr() (the only methods that need it). numpy is declared as a runtime dependency (used by the wrapper).

Test plan

ppvm-python/test/test_lindblad.py cross-checks action/leakage/generator/protected-set against an independent dense Pauli-multiplication-table reference (XY+Z-dephasing and TFIM+X-dephasing); 4 tests pass.
cargo fmt --check and cargo clippy clean on ppvm-python-native.
ruff check clean on the wrapper and test.
uv lock --check consistent.
Demo runs end to end (plain and --predictor_corrector).

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR adds an adjoint Pauli-Lindbladian “shim” (Rust + PyO3) and a Python-facing wrapper to enable direct Heisenberg-picture time evolution of observables in an adaptive Pauli-string basis, plus a demo and cross-checking tests.

Changes:

Introduces LindbladSpec (Rust) implementing action, leakage, and sparse generator over bit-packed Pauli words with caching + Rayon parallelism.
Adds ppvm.Lindbladian (Python) wrapper with both ndarray-native (*_arr) and string-keyed convenience APIs, with lazy SciPy import for generator construction.
Adds an adaptive-evolution demo and a new test suite cross-checking shim results against a dense reference implementation.

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
ppvm-python/uv.lock	Adds NumPy as a declared dependency in the lock.
ppvm-python/test/test_lindblad.py	New tests validating action/leakage/generator vs a dense reference (uses SciPy).
ppvm-python/src/ppvm/lindblad.py	New Python wrapper for the native Lindbladian shim (string + ndarray APIs; lazy SciPy import).
ppvm-python/src/ppvm/init.py	Re-exports `Lindbladian` from the new module.
ppvm-python/pyproject.toml	Declares NumPy as a runtime dependency for the wrapper.
ppvm-python/demo/lindblad_adaptive.py	New end-to-end demo of adaptive basis growth + expm-based stepping.
crates/ppvm-python-native/src/lindblad.rs	New Rust implementation of the Lindbladian primitives with caching and parallel basis loops.
crates/ppvm-python-native/src/lib.rs	Exposes `LindbladSpec` from the PyO3 module.
crates/ppvm-python-native/Cargo.toml	Adds native dependencies (dashmap/fxhash/numpy/rayon) needed by the shim.
Cargo.lock	Locks new Rust dependencies introduced by the native shim.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

AlexSchuckert · 2026-06-17T17:01:24Z

+def string_to_codes(s: str, n_qubits: int) -> np.ndarray:
+    """Encode a Pauli string ``"IXYZ..."`` as a length-``n_qubits`` uint8 array."""
+    if len(s) != n_qubits:
+        raise ValueError(f"Pauli string {s!r} has length {len(s)} != n_qubits {n_qubits}")
+    return np.array([_PAULI_CODE[c] for c in s], dtype=np.uint8)


Resolved: _string_to_codes now strips _ and raises a ValueError naming the bad character and expected alphabet (matching the Rust parser), instead of letting a KeyError propagate.

AlexSchuckert · 2026-06-17T17:01:26Z

+//! Rust shim for direct Pauli-Lindbladian time evolution on an adaptive
+//! Pauli-string basis. See `xy-experiments/main_k_adaptive.py` for the
+//! Python driver that owns the linear algebra and basis adaptation.


Resolved: the stale xy-experiments/main_k_adaptive.py reference is gone; the in-tree driver is ppvm-python/demo/lindblad_adaptive.py.

AlexSchuckert · 2026-06-17T17:01:27Z

+        let local: Vec<FxHashMap<Word, f64>> = basis_words
+            .par_iter()
+            .zip(coeffs_view.par_iter())
+            .map(|(p, &c)| {
+                let mut m: FxHashMap<Word, f64> = FxHashMap::default();
+                let mut s1 = Vec::new();
+                let mut s2 = Vec::new();
+                accumulate_action(self, p, c, &mut m, &mut s1, &mut s2);
+                m
+            })
+            .collect();
+
+        let mut merged: FxHashMap<Word, f64> = FxHashMap::default();
+        for m in local {
+            for (k, v) in m {
+                if in_basis.contains_key(&k) || protected_set.contains_key(&k) {
+                    continue;
+                }
+                *merged.entry(k).or_insert(0.0) += v;
+            }
+        }


Addressed in b07f75d: leakage/leakage_with_prune now fold into thread-local FxHashMaps and merge via Rayon reduce, so the Vec<HashMap> of per-element intermediates is no longer materialized. Keys and summed values are identical (FP reduction order can differ at ~1e-16, well within the test tolerances).

AlexSchuckert · 2026-06-17T17:01:29Z

+        let basis_words = decode_basis(&basis_view, self.n_qubits)?;
+        let index: FxHashMap<Word, u32> = basis_words
+            .iter()
+            .enumerate()
+            .map(|(i, w)| (*w, i as u32))
+            .collect();


Resolved: the generator() binding (and the pc_step paths) now call assert_basis_unique, which returns a ValueError naming the duplicate row indices before any matrix is built.

Roger-luo · 2026-06-02T01:29:07Z

 requires-python = ">=3.10"
 dependencies = [
    "bloqade-circuit>=0.14.1",
    "kirin>=0.0.15",
+    "numpy>=1.26",
    "ppvm-python-native",
 ]


this is valid concern, please address

Resolved: scipy>=1.13 is now an explicit entry in the dev dependency group in ppvm-python/pyproject.toml (runtime ppvm stays scipy-free).

github-actions · 2026-05-27T20:26:57Z

PR Preview Action v1.8.1
🚀 View preview at https://QuEraComputing.github.io/ppvm/pr-preview/pr-98/
Built to branch `gh-pages` at 2026-06-02 01:51 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Roger-luo · 2026-06-02T01:18:22Z

@copilot fix the license header for Alex :-)

Roger-luo · 2026-06-02T01:19:13Z

@AlexSchuckert wondering if we can have some information about how you verify the correctness of this to help reviewing?

Copilot · 2026-06-02T01:21:37Z

@copilot fix the license header for Alex :-)

Fixed in commit 6209cb8: adjusted ppvm-python/src/ppvm/lindblad.py header formatting to include the required blank line after the SPDX lines so the license-header check passes.

Roger-luo · 2026-06-02T01:20:22Z

+/// then iteratively apply a degree-`m` Horner-form Taylor polynomial
+/// `s` times. Inside each Horner sweep we terminate early once two
+/// successive Taylor terms have norm below `opts.tol · ‖F‖`.
+pub fn expm_multiply(a: &CsrMatrix, t: f64, b: &[f64], opts: ExpmOpts) -> Vec<f64> {


I'm wondering if this exists in a rust crate already?

No Rust crate provides expm_multiply in the matrix-free action form (Al-Mohy & Higham, scaling-and-squaring on A · v). The expm crate exists but only handles dense matrices via the Padé approximation — not useful for our sparse Heisenberg-picture generator. SciPy and Julia (KrylovKit / Expokit ports) both have it; we follow SciPy's expm_multiply algorithm and θ_m table directly. It's ~200 LOC of self-contained numerics, verified by expm_serial_matches_parallel and cross-checked against scipy.sparse.linalg.expm_multiply in test_pc_step_rust_matches_python_pc. Happy to swap if a maintained crate shows up.

Resolved: the matrix-exponential action is now provided by the external quspin-expm crate (QuSpin-rust), used matrix-free. The hand-rolled implementation has been removed; expm.rs is reduced to just the (m, s) selection table.

Roger-luo · 2026-06-02T01:22:49Z

+//! the SpMV path itself scales before debugging higher-level paths.
+
+use ppvm_lindblad::CsrMatrix;
+use std::time::Instant;


can we actually turn this into benchmark like other crates instead of using std::time to measure the performance? it will be cleaner to use the same setup as other crates like those in ppvm-runtime

Resolved: the examples/spmv_scaling.rs timing example has been removed -- the real-CSR SpMV it exercised no longer exists (evolution is matrix-free via quspin-expm).

Roger-luo · 2026-06-02T01:24:06Z

+
+    L_op.clear_cache()  # action cache is per-k since each k starts a fresh basis
+
+    for nt in range(c.steps + 1):


I understand this might work... but it would be nice to split these nested loops into functions to make things easier to read

Addressed: the demo is now factored into helper functions -- build_xy_dephasing, init_mode, add_leakage_to_basis, prune_and_cap, and run_mode -- rather than inline nested loops.

Roger-luo · 2026-06-02T01:28:23Z

+        for k in set(got_a) | set(got_b):
+            assert got_a.get(k, 0.0) == got_b.get(k, 0.0), (
+                f"lincomb fast path mismatch at p={p!r}: {got_a} vs {got_b}"
+            )


a 700 loc python file is a hard reject

Resolved: the monolithic test_lindblad.py has been split into ppvm-python/test/lindblad/ (test_action_generator.py, test_adaptive_pc.py, test_non_hermitian_jumps.py, test_pc_step_rust.py, plus _helpers.py).

Roger-luo · 2026-06-02T01:29:07Z

 requires-python = ">=3.10"
 dependencies = [
    "bloqade-circuit>=0.14.1",
    "kirin>=0.0.15",
+    "numpy>=1.26",
    "ppvm-python-native",
 ]


this is valid concern, please address

Correctness - generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words at the PyO3 boundary with a clear ValueError. The underlying row-index map silently overwrote earlier rows on collision, producing a wrong sparse generator. ppvm-lindblad now factors the index-build with uniqueness debug_assert into `build_basis_index(...)`, called from both `generator` and `generator_csr`. New regression test `test_generator_rejects_duplicate_basis`. - `_string_to_codes`: raise `ValueError` naming the bad character; strip `_` separators to match the Rust `parse_pauli_string` parser. API surface - Rename the four codec helpers (string_to_codes, codes_to_string, basis_to_codes, codes_to_basis) to `_`-prefixed module-private names. Dependencies — drop scipy from runtime + tests - Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals) instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap; see docstring. - Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy eigendecomposition reference (`expm_mv_dense`). - pyproject.toml: scipy removed from `dev`; moved to a new `demo` group (demos still use it for the reference matrix exponential). Test layout - Split the 723-line `test/test_lindblad.py` into `test/lindblad/`: * `_helpers.py` — shared Pauli-table reference, dense Liouvillian reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers * `test_action_generator.py` — Hermitian-Pauli action/leakage/generator * `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps * `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref * `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling Each test file now ≤ 121 lines. Demos - Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py` rewritten as jupytext percent-format notebooks with markdown narrative cells and end-of-notebook plots, replacing CLI arg parsing with a top-of-file parameter cell. - `lindblad_adaptive.py`: split the monolithic per-step body into `init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`; dropped the unnecessary `target_arr = basis_arr.copy()` / `protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr in place — `vstack`/slice rebind). - `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location` bypass that was working around `demo/stim.py` shadowing the real `stim` package on `python demo/foo.py` invocations. - Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo that happened to share a name with the `stim` package). Benches - Convert `examples/spmv_scaling.rs` (std::time + manual print) to a proper criterion `benches/spmv_scaling.rs` with one entry per thread count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a `rayon::ThreadPool::install`. Added `criterion` to dev-deps + `[[bench]]` to `ppvm-lindblad/Cargo.toml`. CSR + expm_multiply replacement (left as-is) - Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all ship CSR + serial SpMV but none offer rayon-parallel SpMV with the nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham `expm_multiply` (matrix-free action). Decision documented in PR review replies; happy to revisit if a suitable crate appears. Verification - cargo test -p ppvm-lindblad --lib: 10/10 - pytest test/: 115/115 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 22 changed files in this pull request and generated 1 comment.

…olution Adds a LindbladSpec PyClass that exposes three primitives needed for direct Heisenberg-picture evolution of a generic Pauli Lindbladian (Hermitian Hamiltonian sum + Hermitian Pauli jumps) on an adaptive Pauli-string basis: - action(p): L*(p) for a single Pauli string - leakage(basis, coeffs): off-basis component of L*(Sum c_j p_j) - generator(basis): sparse generator matrix in COO form The hot path stores Pauli words as packed [u8; W] X/Z bit arrays and runs a single fused per-byte multiply that computes both r = h XOR p and the product phase in one pass. Active-site iteration (per-qubit inverted index of which H/jump terms touch each qubit) restricts the inner loop to non-trivially-acting terms; an action cache (DashMap, FxHash) memoises per-input contribution lists; rayon parallelises the basis loops. The numpy boundary takes uint8 code arrays (0=I 1=X 2=Z 3=Y) end-to-end, so the Python driver never builds per-row strings on the hot path. A thin Python wrapper (ppvm.Lindbladian) exposes both ndarray and string-keyed APIs for callers who don't need the ndarray fast path. End-to-end agreement with a pure-Python reference is within 1e-16 in max |Ck|. At L=51, kmax=1, dt=0.2, 20 steps, the shim runs in 17s vs 481s for the pure-Python adaptive driver (~28x), with the action cache and ndarray-native wrapper jointly responsible for the speedup over naive Rust.

- demo/lindblad_adaptive.py: adaptive Pauli-Lindbladian time evolution on a leakage-grown Pauli-string basis (all-to-all XY + single-site Z dephasing), with an optional predictor-corrector (lifts the dt-error from O(dt^2) to O(dt^3)) and a discarded-weight a-posteriori error monitor. Exercises ppvm.Lindbladian (action / leakage / generator) end to end. - test/test_lindblad.py: cross-check action / leakage / generator / the protected set against an independent dense Pauli-multiplication-table reference, on XY+Z-dephasing and TFIM+X-dephasing Lindbladians. - Declare numpy as a runtime dependency (the shim wrapper uses it) and import SciPy lazily inside generator()/generator_arr(); `import ppvm` and the action/leakage primitives no longer require SciPy -- only the sparse-matrix convenience does.

…e-Rust PC step Split the adjoint Pauli-Lindbladian algorithm out of `ppvm-python-native` into a standalone `ppvm-lindblad` crate backed by `PauliWord<[u64; 2]>` from `ppvm-runtime`, and extend it with: - Non-Hermitian Pauli-sum jumps via `JumpInput` / `JumpKind::{HermitianPauli, General}`. The general path implements the full GKSL sandwich `Σ_{a,b} λ_a* λ_b P_a O P_b - 1/2 {L†L, O}` with the `L†L` Pauli expansion precomputed at construction. Same fast diagonal path for the common Hermitian-Pauli jump case. - Pure-Rust matrix-exponential action `expm_multiply` implementing Al-Mohy & Higham (2011) Algorithm 3.2: degree-`m` Horner-form Taylor with scaling-and-squaring, `(m, s)` from a precomputed `θ_m` table, optional early termination. Backed by a minimal CSR matrix type with serial and rayon-parallel SpMV (parallel above a configurable nnz threshold). - `LindbladSpec::pc_step` runs the entire predictor-corrector step in Rust (first-hop leakage expansion → predictor expm → second-hop leakage expansion → corrector expm), so the Python driver no longer needs scipy for the inner loop. - `pc_step_timed` returns a per-phase microsecond breakdown (leakage1/expand1/gencsr1/expm1/...) for profiling parallel scaling. - `num_threads: Option<usize>` parameter wraps the entire PC step in a freshly-built rayon `pool.install`, isolating the requested core count for benchmarking. - Performance: parallel-scatter CSR build (skips `from_triplets`'s sequential count/scatter); `rayon::map_init` thread-local scratch (hashmap + scratch vectors reused across rayon tasks instead of per-task allocations); hash-only `in_basis`/`protected_set` membership tables (`FxHashMap<u64, ()>` instead of `<Word, ()>` shrinks the working set ~6× and keeps it in L2/L3 past basis 10⁶). At L=51, long-range XY α=1, γ=1, T=1 (basis grows to 577K), 20 PC steps runs end-to-end in ~512 s wall on 4 threads — down from ~734 s pre-optimisation (1.43× faster). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…e tests, scaling demo Wrapper updates and tests for the new `ppvm-lindblad` crate. API additions on `Lindbladian`: - `pc_step` / `pc_step_arr`: thin Python wrappers around the pure-Rust predictor-corrector step. Expose `parallel_threshold`, `num_threads`, and `expm_tol` knobs for benchmarking parallel scaling without re-importing scipy. - Accept jumps as either a Pauli string (Hermitian-Pauli fast path) or a complex Pauli linear combination `Iterable[(str, complex)]` for non-Hermitian dissipators. - `sigma_plus(site, n_qubits)` and `sigma_minus(site, n_qubits)` helpers produce the `(X ± iY)/2` Pauli sums for amplitude damping / thermal excitation. Re-exported from the top-level `ppvm` package. Tests: - Amplitude damping (`test_amplitude_damping_action`, `test_amplitude_damping_generator_and_leakage`) and thermal σ± mix (`test_thermal_excitation_damping_action`) cross-checked against a dense Liouvillian super-operator reference. - Adaptive shim convergence vs the closed Jordan-Wigner bilinear ODE for NN-XY + Z-dephasing on OBC (`test_adaptive_converges_to_nn_xy_z_dephasing_bilinear`). - Predictor-corrector raises the dt-scaling from quadratic to cubic (`test_predictor_corrector_lifts_dt_scaling_to_cubic`). - Pure-Rust pc_step matches scipy-based PC at FP precision (`test_pc_step_rust_matches_scipy_pc`) and exhibits the same cubic dt-scaling against the bilinear reference (`test_pc_step_rust_dt_scaling_is_cubic`). - Length-1 real lincomb routes to the Hermitian fast path (`test_lincomb_single_term_matches_hermitian_fast_path`). Demo: - `demo/lindblad_pc_scaling.py`: end-to-end wall-time scaling sweep with `--max-cores`, `--model {nn,long-range}`, `--alpha`, `--gamma`, `--tau` CLI for HPC nodes. Reports first-step + steady-state ms per thread count with the entire PC step pinned to a per-call rayon pool. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness - generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words at the PyO3 boundary with a clear ValueError. The underlying row-index map silently overwrote earlier rows on collision, producing a wrong sparse generator. ppvm-lindblad now factors the index-build with uniqueness debug_assert into `build_basis_index(...)`, called from both `generator` and `generator_csr`. New regression test `test_generator_rejects_duplicate_basis`. - `_string_to_codes`: raise `ValueError` naming the bad character; strip `_` separators to match the Rust `parse_pauli_string` parser. API surface - Rename the four codec helpers (string_to_codes, codes_to_string, basis_to_codes, codes_to_basis) to `_`-prefixed module-private names. Dependencies — drop scipy from runtime + tests - Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals) instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap; see docstring. - Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy eigendecomposition reference (`expm_mv_dense`). - pyproject.toml: scipy removed from `dev`; moved to a new `demo` group (demos still use it for the reference matrix exponential). Test layout - Split the 723-line `test/test_lindblad.py` into `test/lindblad/`: * `_helpers.py` — shared Pauli-table reference, dense Liouvillian reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers * `test_action_generator.py` — Hermitian-Pauli action/leakage/generator * `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps * `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref * `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling Each test file now ≤ 121 lines. Demos - Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py` rewritten as jupytext percent-format notebooks with markdown narrative cells and end-of-notebook plots, replacing CLI arg parsing with a top-of-file parameter cell. - `lindblad_adaptive.py`: split the monolithic per-step body into `init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`; dropped the unnecessary `target_arr = basis_arr.copy()` / `protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr in place — `vstack`/slice rebind). - `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location` bypass that was working around `demo/stim.py` shadowing the real `stim` package on `python demo/foo.py` invocations. - Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo that happened to share a name with the `stim` package). Benches - Convert `examples/spmv_scaling.rs` (std::time + manual print) to a proper criterion `benches/spmv_scaling.rs` with one entry per thread count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a `rayon::ThreadPool::install`. Added `criterion` to dev-deps + `[[bench]]` to `ppvm-lindblad/Cargo.toml`. CSR + expm_multiply replacement (left as-is) - Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all ship CSR + serial SpMV but none offer rayon-parallel SpMV with the nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham `expm_multiply` (matrix-free action). Decision documented in PR review replies; happy to revisit if a suitable crate appears. Verification - cargo test -p ppvm-lindblad --lib: 10/10 - pytest test/: 115/115 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 22 changed files in this pull request and generated no new comments.

AlexSchuckert · 2026-06-02T12:13:17Z

@AlexSchuckert wondering if we can have some information about how you verify the correctness of this to help reviewing?

I'm using several physics examples in test/lindblad. There it essentially builds the exact evolution using dense matrices. Of course I can't fully exclude weird edge cases, but I think these examples are non-trivial.

`pc_step` now takes an optional `drop_tol` parameter. At the end of the predictor-corrector step (after the corrector expm_multiply), any basis entry whose absolute coefficient is below `drop_tol` is removed from the basis, unless the corresponding Pauli word appears in `protected`. Implemented as a small in-place compaction (FxHashSet lookup on `protected`, single pass swap-and-truncate over basis+coeffs). No-op when `drop_tol ≤ 0`, preserving the existing add-only growth behaviour. Motivation: at γ = 0 (pure unitary Hamiltonian dynamics) there is no physical damping of small-coefficient strings, so the basis grows monotonically with only `tau_add` as a brake. `drop_tol > 0` lets the caller actively prune entries whose coefficient has decayed, keeping the basis bounded and making γ-extrapolation experiments feasible. Plumbed through `ppvm_lindblad::Lindbladian::pc_step{,_timed}`, the PyO3 binding (new optional kwarg, defaults to 0.0), and the Python shim `ppvm.lindblad.Lindbladian.pc_step{,_arr}`. Verified: - All 10 Rust lib tests still pass. - All 122 Python tests still pass. - Smoke test: at L=6, NN XY, γ=0, seed = Z_2: no drop_tol → basis saturates at 36 strings drop_tol = 0.05 → basis stays at 9 drop_tol = 0.5 → basis stays at 1 (only protected Z_2) drop_tol = 1.5 → basis stays at 1 (protected never dropped) Protected words are never removed regardless of magnitude. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Empirical benchmarks (xy-experiments/bench_cache_*.py, see PR thread) showed that the per-Pauli `L*(p)` memoisation cache: - HURTS wall time for sparse-local Hamiltonians (e.g. XX ladder): hash- map lookup latency at million-entry scale matches the cost of just recomputing the few-bond commutator. cache_ON 195s vs cache_OFF 149s at L=20 ladder, |basis|=3.7M (-24%). - Helps modestly for dense long-range Hamiltonians (LR-XY, α=1): ~25% speedup, BUT at ~5 KB per cached entry. At basis sizes that matter (10⁶+), this consumes tens of GB and pushes runs into OOM well before the basis itself does. In both regimes — and especially for the L=41 sweeps that just OOM'd on Perlmutter at γ=0 / tight add_tol — the cache is a net loss. Pauli propagation is intrinsically memory-limited (basis + transient action lists), so any per-Pauli auxiliary storage just shrinks the reachable problem size. Strips: - `action_cache: DashMap` field + `cache_enabled: AtomicBool` flag - `clear_cache()`, `cache_size()`, `set_cache_enabled()` on LindbladSpec - corresponding PyO3 bindings and Python wrapper methods - `dashmap` Cargo dep (no other code in the crate uses it) - `L_op.clear_cache()` calls in demos and tests (now no-ops anyway) All 10 Rust lib tests + 122 Python tests still pass. End-to-end behaviour unchanged; just lower memory + (often) faster. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… RSS cut Two memory optimisations on top of the adaptive-Pauli evolution paths: 1. **Switch ppvm-python-native to the mimalloc global allocator.** The system allocator (macOS / glibc malloc) holds onto freed pages in arena freelists, so the OS-reported RSS stays high even after Rust has freed allocations. mimalloc returns pages to the kernel much more aggressively. Drop-in, ~5 lines. 2. **Chunk `leakage_inner`'s parallel pass into batches of 4096 basis rows.** Previously `basis.par_iter().map_init(...).collect()` materialised the full `Vec<Vec<(Word, f64)>>` (~N × A × 24 B) in memory simultaneously before merging — ~430 MB at L=51 LR-XY, N=14 K, A≈1275. Now each chunk's `Vec<Vec>` is merged into the global `FxHashMap` and dropped before the next chunk starts; peak in-flight is bounded to `CHUNK_SIZE · A · 24 B`. Chunk of 4096 still gives ~400 items per rayon worker so parallelism stays healthy. Benchmark on macOS at two configs (separate process per measurement so ru_maxrss is clean): ``` config wall(s) peak RSS (MB) before after before after LR-XY L=51 γ=1 add_tol=2e-4 18.51 16.51 3160 1759 (-44%) LR-XY L=12 γ=0.1 add_tol=2e-4 13.10 11.56 4583 1797 (-61%) ``` Memory ~halved; wall ~10-15 % faster (mimalloc is itself faster than the system allocator). All 10 Rust lib tests + 122 Python tests pass. This lets the L=41 / γ=0 / tight-tol sweeps that were OOM-ing at 80 GB on Perlmutter shared nodes likely fit, and roughly doubles the largest basis we can reach on a fixed memory budget. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Tie the leakage-rate add threshold to the post-step drop threshold via tau_add = K * drop_tol / dt with K=5 as the default — Pareto-optimal in the L=51 LR-XY γ=1 study (K=5, drop=4e-6 matches the K=1, drop=2e-5 accuracy floor at 3× lower RSS+wall, and dominates K=10 at high accuracy). Callers can still pass tau_add explicitly; the new policy only kicks in when tau_add is omitted and drop_tol > 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds three new code paths plus three memory optimizations on the existing pc_step: - matrix_free flag on pc_step: skip CSR build, do each Krylov-Taylor SpMV by recomputing L*(p) on the fly. New spmv_matrix_free, one_norm_matrix_free, expm_multiply_mf. Trades wall (4× at L=6) for memory (modest at L=6, expected to dominate at L≥10). - rk4_step: classical 4th-order Runge-Kutta on the adjoint Lindbladian, matrix-free, with leakage-driven natural basis growth and post-step drop_tol prune. New compute_action_sum helper. Doc flags the dt < 2.78/||L*|| stability boundary with explicit warning about silent norm-conserving / observable-diverging failure mode. - leakage_with_prune: streaming Cauchy-Schwarz remaining-budget prune. Exact (drops only candidates whose true sum cannot exceed tau_add). Uses new max_action_coef per-operator bound. - pc_step_inner: collapse coeffs_pre + coeffs_pre_padded clones into a single in-place coeffs buffer, saving 2 × n × 8B per step. - ExpmOpts.max_krylov_m: cap on Krylov-Taylor degree m_star. Trades more outer scaling-and-squaring iterations for less Krylov scratch. All four exposed through Python pc_step_arr and rk4_step_arr. Correctness: CSR (new opts) and matrix-free pc_step trajectories are bit-identical (4e-16 max delta over 50 steps at drop=1e-6, L=6).

Roger-luo · 2026-06-11T17:14:44Z

+}
+
+criterion_group!(benches, bench_spmv_scaling);
+criterion_main!(benches);


can we post in the PR comment about the result of this benchmark for future reference?

Obsolete: the spmv_scaling benchmark has been removed -- it benchmarked the now-deleted real-CSR SpMV. The matrix-exponential action is now provided by the external quspin-expm crate.

Roger-luo · 2026-06-11T17:16:05Z

+    for (i, yi) in y.iter_mut().enumerate() {
+        let mut sum = 0.0;
+        for k in indptr[i]..indptr[i + 1] {
+            sum += data[k] * x[indices[k] as usize];
+        }
+        *yi = sum;
+    }


can we verify if this gets correctly vectorized after compilation?

Obsolete: the hand-rolled scaling-and-squaring inner loop has been removed; the matrix-exponential action is now provided by the external quspin-expm crate.

Roger-luo · 2026-06-11T17:18:50Z

+// mimalloc returns freed pages to the kernel more aggressively than the
+// default system allocator. This materially reduces peak RSS for the
+// allocation-heavy adaptive-Pauli paths (leakage + generator each
+// allocate hundreds of MB of transient Vec data per pc_step).
+#[global_allocator]
+static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;


This looks suspicious. I think we need more explanation of why we are adding this custom allocator and how much improvement this gives compared to not having it.

Ideally, the initial draft PR should not contain this type of optimization; we should add that in a separate PR. I would decline to merge this draft PR as a single unit.

It would be better to start with the basic setup and have that reviewed first, then submit a separate PR focusing on the different optimizations. This avoids forcing the reviewer to go through a giant 10,000-line PR.

Addressed: the global allocator has been removed from this PR (f14edaf) and moved to its own PR -- #129 -- so it can be reviewed and benchmarked independently, per your suggestion.

Roger-luo · 2026-06-11T17:19:56Z

@@ -0,0 +1,400 @@
+// SPDX-FileCopyrightText: 2026 The PPVM Authors


If you're adding this as an internal implementation, it would be nice to add this as a separate crate instead of within ppvm-liblblod. I imagine this will be useful for other Rust developers as well, and it will also help clean up the dependency a little bit.

It would be nice to first submit a separate PR that creates a single crate that implements sparse matrix and exponential.

Addressed: rather than a first-party crate, the matrix-exponential is now provided by the external quspin-expm crate (QuSpin-rust). expm.rs is reduced to just the (m, s) selection table, so the dependency surface is much smaller.

Roger-luo · 2026-06-11T17:20:59Z

+/// `[u64; 2]` covers up to 128 qubits; the `FxBuildHasher` matches the
+/// hash used by the `FxHashMap` keys we wrap with; `REHASH=true` means
+/// `set()` keeps the cached hash in sync.
+pub type Word = PauliWord<[u64; W_U64], FxBuildHasher, true>;


Is there a reason why our implementation has a hard-coded number of qubits here?

Addressed in b07f75d: added docs explaining the fixed-width [u64; W_U64] Pauli-word storage and MAX_QUBITS = 64 * W_U64 -- chosen for cache-friendly O(1) Pauli ops; construction beyond it errors with Error::TooManyQubits.

Roger-luo · 2026-06-11T17:21:25Z

+pub type Word = PauliWord<[u64; W_U64], FxBuildHasher, true>;
+
+/// Errors raised when constructing a [`LindbladSpec`].
+#[derive(Debug, Clone)]


Let's put all the error handling in a separate file, so we have a clean lib.rs file.

Addressed in b07f75d: the Error enum and its Display/Error impls moved to crates/ppvm-lindblad/src/error.rs; the public path ppvm_lindblad::Error is unchanged.

Roger-luo · 2026-06-11T17:22:28Z

+/// inner `*mut T` (which is not) — Rust 2021's disjoint-field capture
+/// would otherwise peel the wrapper.
+#[derive(Clone, Copy)]
+struct SendPtr<T>(*mut T);


This is an interesting approach. Does this have an existing Rust standard library functionality that we can reuse?

We do not accept any unsafe code in this crate in general. If we have to use unsafe, it must be separated and isolated in a separate crate.

Resolved: all unsafe has been removed from ppvm-lindblad. The SendPtr parallel-scatter helper is gone along with the CSR generator it served -- evolution is now matrix-free via quspin-expm.

Roger-luo · 2026-06-11T17:23:46Z

+    /// `num_threads`, when set, runs the entire step inside a freshly built
+    /// rayon thread pool of that size — useful for benchmarking parallel
+    /// scaling. When `None`, the global rayon pool is used.
+    #[allow(clippy::too_many_arguments)]


We should not work around Clippy errors like this. If Clippy warns you about too many arguments, this is a code smell of an incorrect abstraction being created here.

Ideally, you would create an object that contains configurations such as:

Drop tolerance

Tau add

Other relevant options

Then, you can feed that object in with the Hamiltonian and Lindblad stuff.

Addressed in b07f75d: introduced PcStepConfig { tau_add, drop_tol, num_threads }; pc_step/pc_step_timed (and their inner helpers) now take it, and the #[allow(clippy::too_many_arguments)] has been removed. The Python-facing kwargs are unchanged -- the binding builds the config internally.

Roger-luo · 2026-06-11T17:23:58Z

+    /// breakdown (microseconds), for profiling parallel scaling and hot
+    /// spots. Output: `(leakage1, expand1, gencsr1, expm1, leakage2,
+    /// expand2, gencsr2, expm2)`.
+    #[allow(clippy::too_many_arguments)]


Addressed in b07f75d: covered by the same PcStepConfig refactor.

…shim Brings the bytemuck-based weight() speedup + early-return for uncapped MaxPauliWeight (and the rest of #126) into the Lindblad branch. Conflicts resolved: keep stim_demo.py (byte-identical to lindblad-shim's stim_msd.py rename); regenerate uv.lock against the merged pyproject. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The expm serial/parallel SpMV parity test built `ExpmOpts { tol, parallel_threshold }` without the `max_krylov_m` field, so `cargo test` failed to compile the ppvm-lindblad test target (pre-existing on lindblad-shim, surfaced when running the full suite after merging #126). The whole Rust workspace test suite now passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ree) Replace the hand-rolled Al-Mohy & Higham scaling-and-squaring engine on the real (f64) `pc_step`/`expm_step` path with the external `quspin-expm` crate, used fully matrix-free: a borrowed `MfOp` implements `quspin_types:: LinearOperator` over the in-basis Lindbladian action (`spmv_matrix_free`), and `ExpmOp::from_parts` supplies the shift mu = tr(M)/n, partition count s, and Taylor order m* directly -- bypassing quspin's adaptive parameter selection so only `dot` runs in the Taylor loop (no estimator, no dot_transpose). mu and the exact column 1-norm of M - mu*I are gathered in one matrix-free pass via `action()`; (m*, s) reuse the existing `select_ms` table. No CSR is materialised for the exponential action (`generator_csr` is kept for the other consumers). The old `expm` engine (expm_multiply / expm_multiply_mf / spmv / Csr) is retained but no longer on the `pc_step` path. The complex orbit-rep path is unaffected here and migrates separately on `symmetry-merging`. Validated: cargo test --workspace --exclude ppvm-python-native (618 passed) and the Python lindblad suite (13 passed, pc_step vs bilinear reference). NOTE: QuSpin-rust currently ships no LICENSE (all-rights-reserved). This pin is a placeholder pending a permissive license being added upstream; the quspin-expm authors are co-authors of the ppvm paper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…path Now that the real pc_step path runs entirely on quspin-expm (matrix-free, via mf_expm), delete the dead hand-rolled engine and every remnant of the CSR-for-real path. This branch has no complex/orbit-rep path, so the whole old engine is dead: * expm.rs: reduced to just `select_ms` + its `THETA` table (the only thing the new engine reuses). Removed the Csr type, csr_from_triplets, csr_one_norm, spmv / spmv_serial / spmv_parallel, ExpmOpts, expm_multiply, expm_multiply_mf, and their tests/imports. * lib.rs: removed LindbladSpec::generator_csr, one_norm_matrix_free, the now-dead SendPtr scatter helper, and the `expm` re-export. Dropped the `opts: ExpmOpts` and `matrix_free: bool` parameters from pc_step, pc_step_inner, pc_step_timed (+ inner) and expm_step — the path is unconditionally matrix-free now. * benches/spmv_scaling.rs: removed (benched the deleted real Csr/SpMV); drop its [[bench]] entry and the now-unused criterion dev-dep. Python bindings + wrapper: drop the expm_tol / parallel_threshold / matrix_free / max_krylov_m kwargs (they only configured the removed engine) from pc_step / pc_step_timed (bindings) and pc_step_arr / pc_step (wrapper); num_threads is retained. Updated the lindblad_pc_scaling demo. Net -750 / +30 lines. Validated: cargo test --workspace --exclude ppvm-python-native (all green, warning-clean) and the Python suite (136 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Maintainer-requested code-quality changes (PR #98): * PcStepConfig (new config.rs): bundle tau_add / drop_tol / num_threads into a config struct passed to pc_step / pc_step_timed (+ inner helpers), replacing the long parameter lists and the `#[allow(clippy::too_many_arguments)]`. The Python-facing kwargs are unchanged — the binding builds the config internally. * error.rs (new): move the `Error` enum + Display/Error impls out of lib.rs; re-exported as `ppvm_lindblad::Error` (public path unchanged). * leakage: merge via Rayon fold/reduce into thread-local maps instead of collecting a Vec<HashMap> and merging sequentially — lower peak memory. Keys and summed values are identical (FP reduction order differs at ~1e-16, well within test tolerances). * docs: explain the fixed-width `[u64; W_U64]` Word storage and MAX_QUBITS = 64 * W_U64. * demo: simplify the redundant position expression to `np.arange(L) - L//2`. Validated: cargo test --workspace --exclude ppvm-python-native (612 passed), cargo clippy clean (no too_many_arguments on the refactored fns), and the full Python suite (136 passed). Python public API unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per PR #98 review (Roger-luo): the global allocator is a peak-RSS optimization that doesn't belong in this feature PR. Removed here and moved to a dedicated PR off `main` so it can be reviewed/benchmarked on its own. Reverts to the system allocator; no behavior change (validated: lindblad Python suite still green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 27, 2026 20:13

Copilot started reviewing on behalf of AlexSchuckert May 27, 2026 20:13 View session

Copilot AI reviewed May 27, 2026

View reviewed changes

Copilot started work on behalf of Roger-luo June 2, 2026 01:18 View session

Copilot AI review requested due to automatic review settings June 2, 2026 01:21

AlexSchuckert removed the request for review from Copilot June 2, 2026 01:21

Copilot finished work on behalf of Roger-luo June 2, 2026 01:22

Copilot AI requested a review from Roger-luo June 2, 2026 01:22

Roger-luo requested changes Jun 2, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings June 2, 2026 11:54

Copilot started reviewing on behalf of AlexSchuckert June 2, 2026 11:55 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

Comment thread ppvm-python/demo/lindblad_adaptive.py Outdated

AlexSchuckert and others added 5 commits June 2, 2026 12:58

Fix SPDX header formatting in lindblad.py

5e70242

AlexSchuckert force-pushed the lindblad-shim branch from 8c277a4 to 71caf7e Compare June 2, 2026 12:00

Copilot AI review requested due to automatic review settings June 2, 2026 12:07

AlexSchuckert force-pushed the lindblad-shim branch from 71caf7e to e54b88c Compare June 2, 2026 12:07

Copilot started reviewing on behalf of AlexSchuckert June 2, 2026 12:07 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

AlexSchuckert marked this pull request as draft June 3, 2026 09:35

AlexSchuckert and others added 6 commits June 3, 2026 17:19

Merge remote-tracking branch 'origin/main' into lindblad-pr-fixes

dfc3204

Roger-luo reviewed Jun 11, 2026

View reviewed changes

david-pl and others added 13 commits June 17, 2026 10:24

Update Julia version and make sure benchmarks are consistent

a8e4dcc

Merge branch 'main' into david/update-julia-benches

6dca8a5

More efficient weight calculation

13aaad4

Inline cutoff

1a67b31

Hard-code weight to avoid slicing overhead

1a2b8f8

Add an early return for max_weight == usize::MAX

8ea602f

Remove svg

0072256

AlexSchuckert mentioned this pull request Jun 17, 2026

perf(ppvm-python-native): use mimalloc as the global allocator #129

Closed


		L_op.clear_cache() # action cache is per-k since each k starts a fresh basis

		for nt in range(c.steps + 1):

		@@ -0,0 +1,400 @@
		// SPDX-FileCopyrightText: 2026 The PPVM Authors

Uh oh!

Conversation

AlexSchuckert commented May 27, 2026

Summary

LindbladSpec (Rust, crates/ppvm-python-native/src/lindblad.rs)

ppvm.Lindbladian (Python wrapper, ppvm-python/src/ppvm/lindblad.py)

Demo (ppvm-python/demo/lindblad_adaptive.py)

Packaging

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-06-02 01:51 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

Roger-luo commented Jun 2, 2026

Uh oh!

Roger-luo commented Jun 2, 2026

Uh oh!

Copilot AI commented Jun 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

AlexSchuckert commented Jun 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

`LindbladSpec` (Rust, `crates/ppvm-python-native/src/lindblad.rs`)

`ppvm.Lindbladian` (Python wrapper, `ppvm-python/src/ppvm/lindblad.py`)

Demo (`ppvm-python/demo/lindblad_adaptive.py`)

github-actions Bot commented May 27, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-02 01:51 UTC.
Preview will be ready when the GitHub Pages deployment is complete.