perf(ppvm-python-native): use mimalloc as the global allocator #129

Closed
AlexSchuckert wants to merge 1 commit into main from perf/mimalloc-allocator
@AlexSchuckert

Collaborator

What

Sets mimalloc as the #[global_allocator] for the ppvm_python_native extension module.

Why

mimalloc returns freed pages to the OS more aggressively than the default system allocator. This materially reduces peak RSS on the allocation-heavy Pauli-propagation paths, where each gate / truncation step churns large transient Vec / HashMap buffers that the system allocator tends to retain.

Split out of #98 per review feedback (@Roger-luo): the allocator is an independent optimization and shouldn't be bundled into a feature PR — this lets it be reviewed and benchmarked on its own.

Changes

  • crates/ppvm-python-native/Cargo.toml: add mimalloc = { version = "0.1", default-features = false }.
  • crates/ppvm-python-native/src/lib.rs: declare the global allocator (with a justifying comment).

Validation

  • maturin develop --release builds and links cleanly.
  • pytest ppvm-python/test/test_basics.py green (smoke).

No API or behavior change — allocator only.

🤖 Generated with Claude Code

mimalloc returns freed pages to the OS more aggressively than the default
system allocator, materially reducing peak RSS on the allocation-heavy
Pauli-propagation paths (each gate / truncation step churns large transient
Vec / HashMap buffers). Set as the `#[global_allocator]` for the Python
extension module.

Split out of #98 per review feedback so the allocator change can be
reviewed and benchmarked on its own rather than bundled into a feature PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 17, 2026 17:00

Copilot AI left a comment

Pull request overview

This PR configures the ppvm-python-native (PyO3) extension module to use mimalloc as the Rust global allocator to reduce peak RSS on allocation-heavy Pauli-propagation workloads.

Changes:

  • Add mimalloc as a dependency of crates/ppvm-python-native.
  • Declare mimalloc::MiMalloc as the #[global_allocator] in the extension crate.
  • Update Cargo.lock to include mimalloc / libmimalloc-sys.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| crates/ppvm-python-native/src/lib.rs | Sets mimalloc as the global allocator for the extension module. |
| crates/ppvm-python-native/Cargo.toml | Adds the mimalloc dependency needed by the allocator change. |
| Cargo.lock | Locks mimalloc and its transitive libmimalloc-sys dependency. |

Comment on lines 11 to 14
[dependencies]
bnum = "0.13.0"
mimalloc = { version = "0.1", default-features = false }
num = "0.4.3"
Comment on lines +4 to +9
// Use mimalloc as the global allocator. It returns freed pages to the OS
// more aggressively than the default system allocator, which materially
// reduces peak RSS on the allocation-heavy Pauli-propagation paths — each
// gate / truncation step churns large transient `Vec` / `HashMap` buffers.
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
@github-actions

github-actions Bot commented Jun 17, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-22 04:51 UTC

Collaborator

Ran an allocator A/B for this PR locally on Linux x86_64.

Setup:

  • Compared PR base ea75072a against PR head 4dd7925.
  • CPython 3.12.3.
  • Release build of ppvm-python-native with RUSTFLAGS='-C target-feature=+aes,+sse2 -L native=/usr/lib/python3.12/config-3.12-x86_64-linux-gnu'.
  • Imported the built ppvm_python_native extension directly and exercised the native PauliSumIndexMapFxHash* classes. The PR .so contains mimalloc strings; the baseline .so does not, so the comparison did exercise the allocator change.
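
The "contains mimalloc strings" check in the last bullet can be approximated with a plain byte search over the shared object. The sketch below is only an illustration: the comment does not say which tool was used, and the helper name `contains_bytes` and the command-line handling are hypothetical.

```rust
use std::{env, fs, process};

/// Return true if `needle` occurs anywhere in `haystack`.
/// An empty needle is treated as always present.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    if needle.is_empty() {
        return true;
    }
    // `windows` yields every contiguous sub-slice of the given length.
    haystack.windows(needle.len()).any(|window| window == needle)
}

fn main() {
    // Hypothetical usage: pass the path of the built extension module.
    let path = match env::args().nth(1) {
        Some(path) => path,
        None => {
            eprintln!("usage: check_allocator <path-to-shared-object>");
            process::exit(2);
        }
    };
    let bytes = match fs::read(&path) {
        Ok(bytes) => bytes,
        Err(err) => {
            eprintln!("could not read {path}: {err}");
            process::exit(1);
        }
    };
    println!(
        "{path}: mimalloc strings present = {}",
        contains_bytes(&bytes, b"mimalloc")
    );
}
```

A positive match only shows that mimalloc was linked into the binary; it does not by itself prove that allocations are routed through it.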

Results:

| Workload | Baseline | PR/mimalloc | Takeaway |
| --- | --- | --- | --- |
| 12q Ising/Trotter, 200 rounds, 10 warmups, hyperfine mean | 1.9168s ± 0.0529 | 1.9165s ± 0.0637 | no meaningful throughput difference |
| Same workload, RSS | ~19.6 MiB | ~23.4 MiB | mimalloc higher |
| 20q ladder stress, 15 steps | 51.40s, HWM ~112.7 MiB, final RSS ~19.5 MiB | 49.61s, HWM ~172.8 MiB, final RSS ~163.0 MiB | small speedup, much worse RSS |
| 18q ladder stress, 15 steps | 50.87s, HWM ~112.8 MiB, final RSS ~19.5 MiB | 49.62s, HWM ~172.7 MiB, final RSS ~162.8 MiB | same memory pattern |
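
For readers who want to collect comparable numbers: on Linux, the peak ("HWM") and current resident set size of a process are exposed as the `VmHWM` and `VmRSS` fields of `/proc/<pid>/status`, reported in kB (units of 1024 bytes). The comment does not say how its figures were gathered, so the following is a sketch of one possible approach rather than the method used here; the helper names `status_field_kib` and `kib_to_mib` are hypothetical.

```rust
use std::fs;

/// Extract a numeric field such as `VmHWM` or `VmRSS` from the text of a
/// Linux `/proc/<pid>/status` file. Values are in kB (1024-byte units).
fn status_field_kib(status: &str, field: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        // Lines look like "VmRSS:\t   19968 kB".
        let rest = line.strip_prefix(field)?.strip_prefix(':')?;
        rest.split_whitespace().next()?.parse::<u64>().ok()
    })
}

/// Convert the kernel's kB figure to MiB for reporting.
fn kib_to_mib(kib: u64) -> f64 {
    kib as f64 / 1024.0
}

fn main() {
    // Linux-only: on other platforms the read fails and an error is printed.
    match fs::read_to_string("/proc/self/status") {
        Ok(status) => {
            for field in ["VmHWM", "VmRSS"] {
                if let Some(kib) = status_field_kib(&status, field) {
                    println!("{field}: {:.1} MiB", kib_to_mib(kib));
                }
            }
        }
        Err(err) => eprintln!("could not read /proc/self/status: {err}"),
    }
}
```

Sampling `VmRSS` after the workload finishes and `VmHWM` at exit would give the "final RSS" and "HWM" columns respectively; reading the status file of another process works the same way with its PID in place of `self`.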

Conclusion: these Linux results do not support the stated RSS/memory-throughput rationale. Throughput is neutral on the small Trotter workload, and the memory-stress workload retains substantially more resident memory with mimalloc. Closing this PR rather than taking the allocator change as-is.

@Roger-luo Roger-luo closed this Jun 22, 2026