perf(tableau): drop hash coalesce from apply path (bijective relabel) #155
Merged
The apply coefficient accumulation (`compute_coefficients_after_pauli_apply`, reached only via `rotate_2` → RXX/RYY/RZZ) relabels every branch by a fixed `idx ^ stab_anticomm_bits`. XOR by a constant is a bijection, so distinct input indices always produce distinct branch indices. Unlike the T-gate branch split, the apply path emits a single stream with no possible collisions. The `FxHashMap` coalesce there could therefore never merge two entries: it was pure overhead (a hash + probe per element plus a table allocation) for what is a straight relabel.

Replace it with a flat-`Vec` relabel in one sequential pass (`apply_coefficients_seq`), and drop the post-`par_map` sequential hash-fold in `apply_coefficients_parallel` (the mapped pairs already have unique keys). Generalises PR #154's "stop random probing a map" insight to the sibling path #154 left untouched.

Correctness: a new differential test (`tests/apply_path.rs`) FNV-1a-digests the full measurement record of a branchy RXX/RYY/RZZ brickwork over 256 seeds; the digest is bit-identical to the hash-coalesce path on both default and `rayon` builds. `cargo nextest run --workspace` is 857/857, including the independent `sampler_vs_pure` state-vector reference checks. A new `benches/rot2-apply.rs` exercises the path (the headline `stim-circuits` bench is T-gate only and never reaches `rotate_2`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Generalises PR #154's core insight (stop random-probing a hash map where a
sequential pass suffices) to the one hot path #154 left untouched: the apply
coefficient accumulation in `ppvm-tableau` (`compute_coefficients_after_pauli_apply`,
reached only via `rotate_2` → `RXX`/`RYY`/`RZZ`).

That accumulation relabels every branch by a fixed
`branch_index = idx ^ stab_anticomm_bits`. XOR by a constant is a bijection, so
distinct input indices always map to distinct branch indices. Unlike the T-gate
branch split (which emits two streams that genuinely collide and therefore need a
real coalesce), the apply path emits a single stream with no possible collisions,
so the `FxHashMap` there could never merge two entries. It was pure overhead: a
hash + probe per element plus a table allocation, to perform what is just a relabel.
This PR replaces it with a flat-`Vec` relabel in one sequential, prefetch-friendly
pass:

- `apply_coefficients_seq` builds `Vec<(I, Complex)>` instead of coalescing into a map.
- `apply_coefficients_parallel` drops the post-`par_map` sequential hash-fold: the
  mapped pairs already have unique keys, so the result is collected straight into a
  `Vec`.
- `FxHashMap` import is now `#[cfg(feature = "rayon")]` (only the rayon branch
  helpers still coalesce).
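
As a rough illustration of the change described above (not the code in `data.rs`), the sketch below contrasts a map-based coalesce with a flat relabel. The function names `relabel_via_map` and `relabel_flat`, the `f64` coefficient type, and the use of `std::collections::HashMap` in place of `FxHashMap` are all assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the coefficient type (the PR uses a complex number).
type Coeff = f64;

/// Old shape (sketch): coalesce the relabelled branches into a hash map.
fn relabel_via_map(input: &[(u64, Coeff)], mask: u64) -> HashMap<u64, Coeff> {
    let mut out = HashMap::with_capacity(input.len());
    for &(idx, c) in input {
        // The `+=` would only matter if two inputs landed on the same key.
        // XOR by a constant is a bijection, so distinct `idx` never collide.
        *out.entry(idx ^ mask).or_insert(0.0) += c;
    }
    out
}

/// New shape (sketch): one sequential pass into a flat Vec, no hashing.
fn relabel_flat(input: &[(u64, Coeff)], mask: u64) -> Vec<(u64, Coeff)> {
    input.iter().map(|&(idx, c)| (idx ^ mask, c)).collect()
}
```

For any input whose indices are distinct, both functions produce the same set of `(key, coefficient)` pairs, and the map never holds fewer entries than the input. That is the property that makes the hash table redundant here.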
The bijection means the `Vec` backing is sound exactly where the map was: keys
stay unique.

Correctness: bit-identical
A new differential test, `tests/apply_path.rs`, FNV-1a-digests the full measurement
record of a branchy `RXX`/`RYY`/`RZZ` brickwork over 256 seeds × 8 qubits. The
digest (`0x2401e08e70e6ecc8`) is identical to the hash-coalesce path on both the
default and `rayon` builds; the golden value was captured on `origin/main` before
the change and is asserted by the test.
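
A minimal sketch of how such a digest can be computed, assuming the standard 64-bit FNV-1a parameters and one byte per measurement bit. The helper names `fnv1a_update` and `digest_records`, and the `run` closure standing in for a simulator run, are hypothetical; the actual test in `tests/apply_path.rs` may encode the record differently.

```rust
/// Standard 64-bit FNV-1a offset basis and prime.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a hash: XOR the byte in, then multiply.
fn fnv1a_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Digest the measurement records of every seed into a single u64.
/// `run` is a hypothetical stand-in for "simulate the circuit with this seed".
fn digest_records<F>(seeds: std::ops::Range<u64>, mut run: F) -> u64
where
    F: FnMut(u64) -> Vec<bool>,
{
    let mut hash = FNV_OFFSET;
    for seed in seeds {
        for bit in run(seed) {
            // One byte per measurement outcome (an encoding chosen for this sketch).
            hash = fnv1a_update(hash, &[u8::from(bit)]);
        }
    }
    hash
}
```

Because every FNV-1a step is invertible for a fixed input byte, two equal-length records that differ in any single outcome always yield different digests, so comparing one golden `u64` is enough to detect a regression in the record.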
- `cargo nextest run --workspace`: 857/857 pass (default), 503/503 with `rayon`,
  including the independent `ppvm-tableau-sum::sampler_vs_pure` state-vector
  reference checks for `rxx`/`ryy`/`rzz`/`u3`.
- `cargo fmt --check`, `cargo check --workspace --all-targets`, and
  `cargo clippy --workspace -- -D warnings` are all clean: no `#[allow]`, no `unsafe`.

Measured result
The headline `stim-circuits` bench is T-gate only and never reaches `rotate_2`, so
this path was previously unbenchmarked. New `benches/rot2-apply.rs` exercises it
directly (branchy two-qubit-rotation brickwork). Before/after by swapping only
`data.rs`:

[Results table not recoverable from the source; its baseline column was `main` (hash coalesce).]

All statistically significant (criterion `p < 0.05`, "Performance has improved").
The numbers are whole-brickwork wall time, where the apply path is only one
component of each `rotate_2` (clone + two applies + a separate coalesce), so a
single-digit end-to-end win is the honest read for removing one of those components
outright.
Follow-up
`rotate_2` itself (`gates/rot2.rs`) still coalesces `old + branch` through a
capacity-less `HashMap::new()`. That is a genuine two-stream collide, so it's a
direct sort-merge candidate (PR #154's pattern) and the next obvious win on this path.
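
For readers unfamiliar with the idea, here is one way a sort-merge coalesce of two streams could look. This is a generic sketch, not PR #154's code: the function name `merge_coalesce` and the `f64` coefficient type are hypothetical, and it assumes each input is already sorted by key with no duplicate keys within a stream.

```rust
use std::cmp::Ordering;

/// Merge two key-sorted streams, summing coefficients where keys collide.
/// Assumes each input is sorted by key and has unique keys (sketch only).
fn merge_coalesce(old: &[(u64, f64)], branch: &[(u64, f64)]) -> Vec<(u64, f64)> {
    let mut out = Vec::with_capacity(old.len() + branch.len());
    let (mut i, mut j) = (0, 0);
    while i < old.len() && j < branch.len() {
        match old[i].0.cmp(&branch[j].0) {
            Ordering::Less => {
                out.push(old[i]);
                i += 1;
            }
            Ordering::Greater => {
                out.push(branch[j]);
                j += 1;
            }
            Ordering::Equal => {
                // Same key in both streams: this is the real collision case.
                out.push((old[i].0, old[i].1 + branch[j].1));
                i += 1;
                j += 1;
            }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&old[i..]);
    out.extend_from_slice(&branch[j..]);
    out
}
```

Compared with inserting both streams into a hash map, this walks each input once in order and writes the output sequentially, at the cost of requiring sorted inputs.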
🤖 Generated with Claude Code