perf(tableau): drop hash coalesce from apply path (bijective relabel)#155

Merged: Roger-luo merged 2 commits into main from perf/apply-path-relabel on Jun 24, 2026

@Roger-luo

Collaborator

Summary

Generalises PR #154's core insight (stop random-probing a hash map where a
sequential pass suffices) to the one hot path #154 left untouched: the apply
coefficient accumulation in ppvm-tableau (compute_coefficients_after_pauli_apply,
reached only via rotate_2 → RXX/RYY/RZZ).

That accumulation relabels every branch by a fixed branch_index = idx ^ stab_anticomm_bits.
XOR by a constant is a bijection, so distinct input indices always map to
distinct branch indices. Unlike the T-gate branch split (which emits two streams
that genuinely collide and therefore need a real coalesce), the apply path emits a
single stream with no possible collisions — the FxHashMap there could never
merge two entries. It was pure overhead: a hash + probe per element plus a table
allocation, to perform what is just a relabel.
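
The bijection claim can be checked in one line. Writing $c$ for stab_anticomm_bits and $f(i) = i \oplus c$ for the relabel (the symbols $c$ and $f$ are introduced here for illustration only and do not appear in the code):

```latex
f(f(i)) = (i \oplus c) \oplus c = i \oplus (c \oplus c) = i \oplus 0 = i
```

So $f$ is its own inverse, and $f(i) = f(j)$ implies $i = f(f(i)) = f(f(j)) = j$: two distinct input indices can never land on the same branch index.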

This PR replaces it with a flat-Vec relabel in one sequential, prefetch-friendly
pass:

  • apply_coefficients_seq builds Vec<(I, Complex)> instead of coalescing into a map.
  • apply_coefficients_parallel drops the post-par_map sequential hash-fold — the
    mapped pairs already have unique keys, so the result is collected straight into a Vec.
  • The FxHashMap import is now #[cfg(feature = "rayon")] (only the rayon branch
    helpers still coalesce).

The bijection means the Vec backing is sound exactly where the map was: keys stay unique.
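
As a rough illustration of the swap described above, the sketch below contrasts a map-based coalesce with a flat relabel. It is not the code from data.rs: the function names are hypothetical, `f64` stands in for the complex coefficient type, `u64` for the index type `I`, and std's `HashMap` for `FxHashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the old path: relabel each entry and
/// coalesce into a hash map (one hash + probe per element).
fn relabel_via_map(entries: &[(u64, f64)], mask: u64) -> HashMap<u64, f64> {
    let mut out = HashMap::with_capacity(entries.len());
    for &(idx, coeff) in entries {
        *out.entry(idx ^ mask).or_insert(0.0) += coeff;
    }
    out
}

/// Hypothetical stand-in for the new path: one sequential pass that
/// relabels in place of hashing. Sound only because the input indices
/// are distinct and XOR by a constant keeps them distinct.
fn relabel_flat(entries: &[(u64, f64)], mask: u64) -> Vec<(u64, f64)> {
    entries
        .iter()
        .map(|&(idx, coeff)| (idx ^ mask, coeff))
        .collect()
}
```

With distinct input indices, both functions hold the same key/coefficient pairs; the flat version simply skips the hashing and keeps the input order.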

Correctness — bit-identical

A new differential test, tests/apply_path.rs, FNV-1a-digests the full measurement
record of a branchy RXX/RYY/RZZ brickwork over 256 seeds × 8 qubits. The
digest (0x2401e08e70e6ecc8) is identical to the hash-coalesce path on both
the default and rayon builds: the golden value was captured on origin/main before
the change and is asserted by the test.
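
For readers unfamiliar with the digest, here is a minimal sketch of 64-bit FNV-1a using the standard offset basis and prime. How tests/apply_path.rs actually serialises the measurement record is not shown in this PR text, so `digest_record` and its one-byte-per-outcome encoding are assumptions made for illustration.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a state: XOR the byte in first,
/// then multiply by the prime (wrapping modulo 2^64).
fn fnv1a_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical helper: digest a measurement record, encoding each
/// outcome as a single 0/1 byte (an assumed encoding).
fn digest_record(outcomes: &[bool]) -> u64 {
    let bytes: Vec<u8> = outcomes.iter().map(|&bit| bit as u8).collect();
    fnv1a_update(FNV_OFFSET_BASIS, &bytes)
}
```

Because the digest depends on the order of the bytes, comparing one golden value is enough to catch any change in the measurement record.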

  • cargo nextest run --workspace: 857/857 pass (default), 503/503 with rayon,
    including the independent ppvm-tableau-sum::sampler_vs_pure state-vector reference
    checks for rxx/ryy/rzz/u3.
  • cargo fmt --check, cargo check --workspace --all-targets, and
    cargo clippy --workspace -- -D warnings are all clean — no #[allow], no unsafe.

Measured result

The headline stim-circuits bench is T-gate only and never reaches rotate_2, so this
path was previously unbenchmarked. New benches/rot2-apply.rs exercises it directly
(branchy two-qubit-rotation brickwork). Before/after by swapping only data.rs:

| workload | main (hash coalesce) | this PR (relabel) | change |
| --- | --- | --- | --- |
| n8, m≈256 | 581 µs | 545 µs | −5.9% |
| n10, m≈1024 | 3.16 ms | 2.94 ms | −7.0% |
| n12, m≈4096 | 11.39 ms | 10.97 ms | −3.8% |

All statistically significant (criterion p < 0.05, "Performance has improved"). The
numbers are whole-brickwork wall time, where the apply path is only one component of each
rotate_2 (clone + two applies + a separate coalesce), so a single-digit end-to-end win
is the honest read for removing one of those components outright.

Follow-up

rotate_2 itself (gates/rot2.rs) still coalesces old + branch through a
capacity-less HashMap::new(). That is a genuine two-stream collision, so it's a
direct sort-merge candidate (PR #154's pattern) and the next obvious win on this path.
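
To make the sort-merge idea concrete, a minimal sketch follows. It is not the code from PR #154 or gates/rot2.rs: the function name is hypothetical, `f64` stands in for the complex coefficient type, and both inputs are assumed to be sorted by key with unique keys. Since XOR by a constant does not preserve ordering, a relabelled branch stream would generally need a sort before a merge like this.

```rust
use std::cmp::Ordering;

/// Merge two key-sorted streams into one key-sorted stream, summing
/// coefficients wherever the same key appears in both inputs.
/// Assumes each input is sorted by key and has no duplicate keys.
fn sort_merge_coalesce(a: &[(u64, f64)], b: &[(u64, f64)]) -> Vec<(u64, f64)> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => {
                out.push(a[i]);
                i += 1;
            }
            Ordering::Greater => {
                out.push(b[j]);
                j += 1;
            }
            Ordering::Equal => {
                // Genuine collision between the two streams: coalesce.
                out.push((a[i].0, a[i].1 + b[j].1));
                i += 1;
                j += 1;
            }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}
```

The merge reads both inputs front to back and writes the output sequentially, which is the access pattern the hash map lacks.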

🤖 Generated with Claude Code

The apply coefficient accumulation (`compute_coefficients_after_pauli_apply`,
reached only via `rotate_2` → RXX/RYY/RZZ) relabels every branch by a fixed
`idx ^ stab_anticomm_bits`. XOR by a constant is a bijection, so distinct input
indices always produce distinct branch indices — unlike the T-gate branch split,
the apply path emits a single stream with no possible collisions. The
`FxHashMap` coalesce there could therefore never merge two entries: it was pure
overhead (a hash + probe per element plus a table allocation) for what is a
straight relabel.

Replace it with a flat-`Vec` relabel in one sequential pass (`apply_coefficients_seq`),
and drop the post-`par_map` sequential hash-fold in `apply_coefficients_parallel`
(the mapped pairs already have unique keys). Generalises PR #154's "stop random
probing a map" insight to the sibling path #154 left untouched.

Correctness: a new differential test (`tests/apply_path.rs`) FNV-1a-digests the
full measurement record of a branchy RXX/RYY/RZZ brickwork over 256 seeds; the
digest is bit-identical to the hash-coalesce path on both default and `rayon`
builds. `cargo nextest run --workspace` is 857/857, including the independent
`sampler_vs_pure` state-vector reference checks.

A new `benches/rot2-apply.rs` exercises the path (the headline `stim-circuits`
bench is T-gate only and never reaches `rotate_2`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Roger-luo Roger-luo requested a review from david-pl June 24, 2026 04:32

@david-pl david-pl left a comment


LGTM

@Roger-luo Roger-luo enabled auto-merge (squash) June 24, 2026 14:43
@Roger-luo Roger-luo merged commit 249a56a into main Jun 24, 2026
13 checks passed
@Roger-luo Roger-luo deleted the perf/apply-path-relabel branch June 24, 2026 14:50