bench(tableau): branch-coalesce scaling — sort-merge vs FxHashMap A/B by Roger-luo · Pull Request #156 · QuEraComputing/ppvm

Roger-luo · 2026-06-24T04:54:00Z

Summary

Follow-up study for #154, which replaced the FxHashMap coalesce in the T-gate
hot path (GeneralizedTableau::branch_with_coefficients) with a sort-merge and
measured a ~10× speedup on cultivation_d5. That win was measured on a single
circuit. This bench answers the open question head-on:

Does the sort-merge advantage persist as the branch count m grows, and is
there a regime where the hash coalesce wins again?

Because #154 deleted the hash path from the default build (it survives only behind
rayon), there's no way to A/B the two through the public gate API — so the bench
reimplements both coalesce routines as faithful free functions:

coalesce_sortmerge: verbatim port of the sequential sort-merge in
branch_with_coefficients, keeping both the u64-packed fast path and the
generic (I, u32) fallback.
coalesce_hashmap: the pre-#154 FxHashMap coalesce (mirrors
branch_coefficients_seq).
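
For readers unfamiliar with the two strategies, here is a minimal sketch of what each coalesce does. It is not the bench code: the function names, the (u128, f64) term type, and the use of std's HashMap in place of FxHashMap are all assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Hash coalesce: accumulate the coefficient of every index through the
/// entry API. Assumption: std HashMap stands in for FxHashMap here.
fn coalesce_hash(terms: &[(u128, f64)]) -> Vec<(u128, f64)> {
    let mut acc: HashMap<u128, f64> = HashMap::with_capacity(terms.len());
    for &(idx, c) in terms {
        *acc.entry(idx).or_insert(0.0) += c;
    }
    let mut out: Vec<(u128, f64)> = acc.into_iter().collect();
    // Sorted only so the result can be compared with the sort-merge output.
    out.sort_unstable_by_key(|&(idx, _)| idx);
    out
}

/// Sort-merge coalesce: sort by index, then fold each run of equal indices
/// into a single entry.
fn coalesce_sort_merge(mut terms: Vec<(u128, f64)>) -> Vec<(u128, f64)> {
    terms.sort_unstable_by_key(|&(idx, _)| idx);
    let mut out: Vec<(u128, f64)> = Vec::with_capacity(terms.len());
    for (idx, c) in terms {
        if let Some(last) = out.last_mut() {
            if last.0 == idx {
                last.1 += c; // same index as the previous term: merge
                continue;
            }
        }
        out.push((idx, c)); // first term of a new run
    }
    out
}
```

Both functions return the same index-sorted list for the same input as long as the floating-point sums are exact; with general coefficients the summation order can differ, so an equivalence check would need a tolerance or an order-independent comparison.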

Both consume identical real input: a coefficient vector grown to exactly
m = 2^k by k branching T gates on an 80-qubit u128 tableau, plus the genuine
decomposition of the next T gate. verify_equivalence asserts the two produce the
same coefficient set before any timing, so a drifted port fails loudly.

k branching T gates → exactly m = 2^k branches (T gates touch only the
coefficient vector, never the tableau), so the swept axis is the T-gate count.
40 untruncated branching T gates would be 2^40 ≈ 10^12 branches, out of reach for
any coalesce — so the honest variable is m. Two collision regimes are measured:

doubling — the next T flips a fresh index bit (output 2m, zero merges);
the canonical per-T-gate cost.
merge — the next T flips a bit the set is already closed under (output m,
all collisions); the flavour of the measurement case-a path.
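
The two regimes can be illustrated with a toy model. The sketch below assumes each branch index contributes itself and a copy with one bit flipped, which is a simplification of the real T-gate decomposition; closed_set and distinct_after_flip are hypothetical helpers, not functions from the bench.

```rust
use std::collections::BTreeSet;

/// A set of m = 2^k indices closed under flipping any of bits 0..k:
/// simply every value below 2^k.
fn closed_set(k: u32) -> Vec<u128> {
    (0..(1u128 << k)).collect()
}

/// Toy branching step: every index yields itself and itself with `mask`
/// flipped. Returns the number of distinct output indices.
fn distinct_after_flip(indices: &[u128], mask: u128) -> usize {
    let mut out = BTreeSet::new();
    for &i in indices {
        out.insert(i);
        out.insert(i ^ mask);
    }
    out.len()
}
```

With k = 4 (m = 16), flipping the fresh bit 1 << 4 gives 32 distinct outputs (the doubling regime), while flipping bit 1 << 0, under which the set is already closed, gives 16 (the merge regime, where every produced index collides with an existing one).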

Result

The #154 win persists and grows at scale in the doubling regime; the hash
coalesce wins back the collision-heavy regime.

m	doubling sort-merge speedup	merge sort-merge speedup
4	1.15×	1.00×
256	1.94×	1.18×
2 048	1.41×	0.82× (hash wins)
16 384	1.49×	0.90×
65 536	1.63×	0.57× (hash wins)
262 144	3.41×	0.65×
1 048 576	3.83×	0.83×

speedup = t_hashmap / t_sortmerge (>1 sort-merge wins, <1 hash wins). Medians, 80q / u128 index.
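
As a reading aid for the table, the sketch below turns a speedup value into the faster routine and its factor. The helper names are hypothetical; only the formula speedup = t_hashmap / t_sortmerge comes from the caption above.

```rust
/// speedup = t_hashmap / t_sortmerge, as defined in the table caption.
fn speedup(t_hashmap_ns: f64, t_sortmerge_ns: f64) -> f64 {
    t_hashmap_ns / t_sortmerge_ns
}

/// Name the faster routine and how many times faster it is.
/// A speedup of exactly 1.0 is a tie and is reported as sort-merge at 1.0.
fn winner(speedup: f64) -> (&'static str, f64) {
    if speedup >= 1.0 {
        ("sort-merge", speedup)
    } else {
        ("hash", 1.0 / speedup)
    }
}
```

For example, the merge-regime entry 0.57× at m = 65 536 means the hash coalesce is about 1 / 0.57 ≈ 1.75 times faster there.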

Why. In doubling the 2m output keys are all distinct: the hashmap does 2m
random probes into a 2m-entry table and hits a cache cliff once it outgrows L3
(8.4× slower for 4× more work between m=64K→256K), while sort-merge stays
bandwidth-bound and scales linearly — exactly the "gap widens with scale" claim in
#154, confirmed to 3.8×. In merge only m keys are distinct: the table stays
half-size and hot, entry() coalesces-on-insert for free, and sort-merge's
O(m log m) sort becomes pure overhead for an m-size output, so hash wins for
m ≳ 2K.

📈 Scaling plot (left: time vs m, log-log; right: sort-merge speedup with the
crossover line and "hash wins" band) is at benchmarks/branch_coalesce_scaling.png
— rendered locally and attached below; not checked in, per the benchmarks/
convention.

Actionable follow-up

#154 also applied sort-merge to the measurement case-a coalesce in
measure.rs, which is collision-heavy (projection roughly halves the set) — i.e.
the merge regime, where this bench shows the hash coalesce is up to ~1.75× faster
at large m. Worth checking whether case-a should keep (or revert to) the hash
coalesce; the harness extends naturally to model that path directly.

Reproduce

cargo bench -p ppvm-tableau --bench branch-coalesce-scaling   # PPVM_BRANCH_MAX_EXP=22 to push higher
uv run --with matplotlib python benchmarks/plot_branch_coalesce.py \
  --out benchmarks/branch_coalesce_scaling.png

Files

crates/ppvm-tableau/benches/branch-coalesce-scaling.rs — the A/B bench.
benchmarks/plot_branch_coalesce.py — renders the plot straight from criterion's estimates.json.
benchmarks/README.md, Cargo.toml — doc section + bench registration.

🤖 Generated with Claude Code

Follow-up study for #154, which replaced the FxHashMap coalesce in the T-gate hot path (branch_with_coefficients) with a sort-merge. This bench answers the open question: does the win persist as the branch count m grows, and where does the hash coalesce win again?

#154 deleted the hash path from the default build, so the bench reimplements both coalesce routines as faithful free functions (the sort-merge keeps both the u64-packed fast path and the generic fallback; the hashmap mirrors branch_coefficients_seq), asserts them equivalent on real input at start-up, then drives both with identical coefficient vectors grown to m = 2^k on an 80-qubit u128 tableau.

Two collision regimes: doubling (fresh bit, output 2m; the per-T-gate cost) and merge (closed set, output m; the measurement case-a flavour).

Findings (medians, 80q/u128):
- doubling: sort-merge wins throughout, gap widens 1.1x (m=4) -> 3.8x (m=2^20) as the hash table outgrows L3 and goes cache-miss-bound.
- merge: the hash coalesce overtakes for m >~ 2K (up to ~1.75x at m=2^16), where dense collisions make the O(m log m) sort pure overhead.

Adds benchmarks/plot_branch_coalesce.py, which renders the scaling plot (time vs m, and sort-merge speedup with the crossover band) straight from criterion's estimates.json. Rendered PNG stays untracked per convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-24T05:03:17Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-24 15:52 UTC

Roger-luo requested a review from david-pl June 24, 2026 05:00

Merge branch 'main' into bench/branch-coalesce-scaling

0f4e2d7

david-pl approved these changes Jun 24, 2026

Roger-luo added 2 commits June 24, 2026 11:42

Merge remote-tracking branch 'origin/main' into bench/branch-coalesce…

bfac97a

…-scaling # Conflicts: # crates/ppvm-tableau/Cargo.toml

Merge remote-tracking branch 'origin/bench/branch-coalesce-scaling' i…

2ede9b3

…nto bench/branch-coalesce-scaling

Roger-luo enabled auto-merge (squash) June 24, 2026 15:45

Roger-luo merged commit f036796 into main Jun 24, 2026
10 of 11 checks passed

Roger-luo deleted the bench/branch-coalesce-scaling branch June 24, 2026 15:52

Roger-luo merged 4 commits into main from bench/branch-coalesce-scaling
