internal(bench): Reduce benchmark variance for tighter CI results#3880

Merged
ntucker merged 3 commits into master from bench-react-reduce-variance
Apr 6, 2026

Conversation

Collaborator

@ntucker ntucker commented Apr 6, 2026

Motivation

Benchmark CI results have shown high within-run variance (4–8% on many scenarios, with some exceeding 8%), making it hard to tell real regressions from noise. The target is typically <2% and never >5%.

Solution

CPU pinning (benchmark-react.yml, benchmark.yml):

  • taskset -c 0,1 pins the benchmark process tree to CPU cores 0–1 on the 4-vCPU GH Actions runner — eliminates L1/L2 cache thrashing from OS core migration, the dominant source of CI noise

Convergent config (scenarios.ts):

  • Warmup iterations: 5/3 → 8/5 (small/large) — more JIT warmup without excessive CI time
  • Min measurement: 5 → 10 — ensures enough samples for tight confidence intervals
  • Max measurement: 50/40 → 60/50 — headroom for noisy scenarios to converge
  • CI convergence target: 4%/6% → 2%/3% — runner keeps collecting until margin is tight
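
For reviewers unfamiliar with the convergent loop, here is a minimal sketch of how these four settings interact. It is not the code in scenarios.ts or runner.ts: the names `ConvergentConfig`, `marginOfErrorPct`, and `runConvergent` are hypothetical, the 95% margin uses a normal approximation (z = 1.96) as an assumption, and it assumes the small/large ordering noted for warmup also applies to the max and target values.

```typescript
// Hypothetical shape; the real config in scenarios.ts may differ.
interface ConvergentConfig {
  warmup: number;          // iterations run and discarded before measuring
  minMeasurement: number;  // never stop before this many samples
  maxMeasurement: number;  // hard cap on samples
  targetMarginPct: number; // stop once the relative margin of error is at or below this
}

interface ConvergentResult {
  samples: number[];
  marginPct: number;
  converged: boolean;
}

// New values for "small" scenarios as listed above (assumed ordering small/large).
const SMALL: ConvergentConfig = {
  warmup: 8,
  minMeasurement: 10,
  maxMeasurement: 60,
  targetMarginPct: 2,
};

// Relative 95% margin of error in percent (assumption: z = 1.96, normal approximation).
function marginOfErrorPct(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return Infinity;
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  if (mean === 0) return Infinity;
  const variance =
    samples.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / (n - 1);
  const stdErr = Math.sqrt(variance / n);
  return ((1.96 * stdErr) / Math.abs(mean)) * 100;
}

function runConvergent(
  sample: () => number,
  cfg: ConvergentConfig,
): ConvergentResult {
  for (let i = 0; i < cfg.warmup; i++) sample(); // warmup results are discarded
  const samples: number[] = [];
  let marginPct = Infinity;
  while (samples.length < cfg.maxMeasurement) {
    samples.push(sample());
    marginPct = marginOfErrorPct(samples);
    if (samples.length >= cfg.minMeasurement && marginPct <= cfg.targetMarginPct) {
      return { samples, marginPct, converged: true };
    }
  }
  return { samples, marginPct, converged: false }; // hit the cap without converging
}
```

In this sketch a perfectly steady scenario stops at the minimum sample count, while a noisy one runs to the cap and is reported as not converged. That is why a tighter target only helps when the underlying noise is low enough to converge within the cap.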

Chromium stability flags (runner.ts):

  • --disable-background-timer-throttling, --disable-renderer-backgrounding, --disable-backgrounding-occluded-windows, --disable-hang-monitor — prevents Chrome from deprioritizing the benchmark tab
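
As a rough illustration, the flags can be kept in one list and merged with any other launch arguments. The names `STABILITY_FLAGS` and `buildChromiumArgs` are hypothetical, and how runner.ts actually launches Chromium is not shown in this description; with Playwright or Puppeteer, such flags are typically passed through the `args` launch option.

```typescript
// The four flags named above. The constant name is hypothetical.
const STABILITY_FLAGS: readonly string[] = [
  "--disable-background-timer-throttling",
  "--disable-renderer-backgrounding",
  "--disable-backgrounding-occluded-windows",
  "--disable-hang-monitor",
];

// Merge the stability flags with caller-supplied flags, dropping exact duplicates
// while preserving first-seen order.
function buildChromiumArgs(extra: readonly string[] = []): string[] {
  return [...STABILITY_FLAGS, ...extra].filter(
    (flag, index, all) => all.indexOf(flag) === index,
  );
}
```

The resulting array would then be handed to whatever launcher the runner uses.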

GC management (runner.ts):

  • Double-GC between scenarios (500ms total pause) — single pass doesn't always collect incremental/weak refs
  • CONVERGENT_GC_INTERVAL 15 → 8 — more frequent GC reduces spike probability during samples
  • Double-GC at each interval within the convergent loop
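
A minimal sketch of the double-GC idea follows. It makes several assumptions: the helper names `isGcPoint` and `doubleGc` are hypothetical, the 500ms total is split into two equal 250ms pauses, and the GC trigger is injected as a callback because this description does not say how runner.ts forces collection (for example a CDP `HeapProfiler.collectGarbage` call, or `gc()` under `--expose-gc`).

```typescript
const CONVERGENT_GC_INTERVAL = 8; // new value from this PR (was 15)

// True when a GC should run before taking the sample at this 0-based index.
// Assumption: the interval is counted in measured samples.
function isGcPoint(
  sampleIndex: number,
  interval: number = CONVERGENT_GC_INTERVAL,
): boolean {
  return sampleIndex > 0 && sampleIndex % interval === 0;
}

// Run two collection passes, pausing after each so finalizers and weak-ref
// cleanup scheduled by the first pass can be picked up by the second.
async function doubleGc(
  collect: () => Promise<void>,
  pauseMs: number = 250, // assumption: 2 x 250ms = 500ms total
): Promise<void> {
  for (let pass = 0; pass < 2; pass++) {
    await collect();
    await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
  }
}
```

Inside a measurement loop this would read roughly as `if (isGcPoint(i)) await doubleGc(collect)`, with the same `doubleGc` call also placed between scenarios.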

CI system tuning (both benchmark-react.yml and benchmark.yml):

  • CPU governor → performance mode (reduces frequency scaling jitter)
  • Swap disabled (prevents random swap-in/out spikes)
  • React benchmark: replaced sleep 10 with polling curl for robust server readiness
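
The workflow does the readiness check with curl in a shell step. The same polling idea, sketched in TypeScript for illustration only, looks like the following; `waitForServer`, its defaults, and the injected `probe` callback are all hypothetical and do not appear in the workflow.

```typescript
// Poll until probe() reports ready, instead of sleeping for a fixed time.
// Returns the number of attempts taken; throws once the timeout would be exceeded.
async function waitForServer(
  probe: () => Promise<boolean>,
  timeoutMs: number = 30000, // assumed default, not taken from the workflow
  intervalMs: number = 500,  // assumed default, not taken from the workflow
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  for (let attempt = 1; ; attempt++) {
    // A rejected probe (e.g. connection refused) counts as "not ready yet".
    const ready = await probe().catch(() => false);
    if (ready) return attempt;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`server not ready after ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a fixed `sleep 10`, polling returns as soon as the server answers and fails loudly if it never does, rather than starting the benchmark against a server that is not up yet.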

Docs — Updated expected variance tables in README and benchmarking rule to reflect new targets (<2% stable, 2–4% moderate, 5–15% volatile).

Open questions

N/A

Tighten convergent config (15/10 warmup, 80/60 max iterations, 2%/3% CI
targets), add Chromium stability flags, double-GC between scenarios with
longer pauses, tune CI system (CPU governor, swap off, robust server wait).

Made-with: Cursor

changeset-bot bot commented Apr 6, 2026

⚠️ No Changeset found

Latest commit: 27fc308

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

vercel bot commented Apr 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs-site Ignored Ignored Preview Apr 6, 2026 1:22pm

Same CPU governor and swap tuning as bench-react for consistent results.

Made-with: Cursor

codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.11%. Comparing base (f57e925) to head (27fc308).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3880   +/-   ##
=======================================
  Coverage   98.11%   98.11%           
=======================================
  Files         153      153           
  Lines        2913     2913           
  Branches      565      565           
=======================================
  Hits         2858     2858           
  Misses         11       11           
  Partials       44       44           

Contributor

@github-actions github-actions bot left a comment

Benchmark

Details
| Benchmark suite | Current: 27fc308 | Previous: f57e925 | Ratio |
|---|---|---|---|
| normalizeLong | 451 ops/sec (±1.90%) | 444 ops/sec (±0.93%) | 0.98 |
| normalizeLong Values | 414 ops/sec (±0.21%) | 416 ops/sec (±0.22%) | 1.00 |
| denormalizeLong | 248 ops/sec (±3.78%) | 301 ops/sec (±3.03%) | 1.21 |
| denormalizeLong Values | 335 ops/sec (±0.48%) | 273 ops/sec (±2.29%) | 0.81 |
| denormalizeLong donotcache | 1016 ops/sec (±0.16%) | 1036 ops/sec (±0.13%) | 1.02 |
| denormalizeLong Values donotcache | 747 ops/sec (±0.26%) | 756 ops/sec (±0.21%) | 1.01 |
| denormalizeShort donotcache 500x | 1569 ops/sec (±0.69%) | 1566 ops/sec (±0.08%) | 1.00 |
| denormalizeShort 500x | 1079 ops/sec (±2.29%) | 856 ops/sec (±2.38%) | 0.79 |
| denormalizeShort 500x withCache | 7304 ops/sec (±0.22%) | 7654 ops/sec (±0.70%) | 1.05 |
| queryShort 500x withCache | 3066 ops/sec (±0.09%) | 3024 ops/sec (±0.27%) | 0.99 |
| buildQueryKey All | 54198 ops/sec (±0.38%) | 53733 ops/sec (±0.65%) | 0.99 |
| query All withCache | 5967 ops/sec (±0.29%) | 6879 ops/sec (±0.11%) | 1.15 |
| denormalizeLong with mixin Entity | 359 ops/sec (±2.58%) | 283 ops/sec (±2.17%) | 0.79 |
| denormalizeLong withCache | 7523 ops/sec (±0.25%) | 7737 ops/sec (±0.23%) | 1.03 |
| denormalizeLong Values withCache | 5101 ops/sec (±0.24%) | 5100 ops/sec (±0.18%) | 1.00 |
| denormalizeLong All withCache | 5757 ops/sec (±0.21%) | 6660 ops/sec (±0.09%) | 1.16 |
| denormalizeLong Query-sorted withCache | 5919 ops/sec (±0.22%) | 6780 ops/sec (±0.58%) | 1.15 |
| denormalizeLongAndShort withEntityCacheOnly | 1756 ops/sec (±0.23%) | 1813 ops/sec (±0.39%) | 1.03 |
| denormalize bidirectional 50 | 6958 ops/sec (±0.24%) | 5882 ops/sec (±1.84%) | 0.85 |
| denormalize bidirectional 50 donotcache | 38994 ops/sec (±1.82%) | 41700 ops/sec (±0.57%) | 1.07 |
| getResponse | 4568 ops/sec (±0.55%) | 4593 ops/sec (±0.91%) | 1.01 |
| getResponse (null) | 10476151 ops/sec (±0.86%) | 9732901 ops/sec (±1.03%) | 0.93 |
| getResponse (clear cache) | 336 ops/sec (±2.48%) | 276 ops/sec (±2.02%) | 0.82 |
| getSmallResponse | 3647 ops/sec (±0.24%) | 3652 ops/sec (±0.09%) | 1.00 |
| getSmallInferredResponse | 2758 ops/sec (±0.17%) | 2690 ops/sec (±0.59%) | 0.98 |
| getResponse Collection | 4575 ops/sec (±0.42%) | 4624 ops/sec (±0.45%) | 1.01 |
| get Collection | 4580 ops/sec (±0.28%) | 4646 ops/sec (±0.23%) | 1.01 |
| get Query-sorted | 4629 ops/sec (±0.33%) | 5261 ops/sec (±0.45%) | 1.14 |
| setLong | 461 ops/sec (±0.16%) | 460 ops/sec (±0.22%) | 1.00 |
| setLongWithMerge | 259 ops/sec (±0.14%) | 259 ops/sec (±0.26%) | 1 |
| setLongWithSimpleMerge | 278 ops/sec (±0.24%) | 270 ops/sec (±0.23%) | 0.97 |
| setSmallResponse 500x | 926 ops/sec (±0.94%) | 934 ops/sec (±0.13%) | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

Benchmark React

Details
| Benchmark suite | Current: 27fc308 | Previous: f57e925 | Ratio |
|---|---|---|---|
| data-client: getlist-100 | 142.86 ops/s (± 4.9%) | 173.93 ops/s (± 4.6%) | 1.22 |
| data-client: getlist-500 | 41.75 ops/s (± 5.9%) | 49.75 ops/s (± 5.7%) | 1.19 |
| data-client: update-entity | 357.14 ops/s (± 10.5%) | 513.16 ops/s (± 5.0%) | 1.44 |
| data-client: update-user | 370.37 ops/s (± 7.4%) | 454.55 ops/s (± 0.0%) | 1.23 |
| data-client: getlist-500-sorted | 43.97 ops/s (± 6.1%) | 53.48 ops/s (± 4.5%) | 1.22 |
| data-client: update-entity-sorted | 303.03 ops/s (± 7.6%) | 384.62 ops/s (± 5.5%) | 1.27 |
| data-client: update-entity-multi-view | 344.83 ops/s (± 6.4%) | 370.37 ops/s (± 6.4%) | 1.07 |
| data-client: list-detail-switch-10 | 7.28 ops/s (± 9.4%) | 12.17 ops/s (± 7.6%) | 1.67 |
| data-client: update-user-10000 | 82.64 ops/s (± 4.5%) | 101.53 ops/s (± 2.4%) | 1.23 |
| data-client: invalidate-and-resolve | 36.23 ops/s (± 4.8%) | 56.82 ops/s (± 1.1%) | 1.57 |
| data-client: unshift-item | 222.22 ops/s (± 3.2%) | 322.58 ops/s (± 4.0%) | 1.45 |
| data-client: delete-item | 289.92 ops/s (± 7.2%) | 434.78 ops/s (± 3.2%) | 1.50 |
| data-client: move-item | 180.19 ops/s (± 6.2%) | 263.16 ops/s (± 5.0%) | 1.46 |

This comment was automatically generated by workflow using github-action-benchmark.

Config tuning alone didn't reduce variance — CI runner noise from CPU
migration and shared-infrastructure scheduling is the dominant factor.
Pin benchmark processes to cores 0,1 via taskset to eliminate L1/L2
cache thrashing from core migration. Moderate warmup/iteration counts
back to reasonable levels since extra iterations can't fix environmental
noise.

Made-with: Cursor
@ntucker ntucker changed the title from "internal(bench-react): Reduce benchmark variance for tighter CI results" to "internal(bench): Reduce benchmark variance for tighter CI results" Apr 6, 2026
@ntucker ntucker merged commit cc330d6 into master Apr 6, 2026
24 checks passed
@ntucker ntucker deleted the bench-react-reduce-variance branch April 6, 2026 13:32