internal(bench): Reduce benchmark variance for tighter CI results #3880
Tighten convergent config (15/10 warmup, 80/60 max iterations, 2%/3% CI targets), add Chromium stability flags, double-GC between scenarios with longer pauses, tune CI system (CPU governor, swap off, robust server wait). Made-with: Cursor
Same CPU governor and swap tuning as bench-react for consistent results. Made-with: Cursor
Codecov Report
✅ All modified and coverable lines are covered by tests.
| Coverage Diff | master | #3880 |
|---|---|---|
| Coverage | 98.11% | 98.11% |
| Files | 153 | 153 |
| Lines | 2913 | 2913 |
| Branches | 565 | 565 |
| Hits | 2858 | 2858 |
| Misses | 11 | 11 |
| Partials | 44 | 44 |
Benchmark
| Benchmark suite | Current: 27fc308 | Previous: f57e925 | Ratio |
|---|---|---|---|
| normalizeLong | 451 ops/sec (±1.90%) | 444 ops/sec (±0.93%) | 0.98 |
| normalizeLong Values | 414 ops/sec (±0.21%) | 416 ops/sec (±0.22%) | 1.00 |
| denormalizeLong | 248 ops/sec (±3.78%) | 301 ops/sec (±3.03%) | 1.21 |
| denormalizeLong Values | 335 ops/sec (±0.48%) | 273 ops/sec (±2.29%) | 0.81 |
| denormalizeLong donotcache | 1016 ops/sec (±0.16%) | 1036 ops/sec (±0.13%) | 1.02 |
| denormalizeLong Values donotcache | 747 ops/sec (±0.26%) | 756 ops/sec (±0.21%) | 1.01 |
| denormalizeShort donotcache 500x | 1569 ops/sec (±0.69%) | 1566 ops/sec (±0.08%) | 1.00 |
| denormalizeShort 500x | 1079 ops/sec (±2.29%) | 856 ops/sec (±2.38%) | 0.79 |
| denormalizeShort 500x withCache | 7304 ops/sec (±0.22%) | 7654 ops/sec (±0.70%) | 1.05 |
| queryShort 500x withCache | 3066 ops/sec (±0.09%) | 3024 ops/sec (±0.27%) | 0.99 |
| buildQueryKey All | 54198 ops/sec (±0.38%) | 53733 ops/sec (±0.65%) | 0.99 |
| query All withCache | 5967 ops/sec (±0.29%) | 6879 ops/sec (±0.11%) | 1.15 |
| denormalizeLong with mixin Entity | 359 ops/sec (±2.58%) | 283 ops/sec (±2.17%) | 0.79 |
| denormalizeLong withCache | 7523 ops/sec (±0.25%) | 7737 ops/sec (±0.23%) | 1.03 |
| denormalizeLong Values withCache | 5101 ops/sec (±0.24%) | 5100 ops/sec (±0.18%) | 1.00 |
| denormalizeLong All withCache | 5757 ops/sec (±0.21%) | 6660 ops/sec (±0.09%) | 1.16 |
| denormalizeLong Query-sorted withCache | 5919 ops/sec (±0.22%) | 6780 ops/sec (±0.58%) | 1.15 |
| denormalizeLongAndShort withEntityCacheOnly | 1756 ops/sec (±0.23%) | 1813 ops/sec (±0.39%) | 1.03 |
| denormalize bidirectional 50 | 6958 ops/sec (±0.24%) | 5882 ops/sec (±1.84%) | 0.85 |
| denormalize bidirectional 50 donotcache | 38994 ops/sec (±1.82%) | 41700 ops/sec (±0.57%) | 1.07 |
| getResponse | 4568 ops/sec (±0.55%) | 4593 ops/sec (±0.91%) | 1.01 |
| getResponse (null) | 10476151 ops/sec (±0.86%) | 9732901 ops/sec (±1.03%) | 0.93 |
| getResponse (clear cache) | 336 ops/sec (±2.48%) | 276 ops/sec (±2.02%) | 0.82 |
| getSmallResponse | 3647 ops/sec (±0.24%) | 3652 ops/sec (±0.09%) | 1.00 |
| getSmallInferredResponse | 2758 ops/sec (±0.17%) | 2690 ops/sec (±0.59%) | 0.98 |
| getResponse Collection | 4575 ops/sec (±0.42%) | 4624 ops/sec (±0.45%) | 1.01 |
| get Collection | 4580 ops/sec (±0.28%) | 4646 ops/sec (±0.23%) | 1.01 |
| get Query-sorted | 4629 ops/sec (±0.33%) | 5261 ops/sec (±0.45%) | 1.14 |
| setLong | 461 ops/sec (±0.16%) | 460 ops/sec (±0.22%) | 1.00 |
| setLongWithMerge | 259 ops/sec (±0.14%) | 259 ops/sec (±0.26%) | 1 |
| setLongWithSimpleMerge | 278 ops/sec (±0.24%) | 270 ops/sec (±0.23%) | 0.97 |
| setSmallResponse 500x | 926 ops/sec (±0.94%) | 934 ops/sec (±0.13%) | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Benchmark React
| Benchmark suite | Current: 27fc308 | Previous: f57e925 | Ratio |
|---|---|---|---|
| data-client: getlist-100 | 142.86 ops/s (± 4.9%) | 173.93 ops/s (± 4.6%) | 1.22 |
| data-client: getlist-500 | 41.75 ops/s (± 5.9%) | 49.75 ops/s (± 5.7%) | 1.19 |
| data-client: update-entity | 357.14 ops/s (± 10.5%) | 513.16 ops/s (± 5.0%) | 1.44 |
| data-client: update-user | 370.37 ops/s (± 7.4%) | 454.55 ops/s (± 0.0%) | 1.23 |
| data-client: getlist-500-sorted | 43.97 ops/s (± 6.1%) | 53.48 ops/s (± 4.5%) | 1.22 |
| data-client: update-entity-sorted | 303.03 ops/s (± 7.6%) | 384.62 ops/s (± 5.5%) | 1.27 |
| data-client: update-entity-multi-view | 344.83 ops/s (± 6.4%) | 370.37 ops/s (± 6.4%) | 1.07 |
| data-client: list-detail-switch-10 | 7.28 ops/s (± 9.4%) | 12.17 ops/s (± 7.6%) | 1.67 |
| data-client: update-user-10000 | 82.64 ops/s (± 4.5%) | 101.53 ops/s (± 2.4%) | 1.23 |
| data-client: invalidate-and-resolve | 36.23 ops/s (± 4.8%) | 56.82 ops/s (± 1.1%) | 1.57 |
| data-client: unshift-item | 222.22 ops/s (± 3.2%) | 322.58 ops/s (± 4.0%) | 1.45 |
| data-client: delete-item | 289.92 ops/s (± 7.2%) | 434.78 ops/s (± 3.2%) | 1.50 |
| data-client: move-item | 180.19 ops/s (± 6.2%) | 263.16 ops/s (± 5.0%) | 1.46 |
This comment was automatically generated by workflow using github-action-benchmark.
Config tuning alone didn't reduce variance — CI runner noise from CPU migration and shared-infrastructure scheduling is the dominant factor. Pin benchmark processes to cores 0,1 via taskset to eliminate L1/L2 cache thrashing from core migration. Moderate warmup/iteration counts back to reasonable levels since extra iterations can't fix environmental noise. Made-with: Cursor
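For illustration, the pinning amounts to prefixing the benchmark command with `taskset -c 0,1`. A minimal sketch of building that argv follows; `pinToCores` is a hypothetical helper and the `yarn start` command is a placeholder, neither is taken from the workflows.

```typescript
// Hypothetical helper: prefix a command with `taskset -c <cores>` so the
// process, and any children it spawns, inherit the CPU affinity mask.
function pinToCores(command: string[], cores: number[]): string[] {
  if (cores.length === 0) {
    throw new Error('at least one core is required');
  }
  return ['taskset', '-c', cores.join(','), ...command];
}

// Placeholder command; the real workflow step may differ.
const pinned = pinToCores(['yarn', 'start'], [0, 1]);
console.log(pinned.join(' '));
```

Because affinity is inherited by child processes, wrapping the top-level command should be enough to keep the whole process tree on the chosen cores.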
Motivation
Benchmark CI results have been showing high within-run variance (4–8% on many scenarios, some exceeding 8%), making it difficult to tell real regressions from noise. The target is under 2% for most scenarios and never above 5%.
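As a rough sketch of how such a variance figure can be checked against these targets, the snippet below computes a relative margin of error from raw samples. It assumes the reported ± value is the half-width of an approximate 95% confidence interval divided by the mean, and uses z = 1.96 rather than a t-value; the function names and the verdict labels are illustrative, not from the benchmark harness.

```typescript
// Relative margin of error (%): half-width of an approximate 95% confidence
// interval divided by the mean.
// Assumption: z = 1.96 (normal approximation); small samples would use a t-value.
function relativeMarginOfError(samples: number[]): number {
  const n = samples.length;
  if (n < 2) throw new Error('need at least two samples');
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  const variance =
    samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  return ((1.96 * standardError) / mean) * 100;
}

type Verdict = 'on-target' | 'tolerable' | 'too-noisy';

// Thresholds follow the stated goal: under 2% commonly, never above 5%.
function judgeVariance(rmePercent: number): Verdict {
  if (rmePercent < 2) return 'on-target';
  if (rmePercent <= 5) return 'tolerable';
  return 'too-noisy';
}
```

Under these thresholds, a scenario reporting ± 4.9% would count as tolerable, while one reporting ± 10.5% would be too noisy to detect small regressions.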
Solution
- CPU pinning (`benchmark-react.yml`, `benchmark.yml`): `taskset -c 0,1` pins the benchmark process tree to CPU cores 0–1 on the 4-vCPU GH Actions runner. This eliminates L1/L2 cache thrashing from OS core migration, the dominant source of CI noise.
- Convergent config (`scenarios.ts`): warmup and iteration counts moderated back to reasonable levels, since extra iterations can't fix environmental noise.
- Chromium stability flags (`runner.ts`): `--disable-background-timer-throttling`, `--disable-renderer-backgrounding`, `--disable-backgrounding-occluded-windows`, `--disable-hang-monitor`. These prevent Chrome from deprioritizing the benchmark tab.
- GC management (`runner.ts`): `CONVERGENT_GC_INTERVAL` 15 → 8. More frequent GC reduces spike probability during samples.
- CI system tuning (both `benchmark-react.yml` and `benchmark.yml`):
  - CPU governor set to `performance` mode (reduces frequency scaling jitter)
  - Swap disabled
  - `sleep 10` replaced with polling `curl` for robust server readiness
- Docs: updated expected variance tables in README and benchmarking rule to reflect new targets (<2% stable, 2–4% moderate, 5–15% volatile).
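The readiness change replaces a fixed sleep with polling. The workflows do this with `curl`; the same idea is sketched here in TypeScript. `waitUntilReady`, `probe`, `attempts`, and `delayMs` are illustrative names and defaults, not values from the workflows.

```typescript
// Poll `probe` until it reports ready or the attempts run out.
// All names and default values here are illustrative assumptions.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  attempts = 30,
  delayMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await probe()) return true;
    } catch {
      // A rejected probe (e.g. connection refused during startup) means
      // "not ready yet", so fall through and retry.
    }
    if (i < attempts - 1) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

A probe could be something like `async () => (await fetch(url)).ok` on Node 18+. Compared with a fixed `sleep 10`, polling starts the benchmark as soon as the server answers and fails explicitly if it never does.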
Open questions
N/A