
Adds a --bootstrap histogram to rcp_checker/visualization_scripts/rcp_viewer.py #465

Merged
ShriyaRishab merged 4 commits into mlcommons:master from matthew-frank:matthew-frank/rcp-jackknife on May 27, 2026

Conversation

@matthew-frank
Contributor

No description provided.

matthew-frank and others added 4 commits May 27, 2026 11:38
--jackknife GBS restricts output to the single real (non-interpolated)
RCP at the given global batch size, validating it against the full
measured set (so pruned-out batch sizes are still accepted), and also
prints the benchmark's submission_runs count.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When --jackknife is given, resample the reference convergence runs 1000
times (drawing submission_runs values with replacement), take a trimmed
mean (trim ceil(10%) from each end), and print an ASCII histogram of the
resulting score distribution. Add --seed for reproducible output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
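
The resampling procedure described in the commit message above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code added to rcp_viewer.py: the function names (`trimmed_mean`, `bootstrap_scores`, `ascii_histogram`), the bin count, and the histogram layout are all hypothetical, and the input is assumed to be a plain list of reference convergence values.

```python
import random


def trimmed_mean(values, trim_percent=10):
    """Mean after dropping ceil(trim_percent% of n) values from each end."""
    n = len(values)
    k = (n * trim_percent + 99) // 100  # integer ceil of n * trim_percent / 100
    kept = sorted(values)[k:n - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


def bootstrap_scores(reference_runs, submission_runs, n_resamples=1000, seed=None):
    """Draw submission_runs values with replacement, n_resamples times.

    Each resample is reduced to one score with the trimmed mean.
    A fixed seed makes the output reproducible.
    """
    rng = random.Random(seed)
    return [
        trimmed_mean(rng.choices(reference_runs, k=submission_runs))
        for _ in range(n_resamples)
    ]


def ascii_histogram(scores, bins=10, width=40):
    """Render equal-width bins as rows of '#' (layout is an assumption)."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores match
    counts = [0] * bins
    for s in scores:
        idx = min(int((s - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / bins
        rows.append(f"{left_edge:10.3f} | {'#' * round(width * c / peak)} {c}")
    return "\n".join(rows)
```

Calling `print(ascii_histogram(bootstrap_scores(runs, submission_runs=5, seed=0)))` with the same `runs` and seed prints the same histogram every time, which is the point of a seed option.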
The resampling draws with replacement, which is a bootstrap, not a
jackknife, so name it accurately. Rewrite the flag help to lead with its
real purpose (producing the score histogram) rather than the output
restriction, and increase the resample count from 1000 to 10000 for a
smoother distribution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add two summary lines to --bootstrap output: max_speedup (RCP mean / RCP
min, the largest score ratio achievable from lucky-fast convergence) and
P(score < min), the measured fraction of bootstrap scores falling below
the RCP min.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
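
As a rough illustration of the two summary lines described in the last commit message, the sketch below computes them from a list of bootstrap scores. It assumes the RCP mean and RCP min are already available as numbers; the function name `bootstrap_summary` and its signature are hypothetical and not taken from rcp_viewer.py.

```python
def bootstrap_summary(scores, rcp_mean, rcp_min):
    """Return (max_speedup, p_below_min) for a list of bootstrap scores.

    max_speedup is RCP mean / RCP min.
    p_below_min is the measured fraction of scores strictly below RCP min.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if rcp_min <= 0:
        raise ValueError("rcp_min must be positive")
    max_speedup = rcp_mean / rcp_min
    p_below_min = sum(1 for s in scores if s < rcp_min) / len(scores)
    return max_speedup, p_below_min
```

For example, with scores `[7.0, 9.0, 10.0, 11.0]`, an RCP mean of 10.0, and an RCP min of 8.0, one of the four scores falls below the min, so the function returns `(1.25, 0.25)`.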
@matthew-frank matthew-frank requested review from a team as code owners May 27, 2026 18:23
@github-actions
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@ShriyaRishab ShriyaRishab left a comment

By sampling from the RCP data, this bootstrap helps show the spread of submission scores implied by the RCPs without actually running an unbounded number of submissions. It does not account for the trimmed Olympic mean, or for the fact that our data is not continuous: we evaluate only at eval intervals rather than at every step, which makes the data discrete.

That said, this visualization can help in understanding the t-test better, so it is approved.

@ShriyaRishab ShriyaRishab merged commit 56e5bcd into mlcommons:master May 27, 2026
2 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 27, 2026