
Enable PR-level performance quality gates#5571

Merged
igoragoli merged 21 commits into master from augusto/enable-perf-quality-gates on Apr 14, 2026

Conversation

igoragoli (Contributor) commented Apr 9, 2026

What does this PR do?

Adds a PR-level performance quality gate for microbenchmarks: `microbenchmarks-check-big-regressions`.

The job runs after the microbenchmarks complete and fails if any benchmark regresses by more than 20%, using `bp-runner bp-runner.fail-on-regression.yml --debug`.
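From the description and the commit messages, the gate job presumably looks something like this sketch in `.gitlab/benchmarks.yml` (the stage name and the `needs:` wiring are assumptions, not the PR's actual file):

```yaml
# Hypothetical sketch of the gate job; only the job name, the script
# command, and the "when: always" behavior come from this PR.
microbenchmarks-check-big-regressions:
  stage: benchmarks-gates        # assumed stage name
  needs:
    - job: microbenchmarks       # assumed wiring to the benchmark job
      artifacts: true
  script:
    - bp-runner bp-runner.fail-on-regression.yml --debug
  rules:
    - when: always               # run even if microbenchmarks failed
```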

Motivation:

APMSP-2545 Setup pre-release and PR level quality gates for Ruby

Change log entry

None.

Additional Notes:

Can we bypass this?
Yes. I added a comment with directions on how to do it:

```yaml
# Verify that the microbenchmarks-check-big-regressions CI job has passed. If any regression happened, merging this PR will be blocked.
# If bypassing is necessary, see https://datadoghq.atlassian.net/wiki/x/8YFzMwE for more details.
microbenchmarks-check-big-regressions:
```

Why 20%?

  • This is a default limit for regressions on a single PR.
  • We could have fixed limits via SLOs, like dd-trace-py does.
  • We could also shrink this regression threshold to make it more ambitious, but I'd rather first make sure the benchmarking jobs are up and running correctly and fast, and make the threshold more aggressive later.
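The gate's core check can be sketched in Ruby. This is a paraphrase of what a 20% regression limit means, not bp-runner's actual `fail_on_regression` implementation (the method names and the relative-change formula are assumptions):

```ruby
# Hypothetical sketch of a 20% regression gate. bp-runner's real logic
# (statistical handling, per-metric config) may differ.
REGRESSION_THRESHOLD = 0.20 # fail if a benchmark slows down by more than 20%

# Relative change of the candidate's execution time vs. the baseline's.
def relative_regression(baseline, candidate)
  (candidate - baseline) / baseline.to_f
end

def big_regression?(baseline, candidate)
  relative_regression(baseline, candidate) > REGRESSION_THRESHOLD
end

puts big_regression?(100.0, 115.0) # 15% slower: within the gate => false
puts big_regression?(100.0, 125.0) # 25% slower: gate fails      => true
```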

How to test the change?

microbenchmarks-check-big-regressions running after microbenchmarks in CI: https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-rb/-/jobs/1592412230

github-actions bot commented Apr 9, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-04-09 14:50:40 UTC

@igoragoli changed the title from "ci: scaffold macrobenchmark quality gates and auto-trigger benchmarks" to "ci: enable performance quality gates" on Apr 9, 2026
igoragoli (Contributor, Author) commented Apr 9, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@igoragoli added the AI Generated label ("Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos") on Apr 9, 2026
pr-commenter bot commented Apr 9, 2026

Benchmarks

Benchmark execution time: 2026-04-14 12:56:25

Comparing candidate commit 515d6bd in PR branch augusto/enable-perf-quality-gates with baseline commit 3260714 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 1 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.
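The significance rule above can be sketched in Ruby (a paraphrase of the stated rule, not the benchmarking platform's actual code; interval bounds are expressed as fractions, so 1.2% is 0.012):

```ruby
# Sketch of the rule: a change is significant only if the whole
# confidence interval lies outside [-threshold, +threshold].
def significant?(ci_lower, ci_upper, threshold)
  ci_lower > threshold ||  # entirely above +threshold: significantly worse
    ci_upper < -threshold  # entirely below -threshold: significantly better
end

# An interval like [-0.6%, +1.2%] straddles 0%, so it is not
# significant against a 1% threshold:
puts significant?(-0.006, 0.012, 0.01)  # => false

# An interval like [1.3%, 3.1%] clears a 1% threshold entirely,
# so it counts as significantly worse:
puts significant?(0.013, 0.031, 0.01)   # => true
```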

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@igoragoli force-pushed the augusto/enable-perf-quality-gates branch 3 times, most recently from c866a4b to c3caecc, on April 10, 2026 08:31
@igoragoli igoragoli marked this pull request as ready for review April 14, 2026 08:19
@igoragoli igoragoli requested a review from a team as a code owner April 14, 2026 08:19
@igoragoli igoragoli changed the base branch from augusto/add-perf-quality-gate-dd-octo-sts-policy to master April 14, 2026 08:21
Adds a chainguard policy allowing GitLab CI to obtain a short-lived GitHub token with contents:read scope. Used by check-slo-breaches to track SLO threshold changes in git history.

Add macrobenchmarks-gates and macrobenchmarks-notify stages. Include check-slo-breaches and notify-slo-breaches templates from benchmarking-platform-tools. Add placeholder check-slo-breaches job that depends on all 8 macrobenchmark jobs.

Temporarily set macrobenchmarks to auto-trigger on all branches to collect baseline artifacts for SLO threshold generation.

Adds a quality gate that fails on microbenchmark regressions exceeding 20%. Uses bp-runner fail_on_regression step from benchmarking-platform. Runs after microbenchmarks with when: always to catch failures too. Set to allow_failure: true until thresholds are validated.

Replace check-slo-breaches placeholder with real fail_on_breach implementation. Add notify-slo-breaches job to alert on apm-dcs-performance-alerts. Generate 209 SLO thresholds across 42 scenarios using tight strategy (T=5%).

Revert macrobenchmarks to manual trigger on non-master branches.

Move microbenchmarks before macrobenchmarks so macro gates and notify stages are adjacent. Restrict check-slo-breaches and notify-slo-breaches to master only since non-master branches use manual macrobenchmarks.

Drop rules: block from check-slo-breaches and notify-slo-breaches. GitLab ignores top-level when: when rules: is present. Follow dd-trace-py pattern: use when: always with no rules.

Use rules: with when: always on master, default on_success on branches. Remove conflicting top-level when: always which GitLab ignores when rules: is present.
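The `rules:`/`when:` interplay these commit messages describe can be illustrated with a minimal sketch (job name from this PR; the branch condition is illustrative):

```yaml
# A top-level "when:" is ignored once "rules:" is present, so the
# condition has to live inside "rules:" itself (sketch, not the PR's file):
microbenchmarks-check-big-regressions:
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
      when: always        # on master, run even if earlier jobs failed
    - when: on_success    # on other branches, the default behavior
```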
Remove baseline scenarios (not actionable). Keep only:
- normal_operation: agg_http_req_duration p50/p99
- high_load: throughput
- utilization monitors: cpu_usage_percentage, rss

Drop data_received, data_sent, dropped_iterations, http_req_duration. Reduces from 209 to 66 thresholds across 36 scenarios.

Fix macrobenchmarks-notify-slo-breaches referencing wrong job name.

Move when: always into rules for microbenchmarks-check-big-regressions since GitLab ignores top-level when: when rules: is present.

Single-run SLO generation produced a tight RSS threshold (2.73 GB) that doesn't account for cross-run variance. Bump to 3.25 GB based on observed values across multiple runs.

Only PR-level microbenchmark regression checks are needed. Remove check-slo-breaches, notify-slo-breaches, SLO thresholds file, dd-octo-sts policy, and associated stages.
@igoragoli force-pushed the augusto/enable-perf-quality-gates branch from 43a7138 to 5ce82d2 on April 14, 2026 08:25
datadog-official bot commented Apr 14, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 95.35% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 515d6bd | Docs | Datadog PR Page

Remove allow_failure from microbenchmarks-check-big-regressions.
Restore original stage order (macrobenchmarks before microbenchmarks).
@igoragoli added the dev/ci label ("Involves CircleCI, GitHub Actions, or GitLab") on Apr 14, 2026
@igoragoli changed the title from "ci: enable performance quality gates" to "Enable PR-level performance quality gates" on Apr 14, 2026
Comment thread .gitlab/benchmarks.yml Outdated
Store bp-runner.fail-on-regression.yml in the repo instead of cloning
benchmarking-platform at runtime. Drop redundant CI variable re-exports.
Makes the 20% regression threshold visible and configurable in this repo.
@igoragoli igoragoli merged commit 5cd7833 into master Apr 14, 2026
353 checks passed
@igoragoli igoragoli deleted the augusto/enable-perf-quality-gates branch April 14, 2026 13:25
@github-actions github-actions bot added this to the 2.31.0 milestone Apr 14, 2026