Enable PR-level performance quality gates #5571
Conversation
Thank you for updating the Change log entry section 👏 (visited at: 2026-04-09 14:50:40 UTC)
Benchmarks (execution time: 2026-04-14 12:56:25): comparing candidate commit 515d6bd in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics; 1 unstable metric.
Force-pushed from c866a4b to c3caecc
Adds a chainguard policy allowing GitLab CI to obtain a short-lived GitHub token with contents:read scope. Used by check-slo-breaches to track SLO threshold changes in git history.
Add macrobenchmarks-gates and macrobenchmarks-notify stages. Include check-slo-breaches and notify-slo-breaches templates from benchmarking-platform-tools. Add placeholder check-slo-breaches job that depends on all 8 macrobenchmark jobs. Temporarily set macrobenchmarks to auto-trigger on all branches to collect baseline artifacts for SLO threshold generation.
Adds a quality gate that fails on microbenchmark regressions exceeding 20%. Uses bp-runner fail_on_regression step from benchmarking-platform. Runs after microbenchmarks with when: always to catch failures too. Set to allow_failure: true until thresholds are validated.
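The 20% gate amounts to a relative comparison of each candidate result against its baseline. A minimal Ruby sketch of that check (illustrative only; the method and variable names are not bp-runner's actual API):

```ruby
# Illustrative only: approximates the relative-regression check a
# fail-on-regression gate performs. Not bp-runner's actual code.
REGRESSION_THRESHOLD = 0.20 # 20%, matching the gate in this PR

# Returns true when the candidate regressed by more than the threshold.
# Assumes lower is better (e.g. execution time per iteration).
def big_regression?(baseline:, candidate:)
  (candidate - baseline) / baseline.to_f > REGRESSION_THRESHOLD
end

big_regression?(baseline: 100.0, candidate: 115.0) # => false (15% slower)
big_regression?(baseline: 100.0, candidate: 130.0) # => true  (30% slower)
```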
Replace check-slo-breaches placeholder with real fail_on_breach implementation. Add notify-slo-breaches job to alert on apm-dcs-performance-alerts. Generate 209 SLO thresholds across 42 scenarios using tight strategy (T=5%). Revert macrobenchmarks to manual trigger on non-master branches.
Move microbenchmarks before macrobenchmarks so macro gates and notify stages are adjacent. Restrict check-slo-breaches and notify-slo-breaches to master only since non-master branches use manual macrobenchmarks.
Drop rules: block from check-slo-breaches and notify-slo-breaches. GitLab ignores top-level when: when rules: is present. Follow dd-trace-py pattern: use when: always with no rules.
Use rules: with when: always on master, default on_success on branches. Remove conflicting top-level when: always which GitLab ignores when rules: is present.
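A sketch of the resulting rules shape, consistent with the description above (job name from this PR; the branch condition uses GitLab's predefined `$CI_COMMIT_BRANCH` variable and is an assumption about the exact expression used):

```yaml
check-slo-breaches:
  # No top-level `when:` -- GitLab ignores it once `rules:` is present.
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
      when: always      # on master, run even if earlier jobs failed
    - when: on_success  # default behavior on other branches
```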
Remove baseline scenarios (not actionable). Keep only:
- normal_operation: agg_http_req_duration p50/p99
- high_load: throughput
- utilization monitors: cpu_usage_percentage, rss

Drop data_received, data_sent, dropped_iterations, http_req_duration. Reduces from 209 to 66 thresholds across 36 scenarios.
Fix macrobenchmarks-notify-slo-breaches referencing wrong job name. Move when: always into rules for microbenchmarks-check-big-regressions since GitLab ignores top-level when: when rules: is present.
Single-run SLO generation produced a tight RSS threshold (2.73 GB) that doesn't account for cross-run variance. Bump to 3.25 GB based on observed values across multiple runs.
Only PR-level microbenchmark regression checks are needed. Remove check-slo-breaches, notify-slo-breaches, SLO thresholds file, dd-octo-sts policy, and associated stages.
Force-pushed from 43a7138 to 5ce82d2
✅ Tests: all green! ❄️ No new flaky tests detected. 🎯 Code coverage: details available. Commit SHA: 515d6bd
Remove allow_failure from microbenchmarks-check-big-regressions. Restore original stage order (macrobenchmarks before microbenchmarks).
Store bp-runner.fail-on-regression.yml in the repo instead of cloning benchmarking-platform at runtime. Drop redundant CI variable re-exports. Makes the 20% regression threshold visible and configurable in this repo.
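Putting the pieces together, the gate job plausibly looks something like this (a sketch only: the stage name and rule are assumptions; the job name and command come from this PR):

```yaml
microbenchmarks-check-big-regressions:
  stage: microbenchmarks            # stage name is an assumption
  rules:
    - when: always                  # run even if microbenchmarks failed
  script:
    # The config file is vendored in this repo, so no runtime clone of
    # benchmarking-platform is needed.
    - bp-runner bp-runner.fail-on-regression.yml --debug
```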

What does this PR do?
Adds a PR-level performance quality gate for microbenchmarks: `microbenchmarks-check-big-regressions`.

The job runs after microbenchmarks complete and fails if any benchmark regresses by more than 20%, using `bp-runner bp-runner.fail-on-regression.yml --debug`.

Motivation:
APMSP-2545 Setup pre-release and PR level quality gates for Ruby
Change log entry
None.
Additional Notes:
Can we bypass this?
Yes. I added a comment with directions on how to do it:
dd-trace-rb/.gitlab/benchmarks.yml, lines 315 to 317 in 940bd7e
What does this command even mean?
`bp-runner` is the tool we use to orchestrate benchmarks and provide common actions across repos (such as this one: comparing candidates against baselines with a regression threshold of 20%).

`bp-runner.fail-on-regression.yml` is the config file which tells `bp-runner` what to do. It's at https://github.com/DataDog/benchmarking-platform/blob/dd-trace-rb/bp-runner.fail-on-regression.yml, but we could alternatively move it to this repo :)

Why 20%?
How to test the change?
`microbenchmarks-check-big-regressions` running after microbenchmarks in CI: https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-rb/-/jobs/1592412230