Stabilize performance test SLAs with tolerance thresholds #16371
aviralgarg05 wants to merge 1 commit into knative:main
Fixes knative#14793

This change adds tolerance thresholds to SLA checks that previously used strict equality for request counts/rates. Due to timing variations in test execution, actual counts can vary slightly from expected values, causing false SLA failures.

Changes:
- scale-from-zero SLA 3: Allow 1 request tolerance
- dataplane-probe SLA 2: Allow 0.1% tolerance (min 1 request)
- rollout-probe SLA 2: Allow 1% rate tolerance

This approach follows the pattern already established in load-test.

Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main   #16371      +/-   ##
==========================================
+ Coverage   80.18%   80.21%   +0.02%
==========================================
  Files         216      216
  Lines       13440    13440
==========================================
+ Hits        10777    10781       +4
+ Misses       2298     2296       -2
+ Partials      365      363       -2
Fixes #14793
Context
This change adds tolerance thresholds to SLA checks that previously used strict equality for request counts/rates. Due to timing variations in test execution, actual counts can vary slightly from expected values, causing false SLA failures.
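As a rough illustration of how a tolerance check differs from strict equality, the Go sketch below covers the three kinds of tolerance named in this PR: an absolute tolerance of 1 request, a 0.1% tolerance with a minimum of 1 request, and a 1% rate tolerance. The helper names (withinCount, countTolerance, withinRate) and the sample numbers are hypothetical and are not taken from the changed files; this is a sketch of the idea, not the code in the commit.

```go
package main

import (
	"fmt"
	"math"
)

// withinCount reports whether got differs from want by at most tol requests.
// Hypothetical helper; the subtraction is ordered to avoid uint64 underflow.
func withinCount(got, want, tol uint64) bool {
	var diff uint64
	if got > want {
		diff = got - want
	} else {
		diff = want - got
	}
	return diff <= tol
}

// countTolerance returns fraction*want truncated to a whole request count,
// but never less than 1, so small expected counts still get some slack.
// Hypothetical helper.
func countTolerance(want uint64, fraction float64) uint64 {
	tol := uint64(float64(want) * fraction)
	if tol < 1 {
		tol = 1
	}
	return tol
}

// withinRate reports whether got is within fraction*want of want.
// Hypothetical helper using an absolute-difference comparison.
func withinRate(got, want, fraction float64) bool {
	return math.Abs(got-want) <= want*fraction
}

func main() {
	// Absolute tolerance: 24 requests observed when 25 were expected.
	fmt.Println(24 == 25)               // strict equality: false
	fmt.Println(withinCount(24, 25, 1)) // tolerance of 1 request: true

	// 0.1% tolerance with a minimum of 1 request.
	fmt.Println(countTolerance(100, 0.001))   // 1 (minimum applies)
	fmt.Println(countTolerance(10000, 0.001)) // 10

	// 1% rate tolerance around an expected rate of 100 requests/s.
	fmt.Println(withinRate(99.5, 100, 0.01)) // true
	fmt.Println(withinRate(98.0, 100, 0.01)) // false
}
```

The minimum of 1 request matters for small expected counts, where 0.1% would otherwise truncate to zero and the check would collapse back into strict equality.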
As noted in the PR #14429 discussion, the SLA failures in performance tests were identified as a separate issue to address. The pattern for handling this already exists in load-test/main.go, which uses a 0.1% threshold.
Proposed Changes
- scale-from-zero SLA 3: Changed from strict equality (results.Requests == uint64(parallel)) to allow 1 request tolerance. This handles cases like "24 requests when 25 expected" due to timing.
- dataplane-probe SLA 2: Changed from strict equality to 0.1% tolerance (minimum 1 request), matching the pattern in load-test.
- rollout-probe SLA 2: Changed from math.Round(results.Rate) == rate.Rate(time.Second) to allow 1% rate tolerance using math.Abs(results.Rate-expectedRate) <= tolerance.
Files Changed
- test/performance/benchmarks/scale-from-zero/main.go
- test/performance/benchmarks/dataplane-probe/main.go
- test/performance/benchmarks/rollout-probe/main.go
Verification
- go build ./test/performance/... - Passed
- go vet ./test/performance/benchmarks/... - Passed
- gofmt -s -d - No formatting issues
Release Note