Skip to content

OCPQE-32065: Proposed Implementation for Spot Check Jobs#3612

Open
dgoodwin wants to merge 39 commits into
openshift:mainfrom
dgoodwin:spot-check-jobs
Open

OCPQE-32065: Proposed Implementation for Spot Check Jobs#3612
dgoodwin wants to merge 39 commits into
openshift:mainfrom
dgoodwin:spot-check-jobs

Conversation

@dgoodwin

@dgoodwin dgoodwin commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

From the next-gen QE effort, we are looking to implement spot check jobs, jobs which are much less sensitive, have an owning component/capability, test a specific install configuration or stable feature, once in awhile, to confirm it still works. The goal is for them to run just once a month, be periodically retried if they fail every day or so up to perhaps four attempts, and after roughly 2 failures the component goes regressed in component readiness. One pass is all we require.

This is intended to provide a mechanism to land the many jobs QE has for obscure configurations, and for feature jobs that ran often for GA promotion to continue producing signal without a lot of cost. Examples today would include etcd-scaling job (which has never passed, but this is due to monitortests, spot check jobs will not run failable monitortests: openshift/origin#31212)

This PR adds the mechanism to have them mark as regressions in component readiness.

This attempts something we talked about for a long time, what if component readiness could show signal for other things.

Middleware concept pays off as this stays fairly localized to middleware that gets it's chance to inject the synthetic test results and handle all drilldown/reports.

Jobs must have capabilities/components for this system, I did this via the variant registry.

This replaces the admittedly terrible prior attempt, rarely run jobs: #3562

Implications:

  • This uses synthetic test IDs. The PR proves we can make this work in component readiness, but any downstream tooling that were to try to use that test ID in another system is going to find no results.

Summary by CodeRabbit

  • New Features

    • Added configurable spot-check job analysis windows to component readiness reports (including automatic default spot-check-30d when not explicitly set).
    • Added Spot-Check comparison output and drill-down for underlying spot-check job runs.
    • Introduced additional statistical analysis for regression detection (Fisher’s exact and pass-rate based).
    • Updated the UI to render spot-check mode with spot-check–specific layout and labels.
  • Refactor

    • Migrated per-test analysis from generator logic to a middleware-driven flow, enabling targeted analyses per scenario.
  • Bug Fixes

    • Improved report generation to continue when per-test detail inputs are partially missing; spot-check tests no longer require all baseline variants to be present.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a spot-check jobs feature to component readiness that queries non-JUnit periodic/release jobs grouped by SpotCheckComponent/SpotCheckCapability BigQuery variants. Statistical analysis (Fisher exact, pass-rate) is extracted from the generator into dedicated middleware. A new SpotCheckJobs middleware injects synthetic test results with a run-count–based status heuristic. Frontend panels hide the baseline comparison section for spot-check results.

Changes

Spot-Check Jobs for Component Readiness

Layer / File(s) Summary
API types, constants, and contract definitions
pkg/apis/api/componentreport/crtest/types.go, pkg/apis/api/componentreport/crview/types.go, pkg/apis/api/componentreport/reqopts/types.go, pkg/api/componentreadiness/dataprovider/interface.go, pkg/api/componentreadiness/middleware/interface.go, pkg/bigquery/bqlabel/labels.go
Adds SpotCheck comparison constant, SpotCheckJobSample/SpotCheckJobSampleOpts config and request types, SpotCheckQuerier interface embedded in DataProvider with SpotCheckGroup/JobRunDetail structs, and an Analyze(testKey, testStats) (bool, error) hook on Middleware.
Data provider interface and implementations
pkg/api/componentreadiness/dataprovider/interface.go, pkg/api/componentreadiness/dataprovider/bigquery/provider.go, pkg/api/componentreadiness/dataprovider/postgres/provider.go
Introduces SpotCheckQuerier interface embedded in DataProvider; BigQueryProvider implements QuerySpotCheckJobRuns (aggregated group runs with dynamic variant JOINs and COALESCE component/capability fallback) and QuerySpotCheckJobRunDetails (per-run drill-down with case-insensitive filtering); PostgresProvider adds no-op stubs.
Middleware Analyze hook and dispatch
pkg/api/componentreadiness/middleware/interface.go, pkg/api/componentreadiness/middleware/list.go, pkg/api/componentreadiness/middleware/analysis/passrate.go
Adds Analyze method to Middleware interface for override-based statistical analysis; List.Analyze sequentially dispatches to middlewares with first-responder-wins semantics; shared analysis module provides BuildPassRateTestStats and ExplanationNoRegression constant.
Statistical analysis middleware implementations
pkg/api/componentreadiness/middleware/{alltestspassrate,fisherexact,newtestpassrate}/, pkg/api/componentreadiness/middleware/{linkinjector,regressionallowances,regressiontracker,releasefallback}/, test files
FisherExact implements contingency-table Fisher flow with confidence threshold, pity/minimum-failure triage, and regression messaging; NewTestPassRate and AllTestsPassRate compute pass-rate stats for new and all-test scenarios; existing middlewares receive no-op Analyze stubs; comprehensive tests validate all scenarios.
SpotCheckJobs middleware
pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go, pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs_test.go
Queries BigQuery for aggregated spot-check groups per sample window, filters by requested variants, and publishes synthetic TestStatus entries. Analyze applies run-count heuristic (0/1→MissingSample, 2→SignificantRegression, 3+→ExtremeRegression, any success→NotSignificant). PreTestDetailsAnalysis populates drill-down rows from mutex-protected job-run cache. Tests validate variant filtering, query behavior, and status logic.
Generator: middleware wiring and analysis delegation
pkg/api/componentreadiness/component_report.go, pkg/api/componentreadiness/component_report_test.go
initializeMiddleware conditionally appends SpotCheckJobs when configured, then wires ordered analysis chain (NewTestPassRate→AllTestsPassRate→FisherExact); generateComponentTestReport replaces local assessComponentStatus with c.middlewares.Analyze; removes local analysis functions. Test updates initialize middleware and rename analysis test.
Test details: spot-check validation and middleware dispatch
pkg/api/componentreadiness/test_details.go
Bypasses required-variants validation for spotcheck- test IDs; creates empty TestJobRunStatuses for missing test keys; promotes PreTestDetailsAnalysis to run unconditionally; replaces assessComponentStatus with middlewares.Analyze and logs errors.
Request parsing: resolve spot-check samples
pkg/api/componentreadiness/utils/queryparamparser.go
ParseComponentReportRequest resolves view.SpotCheckJobSamples into concrete release windows using GetViewReleaseOptions("spot_check_<name>", ...); defaults to spotcheck-30d entry when sample release has start/end set.
Cache loader: spot-check sample resolution
pkg/dataloader/regressioncacheloader/regressioncacheloader.go
buildGenerator resolves view.SpotCheckJobSamples entries into RelativeRelease with "spot_check_" type, calls GetViewReleaseOptions with CR rounding, appends resolved SpotCheckJobSampleOpts to reqOpts.SpotCheckJobSamples.
Frontend: conditional rendering for spot-check
sippy-ng/src/component_readiness/TestDetailsReport.js, sippy-ng/src/component_readiness/CompReadyTestPanel.js, sippy-ng/src/component_readiness/CompReadyTestDetailRow.js, sippy-ng/src/component_readiness/CompReadyUtils.js
TestDetailsReport derives isSpotCheck from comparison==='spot_check', suppresses base-override section for spot-checks. CompReadyTestPanel hides "Basis (historical)", expands sample to full width as "Spot-Check Window", omits basis header cells. CompReadyTestDetailRow conditionally omits basis cells and adjusts colSpan. CompReadyUtils injects view parameter into query strings.
Configuration, variant sanitization, and lock file
config/views.yaml, pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go, pkg/bigquery/bqlabel/labels.go, apm.lock.yaml
views.yaml adds spotcheck-30d sample window and JobTier variant to 5.0-main; querygenerators.go cleanses variant keys in generated SQL aliases/joins for consistent naming; bqlabel.go adds CRSpotCheck and CRSpotCheckDetails query labels; apm.lock.yaml updates hash values.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant queryparamparser
  participant Generator
  participant SpotCheckJobsMiddleware
  participant BigQueryProvider
  participant AnalysisMiddlewareChain

  Client->>queryparamparser: ParseComponentReportRequest(view)
  queryparamparser->>queryparamparser: resolve SpotCheckJobSamples → absolute Release windows
  queryparamparser-->>Generator: RequestOptions with SpotCheckJobSamples

  Generator->>Generator: initializeMiddleware()
  Note over Generator: appends SpotCheckJobs if SpotCheckJobSamples non-empty,<br/>then NewTestPassRate→AllTestsPassRate→FisherExact

  Generator->>SpotCheckJobsMiddleware: Query(ctx, wg, allJobVariants, _, sampleStatusCh, errCh)
  SpotCheckJobsMiddleware->>BigQueryProvider: QuerySpotCheckJobRuns(reqOptions, variants, start, end)
  BigQueryProvider-->>SpotCheckJobsMiddleware: []SpotCheckGroup
  SpotCheckJobsMiddleware->>SpotCheckJobsMiddleware: filter by variantsMatch, build synthetic TestStatus
  SpotCheckJobsMiddleware-->>Generator: sampleStatusCh ← map[syntheticTestID]TestStatus

  Generator->>Generator: generateComponentTestReport(testKey, cellReport)
  Generator->>AnalysisMiddlewareChain: Analyze(testKey, &cellReport)
  alt spotcheck synthetic test ID
    AnalysisMiddlewareChain->>SpotCheckJobsMiddleware: Analyze(testKey, testStats)
    SpotCheckJobsMiddleware-->>AnalysisMiddlewareChain: handled=true, ReportStatus=run-count heuristic
  else regular test
    AnalysisMiddlewareChain->>AnalysisMiddlewareChain: NewTestPassRate / AllTestsPassRate / FisherExact
  end
  AnalysisMiddlewareChain-->>Generator: (handled, err)
  Generator-->>Client: ComponentReport with spot-check cells
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • openshift/sippy#3617: Introduces the SpotCheckComponent/SpotCheckCapability variant schema and sets JobTier=spotcheck-30d in the variant registry — the exact variant dimensions that this PR's BigQuery spot-check queries JOIN on and filter by.

Suggested reviewers

  • deads2k
  • sosiouxme
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@dgoodwin dgoodwin changed the title spot check jobs Proposed Implementation for Spot Check Jobs Jun 11, 2026
@openshift-ci openshift-ci Bot requested review from neisw and stbenjam June 11, 2026 17:36
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
sippy-ng/src/component_readiness/CompReadyTestDetailRow.js (1)

1-271: ⚡ Quick win

Run ESLint and Prettier to ensure code consistency.

After making changes to frontend files, run the following commands to maintain code quality:

npx eslint . --fix
npx prettier --write .
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sippy-ng/src/component_readiness/CompReadyTestDetailRow.js` around lines 1 -
271, Run ESLint and Prettier to fix formatting and lint issues introduced in
CompReadyTestDetailRow.js: execute "npx eslint . --fix" and "npx prettier
--write ." then review and stage the resulting changes (focus on formatting and
lint fixes around the CompReadyTestDetailRow component and helper functions like
getJobRunColor, isMassFailure, testJobDetailCell, and getSelectingCheckBox),
resolve any remaining lint errors reported (e.g., unused imports or console
statements), and commit the auto-fixed files.

Source: Coding guidelines

docs/plans/spot-check-jobs.md (1)

80-85: 💤 Low value

Add language specifier to fenced code block.

The code block at line 80 is missing a language identifier. Specify text or leave it as plain markdown to silence the linter.

📝 Proposed fix
+```text
 candidate → spotcheck
               ↑
     add pattern to setSpotCheckComponent in ocp.go
+```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/spot-check-jobs.md` around lines 80 - 85, The fenced code block in
docs/plans/spot-check-jobs.md is missing a language specifier; update the block
around the snippet referencing "candidate → spotcheck" and the note "add pattern
to setSpotCheckComponent in ocp.go" by adding a language tag (e.g., ```text) on
the opening fence so the linter recognizes it as a text block. Ensure the
closing triple backticks remain present and that the content mentioning
setSpotCheckComponent and ocp.go is unchanged.
pkg/api/componentreadiness/dataprovider/bigquery/provider.go (1)

514-542: 💤 Low value

Type assertions on raw BigQuery values can panic if types are unexpected.

The type assertions (e.g., val.(string), val.(int64)) will panic if BigQuery returns a different type than expected. While nil is checked, type mismatches are not handled. Consider using type assertion with ok check, or use a typed struct like QuerySpotCheckJobRunDetails does on lines 627-633.

🛡️ Example defensive pattern
-		if val, ok := rawRow["spot_check_component"]; ok && val != nil {
-			group.Component = val.(string)
+		if val, ok := rawRow["spot_check_component"]; ok && val != nil {
+			if s, ok := val.(string); ok {
+				group.Component = s
+			}
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go` around lines
514 - 542, The current code assigns values from rawRow into a
dataprovider.SpotCheckGroup using unchecked type assertions (e.g., val.(string),
val.(int64), val.([]bigquery.Value), val.(time.Time)) which can panic on
unexpected types; update the parsing in the block that builds group (the
variable group, rawRow map access, the loop over columnGroupByVariants.List())
to perform safe type assertions (v, ok := val.(string) / int64 /
[]bigquery.Value / time.Time) or use type switches and only set fields when
assertions succeed, converting numeric types safely to int and iterating
elements defensively for JobNames; optionally centralize this into a helper like
parseSpotCheckRow(rawRow) to mirror the safer pattern used by
QuerySpotCheckJobRunDetails.
pkg/api/componentreadiness/utils/queryparamparser.go (1)

59-71: 💤 Low value

Consider extracting spot-check resolution into a shared helper.

The spot-check sample resolution logic (constructing spotCheckRelative and calling GetViewReleaseOptions) is duplicated across these two files. Extracting it into a helper function in pkg/api/componentreadiness/utils would reduce duplication and make future changes easier.

♻️ Possible helper function structure
// In pkg/api/componentreadiness/utils/utils.go or similar:

// ResolveSpotCheckSample resolves a spot check sample from view configuration
// to absolute start/end times using the provided releases and rounding parameters.
func ResolveSpotCheckSample(
    releases []v1.Release,
    sampleReleaseName string,
    spotCheckConfig *crview.SpotCheckSample,
    roundingFactor, roundingOffset time.Duration,
) (*reqopts.Release, error) {
    if spotCheckConfig == nil {
        return nil, nil
    }
    
    spotCheckRelative := reqopts.RelativeRelease{
        Release:       reqopts.Release{Name: sampleReleaseName},
        RelativeStart: spotCheckConfig.RelativeStart,
        RelativeEnd:   spotCheckConfig.RelativeEnd,
    }
    resolved, err := GetViewReleaseOptions(
        releases, "spot_check", spotCheckRelative, roundingFactor, roundingOffset)
    if err != nil {
        return nil, err
    }
    return &resolved, nil
}

Then both call sites become:

if view.SpotCheckSample != nil {
    resolved, err := utils.ResolveSpotCheckSample(
        releases, view.SampleRelease.Name, view.SpotCheckSample, 
        crTimeRoundingFactor, crTimeRoundingOffset)
    if err != nil {
        // handle error
    }
    opts.SpotCheckSample = resolved
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/utils/queryparamparser.go` around lines 59 - 71,
The spot-check resolution block (constructing spotCheckRelative and calling
GetViewReleaseOptions) is duplicated; extract it into a helper like
ResolveSpotCheckSample in pkg/api/componentreadiness/utils so both call sites
reuse it. Implement ResolveSpotCheckSample to accept releases,
sampleReleaseName, view.SpotCheckSample, crTimeRoundingFactor and
crTimeRoundingOffset, return either a resolved reqopts.Release (or nil) and
error, and replace the inline code that builds spotCheckRelative and calls
GetViewReleaseOptions with a call to ResolveSpotCheckSample that assigns
opts.SpotCheckSample and propagates any error.
pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate.go (1)

52-54: ⚡ Quick win

Add comment explaining the MissingBasis status logic.

The condition at lines 52-53 has subtle semantics: when a new test shows NotSignificant via pass-rate analysis but PassRateRequiredAllTests is disabled, the status is changed to MissingBasis to avoid incorrectly reporting "no regression" for a test that lacks historical baseline data. A brief inline comment would clarify this intent for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate.go`
around lines 52 - 54, Add an inline comment above the condition that mutates
testStats.ReportStatus to crtest.MissingBasis to explain the semantics: when a
test's pass-rate analysis yields crtest.NotSignificant but historical baseline
checking is disabled (opts.PassRateRequiredAllTests == 0), we treat the result
as missing basis rather than "no regression" to avoid false negatives; reference
the symbols testStats.ReportStatus, crtest.NotSignificant,
opts.PassRateRequiredAllTests, and crtest.MissingBasis in the comment so future
maintainers understand why the status is changed.
pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go (1)

277-292: ⚡ Quick win

Clarify comment about component case handling.

The comment at lines 286-288 states "we need title case for component" but the code at line 289 returns component without any case conversion (it's already lowercase from parts[1]).

The code appears correct for matching the BigQuery query's COALESCE fallback, but the comment is misleading. Either:

  1. Update the comment to remove "title case for component" since the component is used as-is (lowercase), or
  2. If title case is actually required, add component = strings.Title(component) at line 289
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go` around
lines 277 - 292, The comment in componentCapabilityFromTestID is misleading
about component casing — update the function's comment to remove the "we need
title case for component" phrase and state that the component is returned as-is
(lowercased) while the capability is converted from dash-separated to
space-separated for BigQuery matching; if title case is actually required
instead, modify the function to convert component (parts[1]) to title case
(e.g., via strings.Title or equivalent) and update the comment accordingly.
pkg/api/componentreadiness/middleware/fisherexact/fisherexact.go (1)

147-153: 💤 Low value

Document or make configurable the extreme regression threshold.

The getRegressionStatus function uses a hardcoded 0.15 (15% pass-rate drop) threshold to distinguish between ExtremeRegression and SignificantRegression. This policy decision should either be:

  1. Documented with an inline comment explaining why 15% is the appropriate threshold, or
  2. Made configurable via reqopts.Advanced so it can be tuned per-release or per-view
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/fisherexact/fisherexact.go` around
lines 147 - 153, The hardcoded 0.15 threshold in getRegressionStatus should be
configurable and documented: add a constant DefaultExtremeRegressionThreshold =
0.15 with an inline comment, change getRegressionStatus to accept an
extremeThreshold float64 (or read reqopts.Advanced.ExtremeRegressionThreshold if
that options struct is available), use that threshold instead of 0.15, and
update all callers to pass the configured value (falling back to
DefaultExtremeRegressionThreshold when reqopts.Advanced is nil or the field is
zero); also add a short comment in getRegressionStatus explaining the threshold
semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/componentreadiness/middleware/alltestspassrate/alltestspassrate.go`:
- Around line 41-51: Add unit tests for AllTestsPassRate.Analyze covering the
conditional branch and delegation to analysis.BuildPassRateTestStats: create
tests that (1) set reqOptions.AdvancedOption.PassRateRequiredAllTests == 0 and
assert Analyze returns (false, nil) and does not modify testStats; (2) set
PassRateRequiredAllTests > 0 with testdetails.TestComparison representing a
passing case and assert Analyze returns (true, nil) and that
testStats.Status/Significance reflect NotSignificant; and (3) set
PassRateRequiredAllTests > 0 with a failing case and assert Analyze returns
(true, nil) and that testStats reflects a regression status; use
crtest.Identification zero value for the id param and validate that
analysis.BuildPassRateTestStats effects on testdetails.TestComparison are
applied (you can stub or construct expected testStats values). Ensure tests
reference AllTestsPassRate, Analyze,
reqOptions.AdvancedOption.PassRateRequiredAllTests,
analysis.BuildPassRateTestStats and testdetails.TestComparison.

In `@pkg/api/componentreadiness/middleware/fisherexact/fisherexact.go`:
- Around line 46-137: Add comprehensive unit tests for the FisherExact.Analyze
method to cover all conditional branches and policy decisions: create
table-driven tests that construct testdetails.TestComparison inputs (including
BaseStats, SampleStats, RequiredConfidence, PityAdjustment) and
FisherExact.reqOptions.AdvancedOption settings to exercise missing-data (sample
total 0 with IgnoreMissing true/false), minimum-failure triage (MinimumFailure
threshold), improved performance (sample pass rate > base with fisherExactTest
returning significant), degraded within-pity (drop <= PityFactor →
NotSignificant), degraded beyond-pity (fisherExactTest significant producing
SignificantRegression or ExtremeRegression depending on drop < 15% vs >= 15%),
and confidence threshold behavior (RequiredConfidence overrides option). For
each case assert returned bool/error, TestComparison.ReportStatus, FisherExact
value, and Explanations; mock or control fisherExactTest outputs where needed to
deterministically simulate significance.

In `@pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate.go`:
- Around line 41-56: Add unit tests exercising NewTestPassRate.Analyze covering
all branches: (1) when reqOptions.AdvancedOption.PassRateRequiredNewTests == 0
it should return handled=false; (2) when testStats.BaseStats != nil and
BaseStats.Total()>0 it should return handled=false; (3) for a new test
(BaseStats nil) with passing samples assert analysis.BuildPassRateTestStats
yields ReportStatus == crtest.NotSignificant; (4) for a new test with failing
samples assert BuildPassRateTestStats results in a regression status (e.g.,
SignificantRegression/ExtremeRegression); and (5) when ReportStatus remains
crtest.NotSignificant and PassRateRequiredAllTests == 0 assert Analyze sets
ReportStatus to crtest.MissingBasis. Create test fixtures for
testdetails.TestComparison inputs, stub or exercise
analysis.BuildPassRateTestStats as appropriate, and assert the returned bool and
testStats.ReportStatus for each case.

---

Nitpick comments:
In `@docs/plans/spot-check-jobs.md`:
- Around line 80-85: The fenced code block in docs/plans/spot-check-jobs.md is
missing a language specifier; update the block around the snippet referencing
"candidate → spotcheck" and the note "add pattern to setSpotCheckComponent in
ocp.go" by adding a language tag (e.g., ```text) on the opening fence so the
linter recognizes it as a text block. Ensure the closing triple backticks remain
present and that the content mentioning setSpotCheckComponent and ocp.go is
unchanged.

In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 514-542: The current code assigns values from rawRow into a
dataprovider.SpotCheckGroup using unchecked type assertions (e.g., val.(string),
val.(int64), val.([]bigquery.Value), val.(time.Time)) which can panic on
unexpected types; update the parsing in the block that builds group (the
variable group, rawRow map access, the loop over columnGroupByVariants.List())
to perform safe type assertions (v, ok := val.(string) / int64 /
[]bigquery.Value / time.Time) or use type switches and only set fields when
assertions succeed, converting numeric types safely to int and iterating
elements defensively for JobNames; optionally centralize this into a helper like
parseSpotCheckRow(rawRow) to mirror the safer pattern used by
QuerySpotCheckJobRunDetails.

In `@pkg/api/componentreadiness/middleware/fisherexact/fisherexact.go`:
- Around line 147-153: The hardcoded 0.15 threshold in getRegressionStatus
should be configurable and documented: add a constant
DefaultExtremeRegressionThreshold = 0.15 with an inline comment, change
getRegressionStatus to accept an extremeThreshold float64 (or read
reqopts.Advanced.ExtremeRegressionThreshold if that options struct is
available), use that threshold instead of 0.15, and update all callers to pass
the configured value (falling back to DefaultExtremeRegressionThreshold when
reqopts.Advanced is nil or the field is zero); also add a short comment in
getRegressionStatus explaining the threshold semantics.

In `@pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate.go`:
- Around line 52-54: Add an inline comment above the condition that mutates
testStats.ReportStatus to crtest.MissingBasis to explain the semantics: when a
test's pass-rate analysis yields crtest.NotSignificant but historical baseline
checking is disabled (opts.PassRateRequiredAllTests == 0), we treat the result
as missing basis rather than "no regression" to avoid false negatives; reference
the symbols testStats.ReportStatus, crtest.NotSignificant,
opts.PassRateRequiredAllTests, and crtest.MissingBasis in the comment so future
maintainers understand why the status is changed.

In `@pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go`:
- Around line 277-292: The comment in componentCapabilityFromTestID is
misleading about component casing — update the function's comment to remove the
"we need title case for component" phrase and state that the component is
returned as-is (lowercased) while the capability is converted from
dash-separated to space-separated for BigQuery matching; if title case is
actually required instead, modify the function to convert component (parts[1])
to title case (e.g., via strings.Title or equivalent) and update the comment
accordingly.

In `@pkg/api/componentreadiness/utils/queryparamparser.go`:
- Around line 59-71: The spot-check resolution block (constructing
spotCheckRelative and calling GetViewReleaseOptions) is duplicated; extract it
into a helper like ResolveSpotCheckSample in pkg/api/componentreadiness/utils so
both call sites reuse it. Implement ResolveSpotCheckSample to accept releases,
sampleReleaseName, view.SpotCheckSample, crTimeRoundingFactor and
crTimeRoundingOffset, return either a resolved reqopts.Release (or nil) and
error, and replace the inline code that builds spotCheckRelative and calls
GetViewReleaseOptions with a call to ResolveSpotCheckSample that assigns
opts.SpotCheckSample and propagates any error.

In `@sippy-ng/src/component_readiness/CompReadyTestDetailRow.js`:
- Around line 1-271: Run ESLint and Prettier to fix formatting and lint issues
introduced in CompReadyTestDetailRow.js: execute "npx eslint . --fix" and "npx
prettier --write ." then review and stage the resulting changes (focus on
formatting and lint fixes around the CompReadyTestDetailRow component and helper
functions like getJobRunColor, isMassFailure, testJobDetailCell, and
getSelectingCheckBox), resolve any remaining lint errors reported (e.g., unused
imports or console statements), and commit the auto-fixed files.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 183bc474-56cf-4f22-9ec0-c05022cdf713

📥 Commits

Reviewing files that changed from the base of the PR and between 775169c and 05bf3f0.

📒 Files selected for processing (33)
  • config/views.yaml
  • docs/plans/spot-check-jobs.md
  • pkg/api/componentreadiness/component_report.go
  • pkg/api/componentreadiness/component_report_test.go
  • pkg/api/componentreadiness/dataprovider/bigquery/provider.go
  • pkg/api/componentreadiness/dataprovider/interface.go
  • pkg/api/componentreadiness/dataprovider/postgres/provider.go
  • pkg/api/componentreadiness/middleware/alltestspassrate/alltestspassrate.go
  • pkg/api/componentreadiness/middleware/analysis/passrate.go
  • pkg/api/componentreadiness/middleware/fisherexact/fisherexact.go
  • pkg/api/componentreadiness/middleware/interface.go
  • pkg/api/componentreadiness/middleware/linkinjector/linkinjector.go
  • pkg/api/componentreadiness/middleware/list.go
  • pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate.go
  • pkg/api/componentreadiness/middleware/regressionallowances/regressionallowances.go
  • pkg/api/componentreadiness/middleware/regressiontracker/regressiontracker.go
  • pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
  • pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go
  • pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs_test.go
  • pkg/api/componentreadiness/test_details.go
  • pkg/api/componentreadiness/utils/queryparamparser.go
  • pkg/apis/api/componentreport/crtest/types.go
  • pkg/apis/api/componentreport/crview/types.go
  • pkg/apis/api/componentreport/reqopts/types.go
  • pkg/bigquery/bqlabel/labels.go
  • pkg/dataloader/regressioncacheloader/regressioncacheloader.go
  • pkg/variantregistry/ocp.go
  • pkg/variantregistry/ocp_test.go
  • pkg/variantregistry/snapshot.yaml
  • sippy-ng/src/component_readiness/CompReadyTestDetailRow.js
  • sippy-ng/src/component_readiness/CompReadyTestPanel.js
  • sippy-ng/src/component_readiness/CompReadyUtils.js
  • sippy-ng/src/component_readiness/TestDetailsReport.js

Comment on lines +41 to +51
// Analyze claims all tests when PassRateRequiredAllTests is configured, applying a raw
// pass rate comparison instead of Fisher's exact test.
func (a *AllTestsPassRate) Analyze(_ crtest.Identification, testStats *testdetails.TestComparison) (bool, error) {
opts := a.reqOptions.AdvancedOption
if opts.PassRateRequiredAllTests == 0 {
return false, nil
}

analysis.BuildPassRateTestStats(testStats, float64(opts.PassRateRequiredAllTests), opts)
return true, nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add unit tests for the Analyze method.

The Analyze method, while simpler than newtestpassrate, still contains conditional logic and delegates to analysis.BuildPassRateTestStats. As per coding guidelines, new Go functions and methods should have corresponding unit tests.

Consider adding test cases:

  1. PassRateRequiredAllTests == 0 (should return false, not handle)
  2. PassRateRequiredAllTests > 0 with passing test (should return true, set NotSignificant)
  3. PassRateRequiredAllTests > 0 with failing test (should return true, set regression status)

As per coding guidelines: "New or modified functionality must include test coverage: New Go functions and methods should have corresponding unit tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/alltestspassrate/alltestspassrate.go`
around lines 41 - 51, Add unit tests for AllTestsPassRate.Analyze covering the
conditional branch and delegation to analysis.BuildPassRateTestStats: create
tests that (1) set reqOptions.AdvancedOption.PassRateRequiredAllTests == 0 and
assert Analyze returns (false, nil) and does not modify testStats; (2) set
PassRateRequiredAllTests > 0 with testdetails.TestComparison representing a
passing case and assert Analyze returns (true, nil) and that
testStats.Status/Significance reflect NotSignificant; and (3) set
PassRateRequiredAllTests > 0 with a failing case and assert Analyze returns
(true, nil) and that testStats reflects a regression status; use
crtest.Identification zero value for the id param and validate that
analysis.BuildPassRateTestStats effects on testdetails.TestComparison are
applied (you can stub or construct expected testStats values). Ensure tests
reference AllTestsPassRate, Analyze,
reqOptions.AdvancedOption.PassRateRequiredAllTests,
analysis.BuildPassRateTestStats and testdetails.TestComparison.

Source: Coding guidelines

Comment on lines +46 to +137
// Analyze is the catch-all analysis middleware. It always claims the test and performs
// Fisher's exact test to determine regression significance.
func (f *FisherExact) Analyze(_ crtest.Identification, testStats *testdetails.TestComparison) (bool, error) {
logger := log.WithField("middleware", "FisherExact")
opts := f.reqOptions.AdvancedOption

if testStats.RequiredConfidence == 0 {
testStats.RequiredConfidence = opts.Confidence
}

fisherExactVal := 0.0
testStats.Comparison = crtest.FisherExact

status := crtest.MissingBasis
if testStats.SampleStats.Total() == 0 {
if opts.IgnoreMissing {
status = crtest.NotSignificant
} else {
status = crtest.MissingSample
}
testStats.ReportStatus = status
testStats.FisherExact = thrift.Float64Ptr(0.0)
testStats.Explanations = append(testStats.Explanations, analysis.ExplanationNoRegression)
return true, nil
} else if testStats.BaseStats != nil && testStats.BaseStats.Total() != 0 {
samplePass := testStats.SampleStats.Passes(opts.FlakeAsFailure)
basePass := testStats.BaseStats.Passes(opts.FlakeAsFailure)
basisPassPercentage := float64(basePass) / float64(testStats.BaseStats.Total())
effectivePityFactor := float64(opts.PityFactor) + testStats.PityAdjustment
effectiveMinimumFailure := opts.MinimumFailure

status = crtest.NotSignificant

samplePassPercentage := float64(samplePass) / float64(testStats.SampleStats.Total())

if effectiveMinimumFailure != 0 &&
(testStats.SampleStats.Total()-samplePass) < effectiveMinimumFailure {
if status <= crtest.SignificantTriagedRegression {
testStats.Explanations = append(testStats.Explanations,
fmt.Sprintf("%s regression detected.", crtest.StringForStatus(status)))
}
testStats.ReportStatus = status
testStats.FisherExact = thrift.Float64Ptr(0.0)
return true, nil
}
significant := false
improved := samplePassPercentage >= basisPassPercentage

if improved {
significant, fisherExactVal = fisherExactTest(testStats.RequiredConfidence, testStats.BaseStats.Total()-basePass, basePass, testStats.SampleStats.Total()-samplePass, samplePass)
} else if basisPassPercentage-samplePassPercentage > effectivePityFactor/100 {
significant, fisherExactVal = fisherExactTest(testStats.RequiredConfidence, testStats.SampleStats.Total()-samplePass, samplePass, testStats.BaseStats.Total()-basePass, basePass)
}
logger.Debugf("computed Fisher info: signifcant: %v, fisherExact: %v", significant, fisherExactVal)
if significant {
if improved {
status = crtest.SignificantImprovement
} else {
status = getRegressionStatus(basisPassPercentage, samplePassPercentage)
}
}
}
logger.Debugf("computed status: %d", int(status))
testStats.ReportStatus = status
testStats.FisherExact = thrift.Float64Ptr(fisherExactVal)

baseRelease := "no basis"
if testStats.BaseStats != nil {
baseRelease = testStats.BaseStats.Release
}
if testStats.ReportStatus <= crtest.SignificantTriagedRegression {
logger.Debugf("regression detected against: %s", baseRelease)

if testStats.ReportStatus <= crtest.SignificantRegression {
testStats.Explanations = append(testStats.Explanations,
fmt.Sprintf("%s regression detected.", crtest.StringForStatus(testStats.ReportStatus)))
testStats.Explanations = append(testStats.Explanations,
fmt.Sprintf("Fishers Exact probability of a regression: %.2f%%.", float64(100)-*testStats.FisherExact))
testStats.Explanations = append(testStats.Explanations,
fmt.Sprintf("Test pass rate dropped from %.2f%% to %.2f%%.",
testStats.BaseStats.SuccessRate*float64(100),
testStats.SampleStats.SuccessRate*float64(100)))
} else {
testStats.Explanations = append(testStats.Explanations,
fmt.Sprintf("%s regression detected.", crtest.StringForStatus(testStats.ReportStatus)))
}
} else {
logger.Debugf("NO regression detected against: %s", baseRelease)
}

return true, nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add comprehensive unit tests for the Fisher exact analysis logic.

The Analyze method is the most complex middleware with multiple conditional branches and policy decisions. As per coding guidelines, new functions with non-trivial behavior should have unit tests.

The method should be tested with cases covering:

  1. Missing data scenarios: sample=0 with IgnoreMissing true/false → NotSignificant/MissingSample
  2. Minimum failure triage: degraded but below MinimumFailure threshold → NotSignificant
  3. Improved performance: sample pass rate > base → SignificantImprovement when Fisher test passes
  4. Degraded performance within pity: drop ≤ pity factor → NotSignificant
  5. Degraded performance beyond pity: Fisher significant + drop < 15% → SignificantRegression
  6. Extreme degradation: Fisher significant + drop ≥ 15% → ExtremeRegression
  7. Confidence requirement: Fisher test respects RequiredConfidence parameter

As per coding guidelines: "New or modified functionality must include test coverage: New Go functions and methods should have corresponding unit tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/fisherexact/fisherexact.go` around
lines 46 - 137, Add comprehensive unit tests for the FisherExact.Analyze method
to cover all conditional branches and policy decisions: create table-driven
tests that construct testdetails.TestComparison inputs (including BaseStats,
SampleStats, RequiredConfidence, PityAdjustment) and
FisherExact.reqOptions.AdvancedOption settings to exercise missing-data (sample
total 0 with IgnoreMissing true/false), minimum-failure triage (MinimumFailure
threshold), improved performance (sample pass rate > base with fisherExactTest
returning significant), degraded within-pity (drop <= PityFactor →
NotSignificant), degraded beyond-pity (fisherExactTest significant producing
SignificantRegression or ExtremeRegression depending on drop < 15% vs >= 15%),
and confidence threshold behavior (RequiredConfidence overrides option). For
each case assert returned bool/error, TestComparison.ReportStatus, FisherExact
value, and Explanations; mock or control fisherExactTest outputs where needed to
deterministically simulate significance.

Source: Coding guidelines

Comment on lines +41 to +56
// Analyze claims tests that have no base stats (new tests) when PassRateRequiredNewTests is configured.
func (n *NewTestPassRate) Analyze(_ crtest.Identification, testStats *testdetails.TestComparison) (bool, error) {
opts := n.reqOptions.AdvancedOption
if opts.PassRateRequiredNewTests == 0 {
return false, nil
}
if testStats.BaseStats != nil && testStats.BaseStats.Total() > 0 {
return false, nil
}

analysis.BuildPassRateTestStats(testStats, float64(opts.PassRateRequiredNewTests), opts)
if testStats.ReportStatus == crtest.NotSignificant && opts.PassRateRequiredAllTests == 0 {
testStats.ReportStatus = crtest.MissingBasis
}
return true, nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add unit tests for the Analyze method.

The Analyze method contains non-trivial conditional logic with multiple code paths:

  • Early return when PassRateRequiredNewTests is 0
  • Early return when BaseStats exists
  • Call to analysis.BuildPassRateTestStats
  • Conditional override to MissingBasis status

As per coding guidelines, new Go functions with non-trivial behavior should have corresponding unit tests. Consider adding test cases covering:

  1. PassRateRequiredNewTests == 0 (should not handle)
  2. BaseStats present (should not handle)
  3. New test with passing sample → NotSignificant
  4. New test with failing sample → SignificantRegression/ExtremeRegression
  5. Status override to MissingBasis when PassRateRequiredAllTests == 0

As per coding guidelines: "New or modified functionality must include test coverage: New Go functions and methods should have corresponding unit tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate.go`
around lines 41 - 56, Add unit tests exercising NewTestPassRate.Analyze covering
all branches: (1) when reqOptions.AdvancedOption.PassRateRequiredNewTests == 0
it should return handled=false; (2) when testStats.BaseStats != nil and
BaseStats.Total()>0 it should return handled=false; (3) for a new test
(BaseStats nil) with passing samples assert analysis.BuildPassRateTestStats
yields ReportStatus == crtest.NotSignificant; (4) for a new test with failing
samples assert BuildPassRateTestStats results in a regression status (e.g.,
SignificantRegression/ExtremeRegression); and (5) when ReportStatus remains
crtest.NotSignificant and PassRateRequiredAllTests == 0 assert Analyze sets
ReportStatus to crtest.MissingBasis. Create test fixtures for
testdetails.TestComparison inputs, stub or exercise
analysis.BuildPassRateTestStats as appropriate, and assert the returned bool and
testStats.ReportStatus for each case.

Source: Coding guidelines

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/api/componentreadiness/middleware/fisherexact/fisherexact_test.go (1)

191-202: ⚡ Quick win

Use strings.Contains for substring search.

The manual substring search implementation reimplements strings.Contains. As per coding guidelines, favor clarity over cleverness and prefer standard library functions.

♻️ Proposed refactor
+import (
+	"strings"
+)
+
 			if tt.explanationContains != "" {
-				assert.True(t, func() bool {
-					for _, e := range tt.inputStats.Explanations {
-						if len(e) >= len(tt.explanationContains) {
-							for i := 0; i <= len(e)-len(tt.explanationContains); i++ {
-								if e[i:i+len(tt.explanationContains)] == tt.explanationContains {
-									return true
-								}
-							}
-						}
-					}
-					return false
-				}(), "expected an explanation containing %q, got: %v", tt.explanationContains, tt.inputStats.Explanations)
+				found := false
+				for _, e := range tt.inputStats.Explanations {
+					if strings.Contains(e, tt.explanationContains) {
+						found = true
+						break
+					}
+				}
+				assert.True(t, found, "expected an explanation containing %q, got: %v", tt.explanationContains, tt.inputStats.Explanations)
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/middleware/fisherexact/fisherexact_test.go` around
lines 191 - 202, Replace the manual substring search in the test assertion with
strings.Contains: instead of the anonymous loop over tt.inputStats.Explanations
comparing slices, call strings.Contains(e, tt.explanationContains) for each
explanation e (and ensure the test imports the "strings" package). Update the
assertion to assert that any e in tt.inputStats.Explanations satisfies
strings.Contains(e, tt.explanationContains) and keep the existing failure
message and parameters (tt.explanationContains, tt.inputStats.Explanations).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/api/componentreadiness/middleware/fisherexact/fisherexact_test.go`:
- Around line 191-202: Replace the manual substring search in the test assertion
with strings.Contains: instead of the anonymous loop over
tt.inputStats.Explanations comparing slices, call strings.Contains(e,
tt.explanationContains) for each explanation e (and ensure the test imports the
"strings" package). Update the assertion to assert that any e in
tt.inputStats.Explanations satisfies strings.Contains(e, tt.explanationContains)
and keep the existing failure message and parameters (tt.explanationContains,
tt.inputStats.Explanations).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3de19042-3be6-4114-b3dc-ce92770ed6b8

📥 Commits

Reviewing files that changed from the base of the PR and between 05bf3f0 and c98c197.

📒 Files selected for processing (7)
  • pkg/api/componentreadiness/middleware/alltestspassrate/alltestspassrate_test.go
  • pkg/api/componentreadiness/middleware/fisherexact/fisherexact_test.go
  • pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate_test.go
  • pkg/variantregistry/ocp.go
  • pkg/variantregistry/ocp_test.go
  • pkg/variantregistry/snapshot.go
  • pkg/variantregistry/snapshot.yaml
✅ Files skipped from review due to trivial changes (1)
  • pkg/api/componentreadiness/middleware/newtestpassrate/newtestpassrate_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/variantregistry/ocp_test.go
  • pkg/variantregistry/ocp.go
  • pkg/variantregistry/snapshot.yaml

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

dgoodwin and others added 5 commits June 12, 2026 08:25
The transition period is over: all spot-check jobs now carry
SpotCheckComponent and SpotCheckCapability variants in the job_variants
table. Remove the COALESCE job-name-based fallbacks and drop 'rare' from
the JobTier filter in both spot-check queries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/api/componentreadiness/dataprovider/bigquery/provider.go (1)

401-408: ⚠️ Potential issue | 🔴 Critical

Use parameterized queries to prevent SQL injection from variant values.

The variant name value v is inserted directly into the SQL string via fmt.Sprintf without parameterization. Although table aliases use param.Cleanse(v) for safety, the raw v is used in the filter condition (AND jv_%s.variant_name = '%s'), which creates an SQL injection risk. Refactor to use parameterized queries or placeholders instead of string concatenation.

This applies to: lines 401-408, 435-451, 455-468

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go` around lines
401 - 408, The variant value `v` is being inserted directly into the SQL string
via fmt.Sprintf without parameterization, creating an SQL injection
vulnerability. Although table aliases use param.Cleanse(v) for safety, the raw
`v` is used directly in the filter condition within the joinVariants string
building logic. Refactor the BigQuery provider to use parameterized queries or
placeholder-based query construction instead of string concatenation for the
variant values, ensuring all SQL injection risks from user-provided variant data
are eliminated throughout the variant filtering and selection logic.
🧹 Nitpick comments (1)
pkg/api/componentreadiness/dataprovider/bigquery/provider.go (1)

520-524: 💤 Low value

Type assertions on BigQuery values could panic on schema mismatch.

The type assertions (e.g., val.(string), val.(int64)) will panic if BigQuery returns an unexpected type. While the query schema makes this unlikely in practice, adding type checks would improve robustness.

Example defensive pattern
 		if val, ok := rawRow["spot_check_component"]; ok && val != nil {
-			group.Component = val.(string)
+			if s, ok := val.(string); ok {
+				group.Component = s
+			}
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go` around lines
520 - 524, The type assertions for BigQuery values (like `val.(string)`) can
panic if the database returns an unexpected type. Replace the unsafe type
assertions with type checks that validate the value is the expected type before
assignment. For each field assignment in the rawRow map access (such as those
for spot_check_component and spot_check_capability), use a secondary return
value from the type assertion to verify it succeeded, and only assign the value
if the type check passes. Consider logging a warning or skipping the field
assignment if the type check fails rather than allowing a panic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go`:
- Around line 303-313: The componentCapabilityFromTestID function is performing
a lossy transformation on the capability token by converting hyphens to spaces
in the capability field assignment. This causes capabilities with hyphens (like
`etcd-scaling`) to be converted to different values (like `etcd scaling`), which
then fail to match in QuerySpotCheckJobRunDetails. Remove the strings.ReplaceAll
call on parts[2] and return the capability token exactly as extracted from the
test ID without any character transformation.

In `@pkg/api/componentreadiness/utils/queryparamparser.go`:
- Around line 59-75: Move the spot-check sample resolution loop that iterates
over view.SpotCheckJobSamples to a position after URL sample overrides
(sampleRelease, sampleStartTime, sampleEndTime) are applied in the function.
When constructing the spotCheckRelative object for GetViewReleaseOptions, use
opts.SampleRelease.Name instead of view.SampleRelease.Name to ensure spot-check
samples are resolved using the final, override-adjusted release name and time
context, not the original values. This ensures spot-check queries use the same
release/time values as the main sample after all overrides are finalized.

---

Outside diff comments:
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 401-408: The variant value `v` is being inserted directly into the
SQL string via fmt.Sprintf without parameterization, creating an SQL injection
vulnerability. Although table aliases use param.Cleanse(v) for safety, the raw
`v` is used directly in the filter condition within the joinVariants string
building logic. Refactor the BigQuery provider to use parameterized queries or
placeholder-based query construction instead of string concatenation for the
variant values, ensuring all SQL injection risks from user-provided variant data
are eliminated throughout the variant filtering and selection logic.

---

Nitpick comments:
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 520-524: The type assertions for BigQuery values (like
`val.(string)`) can panic if the database returns an unexpected type. Replace
the unsafe type assertions with type checks that validate the value is the
expected type before assignment. For each field assignment in the rawRow map
access (such as those for spot_check_component and spot_check_capability), use a
secondary return value from the type assertion to verify it succeeded, and only
assign the value if the type check passes. Consider logging a warning or
skipping the field assignment if the type check fails rather than allowing a
panic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e872c935-4d79-4b2b-9808-0380ce4410a2

📥 Commits

Reviewing files that changed from the base of the PR and between 0b71bbe and 8893bd3.

📒 Files selected for processing (12)
  • config/views.yaml
  • pkg/api/componentreadiness/component_report.go
  • pkg/api/componentreadiness/dataprovider/bigquery/provider.go
  • pkg/api/componentreadiness/dataprovider/interface.go
  • pkg/api/componentreadiness/dataprovider/postgres/provider.go
  • pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs.go
  • pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs_test.go
  • pkg/api/componentreadiness/test_details.go
  • pkg/api/componentreadiness/utils/queryparamparser.go
  • pkg/apis/api/componentreport/crview/types.go
  • pkg/apis/api/componentreport/reqopts/types.go
  • pkg/dataloader/regressioncacheloader/regressioncacheloader.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/api/componentreadiness/dataprovider/postgres/provider.go
  • pkg/api/componentreadiness/dataprovider/interface.go
  • pkg/api/componentreadiness/middleware/spotcheckjobs/spotcheckjobs_test.go
  • pkg/api/componentreadiness/component_report.go
  • pkg/api/componentreadiness/test_details.go

Comment thread pkg/api/componentreadiness/utils/queryparamparser.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go`:
- Around line 399-401: The JOIN clause in the queryGenerator is using the
cleansed variant value (cleanV) in the WHERE condition for variant_name
comparison, which can cause rows to be missed when Cleanse() modifies the
original value. Fix this by keeping cleanV only for the table alias portions
(jv_cleanV), and use the original uncleansed parameter value v for the actual
variant_name matching predicate. This ensures the database comparison uses the
exact original key value, not an altered version.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4654cf90-4e64-4b3f-84e7-04a8513f953e

📥 Commits

Reviewing files that changed from the base of the PR and between 8893bd3 and a912e88.

📒 Files selected for processing (7)
  • apm.lock.yaml
  • config/views.yaml
  • pkg/api/componentreadiness/component_report.go
  • pkg/api/componentreadiness/dataprovider/bigquery/provider.go
  • pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go
  • pkg/api/componentreadiness/test_details.go
  • pkg/api/componentreadiness/utils/queryparamparser.go
✅ Files skipped from review due to trivial changes (1)
  • apm.lock.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/api/componentreadiness/utils/queryparamparser.go
  • config/views.yaml
  • pkg/api/componentreadiness/dataprovider/bigquery/provider.go
  • pkg/api/componentreadiness/test_details.go
  • pkg/api/componentreadiness/component_report.go

Comment on lines +399 to +401
cleanV := param.Cleanse(v)
joinVariants += fmt.Sprintf("LEFT JOIN %s.job_variants jv_%s ON junit_data.variant_registry_job_name = jv_%s.job_name AND jv_%s.variant_name = '%s'\n",
client.Dataset, v, v, v, v)
client.Dataset, cleanV, cleanV, cleanV, cleanV)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the join predicate on the original variant key, not the cleansed alias key.

Line 400 now compares jv_*.variant_name to cleanV, which can silently miss rows when param.Cleanse(v) alters the key. Sanitize only the alias, and bind the original v as a query parameter for variant_name matching.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go` around
lines 399 - 401, The JOIN clause in the queryGenerator is using the cleansed
variant value (cleanV) in the WHERE condition for variant_name comparison, which
can cause rows to be missed when Cleanse() modifies the original value. Fix this
by keeping cleanV only for the table alias portions (jv_cleanV), and use the
original uncleansed parameter value v for the actual variant_name matching
predicate. This ensures the database comparison uses the exact original key
value, not an altered version.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

@dgoodwin: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dgoodwin dgoodwin changed the title Proposed Implementation for Spot Check Jobs OCPQE-32065: Proposed Implementation for Spot Check Jobs Jun 22, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 22, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 22, 2026

Copy link
Copy Markdown

@dgoodwin: This pull request references OCPQE-32065 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

From the next-gen QE effort, we are looking to implement spot check jobs, jobs which are much less sensitive, have an owning component/capability, test a specific install configuration or stable feature, once in awhile, to confirm it still works. The goal is for them to run just once a month, be periodically retried if they fail every day or so up to perhaps four attempts, and after roughly 2 failures the component goes regressed in component readiness. One pass is all we require.

This is intended to provide a mechanism to land the many jobs QE has for obscure configurations, and for feature jobs that ran often for GA promotion to continue producing signal without a lot of cost. Examples today would include etcd-scaling job (which has never passed, but this is due to monitortests, spot check jobs will not run failable monitortests: openshift/origin#31212)

This PR adds the mechanism to have them mark as regressions in component readiness.

This attempts something we talked about for a long time, what if component readiness could show signal for other things.

Middleware concept pays off as this stays fairly localized to middleware that gets it's chance to inject the synthetic test results and handle all drilldown/reports.

Jobs must have capabilities/components for this system, I did this via the variant registry.

This replaces the admittedly terrible prior attempt, rarely run jobs: #3562

Implications:

  • This uses synthetic test IDs. The PR proves we can make this work in component readiness, but any downstream tooling that were to try to use that test ID in another system is going to find no results.

Summary by CodeRabbit

  • New Features

  • Added configurable spot-check job analysis windows to component readiness reports (including automatic default spot-check-30d when not explicitly set).

  • Added Spot-Check comparison output and drill-down for underlying spot-check job runs.

  • Introduced additional statistical analysis for regression detection (Fisher’s exact and pass-rate based).

  • Updated the UI to render spot-check mode with spot-check–specific layout and labels.

  • Refactor

  • Migrated per-test analysis from generator logic to a middleware-driven flow, enabling targeted analyses per scenario.

  • Bug Fixes

  • Improved report generation to continue when per-test detail inputs are partially missing; spot-check tests no longer require all baseline variants to be present.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants