Batch BigQuery label fetching and skip GCS for cached job runs by mstaeble · Pull Request #3685 · openshift/sippy

mstaeble · 2026-06-24T17:12:14Z

Summary

Replace per-job BigQuery label queries with a single bulk prefetch before the worker loop, eliminating thousands of individual BQ round-trips per load cycle
Move the prowJobRunCache check before the GCS FindAllMatches listing so already-processed runs skip the expensive object listing entirely
Keep createOrUpdateProwJob before the cache check to ensure job definitions (variants, release) stay current
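Here is a minimal sketch of the prefetch-then-lookup pattern. It assumes the single bulk BigQuery query returns one (build ID, label) row per label. The `labelRow` type and `groupLabelsByBuildID` function are hypothetical and are not the actual code in `pkg/dataloader/prowloader/prow.go`. The BigQuery call itself is omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// labelRow is a hypothetical shape for one row of the bulk label query:
// one (build ID, label) pair per row.
type labelRow struct {
	BuildID string
	Label   string
}

// groupLabelsByBuildID turns the flat result of a single bulk query into a
// map keyed by build ID. Workers can then look up labels without issuing a
// BigQuery request per job run. Duplicate rows are dropped, and each build's
// labels are sorted for stable output.
func groupLabelsByBuildID(rows []labelRow) map[string][]string {
	byBuild := make(map[string][]string)
	seen := make(map[labelRow]bool)
	for _, r := range rows {
		if seen[r] {
			continue
		}
		seen[r] = true
		byBuild[r.BuildID] = append(byBuild[r.BuildID], r.Label)
	}
	for _, labels := range byBuild {
		sort.Strings(labels) // sorts in place; the map shares the backing array
	}
	return byBuild
}

func main() {
	rows := []labelRow{
		{"1001", "TestFailureDuringHighCPUEvents"},
		{"1001", "ImagePullNeverCompletes"},
		{"1002", "ImagePullNeverCompletes"},
	}
	labels := groupLabelsByBuildID(rows)
	// Build IDs with no labels have no entry; a nil slice means "no labels".
	fmt.Println(labels["1001"], len(labels["1003"]))
}
```

Most runs have no labels. The staging prefetch below returned 4,636 labels for 205K build IDs. A missing key is therefore the common case, and it reads naturally as an empty label list.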

Evidence from production logs

Production fetchdata runs (hourly, June 24, 2026) show that the prow loader spends 25-29 minutes processing 16-18K jobs from BigQuery, but only 2-3% of those jobs actually need GCS processing. The rest are duplicates from the 12-hour lookback overlap.

The current code flow in `prowJobToJobRun` fetches GCS artifacts (path resolution, bucket client, JUnit file matching) before checking the in-memory cache. This means ~3,500 jobs per run go through full GCS I/O only to be discarded as already processed.

| Metric | 14:00 run | 15:03 run | 16:08 run |
|---|---|---|---|
| Jobs from BigQuery | 16,368 | 17,292 | 18,644 |
| Skipped by in-memory cache (before `prowJobToJobRun`) | ~12,604 (77%) | ~13,444 (78%) | ~14,699 (79%) |
| GCS fetched, then found already processed | 3,297 (20%) | 3,327 (19%) | 3,537 (19%) |
| GCS fetched and actually needed | 467 (3%) | 521 (3%) | 408 (2%) |
| Prow loader time | 24m 31s | 25m 18s | 28m 49s |
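As a consistency check, the three job categories in the table should partition the jobs BigQuery returned for each run. The "GCS fetched, then found already processed" row is the work an early cache check removes. The sketch below recomputes those shares from the table's counts. The `runStats` type and `avoidableShare` function are illustrative only.

```go
package main

import "fmt"

// runStats mirrors one column of the production table; field names are illustrative.
type runStats struct {
	name             string
	total            int // jobs from BigQuery
	skippedByCache   int // skipped before prowJobToJobRun
	fetchedDiscarded int // GCS fetched, then found already processed
	fetchedNeeded    int // GCS fetched and actually needed
}

// avoidableShare is the fraction of all jobs whose GCS work is discarded,
// i.e. the work that moving the cache check earlier removes.
func avoidableShare(s runStats) float64 {
	return float64(s.fetchedDiscarded) / float64(s.total)
}

func main() {
	runs := []runStats{
		{"14:00", 16368, 12604, 3297, 467},
		{"15:03", 17292, 13444, 3327, 521},
		{"16:08", 18644, 14699, 3537, 408},
	}
	for _, r := range runs {
		// The three categories should account for every job in the run.
		sum := r.skippedByCache + r.fetchedDiscarded + r.fetchedNeeded
		fmt.Printf("%s run: categories sum to total: %t, avoidable share: %.1f%%\n",
			r.name, sum == r.total, 100*avoidableShare(r))
	}
}
```

With the table's counts, the categories sum exactly to the BigQuery totals. The avoidable share comes out at about 19-20% per run.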

Moving the cache check before the GCS fetch eliminates the GCS work for ~3,500 already-processed jobs per run (~19% of all jobs).
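The reordering can be sketched as follows. This is a simplified illustration. `loader`, `processRun`, `runCache`, `listGCS`, and `upsertJob` are hypothetical stand-ins for the in-memory `prowJobRunCache`, the GCS `FindAllMatches` listing, and `createOrUpdateProwJob`. The real signatures in sippy differ.

```go
package main

import "fmt"

// jobRun is a hypothetical, trimmed-down job run record from BigQuery.
type jobRun struct {
	JobName string
	BuildID string
}

// loader bundles hypothetical stand-ins for the real dependencies.
type loader struct {
	runCache    map[string]bool           // stands in for prowJobRunCache (keyed by build ID)
	listGCS     func(run jobRun) []string // stands in for the FindAllMatches listing
	upsertJob   func(jobName string)      // stands in for createOrUpdateProwJob
	gcsListings int                       // how many runs reached the GCS listing
}

// processRun follows the order described in this PR. It keeps the job
// definition current, then consults the cache, and lists GCS objects only
// for runs that have not been imported yet. The bool reports whether GCS
// was touched.
func (l *loader) processRun(run jobRun) ([]string, bool) {
	l.upsertJob(run.JobName) // runs for every job, cached or not
	if l.runCache[run.BuildID] {
		return nil, false // cache hit: skip the GCS listing entirely
	}
	l.gcsListings++
	return l.listGCS(run), true
}

func main() {
	upserts := 0
	l := &loader{
		runCache:  map[string]bool{"1": true},
		listGCS:   func(r jobRun) []string { return []string{"junit_" + r.BuildID + ".xml"} },
		upsertJob: func(string) { upserts++ },
	}
	for _, r := range []jobRun{{"job-a", "1"}, {"job-a", "2"}} {
		l.processRun(r)
	}
	fmt.Printf("job upserts: %d, GCS listings: %d\n", upserts, l.gcsListings)
}
```

The in-memory check sits between the job-definition upsert and the expensive listing. A cache hit therefore costs one upsert plus one map lookup, while job definitions still stay current for every run.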

Evidence from staging

Deployed the PR image to staging and ran two load cycles against the staging database.

Run 1 (cold start, empty staging DB):

205,482 jobs from BigQuery
Bulk label prefetch: 205K build IDs returned 4,636 labels in 12.7 seconds (single BQ query)
Processed 29,705 of 205K jobs in ~43 minutes before being stopped (all jobs required full GCS processing since the DB was empty)
7,291 new job runs inserted

Run 2 (warm start, 7K runs already cached):

21,353 jobs from BigQuery
Bulk label prefetch: 21K build IDs returned 363 labels in 2.75 seconds
Of the 17,041 jobs processed at the time of observation, only 2,977 (17%) needed GCS processing
14,064 jobs (83%) skipped GCS listing entirely via the early cache check
Cache-hit jobs were processed at a rate of thousands per second (no GCS I/O)
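A small worker pool illustrates why cache hits are cheap and why the prefetched labels need no locking. The sketch makes several assumptions. `runWorkers` is hypothetical, and build IDs stand in for full job records. Both maps are assumed to be only read once the workers start. If the real loader also writes newly imported runs into its cache from the workers, that cache would need a mutex or `sync.Map`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers is a hypothetical sketch of the worker loop. labels is built once
// by the bulk prefetch and cached is filled before the loop starts. Workers
// only read both maps, so concurrent lookups need no locking. The counters are
// shared by all workers, so they are updated atomically.
func runWorkers(buildIDs []string, labels map[string][]string, cached map[string]bool, workers int) (hits, fetched, labeled int64) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				if cached[id] {
					atomic.AddInt64(&hits, 1) // no GCS listing, no BigQuery call
					continue
				}
				atomic.AddInt64(&fetched, 1) // placeholder for GCS listing and import
				if len(labels[id]) > 0 {
					atomic.AddInt64(&labeled, 1) // labels come from the prefetched map
				}
			}
		}()
	}
	for _, id := range buildIDs {
		jobs <- id
	}
	close(jobs)
	wg.Wait() // every worker's updates happen before Wait returns
	return hits, fetched, labeled
}

func main() {
	ids := []string{"1", "2", "3", "4"}
	labels := map[string][]string{"3": {"ImagePullNeverCompletes"}}
	cached := map[string]bool{"1": true, "2": true}
	h, f, n := runWorkers(ids, labels, cached, 2)
	fmt.Printf("cache hits: %d, GCS fetches: %d, labeled: %d\n", h, f, n)
}
```

With two of the four build IDs cached, the example reports two cache hits, two GCS fetches, and one labeled run.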

Label verification:

The staging database shows that 55,667 of 544,340 job runs have labels applied
Most recent labeled runs (June 24) show correct values (e.g., ImagePullNeverCompletes, TestFailureDuringHighCPUEvents)

Test plan

`go vet ./pkg/dataloader/prowloader/...` passes
`go test ./pkg/dataloader/prowloader/...` passes
`go build ./cmd/sippy/...` compiles
Verify in staging that bulk label prefetch logs show a single BQ query with count and duration
Verify cached job runs skip GCS listing (83% of jobs skipped in warm run)
Verify newly imported job runs still have correct labels in the database

🤖 Generated with Claude Code

The prow loader was making an individual BigQuery query per job run to fetch labels, resulting in thousands of round-trips during each load cycle. Replace with a single bulk query before the worker loop.

Also move the prowJobRunCache check before the GCS FindAllMatches listing so already-processed runs skip the expensive object listing entirely. The createOrUpdateProwJob call remains before the cache check to keep job definitions current.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-merge-bot · 2026-06-24T17:12:17Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-06-24T17:12:24Z

Warning

Review limit reached

@mstaeble, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 52 minutes and 53 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d621747b-c714-49ce-916c-01c6d94f2341

📥 Commits

Reviewing files that changed from the base of the PR and between 09af781 and cb42fa3.

📒 Files selected for processing (1)

pkg/dataloader/prowloader/prow.go


openshift-ci · 2026-06-24T17:13:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mstaeble
Once this PR has been reviewed and has the lgtm label, please assign dgoodwin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot · 2026-06-24T17:25:56Z

Scheduling required tests:
/test e2e

openshift-ci · 2026-06-24T17:40:41Z

@mstaeble: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot requested review from deads2k and xueqzhan June 24, 2026 17:13

mstaeble wants to merge 1 commit into openshift:main from mstaeble:worktree-batch-bq-labels
