Batch BigQuery label fetching and skip GCS for cached job runs #3685
mstaeble wants to merge 1 commit into
The prow loader was making an individual BigQuery query per job run to fetch labels, resulting in thousands of round-trips during each load cycle. Replace with a single bulk query before the worker loop.

Also move the prowJobRunCache check before the GCS FindAllMatches listing so already-processed runs skip the expensive object listing entirely. The createOrUpdateProwJob call remains before the cache check to keep job definitions current.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
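The difference between per-run and bulk label fetching can be sketched as follows. This is a minimal illustration, not the PR's code: `labelStore`, `fetchOne`, `fetchAll`, `loadPerRun`, and `loadBatched` are hypothetical names, and the store is an in-memory stand-in that only counts round-trips instead of talking to BigQuery.

```go
package main

import "fmt"

// labelStore is a hypothetical stand-in for the BigQuery client. It only
// counts round-trips so the two access patterns can be compared.
type labelStore struct {
	labels     map[string][]string // job run ID -> labels
	roundTrips int
}

// fetchOne models the old pattern: one query per job run.
func (s *labelStore) fetchOne(id string) []string {
	s.roundTrips++
	return s.labels[id]
}

// fetchAll models the new pattern: one bulk query that returns every
// label, keyed by job run ID.
func (s *labelStore) fetchAll() map[string][]string {
	s.roundTrips++
	out := make(map[string][]string, len(s.labels))
	for id, l := range s.labels {
		out[id] = append([]string(nil), l...)
	}
	return out
}

// loadPerRun issues one round-trip for every job run ID.
func loadPerRun(s *labelStore, ids []string) map[string][]string {
	out := make(map[string][]string, len(ids))
	for _, id := range ids {
		out[id] = s.fetchOne(id)
	}
	return out
}

// loadBatched issues a single round-trip before the loop and then
// resolves each job run with an in-memory map lookup.
func loadBatched(s *labelStore, ids []string) map[string][]string {
	all := s.fetchAll()
	out := make(map[string][]string, len(ids))
	for _, id := range ids {
		out[id] = all[id]
	}
	return out
}

func main() {
	ids := []string{"run-1", "run-2", "run-3"}
	data := map[string][]string{"run-1": {"ImagePullNeverCompletes"}}

	perRun := &labelStore{labels: data}
	loadPerRun(perRun, ids)

	batched := &labelStore{labels: data}
	loadBatched(batched, ids)

	fmt.Println(perRun.roundTrips, batched.roundTrips) // prints "3 1"
}
```

Because the bulk result is built once before the loop and only read afterwards, worker goroutines could share it without locking, assuming nothing writes to the map after it is built.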
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: mstaeble
@mstaeble: all tests passed!
Summary
- Move the prowJobRunCache check before the GCS FindAllMatches listing so already-processed runs skip the expensive object listing entirely
- Keep createOrUpdateProwJob before the cache check to ensure job definitions (variants, release) stay current

Evidence from production logs
Production fetchdata runs (hourly, June 24 2026) show that the prow loader spends 25-29 minutes processing 16-18K jobs from BigQuery, but only 2-3% actually need GCS processing. The rest are duplicates from the 12-hour lookback overlap.
The current code flow in prowJobToJobRun fetches GCS artifacts (path resolution, bucket client, JUnit file matching) before checking the in-memory cache. This means ~3,500 jobs per run go through full GCS I/O only to be discarded as already processed.
Moving the cache check before the GCS fetch eliminates ~3,500 unnecessary GCS round-trips per run (~19% of total jobs).
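The effect of the reordering can be sketched with a toy loop. This is an illustration under stated assumptions, not the actual prowJobToJobRun code: `fakeLister`, `listArtifacts`, and `processRuns` are hypothetical names, the cache is a plain map, and the createOrUpdateProwJob step that stays ahead of the cache check is omitted.

```go
package main

import "fmt"

// fakeLister is a hypothetical stand-in for the GCS client. It counts
// how many object listings were issued.
type fakeLister struct{ calls int }

func (l *fakeLister) listArtifacts(runID string) []string {
	l.calls++
	return []string{runID + "/junit.xml"}
}

// processRuns walks the job runs and returns how many were newly
// processed. With checkCacheFirst=false it mimics the old order (list,
// then consult the cache); with true it consults the cache first.
func processRuns(runIDs []string, cache map[string]bool, l *fakeLister, checkCacheFirst bool) int {
	processed := 0
	for _, id := range runIDs {
		if checkCacheFirst && cache[id] {
			continue // already processed: no listing is issued at all
		}
		files := l.listArtifacts(id)
		if cache[id] {
			continue // old order: the listing result is thrown away
		}
		_ = files // a real loader would parse the JUnit files here
		cache[id] = true
		processed++
	}
	return processed
}

func main() {
	runs := []string{"r1", "r2", "r3", "r4", "r5"}
	for _, first := range []bool{false, true} {
		// Three of the five runs were handled by an earlier load cycle.
		cache := map[string]bool{"r1": true, "r2": true, "r3": true}
		l := &fakeLister{}
		n := processRuns(runs, cache, l, first)
		fmt.Printf("checkCacheFirst=%v listings=%d processed=%d\n", first, l.calls, n)
	}
	// Output:
	// checkCacheFirst=false listings=5 processed=2
	// checkCacheFirst=true listings=2 processed=2
}
```

Both orders process the same two new runs; only the number of listings changes, which is the saving the production numbers above describe.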
Evidence from staging
Deployed the PR image to staging and ran two load cycles against the staging database.
Run 1 (cold start, empty staging DB):
Run 2 (warm start, 7K runs already cached):
Label verification:
ImagePullNeverCompletes, TestFailureDuringHighCPUEvents)
Test plan
- go vet ./pkg/dataloader/prowloader/... passes
- go test ./pkg/dataloader/prowloader/... passes
- go build ./cmd/sippy/... compiles

🤖 Generated with Claude Code