Skip to content

fix(central): improve compliance scan watcher diagnostic logging#20903

Draft
guzalv wants to merge 2 commits into
masterfrom
worktree-improve-compliance-watcher-logging
Draft

fix(central): improve compliance scan watcher diagnostic logging#20903
guzalv wants to merge 2 commits into
masterfrom
worktree-improve-compliance-watcher-logging

Conversation

@guzalv
Copy link
Copy Markdown
Contributor

@guzalv guzalv commented Jun 1, 2026

Description

Improves diagnostic logging across the compliance scan watcher system to enable faster triage of scan timeouts in customer environments.

Changes:

  • Scan watcher timeout/cancel logs now include received vs expected check result counts and watcher ID
  • Scan config watcher timeout/cancel logs now include received vs expected scan counts, watcher ID, and list of pending scans
  • handleReadyScanConfig log promoted from Debug to Info and now includes the error state
  • DeleteOldResultsFromMissingScans now logs which scans are being cleaned up (scan name, scanRefId, cluster ID) and short-circuits when no scans are missing
  • Added scanName() nil-safe accessor that falls back to watcher ID when Scan is nil (can happen when a watcher is created by check result events but never receives a scan event)

Why: The existing timeout log ("Timeout waiting for the scan X to finish") gives no indication of how close the scan was to completing. Knowing received/expected count (e.g., 0/150 vs 148/150) immediately narrows the root cause: missing annotation, lost results, or check-count mismatch.

Before/After comparison (verified on cluster ga-acp)

Scan watcher timeout:

# BEFORE (master)
Warn: Timeout waiting for the scan ocp4-cis to finish

# AFTER (this PR)
Warn: Timeout waiting for the scan ocp4-cis to finish. Received 0/0 check results (watcher id: c08b8ce5-...:1f895818-...)

Scan config watcher timeout:

# BEFORE (master)
Warn: Timeout waiting for the ScanConfiguration test-multi-scan's scans to finish

# AFTER (this PR)
Warn: Timeout waiting for the ScanConfiguration test-multi-scan's scans to finish. Received 2/3 scan results (watcher id: 4dc31086-..., pending: [c08b8ce5-...:9e90b4ba-...])

handleReadyScanConfig completion:

# BEFORE (master) — Debug level, not visible at default log level

# AFTER (this PR) — Info level
Info: Scan config test-multi-scan completed with 2 scans, 0 reports, error: scan config watcher timed out

DeleteOldResultsFromMissingScans:

# BEFORE (master) — no logging

# AFTER (this PR)
Info: Deleting old check results from 1 scans that did not report results for scan config test-multi-scan (missing: [c08b8ce5-...:9e90b4ba-...])
Info: Deleting all check results for missing scan ocp4-cis-node-worker (scanRefId: c27c94f6-..., cluster: c08b8ce5-...)

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Logging-only change, no behavioral impact
  • Existing unit tests pass
  • Verified on real cluster ga-acp: deployed master baseline, captured BEFORE logs during scan watcher timeout (2 min) and scan config watcher timeout (3 min), then deployed PR branch and captured AFTER logs under the same conditions. All 4 improvements confirmed visible — see Before/After comparison above.

Timeout and cancellation log messages now include received vs expected
counts and watcher IDs. DeleteOldResultsFromMissingScans now logs which
scans are being cleaned up and their cluster/scanRefId. These diagnostics
are needed to triage scan watcher timeouts in customer environments.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Jun 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: improving diagnostic logging in the compliance scan watcher system.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and well-structured, following the template with detailed sections covering changes, rationale, validation, and documentation status.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-improve-compliance-watcher-logging

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@central/complianceoperator/v2/report/manager/watcher/scanwatcher.go`:
- Around line 274-275: The log call in Stop may dereference s.scanResults.Scan
which is nil until handleScan sets it; update Stop (or the logging site) to
guard access by checking if s.scanResults.Scan != nil before calling
GetScanName(), and use a safe fallback (e.g. "<unknown>" or empty string) when
nil; locate NewScanWatcher, Stop, handleScan and the scanResults struct/field to
implement the guard and ensure no other code paths call GetScanName() without a
nil check.
- Around line 285-286: The timeout warning in scan watcher can nil-dereference
because s.scanResults.Scan may still be nil if the timeout fires before
handleScan initializes it. Update the timeout path in scanwatcher.go to guard
access to s.scanResults.Scan before calling GetScanName(), and fall back to a
safe message when the scan is not yet set. Use the existing scanResults/Scan
fields in the timeout logging block to keep the fix localized and consistent
with the stop-path nil-safety handling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 284e3443-6aa6-4fd9-8aa9-b39932ebc45f

📥 Commits

Reviewing files that changed from the base of the PR and between ca9a346 and 122699e.

📒 Files selected for processing (3)
  • central/complianceoperator/v2/report/manager/manager_impl.go
  • central/complianceoperator/v2/report/manager/watcher/scanconfigwatcher.go
  • central/complianceoperator/v2/report/manager/watcher/scanwatcher.go

Comment thread central/complianceoperator/v2/report/manager/watcher/scanwatcher.go Outdated
Comment thread central/complianceoperator/v2/report/manager/watcher/scanwatcher.go Outdated
Add scanName() helper that falls back to watcher ID when Scan is nil.
This can happen when a watcher is created by a check result event but
never receives a scan event before timing out.

Addresses CodeRabbit review feedback on PR #20903.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

🚀 Build Images Ready

Images are ready for commit 4bd0677. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.12.x-65-g4bd0677249

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant