
NO-JIRA: Exclude KubeJobFailed for periodic-gathering jobs in openshift-insights#30810

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from
petr-muller:allowlist-insights-periodic-gathering-kubejobfailed
Feb 26, 2026

Conversation

@petr-muller
Member

@petr-muller petr-muller commented Feb 25, 2026

Mitigates impact of OCPBUGS-77314 / CCXDEV-16087.

Summary

  • The periodic-gathering jobs in the openshift-insights namespace can fail because of cluster-external API failures, causing KubeJobFailed alerts that make the "shouldn't report any alerts in firing state" e2e test fail.
  • Uses a PromQL unless clause to exclude specifically the KubeJobFailed alerts from the openshift-insights namespace whose job_name matches periodic-gathering-.*, while still catching any other unexpected alerts.
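To make the unless approach concrete, here is a minimal Go sketch that assembles a query of this shape. It is an assumption about the query's structure, not the actual expression in test/extended/prometheus/prometheus.go: the left-hand selector ALERTS{alertstate="firing"}, the excludedAlert type and the firingAlertsQuery helper are all hypothetical.

```go
package main

import "fmt"

// excludedAlert describes one alert series to ignore. This type is
// hypothetical; the PR describes the exclusion only in prose.
type excludedAlert struct {
	AlertName string
	Namespace string
	JobRegex  string // PromQL regex; =~ matchers are fully anchored
}

// firingAlertsQuery returns a PromQL expression selecting firing alerts,
// minus every series matched by one of the exclusions. Each exclusion is a
// separate `unless` operand, so only series whose label set matches an
// excluded ALERTS series are dropped from the left-hand side.
func firingAlertsQuery(exclusions []excludedAlert) string {
	q := `ALERTS{alertstate="firing"}` // hypothetical base selector
	for _, e := range exclusions {
		q += fmt.Sprintf(` unless ALERTS{alertname=%q,namespace=%q,job_name=~%q}`,
			e.AlertName, e.Namespace, e.JobRegex)
	}
	return q
}

func main() {
	fmt.Println(firingAlertsQuery([]excludedAlert{{
		AlertName: "KubeJobFailed",
		Namespace: "openshift-insights",
		JobRegex:  "periodic-gathering-.*",
	}}))
}
```

Because the right-hand side selects only KubeJobFailed series from openshift-insights with a matching job_name, a KubeJobFailed alert from any other namespace or job has no matching series there and stays in the result.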

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Refined alert-exclusion logic in Prometheus test queries to more precisely ignore specific transient job alerts, with explanatory comments and a temporary TODO for future reversion.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all jobs from the second stage, use the /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Walkthrough

A Prometheus test in test/extended/prometheus/prometheus.go changed alert-matching logic: the previous exclusion set was replaced by a PromQL unless clause that conditionally excludes KubeJobFailed alerts originating from periodic-gathering jobs in openshift-insights.

Changes

  • Prometheus test (test/extended/prometheus/prometheus.go): Replaced a simple alert-name exclusion matcher with a PromQL unless clause to exclude KubeJobFailed alerts from periodic-gathering jobs in openshift-insights; added explanatory comments and a TODO.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Test Structure And Quality: ⚠️ Warning. The test assertion at line 885 lacks a meaningful failure message, violating criterion 4, which requires diagnostic context in assertions. Resolution: add a failure message to the assertion by changing o.Expect(err).NotTo(o.HaveOccurred()) to o.Expect(err).NotTo(o.HaveOccurred(), "failed to verify no unexpected Prometheus alerts are firing")
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: excluding KubeJobFailed alerts for periodic-gathering jobs in openshift-insights, which matches the core modification in the PromQL test expression.
  • Docstring Coverage: ✅ Passed. No functions were found in the changed files to evaluate docstring coverage, so the check was skipped.
  • Stable And Deterministic Test Names: ✅ Passed. All test names in the modified prometheus.go file use stable and deterministic strings without dynamic information, with the PromQL query modifications correctly placed in test bodies.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Feb 25, 2026
@petr-muller
Member Author

/hold

Should not be needed anymore; just wanted to have a record.

@openshift-ci openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Feb 25, 2026
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@neisw
Contributor

neisw commented Feb 26, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Feb 26, 2026
The periodic-gathering jobs in the openshift-insights namespace can
transiently fail, causing KubeJobFailed alerts that make the
"shouldn't report any alerts in firing state" e2e test fail.

Use a PromQL `unless` clause to exclude specifically KubeJobFailed
alerts from the openshift-insights namespace whose job_name matches
periodic-gathering-.*, while still catching any other unexpected
alerts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@petr-muller petr-muller force-pushed the allowlist-insights-periodic-gathering-kubejobfailed branch from da20cdd to 18f455a February 26, 2026 12:29
@openshift-ci openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) Feb 26, 2026
@petr-muller petr-muller changed the title Exclude KubeJobFailed for periodic-gathering jobs in openshift-insights NO-JIRA: Exclude KubeJobFailed for periodic-gathering jobs in openshift-insights Feb 26, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Feb 26, 2026
@openshift-ci-robot

@petr-muller: This pull request explicitly references no jira issue.


In response to this:

Summary

  • The periodic-gathering jobs in the openshift-insights namespace can transiently fail, causing KubeJobFailed alerts that make the "shouldn't report any alerts in firing state" e2e test fail.
  • Uses a PromQL unless clause to exclude specifically the KubeJobFailed alerts from the openshift-insights namespace whose job_name matches periodic-gathering-.*, while still catching any other unexpected alerts.
  • This is the most targeted exclusion possible: only this specific alert + namespace + job name pattern is excluded. Any other KubeJobFailed (different namespace/job) or any other alert in openshift-insights will still fail the test.
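One detail behind the job name pattern: PromQL regex matchers such as job_name=~"periodic-gathering-.*" are fully anchored, so the pattern must match the whole label value. Prometheus uses RE2 syntax, as does Go's regexp package, so the sketch below emulates a =~ matcher by wrapping the pattern in ^(?:...)$. The helper name promRegexMatch and the sample job names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// promRegexMatch mimics a PromQL =~ label matcher: the pattern is anchored at
// both ends, so it must match the entire label value, not a substring.
func promRegexMatch(pattern, value string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(value), nil
}

func main() {
	for _, v := range []string{
		"periodic-gathering-abc12", // matches
		"periodic-gathering",       // no match: a "-" is required before ".*"
		"my-periodic-gathering-x",  // no match: anchored at the start
	} {
		ok, _ := promRegexMatch("periodic-gathering-.*", v)
		fmt.Printf("%-26s %v\n", v, ok)
	}

	// A glob-style "periodic-gathering-*" means something else as a regex:
	// "-*" is zero or more hyphens, so it does not match suffixed job names.
	ok, _ := promRegexMatch("periodic-gathering-*", "periodic-gathering-abc12")
	fmt.Println("glob-style pattern matches suffixed name:", ok)
}
```

The anchoring is what keeps the exclusion narrow: a job whose name merely contains periodic-gathering somewhere in the middle is not excluded.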

Test plan

  • Verify the modified PromQL query compiles and is syntactically valid
  • Run the "shouldn't report any alerts in firing state" test on a cluster with a failing periodic-gathering job in openshift-insights and confirm it no longer causes a test failure
  • Verify that other KubeJobFailed alerts (in different namespaces) still cause the test to fail
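The test-plan cases can also be expressed as a small table-driven check. This is a hedged, self-contained sketch that models the unless exclusion as a per-series filter over in-memory label sets; that simplification assumes both sides of unless select the same ALERTS series. The Labels type, the helper functions, and the sample namespaces and job names are hypothetical rather than taken from the origin test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// Labels is a simplified alert label set (hypothetical, not the origin test's type).
type Labels map[string]string

// jobRe mirrors the anchored PromQL matcher job_name=~"periodic-gathering-.*".
var jobRe = regexp.MustCompile(`^(?:periodic-gathering-.*)$`)

// excluded reports whether an alert falls under the exclusion: all three
// conditions must hold, mirroring the selector on the right side of `unless`.
func excluded(l Labels) bool {
	return l["alertname"] == "KubeJobFailed" &&
		l["namespace"] == "openshift-insights" &&
		jobRe.MatchString(l["job_name"])
}

// unexpectedAlerts keeps every firing alert that is not excluded; this is the
// set the e2e test would report as a failure.
func unexpectedAlerts(firing []Labels) []Labels {
	var out []Labels
	for _, l := range firing {
		if !excluded(l) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	cases := []struct {
		name   string
		alert  Labels
		ignore bool
	}{
		{"insights periodic gathering", Labels{"alertname": "KubeJobFailed", "namespace": "openshift-insights", "job_name": "periodic-gathering-abc12"}, true},
		{"KubeJobFailed elsewhere", Labels{"alertname": "KubeJobFailed", "namespace": "openshift-etcd", "job_name": "periodic-gathering-abc12"}, false},
		{"other job in insights", Labels{"alertname": "KubeJobFailed", "namespace": "openshift-insights", "job_name": "some-other-job"}, false},
		{"other alert in insights", Labels{"alertname": "KubePodCrashLooping", "namespace": "openshift-insights"}, false},
	}
	for _, c := range cases {
		ignored := len(unexpectedAlerts([]Labels{c.alert})) == 0
		fmt.Printf("%-28s ignored=%v want=%v\n", c.name, ignored, c.ignore)
	}
}
```

Only the first case is filtered out; the other three would still surface as unexpected firing alerts, which is the behavior the test plan asks to confirm.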

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Updated Prometheus alert exclusion logic in test queries to refine alert handling for specific monitoring scenarios.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@neisw
Contributor

neisw commented Feb 26, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Feb 26, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw, petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@petr-muller
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Feb 26, 2026
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@petr-muller
Member Author

No signs of trouble in the jobs

/verified by CI

@openshift-ci-robot openshift-ci-robot added the verified label (signifies that the PR passed pre-merge verification criteria) Feb 26, 2026
@openshift-ci-robot

@petr-muller: This PR has been marked as verified by CI.


In response to this:

No signs of trouble in the jobs

/verified by CI


@openshift-trt

openshift-trt bot commented Feb 26, 2026

Job Failure Risk Analysis for sha: 18f455a

  • pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6: failure risk IncompleteTests. Tests for this run (25) are below the historical average (2350), so not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems.

@petr-muller
Member Author

petr-muller commented Feb 26, 2026

/override ci/prow/e2e-aws-ovn-fips

The job actually failed due to TRT-2560

@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-aws-ovn-fips


In response to this:

/override ci/prow/e2e-aws-ovn-fips

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@petr-muller
Member Author

/override ci/prow/e2e-gcp-ovn-upgrade ci/prow/e2e-metal-ipi-ovn-ipv6

@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-gcp-ovn-upgrade, ci/prow/e2e-metal-ipi-ovn-ipv6


In response to this:

/override ci/prow/e2e-gcp-ovn-upgrade ci/prow/e2e-metal-ipi-ovn-ipv6


@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

@petr-muller: all tests passed!



Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit e88b30e into openshift:main Feb 26, 2026
21 checks passed
@petr-muller petr-muller deleted the allowlist-insights-periodic-gathering-kubejobfailed branch February 27, 2026 13:26

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria.


3 participants