OCPBUGS-86221: Update the well-known readiness controller to properly set cluster-operator conditions by everettraven · Pull Request #901 · openshift/cluster-authentication-operator

everettraven · 2026-05-27T17:19:28Z

As part of investigating OCPBUGS-86221 it was found that the likely culprit was the well-known readiness controller setting Progressing=True when a kube-apiserver endpoint was not yet accessible during a node reboot being done by the MCO.

This PR attempts to resolve this by aligning the status conditions that are set by this controller to better align with the status condition type meanings as outlined in https://github.com/openshift/api/blob/09730f85d8835712c0412a585685ddadb01e171b/config/v1/types_cluster_operator.go#L151.

To facilitate this, the following changes were made:

Available gets set to True when any of the kube-apiservers returns a valid response for the well-known endpoint.
Progressing is replaced by DegradationObserved when any of the kube-apiservers is not sufficiently serving their well-known endpoint. This controller has no operands, making Progressing a condition it should not be setting.
Degraded is set after 5 minutes of a given DegradationObserved state continuing to be observed.

To ensure the behavior is working as expected, additional unit tests were added to exercise this new behavior.

Summary by CodeRabbit

New Features
- Introduced time-based controller degradation tracking and reporting for clearer health visibility
- Refined well-known endpoint readiness checks with improved aggregation and error classification
Bug Fixes / Behavior Changes
- Replaced previous "progressing" condition with a "degradation observed" condition for readiness reporting
Tests
- Added comprehensive unit and end-to-end tests covering endpoint scenarios and degradation timeouts

openshift-ci-robot · 2026-05-27T17:19:35Z

@everettraven: This pull request references Jira Issue OCPBUGS-86221, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

As part of investigating OCPBUGS-86221 it was found that the likely culprit was the well-known readiness controller setting Progressing=True when a kube-apiserver endpoint was not yet accessible during a node reboot being done by the MCO.

This PR attempts to resolve this by aligning the status conditions that are set by this controller to better align with the status condition type meanings as outlined in https://github.com/openshift/api/blob/09730f85d8835712c0412a585685ddadb01e171b/config/v1/types_cluster_operator.go#L151.

To facilitate this, the following changes were made:

Available gets set to True when any of the kube-apiservers returns a valid response for the well-known endpoint.

Progressing is replaced by DegradationObserved when any of the kube-apiservers is not sufficiently serving their well-known endpoint. This controller has no operands, making Progressing a condition it should not be setting.

Degraded is set after 5 minutes of a given DegradationObserved state continuing to be observed.

To ensure the behavior is working as expected, additional unit tests were added to exercise this new behavior.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-05-27T17:19:41Z

Walkthrough

This PR replaces the progressing-error model with ControllerDegradationObservedError, refactors well-known readiness to inject CA data and aggregate per-IP endpoint checks returning (available, error), updates controller status wiring to emit degradation-observed conditions for partial failures, and adds TLS-based tests exercising endpoint health and sync condition behavior.

Changes

Controller Degradation-Observed Error Model

Layer / File(s)	Summary
Remove ControllerProgressingError `pkg/controllers/common/conditions.go`	Deletes the old `ControllerProgressingError` type and related helpers; removes the `time` import.
ControllerDegradationObservedError type definition and tests `pkg/controllers/readiness/error.go`, `pkg/controllers/readiness/error_test.go`	Adds `ControllerDegradationObservedError` with reason, wrapped error, maxAge, `Error()/Unwrap()`, `ToCondition()`, `IsDegraded()`, and `ControllerDegradationObservedConditionName()`; updates unit tests to validate degraded timing logic.
Dependency injection and wiring `pkg/controllers/readiness/wellknown_ready_controller.go`	Introduces `oidcAvailabler` and `caDataGetter` interfaces, updates controller struct and constructor to accept injected auth checker and default CA-data getter.
isWellknownEndpointsReady endpoint checking refactor `pkg/controllers/readiness/wellknown_ready_controller.go`	Rewrites `isWellknownEndpointsReady` to return `(bool, error)`, use injected CA data, build TLS transport, check each API server IP, aggregate per-IP errors with `errors.Join`, and return degradation-observed on partial failures.
Sync method integration with degradation conditions `pkg/controllers/readiness/wellknown_ready_controller.go`	Updates deferred operator status to apply `WellKnownAvailable` plus the degradation-observed condition; consumes `(available, err)` results, unwraps/constructs `ControllerDegradationObservedError` into conditions, and classifies partial failures as available-with-degradation.
Test infrastructure for TLS endpoint testing `pkg/controllers/readiness/wellknown_ready_controller_test.go`	Adds crypto/TLS helpers, fake CA-data getter, sequential HTTP handler, TLS test server bootstrap, and Service/Endpoints fixtures for HTTPS endpoint tests.
Endpoint readiness and sync test coverage `pkg/controllers/readiness/wellknown_ready_controller_test.go`, `test/e2e-oidc/external_oidc_test.go`	Adds `Test_wellKnownReadyController_isWellknownEndpointsReady` and `Test_wellKnownReadyController_sync` table-driven tests covering healthy, partial, total failure, and stale-metadata scenarios; updates e2e expected condition name to `WellKnownReadyControllerDegradationObserved`.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	Tests lack context timeouts on operations that interact with the cluster (sync, endpoint checks) and sequentialHandler has unguarded index access that can panic with extra requests.	Add context.WithTimeout() wrapping around context.Background() in both test functions; fix sequentialHandler.ServeHTTP() to guard against index out of bounds as suggested in review comments.

✅ Passed checks (13 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main change: updating the well-known readiness controller to set proper cluster-operator conditions, moving from Progressing to DegradationObserved semantics. The title is specific, concise, and directly reflects the core objective.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names are static and deterministic without dynamic values, UUIDs, timestamps, IPs, or generated suffixes.
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added. PR contains only standard Go unit tests and modifies existing e2e tests using the testing package, not Ginkgo.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added. PR contains standard Go unit tests and updates to existing tests only.
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies only Go controller logic with no deployment manifests or scheduling constraints. No pod affinity, anti-affinity, topology spread constraints, or topology-dependent logic introduced.
Ote Binary Stdout Contract	✅ Passed	PR does not modify OTE binary or packages it imports. Changed files are controller code and non-OTE tests with no process-level stdout writes.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added in this PR. Tests added use standard Go testing package, not Ginkgo, so check does not apply.
No-Weak-Crypto	✅ Passed	No weak crypto algorithms found. Uses only modern crypto: ECDSA P256, RSA-256, TLS, X.509. No custom crypto or insecure secret comparisons.
Container-Privileges	✅ Passed	PR modifies only Go source files; no Kubernetes or container manifest files changed, so container-privilege check is not applicable.
No-Sensitive-Data-In-Logs	✅ Passed	No passwords, tokens, API keys, PII, or other sensitive data are exposed in logs. Internal cluster IPs in diagnostic messages are standard operator practice.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-05-27T17:20:42Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gangwgr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2026-05-27T17:21:17Z

@everettraven: This pull request references Jira Issue OCPBUGS-86221, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Details

In response to this:

As part of investigating OCPBUGS-86221 it was found that the likely culprit was the well-known readiness controller setting Progressing=True when a kube-apiserver endpoint was not yet accessible during a node reboot being done by the MCO.

This PR attempts to resolve this by aligning the status conditions that are set by this controller to better align with the status condition type meanings as outlined in https://github.com/openshift/api/blob/09730f85d8835712c0412a585685ddadb01e171b/config/v1/types_cluster_operator.go#L151.

To facilitate this, the following changes were made:

Available gets set to True when any of the kube-apiservers returns a valid response for the well-known endpoint.

Progressing is replaced by DegradationObserved when any of the kube-apiservers is not sufficiently serving their well-known endpoint. This controller has no operands, making Progressing a condition it should not be setting.

Degraded is set after 5 minutes of a given DegradationObserved state continuing to be observed.

To ensure the behavior is working as expected, additional unit tests were added to exercise this new behavior.

Summary by CodeRabbit

New Features

Enhanced controller degradation tracking with time-based monitoring windows

Improved operator condition reporting for better visibility into controller health status

Refined well-known endpoint readiness checks with comprehensive error handling

Tests

Added comprehensive end-to-end controller readiness testing scenarios

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/controllers/readiness/wellknown_ready_controller_test.go (1)

51-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle http.ResponseWriter.Write errors instead of discarding them.

Line 51 and Line 222 ignore Write errors, which violates the Go guideline and can mask handler failures in tests.

Proposed fix

 func (trt *testServer) ServeHTTP(w http.ResponseWriter, req *http.Request) {
 	if trt.responseDelay > 0 {
 		time.Sleep(trt.responseDelay)
 	}

 	w.WriteHeader(trt.sendStatus)
-	w.Write(trt.sendData)
+	if _, err := w.Write(trt.sendData); err != nil {
+		return
+	}
 }
@@
 func (h *sequentialHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 	h.mu.Lock()
 	resp := h.responses[h.idx]
 	h.idx++
 	h.mu.Unlock()
 	w.WriteHeader(resp.status)
-	w.Write(resp.body)
+	if _, err := w.Write(resp.body); err != nil {
+		return
+	}
 }

As per coding guidelines, **/*.go: Go security (prodsec-skills): "Never ignore error returns".

Also applies to: 221-223

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/readiness/wellknown_ready_controller_test.go` around lines 51
- 52, The test currently discards the error returned by
http.ResponseWriter.Write when writing trt.sendData; update the test to capture
that error and fail the test on non-nil (e.g., if _, err :=
w.Write(trt.sendData); err != nil { t.Fatalf("write failed: %v", err) } or use
require.NoError(t, err)). Locate the occurrences that write trt.sendData (the
Write call in the readiness test helper and the other similar write) and replace
the ignored writes with error-checked writes so any handler write failures cause
the test to fail.

🧹 Nitpick comments (2)

pkg/controllers/readiness/wellknown_ready_controller_test.go (1)

216-220: ⚡ Quick win

Guard sequentialHandler against out-of-range response indexing.

Line 218 can panic when requests exceed configured responses (e.g., retry behavior changes), making tests brittle.

Proposed fix

 func (h *sequentialHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 	h.mu.Lock()
-	resp := h.responses[h.idx]
-	h.idx++
+	if len(h.responses) == 0 {
+		h.mu.Unlock()
+		w.WriteHeader(http.StatusInternalServerError)
+		return
+	}
+	resp := h.responses[len(h.responses)-1]
+	if h.idx < len(h.responses) {
+		resp = h.responses[h.idx]
+		h.idx++
+	}
 	h.mu.Unlock()
 	w.WriteHeader(resp.status)
 	if _, err := w.Write(resp.body); err != nil {
 		return
 	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/readiness/wellknown_ready_controller_test.go` around lines
216 - 220, The ServeHTTP method on sequentialHandler currently indexes
h.responses by h.idx without bounds checking which can panic; update ServeHTTP
to acquire the mutex, check if h.idx is >= len(h.responses) (or len==0) before
accessing, and if out-of-range either cap h.idx to len(h.responses)-1 (so the
last configured response is reused) or return a safe default response (e.g.,
500) — then increment or leave the index appropriately; make this change inside
the sequentialHandler.ServeHTTP function to prevent panics when requests exceed
configured responses.

pkg/controllers/readiness/wellknown_ready_controller.go (1)

294-307: ⚡ Quick win

defer cancel() inside loop accumulates deferred calls.

The defer cancel() on line 296 runs inside the retry loop, causing deferred cancellations to accumulate until the function returns. Call cancel() explicitly after each iteration or use a closure.

♻️ Proposed fix

 	for i := 0; i < 3; i++ {
 		reqCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
-		defer cancel()
 
 		req := req.WithContext(reqCtx)
 		resp, err = rt.RoundTrip(req)
 		if err != nil {
 			connError = fmt.Errorf("failed to GET kube-apiserver oauth endpoint %s: %w%s", wellKnown, err, wellKnownRoundtripErrorHint(err))
 			klog.V(2).Info(connError)
+			cancel()
 			continue
 		}
 		connError = nil
-		defer resp.Body.Close()
+		cancel()
 		break
 	}
+	if resp != nil {
+		defer resp.Body.Close()
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/readiness/wellknown_ready_controller.go` around lines 294 -
307, The loop creates a context with `reqCtx, cancel := context.WithTimeout(ctx,
5*time.Second)` and uses `defer cancel()` inside the retry loop causing deferred
cancels to accumulate; change this by removing `defer cancel()` and calling
`cancel()` explicitly at the end of each iteration (or wrap the loop body in an
anonymous func so `defer cancel()` is scoped per-iteration) where `reqCtx` and
`cancel` are created (symbols: `reqCtx`, `cancel`, `req.WithContext(reqCtx)`,
`rt.RoundTrip(req)`) to ensure the timeout context is released immediately after
each roundtrip attempt.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/readiness/error.go`:
- Around line 45-53: The IsDegraded method on ControllerDegradationObservedError
doesn't guard against a nil lastStatus and will panic when dereferencing
lastStatus.Conditions; update ControllerDegradationObservedError.IsDegraded to
return false immediately if the lastStatus parameter is nil before calling
v1helpers.FindOperatorCondition(…) or using
ControllerDegradationObservedConditionName(controllerName), ensuring callers
that pass a nil operator status are safely handled.

In `@pkg/controllers/readiness/wellknown_ready_controller.go`:
- Around line 195-203: The current type assertion on err for
*ControllerDegradationObservedError can miss wrapped errors (e.g., errors.Join),
so replace the direct assertion with errors.As to unwrap and find a
*ControllerDegradationObservedError instance; then call
degradationErr.IsDegraded(controllerName, operatorStatus) and return
degradationErr.Unwrap() when degraded, otherwise set degradationObserved =
degradationErr.ToCondition(controllerName) and return nil, preserving the
existing logic that used Unwrap, ToCondition, controllerName, operatorStatus and
degradationObserved.

---

Outside diff comments:
In `@pkg/controllers/readiness/wellknown_ready_controller_test.go`:
- Around line 51-52: The test currently discards the error returned by
http.ResponseWriter.Write when writing trt.sendData; update the test to capture
that error and fail the test on non-nil (e.g., if _, err :=
w.Write(trt.sendData); err != nil { t.Fatalf("write failed: %v", err) } or use
require.NoError(t, err)). Locate the occurrences that write trt.sendData (the
Write call in the readiness test helper and the other similar write) and replace
the ignored writes with error-checked writes so any handler write failures cause
the test to fail.

---

Nitpick comments:
In `@pkg/controllers/readiness/wellknown_ready_controller_test.go`:
- Around line 216-220: The ServeHTTP method on sequentialHandler currently
indexes h.responses by h.idx without bounds checking which can panic; update
ServeHTTP to acquire the mutex, check if h.idx is >= len(h.responses) (or
len==0) before accessing, and if out-of-range either cap h.idx to
len(h.responses)-1 (so the last configured response is reused) or return a safe
default response (e.g., 500) — then increment or leave the index appropriately;
make this change inside the sequentialHandler.ServeHTTP function to prevent
panics when requests exceed configured responses.

In `@pkg/controllers/readiness/wellknown_ready_controller.go`:
- Around line 294-307: The loop creates a context with `reqCtx, cancel :=
context.WithTimeout(ctx, 5*time.Second)` and uses `defer cancel()` inside the
retry loop causing deferred cancels to accumulate; change this by removing
`defer cancel()` and calling `cancel()` explicitly at the end of each iteration
(or wrap the loop body in an anonymous func so `defer cancel()` is scoped
per-iteration) where `reqCtx` and `cancel` are created (symbols: `reqCtx`,
`cancel`, `req.WithContext(reqCtx)`, `rt.RoundTrip(req)`) to ensure the timeout
context is released immediately after each roundtrip attempt.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1f880a7b-0e29-40d2-9a3c-6aab5fde03d6

📥 Commits

Reviewing files that changed from the base of the PR and between 67a3b6a and 9f9865b.

📒 Files selected for processing (3)

pkg/controllers/readiness/error.go
pkg/controllers/readiness/wellknown_ready_controller.go
pkg/controllers/readiness/wellknown_ready_controller_test.go

liouk · 2026-05-29T08:13:19Z

nit: Would it be worth replacing ioutil with io since the former is deprecated, since we're touching this file?

Done in the latest commit

liouk · 2026-05-29T08:18:38Z

+		}
+		degradationObserved = degradationObserved.
 			WithStatus(operatorv1.ConditionTrue)
-		if progressingErr, ok := err.(*common.ControllerProgressingError); ok {


nit: common.ControllerProgressingError doesn't look like it's used anymore; maybe worth cleaning up?

Ah, good catch. I didn't even check with it being in the common package - just assumed something else would be using it.

Done in the latest push.

liouk · 2026-05-29T08:18:53Z

 	available := applyoperatorv1.OperatorCondition().
 		WithType("WellKnownAvailable")
-	progressing := applyoperatorv1.OperatorCondition().
-		WithType(common.ControllerProgressingConditionName(controllerName))


nit: common.ControllerProgressingConditionName doesn't look like it's used anymore; maybe worth cleaning up?

Done in the latest push.

liouk · 2026-05-29T08:29:18Z

+		return false, errors.Join(errs...)
+	}
+
+	// TODO(everettraven): verify that this is no longer necessary.


Are you planning in verifying this in the scope of this PR, or at a later stage? To me it also looks like this is no longer necessary; if I understand the alreadyTrueOnce guard correctly, we won't need it any more because we are now going available when at least one endpoint is healthy, and if some are still rolling out, we're capturing that with the new partial-failure model.

If you're planning on revisiting this after this PR merges, maybe it'd be better to drop the code already and open an issue to track the verification.

I'd like to verify this within the scope of this PR by running some payload jobs to verify that this change won't trigger any regressions in existing tests (once I tackle additional work to take care of existing CI failures here)

…DegradationObserved' so that the WellKnownReadyController no longer uses the 'Progressing' suffixed condition type that causes the overall cluster operator 'Progressing' condition to be set to True. This prevents the WellKnownReadyController from flapping the overall 'Progressing' condition because it does not actually have any operands. Instead, to maintain the eventual setting of the Degraded condition, we replace it with WellKnownReadyControllerDegradationObserved and monitor that condition state to determine if we should flip the overall cluster operator to Degraded after some time. Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

…ast one well-known endpoint is available so that we align with the Available condition semantics that define availability as being accessible but can also be in a degraded state. Availability should not flap based on temporary degradation like a node going down for a reboot. Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

…ting implementation Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

@everettraven

…tsReady and sync methods. Implementation done/assisted by Claude Code and reviewed for accurracy/correctness by @everettraven Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

everettraven · 2026-05-29T19:33:26Z

/testwith openshift/cluster-authentication-operator/e2e-agnostic openshift/origin#31236

openshift-ci · 2026-05-29T19:33:30Z

@everettraven, testwith: Error processing request. ERROR:

could not determine job runs: requested job is invalid. needs to be formatted like: <org>/<repo>/<branch>/<variant?>/<job>. instead it was: openshift/cluster-authentication-operator/e2e-agnostic

everettraven · 2026-05-29T19:34:02Z

/testwith openshift/cluster-authentication-operator/master/e2e-agnostic openshift/origin#31236

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

pkg/controllers/readiness/wellknown_ready_controller.go (1)

195-203: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use errors.As instead of direct type assertion for degradation-observed errors.

err.(*ControllerDegradationObservedError) misses wrapped/joined cases (Line 255), so maxAge-based degradation gating can be skipped when all endpoint errors are joined. Use errors.As before deciding immediate return.

Suggested fix

-		if degradationErr, ok := err.(*ControllerDegradationObservedError); ok {
+		var degradationErr *ControllerDegradationObservedError
+		if errors.As(err, &degradationErr) {
 			if degradationErr.IsDegraded(controllerName, operatorStatus) {
 				return degradationErr.Unwrap()
 			}
 			degradationObserved = degradationErr.ToCondition(controllerName)
 			return nil

Also applies to: 254-256, 368-368

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/readiness/wellknown_ready_controller.go` around lines 195 -
203, The code currently uses a direct type assertion
err.(*ControllerDegradationObservedError) which fails to detect wrapped
instances; replace those assertions with errors.As to properly handle
wrapped/joined errors: in the blocks that reference
ControllerDegradationObservedError (the branches that call
IsDegraded(controllerName, operatorStatus), ToCondition(controllerName) and
Unwrap()), use var degradationErr *ControllerDegradationObservedError; if
errors.As(err, &degradationErr) { ... } else { return err } so IsDegraded,
ToCondition and Unwrap are invoked on the extracted degradationErr and
degradationObserved is set the same way; apply this change to the three
occurrences that currently use the direct type assertion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/readiness/wellknown_ready_controller_test.go`:
- Around line 216-220: The ServeHTTP implementation in sequentialHandler reads
h.responses[h.idx] without bounds checking which can panic if more requests
arrive than entries; update ServeHTTP (method ServeHTTP on type
sequentialHandler) to guard the index by checking h.idx against len(h.responses)
while holding h.mu and use a fallback response (e.g., the last response or a
default) when h.idx is out of range, incrementing/clamping h.idx appropriately
to keep behavior deterministic and avoid panics under extra requests.

---

Duplicate comments:
In `@pkg/controllers/readiness/wellknown_ready_controller.go`:
- Around line 195-203: The code currently uses a direct type assertion
err.(*ControllerDegradationObservedError) which fails to detect wrapped
instances; replace those assertions with errors.As to properly handle
wrapped/joined errors: in the blocks that reference
ControllerDegradationObservedError (the branches that call
IsDegraded(controllerName, operatorStatus), ToCondition(controllerName) and
Unwrap()), use var degradationErr *ControllerDegradationObservedError; if
errors.As(err, &degradationErr) { ... } else { return err } so IsDegraded,
ToCondition and Unwrap are invoked on the extracted degradationErr and
degradationObserved is set the same way; apply this change to the three
occurrences that currently use the direct type assertion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3c2cb4a6-0db0-44b2-9b69-09e785682a43

📥 Commits

Reviewing files that changed from the base of the PR and between 9f9865b and dbdad61.

📒 Files selected for processing (6)

pkg/controllers/common/conditions.go
pkg/controllers/readiness/error.go
pkg/controllers/readiness/error_test.go
pkg/controllers/readiness/wellknown_ready_controller.go
pkg/controllers/readiness/wellknown_ready_controller_test.go
test/e2e-oidc/external_oidc_test.go

💤 Files with no reviewable changes (1)

pkg/controllers/common/conditions.go

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/controllers/readiness/error.go

coderabbitai · 2026-05-29T19:47:25Z

+func (h *sequentialHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
+	h.mu.Lock()
+	resp := h.responses[h.idx]
+	h.idx++
+	h.mu.Unlock()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard sequentialHandler response index to avoid panic.

If the handler gets more requests than responses, h.responses[h.idx] panics. Add a bounds fallback to keep tests deterministic.

Suggested fix

func (h *sequentialHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { h.mu.Lock() - resp := h.responses[h.idx] - h.idx++ + idx := h.idx + if idx >= len(h.responses) { + idx = len(h.responses) - 1 + } + resp := h.responses[idx] + if h.idx < len(h.responses) { + h.idx++ + } h.mu.Unlock() w.WriteHeader(resp.status) if _, err := w.Write(resp.body); err != nil { return } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (h *sequentialHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {

h.mu.Lock()

resp := h.responses[h.idx]

h.idx++

h.mu.Unlock()

func (h *sequentialHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {

h.mu.Lock()

idx := h.idx

if idx >= len(h.responses) {

idx = len(h.responses) - 1

}

resp := h.responses[idx]

if h.idx < len(h.responses) {

h.idx++

}

h.mu.Unlock()

w.WriteHeader(resp.status)

if _, err := w.Write(resp.body); err != nil {

return

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/readiness/wellknown_ready_controller_test.go` around lines 216 - 220, The ServeHTTP implementation in sequentialHandler reads h.responses[h.idx] without bounds checking which can panic if more requests arrive than entries; update ServeHTTP (method ServeHTTP on type sequentialHandler) to guard the index by checking h.idx against len(h.responses) while holding h.mu and use a fallback response (e.g., the last response or a default) when h.idx is out of range, incrementing/clamping h.idx appropriately to keep behavior deterministic and avoid panics under extra requests.

openshift-ci · 2026-05-29T23:21:09Z

@everettraven: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-agnostic	`dbdad61`	link	true	`/test e2e-agnostic`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 27, 2026

openshift-ci Bot requested review from flavianmissi and p0lyn0mial May 27, 2026 17:20

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Comment thread pkg/controllers/readiness/error.go

Comment thread pkg/controllers/readiness/wellknown_ready_controller.go

liouk reviewed May 29, 2026

View reviewed changes

everettraven mentioned this pull request May 29, 2026

OCPBUGS-86221: update authentication operator conditions to no longer expect WellKnownReadyProgressing openshift/origin#31236

Open

everettraven force-pushed the bugfix/wellknownready-status branch 2 times, most recently from b097b41 to 455035e Compare May 29, 2026 19:25

everettraven added 7 commits May 29, 2026 15:29

refactor: controllers/readiness: prefactoring for additional unit tes…

c413c40

…ting implementation Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

testing: controllers/readiness: add unit tests for isWellKnownEndpoin…

6aa73a0

…tsReady and sync methods. Implementation done/assisted by Claude Code and reviewed for accurracy/correctness by @everettraven Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

test: e2e-oidc: update expected conditions

dd9e7cc

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

common: remove unused common.ControllerProgressingError type and tests

5ee57da

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

chore: migrate deprecated ioutil package usage to io package

dbdad61

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

everettraven force-pushed the bugfix/wellknownready-status branch from 455035e to dbdad61 Compare May 29, 2026 19:30

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Conversation

everettraven commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci-robot commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (2 warnings)

Uh oh!

openshift-ci Bot commented May 27, 2026

Uh oh!

openshift-ci-robot commented May 27, 2026

Summary by CodeRabbit

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

everettraven commented May 29, 2026

Uh oh!

openshift-ci Bot commented May 29, 2026

Uh oh!

everettraven commented May 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

everettraven commented May 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading