Skip to content

OCPBUGS-33994: Don't unnecessarily re-reconcile when secrets are externally managed#1042

Open
dlom wants to merge 1 commit into
openshift:masterfrom
dlom:OCPBUGS-33994
Open

OCPBUGS-33994: Don't unnecessarily re-reconcile when secrets are externally managed#1042
dlom wants to merge 1 commit into
openshift:masterfrom
dlom:OCPBUGS-33994

Conversation

@dlom

@dlom dlom commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

xref: OCPBUGS-33994

When the cloud-credential-operator is used in manual mode, and awsSTSIAMRoleARN is not present in a credentialRequest (such as core operators), it continually attempts to reconcile. This started happening when we began filtering the secrets. Because the manually managed secrets do not have the proper annotation, cloud-credential-operator is no longer able to see them. As a result, when attempting to see if the credentialRequest needs to be updated (needsupdate), it always returns true due to no secret found.

Summary by CodeRabbit

  • Bug Fixes
    • Improved secret existence verification to properly handle manually-managed credentials with fallback mechanism
    • Optimized credential synchronization workflow in secure token service environments, reducing unnecessary operations
    • Enhanced error resilience in credential request processing with better condition monitoring and early validation checks

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 22, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@dlom: This pull request references Jira Issue OCPBUGS-33994, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

xref: OCPBUGS-33994

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Walkthrough

The PR fixes STS credential handling in two files: the AWS actuator's Exists method gains a LiveClient fallback for cache-filtered secrets and a correct context pass-through; sync moves STS detection earlier with an early return for empty ARNs and simplifies the STS branch; and the credentials request controller adds a skip-sync guard for already-provisioned STS requests.

Changes

STS Credential Handling Fixes

Layer / File(s) Summary
Exists: context fix and LiveClient fallback
pkg/aws/actuator/actuator.go
Exists now passes the request ctx to a.Client.Get and, on IsNotFound, falls back to a.LiveClient.Get when non-nil, handling secrets absent from the cache due to label filters.
sync: early STS detection, early return for empty ARN, simplified STS branch
pkg/aws/actuator/actuator.go
sync moves IsTimedTokenCluster before needsUpdate and returns early when awsSTSIAMRoleARN is empty. The later STS branch removes the now-redundant IsTimedTokenCluster call and ignores the ARN decode error.
Controller: STS skip-sync guard in Reconcile
pkg/operator/credentialsrequest/credentialsrequest_controller.go
Adds an early-exit guard in the STS path of Reconcile: when the CredentialsRequest is not stale, was synced within syncPeriod, has no failure conditions, credentials exist, and it is already provisioned, the loop returns immediately with RequeueAfter: defaultRequeueTime.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preventing unnecessary reconciliation of externally-managed secrets, which is the core optimization across both modified files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 17 Ginkgo test names in test/extend/cloudcredential.go are stable and deterministic. No test uses fmt.Sprintf, string concatenation, or variables to build test titles. All names are static, des...
Test Structure And Quality ✅ Passed PR contains no test file modifications. Custom check for Ginkgo test quality is not applicable; only implementation files (actuator.go, credentialsrequest_controller.go) were changed.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests were added. The PR added e2e tests using sigs.k8s.io/e2e-framework (not Ginkgo), so the MicroShift compatibility check does not apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests added; PR only modifies application code files (actuator and controller), not test files.
Topology-Aware Scheduling Compatibility ✅ Passed This PR modifies only controller logic (actuator and credentials request controller) to optimize reconciliation behavior. No deployment manifests, scheduling constraints, pod affinity/anti-affinity...
Ote Binary Stdout Contract ✅ Passed The PR adds new files in pkg/aws/actuator and pkg/operator/credentialsrequest directories, and cmd/cloud-credential-tests-ext/main.go. All process-level code uses proper logging that writes to stde...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests. Changes are only to implementation files (actuator.go and credentialsrequest_controller.go), with no test code or Ginkgo patterns present. The custom...
No-Weak-Crypto ✅ Passed No weak crypto algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret/token comparisons detected in the changed files.
Container-Privileges ✅ Passed No privileged container configurations found. All containers use secure defaults: allowPrivilegeEscalation=false, capabilities dropped, runAsNonRoot=true, and seccompProfile=RuntimeDefault.
No-Sensitive-Data-In-Logs ✅ Passed All new logging statements in the PR log only non-sensitive data: boolean values, static messages, and standard Kubernetes paths. No passwords, tokens, API keys, PII, or credentials are exposed in...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested a review from jstuever June 22, 2026 22:20
@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dlom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2026
@dlom

dlom commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 22, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@dlom: This pull request references Jira Issue OCPBUGS-33994, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianping-shu

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: jianping-shu.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

@dlom: This pull request references Jira Issue OCPBUGS-33994, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianping-shu

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@dlom: This pull request references Jira Issue OCPBUGS-33994, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianping-shu

Details

In response to this:

xref: OCPBUGS-33994

When the cloud-credential-operator is used in manual mode, and awsSTSIAMRoleARN is not present in a credentialRequest (such as core operators), it continually attempts to reconcile. This started happening when we began filtering the secrets. Because the manually managed secrets do not have the proper annotation, cloud-credential-operator is no longer able to see them. As a result, when attempting to see if the credentialRequest needs to be updated (needsupdate), it always returns true due to no secret found.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: jianping-shu.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

@dlom: This pull request references Jira Issue OCPBUGS-33994, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianping-shu

In response to this:

xref: OCPBUGS-33994

When the cloud-credential-operator is used in manual mode, and awsSTSIAMRoleARN is not present in a credentialRequest (such as core operators), it continually attempts to reconcile. This started happening when we began filtering the secrets. Because the manually managed secrets do not have the proper annotation, cloud-credential-operator is no longer able to see them. As a result, when attempting to see if the credentialRequest needs to be updated (needsupdate), it always returns true due to no secret found.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 24.13793% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.01%. Comparing base (9d6f071) to head (3282afd).

Files with missing lines Patch % Lines
pkg/aws/actuator/actuator.go 30.43% 12 Missing and 4 partials ⚠️
...redentialsrequest/credentialsrequest_controller.go 0.00% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1042      +/-   ##
==========================================
- Coverage   47.06%   47.01%   -0.05%     
==========================================
  Files          97       97              
  Lines       12560    12578      +18     
==========================================
+ Hits         5911     5914       +3     
- Misses       5994     6008      +14     
- Partials      655      656       +1     
Files with missing lines Coverage Δ
...redentialsrequest/credentialsrequest_controller.go 45.76% <0.00%> (-0.38%) ⬇️
pkg/aws/actuator/actuator.go 67.15% <30.43%> (-0.64%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/aws/actuator/actuator.go (1)

361-363: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider reusing the ARN from the earlier decode instead of calling awsSTSIAMRoleARN twice.

The function awsSTSIAMRoleARN is called at line 336 and again at line 363 for the same cr. Since the first call already succeeded (otherwise we'd have returned), the second call is redundant. Consider hoisting the variable from the first call's scope so it can be reused here.

♻️ Proposed refactor to avoid redundant decode
 	if stsDetected {
-		awsSTSIAMRoleARN, err := awsSTSIAMRoleARN(minterv1.Codec, cr)
+		roleARN, err := awsSTSIAMRoleARN(minterv1.Codec, cr)
 		if err != nil {
 			return err
 		}
-		if awsSTSIAMRoleARN == "" {
+		if roleARN == "" {
 			logger.Debug("CredentialsRequest has no awsSTSIAMRoleARN, no reason to sync")
 			return nil
 		}
+		// Store for later use in the STS branch
+		_ = roleARN // will be used below
 	}
 
 	// ... needsUpdate check ...
 
 	if stsDetected {
 		logger.Debug("actuator detected STS enabled cluster, enabling STS secret brokering for CredentialsRequests providing an IAM Role ARN")
-		awsSTSIAMRoleARN, _ := awsSTSIAMRoleARN(minterv1.Codec, cr)
+		// roleARN was already decoded above

Note: This requires restructuring to keep roleARN in scope across both blocks, which may involve declaring it before the first if stsDetected block.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/aws/actuator/actuator.go` around lines 361 - 363, The function
awsSTSIAMRoleARN is being called redundantly on the same credential request
object cr at two different locations. Refactor by declaring the awsSTSIAMRoleARN
variable outside and before the first if stsDetected block where it is initially
computed, then reuse this variable in the second if stsDetected block instead of
calling awsSTSIAMRoleARN again. This eliminates the redundant function call
while keeping the ARN value available for both code paths that need it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/aws/actuator/actuator.go`:
- Around line 361-363: The function awsSTSIAMRoleARN is being called redundantly
on the same credential request object cr at two different locations. Refactor by
declaring the awsSTSIAMRoleARN variable outside and before the first if
stsDetected block where it is initially computed, then reuse this variable in
the second if stsDetected block instead of calling awsSTSIAMRoleARN again. This
eliminates the redundant function call while keeping the ARN value available for
both code paths that need it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c03ba1fd-9efd-4004-a1d5-30e86d86e43f

📥 Commits

Reviewing files that changed from the base of the PR and between 9d6f071 and 3282afd.

📒 Files selected for processing (2)
  • pkg/aws/actuator/actuator.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go

@dlom

dlom commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

/assign @jstuever

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@dlom: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security 3282afd link true /test security
ci/prow/e2e-aws-ovn 3282afd link true /test e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants