OCPBUGS-64688: use ObservedGeneration to determine Progressing status#1169

Open
sg00dwin wants to merge 1 commit into
openshift:mainfrom
sg00dwin:OCPBUGS-64688-progressing-true-during-node-reboot

Conversation

@sg00dwin
Member

@sg00dwin sg00dwin commented May 29, 2026

Summary

  • Replace IsAvailableAndUpdated with ObservedGeneration < Generation check for the Progressing condition
  • IsAvailableAndUpdated used UpdatedReplicas == Replicas, which falsely triggers during node reboots when pods are temporarily disrupted — even though the operator made no spec changes
  • ObservedGeneration < Generation only fires when the operator has updated the deployment spec and the deployment controller hasn't processed it yet

Root Cause

During node reboots (MCO draining nodes), console pods are terminated and replacements are created. During the ~30-second gap, UpdatedReplicas != Replicas, causing IsAvailableAndUpdated to return false and the operator to report Progressing=True. The operator made no changes — the disruption is external.

Why This Fix

ObservedGeneration < Generation precisely detects operator-initiated spec changes:

  • Node reboot (no spec change): Generation unchanged → Progressing=False ✓
  • Real upgrade (operator updates spec): Generation bumps → Progressing=True ✓
  • Fresh install: Generation=1, ObservedGeneration=0 → Progressing=True ✓
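
The three cases above can be sketched in a few lines of Go. This is a minimal illustration, not the operator's code: the `deployment` struct is a hypothetical stand-in that mirrors only the relevant fields of a Kubernetes Deployment, `progressingByReplicas` assumes the old check reduces to the replica comparison described in the summary, and the replica counts are illustrative rather than captured from a cluster.

```go
package main

import "fmt"

// deployment is a hypothetical stand-in mirroring only the fields of a
// Kubernetes Deployment that this sketch needs.
type deployment struct {
	Generation         int64 // metadata.generation, bumped on spec changes
	ObservedGeneration int64 // status.observedGeneration
	Replicas           int32 // status.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
}

// progressingByReplicas models the replica comparison that the description
// attributes to the old check (an assumption, simplified).
func progressingByReplicas(d deployment) bool {
	return d.UpdatedReplicas != d.Replicas
}

// progressingByGeneration models the new check: the spec has changed and the
// deployment controller has not yet observed the change.
func progressingByGeneration(d deployment) bool {
	return d.ObservedGeneration < d.Generation
}

func main() {
	scenarios := []struct {
		name string
		d    deployment
	}{
		// Illustrative values only.
		{"node reboot", deployment{Generation: 3, ObservedGeneration: 3, Replicas: 2, UpdatedReplicas: 1}},
		{"real upgrade", deployment{Generation: 4, ObservedGeneration: 3, Replicas: 2, UpdatedReplicas: 0}},
		{"fresh install", deployment{Generation: 1, ObservedGeneration: 0}},
	}
	for _, s := range scenarios {
		fmt.Printf("%-13s generation-based Progressing=%v\n", s.name, progressingByGeneration(s.d))
	}
	// Only the reboot case is compared against the old-style check here.
	fmt.Printf("node reboot   replica-based Progressing=%v\n", progressingByReplicas(scenarios[0].d))
}
```

In this sketch the reboot scenario is the only one where the two predicates disagree, which is the false positive the change targets.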

The Available condition (IsAvailable at line 231) independently monitors pod readiness and is unaffected.

Test plan

  • make builds successfully
  • make test-unit passes all tests
  • gofmt / go vet clean
  • CI e2e upgrade test "[Monitor:legacy-cvo-invariants][bz-Management Console] clusteroperator/console should stay Progressing=False while MCO is Progressing=True" should stop failing (currently ~8% failure rate on Sippy)

Assisted-by: Claude Code (Opus 4.6)

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced operator deployment progress tracking by refining how deployment generation state is observed and reported during rollouts and version changes.

@openshift-ci-robot added the jira/valid-reference and jira/invalid-bug labels on May 29, 2026
@openshift-ci-robot
Contributor

@sg00dwin: This pull request references Jira Issue OCPBUGS-64688, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.21.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 74f527c1-03f6-4715-a7f5-2b71ded6d5b1

📥 Commits

Reviewing files that changed from the base of the PR and between cba2bb3 and e6d0bb7.

📒 Files selected for processing (1)
  • pkg/console/operator/sync_v400.go
📜 Recent review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Follow Go coding standards and patterns documented in CONVENTIONS.md
Organize imports according to conventions documented in CONVENTIONS.md
Use gofmt to format Go code with standard formatting
Run go vet checks on all Go packages

Follow Go coding standards and patterns as documented in CONVENTIONS.md, including proper import organization

Organize Go code following the repository structure: main entry point in cmd/console/main.go, API constants in pkg/api/, operator command setup in pkg/cmd/operator/, and version command in pkg/cmd/version/

**/*.go: Use gofmt for formatting Go code
Follow standard Go naming conventions
Group imports in order: standard lib, 3rd party, kube/openshift, internal (marked with comments)
Use meaningful error messages with context in Go code
Set status conditions using status.Handle* functions with type prefixes (*Degraded, *Progressing, *Available, *Upgradeable)
Use typed errors and wrap errors to preserve stack context

Files:

  • pkg/console/operator/sync_v400.go

⚙️ CodeRabbit configuration file

**/*.go: Review Go code following OpenShift operator patterns.
See CONVENTIONS.md for coding standards and patterns.

Refer to the following skills based on CODE PATTERNS, not just file paths:

Refer to /controller-review when code contains:

  • Controller struct types (e.g., type *Controller struct)
  • func New*Controller( factory functions
  • factory.New().WithFilteredEventsInformers( pattern
  • .ToController( method calls
  • Sync(ctx context.Context, controllerContext factory.SyncContext) methods
  • operatorConfig.Spec.ManagementState checks
  • status.NewStatusHandler or status.Handle* functions

Refer to /sync-handler-review when code contains:

  • Main operator sync functions (e.g., sync_v400.go content)
  • Sequential resource syncing with early returns
  • Incremental reconciliation loops
  • Multiple resourceapply.Apply*() calls in sequence
  • Dependency ordering of ConfigMaps → Secrets → Service Accounts → RBAC → Services → Deployments → Routes
  • Feature gate conditional logic

Refer to /go-quality-review for all Go code to check:

  • Deprecated imports: ioutil.ReadFile, ioutil.WriteFile, ioutil.ReadAll
  • Deprecated patterns: Dial without DialContext
  • Error handling: missing %w in fmt.Errorf
  • Code smells: deep nesting (4+ levels), functions >100 lines
  • Magic values: unexplained numbers/strings
  • Context propagation: context.Background() instead of passed ctx
  • Missing godoc on exported functions

Files:

  • pkg/console/operator/sync_v400.go
{pkg,cmd}/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Use gofmt for code formatting on pkg and cmd directories

{pkg,cmd}/**/*.go: Format code using gofmt -w ./pkg ./cmd
Run go vet checks on all Go packages in ./pkg and ./cmd

Files:

  • pkg/console/operator/sync_v400.go
**/*sync*.go

📄 CodeRabbit inference engine (CONVENTIONS.md)

Implement sync loops (sync_v400) incrementally: start from zero, create/update missing requirements, and return to continue on next loop

Files:

  • pkg/console/operator/sync_v400.go
**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}

⚙️ CodeRabbit configuration file

**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}: Injection prevention (prodsec-skills):

  • SQL: parameterized queries only; no string concatenation
  • Command: no shell=True, os.system, or backtick exec with user input
  • LDAP/XPath: escape special characters in filters
  • Path traversal: canonicalize paths, reject ../
  • Deserialization: no pickle/yaml.load()/eval on untrusted data
  • Prototype pollution: no recursive merge of untrusted objects
  • Validate at trust boundaries with allow-lists, not deny-lists
  • Normalize Unicode and anchor regexes (^$); watch for ReDoS

Files:

  • pkg/console/operator/sync_v400.go
🔇 Additional comments (1)
pkg/console/operator/sync_v400.go (1)

214-221: LGTM!


Walkthrough

The change modifies the "Progressing" condition logic in SyncLoopRefresh within the console operator sync handler. Instead of checking whether the deployment is available and fully updated, the condition now checks whether the deployment controller has observed the latest generation of the deployment spec, and reports Progressing only while it has not.

Changes

Deployment Generation Observation Check

| Layer / File(s) | Summary |
| --- | --- |
| Generation Observation Check: pkg/console/operator/sync_v400.go | The "Progressing" condition in SyncLoopRefresh now fails with an error when the deployment's ObservedGeneration is less than its current Generation, replacing the previous IsAvailableAndUpdated check that reported replica counts and operator version. |

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Ipv6 And Disconnected Network Test Compatibility | ⚠️ Warning | New e2e tests have IPv6 and connectivity issues: metrics_test.go line 184 concatenates URLs without net.JoinHostPort (fails on IPv6); downloads_test.go requests external CLI binary URLs. | Use net.JoinHostPort for IPv6 URL construction in metrics_test.go; add [Skipped:Disconnected] tag to downloads_test.go TestDownloadsEndpoint for disconnected environments. |
| Test Structure And Quality | ❓ Inconclusive | The custom check requests review of Ginkgo test code quality, but this repository uses standard Go testing (testing package), not Ginkgo/Gomega framework. No Ginkgo tests exist in the codebase. | Clarify whether the check applies to standard Go tests (*_test.go) or only to Ginkgo-based tests (Describe/It blocks). This repository contains no Ginkgo tests. |
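
The IPv6 warning above turns on how a host and port are joined into a URL. The following is a small sketch of the difference, assuming a hypothetical `/metrics` URL and made-up addresses; the actual code in metrics_test.go is not shown in this conversation.

```go
package main

import (
	"fmt"
	"net"
)

// naiveURL joins host and port with plain formatting. With an IPv6 literal
// the result is ambiguous because the address itself contains colons.
func naiveURL(host, port string) string {
	return fmt.Sprintf("https://%s:%s/metrics", host, port)
}

// joinedURL uses net.JoinHostPort, which wraps IPv6 literals in brackets
// and leaves IPv4 addresses and hostnames unchanged.
func joinedURL(host, port string) string {
	return "https://" + net.JoinHostPort(host, port) + "/metrics"
}

func main() {
	fmt.Println(naiveURL("fd00::1", "8443"))   // https://fd00::1:8443/metrics
	fmt.Println(joinedURL("fd00::1", "8443"))  // https://[fd00::1]:8443/metrics
	fmt.Println(joinedURL("10.0.0.1", "8443")) // https://10.0.0.1:8443/metrics
}
```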
✅ Passed checks (13 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and accurately summarizes the main change: replacing IsAvailableAndUpdated logic with ObservedGeneration-based determination for the Progressing status. |
| Description check | ✅ Passed | The description provides comprehensive analysis, root cause, solution details, and test plan, but lacks some required template sections like browser conformance, setup/test cases structure, and reviewer assignments. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | The PR modifies sync_v400.go with no test naming changes. The test suite uses standard Go testing (t.Run), not Ginkgo, so the Ginkgo test naming check does not apply. |
| Microshift Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests. The repository uses standard Go testing package, and the changes are limited to operator logic in sync_v400.go with no new test additions. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added in this PR. The changes are limited to operator code in sync_v400.go. The check is not applicable to code-only modifications. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Change only modifies condition-reporting logic in sync_v400.go, not deployment manifests or scheduling. Deployments already use topology-aware scheduling via ShouldDeployHA() checks. |
| Ote Binary Stdout Contract | ✅ Passed | This PR modifies operational code in console-operator's sync_v400.go, not test binary infrastructure. OTE Binary Stdout Contract applies to test binaries, not operational components. |
| No-Weak-Crypto | ✅ Passed | No weak crypto patterns (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB) found. Changes involve deployment metadata comparison, not cryptography. |
| Container-Privileges | ✅ Passed | No privileged container settings found. All deployments use allowPrivilegeEscalation: false, runAsNonRoot: true, and drop all capabilities. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The PR changes only log Kubernetes deployment generation counters (integers), not passwords, tokens, API keys, PII, session IDs, or hostnames. |


@openshift-ci openshift-ci Bot requested review from TheRealJon and spadgett May 29, 2026 13:46
@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sg00dwin
Once this PR has been reviewed and has the lgtm label, please assign therealjon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sg00dwin
Member Author

/jira refresh

@openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label on May 29, 2026
@openshift-ci-robot
Contributor

@sg00dwin: This pull request references Jira Issue OCPBUGS-64688, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@sg00dwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-console | e6d0bb7 | link | true | /test e2e-aws-console |
| ci/prow/e2e-azure-ovn-upgrade | e6d0bb7 | link | true | /test e2e-azure-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sg00dwin
Member Author

/retest-required

@Leo6Leo
Contributor

Leo6Leo commented May 29, 2026

/cc @Leo6Leo

Spinning cluster to do QE currently

@openshift-ci openshift-ci Bot requested a review from Leo6Leo May 29, 2026 18:04
Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants