OCPBUGS-86588: vSphere boot image hot loop detection is non-functional due to stable template names#6094
OCPBUGS-86588: vSphere boot image hot loop detection is non-functional due to stable template names#6094djoshy wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@djoshy: This pull request references Jira Issue OCPBUGS-86588, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughRefactors hot-loop detection to derive comparison values from vSphere stream metadata, makes checks non-mutating, runs checks against the proposed MachineSet before patching, records state only after successful patches, and updates tests to the new check signature. ChangesMAPI Hot-Loop Detection Enhancement
🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Command failed Comment |
|
Skipping CI for Draft Pull Request. |
|
@djoshy: This pull request references Jira Issue OCPBUGS-86588, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
5943240 to
99aba52
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn-csi-vcf9 |
|
@djoshy: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/022744a0-59fa-11f1-8a7a-1a6b8917247f-0 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-vsphere-mco-disruptive |
|
@djoshy: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8f8d1540-59fa-11f1-81c8-01dd4b3aaccc-0 |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/controller/bootimage/vsphere_helpers.go (1)
237-243: ⚡ Quick winFetch only the VM properties this helper actually uses.
Passing
niltovm.Propertiesasks vCenter for the entiremo.VirtualMachine, but this path only reads the product version and disk backing. Narrowing the property list keeps this reconciliation check cheaper and avoids pulling unrelated fields on every pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/bootimage/vsphere_helpers.go` around lines 237 - 243, The vm.Properties call currently requests the entire mo.VirtualMachine by passing nil; change it to request only the needed properties (at minimum "summary.config.product.version" and the fields read by getDiskTypeFromExistingVM, e.g. "config.hardware.device") so the function still reads vmMo.Summary.Config.Product.Version and getDiskTypeFromExistingVM(vmMo) but without pulling unrelated fields; replace the nil argument with a []string literal listing these properties and add any additional specific property names that getDiskTypeFromExistingVM requires.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/bootimage/vsphere_helpers.go`:
- Around line 495-501: The current truncation can chop off the hash suffix in
the VM name; instead compute the hash suffix via ovaTemplateHash(ova.Sha256)
(including the leading dash), calculate allowedPrefix = 80 - len(hashSuffix),
truncate the base name (constructed from infraID and failureDomain.Name) to
allowedPrefix if needed, then set name = truncatedBase + hashSuffix so the
content-addressed hash is always preserved; update the len(name) > 80 branch
around the name assignment where infraID, failureDomain.Name and ovaTemplateHash
are used and replace the simple slice truncate with this suffix-preserving
truncation logic.
---
Nitpick comments:
In `@pkg/controller/bootimage/vsphere_helpers.go`:
- Around line 237-243: The vm.Properties call currently requests the entire
mo.VirtualMachine by passing nil; change it to request only the needed
properties (at minimum "summary.config.product.version" and the fields read by
getDiskTypeFromExistingVM, e.g. "config.hardware.device") so the function still
reads vmMo.Summary.Config.Product.Version and getDiskTypeFromExistingVM(vmMo)
but without pulling unrelated fields; replace the nil argument with a []string
literal listing these properties and add any additional specific property names
that getDiskTypeFromExistingVM requires.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 07e13907-706a-4655-960f-34d935c21b8d
📒 Files selected for processing (1)
pkg/controller/bootimage/vsphere_helpers.go
|
/payload-abort |
99aba52 to
f2279c6
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/bootimage/ms_helpers.go`:
- Around line 221-227: The code dereferences infra.Status.PlatformStatus without
checking it for nil, which can panic; update the conditional in the block that
reads infra, streamData and infra.Status.PlatformStatus.Type to also guard that
infra.Status.PlatformStatus != nil before comparing Type (i.e., ensure infra !=
nil && infra.Status.PlatformStatus != nil && streamData != nil &&
infra.Status.PlatformStatus.Type == osconfigv1.VSpherePlatformType), then
proceed to call streamData.GetArchitecture and read
streamArch.Artifacts["vmware"].Release to set value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 536d1541-8258-466a-a44a-98c5940a4aa5
📒 Files selected for processing (2)
pkg/controller/bootimage/boot_image_controller_test.gopkg/controller/bootimage/ms_helpers.go
fdafdad to
1ee90b9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/bootimage/ms_helpers.go`:
- Around line 209-214: The log "No patching required for MAPI machineset" is
executed even after a successful patch because control falls through when
patchRequired is true; update the logic in the function containing variables
patchRequired and machineSet so that the klog.Infof("No patching required for
MAPI machineset %s", machineSet.Name) line is only executed when patchRequired
is false (e.g., wrap it in an else branch or return immediately after a
successful patch), and ensure the hot-loop check via
ctrl.checkMAPIMachineSetHotLoop(newMachineSet, streamData, infra, arch) still
runs only when a patch was attempted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b4bd28e0-6ffc-45cd-ba2f-53578060e1a0
📒 Files selected for processing (2)
pkg/controller/bootimage/boot_image_controller_test.gopkg/controller/bootimage/ms_helpers.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/controller/bootimage/boot_image_controller_test.go
1ee90b9 to
35c629c
Compare
|
Actionable comments posted: 0 |
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
|
Actionable comments posted: 0 |
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn-csi-vcf9 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-vsphere-mco-disruptive |
|
@djoshy: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/938adf70-5aad-11f1-81ba-0b05c5412494-0 |
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn-csi |
|
@djoshy: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b3cdfa30-5abf-11f1-9f48-2fe9acf07461-0 |
35c629c to
dbea5e5
Compare
|
Actionable comments posted: 0 |
- What I did
vSphere template names are derived solely from
<infraID>-rhcos-<failureDomain.Name>, which never changes between upgrades. This meant the hot loop detector always saw identical provider spec bytes, making it unable to distinguish a real upgrade from a patch loop. Fixed by storing the RHCOS version associated with that update for hot loop detection for vSphere. All other platforms will use the traditional path of usingproviderSpecbytes. I have also broken down the hotloop detection check so the read and write happen in separate routines.- How to verify it
vSphere disruptive e2es should pass and not cause degrades when the installer artifacts are out of sync( https://redhat.atlassian.net/browse/OCPBUGS-86475 should stop happening, but this is not a fix for the underlying issue of the configmap bouncing, so I chose not to link this fix to that bug)
Summary by CodeRabbit