OCPBUGS-86418: sync CPMS template from master machine providerSpecs to prevent unintended rolling update#10569
Conversation
When users edit master machine manifests after `create manifests` (e.g. changing instanceType) but do not also edit the CPMS manifest, the ControlPlaneMachineSet template retains defaults. Post-install the CPMS operator detects drift between its template and the actual Machine providerSpecs, triggering an unintended rolling update that silently reverts the user's customization. Add a MasterCPMSSync asset that runs after Master asset generation. It compares provisioning-relevant AWS fields (instanceType, AMI, rootVolume, IAM profile, metadata auth, publicIP) between the first MAPI master machine and the CPMS template, syncing any drift and emitting a warning. Zone-specific fields (Subnet, Placement AZ) are intentionally excluded since they are managed by CPMS FailureDomains. Bug: https://issues.redhat.com/browse/OCPBUGS-86418 Co-authored-by: Cursor <cursoragent@cursor.com>
|
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86418, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @chdeshpa-hue. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Summary
When users customize master machine manifests after
openshift-install create manifests(e.g. changing
instanceTypefromm6i.xlargetom6i.4xlarge) but do not also edit theControlPlaneMachineSet (CPMS) manifest, the CPMS template retains defaults. Post-install,
the CPMS operator detects drift between its template and the actual Machine providerSpecs,
triggering an unintended rolling update that silently reverts the user's customization.
This PR adds a
MasterCPMSSyncasset that runs afterMasterasset generation. It comparesprovisioning-relevant AWS fields between master machines and the CPMS template, syncing any
drift and emitting a warning.
Fields synced:
instanceType,ami.id,rootVolume(size, type, iops, encrypted, kmsKey),iamInstanceProfile,metadataServiceOptions.authentication,publicIPFields intentionally excluded:
subnet,placement.availabilityZone— these are managedby CPMS FailureDomains and are expected to differ per-machine.
Bug
https://issues.redhat.com/browse/OCPBUGS-86418
Root Cause
The installer generates the CPMS manifest as an independent asset from
install-configdefaults.When users edit per-machine manifests after
create manifests, the CPMS template is neverupdated to reflect those edits. Post-install, the CPMS operator reconciles machines against its
stale template, causing an unwanted rolling replacement.
Changes
pkg/asset/machines/mastercpmssync.goMasterCPMSSyncasset: decodes CPMS and first MAPI master providerSpecs, compares fields, syncs drift, emits warningspkg/asset/machines/mastercpmssync_test.gopkg/asset/cluster/cluster.goMasterCPMSSyncintoCluster.Dependencies()pkg/asset/ignition/bootstrap/common.goMasterCPMSSyncintoCommon.Dependencies()Test Plan
Unit tests
TestSyncCPMSAWSFields_NoDrift— no drift when specs matchTestSyncCPMSAWSFields_InstanceTypeDrift— instanceType syncedTestSyncCPMSAWSFields_RootVolumeDrift— volumeSize, volumeType, iops syncedTestSyncCPMSAWSFields_AMIDrift— AMI ID syncedTestSyncCPMSAWSFields_IAMProfileDrift— IAM instance profile syncedTestSyncCPMSAWSFields_MetadataAuthDrift— metadata auth syncedTestSyncCPMSAWSFields_MultiFieldDrift— multiple simultaneous driftsTestSyncCPMSAWSFields_KMSKeyDrift— KMS key syncedTestSyncCPMSAWSFields_PublicIPDrift— publicIP syncedTestSyncCPMSAWSFields_SubnetNotSynced— subnet excluded (FailureDomains)TestSyncCPMSAWSFields_PlacementNotSynced— AZ excluded (FailureDomains)TestDecodeCPMSProviderSpec_NilInput— nil input handlingTestDecodeCPMSProviderSpec_FromRawBytes— raw JSON byte decodingManual validation (AWS IPI, OCP 4.22-rc.4)
instanceType: m6i.4xlargeaftercreate manifestscreate clusterwith patched binaryRelated
Made with Cursor