OCPBUGS-86471: baremetal: fix hostSelector for default stream#10571
OCPBUGS-86471: baremetal: fix hostSelector for default stream#10571honza wants to merge 1 commit into
Conversation
Commit edd7753 ("baremetal: Add coreos.openshift.io/stream label to BMH and hostSelector") unconditionally set HostSelector.MatchLabels on BareMetalMachineProviderSpec to require coreos.openshift.io/stream=rhel-9 on every BareMetalHost. This works for installer-created BMHs which get the label stamped during installation, but breaks scaling and upgrade scenarios where MachineSets need to claim pre-existing BareMetalHosts from the CI pool that don't carry this label. Since MatchLabels is a hard requirement (the label must exist AND match), unlabeled hosts are never selected and provisioning hangs. Only set HostSelector when a non-default stream (e.g. rhel-10) is explicitly configured in the install-config. For the default rhel-9 case (or when osImageStream is unset), leave HostSelector empty so MachineSets can claim any available BMH regardless of labeling. The stream labels on installer-created BMHs are left unchanged — they remain informational and correct. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@honza: This pull request references Jira Issue OCPBUGS-86471, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
WalkthroughThe ChangesHostSelector Configuration Logic
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/jira refresh |
|
@honza: This pull request references Jira Issue OCPBUGS-86471, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@honza: This pull request references Jira Issue OCPBUGS-86471, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
@honza: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
|
||
| // Only set HostSelector when a non-default stream is explicitly configured. | ||
| // Unlabeled BMHs (e.g. spare hosts in a CI pool) won't match a selector, | ||
| // so we leave it empty for the default stream to preserve compatibility. |
There was a problem hiding this comment.
The problem with this is that it will freely pick up e.g. BMHs that are explicitly labelled for rhel-10 and happily try to install rhel-9 on them.
Ideally the semantics we want are to match any BMHs either with the label value we want or without any label of that type. That doesn't seem to be possible with the Selector API, as all of the match conditions must be true. We would have to change the API to allow multiple selectors. Or k8s needs to add a NotOut operator to selectors 🙂.
What we can do is use the NotEquals or NotIn operator to exclude the non-default value (i.e. currently rhel-10) - that does still match in the case where the label is not set. Unfortunately this is not super future-proof, as it requires us to enumerate all of the values we don't want to match.
The alternative would be to not apply the label to any hosts, and explicitly match hosts without the label. So that the default pool would always use unlabelled hosts and for the non-default pools you would have to label the hosts and add a selector to the pool. I kind of hate this because as you upgrade into a version with new default streams, the correct labelling continues to depend on the cluster's born-in version.
It looks like past-me was at least dimly aware of the problem:
The hardest part will be handling the default case that there is no label. Worst case scenario, we could run a small controller that adds the default label value to any BMHs that don't have the label. It would be better not to have to, but at least we know that a viable solution exists.
So many options, all fairly bad.
There was a problem hiding this comment.
What we can do is use the NotEquals or NotIn operator to exclude the non-default value (i.e. currently rhel-10) - that does still match in the case where the label is not set. Unfortunately this is not super future-proof, as it requires us to enumerate all of the values we don't want to match.
By the time we get to RHEL 11, maybe it will become less of a problem :)
Commit edd7753 ("baremetal: Add coreos.openshift.io/stream label to BMH and hostSelector") unconditionally set HostSelector.MatchLabels on BareMetalMachineProviderSpec to require coreos.openshift.io/stream=rhel-9 on every BareMetalHost. This works for installer-created BMHs which get the label stamped during installation, but breaks scaling and upgrade scenarios where MachineSets need to claim pre-existing BareMetalHosts from the CI pool that don't carry this label. Since MatchLabels is a hard requirement (the label must exist AND match), unlabeled hosts are never selected and provisioning hangs.
Only set HostSelector when a non-default stream (e.g. rhel-10) is explicitly configured in the install-config. For the default rhel-9 case (or when osImageStream is unset), leave HostSelector empty so MachineSets can claim any available BMH regardless of labeling.
The stream labels on installer-created BMHs are left unchanged — they remain informational and correct.
Summary by CodeRabbit
Bug Fixes