Skip to content

OCPBUGS-86471: baremetal: fix hostSelector for default stream#10571

Open
honza wants to merge 1 commit into
openshift:mainfrom
honza:host-labels
Open

OCPBUGS-86471: baremetal: fix hostSelector for default stream#10571
honza wants to merge 1 commit into
openshift:mainfrom
honza:host-labels

Conversation

@honza
Copy link
Copy Markdown
Member

@honza honza commented May 25, 2026

Commit edd7753 ("baremetal: Add coreos.openshift.io/stream label to BMH and hostSelector") unconditionally set HostSelector.MatchLabels on BareMetalMachineProviderSpec to require coreos.openshift.io/stream=rhel-9 on every BareMetalHost. This works for installer-created BMHs which get the label stamped during installation, but breaks scaling and upgrade scenarios where MachineSets need to claim pre-existing BareMetalHosts from the CI pool that don't carry this label. Since MatchLabels is a hard requirement (the label must exist AND match), unlabeled hosts are never selected and provisioning hangs.

Only set HostSelector when a non-default stream (e.g. rhel-10) is explicitly configured in the install-config. For the default rhel-9 case (or when osImageStream is unset), leave HostSelector empty so MachineSets can claim any available BMH regardless of labeling.

The stream labels on installer-created BMHs are left unchanged — they remain informational and correct.

Summary by CodeRabbit

Bug Fixes

  • Bug Fixes
    • Corrected bare metal host selection behavior to only apply custom filtering when explicitly configured, restoring compatibility with unlabeled bare metal hosts in deployments.

Commit edd7753 ("baremetal: Add coreos.openshift.io/stream label to
BMH and hostSelector") unconditionally set HostSelector.MatchLabels on
BareMetalMachineProviderSpec to require coreos.openshift.io/stream=rhel-9
on every BareMetalHost. This works for installer-created BMHs which get
the label stamped during installation, but breaks scaling and upgrade
scenarios where MachineSets need to claim pre-existing BareMetalHosts
from the CI pool that don't carry this label. Since MatchLabels is a
hard requirement (the label must exist AND match), unlabeled hosts are
never selected and provisioning hangs.

Only set HostSelector when a non-default stream (e.g. rhel-10) is
explicitly configured in the install-config. For the default rhel-9
case (or when osImageStream is unset), leave HostSelector empty so
MachineSets can claim any available BMH regardless of labeling.

The stream labels on installer-created BMHs are left unchanged — they
remain informational and correct.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 25, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@honza: This pull request references Jira Issue OCPBUGS-86471, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Commit edd7753 ("baremetal: Add coreos.openshift.io/stream label to BMH and hostSelector") unconditionally set HostSelector.MatchLabels on BareMetalMachineProviderSpec to require coreos.openshift.io/stream=rhel-9 on every BareMetalHost. This works for installer-created BMHs which get the label stamped during installation, but breaks scaling and upgrade scenarios where MachineSets need to claim pre-existing BareMetalHosts from the CI pool that don't carry this label. Since MatchLabels is a hard requirement (the label must exist AND match), unlabeled hosts are never selected and provisioning hangs.

Only set HostSelector when a non-default stream (e.g. rhel-10) is explicitly configured in the install-config. For the default rhel-9 case (or when osImageStream is unset), leave HostSelector empty so MachineSets can claim any available BMH regardless of labeling.

The stream labels on installer-created BMHs are left unchanged — they remain informational and correct.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 25, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a49c19e5-499d-4e00-b057-76482d1b0fc2

📥 Commits

Reviewing files that changed from the base of the PR and between 7a869bf and 7609e14.

📒 Files selected for processing (1)
  • pkg/asset/machines/baremetal/machines.go

Walkthrough

The provider function in baremetal machines is updated to conditionally set HostSelector based on the osImageStream configuration. The logic removes automatic defaulting and only applies HostSelector when an explicit non-default stream is specified, preserving compatibility with unlabeled Bare Metal Hosts.

Changes

HostSelector Configuration Logic

Layer / File(s) Summary
Conditional HostSelector setting based on explicit stream
pkg/asset/machines/baremetal/machines.go
The provider function removes unconditional stream derivation and default logic, replacing it with a conditional that sets config.HostSelector only when osImageStream is explicitly provided and differs from rhcos.DefaultOSImageStream, leaving HostSelector unset for the default case.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • openshift/installer#10502: Both PRs modify the baremetal machine provider and HostSelector logic to handle coreos.openshift.io/stream configuration based on OSImageStream defaults.

Suggested labels

approved, lgtm

Suggested reviewers

  • dtantsur
  • iurygregory
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing the hostSelector behavior for the default stream case in baremetal machines configuration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR modifies only non-test code (machines.go). Repository contains no Ginkgo tests. No test files were added or modified, so check is not applicable.
Test Structure And Quality ✅ Passed PR contains no Ginkgo tests. The test file (hosts_test.go) uses standard Go testing with testing.T, not Ginkgo framework. Check is not applicable.
Microshift Test Compatibility ✅ Passed PR modifies only non-test code (baremetal machine asset generation). No new Ginkgo e2e tests are added, so MicroShift compatibility check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR contains no new Ginkgo e2e tests; it only modifies baremetal configuration generation code in machines.go, making the SNO test compatibility check not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR changes baremetal machine provisioning, not pod scheduling. No affinity, topology spread, nodeSelector, or topology-dependent constraints are introduced.
Ote Binary Stdout Contract ✅ Passed This PR modifies the OpenShift Installer's bare metal configuration code, not OTE binary code. The OTE Binary Stdout Contract check applies only to test infrastructure binaries.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. The changes are limited to infrastructure code in pkg/asset/machines/baremetal/machines.go that modifies HostSelector configuration logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from elfosardo and iurygregory May 25, 2026 19:32
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign honza for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@honza
Copy link
Copy Markdown
Member Author

honza commented May 25, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 25, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@honza: This pull request references Jira Issue OCPBUGS-86471, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@honza: This pull request references Jira Issue OCPBUGS-86471, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Commit edd7753 ("baremetal: Add coreos.openshift.io/stream label to BMH and hostSelector") unconditionally set HostSelector.MatchLabels on BareMetalMachineProviderSpec to require coreos.openshift.io/stream=rhel-9 on every BareMetalHost. This works for installer-created BMHs which get the label stamped during installation, but breaks scaling and upgrade scenarios where MachineSets need to claim pre-existing BareMetalHosts from the CI pool that don't carry this label. Since MatchLabels is a hard requirement (the label must exist AND match), unlabeled hosts are never selected and provisioning hangs.

Only set HostSelector when a non-default stream (e.g. rhel-10) is explicitly configured in the install-config. For the default rhel-9 case (or when osImageStream is unset), leave HostSelector empty so MachineSets can claim any available BMH regardless of labeling.

The stream labels on installer-created BMHs are left unchanged — they remain informational and correct.

Summary by CodeRabbit

Bug Fixes

  • Bug Fixes
  • Corrected bare metal host selection behavior to only apply custom filtering when explicitly configured, restoring compatibility with unlabeled bare metal hosts in deployments.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@iurygregory
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 26, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

@honza: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn 7609e14 link true /test e2e-aws-ovn
ci/prow/e2e-metal-ipi-ovn 7609e14 link false /test e2e-metal-ipi-ovn
ci/prow/e2e-metal-single-node-live-iso 7609e14 link false /test e2e-metal-single-node-live-iso
ci/prow/e2e-metal-ipi-ovn-swapped-hosts 7609e14 link false /test e2e-metal-ipi-ovn-swapped-hosts
ci/prow/e2e-metal-ipi-ovn-virtualmedia 7609e14 link false /test e2e-metal-ipi-ovn-virtualmedia

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


// Only set HostSelector when a non-default stream is explicitly configured.
// Unlabeled BMHs (e.g. spare hosts in a CI pool) won't match a selector,
// so we leave it empty for the default stream to preserve compatibility.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is that it will freely pick up e.g. BMHs that are explicitly labelled for rhel-10 and happily try to install rhel-9 on them.

Ideally the semantics we want are to match any BMHs either with the label value we want or without any label of that type. That doesn't seem to be possible with the Selector API, as all of the match conditions must be true. We would have to change the API to allow multiple selectors. Or k8s needs to add a NotOut operator to selectors 🙂.

What we can do is use the NotEquals or NotIn operator to exclude the non-default value (i.e. currently rhel-10) - that does still match in the case where the label is not set. Unfortunately this is not super future-proof, as it requires us to enumerate all of the values we don't want to match.

The alternative would be to not apply the label to any hosts, and explicitly match hosts without the label. So that the default pool would always use unlabelled hosts and for the non-default pools you would have to label the hosts and add a selector to the pool. I kind of hate this because as you upgrade into a version with new default streams, the correct labelling continues to depend on the cluster's born-in version.

It looks like past-me was at least dimly aware of the problem:

The hardest part will be handling the default case that there is no label. Worst case scenario, we could run a small controller that adds the default label value to any BMHs that don't have the label. It would be better not to have to, but at least we know that a viable solution exists.

So many options, all fairly bad.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we can do is use the NotEquals or NotIn operator to exclude the non-default value (i.e. currently rhel-10) - that does still match in the case where the label is not set. Unfortunately this is not super future-proof, as it requires us to enumerate all of the values we don't want to match.

By the time we get to RHEL 11, maybe it will become less of a problem :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants