
SPLAT-2172: AWS dedicated host support #10079

Merged
openshift-merge-bot[bot] merged 5 commits into openshift:main from vr4manta:SPLAT-2172
Mar 20, 2026

Conversation

@vr4manta
Contributor

@vr4manta vr4manta commented Nov 13, 2025

SPLAT-2172

Changes

  • Bumped openshift/api
  • Added logic to allow HostPlacement of dedicated hosts

Dependencies

Notes

MAO and CAO changes are needed for it to fully work. For now, this PR is adding the ability to generate the needed outputs for bootstrapping.

Summary by CodeRabbit

  • New Features

  • Introduced hostPlacement (affinity + dedicated-host selection) across machine pool platforms to support AWS dedicated-host placement.
  • Improvements

    • Added install-time validations and feature-gate activation for dedicated-host usage, including region/zone and role constraints.
    • Threaded dedicated-host info into machine set provisioning, tagging during install, and release handling during cleanup.
    • Added required permissions for dedicated-host operations when applicable.
  • Tests

    • Added tests covering hostPlacement scenarios and feature-gate behavior.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 13, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 13, 2025

@vr4manta: This pull request references SPLAT-2172 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

SPLAT-2172

Changes

  • Bumped openshift/api
  • Added logic to allow HostPlacement of dedicated hosts

Dependencies

Notes

MAO and CAO changes are needed for it to fully work. For now, this PR is adding the ability to generate the needed outputs for bootstrapping.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tthvo
Member

tthvo commented Nov 14, 2025

/cc

@openshift-ci openshift-ci Bot requested a review from tthvo November 14, 2025 00:35
@vr4manta vr4manta force-pushed the SPLAT-2172 branch 3 times, most recently from 24500b6 to 342ab2d on November 17, 2025 17:44
@tthvo
Member

tthvo commented Nov 18, 2025

/retest-required

Member

@tthvo tthvo left a comment


Cool feature 👍 🚀

I have quite a few comments 😅 Sorry if I'm reviewing a little too soon...

Comment thread pkg/types/aws/validation/machinepool.go Outdated
switch *p.HostPlacement.Affinity {
case aws.HostAffinityAnyAvailable:
if p.HostPlacement.DedicatedHost != nil {
allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
Member


Did you mean hostPlacement.dedicatedHost is forbidden if affinity is AnyAvailable?

 allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))

Contributor Author


@tthvo So this was an interesting one. In openshift/api and openshift/machine-api-operator, it was suggested to error this way for this scenario. I was doing this to keep it consistent.

Now that said, I am happy to make your suggested change if you prefer the installer to phrase it this way. Just let me know.

Comment thread pkg/types/aws/validation/machinepool.go Outdated
Comment thread pkg/types/aws/validation/machinepool_test.go
Comment thread pkg/asset/installconfig/aws/validation.go Outdated
if err != nil {
allErrs = append(allErrs, field.InternalError(placementPath.Child("dedicatedHost"), err))
} else {
// Check the returned configured hosts to see if the dedicated hosts defined in install-config exists.
Member


💡 ❓ I got 2 questions here:

  1. Do the zones of dedicated hosts need to match machinepool zone field if defined?

    type MachinePool struct {
    // Zones is list of availability zones that can be used.
    //
    // +optional
    Zones []string `json:"zones,omitempty"`

  2. Do the user-input zones for dedicated hosts need to match with the actual zones returned from AWS?

    // If user specified a zone, validate it matches AWS
    if host.Zone != "" && host.Zone != hostDetails.Zone {
      allErrs = append(allErrs, field.Invalid(
          fldPath.Child("hostPlacement", "dedicatedHost").Index(i).Child("zone"),
          host.Zone,
          fmt.Sprintf("specified zone %s does not match actual host zone %s",
              host.Zone, hostDetails.Zone)))
    }

Contributor Author


I would expect them not to include one that's not specified (or, in the generic case, where we read all zones). So I can add logic when checking to see if it is one that we are interested in or not.

Contributor Author


I took a stab at adding logic to check whether the zone of a found dedicated host is valid for the pool config. I'm going to add another check shortly to make sure that if the user configured the dedicatedHost with a zone, it's actually correct as well.

Contributor


Do the user-input zones for dedicated hosts need to match with the actual zones returned from AWS?

On that point, I'm curious whether users should need to specify the zone for the dedicated host at all. It seems like we can just look up the dedicated host by ID and then use that to determine the zone (which the PR is already doing, IIUC). Seems like we might be able to shed zone from the install config.
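The lookup suggested here — deriving the zone from the host ID instead of asking the user for it — could look roughly like this. This is a hypothetical sketch, not the PR's actual code: `Host` stands in for the installer's own dedicated-host type, and the `hosts` map stands in for the result of an EC2 DescribeHosts call.

```go
package main

import "fmt"

// Host is a minimal stand-in for the installer's dedicated-host details.
type Host struct {
	ID   string
	Zone string
}

// zoneForHost resolves a dedicated host's availability zone purely from its
// ID, so the install config never needs a user-supplied zone field.
func zoneForHost(hosts map[string]Host, hostID string) (string, error) {
	h, ok := hosts[hostID]
	if !ok {
		return "", fmt.Errorf("dedicated host %s not found", hostID)
	}
	return h.Zone, nil
}

func main() {
	hosts := map[string]Host{
		"h-0123456789abcdef0": {ID: "h-0123456789abcdef0", Zone: "us-east-1a"},
	}
	zone, err := zoneForHost(hosts, "h-0123456789abcdef0")
	fmt.Println(zone, err)
}
```

With a lookup like this, a mistyped host ID surfaces as an explicit error rather than a silently wrong zone.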

Contributor Author


This logic was updated. We no longer have a field in the config to specify the zone. The validation logic does make sure that the configured DHs belong to a zone that the install-config is configured for.

Comment thread pkg/asset/machines/worker.go Outdated
Comment thread pkg/asset/machines/aws/awsmachines.go Outdated
Comment thread pkg/asset/machines/aws/machinesets.go Outdated
@vr4manta
Contributor Author

/hold
Need to rebase and add validation logic to make sure defined dedicated hosts are in zones based on the machinepool config.
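The validation described here — checking that every defined dedicated host sits in a zone the machinepool is configured for — might be sketched like this. The function and field names are illustrative, not the PR's actual identifiers (the real check lands in pkg/asset/installconfig/aws/validation.go):

```go
package main

import "fmt"

// validateHostZones checks that every configured dedicated host sits in one
// of the machinepool's zones. poolZones comes from the machinepool config;
// hostZones maps host ID -> availability zone as returned by AWS.
func validateHostZones(poolZones []string, hostZones map[string]string) []error {
	allowed := map[string]bool{}
	for _, z := range poolZones {
		allowed[z] = true
	}
	var errs []error
	for id, zone := range hostZones {
		if !allowed[zone] {
			errs = append(errs, fmt.Errorf(
				"dedicated host %s is in zone %s, which is not among the machinepool zones %v",
				id, zone, poolZones))
		}
	}
	return errs
}

func main() {
	errs := validateHostZones(
		[]string{"us-east-1a"},
		map[string]string{"h-0123456789abcdef0": "us-east-1b"},
	)
	fmt.Println(len(errs))
}
```

Collecting all mismatches into a slice (rather than failing on the first) matches the field.ErrorList style used elsewhere in the installer's validation code.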

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2025
@vr4manta vr4manta force-pushed the SPLAT-2172 branch 5 times, most recently from 82915c2 to 8ef8679 on November 21, 2025 13:54
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2025
@vr4manta
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-dedicated

@openshift-ci
Contributor

openshift-ci Bot commented Nov 21, 2025

@vr4manta: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-dedicated

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1c5c6d80-c717-11f0-93e9-d918c346504c-0

@vr4manta
Contributor Author

/unhold
/retest

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 24, 2025
@vr4manta
Contributor Author

/retest

@vr4manta
Contributor Author

vr4manta commented Dec 2, 2025

/hold
Based on discussions with @patrickdillon and @JoelSpeed, I am adding MAPA support and will then enhance this PR to leverage that code path without being dependent on CAPI.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 2, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/asset/machines/worker.go (1)

537-555: ⚠️ Potential issue | 🟠 Major

Fail fast when a resolved AZ has no matching dedicated host.

mpool.Zones may already have been defaulted to every worker subnet/AZ before this block runs. If the user only configured dedicated hosts for a subset of those zones, aws.MachineSets will still generate a MachineSet for the extra AZs, and aws.DedicatedHost(...) resolves to "" for them. That leaves a DedicatedHost pool emitting invalid provider specs instead of surfacing a clear installer error.

Proposed guard
 			dHosts := map[string]icaws.Host{}
 			if mpool.HostPlacement != nil && mpool.HostPlacement.Affinity != nil && *mpool.HostPlacement.Affinity == awstypes.HostAffinityDedicatedHost {
 				dHosts, err = installConfig.AWS.DedicatedHosts(ctx, mpool.HostPlacement.DedicatedHost)
 				if err != nil {
 					return fmt.Errorf("failed to retrieve dedicated hosts for compute pool: %w", err)
 				}
+				for _, zone := range mpool.Zones {
+					if aws.DedicatedHost(dHosts, mpool.HostPlacement, zone) == "" {
+						return fmt.Errorf("no dedicated host matches zone %s for compute pool %s", zone, pool.Name)
+					}
+				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/machines/worker.go` around lines 537 - 555, If dedicated hosts were
requested (mpool.HostPlacement != nil && mpool.HostPlacement.Affinity != nil &&
*mpool.HostPlacement.Affinity == awstypes.HostAffinityDedicatedHost), after
calling installConfig.AWS.DedicatedHosts(...) and before calling
aws.MachineSets(...), validate that every AZ in mpool.Zones has a corresponding
entry in the returned dHosts map (or that aws.DedicatedHost(...) would not
return an empty string for that AZ); if any AZ is missing a dedicated host,
return an error (fmt.Errorf) describing the specific AZ(s) with no matching
dedicated host so we fail fast instead of producing MachineSets with empty
DedicatedHost values.
♻️ Duplicate comments (7)
pkg/asset/machines/aws/machines.go (1)

370-383: ⚠️ Potential issue | 🟠 Major

Silent fallback to shared placement when no host matches the zone.

When placement != nil (i.e., DedicatedHost affinity is configured) but no host ID matches the given zone, returning "" causes the caller to omit Placement.Host entirely. This silently produces a non-dedicated instance for a pool that explicitly requested dedicated host placement.

Consider returning an error when placement is configured but no matching host is found:

🔧 Suggested approach
-func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) string {
+func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) (string, error) {
 	// If install-config has HostPlacements configured, lets check the DedicatedHosts to see if one matches our region & zone.
 	if placement != nil {
+		if *placement.Affinity == awstypes.HostAffinityDedicatedHost && len(placement.DedicatedHost) > 0 {
+			// Expect to find a matching host
+		}
 		// We only support one host ID currently for an instance.  Need to also get host that matches the zone the machines will be put into.
 		for _, host := range placement.DedicatedHost {
 			hostDetails, found := hosts[host.ID]
 			if found && hostDetails.Zone == zone {
-				return hostDetails.ID
+				return hostDetails.ID, nil
 			}
 		}
+		if *placement.Affinity == awstypes.HostAffinityDedicatedHost {
+			return "", fmt.Errorf("no dedicated host found for zone %s", zone)
+		}
 	}
-	return ""
+	return "", nil
 }


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/machines/aws/machines.go` around lines 370 - 383, The DedicatedHost
function currently returns an empty string when placement is non-nil but no host
in placement.DedicatedHost matches the given zone, which causes callers to omit
Placement.Host; change DedicatedHost to propagate an error in that case (e.g.,
return an error alongside the host ID or convert the function to (string,
error)) so callers can fail fast when DedicatedHost affinity is configured but
no matching host is found; update callers to handle the new error return and
surface a clear error message referencing DedicatedHost, placement.DedicatedHost
and the zone.
pkg/types/aws/machinepool.go (1)

238-239: ⚠️ Potential issue | 🟡 Minor

Fix the stale YAML path in the public comment.

The comment references dedicatedHost.host.id but the actual schema path is dedicatedHost[].id (since DedicatedHost is a slice).

📝 Suggested fix
-	// HostAffinityDedicatedHost requires specifying a particular host via dedicatedHost.host.id.
+	// HostAffinityDedicatedHost requires specifying at least one host via dedicatedHost[].id.
 	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/machinepool.go` around lines 238 - 239, Update the public
comment for HostAffinityDedicatedHost to correct the stale YAML path: change the
referenced schema path from dedicatedHost.host.id to dedicatedHost[].id
(reflecting that DedicatedHost is a slice) so the comment accurately points to
dedicatedHost[].id.
pkg/types/aws/validation/featuregates.go (1)

28-38: ⚠️ Potential issue | 🟠 Major

Gate hostPlacement on defaultMachinePlatform as well.

This loop only gates compute[*].platform.aws.hostPlacement. If a user sets platform.aws.defaultMachinePlatform.hostPlacement, it will be merged into compute pools via Set() but won't trigger the feature gate check.

Consider adding a check before the loop:

+	// Check if defaultMachinePlatform has dedicated hosts configured
+	if c.Platform.AWS != nil && c.Platform.AWS.DefaultMachinePlatform != nil && c.Platform.AWS.DefaultMachinePlatform.HostPlacement != nil {
+		gatedFeatures = append(gatedFeatures, featuregates.GatedInstallConfigFeature{
+			FeatureGateName: features.FeatureGateAWSDedicatedHosts,
+			Condition:       true,
+			Field:           field.NewPath("platform", "aws", "defaultMachinePlatform", "hostPlacement"),
+		})
+	}

 	// Check if any compute pool has dedicated hosts configured
 	for idx, compute := range c.Compute {


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/validation/featuregates.go` around lines 28 - 38, The loop
currently only checks compute pools (c.Compute) for AWS HostPlacement; add a
pre-check for c.Platform.AWS.DefaultMachinePlatform.HostPlacement and, if
non-nil, append the same feature gate (featuregates.GatedInstallConfigFeature
with FeatureGateName features.FeatureGateAWSDedicatedHosts, Condition true)
using field.NewPath("platform","aws","defaultMachinePlatform","hostPlacement")
before iterating compute so the defaultMachinePlatform case is gated (avoid
duplicating the gate by performing this check first and only falling back to the
existing compute loop if the defaultMachinePlatform check is not set).
pkg/types/aws/validation/machinepool_test.go (1)

300-323: ⚠️ Potential issue | 🟡 Minor

Test cases use host IDs that would fail CRD validation.

These tests are named "valid dedicated host - 8 character format" but use IDs like h-1a2b3c4d (8 hex chars) which don't match the kubebuilder validation pattern ^h-[0-9a-f]{17}$ (17 hex chars) defined in pkg/types/aws/machinepool.go.

While these tests pass because the Go validation only checks for empty IDs (the regex is enforced by kubebuilder), the test names are misleading. Consider either:

  1. Using valid 17-character IDs: h-1a2b3c4d5e6f78901
  2. Renaming tests to clarify they only test Go-level validation
🔧 Suggested fix using valid IDs
 		{
-			name: "valid dedicated host - 8 character format",
+			name: "valid dedicated host",
 			pool: &aws.MachinePool{
 				HostPlacement: &aws.HostPlacement{
 					Affinity: ptr.To(aws.HostAffinityDedicatedHost),
 					DedicatedHost: []aws.DedicatedHost{
-						{ID: "h-1a2b3c4d"},
+						{ID: "h-1a2b3c4d5e6f78901"},
 					},
 				},
 			},
 		},
 		{
-			name: "valid dedicated host - 8 character format with zone",
+			name: "valid dedicated host - another example",
 			pool: &aws.MachinePool{
 				HostPlacement: &aws.HostPlacement{
 					Affinity: ptr.To(aws.HostAffinityDedicatedHost),
 					DedicatedHost: []aws.DedicatedHost{
 						{
-							ID: "h-9876abcd",
+							ID: "h-9876abcdef0123456",
 						},
 					},
 				},
 			},
 		},


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/validation/machinepool_test.go` around lines 300 - 323, The
test cases named "valid dedicated host - 8 character format" and "valid
dedicated host - 8 character format with zone" use DedicatedHost.ID values like
"h-1a2b3c4d" which do not match the kubebuilder CRD regex ^h-[0-9a-f]{17}$;
update the DedicatedHost.ID fields in those aws.MachinePool test entries (the
structs under HostPlacement -> DedicatedHost -> ID) to use valid
17-hex-character host IDs (e.g., "h-1a2b3c4d5e6f78901") or, if you prefer to
keep short IDs, rename the test names to indicate they only exercise Go-level
(non-CRD) validation rather than CRD-valid IDs.
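The kubebuilder pattern cited in this finding can be exercised directly in plain Go, which makes it easy to see why the 8-character IDs in the test names are misleading. This is an illustrative check, not part of the PR:

```go
package main

import (
	"fmt"
	"regexp"
)

// hostIDPattern mirrors the kubebuilder validation for dedicated host IDs:
// "h-" followed by exactly 17 lowercase hex characters.
var hostIDPattern = regexp.MustCompile(`^h-[0-9a-f]{17}$`)

func main() {
	fmt.Println(hostIDPattern.MatchString("h-1a2b3c4d"))          // 8 hex chars: rejected by the CRD
	fmt.Println(hostIDPattern.MatchString("h-1a2b3c4d5e6f78901")) // 17 hex chars: accepted
}
```

So the short IDs only pass the Go-level (non-empty) check; they would be rejected at the API server by the CRD schema.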
pkg/types/aws/validation/machinepool.go (1)

105-108: ⚠️ Potential issue | 🟡 Minor

Use field.Forbidden instead of field.Required for AnyAvailable case.

When affinity is AnyAvailable and DedicatedHost is set, the field should be marked as forbidden, not required. The error message says "forbidden otherwise" but uses field.Required.

🐛 Proposed fix
 	case aws.HostAffinityAnyAvailable:
 		if len(p.HostPlacement.DedicatedHost) > 0 {
-			allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
+			allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/validation/machinepool.go` around lines 105 - 108, The
validation for the aws.HostAffinityAnyAvailable case is using field.Required but
should use field.Forbidden; update the code in the switch case handling
aws.HostAffinityAnyAvailable to append
field.Forbidden(fldPath.Child("dedicatedHost"), "<message>") instead of
field.Required, keeping or clarifying the existing message (e.g., "dedicatedHost
is forbidden when 'affinity' is set to AnyAvailable") so the error matches the
logic applied to allErrs and fldPath.Child("dedicatedHost").
pkg/asset/installconfig/aws/validation.go (1)

592-597: ⚠️ Potential issue | 🟡 Minor

Inconsistent field path in ownership validation error.

The ownership validation error at line 595 uses fldPath (the pool-level path) instead of dhPath (the dedicated host entry path). Other errors in this loop consistently use dhPath for better error localization.

Suggested fix
 				// Check to make sure dedicated host is not owned by another cluster
 				clusterIDs := foundHost.Tags.GetClusterIDs(TagValueOwned)
 				if len(clusterIDs) > 0 {
-					allErrs = append(allErrs, field.Forbidden(fldPath,
+					allErrs = append(allErrs, field.Forbidden(dhPath,
 						fmt.Sprintf("Dedicated host %s is owned by other cluster %v and cannot be used for new installations, another Dedicated Host must be created separately", foundHost.ID, clusterIDs)))
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/installconfig/aws/validation.go` around lines 592 - 597, The
ownership validation appends an error with the wrong field path (uses the
pool-level path variable instead of the dedicated-host entry path), so update
the field.Forbidden call in the dedicated host ownership check to use dhPath
(the dedicated-host specific path) rather than fldPath; locate the block using
variables clusterIDs, foundHost, TagValueOwned, dhPath, allErrs and replace the
path argument passed to field.Forbidden so the error is associated with the
dedicated host entry.
data/data/install.openshift.io_installconfigs.yaml (1)

219-251: ⚠️ Potential issue | 🟠 Major

Reject empty dedicatedHost lists when affinity: DedicatedHost.

Lines 250, 1860, 3410, and 5160 only test field presence, so dedicatedHost: [] still satisfies the CEL rule and an unusable config gets accepted. Add minItems: 1 to the array schema in each copy of this block. If this YAML is generated, the fix should be made in the source schema/marker and then regenerated.

Suggested fix
                           dedicatedHost:
+                            minItems: 1
                             description: |-

Also applies to: 1829-1861, 3379-3411, 5129-5161

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/data/install.openshift.io_installconfigs.yaml` around lines 219 - 251,
The CEL validation only checks for presence of dedicatedHost, so an empty array
passes; update the dedicatedHost array schema (the dedicatedHost property
definition that contains items -> properties -> id and the surrounding type:
array) to include minItems: 1 so empty lists are rejected when affinity ==
"DedicatedHost"; apply the same change to each duplicated block (the
dedicatedHost array definitions that accompany the
affinity/x-kubernetes-validations rule referencing affinity == 'DedicatedHost')
or, if generated, fix the source schema/marker and regenerate the YAML.
🧹 Nitpick comments (2)
pkg/types/aws/validation/machinepool_test.go (1)

380-392: The poolName defaulting logic is fragile.

The logic relies on string matching in tc.expected to determine if a test is for control plane or edge pools. This is brittle and could break if error messages change.

Consider making this explicit by always setting poolName in the test cases:

♻️ Suggested approach
 	for _, tc := range cases {
 		t.Run(tc.name, func(t *testing.T) {
 			poolName := tc.poolName
-			// If poolName is not specified, default to worker pool unless the test
-			// is explicitly testing control plane or edge pool behavior
-			if poolName == "" && tc.expected != "" &&
-				!strings.Contains(tc.expected, "control plane") &&
-				!strings.Contains(tc.expected, "edge pool") {
-				poolName = types.MachinePoolComputeRoleName
-			} else if poolName == "" && tc.expected == "" {
+			if poolName == "" {
 				poolName = types.MachinePoolComputeRoleName
 			}

Then update the control plane test case to explicitly set poolName: types.MachinePoolControlPlaneRoleName instead of relying on empty string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/validation/machinepool_test.go` around lines 380 - 392, The
current defaulting logic for poolName in the test loop is brittle because it
inspects tc.expected text; instead make poolName explicit in each test case and
simplify the test setup: remove the string-matching branches around tc.expected
and just set poolName := tc.poolName; if poolName == "" set it to
types.MachinePoolComputeRoleName (default), and update the control plane test
case to set poolName: types.MachinePoolControlPlaneRoleName (and any edge-pool
tests to their explicit names) so ValidateMachinePool calls use explicit pool
names rather than parsing tc.expected.
pkg/asset/machines/worker_test.go (1)

130-155: Add one AWS dedicated-host regression case.

These tests now initialize AWS metadata, but they still never exercise HostPlacement. A case with hostPlacement.affinity=DedicatedHost would cover the new branch in Worker.Generate and catch zone-to-host mapping regressions much earlier.

Also applies to: 189-217

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/machines/worker_test.go` around lines 130 - 155, Add a new test
case in pkg/asset/machines/worker_test.go that exercises AWS HostPlacement by
setting the MachinePool AWS HostPlacement affinity to DedicatedHost (e.g. set
cfg.Compute[0].Platform.AWS.HostPlacement.Affinity =
awstypes.HostAffinityDedicatedHost) before creating the installconfig asset
(installconfig.MakeAsset) and initializing icAsset.AWS; this will force the
Worker.Generate branch that handles HostPlacement and validate zone-to-host
mapping behavior alongside the existing cases (reuse cfg, icAsset.AWS =
icaws.NewMetadata(...), and existing tc variants).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/asset/installconfig/aws/dedicatedhosts.go`:
- Around line 18-19: Update the grammatical typo in the field comment for the
Tags field: change "Hosts's" to "Host's" in the comment above the Tags
declaration (the Tags field in dedicatedhosts.go / the Tags struct field) so it
reads "Tags is the map of the Host's tags." Ensure only the comment text is
modified and no code logic is changed.

In `@pkg/asset/installconfig/aws/validation_test.go`:
- Around line 2431-2443: The function withComputeHostPlacementAndZone accepts a
zone but never uses it; update the function to populate the compute pool's AWS
Zones so the zone parameter is applied: inside withComputeHostPlacementAndZone
(which currently sets HostPlacement on ic.Compute[index].Platform.AWS), also set
ic.Compute[index].Platform.AWS.Zones = []string{zone} (or append if multiple
zones are intended); alternatively, if you don’t need zone semantics, remove
withComputeHostPlacementAndZone and use withComputeHostPlacement to avoid
duplication.

---

Outside diff comments:
In `@pkg/asset/machines/worker.go`:
- Around line 537-555: If dedicated hosts were requested (mpool.HostPlacement !=
nil && mpool.HostPlacement.Affinity != nil && *mpool.HostPlacement.Affinity ==
awstypes.HostAffinityDedicatedHost), after calling
installConfig.AWS.DedicatedHosts(...) and before calling aws.MachineSets(...),
validate that every AZ in mpool.Zones has a corresponding entry in the returned
dHosts map (or that aws.DedicatedHost(...) would not return an empty string for
that AZ); if any AZ is missing a dedicated host, return an error (fmt.Errorf)
describing the specific AZ(s) with no matching dedicated host so we fail fast
instead of producing MachineSets with empty DedicatedHost values.

---

Duplicate comments:
In `@data/data/install.openshift.io_installconfigs.yaml`:
- Around line 219-251: The CEL validation only checks for presence of
dedicatedHost, so an empty array passes; update the dedicatedHost array schema
(the dedicatedHost property definition that contains items -> properties -> id
and the surrounding type: array) to include minItems: 1 so empty lists are
rejected when affinity == "DedicatedHost"; apply the same change to each
duplicated block (the dedicatedHost array definitions that accompany the
affinity/x-kubernetes-validations rule referencing affinity == 'DedicatedHost')
or, if generated, fix the source schema/marker and regenerate the YAML.

In `@pkg/asset/installconfig/aws/validation.go`:
- Around line 592-597: The ownership validation appends an error with the wrong
field path (uses the pool-level path variable instead of the dedicated-host
entry path), so update the field.Forbidden call in the dedicated host ownership
check to use dhPath (the dedicated-host specific path) rather than fldPath;
locate the block using variables clusterIDs, foundHost, TagValueOwned, dhPath,
allErrs and replace the path argument passed to field.Forbidden so the error is
associated with the dedicated host entry.

In `@pkg/asset/machines/aws/machines.go`:
- Around line 370-383: The DedicatedHost function currently returns an empty
string when placement is non-nil but no host in placement.DedicatedHost matches
the given zone, which causes callers to omit Placement.Host; change
DedicatedHost to propagate an error in that case (e.g., return an error
alongside the host ID or convert the function to (string, error)) so callers can
fail fast when DedicatedHost affinity is configured but no matching host is
found; update callers to handle the new error return and surface a clear error
message referencing DedicatedHost, placement.DedicatedHost and the zone.

In `@pkg/types/aws/machinepool.go`:
- Around line 238-239: Update the public comment for HostAffinityDedicatedHost
to correct the stale YAML path: change the referenced schema path from
dedicatedHost.host.id to dedicatedHost[].id (reflecting that DedicatedHost is a
slice) so the comment accurately points to dedicatedHost[].id.

In `@pkg/types/aws/validation/featuregates.go`:
- Around line 28-38: The loop currently only checks compute pools (c.Compute)
for AWS HostPlacement; add a pre-check for
c.Platform.AWS.DefaultMachinePlatform.HostPlacement and, if non-nil, append the
same feature gate (featuregates.GatedInstallConfigFeature with FeatureGateName
features.FeatureGateAWSDedicatedHosts, Condition true) using
field.NewPath("platform","aws","defaultMachinePlatform","hostPlacement") before
iterating compute so the defaultMachinePlatform case is gated (avoid duplicating
the gate by performing this check first and only falling back to the existing
compute loop if the defaultMachinePlatform check is not set).

In `@pkg/types/aws/validation/machinepool_test.go`:
- Around line 300-323: The test cases named "valid dedicated host - 8 character
format" and "valid dedicated host - 8 character format with zone" use
DedicatedHost.ID values like "h-1a2b3c4d" which do not match the kubebuilder CRD
regex ^h-[0-9a-f]{17}$; update the DedicatedHost.ID fields in those
aws.MachinePool test entries (the structs under HostPlacement -> DedicatedHost
-> ID) to use valid 17-hex-character host IDs (e.g., "h-1a2b3c4d5e6f78901") or,
if you prefer to keep short IDs, rename the test names to indicate they only
exercise Go-level (non-CRD) validation rather than CRD-valid IDs.

In `@pkg/types/aws/validation/machinepool.go`:
- Around line 105-108: The validation for the aws.HostAffinityAnyAvailable case
is using field.Required but should use field.Forbidden; update the code in the
switch case handling aws.HostAffinityAnyAvailable to append
field.Forbidden(fldPath.Child("dedicatedHost"), "<message>") instead of
field.Required, keeping or clarifying the existing message (e.g., "dedicatedHost
is forbidden when 'affinity' is set to AnyAvailable") so the error matches the
logic applied to allErrs and fldPath.Child("dedicatedHost").

---

Nitpick comments:
In `@pkg/asset/machines/worker_test.go`:
- Around line 130-155: Add a new test case in pkg/asset/machines/worker_test.go
that exercises AWS HostPlacement by setting the MachinePool AWS HostPlacement
affinity to DedicatedHost (e.g. set
cfg.Compute[0].Platform.AWS.HostPlacement.Affinity =
awstypes.HostAffinityDedicatedHost) before creating the installconfig asset
(installconfig.MakeAsset) and initializing icAsset.AWS; this will force the
Worker.Generate branch that handles HostPlacement and validate zone-to-host
mapping behavior alongside the existing cases (reuse cfg, icAsset.AWS =
icaws.NewMetadata(...), and existing tc variants).

In `@pkg/types/aws/validation/machinepool_test.go`:
- Around line 380-392: The current defaulting logic for poolName in the test
loop is brittle because it inspects tc.expected text; instead make poolName
explicit in each test case and simplify the test setup: remove the
string-matching branches around tc.expected and just set poolName :=
tc.poolName; if poolName == "" set it to types.MachinePoolComputeRoleName
(default), and update the control plane test case to set poolName:
types.MachinePoolControlPlaneRoleName (and any edge-pool tests to their explicit
names) so ValidateMachinePool calls use explicit pool names rather than parsing
tc.expected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4f9a69f-8259-46bd-9425-e15a45a67d9b

📥 Commits

Reviewing files that changed from the base of the PR and between 7155f70 and e7dcc9e.

⛔ Files ignored due to path filters (1)
  • pkg/types/aws/zz_generated.deepcopy.go is excluded by !**/zz_generated*
📒 Files selected for processing (20)
  • data/data/install.openshift.io_installconfigs.yaml
  • pkg/asset/cluster/aws/aws.go
  • pkg/asset/installconfig/aws/dedicatedhosts.go
  • pkg/asset/installconfig/aws/metadata.go
  • pkg/asset/installconfig/aws/permissions.go
  • pkg/asset/installconfig/aws/validation.go
  • pkg/asset/installconfig/aws/validation_test.go
  • pkg/asset/machines/aws/machines.go
  • pkg/asset/machines/aws/machinesets.go
  • pkg/asset/machines/worker.go
  • pkg/asset/machines/worker_test.go
  • pkg/destroy/aws/ec2helpers.go
  • pkg/types/aws/machinepool.go
  • pkg/types/aws/validation/featuregates.go
  • pkg/types/aws/validation/featuregates_test.go
  • pkg/types/aws/validation/machinepool.go
  • pkg/types/aws/validation/machinepool_test.go
  • pkg/types/aws/validation/platform.go
  • pkg/types/machinepools.go
  • pkg/types/validation/machinepools.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/asset/installconfig/aws/metadata.go
  • pkg/types/aws/validation/featuregates_test.go
  • pkg/types/validation/machinepools.go
  • pkg/types/aws/validation/platform.go
  • pkg/asset/cluster/aws/aws.go

Comment thread pkg/asset/installconfig/aws/dedicatedhosts.go Outdated
Comment on lines +2431 to +2443
func (icBuild icBuildForAWS) withComputeHostPlacementAndZone(hostIDs []string, zone string, index int) icOption {
	return func(ic *types.InstallConfig) {
		aff := aws.HostAffinityDedicatedHost
		dhs := make([]aws.DedicatedHost, 0, len(hostIDs))
		for _, id := range hostIDs {
			dhs = append(dhs, aws.DedicatedHost{ID: id})
		}
		ic.Compute[index].Platform.AWS.HostPlacement = &aws.HostPlacement{
			Affinity:      &aff,
			DedicatedHost: dhs,
		}
	}
}

⚠️ Potential issue | 🟡 Minor

Unused zone parameter in withComputeHostPlacementAndZone.

The zone parameter is accepted but never used in the function body. The function is identical to withComputeHostPlacement. Either the zone should be used (e.g., to set ic.Compute[index].Platform.AWS.Zones) or this method should be removed in favor of using withComputeHostPlacement directly.

Suggested fix (if zone should set compute pool zones)
 func (icBuild icBuildForAWS) withComputeHostPlacementAndZone(hostIDs []string, zone string, index int) icOption {
 	return func(ic *types.InstallConfig) {
 		aff := aws.HostAffinityDedicatedHost
 		dhs := make([]aws.DedicatedHost, 0, len(hostIDs))
 		for _, id := range hostIDs {
 			dhs = append(dhs, aws.DedicatedHost{ID: id})
 		}
 		ic.Compute[index].Platform.AWS.HostPlacement = &aws.HostPlacement{
 			Affinity:      &aff,
 			DedicatedHost: dhs,
 		}
+		// Set the zone for the compute pool if specified
+		if zone != "" {
+			ic.Compute[index].Platform.AWS.Zones = []string{zone}
+		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/installconfig/aws/validation_test.go` around lines 2431 - 2443, The
function withComputeHostPlacementAndZone accepts a zone but never uses it;
update the function to populate the compute pool's AWS Zones so the zone
parameter is applied: inside withComputeHostPlacementAndZone (which currently
sets HostPlacement on ic.Compute[index].Platform.AWS), also set
ic.Compute[index].Platform.AWS.Zones = []string{zone} (or append if multiple
zones are intended); alternatively, if you don’t need zone semantics, remove
withComputeHostPlacementAndZone and use withComputeHostPlacement to avoid
duplication.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 11, 2026

@vr4manta: This pull request references SPLAT-2172 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

SPLAT-2172

Changes

  • Bumped openshift/api
  • Added logic to allow HostPlacement of dedicated hosts

Dependencies

Notes

MAO and CAO changes are needed for it to fully work. For now, this PR is adding the ability to generate the needed outputs for bootstrapping.

Summary by CodeRabbit

  • New Features

  • Introduced hostPlacement (affinity + dedicated host selection) across platform machine platforms to support AWS dedicated-host placement.

  • Improvements

  • Added install-time validations and feature-gate activation for dedicated-host usage, including region/zone and role constraints.

  • Threaded dedicated-host info into machine set provisioning, tagging during install, and release handling during cleanup.

  • Added required permissions for dedicated-host operations when applicable.

  • Tests

  • Added tests covering hostPlacement scenarios and feature-gate behavior.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (6)
pkg/asset/installconfig/aws/validation_test.go (1)

2431-2443: ⚠️ Potential issue | 🟡 Minor

Unused zone parameter in withComputeHostPlacementAndZone.

The zone parameter is accepted but never used—this function is identical to withComputeHostPlacement. Either use the zone (e.g., to set ic.Compute[index].Platform.AWS.Zones) or remove this method and use withComputeHostPlacement directly.

Looking at the test case at line 1593 that uses this method, the intent seems to be testing a mismatch between pool zones and host zones, but the zone parameter doesn't actually set the compute pool's zones.

Suggested fix (if zone should set compute pool zones)
 func (icBuild icBuildForAWS) withComputeHostPlacementAndZone(hostIDs []string, zone string, index int) icOption {
 	return func(ic *types.InstallConfig) {
 		aff := aws.HostAffinityDedicatedHost
 		dhs := make([]aws.DedicatedHost, 0, len(hostIDs))
 		for _, id := range hostIDs {
 			dhs = append(dhs, aws.DedicatedHost{ID: id})
 		}
 		ic.Compute[index].Platform.AWS.HostPlacement = &aws.HostPlacement{
 			Affinity:      &aff,
 			DedicatedHost: dhs,
 		}
+		// Ensure the zone is associated with the host placement
+		if zone != "" && !slices.Contains(ic.Compute[index].Platform.AWS.Zones, zone) {
+			ic.Compute[index].Platform.AWS.Zones = append(ic.Compute[index].Platform.AWS.Zones, zone)
+		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/installconfig/aws/validation_test.go` around lines 2431 - 2443, The
helper withComputeHostPlacementAndZone currently ignores its zone parameter;
update it so that after setting HostPlacement it also assigns the provided zone
to the compute pool's AWS Zones (e.g., set ic.Compute[index].Platform.AWS.Zones
= []string{zone}) so tests that expect a mismatch between pool zones and host
zones are valid; alternatively, if the zone is not needed, remove
withComputeHostPlacementAndZone and use the existing withComputeHostPlacement
helper instead.
pkg/asset/installconfig/aws/permissions.go (1)

793-805: ⚠️ Potential issue | 🟡 Minor

Helper only checks pools named worker, missing other compute pools.

includesDedicatedHosts skips pools where mpool.Name != types.MachinePoolComputeRoleName ("worker"). This means additional compute pools with host placement (which are valid per tests in featuregates_test.go) won't trigger the dedicated host permissions.

The condition should skip only edge pools, not non-worker compute pools:

Suggested fix
 func includesDedicatedHosts(installConfig *types.InstallConfig) bool {
 	for _, mpool := range installConfig.Compute {
-		if mpool.Name != types.MachinePoolComputeRoleName {
+		if mpool.Name == types.MachinePoolEdgeRoleName {
 			continue
 		}
 		if mpool.Platform.AWS != nil && mpool.Platform.AWS.HostPlacement != nil {
 			return true
 		}
 	}
 	return false
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/installconfig/aws/permissions.go` around lines 793 - 805,
includesDedicatedHosts only inspects pools named
types.MachinePoolComputeRoleName ("worker"), so other compute pools with
AWS.HostPlacement are ignored; change the logic in includesDedicatedHosts to
skip only edge pools (check for types.MachinePoolEdgeRoleName) instead of
skipping non-worker names, i.e., iterate all installConfig.Compute entries,
continue when mpool.Name == types.MachinePoolEdgeRoleName, and keep the existing
AWS/HostPlacement nil checks to return true when any non-edge pool specifies
dedicated hosts.
pkg/asset/installconfig/aws/validation.go (1)

569-581: ⚠️ Potential issue | 🟠 Major

Potential AZ format mismatch when pool.Zones is empty.

When pool.Zones is empty, meta.AvailabilityZones(ctx) is called as a fallback. In this codebase, availability zones from metadata can be short form (a, b, c) while foundHost.Zone from AWS is full form (us-east-1a). The slices.Contains check at line 578 would fail even for valid hosts.

Also, the error message at line 579 shows pool.Zones which may be empty when the fallback was used, making the error confusing.

Suggested fix: Only validate when zones are explicitly configured
 				// Is host valid for pools region and zone config?
 				// Only check if zones are explicitly configured; if pool.Zones is empty, all zones are allowed
-				zones := pool.Zones
-				if len(zones) == 0 {
-					zones, err = meta.AvailabilityZones(ctx)
-					if err != nil {
-						allErrs = append(allErrs, field.InternalError(fldPath, fmt.Errorf("unable to retrieve availability zones: %w", err)))
-					}
-				}
-				if !slices.Contains(zones, foundHost.Zone) {
-					errMsg := fmt.Sprintf("machine pool specifies zones %v but dedicated host %s is in zone %s", pool.Zones, host.ID, foundHost.Zone)
+				if len(pool.Zones) > 0 && !slices.Contains(pool.Zones, foundHost.Zone) {
+					errMsg := fmt.Sprintf("machine pool specifies zones %v but dedicated host %s is in zone %s", pool.Zones, host.ID, foundHost.Zone)
 					allErrs = append(allErrs, field.Invalid(dhPath, host, errMsg))
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/installconfig/aws/validation.go` around lines 569 - 581, The
validation currently uses meta.AvailabilityZones(ctx) when pool.Zones is empty,
causing mismatches between short AZs (a,b) and full zones (us-east-1a) and
producing a confusing error that prints an empty pool.Zones; change the logic to
only perform the dedicated-host zone membership check when pool.Zones is
explicitly configured (i.e. len(pool.Zones) > 0) so that
slices.Contains(pool.Zones, foundHost.Zone) is used only for user-provided
zones, and update the field.Invalid call (dhPath, host, ...) to include the
actual configured zones (pool.Zones) in the message rather than any metadata
fallback.
pkg/types/aws/machinepool.go (1)

238-239: ⚠️ Potential issue | 🟡 Minor

Stale YAML path in doc comment.

The comment references dedicatedHost.host.id but the actual schema field is dedicatedHost[].id. This will propagate to generated docs.

Suggested fix
-	// HostAffinityDedicatedHost requires specifying a particular host via dedicatedHost.host.id.
+	// HostAffinityDedicatedHost requires specifying at least one host via dedicatedHost[].id.
 	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/machinepool.go` around lines 238 - 239, The doc comment for the
HostAffinityDedicatedHost constant is using a stale YAML path; update the
comment text that currently references "dedicatedHost.host.id" to the correct
schema path "dedicatedHost[].id" so generated docs reflect the real field;
locate the constant HostAffinityDedicatedHost (type HostAffinity, value
"DedicatedHost") and edit its preceding comment accordingly.
pkg/asset/machines/aws/machines.go (1)

370-382: ⚠️ Potential issue | 🟠 Major

Silent fallback when no host matches the zone may mask configuration errors.

When placement != nil but no host in placement.DedicatedHost matches the requested zone, this function returns an empty string. This causes the provider to generate an instance without dedicated host placement, silently falling back to shared tenancy.

For a pool explicitly configured with Affinity == DedicatedHost, this could launch instances without the intended dedicated host isolation. While upstream validation should catch zone mismatches, consider returning an error (or at least a secondary boolean indicating whether a match was found) to surface configuration issues more explicitly.

🔧 Suggested approach

Consider changing the signature to return (string, bool) or (string, error) to indicate whether a valid host was found:

-func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) string {
+func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) (string, error) {
     if placement != nil {
         for _, host := range placement.DedicatedHost {
             hostDetails, found := hosts[host.ID]
             if found && hostDetails.Zone == zone {
-                return hostDetails.ID
+                return hostDetails.ID, nil
             }
         }
+        // Placement configured but no host matches zone
+        return "", fmt.Errorf("no dedicated host found for zone %s", zone)
     }
-    return ""
+    return "", nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/machines/aws/machines.go` around lines 370 - 382, The DedicatedHost
function currently returns an empty string on no-match which silently falls back
to shared tenancy; change DedicatedHost(hosts map[string]aws.Host, placement
*awstypes.HostPlacement, zone string) to return a result indicator (either
(string, bool) or (string, error)) so callers can detect no-match. Update the
function to return the matched host ID and true (or id, nil) when a host.Zone ==
zone is found, and to return ("", false) or ("", descriptiveError) when
placement != nil but no host matches; then update all callers of DedicatedHost
to handle the new boolean/error return and fail fast or surface a clear
configuration error when a dedicated-affinity pool did not find a host.
pkg/types/aws/validation/machinepool.go (1)

104-108: ⚠️ Potential issue | 🟡 Minor

Misleading error message for forbidden dedicatedHost field.

When Affinity is AnyAvailable and DedicatedHost entries are present, the error message says:

"dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"

This message is more appropriate for the DedicatedHost affinity case (line 111). For AnyAvailable, use field.Forbidden with a clearer message:

🐛 Proposed fix
 	case aws.HostAffinityAnyAvailable:
 		if len(p.HostPlacement.DedicatedHost) > 0 {
-			allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
+			allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/validation/machinepool.go` around lines 104 - 108, The current
validation branch for aws.HostAffinityAnyAvailable incorrectly appends a
field.Required error with a misleading message; update the branch handling
p.HostPlacement.Affinity == aws.HostAffinityAnyAvailable to append
field.Forbidden(fldPath.Child("dedicatedHost"), "<clear message>") instead,
e.g., "dedicatedHost must not be set when affinity is AnyAvailable", so replace
the field.Required call that references p.HostPlacement.DedicatedHost with
field.Forbidden(...) and ensure the error targets
fldPath.Child("dedicatedHost").
🧹 Nitpick comments (2)
pkg/types/aws/validation/machinepool_test.go (2)

382-391: Complex test defaulting logic could be simplified.

The logic to determine poolName based on error message content is fragile and harder to maintain. Consider explicitly setting poolName in each test case that needs a specific value, rather than inferring from the expected error string.

♻️ Simpler approach

Set poolName: types.MachinePoolComputeRoleName explicitly in test cases that require it, and remove the inference logic:

 	for _, tc := range cases {
 		t.Run(tc.name, func(t *testing.T) {
 			poolName := tc.poolName
-			// If poolName is not specified, default to worker pool unless the test
-			// is explicitly testing control plane or edge pool behavior
-			if poolName == "" && tc.expected != "" &&
-				!strings.Contains(tc.expected, "control plane") &&
-				!strings.Contains(tc.expected, "edge pool") {
-				poolName = types.MachinePoolComputeRoleName
-			} else if poolName == "" && tc.expected == "" {
+			if poolName == "" {
 				poolName = types.MachinePoolComputeRoleName
 			}

Then explicitly set poolName: "" for the control plane test case (already done on line 363).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/validation/machinepool_test.go` around lines 382 - 391, The
test's fragile inference for poolName should be removed and tests should
explicitly set the poolName in their test cases; replace the conditional block
that mutates poolName based on tc.expected and strings.Contains with no-op, and
update each test case struct to include poolName:
types.MachinePoolComputeRoleName where the compute/worker default is required
and poolName: "" for control-plane/edge tests; this makes the setup
deterministic and you can delete the logic that inspects tc.expected and
strings.Contains.

300-323: Misleading test names suggest format validation that doesn't exist.

These test cases are named "valid dedicated host - 8 character format" but use IDs like h-1a2b3c4d (10 chars total) which don't match the AWS format requirement documented in pkg/types/aws/machinepool.go:

  • Pattern: ^h-[0-9a-f]{17}$
  • Length: exactly 19 characters

The tests pass because validateHostPlacement only rejects empty IDs (len(host.ID) == 0) and doesn't validate the format. The kubebuilder annotations provide CRD-level validation, but the Go validation is lenient.

Consider either:

  1. Rename tests to clarify they test "non-empty ID" rather than "valid format"
  2. Add format validation in validateHostPlacement to match the documented constraints
♻️ Option 1: Rename tests to reflect actual validation
-		{
-			name: "valid dedicated host - 8 character format",
+		{
+			name: "valid dedicated host - non-empty ID",
			pool: &aws.MachinePool{
				HostPlacement: &aws.HostPlacement{
					Affinity: ptr.To(aws.HostAffinityDedicatedHost),
					DedicatedHost: []aws.DedicatedHost{
-						{ID: "h-1a2b3c4d"},
+						{ID: "h-09dcf61cb388b0149"},
					},
				},
			},
		},
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/asset/installconfig/aws/permissions.go`:
- Around line 581-583: RequiredPermissionGroups currently appends
PermissionDedicatedHosts for any HostPlacement via includesDedicatedHosts(ic)
but must also append PermissionDynamicHostAllocation only when the
HostPlacement's affinity is HostAffinityAnyAvailable (dynamically allocated
hosts that require ec2:ReleaseHosts). Update the logic in
RequiredPermissionGroups to inspect ic.Platform.AWS.HostPlacement (or the
function that reads HostPlacement used by includesDedicatedHosts) and: keep
appending PermissionDedicatedHosts when HostPlacement is present, and
additionally append PermissionDynamicHostAllocation only when
HostPlacement.Affinity == HostAffinityAnyAvailable; do not add
PermissionDynamicHostAllocation for HostAffinityDedicatedHost.
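
The suggested split can be sketched with stand-in constants — the real `PermissionGroup` values and install-config traversal live in pkg/asset/installconfig/aws/permissions.go, so the helper name and string values here are hypothetical:

```go
package main

import "fmt"

// Stand-in values; the real constants are defined in
// pkg/asset/installconfig/aws/permissions.go and pkg/types/aws.
const (
	PermissionDedicatedHosts        = "dedicated-hosts"
	PermissionDynamicHostAllocation = "dynamic-host-allocation"

	HostAffinityAnyAvailable  = "AnyAvailable"
	HostAffinityDedicatedHost = "DedicatedHost"
)

// dedicatedHostPermissions sketches the suggested logic: the dedicated-hosts
// group whenever any hostPlacement is configured, plus the release-hosts
// group only when hosts are dynamically allocated (AnyAvailable affinity).
func dedicatedHostPermissions(affinities []string) []string {
	if len(affinities) == 0 {
		return nil
	}
	groups := []string{PermissionDedicatedHosts}
	for _, a := range affinities {
		if a == HostAffinityAnyAvailable {
			groups = append(groups, PermissionDynamicHostAllocation)
			break
		}
	}
	return groups
}

func main() {
	fmt.Println(dedicatedHostPermissions([]string{HostAffinityDedicatedHost}))
	fmt.Println(dedicatedHostPermissions([]string{HostAffinityAnyAvailable}))
}
```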

In `@pkg/types/aws/machinepool.go`:
- Around line 250-251: The ID field in the machine pool struct is marked
+required but its JSON tag uses omitempty; update the struct's ID field (the
symbol "ID" in the machine pool struct in machinepool.go) to remove omitempty so
the tag reads `json:"id"` (keep the `+required` comment intact) to match other
required fields like the one in platform.go.

---

Duplicate comments:
In `@pkg/asset/installconfig/aws/permissions.go`:
- Around line 793-805: includesDedicatedHosts only inspects pools named
types.MachinePoolComputeRoleName ("worker"), so other compute pools with
AWS.HostPlacement are ignored; change the logic in includesDedicatedHosts to
skip only edge pools (check for types.MachinePoolEdgeRoleName) instead of
skipping non-worker names, i.e., iterate all installConfig.Compute entries,
continue when mpool.Name == types.MachinePoolEdgeRoleName, and keep the existing
AWS/HostPlacement nil checks to return true when any non-edge pool specifies
dedicated hosts.

In `@pkg/asset/installconfig/aws/validation_test.go`:
- Around line 2431-2443: The helper withComputeHostPlacementAndZone currently
ignores its zone parameter; update it so that after setting HostPlacement it
also assigns the provided zone to the compute pool's AWS Zones (e.g., set
ic.Compute[index].Platform.AWS.Zones = []string{zone}) so tests that expect a
mismatch between pool zones and host zones are valid; alternatively, if the zone
is not needed, remove withComputeHostPlacementAndZone and use the existing
withComputeHostPlacement helper instead.

In `@pkg/asset/installconfig/aws/validation.go`:
- Around line 569-581: The validation currently uses meta.AvailabilityZones(ctx)
when pool.Zones is empty, causing mismatches between short AZs (a,b) and full
zones (us-east-1a) and producing a confusing error that prints an empty
pool.Zones; change the logic to only perform the dedicated-host zone membership
check when pool.Zones is explicitly configured (i.e. len(pool.Zones) > 0) so
that slices.Contains(pool.Zones, foundHost.Zone) is used only for user-provided
zones, and update the field.Invalid call (dhPath, host, ...) to include the
actual configured zones (pool.Zones) in the message rather than any metadata
fallback.

In `@pkg/asset/machines/aws/machines.go`:
- Around line 370-382: The DedicatedHost function currently returns an empty
string on no-match which silently falls back to shared tenancy; change
DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone
string) to return a result indicator (either (string, bool) or (string, error))
so callers can detect no-match. Update the function to return the matched host
ID and true (or id, nil) when a host.Zone == zone is found, and to return ("",
false) or ("", descriptiveError) when placement != nil but no host matches; then
update all callers of DedicatedHost to handle the new boolean/error return and
fail fast or surface a clear configuration error when a dedicated-affinity pool
did not find a host.

In `@pkg/types/aws/machinepool.go`:
- Around line 238-239: The doc comment for the HostAffinityDedicatedHost
constant is using a stale YAML path; update the comment text that currently
references "dedicatedHost.host.id" to the correct schema path
"dedicatedHost[].id" so generated docs reflect the real field; locate the
constant HostAffinityDedicatedHost (type HostAffinity, value "DedicatedHost")
and edit its preceding comment accordingly.

In `@pkg/types/aws/validation/machinepool.go`:
- Around line 104-108: The current validation branch for
aws.HostAffinityAnyAvailable incorrectly appends a field.Required error with a
misleading message; update the branch handling p.HostPlacement.Affinity ==
aws.HostAffinityAnyAvailable to append
field.Forbidden(fldPath.Child("dedicatedHost"), "<clear message>") instead,
e.g., "dedicatedHost must not be set when affinity is AnyAvailable", so replace
the field.Required call that references p.HostPlacement.DedicatedHost with
field.Forbidden(...) and ensure the error targets
fldPath.Child("dedicatedHost").

---

Nitpick comments:
In `@pkg/types/aws/validation/machinepool_test.go`:
- Around line 382-391: The test's fragile inference for poolName should be
removed and tests should explicitly set the poolName in their test cases;
replace the conditional block that mutates poolName based on tc.expected and
strings.Contains with no-op, and update each test case struct to include
poolName: types.MachinePoolComputeRoleName where the compute/worker default is
required and poolName: "" for control-plane/edge tests; this makes the setup
deterministic and you can delete the logic that inspects tc.expected and
strings.Contains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b7471aa8-4691-4293-aa40-f9992fca7013

📥 Commits

Reviewing files that changed from the base of the PR and between e7dcc9e and e6282c4.

⛔ Files ignored due to path filters (1)
  • pkg/types/aws/zz_generated.deepcopy.go is excluded by !**/zz_generated*
📒 Files selected for processing (20)
  • data/data/install.openshift.io_installconfigs.yaml
  • pkg/asset/cluster/aws/aws.go
  • pkg/asset/installconfig/aws/dedicatedhosts.go
  • pkg/asset/installconfig/aws/metadata.go
  • pkg/asset/installconfig/aws/permissions.go
  • pkg/asset/installconfig/aws/validation.go
  • pkg/asset/installconfig/aws/validation_test.go
  • pkg/asset/machines/aws/machines.go
  • pkg/asset/machines/aws/machinesets.go
  • pkg/asset/machines/worker.go
  • pkg/asset/machines/worker_test.go
  • pkg/destroy/aws/ec2helpers.go
  • pkg/types/aws/machinepool.go
  • pkg/types/aws/validation/featuregates.go
  • pkg/types/aws/validation/featuregates_test.go
  • pkg/types/aws/validation/machinepool.go
  • pkg/types/aws/validation/machinepool_test.go
  • pkg/types/aws/validation/platform.go
  • pkg/types/machinepools.go
  • pkg/types/validation/machinepools.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/asset/installconfig/aws/metadata.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/asset/installconfig/aws/dedicatedhosts.go
  • pkg/types/aws/validation/featuregates_test.go
  • pkg/destroy/aws/ec2helpers.go
  • pkg/types/aws/validation/featuregates.go

Comment on lines +581 to +583
if includesDedicatedHosts(ic) {
permissionGroups = append(permissionGroups, PermissionDedicatedHosts)
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if PermissionDynamicHostAllocation is used elsewhere
rg -n "PermissionDynamicHostAllocation" --type go

Repository: openshift/installer

Length of output: 473


🏁 Script executed:

#!/bin/bash
# Get context around line 442 where PermissionDynamicHostAllocation is used
sed -n '435,450p' pkg/asset/installconfig/aws/permissions.go

# Also check the PermissionDedicatedHosts definition
rg -n "PermissionDedicatedHosts" --type go -A 2 -B 2

Repository: openshift/installer

Length of output: 2301


🏁 Script executed:

#!/bin/bash
# Check the includesDedicatedHosts function to understand what it checks
rg -n "func includesDedicatedHosts" --type go -A 10

# Check how RequiredPermissionGroups is used
rg -n "RequiredPermissionGroups" --type go -B 2 -A 2

Repository: openshift/installer

Length of output: 31117


🏁 Script executed:

#!/bin/bash
# Search for dedicated hosts tests
rg -n "DedicatedHosts" pkg/asset/installconfig/aws/permissions_test.go -B 3 -A 10

# Also check if there's any logic about dynamic host allocation
rg -n "ReleaseHosts\|DynamicHostAllocation" pkg/asset/installconfig/aws/ -B 2 -A 2

Repository: openshift/installer

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search more broadly for tests and usage of dedicated hosts
rg -n "includesDedicatedHosts" pkg/asset/installconfig/aws/ -B 2 -A 2

# Check the RequiredPermissionGroups function more carefully
sed -n '500,650p' pkg/asset/installconfig/aws/permissions.go

Repository: openshift/installer

Length of output: 5696


🏁 Script executed:

#!/bin/bash
# Look for any logic about dynamic host allocation in the codebase
rg -n "dynamic.*host\|allocat.*host" --type go -i pkg/asset/installconfig/aws/

# Also search for any comments or code that explains the relationship
grep -rn "ReleaseHosts\|dynamic" pkg/asset/installconfig/aws/permissions.go

Repository: openshift/installer

Length of output: 490


🏁 Script executed:

#!/bin/bash
# Search for any test files that might test dedicated hosts scenarios
find . -name "*test.go" -type f | xargs grep -l "DedicatedHosts\|dedicat" | head -5

# Check git history or comments around the PermissionDynamicHostAllocation to understand when it was added
rg -n "PermissionDynamicHostAllocation" pkg/asset/installconfig/aws/permissions.go -B 5 -A 5

Repository: openshift/installer

Length of output: 1372


🏁 Script executed:

#!/bin/bash
# Check where RequiredPermissionGroups is used to confirm it's for both install and delete
rg -n "RequiredPermissionGroups" pkg/asset/ -B 1 -A 1 | head -40

# Also check if there's any code that allocates dynamic hosts
rg -n "allocat\|AllocateHosts" pkg/ --type go -i | head -20

Repository: openshift/installer

Length of output: 3170


🏁 Script executed:

#!/bin/bash
# Search for any code that uses HostPlacement or allocates hosts
rg -n "HostPlacement\|allocat" pkg/asset/installconfig/aws/ --type go -i -B 2 -A 2

# Check if there are any tests that specifically test the dedicated hosts permission
rg -n "PermissionDedicatedHosts\|PermissionDynamicHostAllocation" pkg/asset/installconfig/aws/permissions_test.go -B 3 -A 3

Repository: openshift/installer

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for dedicated hosts in validation tests
rg -n "HostPlacement" pkg/ --type go -B 2 -A 2 | head -60

# Also check if there's any documentation or comments about when dynamic hosts are allocated
rg -n "HostPlacement\|DedicatedHost" pkg/types/aws/ --type go -B 2 -A 2

Repository: openshift/installer

Length of output: 4394


🏁 Script executed:

#!/bin/bash
# Check the HostPlacement struct definition to understand dedicated vs dynamic hosts
rg -n "type HostPlacement\|DedicatedHost\|Affinity" pkg/types/aws/machinepool.go -B 1 -A 5 | head -80

# Also check if there's any code that creates/releases dedicated hosts
rg -n "AllocateHosts\|ReleaseHosts" pkg/ --type go -B 2 -A 2

Repository: openshift/installer

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Find the HostPlacement struct definition
find . -name "*.go" -type f | xargs grep -l "type HostPlacement struct" | head -3

# Then get its definition
rg -n "type HostPlacement struct" --type go -A 20

# Also search for what determines dynamic host allocation
rg -n "Affinity\|DedicatedHost" pkg/types/aws/ --type go | head -30

Repository: openshift/installer

Length of output: 4548


🏁 Script executed:

#!/bin/bash
# Check what HostAffinity constants are defined
rg -n "HostAffinity\|AnyAvailable\|DedicatedHost" pkg/types/aws/machinepool.go | head -30

# Look for any code that handles destroy/cleanup of hosts
rg -n "destroy\|cleanup\|release" pkg/asset/installconfig/aws/permissions.go -i -B 2 -A 2

Repository: openshift/installer

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check the HostAffinity constants
rg -n "const (" pkg/types/aws/machinepool.go -A 20 | grep -E "Affinity|Host|const"

# Also check the comment about when dynamic hosts are allocated in permissions.go
sed -n '438,445p' pkg/asset/installconfig/aws/permissions.go

# Let me check if there's any PR or documentation about this
git log --oneline -n 20 -- pkg/asset/installconfig/aws/permissions.go 2>/dev/null | head -10

Repository: openshift/installer

Length of output: 591


🏁 Script executed:

#!/bin/bash
# Check the HostAffinity constants more carefully
sed -n '230,245p' pkg/types/aws/machinepool.go

# Also check the git log more carefully
git log --oneline -n 30 -- pkg/asset/installconfig/aws/permissions.go 2>/dev/null

Repository: openshift/installer

Length of output: 857


🏁 Script executed:

#!/bin/bash
# Now I understand the distinction - let me verify the logic
# Check if includesDedicatedHosts distinguishes between AnyAvailable and DedicatedHost
sed -n '794,806p' pkg/asset/installconfig/aws/permissions.go

# Also search for any other code that adds PermissionDynamicHostAllocation
rg -n "PermissionDynamicHostAllocation" pkg/asset/installconfig/aws/permissions.go

Repository: openshift/installer

Length of output: 742


🏁 Script executed:

#!/bin/bash
# Final verification - check if there's a way to distinguish between affinity types in RequiredPermissionGroups
# or if the permissions are intentionally grouped together
rg -n "AnyAvailable\|HostAffinityAnyAvailable" pkg/asset/installconfig/aws/ --type go

# Also check if there are any recent changes or issues related to this
git log --all --grep="DynamicHostAllocation\|ReleaseHosts" --oneline 2>/dev/null | head -10

Repository: openshift/installer

Length of output: 45


Missing PermissionDynamicHostAllocation for dynamic host cleanup at destroy time, but the suggested fix needs refinement.

RequiredPermissionGroups is used for both install and delete credential validation. When dedicated hosts are configured with AnyAvailable affinity, the system dynamically allocates hosts that must be released at destroy time via ec2:ReleaseHosts. However, the current code only appends PermissionDedicatedHosts.

The suggested fix requires refinement: includesDedicatedHosts() returns true for any HostPlacement configuration but doesn't distinguish between HostAffinityAnyAvailable (dynamically allocated, requiring release) and HostAffinityDedicatedHost (user-provided, no release needed). Simply appending PermissionDynamicHostAllocation whenever HostPlacement != nil would add unnecessary permissions for the DedicatedHost affinity case.

The fix should conditionally add PermissionDynamicHostAllocation only when affinity is set to AnyAvailable, not for all dedicated host placements.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/installconfig/aws/permissions.go` around lines 581 - 583,
RequiredPermissionGroups currently appends PermissionDedicatedHosts for any
HostPlacement via includesDedicatedHosts(ic) but must also append
PermissionDynamicHostAllocation only when the HostPlacement's affinity is
HostAffinityAnyAvailable (dynamically allocated hosts that require
ec2:ReleaseHosts). Update the logic in RequiredPermissionGroups to inspect
ic.Platform.AWS.HostPlacement (or the function that reads HostPlacement used by
includesDedicatedHosts) and: keep appending PermissionDedicatedHosts when
HostPlacement is present, and additionally append
PermissionDynamicHostAllocation only when HostPlacement.Affinity ==
HostAffinityAnyAvailable; do not add PermissionDynamicHostAllocation for
HostAffinityDedicatedHost.

Comment on lines +250 to +251
// +required
ID string `json:"id,omitempty"`

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if other required fields in the codebase use omitempty
rg -n '\+required' -A1 --type go pkg/types/aws/ | grep -E 'omitempty|json:'

Repository: openshift/installer

Length of output: 251


🏁 Script executed:

sed -n '248,253p' pkg/types/aws/machinepool.go

Repository: openshift/installer

Length of output: 193


Remove omitempty from the required ID field's JSON tag.

The ID field is annotated with +required but includes omitempty in the JSON tag, which is contradictory: a required field should not be omitted when empty. This is also inconsistent with other required fields in the same package (e.g., in platform.go), which use json:"id" without omitempty.

Suggested fix
 	// +required
-	ID string `json:"id,omitempty"`
+	ID string `json:"id"`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/aws/machinepool.go` around lines 250 - 251, The ID field in the
machine pool struct is marked +required but its JSON tag uses omitempty; update
the struct's ID field (the symbol "ID" in the machine pool struct in
machinepool.go) to remove omitempty so the tag reads `json:"id"` (keep the
`+required` comment intact) to match other required fields like the one in
platform.go.

Contributor

@patrickdillon left a comment

/approve

}

// Check to make sure dedicated host is not owned by another cluster
clusterIDs := foundHost.Tags.GetClusterIDs(TagValueOwned)
Contributor

Suggested change
clusterIDs := foundHost.Tags.GetClusterIDs(TagValueOwned)
clusterIDs := foundHost.Tags.GetClusterIDs(TagValueShared)

Contributor Author

@patrickdillon This is supposed to be owned, not shared. If the dedicated host we find is "owned" by another cluster, we want to throw an error, since we cannot assign an instance from this cluster onto a host owned by another cluster.
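The ownership check being defended here could be sketched like this. All identifiers below are illustrative; the real code uses the installer's Tags helpers (GetClusterIDs, TagValueOwned) rather than these stand-ins.

```go
package main

import "fmt"

// Illustrative stand-in for the installer's owned tag value.
const ownedTagValue = "owned"

// clusterIDsWithValue scans kubernetes.io/cluster/<id> style tags and
// returns the cluster IDs whose tag value matches want.
func clusterIDsWithValue(tags map[string]string, want string) []string {
	const prefix = "kubernetes.io/cluster/"
	var ids []string
	for k, v := range tags {
		if len(k) > len(prefix) && k[:len(prefix)] == prefix && v == want {
			ids = append(ids, k[len(prefix):])
		}
	}
	return ids
}

// validateHostOwnership rejects a BYO dedicated host that is already
// owned by a different cluster, mirroring the review discussion.
func validateHostOwnership(hostTags map[string]string, clusterID string) error {
	for _, id := range clusterIDsWithValue(hostTags, ownedTagValue) {
		if id != clusterID {
			return fmt.Errorf("dedicated host is owned by cluster %q", id)
		}
	}
	return nil
}

func main() {
	tags := map[string]string{"kubernetes.io/cluster/other-cluster": "owned"}
	fmt.Println(validateHostOwnership(tags, "my-cluster"))
}
```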

Contributor

Following up on the other comment I just posted, it does not seem to me like there would ever be a scenario where the host would have an owned tag. It seems like instances on the host would be owned.

@openshift-ci
Contributor

openshift-ci Bot commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2026
}

if includesDedicatedHosts(ic) {
permissionGroups = append(permissionGroups, PermissionDedicatedHosts)
Contributor

coderabbit points out that we're missing the ec2:ReleaseHosts permission that is needed for destroy. Not sure if it is best to handle that by appending the PermissionDynamicHostAllocation permission here too, or collapsing that permission into the PermissionDedicatedHosts perm group

Contributor Author

@patrickdillon I am not sure how we handle this one. These are dynamic in that they are not configured in the install config. I can ALWAYS add it, but I am sure that is frowned upon. So what I did is just make sure the installer loops as expected when the permission is missing. What are your thoughts on this? I can add this to the above check during install for when someone is doing BYO DH, but they may not want to provide the release permission.

@vr4manta
Contributor Author

/retest

1 similar comment
@vr4manta
Contributor Author

/retest

}

// tagSharedDedicatedHosts tags users' BYO dedicated hosts so they are not destroyed by the Installer.
func tagSharedDedicatedHosts(ctx context.Context, clusterID string, installConfig *installconfig.InstallConfig) error {
Contributor

It seems like the tags should be set by adding the tags on the compute node manifests... The compute node manifests would already be including the owned tag.

Contributor

actually after reviewing this more, I'm not sure we want a shared tag. happy to discuss more tomorrow.

Contributor Author

@patrickdillon not sure I follow. Compute nodes are related to the AWS EC2 instances, not the hosts that the instances run on. Dedicated Hosts are hosts in AWS that the end user can allocate and assign node/machine instances onto. We are tagging them similarly to subnets, IAM profiles, etc., to make sure we do not release them during cluster destroy, since a shared (BYO) host may be used by another OCP cluster.

Contributor

Thanks, this helps with my understanding: let me see if I understand correctly:

  • users provision/byo dedicated hosts
  • openshift can then manage instances on those dedicated hosts, just like we would any other ec2 instances
  • it seems like the important point is that we manage the lifecycle of those instances, which may already be happening if they are getting the appropriate tags

Based on your clarification, my feedback/questions/assumptions would be:

  • since hosts are byo, they would never have the owned tag. we do not manage the lifecycle of the host
  • accordingly, we should never release hosts
  • applying shared tags to hosts would follow the conventions of VPCs, but perhaps serves no functional purpose?

Let me expand on that last point: to me it seems like at most we should be applying (at install time) a shared label and removing that shared label at destroy time. That said, if the label is not used for resource discovery or by any controllers, then at best it is used for monitoring and reporting. I'm totally fine with that, and it seems like a good idea, but it does not seem necessary for functionality. Hope that makes sense.

I think we're chatting in an hour, looking forward to it!

Contributor

So no, my understanding is not correct! OpenShift actually does allocate dedicated hosts, but only does so on day 2. In those cases, the dedicated host is rightfully considered owned.

@patrickdillon
Contributor

@vr4manta if you haven't captured this already, as a follow-up you probably want destroy validation like we have with VPCs to make sure an owned dedicated host does not have any shared tags:

func (o *ClusterUninstaller) ValidateOwnedSubnets(ctx context.Context) error {
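A destroy-time analogue for dedicated hosts might look like the sketch below. Only ValidateOwnedSubnets is the real precedent; the function name, tag values, and logic here are hypothetical, based on the owned/shared convention discussed in this thread.

```go
package main

import "fmt"

// Illustrative stand-ins for the installer's tag value constants.
const (
	tagValueOwned  = "owned"
	tagValueShared = "shared"
)

// validateOwnedDedicatedHost is a hypothetical destroy-time check in the
// spirit of ValidateOwnedSubnets: a host this cluster owns (and would
// release) must not also carry shared tags from other clusters, which
// would indicate the host is still in use elsewhere.
func validateOwnedDedicatedHost(tags map[string]string, clusterID string) error {
	ownKey := "kubernetes.io/cluster/" + clusterID
	if tags[ownKey] != tagValueOwned {
		return nil // not owned by this cluster; nothing to validate
	}
	for k, v := range tags {
		if k != ownKey && v == tagValueShared {
			return fmt.Errorf("owned dedicated host is still shared via tag %s", k)
		}
	}
	return nil
}

func main() {
	tags := map[string]string{
		"kubernetes.io/cluster/this-cluster":  tagValueOwned,
		"kubernetes.io/cluster/other-cluster": tagValueShared,
	}
	fmt.Println(validateOwnedDedicatedHost(tags, "this-cluster"))
}
```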

@tthvo
Member

tthvo commented Mar 19, 2026

/test shellcheck

1 similar comment
@patrickdillon
Contributor

/test shellcheck

@patrickdillon
Contributor

/lgtm
/verified later @vr4manta

@openshift-ci-robot openshift-ci-robot added verified-later verified Signifies that the PR passed pre-merge verification criteria labels Mar 20, 2026
@openshift-ci-robot
Contributor

@patrickdillon: This PR has been marked to be verified later by @vr4manta.

Details

In response to this:

/lgtm
/verified later @vr4manta

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2026
@mtulio
Contributor

mtulio commented Mar 20, 2026

Thanks @vr4manta for addressing the comments, and @patrickdillon for chatting today.

this looks good to me.

/lgtm

@vr4manta
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci Bot commented Mar 20, 2026

@vr4manta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-ovn-xpn 9364945 link false /test e2e-gcp-ovn-xpn
ci/prow/e2e-metal-ovn-two-node-fencing 9364945 link false /test e2e-metal-ovn-two-node-fencing
ci/prow/gcp-private 9364945 link false /test gcp-private
ci/prow/e2e-gcp-secureboot 9364945 link false /test e2e-gcp-secureboot
ci/prow/gcp-custom-endpoints-proxy-wif 9364945 link false /test gcp-custom-endpoints-proxy-wif
ci/prow/e2e-metal-assisted 9364945 link false /test e2e-metal-assisted
ci/prow/e2e-metal-ipi-ovn-virtualmedia 9364945 link false /test e2e-metal-ipi-ovn-virtualmedia
ci/prow/e2e-metal-ipi-ovn 9364945 link false /test e2e-metal-ipi-ovn
ci/prow/e2e-gcp-ovn 9364945 link true /test e2e-gcp-ovn
ci/prow/e2e-metal-single-node-live-iso 9364945 link false /test e2e-metal-single-node-live-iso
ci/prow/e2e-gcp-custom-dns 9364945 link false /test e2e-gcp-custom-dns
ci/prow/e2e-azurestack 9364945 link false /test e2e-azurestack
ci/prow/e2e-gcp-default-config 9364945 link false /test e2e-gcp-default-config
ci/prow/e2e-gcp-xpn-dedicated-dns-project 9364945 link false /test e2e-gcp-xpn-dedicated-dns-project
ci/prow/e2e-openstack-ovn 9364945 link true /test e2e-openstack-ovn
ci/prow/e2e-gcp-custom-endpoints 9364945 link false /test e2e-gcp-custom-endpoints
ci/prow/e2e-aws-ovn-dualstack-ipv6-primary-techpreview 37b76cb link false /test e2e-aws-ovn-dualstack-ipv6-primary-techpreview
ci/prow/e2e-aws-ovn-dualstack-ipv4-primary-techpreview 37b76cb link false /test e2e-aws-ovn-dualstack-ipv4-primary-techpreview
ci/prow/e2e-aws-ovn-heterogeneous e6282c4 link false /test e2e-aws-ovn-heterogeneous

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit ac6111f into openshift:main Mar 20, 2026
25 of 26 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria verified-later

6 participants