SPLAT-2172: AWS dedicated host support #10079
openshift-merge-bot[bot] merged 5 commits into openshift:main
Conversation
@vr4manta: This pull request references SPLAT-2172 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

/cc
24500b6 to 342ab2d

/retest-required
tthvo
left a comment
Cool feature 👍 🚀
I have quite a few comments 😅 Sorry if I'm reviewing a little too soon...
```go
switch *p.HostPlacement.Affinity {
case aws.HostAffinityAnyAvailable:
	if p.HostPlacement.DedicatedHost != nil {
		allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
```
Did you mean hostPlacement.dedicatedHost is forbidden if affinity is AnyAvailable?
```go
allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))
```
@tthvo So this was an interesting one. In openshift/api and openshift/machine-api-operator, it was suggested to error this way for this scenario. I was doing this to keep it consistent.
- https://github.com/openshift/api/blob/0598ae66682433c49adc29b1ff14a201487f3afe/machine/v1beta1/types_awsprovider.go#L406
- openshift/machine-api-operator@f669914#diff-30c686f7a9dbd2aa917e5e410401d720a2a4bf4a11163495f5610c8f4b98edaeR917
That said, I am happy to apply your suggestion if you prefer the installer to say it this way. Just let me know.
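For comparison, here is a minimal, self-contained sketch of the rule both wordings are enforcing. The types below are simplified stand-ins for the installer's structs, and the error strings are illustrative; the real code builds errors with k8s.io/apimachinery's `field.Required` and `field.Forbidden` helpers.

```go
package main

import "fmt"

// Simplified stand-ins for the installer's install-config shapes.
type dedicatedHost struct{ ID string }

type hostPlacement struct {
	Affinity      string // "AnyAvailable" or "DedicatedHost"
	DedicatedHost []dedicatedHost
}

// validateHostPlacement mirrors the cross-field rule under discussion:
// dedicatedHost is required when affinity is DedicatedHost, and forbidden
// (not "required") when affinity is AnyAvailable.
func validateHostPlacement(p hostPlacement) []string {
	var errs []string
	switch p.Affinity {
	case "AnyAvailable":
		if len(p.DedicatedHost) > 0 {
			errs = append(errs, `hostPlacement.dedicatedHost: Forbidden: must not be set when affinity is AnyAvailable`)
		}
	case "DedicatedHost":
		if len(p.DedicatedHost) == 0 {
			errs = append(errs, `hostPlacement.dedicatedHost: Required: must be set when affinity is DedicatedHost`)
		}
	}
	return errs
}

func main() {
	fmt.Println(validateHostPlacement(hostPlacement{Affinity: "AnyAvailable", DedicatedHost: []dedicatedHost{{ID: "h-1"}}}))
	fmt.Println(validateHostPlacement(hostPlacement{Affinity: "DedicatedHost"}))
}
```

The semantic point is that a Forbidden error tells the user to delete the field, while a Required error tells them to add one, which is misleading for the AnyAvailable case.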
```go
if err != nil {
	allErrs = append(allErrs, field.InternalError(placementPath.Child("dedicatedHost"), err))
} else {
	// Check the returned configured hosts to see if the dedicated hosts defined in install-config exist.
```
💡 ❓ I got 2 questions here:
- Do the zones of dedicated hosts need to match the machine pool zone field if defined? (installer/pkg/types/aws/machinepool.go, lines 5 to 9 in 7c9b71a)
- Do the user-input zones for dedicated hosts need to match the actual zones returned from AWS?
```go
// If user specified a zone, validate it matches AWS
if host.Zone != "" && host.Zone != hostDetails.Zone {
	allErrs = append(allErrs, field.Invalid(
		fldPath.Child("hostPlacement", "dedicatedHost").Index(i).Child("zone"),
		host.Zone,
		fmt.Sprintf("specified zone %s does not match actual host zone %s", host.Zone, hostDetails.Zone)))
}
```
I would expect them not to include one that's not specified (or, in the generic case, where we read all zones). So I can add logic when checking to see whether it is one that we are interested in or not.
I took a stab at adding logic to check whether the zone of a found dedicated host is valid for the pool config. I am going to add another check shortly to make sure that if the user configured the dedicatedHost with a zone, it is actually correct as well.
Do the user-input zones for dedicated hosts need to match with the actual zones returned from AWS?
On that point, I'm curious whether users should need to specify the zone for the dedicated host at all. It seems like we can just look up the dedicated host by ID and then use that to determine the zone (which the PR is already doing, IIUC). Seems like we might be able to shed zone from the install config.
This logic was updated. We no longer have a field in the config to specify the zone. The validation logic does make sure that the configured DHs belong to a zone that the install-config is configured for.
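A rough sketch of that check, with simplified stand-in types rather than the installer's actual `icaws.Host` and pool structs (the helper name here is hypothetical): compare each dedicated host's AWS-reported zone against the set of zones the pool is configured for.

```go
package main

import "fmt"

// host is a simplified stand-in for the installer's dedicated-host type;
// Zone is the availability zone AWS reports for the host ID.
type host struct {
	ID   string
	Zone string
}

// hostsOutsidePoolZones returns the IDs of configured dedicated hosts whose
// AWS-reported zone is not one of the machine pool's configured zones.
func hostsOutsidePoolZones(hosts []host, poolZones []string) []string {
	allowed := make(map[string]bool, len(poolZones))
	for _, z := range poolZones {
		allowed[z] = true
	}
	var bad []string
	for _, h := range hosts {
		if !allowed[h.Zone] {
			bad = append(bad, h.ID)
		}
	}
	return bad
}

func main() {
	hosts := []host{
		{ID: "h-aaaaaaaaaaaaaaaaa", Zone: "us-east-1a"},
		{ID: "h-bbbbbbbbbbbbbbbbb", Zone: "us-east-1c"},
	}
	// Only us-east-1a and us-east-1b are configured for the pool,
	// so the second host should be flagged.
	fmt.Println(hostsOutsidePoolZones(hosts, []string{"us-east-1a", "us-east-1b"}))
}
```

Each flagged ID would then be turned into a `field.Invalid` error against the corresponding `dedicatedHost` entry.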
/hold
82915c2 to 8ef8679

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-dedicated
@vr4manta: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1c5c6d80-c717-11f0-93e9-d918c346504c-0

/unhold

/retest

/hold
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/asset/machines/worker.go (1)
537-555: ⚠️ Potential issue | 🟠 Major

Fail fast when a resolved AZ has no matching dedicated host.

`mpool.Zones` may already have been defaulted to every worker subnet/AZ before this block runs. If the user only configured dedicated hosts for a subset of those zones, `aws.MachineSets` will still generate a MachineSet for the extra AZs, and `aws.DedicatedHost(...)` resolves to `""` for them. That leaves a `DedicatedHost` pool emitting invalid provider specs instead of surfacing a clear installer error.

Proposed guard

```diff
 dHosts := map[string]icaws.Host{}
 if mpool.HostPlacement != nil && mpool.HostPlacement.Affinity != nil && *mpool.HostPlacement.Affinity == awstypes.HostAffinityDedicatedHost {
 	dHosts, err = installConfig.AWS.DedicatedHosts(ctx, mpool.HostPlacement.DedicatedHost)
 	if err != nil {
 		return fmt.Errorf("failed to retrieve dedicated hosts for compute pool: %w", err)
 	}
+	for _, zone := range mpool.Zones {
+		if aws.DedicatedHost(dHosts, mpool.HostPlacement, zone) == "" {
+			return fmt.Errorf("no dedicated host matches zone %s for compute pool %s", zone, pool.Name)
+		}
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/machines/worker.go` around lines 537 - 555, If dedicated hosts were requested (mpool.HostPlacement != nil && mpool.HostPlacement.Affinity != nil && *mpool.HostPlacement.Affinity == awstypes.HostAffinityDedicatedHost), after calling installConfig.AWS.DedicatedHosts(...) and before calling aws.MachineSets(...), validate that every AZ in mpool.Zones has a corresponding entry in the returned dHosts map (or that aws.DedicatedHost(...) would not return an empty string for that AZ); if any AZ is missing a dedicated host, return an error (fmt.Errorf) describing the specific AZ(s) with no matching dedicated host so we fail fast instead of producing MachineSets with empty DedicatedHost values.
♻️ Duplicate comments (7)
pkg/asset/machines/aws/machines.go (1)
370-383: ⚠️ Potential issue | 🟠 Major

Silent fallback to shared placement when no host matches the zone.

When `placement != nil` (i.e., DedicatedHost affinity is configured) but no host ID matches the given zone, returning `""` causes the caller to omit `Placement.Host` entirely. This silently produces a non-dedicated instance for a pool that explicitly requested dedicated host placement. Consider returning an error when placement is configured but no matching host is found:

🔧 Suggested approach

```diff
-func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) string {
+func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) (string, error) {
 	// If install-config has HostPlacements configured, lets check the DedicatedHosts to see if one matches our region & zone.
 	if placement != nil {
 		// We only support one host ID currently for an instance. Need to also get host that matches the zone the machines will be put into.
 		for _, host := range placement.DedicatedHost {
 			hostDetails, found := hosts[host.ID]
 			if found && hostDetails.Zone == zone {
-				return hostDetails.ID
+				return hostDetails.ID, nil
 			}
 		}
+		if *placement.Affinity == awstypes.HostAffinityDedicatedHost {
+			return "", fmt.Errorf("no dedicated host found for zone %s", zone)
+		}
 	}
-	return ""
+	return "", nil
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/machines/aws/machines.go` around lines 370 - 383: the DedicatedHost function currently returns an empty string when placement is non-nil but no host in placement.DedicatedHost matches the given zone, which causes callers to omit Placement.Host; change DedicatedHost to propagate an error in that case (e.g., return an error alongside the host ID or convert the function to (string, error)) so callers can fail fast when DedicatedHost affinity is configured but no matching host is found; update callers to handle the new error return and surface a clear error message referencing DedicatedHost, placement.DedicatedHost and the zone.

pkg/types/aws/machinepool.go (1)
238-239: ⚠️ Potential issue | 🟡 Minor

Fix the stale YAML path in the public comment.

The comment references `dedicatedHost.host.id` but the actual schema path is `dedicatedHost[].id` (since DedicatedHost is a slice).

📝 Suggested fix

```diff
-	// HostAffinityDedicatedHost requires specifying a particular host via dedicatedHost.host.id.
+	// HostAffinityDedicatedHost requires specifying at least one host via dedicatedHost[].id.
 	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/machinepool.go` around lines 238 - 239: update the public comment for HostAffinityDedicatedHost to correct the stale YAML path: change the referenced schema path from dedicatedHost.host.id to dedicatedHost[].id (reflecting that DedicatedHost is a slice) so the comment accurately points to dedicatedHost[].id.

pkg/types/aws/validation/featuregates.go (1)
28-38: ⚠️ Potential issue | 🟠 Major

Gate `hostPlacement` on `defaultMachinePlatform` as well.

This loop only gates `compute[*].platform.aws.hostPlacement`. If a user sets `platform.aws.defaultMachinePlatform.hostPlacement`, it will be merged into compute pools via `Set()` but won't trigger the feature gate check. Consider adding a check before the loop:

```diff
+	// Check if defaultMachinePlatform has dedicated hosts configured
+	if c.Platform.AWS != nil && c.Platform.AWS.DefaultMachinePlatform != nil && c.Platform.AWS.DefaultMachinePlatform.HostPlacement != nil {
+		gatedFeatures = append(gatedFeatures, featuregates.GatedInstallConfigFeature{
+			FeatureGateName: features.FeatureGateAWSDedicatedHosts,
+			Condition:       true,
+			Field:           field.NewPath("platform", "aws", "defaultMachinePlatform", "hostPlacement"),
+		})
+	}
 	// Check if any compute pool has dedicated hosts configured
 	for idx, compute := range c.Compute {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/validation/featuregates.go` around lines 28 - 38: the loop currently only checks compute pools (c.Compute) for AWS HostPlacement; add a pre-check for c.Platform.AWS.DefaultMachinePlatform.HostPlacement and, if non-nil, append the same feature gate (featuregates.GatedInstallConfigFeature with FeatureGateName features.FeatureGateAWSDedicatedHosts, Condition true) using field.NewPath("platform","aws","defaultMachinePlatform","hostPlacement") before iterating compute so the defaultMachinePlatform case is gated (avoid duplicating the gate by performing this check first and only falling back to the existing compute loop if the defaultMachinePlatform check is not set).

pkg/types/aws/validation/machinepool_test.go (1)
300-323: ⚠️ Potential issue | 🟡 Minor

Test cases use host IDs that would fail CRD validation.

These tests are named "valid dedicated host - 8 character format" but use IDs like `h-1a2b3c4d` (8 hex chars) which don't match the kubebuilder validation pattern `^h-[0-9a-f]{17}$` (17 hex chars) defined in `pkg/types/aws/machinepool.go`. While these tests pass because the Go validation only checks for empty IDs (the regex is enforced by kubebuilder), the test names are misleading. Consider either:

- Using valid 17-character IDs: `h-1a2b3c4d5e6f78901`
- Renaming tests to clarify they only test Go-level validation

🔧 Suggested fix using valid IDs

```diff
 {
-	name: "valid dedicated host - 8 character format",
+	name: "valid dedicated host",
 	pool: &aws.MachinePool{
 		HostPlacement: &aws.HostPlacement{
 			Affinity: ptr.To(aws.HostAffinityDedicatedHost),
 			DedicatedHost: []aws.DedicatedHost{
-				{ID: "h-1a2b3c4d"},
+				{ID: "h-1a2b3c4d5e6f78901"},
 			},
 		},
 	},
 },
 {
-	name: "valid dedicated host - 8 character format with zone",
+	name: "valid dedicated host - another example",
 	pool: &aws.MachinePool{
 		HostPlacement: &aws.HostPlacement{
 			Affinity: ptr.To(aws.HostAffinityDedicatedHost),
 			DedicatedHost: []aws.DedicatedHost{
 				{
-					ID: "h-9876abcd",
+					ID: "h-9876abcdef0123456",
 				},
 			},
 		},
 	},
 },
```
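To make the mismatch concrete, the kubebuilder pattern quoted above can be exercised directly with Go's standard `regexp` package; a small sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

// hostIDPattern is the kubebuilder validation regex quoted in the review:
// "h-" followed by exactly 17 lowercase hex characters.
var hostIDPattern = regexp.MustCompile(`^h-[0-9a-f]{17}$`)

func main() {
	fmt.Println(hostIDPattern.MatchString("h-1a2b3c4d"))          // 8 hex chars → false
	fmt.Println(hostIDPattern.MatchString("h-1a2b3c4d5e6f78901")) // 17 hex chars → true
}
```

This is why the short IDs only survive the Go-level check (which tests for empty strings) and would be rejected by the CRD schema.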
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/validation/machinepool_test.go` around lines 300 - 323: the test cases named "valid dedicated host - 8 character format" and "valid dedicated host - 8 character format with zone" use DedicatedHost.ID values like "h-1a2b3c4d" which do not match the kubebuilder CRD regex ^h-[0-9a-f]{17}$; update the DedicatedHost.ID fields in those aws.MachinePool test entries (the structs under HostPlacement -> DedicatedHost -> ID) to use valid 17-hex-character host IDs (e.g., "h-1a2b3c4d5e6f78901") or, if you prefer to keep short IDs, rename the test names to indicate they only exercise Go-level (non-CRD) validation rather than CRD-valid IDs.

pkg/types/aws/validation/machinepool.go (1)
105-108: ⚠️ Potential issue | 🟡 Minor

Use `field.Forbidden` instead of `field.Required` for the AnyAvailable case.

When affinity is AnyAvailable and DedicatedHost is set, the field should be marked as forbidden, not required. The error message says "forbidden otherwise" but uses `field.Required`.

🐛 Proposed fix

```diff
 case aws.HostAffinityAnyAvailable:
 	if len(p.HostPlacement.DedicatedHost) > 0 {
-		allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
+		allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/validation/machinepool.go` around lines 105 - 108: the validation for the aws.HostAffinityAnyAvailable case is using field.Required but should use field.Forbidden; update the code in the switch case handling aws.HostAffinityAnyAvailable to append field.Forbidden(fldPath.Child("dedicatedHost"), "<message>") instead of field.Required, keeping or clarifying the existing message (e.g., "dedicatedHost is forbidden when 'affinity' is set to AnyAvailable") so the error matches the logic applied to allErrs and fldPath.Child("dedicatedHost").

pkg/asset/installconfig/aws/validation.go (1)
592-597: ⚠️ Potential issue | 🟡 Minor

Inconsistent field path in ownership validation error.

The ownership validation error at line 595 uses `fldPath` (the pool-level path) instead of `dhPath` (the dedicated host entry path). Other errors in this loop consistently use `dhPath` for better error localization.

Suggested fix

```diff
 // Check to make sure dedicated host is not owned by another cluster
 clusterIDs := foundHost.Tags.GetClusterIDs(TagValueOwned)
 if len(clusterIDs) > 0 {
-	allErrs = append(allErrs, field.Forbidden(fldPath,
+	allErrs = append(allErrs, field.Forbidden(dhPath,
 		fmt.Sprintf("Dedicated host %s is owned by other cluster %v and cannot be used for new installations, another Dedicated Host must be created separately", foundHost.ID, clusterIDs)))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/installconfig/aws/validation.go` around lines 592 - 597: the ownership validation appends an error with the wrong field path (uses the pool-level path variable instead of the dedicated-host entry path), so update the field.Forbidden call in the dedicated host ownership check to use dhPath (the dedicated-host specific path) rather than fldPath; locate the block using variables clusterIDs, foundHost, TagValueOwned, dhPath, allErrs and replace the path argument passed to field.Forbidden so the error is associated with the dedicated host entry.

data/data/install.openshift.io_installconfigs.yaml (1)
219-251: ⚠️ Potential issue | 🟠 Major

Reject empty `dedicatedHost` lists when `affinity: DedicatedHost`.

Lines 250, 1860, 3410, and 5160 only test field presence, so `dedicatedHost: []` still satisfies the CEL rule and an unusable config gets accepted. Add `minItems: 1` to the array schema in each copy of this block. If this YAML is generated, the fix should be made in the source schema/marker and then regenerated.

Suggested fix

```diff
 dedicatedHost:
+  minItems: 1
   description: |-
```

Also applies to: 1829-1861, 3379-3411, 5129-5161
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/data/install.openshift.io_installconfigs.yaml` around lines 219 - 251, The CEL validation only checks for presence of dedicatedHost, so an empty array passes; update the dedicatedHost array schema (the dedicatedHost property definition that contains items -> properties -> id and the surrounding type: array) to include minItems: 1 so empty lists are rejected when affinity == "DedicatedHost"; apply the same change to each duplicated block (the dedicatedHost array definitions that accompany the affinity/x-kubernetes-validations rule referencing affinity == 'DedicatedHost') or, if generated, fix the source schema/marker and regenerate the YAML.
🧹 Nitpick comments (2)
pkg/types/aws/validation/machinepool_test.go (1)
380-392: The poolName defaulting logic is fragile.

The logic relies on string matching in `tc.expected` to determine if a test is for control plane or edge pools. This is brittle and could break if error messages change. Consider making this explicit by always setting `poolName` in the test cases:

♻️ Suggested approach

```diff
 for _, tc := range cases {
 	t.Run(tc.name, func(t *testing.T) {
 		poolName := tc.poolName
-		// If poolName is not specified, default to worker pool unless the test
-		// is explicitly testing control plane or edge pool behavior
-		if poolName == "" && tc.expected != "" &&
-			!strings.Contains(tc.expected, "control plane") &&
-			!strings.Contains(tc.expected, "edge pool") {
-			poolName = types.MachinePoolComputeRoleName
-		} else if poolName == "" && tc.expected == "" {
+		if poolName == "" {
 			poolName = types.MachinePoolComputeRoleName
 		}
```

Then update the control plane test case to explicitly set `poolName: types.MachinePoolControlPlaneRoleName` instead of relying on the empty string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/validation/machinepool_test.go` around lines 380 - 392: the current defaulting logic for poolName in the test loop is brittle because it inspects tc.expected text; instead make poolName explicit in each test case and simplify the test setup: remove the string-matching branches around tc.expected and just set poolName := tc.poolName; if poolName == "" set it to types.MachinePoolComputeRoleName (default), and update the control plane test case to set poolName: types.MachinePoolControlPlaneRoleName (and any edge-pool tests to their explicit names) so ValidateMachinePool calls use explicit pool names rather than parsing tc.expected.

pkg/asset/machines/worker_test.go (1)
130-155: Add one AWS dedicated-host regression case.

These tests now initialize AWS metadata, but they still never exercise `HostPlacement`. A case with `hostPlacement.affinity=DedicatedHost` would cover the new branch in `Worker.Generate` and catch zone-to-host mapping regressions much earlier.

Also applies to: 189-217
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/machines/worker_test.go` around lines 130 - 155, Add a new test case in pkg/asset/machines/worker_test.go that exercises AWS HostPlacement by setting the MachinePool AWS HostPlacement affinity to DedicatedHost (e.g. set cfg.Compute[0].Platform.AWS.HostPlacement.Affinity = awstypes.HostAffinityDedicatedHost) before creating the installconfig asset (installconfig.MakeAsset) and initializing icAsset.AWS; this will force the Worker.Generate branch that handles HostPlacement and validate zone-to-host mapping behavior alongside the existing cases (reuse cfg, icAsset.AWS = icaws.NewMetadata(...), and existing tc variants).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/asset/installconfig/aws/dedicatedhosts.go`:
- Around line 18-19: Update the grammatical typo in the field comment for the
Tags field: change "Hosts's" to "Host's" in the comment above the Tags
declaration (the Tags field in dedicatedhosts.go / the Tags struct field) so it
reads "Tags is the map of the Host's tags." Ensure only the comment text is
modified and no code logic is changed.
In `@pkg/asset/installconfig/aws/validation_test.go`:
- Around line 2431-2443: The function withComputeHostPlacementAndZone accepts a
zone but never uses it; update the function to populate the compute pool's AWS
Zones so the zone parameter is applied: inside withComputeHostPlacementAndZone
(which currently sets HostPlacement on ic.Compute[index].Platform.AWS), also set
ic.Compute[index].Platform.AWS.Zones = []string{zone} (or append if multiple
zones are intended); alternatively, if you don’t need zone semantics, remove
withComputeHostPlacementAndZone and use withComputeHostPlacement to avoid
duplication.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e4f9a69f-8259-46bd-9425-e15a45a67d9b
⛔ Files ignored due to path filters (1)
`pkg/types/aws/zz_generated.deepcopy.go` is excluded by `!**/zz_generated*`
📒 Files selected for processing (20)
- data/data/install.openshift.io_installconfigs.yaml
- pkg/asset/cluster/aws/aws.go
- pkg/asset/installconfig/aws/dedicatedhosts.go
- pkg/asset/installconfig/aws/metadata.go
- pkg/asset/installconfig/aws/permissions.go
- pkg/asset/installconfig/aws/validation.go
- pkg/asset/installconfig/aws/validation_test.go
- pkg/asset/machines/aws/machines.go
- pkg/asset/machines/aws/machinesets.go
- pkg/asset/machines/worker.go
- pkg/asset/machines/worker_test.go
- pkg/destroy/aws/ec2helpers.go
- pkg/types/aws/machinepool.go
- pkg/types/aws/validation/featuregates.go
- pkg/types/aws/validation/featuregates_test.go
- pkg/types/aws/validation/machinepool.go
- pkg/types/aws/validation/machinepool_test.go
- pkg/types/aws/validation/platform.go
- pkg/types/machinepools.go
- pkg/types/validation/machinepools.go
🚧 Files skipped from review as they are similar to previous changes (5)
- pkg/asset/installconfig/aws/metadata.go
- pkg/types/aws/validation/featuregates_test.go
- pkg/types/validation/machinepools.go
- pkg/types/aws/validation/platform.go
- pkg/asset/cluster/aws/aws.go
```go
func (icBuild icBuildForAWS) withComputeHostPlacementAndZone(hostIDs []string, zone string, index int) icOption {
	return func(ic *types.InstallConfig) {
		aff := aws.HostAffinityDedicatedHost
		dhs := make([]aws.DedicatedHost, 0, len(hostIDs))
		for _, id := range hostIDs {
			dhs = append(dhs, aws.DedicatedHost{ID: id})
		}
		ic.Compute[index].Platform.AWS.HostPlacement = &aws.HostPlacement{
			Affinity:      &aff,
			DedicatedHost: dhs,
		}
	}
}
```
Unused zone parameter in withComputeHostPlacementAndZone.
The zone parameter is accepted but never used in the function body. The function is identical to withComputeHostPlacement. Either the zone should be used (e.g., to set ic.Compute[index].Platform.AWS.Zones) or this method should be removed in favor of using withComputeHostPlacement directly.
Suggested fix (if zone should set compute pool zones)
```diff
 func (icBuild icBuildForAWS) withComputeHostPlacementAndZone(hostIDs []string, zone string, index int) icOption {
 	return func(ic *types.InstallConfig) {
 		aff := aws.HostAffinityDedicatedHost
 		dhs := make([]aws.DedicatedHost, 0, len(hostIDs))
 		for _, id := range hostIDs {
 			dhs = append(dhs, aws.DedicatedHost{ID: id})
 		}
 		ic.Compute[index].Platform.AWS.HostPlacement = &aws.HostPlacement{
 			Affinity:      &aff,
 			DedicatedHost: dhs,
 		}
+		// Set the zone for the compute pool if specified
+		if zone != "" {
+			ic.Compute[index].Platform.AWS.Zones = []string{zone}
+		}
 	}
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/asset/installconfig/aws/validation_test.go` around lines 2431 - 2443, The
function withComputeHostPlacementAndZone accepts a zone but never uses it;
update the function to populate the compute pool's AWS Zones so the zone
parameter is applied: inside withComputeHostPlacementAndZone (which currently
sets HostPlacement on ic.Compute[index].Platform.AWS), also set
ic.Compute[index].Platform.AWS.Zones = []string{zone} (or append if multiple
zones are intended); alternatively, if you don’t need zone semantics, remove
withComputeHostPlacementAndZone and use withComputeHostPlacement to avoid
duplication.
@vr4manta: This pull request references SPLAT-2172 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 2
♻️ Duplicate comments (6)
pkg/asset/installconfig/aws/validation_test.go (1)
2431-2443: ⚠️ Potential issue | 🟡 Minor

Unused `zone` parameter in `withComputeHostPlacementAndZone`.

The `zone` parameter is accepted but never used—this function is identical to `withComputeHostPlacement`. Either use the zone (e.g., to set `ic.Compute[index].Platform.AWS.Zones`) or remove this method and use `withComputeHostPlacement` directly.

Looking at the test case at line 1593 that uses this method, the intent seems to be testing a mismatch between pool zones and host zones, but the zone parameter doesn't actually set the compute pool's zones.

Suggested fix (if zone should set compute pool zones)

func (icBuild icBuildForAWS) withComputeHostPlacementAndZone(hostIDs []string, zone string, index int) icOption {
	return func(ic *types.InstallConfig) {
		aff := aws.HostAffinityDedicatedHost
		dhs := make([]aws.DedicatedHost, 0, len(hostIDs))
		for _, id := range hostIDs {
			dhs = append(dhs, aws.DedicatedHost{ID: id})
		}
		ic.Compute[index].Platform.AWS.HostPlacement = &aws.HostPlacement{
			Affinity:      &aff,
			DedicatedHost: dhs,
		}
+		// Ensure the zone is associated with the host placement
+		if zone != "" && !slices.Contains(ic.Compute[index].Platform.AWS.Zones, zone) {
+			ic.Compute[index].Platform.AWS.Zones = append(ic.Compute[index].Platform.AWS.Zones, zone)
+		}
	}
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/installconfig/aws/validation_test.go` around lines 2431 - 2443, The helper withComputeHostPlacementAndZone currently ignores its zone parameter; update it so that after setting HostPlacement it also assigns the provided zone to the compute pool's AWS Zones (e.g., set ic.Compute[index].Platform.AWS.Zones = []string{zone}) so tests that expect a mismatch between pool zones and host zones are valid; alternatively, if the zone is not needed, remove withComputeHostPlacementAndZone and use the existing withComputeHostPlacement helper instead.

pkg/asset/installconfig/aws/permissions.go (1)
793-805: ⚠️ Potential issue | 🟡 Minor

Helper only checks pools named `worker`, missing other compute pools.

`includesDedicatedHosts` skips pools where `mpool.Name != types.MachinePoolComputeRoleName` ("worker"). This means additional compute pools with host placement (which are valid per tests in featuregates_test.go) won't trigger the dedicated host permissions. The condition should skip only edge pools, not non-worker compute pools:

Suggested fix

func includesDedicatedHosts(installConfig *types.InstallConfig) bool {
	for _, mpool := range installConfig.Compute {
-		if mpool.Name != types.MachinePoolComputeRoleName {
+		if mpool.Name == types.MachinePoolEdgeRoleName {
			continue
		}
		if mpool.Platform.AWS != nil && mpool.Platform.AWS.HostPlacement != nil {
			return true
		}
	}
	return false
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/installconfig/aws/permissions.go` around lines 793 - 805, includesDedicatedHosts only inspects pools named types.MachinePoolComputeRoleName ("worker"), so other compute pools with AWS.HostPlacement are ignored; change the logic in includesDedicatedHosts to skip only edge pools (check for types.MachinePoolEdgeRoleName) instead of skipping non-worker names, i.e., iterate all installConfig.Compute entries, continue when mpool.Name == types.MachinePoolEdgeRoleName, and keep the existing AWS/HostPlacement nil checks to return true when any non-edge pool specifies dedicated hosts.

pkg/asset/installconfig/aws/validation.go (1)
569-581: ⚠️ Potential issue | 🟠 Major

Potential AZ format mismatch when pool.Zones is empty.

When `pool.Zones` is empty, `meta.AvailabilityZones(ctx)` is called as a fallback. In this codebase, availability zones from metadata can be short form (a, b, c) while `foundHost.Zone` from AWS is full form (us-east-1a). The `slices.Contains` check at line 578 would fail even for valid hosts.

Also, the error message at line 579 shows `pool.Zones`, which may be empty when the fallback was used, making the error confusing.

Suggested fix: Only validate when zones are explicitly configured

	// Is host valid for pools region and zone config?
+	// Only check if zones are explicitly configured; if pool.Zones is empty, all zones are allowed
-	zones := pool.Zones
-	if len(zones) == 0 {
-		zones, err = meta.AvailabilityZones(ctx)
-		if err != nil {
-			allErrs = append(allErrs, field.InternalError(fldPath, fmt.Errorf("unable to retrieve availability zones: %w", err)))
-		}
-	}
-	if !slices.Contains(zones, foundHost.Zone) {
-		errMsg := fmt.Sprintf("machine pool specifies zones %v but dedicated host %s is in zone %s", pool.Zones, host.ID, foundHost.Zone)
+	if len(pool.Zones) > 0 && !slices.Contains(pool.Zones, foundHost.Zone) {
+		errMsg := fmt.Sprintf("machine pool specifies zones %v but dedicated host %s is in zone %s", pool.Zones, host.ID, foundHost.Zone)
		allErrs = append(allErrs, field.Invalid(dhPath, host, errMsg))
	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/installconfig/aws/validation.go` around lines 569 - 581, The validation currently uses meta.AvailabilityZones(ctx) when pool.Zones is empty, causing mismatches between short AZs (a,b) and full zones (us-east-1a) and producing a confusing error that prints an empty pool.Zones; change the logic to only perform the dedicated-host zone membership check when pool.Zones is explicitly configured (i.e. len(pool.Zones) > 0) so that slices.Contains(pool.Zones, foundHost.Zone) is used only for user-provided zones, and update the field.Invalid call (dhPath, host, ...) to include the actual configured zones (pool.Zones) in the message rather than any metadata fallback.

pkg/asset/machines/aws/machines.go (1)
238-239: ⚠️ Potential issue | 🟡 Minor

Stale YAML path in doc comment.

The comment references `dedicatedHost.host.id` but the actual schema field is `dedicatedHost[].id`. This will propagate to generated docs.

Suggested fix

-	// HostAffinityDedicatedHost requires specifying a particular host via dedicatedHost.host.id.
+	// HostAffinityDedicatedHost requires specifying at least one host via dedicatedHost[].id.
	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/machinepool.go` around lines 238 - 239, The doc comment for the HostAffinityDedicatedHost constant is using a stale YAML path; update the comment text that currently references "dedicatedHost.host.id" to the correct schema path "dedicatedHost[].id" so generated docs reflect the real field; locate the constant HostAffinityDedicatedHost (type HostAffinity, value "DedicatedHost") and edit its preceding comment accordingly.

pkg/asset/machines/aws/machines.go (1)
370-382: ⚠️ Potential issue | 🟠 Major

Silent fallback when no host matches the zone may mask configuration errors.

When `placement != nil` but no host in `placement.DedicatedHost` matches the requested zone, this function returns an empty string. This causes the provider to generate an instance without dedicated host placement, silently falling back to shared tenancy.

For a pool explicitly configured with `Affinity == DedicatedHost`, this could launch instances without the intended dedicated host isolation. While upstream validation should catch zone mismatches, consider returning an error (or at least a secondary boolean indicating whether a match was found) to surface configuration issues more explicitly.

🔧 Suggested approach

Consider changing the signature to return `(string, bool)` or `(string, error)` to indicate whether a valid host was found:

-func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) string {
+func DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) (string, error) {
	if placement != nil {
		for _, host := range placement.DedicatedHost {
			hostDetails, found := hosts[host.ID]
			if found && hostDetails.Zone == zone {
-				return hostDetails.ID
+				return hostDetails.ID, nil
			}
		}
+		// Placement configured but no host matches zone
+		return "", fmt.Errorf("no dedicated host found for zone %s", zone)
	}
-	return ""
+	return "", nil
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/machines/aws/machines.go` around lines 370 - 382, The DedicatedHost function currently returns an empty string on no-match which silently falls back to shared tenancy; change DedicatedHost(hosts map[string]aws.Host, placement *awstypes.HostPlacement, zone string) to return a result indicator (either (string, bool) or (string, error)) so callers can detect no-match. Update the function to return the matched host ID and true (or id, nil) when a host.Zone == zone is found, and to return ("", false) or ("", descriptiveError) when placement != nil but no host matches; then update all callers of DedicatedHost to handle the new boolean/error return and fail fast or surface a clear configuration error when a dedicated-affinity pool did not find a host.

pkg/types/aws/validation/machinepool.go (1)
104-108: ⚠️ Potential issue | 🟡 Minor

Misleading error message for forbidden `dedicatedHost` field.

When `Affinity` is `AnyAvailable` and `DedicatedHost` entries are present, the error message says:

"dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"

This message is more appropriate for the `DedicatedHost` affinity case (line 111). For `AnyAvailable`, use `field.Forbidden` with a clearer message:

🐛 Proposed fix

	case aws.HostAffinityAnyAvailable:
		if len(p.HostPlacement.DedicatedHost) > 0 {
-			allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
+			allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))
		}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/validation/machinepool.go` around lines 104 - 108, The current validation branch for aws.HostAffinityAnyAvailable incorrectly appends a field.Required error with a misleading message; update the branch handling p.HostPlacement.Affinity == aws.HostAffinityAnyAvailable to append field.Forbidden(fldPath.Child("dedicatedHost"), "<clear message>") instead, e.g., "dedicatedHost must not be set when affinity is AnyAvailable", so replace the field.Required call that references p.HostPlacement.DedicatedHost with field.Forbidden(...) and ensure the error targets fldPath.Child("dedicatedHost").
🧹 Nitpick comments (2)
pkg/types/aws/validation/machinepool_test.go (2)
382-391: Complex test defaulting logic could be simplified.

The logic to determine `poolName` based on error message content is fragile and harder to maintain. Consider explicitly setting `poolName` in each test case that needs a specific value, rather than inferring from the expected error string.

♻️ Simpler approach

Set `poolName: types.MachinePoolComputeRoleName` explicitly in test cases that require it, and remove the inference logic:

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			poolName := tc.poolName
-			// If poolName is not specified, default to worker pool unless the test
-			// is explicitly testing control plane or edge pool behavior
-			if poolName == "" && tc.expected != "" &&
-				!strings.Contains(tc.expected, "control plane") &&
-				!strings.Contains(tc.expected, "edge pool") {
-				poolName = types.MachinePoolComputeRoleName
-			} else if poolName == "" && tc.expected == "" {
+			if poolName == "" {
				poolName = types.MachinePoolComputeRoleName
			}

Then explicitly set `poolName: ""` for the control plane test case (already done on line 363).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/validation/machinepool_test.go` around lines 382 - 391, The test's fragile inference for poolName should be removed and tests should explicitly set the poolName in their test cases; replace the conditional block that mutates poolName based on tc.expected and strings.Contains with no-op, and update each test case struct to include poolName: types.MachinePoolComputeRoleName where the compute/worker default is required and poolName: "" for control-plane/edge tests; this makes the setup deterministic and you can delete the logic that inspects tc.expected and strings.Contains.
300-323: Misleading test names suggest format validation that doesn't exist.

These test cases are named "valid dedicated host - 8 character format" but use IDs like `h-1a2b3c4d` (10 chars total) which don't match the AWS format requirement documented in pkg/types/aws/machinepool.go:

- Pattern: ^h-[0-9a-f]{17}$
- Length: exactly 19 characters

The tests pass because `validateHostPlacement` only checks `len(host.ID) == 0` and doesn't validate the format. The kubebuilder annotations provide CRD-level validation, but the Go validation is lenient.

Consider either:

- Rename tests to clarify they test "non-empty ID" rather than "valid format"
- Add format validation in `validateHostPlacement` to match the documented constraints

♻️ Option 1: Rename tests to reflect actual validation

-	{
-		name: "valid dedicated host - 8 character format",
+	{
+		name: "valid dedicated host - non-empty ID",
		pool: &aws.MachinePool{
			HostPlacement: &aws.HostPlacement{
				Affinity: ptr.To(aws.HostAffinityDedicatedHost),
				DedicatedHost: []aws.DedicatedHost{
-					{ID: "h-1a2b3c4d"},
+					{ID: "h-09dcf61cb388b0149"},
				},
			},
		},
	},
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b7471aa8-4691-4293-aa40-f9992fca7013
⛔ Files ignored due to path filters (1)
pkg/types/aws/zz_generated.deepcopy.go is excluded by !**/zz_generated*
📒 Files selected for processing (20)
- data/data/install.openshift.io_installconfigs.yaml
- pkg/asset/cluster/aws/aws.go
- pkg/asset/installconfig/aws/dedicatedhosts.go
- pkg/asset/installconfig/aws/metadata.go
- pkg/asset/installconfig/aws/permissions.go
- pkg/asset/installconfig/aws/validation.go
- pkg/asset/installconfig/aws/validation_test.go
- pkg/asset/machines/aws/machines.go
- pkg/asset/machines/aws/machinesets.go
- pkg/asset/machines/worker.go
- pkg/asset/machines/worker_test.go
- pkg/destroy/aws/ec2helpers.go
- pkg/types/aws/machinepool.go
- pkg/types/aws/validation/featuregates.go
- pkg/types/aws/validation/featuregates_test.go
- pkg/types/aws/validation/machinepool.go
- pkg/types/aws/validation/machinepool_test.go
- pkg/types/aws/validation/platform.go
- pkg/types/machinepools.go
- pkg/types/validation/machinepools.go
✅ Files skipped from review due to trivial changes (1)
- pkg/asset/installconfig/aws/metadata.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/asset/installconfig/aws/dedicatedhosts.go
- pkg/types/aws/validation/featuregates_test.go
- pkg/destroy/aws/ec2helpers.go
- pkg/types/aws/validation/featuregates.go
	if includesDedicatedHosts(ic) {
		permissionGroups = append(permissionGroups, PermissionDedicatedHosts)
	}
Missing PermissionDynamicHostAllocation for dynamic host cleanup at destroy time, but the suggested fix needs refinement.
RequiredPermissionGroups is used for both install and delete credential validation. When dedicated hosts are configured with AnyAvailable affinity, the system dynamically allocates hosts that must be released at destroy time via ec2:ReleaseHosts. However, the current code only appends PermissionDedicatedHosts.
The suggested fix requires refinement: includesDedicatedHosts() returns true for any HostPlacement configuration but doesn't distinguish between HostAffinityAnyAvailable (dynamically allocated, requiring release) and HostAffinityDedicatedHost (user-provided, no release needed). Simply appending PermissionDynamicHostAllocation whenever HostPlacement != nil would add unnecessary permissions for the DedicatedHost affinity case.
The fix should conditionally add PermissionDynamicHostAllocation only when affinity is set to AnyAvailable, not for all dedicated host placements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/asset/installconfig/aws/permissions.go` around lines 581 - 583,
RequiredPermissionGroups currently appends PermissionDedicatedHosts for any
HostPlacement via includesDedicatedHosts(ic) but must also append
PermissionDynamicHostAllocation only when the HostPlacement's affinity is
HostAffinityAnyAvailable (dynamically allocated hosts that require
ec2:ReleaseHosts). Update the logic in RequiredPermissionGroups to inspect
ic.Platform.AWS.HostPlacement (or the function that reads HostPlacement used by
includesDedicatedHosts) and: keep appending PermissionDedicatedHosts when
HostPlacement is present, and additionally append
PermissionDynamicHostAllocation only when HostPlacement.Affinity ==
HostAffinityAnyAvailable; do not add PermissionDynamicHostAllocation for
HostAffinityDedicatedHost.
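The conditional described above can be sketched as follows. This is a minimal illustration with stand-in types; `HostPlacement`, the affinity constants, and `requiresDynamicHostAllocation` mirror the review discussion and are assumptions, not the installer's exact definitions:

```go
package main

import "fmt"

// HostAffinity mirrors the installer's affinity values as discussed in the
// review; names are assumed, not copied from the repository.
type HostAffinity string

const (
	HostAffinityAnyAvailable  HostAffinity = "AnyAvailable"
	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"
)

// HostPlacement is a stand-in for the machine pool's host placement config.
type HostPlacement struct {
	Affinity *HostAffinity
}

// requiresDynamicHostAllocation captures the suggested rule: ec2:ReleaseHosts
// is only needed when hosts are allocated dynamically (AnyAvailable), not when
// the user brings their own dedicated hosts (DedicatedHost affinity).
func requiresDynamicHostAllocation(p *HostPlacement) bool {
	return p != nil && p.Affinity != nil && *p.Affinity == HostAffinityAnyAvailable
}

func main() {
	anyAff := HostAffinityAnyAvailable
	dedAff := HostAffinityDedicatedHost
	fmt.Println(requiresDynamicHostAllocation(&HostPlacement{Affinity: &anyAff})) // true
	fmt.Println(requiresDynamicHostAllocation(&HostPlacement{Affinity: &dedAff})) // false
	fmt.Println(requiresDynamicHostAllocation(nil))                               // false
}
```

In the real permission check this predicate would gate appending PermissionDynamicHostAllocation alongside PermissionDedicatedHosts, so BYO dedicated host users are never asked for release permissions they do not need.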
	// +required
	ID string `json:"id,omitempty"`
Remove omitempty from the required ID field's JSON tag.
The ID field is annotated with +required but includes omitempty in the JSON tag, which is contradictory—a required field should not be omitted when empty. This is also inconsistent with other required fields in the same package (e.g., in platform.go), which use json:"id" without omitempty.
Suggested fix
// +required
- ID string `json:"id,omitempty"`
+ ID string `json:"id"`
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/types/aws/machinepool.go` around lines 250-251, the `ID` field in the machine pool struct is marked `+required` but its JSON tag uses `omitempty`; update the field so the tag reads `json:"id"` (keep the `+required` comment intact) to match other required fields, like the one in platform.go.
```go
}

// Check to make sure dedicated host is not owned by another cluster
clusterIDs := foundHost.Tags.GetClusterIDs(TagValueOwned)
```
```diff
- clusterIDs := foundHost.Tags.GetClusterIDs(TagValueOwned)
+ clusterIDs := foundHost.Tags.GetClusterIDs(TagValueShared)
```
@patrickdillon This is supposed to be owned, not shared. If the dedicated host we find is "owned" by another cluster, we want to throw an error since we cannot assign an instance from this cluster onto the host owned by another cluster.
Following up on the other comment I just posted, it does not seem to me like there would ever be a scenario where the host would have an owned tag. It seems like instances on the host would be owned.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
```go
}

if includesDedicatedHosts(ic) {
	permissionGroups = append(permissionGroups, PermissionDedicatedHosts)
```
coderabbit points out that we're missing the ec2:ReleaseHosts permission that is needed for destroy. Not sure if it is best to handle that by appending the PermissionDynamicHostAllocation permission here too, or collapsing that permission into the PermissionDedicatedHosts perm group
@patrickdillon I am not sure how we handle this one. These permissions are dynamic in that they are not configured in the install config. I could ALWAYS add it, but I am sure that is frowned upon. So what I did is make sure the installer loops as expected when the permission is missing. What are your thoughts on this? I can add this to the above check during install for when someone is doing BYO DH, but they may not want to provide the release permission.
/retest

1 similar comment

/retest
```go
}

// tagSharedDedicatedHosts tags users BYO dedicated hosts so they are not destroyed by the Installer.
func tagSharedDedicatedHosts(ctx context.Context, clusterID string, installConfig *installconfig.InstallConfig) error {
```
It seems like the tags should be set by adding the tags on the compute node manifests... The compute node manifests would already be including the owned tag.
actually after reviewing this more, I'm not sure we want a shared tag. happy to discuss more tomorrow.
@patrickdillon not sure I follow. compute nodes are related to the AWS EC2 instances, not the hosts that the instances run on. Dedicated Hosts are host in AWS that the end user can allocate and use to assign node/machine instances onto. We are tagging them similar to subnets, iam profiles, etc to make sure we do not release them during cluster destroy since a shared (BYO) host may be used by another OCP cluster.
Thanks, this helps with my understanding. Let me see if I understand correctly:

- users provision/BYO dedicated hosts
- OpenShift can then manage instances on those dedicated hosts, just like we would any other EC2 instances
- it seems like the important point is that we manage the lifecycle of those instances, which may already be happening if they are getting the appropriate tags

Based on your clarification, my feedback/questions/assumptions would be:

- since hosts are BYO, they would never have the `owned` tag. We do not manage the lifecycle of the host; accordingly, we should never release hosts
- applying `shared` tags to hosts would follow the conventions of VPCs, but perhaps serves no functional purpose?

Let me expand on that last point: to me it seems like at most all we should be doing is applying a `shared` label at install time and removing that label at destroy time. That said, if the label is not used for resource discovery or by any controllers, at best the label is used for monitoring and reporting. I'm totally fine with that, and it seems like a good idea, but it does not seem necessary for functionality. Hope that makes sense.

I think we're chatting in an hour, looking forward to it!
So no, my understanding is not correct! OpenShift actually does allocate dedicated hosts but only does that day2. In those cases, the dedicated host is rightfully considered owned
@vr4manta if you haven't captured this already, as a follow-up you probably want destroy validation like we have with VPCs, to make sure an owned dedicated host does not have any shared tags: installer/pkg/destroy/aws/aws.go, line 166 in 6d01753
/test shellcheck

1 similar comment

/test shellcheck

/lgtm
@patrickdillon: This PR has been marked to be verified later by

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Thanks @vr4manta for addressing the comments, and @patrickdillon for chatting today. this looks good to me. /lgtm |
/retest
@vr4manta: The following tests failed, say

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Merged ac6111f into openshift:main

SPLAT-2172

Changes

Dependencies

Notes

MAO and CAO changes are needed for it to fully work. For now, this PR is adding the ability to generate the needed outputs for bootstrapping.

Summary by CodeRabbit

- New Features
- Improvements
- Tests