MCO-2167: Migrate alert test cases from the private repository#5781
sergiordlr wants to merge 2 commits into openshift:main
Conversation
@sergiordlr: This pull request references MCO-2167 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough
Adds test infra for image layering and MCO alert validation: new constants and repo/tag helpers, an in-node OS image builder, node disruption/version helpers, Prometheus/Thanos monitoring utilities and alert filtering, a disruptive MCO alert suite, a helper script, and small pod/MCP utilities.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Force-pushed: 0e3c210 to fb1747e
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sergiordlr. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Actionable comments posted: 6
🧹 Nitpick comments (3)
test/extended-priv/mco_alerts.go (1)
294-303: Goroutine error handling may not fail the test properly. The goroutine uses `o.Expect(err).NotTo(o.HaveOccurred())`, but if this assertion fails, the main test may have already moved past the goroutine's execution context. Consider using a channel or different synchronization to propagate errors reliably. Additionally, the hardcoded 5-minute sleep (`sleep 300`) should align with the 6-minute timeout on line 325.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/mco_alerts.go` around lines 294 - 303, The anonymous goroutine that calls node.DebugNodeWithChroot currently asserts with o.Expect inside the goroutine and uses a hardcoded "sleep 300"; change it to propagate success/error back to the test via a channel instead of calling o.Expect inside the goroutine and make the sleep duration configurable to match the test timeout. Create an errCh (chan error) and have the goroutine send the returned err (or nil on success) after it runs node.DebugNodeWithChroot and restarts kubelet (and set fixed = true on success); in the main test select on errCh with a timeout that matches the test's 6-minute timeout (or define a shared constant like kubeletRestartDelay and use it both in the chroot command and the select) so failures in node.DebugNodeWithChroot (and the restart) reliably fail the test.
test/extended-priv/inside_node_containers.go (1)
346-361: Consider cleanup on failure to prevent disk space leaks. If `pushImage()` fails, the locally built image remains on the node. Over multiple failed test runs, this could accumulate and consume node disk space.
♻️ Suggested approach using defer
```diff
 func (b *OsImageBuilderInNode) CreateAndDigestOsImage() (string, error) {
 	if err := b.prepareEnvironment(); err != nil {
 		return "", err
 	}
 	if err := b.buildImage(); err != nil {
 		return "", err
 	}
+	// Ensure local image is cleaned up regardless of push/digest outcome
+	defer func() {
+		if rmErr := b.removeImage(); rmErr != nil {
+			logger.Warnf("Failed to remove local image: %v", rmErr)
+		}
+	}()
 	if err := b.pushImage(); err != nil {
 		return "", err
 	}
-	if err := b.removeImage(); err != nil {
-		return "", err
-	}
 	return b.digestImage()
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/inside_node_containers.go` around lines 346 - 361, The CreateAndDigestOsImage flow can leak the local image if pushImage() fails; after a successful buildImage() call register a deferred cleanup by calling removeImage() (e.g., defer b.removeImage()) so the locally built image is removed on any subsequent error, then call pushImage() and finally digestImage(); ensure the deferred removeImage() only gets registered after buildImage() succeeds and propagate errors from pushImage(), digestImage(), and removeImage() appropriately.
test/extended-priv/util/prometheus_monitoring.go (1)
189-197: Inconsistent command construction pattern. `GetAlerts` constructs the curl command as a shell string with `sh -c`, while other methods (`InstantQuery`, `RangeQuery`, `queryRules`) use argument arrays passed directly to `RemoteShPod`. This inconsistency reduces maintainability and makes reasoning about the code harder. Consider aligning with the array-based pattern used elsewhere:
♻️ Suggested refactor
```diff
 func (pmo *PrometheusMonitor) GetAlerts() (string, error) {
 	// We don't want to print the token
 	pmo.ocClient.NotShowInfo()
 	defer pmo.ocClient.SetShowInfo()
-	getCmd := "curl -k -s -H \"" + fmt.Sprintf("Authorization: Bearer %v", pmo.Token) + "\" " + pmo.url + monitorAlerts
-	return RemoteShPod(pmo.ocClient, monitorNamespace, "statefulsets/"+prometheusK8s, "sh", "-c", getCmd)
+	queryArgs := []string{"curl", "-k", "-s", "-H", fmt.Sprintf("Authorization: Bearer %v", pmo.Token), pmo.url + monitorAlerts}
+	return RemoteShPod(pmo.ocClient, monitorNamespace, "statefulsets/"+prometheusK8s, queryArgs...)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/util/prometheus_monitoring.go` around lines 189 - 197, GetAlerts currently builds a single shell string and invokes RemoteShPod via "sh -c", which is inconsistent with InstantQuery/RangeQuery/queryRules; change GetAlerts to construct an argument slice (e.g. "curl", "-k", "-s", "-H", fmt.Sprintf("Authorization: Bearer %v", pmo.Token), pmo.url+monitorAlerts) and call RemoteShPod with those args directly (preserving the pmo.ocClient.NotShowInfo()/defer SetShowInfo()), referencing PrometheusMonitor.GetAlerts, RemoteShPod, pmo.Token, monitorAlerts, monitorNamespace and prometheusK8s.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended-priv/inside_node_containers.go`:
- Line 172: The test currently logs the service account token via
logger.Debugf("SA TOKEN: %s", saToken), which exposes credentials; remove this
debug print or replace it with a non-sensitive message (e.g., log that the token
was obtained without printing its value) or explicitly mask/redact saToken
before logging; update the call site that references logger.Debugf and the
saToken local variable in inside_node_containers.go to avoid emitting the raw
token.
In `@test/extended-priv/mco_alerts.go`:
- Line 501: Fix the typo in the log messages that call
exutil.By(fmt.Sprintf(...)) where the string uses "Verfiy" — change it to
"Verify" in each occurrence (the exutil.By calls constructing messages with
fmt.Sprintf and params.stillPresentDuration and similar messages around the
alert checks); search for exutil.By(fmt.Sprintf("Verfiy")...) and replace
"Verfiy" with "Verify" in those messages so all log output is spelled correctly.
- Around line 434-436: The unsafe type assertions on alertJSON elements (e.g.,
alert := alertJSON[0]; annotationsMap :=
alert["annotations"].(map[string]interface{}) and similarly for labelsMap) can
panic if keys are missing or types differ; update these to use the comma-ok form
(v, ok := alert["annotations"]; annotationsMap, ok :=
v.(map[string]interface{})) and check ok (and presence) before proceeding, and
if the check fails call the test failure helper (e.g., t.Fatalf or
t.Fatalf-equivalent) with a clear message; apply the same defensive pattern to
the labelsMap extraction so the test fails gracefully instead of panicking.
- Around line 510-521: The function checkFixedAlert calls mcp.GetName() and runs
MCP-specific assertions even when mcp can be nil, causing a nil-pointer panic;
add a nil-check for the mcp parameter at the top of checkFixedAlert and skip or
short-circuit the MCP-specific block (the exutil.By/Eventually that calls
BeDegraded and the logger.Infof referencing mcp.GetName()) when mcp == nil,
while still running the alert existence check (the getAlertsByName/WithArguments
block) as before; ensure any log messages are adjusted so they don’t dereference
mcp and keep references to the identifiers checkFixedAlert, mcp.GetName,
BeDegraded, and getAlertsByName to locate the changes.
In `@test/extended-priv/node.go`:
- Around line 1395-1406: The CopyFromLocal call in BreakRebaseInNode ignores its
returned error, so if the file copy fails subsequent operations (involving
/tmp/rpm-ostree.broken and node.DebugNodeWithChroot) will run against a missing
file; update BreakRebaseInNode to capture and check the error from
node.CopyFromLocal (the brokenRpmOstree result from
generateTemplateAbsolutePath) and return it immediately (or wrap it with
context) before calling node.DebugNodeWithChroot, ensuring failures from
CopyFromLocal are surfaced.
In `@test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh`:
- Around line 7-9: The script uses an invalid exit code and unquoted positional
parameters: replace the invalid `exit -1` with a valid exit status like `exit 1`
and change the invocation `/tmp/rpm-ostree $@` to quote the positional
parameters as `/tmp/rpm-ostree "$@"` so argument boundaries are preserved;
update these occurrences in the script (look for the `exit -1` token and the
`/tmp/rpm-ostree $@` invocation) to apply the fix.
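The comma-ok pattern recommended above for the `alertJSON` map lookups can be sketched with stdlib JSON decoding alone. This is a minimal illustration, not code from the PR; the `extractAnnotations` helper name is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractAnnotations pulls the "annotations" object out of a decoded alert.
// It uses the comma-ok form twice: once for key presence, once for the type
// assertion, so a missing key or an unexpected type yields an error instead
// of a runtime panic.
func extractAnnotations(alert map[string]interface{}) (map[string]interface{}, error) {
	v, ok := alert["annotations"]
	if !ok {
		return nil, fmt.Errorf("alert has no 'annotations' key")
	}
	annotations, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("'annotations' is %T, not a JSON object", v)
	}
	return annotations, nil
}

func main() {
	var alert map[string]interface{}
	if err := json.Unmarshal([]byte(`{"annotations":{"summary":"node degraded"}}`), &alert); err != nil {
		panic(err)
	}

	ann, err := extractAnnotations(alert)
	fmt.Println(ann["summary"], err) // node degraded <nil>

	// Missing key: returns an error instead of panicking.
	_, err = extractAnnotations(map[string]interface{}{})
	fmt.Println(err != nil) // true
}
```

The same defensive shape applies to the `labelsMap` extraction; in the test the error branch would call the test failure helper rather than return.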
---
Nitpick comments:
In `@test/extended-priv/inside_node_containers.go`:
- Around line 346-361: The CreateAndDigestOsImage flow can leak the local image
if pushImage() fails; after a successful buildImage() call register a deferred
cleanup by calling removeImage() (e.g., defer b.removeImage()) so the locally
built image is removed on any subsequent error, then call pushImage() and
finally digestImage(); ensure the deferred removeImage() only gets registered
after buildImage() succeeds and propagate errors from pushImage(),
digestImage(), and removeImage() appropriately.
In `@test/extended-priv/mco_alerts.go`:
- Around line 294-303: The anonymous goroutine that calls
node.DebugNodeWithChroot currently asserts with o.Expect inside the goroutine
and uses a hardcoded "sleep 300"; change it to propagate success/error back to
the test via a channel instead of calling o.Expect inside the goroutine and make
the sleep duration configurable to match the test timeout. Create an errCh (chan
error) and have the goroutine send the returned err (or nil on success) after it
runs node.DebugNodeWithChroot and restarts kubelet (and set fixed = true on
success); in the main test select on errCh with a timeout that matches the
test's 6-minute timeout (or define a shared constant like kubeletRestartDelay
and use it both in the chroot command and the select) so failures in
node.DebugNodeWithChroot (and the restart) reliably fail the test.
In `@test/extended-priv/util/prometheus_monitoring.go`:
- Around line 189-197: GetAlerts currently builds a single shell string and
invokes RemoteShPod via "sh -c", which is inconsistent with
InstantQuery/RangeQuery/queryRules; change GetAlerts to construct an argument
slice (e.g. "curl", "-k", "-s", "-H", fmt.Sprintf("Authorization: Bearer %v",
pmo.Token), pmo.url+monitorAlerts) and call RemoteShPod with those args directly
(preserving the pmo.ocClient.NotShowInfo()/defer SetShowInfo()), referencing
PrometheusMonitor.GetAlerts, RemoteShPod, pmo.Token, monitorAlerts,
monitorNamespace and prometheusK8s.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: de41ae65-949b-4bf6-a994-0efdf61054f6
📒 Files selected for processing (10)
- test/extended-priv/const.go
- test/extended-priv/containers.go
- test/extended-priv/inside_node_containers.go
- test/extended-priv/machineconfigpool.go
- test/extended-priv/mco_alerts.go
- test/extended-priv/node.go
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
- test/extended-priv/util.go
- test/extended-priv/util/pods.go
- test/extended-priv/util/prometheus_monitoring.go
test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh (outdated, resolved)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended-priv/mco_alerts.go`:
- Around line 279-280: Remove the MCP recovery check invoked by
checkFixedAlert(oc, coMcp, expectedAlertName) in these two specs (the calls at
the shown location and the similar one around lines 329-330); instead, either
delete the call or replace it with a check that only asserts the alert is no
longer firing (e.g., a helper that uses oc and expectedAlertName to verify the
alert is cleared) so you do not reintroduce unrelated MCP state assertions via
coMcp.
- Around line 362-367: The test is reading the RHCOS version from mcp which may
point to a non-CoreOS worker pool; change the node source to the
compact/coreOS-compatible MCP by using coMcp instead of mcp when calling
GetSortedNodesOrFail(), then call GetRHCOSVersion() on that node and keep the
existing error check and logger; update references of mcp.GetSortedNodesOrFail()
to coMcp.GetSortedNodesOrFail() (keeping the GetRHCOSVersion, o.Expect and
logger.Infof usage).
- Around line 478-502: Replace the fixed time.Sleep-based check with a polling
assertion that waits for the alert to transition to "firing": stop using
time.Sleep(params.pendingDuration) and the separate
o.Eventually(...).Should(o.HaveLen(1)) check; instead, use o.Eventually with a
function that calls getAlertsByName(oc, params.expectedAlertName), returns the
fetched alerts (or an error) and assert both that len(alerts) == 1 and that
alerts[0]["state"] == "firing" (or use o.HaveKeyWithValue on the returned
alert), so the test retries until the Prometheus evaluation actually sets state
to "firing" rather than relying on a single sleep. Ensure the code references
getAlertsByName, params.pendingDuration (remove its sleep usage),
alertJSON/alertErr, and the o.Eventually assertion to perform the combined
length+state check.
In `@test/extended-priv/node.go`:
- Around line 1402-1406: The shell pipeline passed to node.DebugNodeWithChroot
should fail fast so earlier failures (chmod/cp/nsenter) don't get masked by a
later successful command; change the "sh -c" invocation to use "sh -e -c" and
replace the semicolon-separated commands with && chaining so any command error
aborts the chain (update the command string used in node.DebugNodeWithChroot to
add -e and use && between "chmod +x /tmp/rpm-ostree.broken", "cp
/usr/bin/rpm-ostree /tmp/rpm-ostree", "nsenter --mount=/proc/1/ns/mnt mount
--bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree", and "restorecon -v
/usr/bin/rpm-ostree").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e132a5ae-ff61-4356-89e0-a704ae5253c0
📒 Files selected for processing (3)
- test/extended-priv/mco_alerts.go
- test/extended-priv/node.go
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
✅ Files skipped from review due to trivial changes (1)
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
```go
exutil.By("Check that the alert is not triggered anymore")
checkFixedAlert(oc, coMcp, expectedAlertName)
```
Don't reintroduce unrelated MCP recovery checks in these two specs.
Both tests skip the MCP degradation assertion when the alert fires, but these checkFixedAlert calls add it back on the recovery path. That makes the specs fail on unrelated pool state instead of only on the alert under test.
Proposed fix
```diff
-checkFixedAlert(oc, coMcp, expectedAlertName)
+checkFixedAlert(oc, nil, expectedAlertName)
@@
-checkFixedAlert(oc, mcp, expectedAlertName)
+checkFixedAlert(oc, nil, expectedAlertName)
```
Also applies to: 329-330
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended-priv/mco_alerts.go` around lines 279 - 280, Remove the MCP
recovery check invoked by checkFixedAlert(oc, coMcp, expectedAlertName) in these
two specs (the calls at the shown location and the similar one around lines
329-330); instead, either delete the call or replace it with a check that only
asserts the alert is no longer firing (e.g., a helper that uses oc and
expectedAlertName to verify the alert is cleared) so you do not reintroduce
unrelated MCP state assertions via coMcp.
```go
g.It("[PolarionID:73841][OTP] KubeletHealthState alert [Disruptive]", func() {
	var (
		node  = mcp.GetSortedNodesOrFail()[0]
		fixed = false
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
sed -n '286,326p' test/extended-priv/mco_alerts.go
```
Repository: openshift/machine-config-operator
Length of output: 2001
Fix data race on fixed variable using proper goroutine synchronization.
The variable fixed is written by a goroutine (line 296) and read by the main goroutine in the Eventually polling loop (line 325) without synchronization. This is a data race that will be caught by Go's race detector.
Use a channel to synchronize instead: initialize kubeletRestarted := make(chan struct{}), close it when the kubelet restarts, and check for closure in the Eventually function using a select statement.
Proposed fix
```diff
-fixed = false
+kubeletRestarted = make(chan struct{})
@@
-fixed = true
+close(kubeletRestarted)
@@
-o.Eventually(func() bool { return fixed }, "6m", "20s").Should(o.BeTrue(), "Kubelet service was not restarted")
+o.Eventually(func() bool {
+	select {
+	case <-kubeletRestarted:
+		return true
+	default:
+		return false
+	}
+}, "6m", "20s").Should(o.BeTrue(), "Kubelet service was not restarted")
```

```go
exutil.By("Get the RHCOS version from a worker node for later use")
node := mcp.GetSortedNodesOrFail()[0]
rhcosVersion, err := node.GetRHCOSVersion()
o.Expect(err).NotTo(o.HaveOccurred(),
	"Error getting the RHCOS version from node %s", node.GetName())
logger.Infof("RHCOS version from node %s: %s\n", node.GetName(), rhcosVersion)
```
Read the RHCOS version from coMcp, not mcp.
GetCompactCompatiblePool() can still resolve to a worker pool without CoreOS nodes. In that topology this spec fails before it ever exercises the alert, which is exactly the case GetCoreOsCompatiblePool() was added to handle.
Proposed fix
```diff
-exutil.By("Get the RHCOS version from a worker node for later use")
-node := mcp.GetSortedNodesOrFail()[0]
+exutil.By("Get the RHCOS version from a CoreOS node for later use")
+node := coMcp.GetSortedNodesOrFail()[0]
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-exutil.By("Get the RHCOS version from a worker node for later use")
-node := mcp.GetSortedNodesOrFail()[0]
+exutil.By("Get the RHCOS version from a CoreOS node for later use")
+node := coMcp.GetSortedNodesOrFail()[0]
 rhcosVersion, err := node.GetRHCOSVersion()
 o.Expect(err).NotTo(o.HaveOccurred(),
 	"Error getting the RHCOS version from node %s", node.GetName())
 logger.Infof("RHCOS version from node %s: %s\n", node.GetName(), rhcosVersion)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended-priv/mco_alerts.go` around lines 362 - 367, The test is reading
the RHCOS version from mcp which may point to a non-CoreOS worker pool; change
the node source to the compact/coreOS-compatible MCP by using coMcp instead of
mcp when calling GetSortedNodesOrFail(), then call GetRHCOSVersion() on that
node and keep the existing error check and logger; update references of
mcp.GetSortedNodesOrFail() to coMcp.GetSortedNodesOrFail() (keeping the
GetRHCOSVersion, o.Expect and logger.Infof usage).
```go
if params.pendingDuration != 0 {
	exutil.By("Verify that the alert is pending")
	o.Expect(alert).To(o.HaveKeyWithValue("state", "pending"),
		"Expected the alert's state to be 'pending', but it is not.")
	logger.Infof("OK!\n")
}

exutil.By("Verify that the alert is in firing state")
if params.pendingDuration != 0 {
	logger.Infof("Wait %s minutes until the alert is fired", params.pendingDuration)
	time.Sleep(params.pendingDuration)
}

logger.Infof("Checking alert's state")
o.Eventually(func() ([]map[string]interface{}, error) {
	alertJSON, alertErr = getAlertsByName(oc, params.expectedAlertName)
	return alertJSON, alertErr
}, "15m", "20s").Should(o.HaveLen(1),
	"Expected 1 %s alert and only 1 to be triggered!", params.expectedAlertName)

logger.Infof("Found %s alerts: %v", params.expectedAlertName, alertJSON)

alert = alertJSON[0]
o.Expect(alert).To(o.HaveKeyWithValue("state", "firing"),
	"Expected the alert to report 'firing' state")
```
Wait for the firing transition, not just alert presence after a fixed sleep.
time.Sleep(params.pendingDuration) assumes the next Prometheus evaluation has already happened. If the alert is still pending on the next scrape, HaveLen(1) passes and the following state assertion flakes.
Proposed fix
```diff
-if params.pendingDuration != 0 {
-	logger.Infof("Wait %s minutes until the alert is fired", params.pendingDuration)
-	time.Sleep(params.pendingDuration)
-}
-
 logger.Infof("Checking alert's state")
-o.Eventually(func() ([]map[string]interface{}, error) {
+o.Eventually(func() (string, error) {
 	alertJSON, alertErr = getAlertsByName(oc, params.expectedAlertName)
-	return alertJSON, alertErr
-}, "15m", "20s").Should(o.HaveLen(1),
-	"Expected 1 %s alert and only 1 to be triggered!", params.expectedAlertName)
-
-logger.Infof("Found %s alerts: %v", params.expectedAlertName, alertJSON)
-
-alert = alertJSON[0]
-o.Expect(alert).To(o.HaveKeyWithValue("state", "firing"),
+	if alertErr != nil {
+		return "", alertErr
+	}
+	if len(alertJSON) != 1 {
+		return "", fmt.Errorf("expected 1 %s alert, got %d", params.expectedAlertName, len(alertJSON))
+	}
+	state, ok := alertJSON[0]["state"].(string)
+	if !ok {
+		return "", fmt.Errorf("alert state is missing or not a string")
+	}
+	return state, nil
+}, "15m", "20s").Should(o.Equal("firing"),
 	"Expected the alert to report 'firing' state")
-
-logger.Infof("OK!\n")
```
Verify each finding against the current code and only fix it if needed.
In `@test/extended-priv/mco_alerts.go` around lines 478 - 502, Replace the fixed
time.Sleep-based check with a polling assertion that waits for the alert to
transition to "firing": stop using time.Sleep(params.pendingDuration) and the
separate o.Eventually(...).Should(o.HaveLen(1)) check; instead, use o.Eventually
with a function that calls getAlertsByName(oc, params.expectedAlertName),
returns the fetched alerts (or an error) and assert both that len(alerts) == 1
and that alerts[0]["state"] == "firing" (or use o.HaveKeyWithValue on the
returned alert), so the test retries until the Prometheus evaluation actually
sets state to "firing" rather than relying on a single sleep. Ensure the code
references getAlertsByName, params.pendingDuration (remove its sleep usage),
alertJSON/alertErr, and the o.Eventually assertion to perform the combined
length+state check.
```go
_, err := node.DebugNodeWithChroot("sh", "-c",
	"chmod +x /tmp/rpm-ostree.broken; "+
		"cp /usr/bin/rpm-ostree /tmp/rpm-ostree; "+
		"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree; "+
		"restorecon -v /usr/bin/rpm-ostree")
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
sed -n '1396,1406p' test/extended-priv/node.go
sed -n '1,11p' test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
```
Repository: openshift/machine-config-operator
Length of output: 1027
Use -e flag and && chaining to fail fast on the rpm-ostree setup chain.
The current sh -c with semicolons only reports the status of restorecon, so a failed chmod or cp can still report success. This is risky because the injected wrapper delegates every non-rebase call to /tmp/rpm-ostree; if that backup copy was never created, the node ends up more broken than the test intends.
Proposed fix
```diff
-_, err := node.DebugNodeWithChroot("sh", "-c",
-	"chmod +x /tmp/rpm-ostree.broken; "+
-		"cp /usr/bin/rpm-ostree /tmp/rpm-ostree; "+
-		"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree; "+
-		"restorecon -v /usr/bin/rpm-ostree")
+_, err := node.DebugNodeWithChroot("sh", "-ec",
+	"chmod +x /tmp/rpm-ostree.broken && "+
+		"cp /usr/bin/rpm-ostree /tmp/rpm-ostree && "+
+		"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree && "+
+		"restorecon -v /usr/bin/rpm-ostree")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-_, err := node.DebugNodeWithChroot("sh", "-c",
-	"chmod +x /tmp/rpm-ostree.broken; "+
-		"cp /usr/bin/rpm-ostree /tmp/rpm-ostree; "+
-		"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree; "+
-		"restorecon -v /usr/bin/rpm-ostree")
+_, err := node.DebugNodeWithChroot("sh", "-ec",
+	"chmod +x /tmp/rpm-ostree.broken && "+
+		"cp /usr/bin/rpm-ostree /tmp/rpm-ostree && "+
+		"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree && "+
+		"restorecon -v /usr/bin/rpm-ostree")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended-priv/node.go` around lines 1402 - 1406, The shell pipeline
passed to node.DebugNodeWithChroot should fail fast so earlier failures
(chmod/cp/nsenter) don't get masked by a later successful command; change the
"sh -c" invocation to use "sh -e -c" and replace the semicolon-separated
commands with && chaining so any command error aborts the chain (update the
command string used in node.DebugNodeWithChroot to add -e and use && between
"chmod +x /tmp/rpm-ostree.broken", "cp /usr/bin/rpm-ostree /tmp/rpm-ostree",
"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken
/usr/bin/rpm-ostree", and "restorecon -v /usr/bin/rpm-ostree").
manually launched jobs:
Force-pushed: 6627065 to 10b0ea2
@sergiordlr: This pull request references MCO-2167 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 1
♻️ Duplicate comments (5)
test/extended-priv/mco_alerts.go (4)
279-280: ⚠️ Potential issue | 🟠 Major: Keep these specs alert-scoped on recovery too.
Both tests intentionally skip MCP assertions when the alert fires. Passing `coMcp`/`mcp` here reintroduces unrelated pool-state failures on the recovery path.
Suggested fix
```diff
-checkFixedAlert(oc, coMcp, expectedAlertName)
+checkFixedAlert(oc, nil, expectedAlertName)
@@
-checkFixedAlert(oc, mcp, expectedAlertName)
+checkFixedAlert(oc, nil, expectedAlertName)
```
Also applies to: 329-330
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/mco_alerts.go` around lines 279 - 280, The call to checkFixedAlert is passing coMcp/mcp and thus reintroduces unrelated MCP assertions on recovery; change the calls (e.g. checkFixedAlert(oc, coMcp, expectedAlertName)) to the alert-scoped form that omits the MCP argument (e.g. checkFixedAlert(oc, expectedAlertName)) so recovery checks stay alert-scoped; make the same change at the other occurrence around lines 329-330.
286-302: ⚠️ Potential issue | 🔴 Critical: Synchronize the kubelet restart flag.
`fixed` is written in a goroutine and polled from the main goroutine without synchronization, so this spec has a real data race and can flake under `-race`.
Suggested fix
```diff
-fixed = false
+kubeletRestarted = make(chan struct{})
@@
-fixed = true
+close(kubeletRestarted)
@@
-o.Eventually(func() bool { return fixed }, "6m", "20s").Should(o.BeTrue(), "Kubelet service was not restarted")
+o.Eventually(func() bool {
+	select {
+	case <-kubeletRestarted:
+		return true
+	default:
+		return false
+	}
+}, "6m", "20s").Should(o.BeTrue(), "Kubelet service was not restarted")
```
Also applies to: 324-325
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/mco_alerts.go` around lines 286 - 302, The boolean variable "fixed" is written inside the goroutine that calls node.DebugNodeWithChroot and read from the main goroutine, causing a data race; fix by synchronizing: replace the shared "fixed" flag with a concurrency-safe mechanism such as an atomic boolean (e.g., atomic.StoreUint32/atomic.LoadUint32) or a channel/WaitGroup to signal completion from the goroutine, update the goroutine to signal (instead of setting "fixed" directly) and change the main goroutine polling to use atomic.Load or receive from the channel; update the occurrences around the goroutine and the reads of "fixed" (including the similar occurrences at lines referenced as 324-325) to use the chosen synchronization primitive.
485-502: ⚠️ Potential issue | 🟠 Major: Wait for the `firing` transition, not just alert presence.
`Eventually(...).Should(o.HaveLen(1))` exits as soon as the alert exists. If the rule is still `pending` on that scrape, the next `state == "firing"` assertion is racy.
Suggested fix
```diff
 exutil.By("Verify that the alert is in firing state")
-if params.pendingDuration != 0 {
-	logger.Infof("Wait %s minutes until the alert is fired", params.pendingDuration)
-	time.Sleep(params.pendingDuration)
-}
 logger.Infof("Checking alert's state")
-o.Eventually(func() ([]map[string]interface{}, error) {
+o.Eventually(func() (string, error) {
 	alertJSON, alertErr = getAlertsByName(oc, params.expectedAlertName)
-	return alertJSON, alertErr
-}, "15m", "20s").Should(o.HaveLen(1),
-	"Expected 1 %s alert and only 1 to be triggered!", params.expectedAlertName)
-
-logger.Infof("Found %s alerts: %v", params.expectedAlertName, alertJSON)
-
-alert = alertJSON[0]
-o.Expect(alert).To(o.HaveKeyWithValue("state", "firing"),
+	if alertErr != nil {
+		return "", alertErr
+	}
+	if len(alertJSON) != 1 {
+		return "", fmt.Errorf("expected 1 %s alert, got %d", params.expectedAlertName, len(alertJSON))
+	}
+	state, ok := alertJSON[0]["state"].(string)
+	if !ok {
+		return "", fmt.Errorf("alert state is missing or not a string")
+	}
+	return state, nil
+}, "15m", "20s").Should(o.Equal("firing"),
 	"Expected the alert to report 'firing' state")
 logger.Infof("OK!\n")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/mco_alerts.go` around lines 485 - 502, The test currently waits only for presence via o.Eventually(...).Should(o.HaveLen(1)) which races with a pending alert; change the check to wait for the alert's state to become "firing" by polling getAlertsByName(oc, params.expectedAlertName) and returning the alert's state (or an error) from the Eventually closure, then assert Should(o.Equal("firing")) (or Should(o.ContainSubstring("firing")) if you return a string) instead of relying on HaveLen; reference the existing getAlertsByName call and the alert/state assertion flow (alertJSON, alert, params.expectedAlertName) so the test only proceeds when state == "firing".
362-364: ⚠️ Potential issue | 🟠 Major

Use the CoreOS-compatible pool for `GetRHCOSVersion()`.

`GetCompactCompatiblePool()` can still resolve to a worker pool with non-CoreOS nodes. In that topology this fails before the alert behavior is exercised.

Suggested fix

```diff
-exutil.By("Get the RHCOS version from a worker node for later use")
-node := mcp.GetSortedNodesOrFail()[0]
+exutil.By("Get the RHCOS version from a CoreOS node for later use")
+node := coMcp.GetSortedNodesOrFail()[0]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/mco_alerts.go` around lines 362 - 364, The code currently picks a node via mcp.GetSortedNodesOrFail()[0] which can return non-CoreOS workers; replace that with the CoreOS-compatible pool accessor so GetRHCOSVersion() runs on a CoreOS node. Specifically, call mcp.GetCoreOSCompatiblePool().GetSortedNodesOrFail()[0] (or the project’s equivalent CoreOS-compatible pool method) and then invoke node.GetRHCOSVersion() on that node.test/extended-priv/node.go (1)
1402-1406: ⚠️ Potential issue | 🟠 Major

Fail fast when wiring the broken `rpm-ostree` wrapper.

With `sh -c` and `;`, a failed `chmod`, `cp`, or `mount` can be masked by a later successful `restorecon`. That can leave `/usr/bin/rpm-ostree` bound to a wrapper that delegates to a missing `/tmp/rpm-ostree`.

Suggested fix

```diff
-_, err := node.DebugNodeWithChroot("sh", "-c",
-	"chmod +x /tmp/rpm-ostree.broken; "+
-	"cp /usr/bin/rpm-ostree /tmp/rpm-ostree; "+
-	"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree; "+
-	"restorecon -v /usr/bin/rpm-ostree")
+_, err := node.DebugNodeWithChroot("sh", "-ec",
+	"chmod +x /tmp/rpm-ostree.broken && "+
+	"cp /usr/bin/rpm-ostree /tmp/rpm-ostree && "+
+	"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree && "+
+	"restorecon -v /usr/bin/rpm-ostree")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/node.go` around lines 1402 - 1406, The command string passed to DebugNodeWithChroot currently chains operations with ';' so a later success (restorecon) can hide earlier failures; update the invocation of DebugNodeWithChroot to make the shell fail fast—either prefix the command with "set -euo pipefail;" or replace ';' with '&&' between the steps (the sequence in DebugNodeWithChroot: "chmod +x /tmp/rpm-ostree.broken", "cp /usr/bin/rpm-ostree /tmp/rpm-ostree", "nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree", "restorecon -v /usr/bin/rpm-ostree") so any failed step aborts immediately and prevents leaving /usr/bin/rpm-ostree bound to a non-existent target.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended-priv/mco_alerts.go`:
- Around line 294-303: The goroutine that stops kubelet via
node.DebugNodeWithChroot("sh", "-c", "systemctl stop kubelet.service; sleep 300;
systemctl start kubelet.service") can leave kubelet stopped if the test aborts
early; register a g.DeferCleanup that ensures "systemctl start kubelet.service"
is invoked on the node to restore state. Concretely, keep the stop logic in the
goroutine (or call node.DebugNodeWithChroot to stop), but immediately call
g.DeferCleanup with a closure that calls node.DebugNodeWithChroot("sh", "-c",
"systemctl start kubelet.service") (handle/log errors but don’t panic), so
kubelet is restarted even if checkFiredAlert or later assertions abort the spec;
reference node.DebugNodeWithChroot and g.DeferCleanup when making the change.
---
Duplicate comments:
In `@test/extended-priv/mco_alerts.go`:
- Around line 279-280: The call to checkFixedAlert is passing coMcp/mcp and thus
reintroduces unrelated MCP assertions on recovery; change the calls (e.g.
checkFixedAlert(oc, coMcp, expectedAlertName)) to the alert-scoped form that
omits the MCP argument (e.g. checkFixedAlert(oc, expectedAlertName)) so recovery
checks stay alert-scoped; make the same change at the other occurrence around
lines 329-330.
- Around line 286-302: The boolean variable "fixed" is written inside the
goroutine that calls node.DebugNodeWithChroot and read from the main goroutine,
causing a data race; fix by synchronizing: replace the shared "fixed" flag with
a concurrency-safe mechanism such as an atomic boolean (e.g.,
atomic.StoreUint32/atomic.LoadUint32) or a channel/WaitGroup to signal
completion from the goroutine, update the goroutine to signal (instead of
setting "fixed" directly) and change the main goroutine polling to use
atomic.Load or receive from the channel; update the occurrences around the
goroutine and the reads of "fixed" (including the similar occurrences at lines
referenced as 324-325) to use the chosen synchronization primitive.
- Around line 485-502: The test currently waits only for presence via
o.Eventually(...).Should(o.HaveLen(1)) which races with a pending alert; change
the check to wait for the alert's state to become "firing" by polling
getAlertsByName(oc, params.expectedAlertName) and returning the alert's state
(or an error) from the Eventually closure, then assert Should(o.Equal("firing"))
(or Should(o.ContainSubstring("firing")) if you return a string) instead of
relying on HaveLen; reference the existing getAlertsByName call and the
alert/state assertion flow (alertJSON, alert, params.expectedAlertName) so the
test only proceeds when state == "firing".
- Around line 362-364: The code currently picks a node via
mcp.GetSortedNodesOrFail()[0] which can return non-CoreOS workers; replace that
with the CoreOS-compatible pool accessor so GetRHCOSVersion() runs on a CoreOS
node. Specifically, call mcp.GetCoreOSCompatiblePool().GetSortedNodesOrFail()[0]
(or the project’s equivalent CoreOS-compatible pool method) and then invoke
node.GetRHCOSVersion() on that node.
In `@test/extended-priv/node.go`:
- Around line 1402-1406: The command string passed to DebugNodeWithChroot
currently chains operations with ';' so a later success (restorecon) can hide
earlier failures; update the invocation of DebugNodeWithChroot to make the shell
fail fast—either prefix the command with "set -euo pipefail;" or replace ';'
with '&&' between the steps (the sequence in DebugNodeWithChroot: "chmod +x
/tmp/rpm-ostree.broken", "cp /usr/bin/rpm-ostree /tmp/rpm-ostree", "nsenter
--mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree",
"restorecon -v /usr/bin/rpm-ostree") so any failed step aborts immediately and
prevents leaving /usr/bin/rpm-ostree bound to a non-existent target.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 629f9b19-1856-4600-bc20-96ee86a64e1b
📒 Files selected for processing (4)
- test/extended-priv/mco_alerts.go
- test/extended-priv/mco_machineconfignode.go
- test/extended-priv/node.go
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
💤 Files with no reviewable changes (1)
- test/extended-priv/mco_machineconfignode.go
✅ Files skipped from review due to trivial changes (1)
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
```go
exutil.By("Break kubelet")
// We stop the kubelet service to break the node and after 5 minutes we start it again to fix the node
go func() {
	defer g.GinkgoRecover()
	_, err := node.DebugNodeWithChroot("sh", "-c", "systemctl stop kubelet.service; sleep 300; systemctl start kubelet.service")
	o.Expect(err).NotTo(o.HaveOccurred(),
		"Error stopping and restarting kubelet in %s", node)
	logger.Infof("Kubelet service has been restarted again")
	fixed = true
}()
```
🧩 Analysis chain
🏁 Script executed:
# First, let's check the file exists and read the specified section
head -n 310 test/extended-priv/mco_alerts.go | tail -n 30

Repository: openshift/machine-config-operator
Length of output: 1880
🏁 Script executed:
# Let's also see more context around this section
sed -n '280,320p' test/extended-priv/mco_alerts.go

Repository: openshift/machine-config-operator
Length of output: 1880
🏁 Script executed:
# Search for DeferCleanup usage in test files to understand the pattern
rg "DeferCleanup" test/ -A 2 -B 1 --max-count 10

Repository: openshift/machine-config-operator
Length of output: 1880
🏁 Script executed:
# Check if checkFiredAlert aborts/fails the test
rg "checkFiredAlert" test/extended-priv/mco_alerts.go -B 2 -A 5

Repository: openshift/machine-config-operator
Length of output: 1880
🏁 Script executed:
# Look for other kubelet or similar service disruption patterns in tests
rg "systemctl (stop|start)" test/ -B 3 -A 3

Repository: openshift/machine-config-operator
Length of output: 1880
🌐 Web query:
Ginkgo DeferCleanup pattern Go testing
💡 Result:
The DeferCleanup pattern in Ginkgo (v2+) is a cleanup mechanism for Go tests that registers dynamic cleanup functions (AfterEach/AfterAll/AfterSuite equivalents) called right after the spec's It subject completes but before static AfterEach/JustAfterEach nodes. It runs in LIFO order among DeferCleanups and after all After* nodes, simplifying pairing setup with cleanup without distant AfterEach blocks.

Key benefits:
- Keeps cleanup close to setup code (e.g., in BeforeEach or It).
- Avoids closure variables for state passing.
- Supports context.Context (SpecContext) for timeouts/interrupts.
- Function returns 0 or 1 value (error); non-nil error fails the spec.
- Enables full testing.TB compatibility (e.g., t.Cleanup via GinkgoT().Cleanup).

Example from official docs:

```go
Describe("Reporting book weight", func() {
	BeforeEach(func() {
		originalWeightUnits := os.Getenv("WEIGHT_UNITS")
		os.Setenv("WEIGHT_UNITS", "oz")
		DeferCleanup(func() {
			err := os.Setenv("WEIGHT_UNITS", originalWeightUnits)
			Expect(err).NotTo(HaveOccurred())
		})
	})
	It("...", func() { /* ... */ })
})
```

Here, DeferCleanup restores the env var immediately after setup, closer to where it's changed.

Another example (file cleanup):

```go
It("Should cleanup the created file", func() {
	file, err := os.Create("test.txt")
	Expect(err).NotTo(HaveOccurred())
	DeferCleanup(os.Remove, "test.txt")
	// Use file...
	_ = file
})
```

Execution order: BeforeEach(s) -> JustBeforeEach(s) -> It -> DeferCleanup(s) (LIFO) -> AfterEach(s) -> JustAfterEach(s). Mixing with After* can lead to surprises (DeferCleanup always last). Use DeferCleanup over AfterEach for most cases, especially integration tests needing resource teardown (files, servers, DBs). Docs recommend it as cleaner/simpler.
🌐 Web query:
Ginkgo test cleanup early abort assertions
💡 Result:
In the Ginkgo BDD testing framework for Go, when an assertion fails (via Fail or a failing Gomega Expect), Ginkgo immediately aborts the current node (It, BeforeEach, etc.) by panicking. It rescues the panic, marks the spec as failed, and proceeds to run cleanup: all registered AfterEach, JustAfterEach, and DeferCleanup functions. This ensures resources are cleaned up even on early abort.

Key mechanisms:
- Fail/Expect panics to fast-fail the node (cannot be disabled).
- Cleanup runs reliably post-abort via panic recovery.
- Use DeferCleanup(setupFunc) right after setup for paired cleanup (Ginkgo v2+).
- For goroutines with assertions: defer GinkgoRecover to allow proper failure propagation without suite panic.
- Timeouts (SpecTimeout/NodeTimeout) also interrupt and trigger cleanup.

Official docs confirm: "When a failure occurs in a BeforeEach, JustBeforeEach, or It closure Ginkgo halts execution of the current spec and cleans up by invoking any registered AfterEach or JustAfterEach closures (and any registered DeferCleanup closures if applicable)."
Citations:
- 1: https://onsi.github.io/ginkgo/
- 2: http://onsi.github.io/ginkgo
- 3: RFC: Ginkgo Timeout and Cleanup Proposal onsi/ginkgo#1041
- 4: GinkgoRecover not aborting the current It test node when Expect in goroutine fails onsi/ginkgo#1114
- 5: Ginkgo stops processing in middle of JustAfterEach after failed assertion in JustBeforeEach onsi/ginkgo#959
- 6: https://onsi.github.io/ginkgo/MIGRATING_TO_V2
Add failure-path cleanup for the kubelet disruption.
If checkFiredAlert or any later assertion aborts the spec early, the suite moves on while kubelet is still stopped for up to five minutes. In this serial file, that cascades into unrelated failures. Use g.DeferCleanup to ensure kubelet restarts even when an assertion fails.
Suggested fix

```diff
 exutil.By("Break kubelet")
+g.DeferCleanup(func() {
+	_, _ = node.DebugNodeWithChroot("systemctl", "start", "kubelet.service")
+	o.Eventually(node, "6m", "20s").Should(HaveConditionField("Ready", "status", TrueString),
+		"Node %s didn't become ready during cleanup", node)
+})
 // We stop the kubelet service to break the node and after 5 minutes we start it again to fix the node
 go func() {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended-priv/mco_alerts.go` around lines 294 - 303, The goroutine that
stops kubelet via node.DebugNodeWithChroot("sh", "-c", "systemctl stop
kubelet.service; sleep 300; systemctl start kubelet.service") can leave kubelet
stopped if the test aborts early; register a g.DeferCleanup that ensures
"systemctl start kubelet.service" is invoked on the node to restore state.
Concretely, keep the stop logic in the goroutine (or call
node.DebugNodeWithChroot to stop), but immediately call g.DeferCleanup with a
closure that calls node.DebugNodeWithChroot("sh", "-c", "systemctl start
kubelet.service") (handle/log errors but don’t panic), so kubelet is restarted
even if checkFiredAlert or later assertions abort the spec; reference
node.DebugNodeWithChroot and g.DeferCleanup when making the change.
10b0ea2 to 7479576 (Compare)
♻️ Duplicate comments (1)
test/extended-priv/node.go (1)
1402-1406: ⚠️ Potential issue | 🟠 Major

Use `&&` chaining to fail fast on the setup chain.

The shell command still uses semicolons. If `chmod` or `cp` fails, execution continues, potentially leaving the node broken without a valid `/tmp/rpm-ostree` backup (which the injected script delegates to for non-rebase calls).

🐛 Proposed fix

```diff
 _, err := node.DebugNodeWithChroot("sh", "-c",
-	"chmod +x /tmp/rpm-ostree.broken; "+
-	"cp /usr/bin/rpm-ostree /tmp/rpm-ostree; "+
-	"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree; "+
+	"chmod +x /tmp/rpm-ostree.broken && "+
+	"cp /usr/bin/rpm-ostree /tmp/rpm-ostree && "+
+	"nsenter --mount=/proc/1/ns/mnt mount --bind /tmp/rpm-ostree.broken /usr/bin/rpm-ostree && "+
 	"restorecon -v /usr/bin/rpm-ostree")
```

🤖 Prompt for AI Agents
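The fail-fast difference between `;` and `&&` that this comment describes is easy to demonstrate outside the cluster. The snippet below is illustrative only (it is not part of the test suite) and assumes a POSIX `sh` is on PATH; it shells out with both separators and reports the overall exit status.

```go
package main

import (
	"fmt"
	"os/exec"
)

// run executes a shell command string via `sh -c` and reports whether the
// overall pipeline exited successfully (exit status 0).
func run(cmd string) bool {
	return exec.Command("sh", "-c", cmd).Run() == nil
}

func main() {
	// With ';' the shell reports the exit status of the LAST command, so the
	// failing `false` is masked by the succeeding `true`.
	fmt.Println(run("false; true")) // prints "true"

	// With '&&' the chain stops at the first failure, so the failure is
	// visible to the caller.
	fmt.Println(run("false && true")) // prints "false"
}
```

This is exactly why a failed `chmod` or `cp` in a `;`-separated chain can be hidden by a later successful `restorecon`.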
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/node.go` around lines 1402 - 1406, The shell command passed to node.DebugNodeWithChroot uses semicolons which allows later commands to run even if earlier ones fail; change the command string to use && between the commands ("chmod +x ... && cp ... && nsenter ... && restorecon ...") so the chain fails fast on the first error, preserving the same arguments to node.DebugNodeWithChroot and keeping the overall command quoted as a single parameter.
🧹 Nitpick comments (2)
test/extended-priv/inside_node_containers.go (2)
297-298: Incomplete doc comment.The comment says "by:" but doesn't list the steps. Consider completing the description or removing "by:".
📝 Suggested fix
```diff
-// removeImage removes the built OS image from the node to free up space by:
+// removeImage removes the built OS image from the node to free up space.
+// It uses "podman manifest rm" for manifest builds or "podman rmi" for single images.
```
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/inside_node_containers.go` around lines 297 - 298, The doc comment for the method removeImage on OsImageBuilderInNode is incomplete ("by:" with no steps); either complete the comment by describing the specific steps the function performs to remove the built OS image (e.g., stop containers, delete files, free space) or remove the trailing "by:" so the comment is a concise one-line summary; update the comment immediately above func (b *OsImageBuilderInNode) removeImage() error and ensure it accurately reflects the implementation.
320-321: Incomplete doc comment. Same issue as
`removeImage`: the comment says "by:" but doesn't complete the description.

📝 Suggested fix
```diff
-// digestImage inspects the pushed image and returns its digest reference by:
+// digestImage inspects the pushed image using skopeo and returns its digest reference
+// in the format Name@Digest.
```
Verify each finding against the current code and only fix it if needed. In `@test/extended-priv/inside_node_containers.go` around lines 320 - 321, The doc comment for digestImage is incomplete ("by:") similar to removeImage; update the comment above the digestImage method of OsImageBuilderInNode to be a complete sentence describing what the function does (e.g., "digestImage inspects the pushed image and returns its digest reference by inspecting the image manifest and extracting the digest") or remove the trailing "by:" so the comment reads: "digestImage inspects the pushed image and returns its digest reference." Ensure the comment matches the intent of the function and follows Go doc comment style for digestImage (and mirror the fix for removeImage if needed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/extended-priv/node.go`:
- Around line 1402-1406: The shell command passed to node.DebugNodeWithChroot
uses semicolons which allows later commands to run even if earlier ones fail;
change the command string to use && between the commands ("chmod +x ... && cp
... && nsenter ... && restorecon ...") so the chain fails fast on the first
error, preserving the same arguments to node.DebugNodeWithChroot and keeping the
overall command quoted as a single parameter.
---
Nitpick comments:
In `@test/extended-priv/inside_node_containers.go`:
- Around line 297-298: The doc comment for the method removeImage on
OsImageBuilderInNode is incomplete ("by:" with no steps); either complete the
comment by describing the specific steps the function performs to remove the
built OS image (e.g., stop containers, delete files, free space) or remove the
trailing "by:" so the comment is a concise one-line summary; update the comment
immediately above func (b *OsImageBuilderInNode) removeImage() error and ensure
it accurately reflects the implementation.
- Around line 320-321: The doc comment for digestImage is incomplete ("by:")
similar to removeImage; update the comment above the digestImage method of
OsImageBuilderInNode to be a complete sentence describing what the function does
(e.g., "digestImage inspects the pushed image and returns its digest reference
by inspecting the image manifest and extracting the digest") or remove the
trailing "by:" so the comment reads: "digestImage inspects the pushed image and
returns its digest reference." Ensure the comment matches the intent of the
function and follows Go doc comment style for digestImage (and mirror the fix
for removeImage if needed).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bf9a33a5-b95d-47da-8051-87eef79f4639
📒 Files selected for processing (5)
- test/extended-priv/inside_node_containers.go
- test/extended-priv/mco_alerts.go
- test/extended-priv/mco_machineconfignode.go
- test/extended-priv/node.go
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
💤 Files with no reviewable changes (1)
- test/extended-priv/mco_machineconfignode.go
✅ Files skipped from review due to trivial changes (1)
- test/extended-priv/testdata/files/rpm-ostree-force-pivot-error.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- test/extended-priv/mco_alerts.go
```go
	return nil
}

// removeImage removes the built OS image from the node to free up space by:
```
This comment seems like it is missing a list of steps after by:
```go
	return nil
}

// digestImage inspects the pushed image and returns its digest reference by:
```
This comment seems like it is missing a list of steps after by:
@sergiordlr: The following test failed, say

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
- What I did
Migrate the MCO alerts test cases from the private repository
Remove test case "[PolarionID:69184][OTP] Enable feature gate MachineConfigNodes [apigroup:machineconfiguration.openshift.io]" since MachineConfigNodes featuregate doesn't exist anymore.
- How to verify it
All tests must pass