[release-4.18] OCPBUGS-86233: Fix ssh and password rollbacks#6096
Conversation
This change fixes the issue in SSH keys and user passwords that made the rollback useless as it tried to apply the new configuration instead of the previous one. Signed-off-by: Pablo Rodriguez Nava <git@amail.pablintino.eu>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@openshift-cherrypick-robot: An error was encountered cloning bug for cherrypick for bug OCPBUGS-86233 on the Jira server at https://redhat.atlassian.net. No known errors were detected, please see the full error message for details. Full error message.
request failed. Please analyze the request body for more details. Status code: 400: {"errorMessages":[],"errors":{"comment":"Field does not support update 'comment'","issuelinks":"Field does not support update 'issuelinks'","worklog":"Field does not support update 'worklog'"}}
Please contact an administrator to resolve this issue, then request a bug refresh with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/test unit |
HarshwardhanPatil07
left a comment
There was a problem hiding this comment.
Pre-merge Verification: PASSED
Cluster version: 4.18.0-0-2026-05-29-034344-test-ci-ln-qp7mvv2-latest
Steps Performed:
- Apply the first MC
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ cat << 'EOF' | oc apply -f -
# MC v1: The "good" config. Applied first and should succeed.
# It creates a file, a unit, and a dropin, plus sets a password and SSH key
# for the core user. All values contain "v1" as a marker so we can later
# verify which version is on disk.
#
# It also installs monitoring units (mco-drift-snapshot.*) that watch the
# tracked files and take timestamped snapshots to /tmp/drift-snapshots/
# whenever they change. This lets us inspect the file content during the
# brief rollback window when MC v2 fails.
#
# Expected flow:
# 1. Apply this MC -> node reaches Done
# 2. Apply MC v2 (test-mc-full-v2) -> OS update fails, MCD rolls back
# 3. The snapshot units capture the file states during rollback
# 4. Inspect /tmp/drift-snapshots/ on the node to verify rollback correctness
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: mc-tc-88821-rollback
spec:
config:
ignition:
version: 3.4.0
passwd:
users:
- name: core
# Password hash with "v1" marker at the end
passwordHash: $6$rounds=5000$saltsalt$abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSv1
# SSH key with "v1" marker in the email
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDtest1234567890 testv1@mco
storage:
files:
# Test file with "v1" marker in content
- contents:
source: data:,mco-test-content-v1%0A
mode: 0644
overwrite: true
path: /etc/mco-test-file.conf
systemd:
units:
# Test unit with "v1" marker in environment variable
- contents: |
[Unit]
Description=MCO test unit
[Service]
Type=oneshot
ExecStart=/bin/true
Environment="MCO_TEST_UNIT=v1"
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-test.service
# Test dropin on an existing unit with "v1" marker
- dropins:
- contents: |
[Service]
Environment="MCO_TEST=v1"
name: 99-mco-test.conf
name: wait-for-ipsec-connect.service
#
# --- Monitoring units below ---
# These units watch the tracked files and snapshot them to
# /tmp/drift-snapshots/<timestamp>/ whenever a change is detected.
# Each snapshot directory contains one file per tracked path.
# If a file is absent, the snapshot contains "ABSENT" instead.
#
# Snapshot service: copies all tracked files into a timestamped directory.
# The actual work runs in a background subshell so bash exits immediately,
# allowing PathChanged to re-arm and catch rapid successive writes.
- contents: |
[Unit]
Description=Snapshot all tracked files on change
[Service]
Type=forking
ExecStart=/bin/bash -c '\
( dir=/tmp/drift-snapshots/$(date +%%Y%%m%%d-%%H%%M%%S.%%N); \
mkdir -p $dir; \
for f in \
/etc/mco-test-file.conf \
/etc/mco-test-new-file.conf \
/etc/systemd/system/mco-test.service \
/etc/systemd/system/mco-test-new.service \
/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test.conf \
/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test-new.conf \
/home/core/.ssh/authorized_keys.d/ignition \
/etc/shadow; \
do \
name=$(echo $f | sed "s|/|_|g"); \
if [ -f "$f" ]; then \
cp "$f" "$dir/$name"; \
else \
echo "ABSENT" > "$dir/$name"; \
fi; \
done ) &'
name: mco-drift-snapshot.service
# Path unit: watches files that already exist for modifications.
# When any of them changes, triggers mco-drift-snapshot.service.
- contents: |
[Unit]
Description=Watch existing tracked files for changes
[Path]
PathChanged=/etc/mco-test-file.conf
PathChanged=/etc/systemd/system/mco-test.service
PathChanged=/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test.conf
PathChanged=/home/core/.ssh/authorized_keys.d/ignition
PathChanged=/etc/shadow
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-drift-snapshot.path
# Path unit: watches for new files that don't exist yet in v1 but
# will be created by v2. Uses PathExists to trigger when they appear.
- contents: |
[Unit]
Description=Watch for new files being created
[Path]
PathExists=/etc/mco-test-new-file.conf
PathExists=/etc/systemd/system/mco-test-new.service
PathExists=/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test-new.conf
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-drift-snapshot-new.path
# Service triggered when new files appear. Takes a snapshot and then
# starts the removal watcher so we also capture when they disappear.
- contents: |
[Unit]
Description=Snapshot when new files appear then watch for removal
[Service]
Type=oneshot
ExecStart=/bin/bash -c '\
/usr/bin/systemctl start mco-drift-snapshot.service; \
/usr/bin/systemctl start mco-drift-snapshot-removal.path'
name: mco-drift-snapshot-new.service
# Path unit: watches the new files for removal (PathChanged triggers
# when a watched file is deleted or modified after it existed).
- contents: |
[Unit]
Description=Watch for new files being removed
[Path]
PathChanged=/etc/mco-test-new-file.conf
PathChanged=/etc/systemd/system/mco-test-new.service
PathChanged=/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test-new.conf
[Install]
WantedBy=multi-user.target
name: mco-drift-snapshot-removal.path
# Service triggered when new files are removed. Takes a snapshot
# so we can verify they were correctly cleaned up during rollback.
- contents: |
[Unit]
Description=Snapshot when new files are removed
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl start mco-drift-snapshot.service
name: mco-drift-snapshot-removal.service
EOF
machineconfig.machineconfiguration.openshift.io/mc-tc-88821-rollback created
- Wait for mcp to get updated
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-e7102a355d48dc73e7c003ef314e3771 True False False 3 3 3 0 4h10m
worker rendered-worker-8e1d35bbfd9fed02ec2fdbd55a4c963b True False False 3 3 3 0 4h10m
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mc mc-tc-88821-rollback
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
mc-tc-88821-rollback 3.4.0 139m
- Apply the second MC
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ cat << 'EOF' | oc apply -f -
# MC v2: The "bad" config. Applied after v1 to trigger a rollback.
# It modifies every field from v1 (changing "v1" markers to "v2") and also
# adds new resources that don't exist in v1 (a new file, a new unit, and
# a new dropin). This lets us test rollback of both modified and newly
# created resources.
#
# The osImageURL points to a non-bootable image that will pull successfully
# but fail during rpm-ostree rebase, triggering the MCD rollback path.
#
# After rollback, the monitoring units from v1 should capture snapshots
# showing whether files were correctly reverted to v1 content and whether
# newly created files were properly removed.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: mc-tc-88821-rollback-v2
spec:
config:
ignition:
version: 3.4.0
passwd:
users:
- name: core
# Password hash changed: "v1" -> "v2" marker
passwordHash: $6$rounds=5000$saltsalt$zyxwvutsrqponmlkjihgfedcba9876543210ZYXWVUTSRQPONMLKJIv2
# SSH key changed: "v1" -> "v2" marker in email
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDmodified9876543210 testv2@mco
storage:
files:
# Same file as v1, content changed: "v1" -> "v2"
- contents:
source: data:,mco-test-content-v2%0A
mode: 0644
overwrite: true
path: /etc/mco-test-file.conf
# New file that does not exist in v1.
# Should be removed during rollback.
- contents:
source: data:,mco-test-new-file-v2%0A
mode: 0644
overwrite: true
path: /etc/mco-test-new-file.conf
systemd:
units:
# Same unit as v1, environment changed: "v1" -> "v2"
- contents: |
[Unit]
Description=MCO test unit
[Service]
Type=oneshot
ExecStart=/bin/true
Environment="MCO_TEST_UNIT=v2"
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-test.service
# New unit that does not exist in v1.
# Should be removed during rollback.
- contents: |
[Unit]
Description=MCO test new unit
[Service]
Type=oneshot
ExecStart=/bin/true
Environment="MCO_TEST_NEW_UNIT=v2"
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-test-new.service
- dropins:
# Same dropin as v1, environment changed: "v1" -> "v2"
- contents: |
[Service]
Environment="MCO_TEST=v2"
name: 99-mco-test.conf
# New dropin that does not exist in v1.
# Should be removed during rollback.
- contents: |
[Service]
Environment="MCO_TEST_NEW_DROPIN=v2"
name: 99-mco-test-new.conf
name: wait-for-ipsec-connect.service
extensions: []
kernelArguments: []
osImageURL: quay.io/openshifttest/tc88821fakeimage:latest
EOF
machineconfig.machineconfiguration.openshift.io/mc-tc-88821-rollback-v2 created
- Wait for mcp to degrade
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-e7102a355d48dc73e7c003ef314e3771 True False False 3 3 3 0 4h54m
worker rendered-worker-8e1d35bbfd9fed02ec2fdbd55a4c963b False True True 3 0 0 1 4h54m
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp -oyaml
---
- lastTransitionTime: "2026-05-29T08:25:55Z"
message: 'Node ip-10-0-12-87.us-east-2.compute.internal is reporting: "Failed
to update OS to quay.io/openshifttest/tc88821fakeimage:latest after retries:
timed out waiting for the condition"'
reason: 1 nodes are reporting degraded status on sync
status: "True"
---
- Check all rollback snapshots
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ NODE=${1:-$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')}
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== /etc/mco-test-file.conf ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(cat $d/_etc_mco-test-file.conf 2>/dev/null)"; done' 2>/dev/null
=== /etc/mco-test-file.conf ===
---
20260529-082551.146434243: mco-test-content-v2
20260529-082551.443047820: mco-test-content-v2
20260529-082712.044953285: mco-test-content-v1
20260529-082712.681199530: mco-test-content-v1
20260529-082713.192486629: mco-test-content-v2
20260529-082713.560669350: mco-test-content-v2
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== mco-test.service ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST_UNIT $d/_etc_systemd_system_mco-test.service 2>/dev/null || cat $d/_etc_systemd_system_mco-test.service 2>/dev/null)"; done' 2>/dev/null
=== mco-test.service ===
---
20260529-083939.401241249: Environment="MCO_TEST_UNIT=v2"
20260529-084059.788320902: Environment="MCO_TEST_UNIT=v1"
20260529-084100.752959389: Environment="MCO_TEST_UNIT=v1"
20260529-084101.273528921: Environment="MCO_TEST_UNIT=v1"
20260529-084101.619461195: Environment="MCO_TEST_UNIT=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== 99-mco-test.conf dropin ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test.conf 2>/dev/null || cat $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test.conf 2>/dev/null)"; done' 2>/dev/null
=== 99-mco-test.conf dropin ===
---
20260529-084508.510216175: Environment="MCO_TEST=v2"
20260529-084509.153056872: Environment="MCO_TEST=v2"
20260529-084629.747062852: Environment="MCO_TEST=v1"
20260529-084630.399965324: Environment="MCO_TEST=v1"
20260529-084630.953239712: Environment="MCO_TEST=v2"
20260529-084631.599856208: Environment="MCO_TEST=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== /etc/mco-test-new-file.conf (new in v2) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(cat $d/_etc_mco-test-new-file.conf 2>/dev/null)"; done' 2>/dev/null
=== /etc/mco-test-new-file.conf (new in v2) ===
---
20260529-083244.282660887: mco-test-new-file-v2
20260529-083405.490337404: mco-test-new-file-v2
20260529-083406.132941474: ABSENT
20260529-083406.692627335: mco-test-new-file-v2
20260529-083407.375242300: mco-test-new-file-v2
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo ""
echo "=== mco-test-new.service (new in v2) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST_NEW_UNIT $d/_etc_systemd_system_mco-test-new.service 2>/dev/null || cat $d/_etc_systemd_system_mco-test-new.service 2>/dev/null)"; done' 2>/dev/null
=== mco-test-new.service (new in v2) ===
---
20260529-084346.049496549: Environment="MCO_TEST_NEW_UNIT=v2"
20260529-084346.757110160: Environment="MCO_TEST_NEW_UNIT=v2"
20260529-084507.315193227: Environment="MCO_TEST_NEW_UNIT=v2"
20260529-084507.945510805: ABSENT
20260529-084508.510216175: Environment="MCO_TEST_NEW_UNIT=v2"
20260529-084509.153056872: Environment="MCO_TEST_NEW_UNIT=v2"
20260529-084629.747062852: Environment="MCO_TEST_NEW_UNIT=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== 99-mco-test-new.conf dropin (new in v2) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST_NEW_DROPIN $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test-new.conf 2>/dev/null || cat $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test-new.conf 2>/dev/null)"; done' 2>/dev/null
=== 99-mco-test-new.conf dropin (new in v2) ===
---
20260529-085323.339467438: Environment="MCO_TEST_NEW_DROPIN=v2"
20260529-085443.869101355: Environment="MCO_TEST_NEW_DROPIN=v2"
20260529-085444.799920802: ABSENT
20260529-085445.326298291: ABSENT
20260529-085445.687670961: Environment="MCO_TEST_NEW_DROPIN=v2"
20260529-085606.604470438: Environment="MCO_TEST_NEW_DROPIN=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== authorized_keys (ssh) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep -o "test[^ ]*@mco[^ ]*" $d/_home_core_.ssh_authorized_keys.d_ignition 2>/dev/null)"; done' 2>/dev/null
=== authorized_keys (ssh) ===
---
20260529-084059.788320902: testv1@mco
testv2@mco
20260529-084100.752959389: testv1@mco
testv2@mco
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== shadow (core password) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep core $d/_etc_shadow 2>/dev/null | cut -d: -f2 | grep -o "..$")"; done' 2>/dev/null
=== shadow (core password) ===
---
20260529-083242.676187369: !!
v2
20260529-083243.328712284: !!
v2
20260529-083243.906153410: !!
v2
20260529-083244.282660887: !!
v2
20260529-083405.490337404: !!
v2
20260529-083406.132941474: !!
v2
20260529-083406.692627335: !!
v2
---
- Cleanup
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc delete mc mc-tc-88821-rollback-v2
machineconfig.machineconfiguration.openshift.io "mc-tc-88821-rollback-v2" deleted
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc patch node "$NODE" -p "{\"metadata\":{\"annotations\":{\"machineconfiguration.openshift.io/desiredConfig\":\"$(oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')\"}}}"
node/ip-10-0-12-87.us-east-2.compute.internal patched
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-e7102a355d48dc73e7c003ef314e3771 True False False 3 3 3 0 5h18m
worker rendered-worker-8e1d35bbfd9fed02ec2fdbd55a4c963b True False False 3 3 3 0 5h18m
|
/verified by @HarshwardhanPatil07 here |
|
@HarshwardhanPatil07: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
isabella-janssen
left a comment
There was a problem hiding this comment.
/lgtm
/label backport-risk-assessed
Clean bp
|
Scheduling tests matching the |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, isabella-janssen, openshift-cherrypick-robot The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@openshift-cherrypick-robot: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This is an automated cherry-pick of #6069
/assign pablintino