Skip to content

OCPBUGS-81715: Fix monitoring of udev using nsenter#600

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
gnufied:fix-udev-monitoring
Apr 14, 2026
Merged

OCPBUGS-81715: Fix monitoring of udev using nsenter#600
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
gnufied:fix-udev-monitoring

Conversation

@gnufied
Copy link
Copy Markdown
Member

@gnufied gnufied commented Apr 5, 2026

https://redhat.atlassian.net/browse/OCPBUGS-81715

Summary by CodeRabbit

  • Refactor
    • Improved disk device monitoring with safer process management and better shutdown coordination
    • Emits clearer runtime warnings when required system capabilities aren’t available and skips monitoring gracefully
    • Streams diagnostic output from the device monitor into warnings for faster troubleshooting
    • More reliable and responsive device detection in containerized and isolated environments when supported
    • Better resource handling and graceful error recovery during device discovery and monitoring

@openshift-ci openshift-ci bot requested review from RomanBednar and mpatlasov April 5, 2026 21:31
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5b66134e-3c91-45b6-99d2-8e27214e718f

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7841b and f8dcb15.

📒 Files selected for processing (1)
  • pkg/diskmaker/discovery/monitor.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/diskmaker/discovery/monitor.go

Walkthrough

rawUdevBlockMonitor now uses a new factory, newUdevMonitorCommand(), to build the udevadm command (optionally wrapped with nsenter). It opens and scans the command's stderr in a goroutine, logs stderr lines as warnings, and synchronizes shutdown with cmd.Wait() and the scanner completion.

Changes

Cohort / File(s) Summary
Monitor Command Initialization & Stderr Handling
pkg/diskmaker/discovery/monitor.go
Added newUdevMonitorCommand() to detect nsenter and return a wrapped exec.Cmd (or nil if absent). Updated rawUdevBlockMonitor() to use the factory, bail when nil, attach StderrPipe(), spawn a stderr-scanning goroutine that logs lines, and wait for both cmd and scanner to finish.

Sequence Diagram

sequenceDiagram
    participant Monitor as rawUdevBlockMonitor
    participant Factory as newUdevMonitorCommand
    participant Lookup as exec.LookPath(nsenter)
    participant Cmd as udevadm Process
    participant Stderr as Stderr Scanner (goroutine)

    Monitor->>Factory: Request udevadm command
    Factory->>Lookup: Check nsenter
    alt nsenter found
        Lookup-->>Factory: path
        Factory->>Cmd: return nsenter-wrapped exec.Cmd
    else nsenter missing
        Lookup-->>Factory: not found
        Factory-->>Monitor: nil (warning)
        Monitor-->>Monitor: exit without starting monitor
    end

    Monitor->>Cmd: Open StderrPipe() and Start()
    Monitor->>Stderr: spawn scanner goroutine
    Cmd-->>Stderr: write stderr lines
    Stderr->>Monitor: log warnings per line
    Monitor->>Cmd: defer cmd.Wait()
    Cmd-->>Monitor: exit
    Monitor->>Stderr: wait for scanner completion
    Stderr-->>Monitor: done
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Ote Binary Stdout Contract ❓ Inconclusive The verification cannot be completed without access to the actual shell command outputs showing klog configuration and function initialization call chains. Execute the provided shell commands to retrieve the actual code contents and analyze whether klog is configured before monitor functions are called.
✅ Passed checks (8 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: fixing monitoring of udev using nsenter. This directly corresponds to the implementation change where rawUdevBlockMonitor now builds its command via newUdevMonitorCommand() which uses nsenter when available.
Stable And Deterministic Test Names ✅ Passed Pull request modifies only production code in monitor.go; related test file uses standard Go testing, not Ginkgo patterns, so custom check is not applicable.
Test Structure And Quality ✅ Passed This PR modifies only implementation code in monitor.go without adding or modifying any Ginkgo test files, making test quality assessment not applicable.
Microshift Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests. Changes are confined to monitor.go implementation file.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This pull request modifies monitor.go but does not add any Ginkgo e2e tests.
Topology-Aware Scheduling Compatibility ✅ Passed The PR modifies udev monitoring and device discovery code with improved command construction and stderr handling, introducing no topology-aware scheduling constraints.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This pull request does not add new Ginkgo e2e tests. Changes are confined to pkg/diskmaker/discovery/monitor.go utility file.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2026
@gnufied gnufied changed the title Fix monitoring of udev using nsenter OCPBUGS-81715: Fix monitoring of udev using nsenter Apr 5, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 5, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@gnufied: This pull request references Jira Issue OCPBUGS-81715, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

https://redhat.atlassian.net/browse/OCPBUGS-81715

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/diskmaker/discovery/monitor.go`:
- Around line 61-77: When cmd.Start() (in the udev monitor startup around
stdout/stderr handling) fails due to nsenter runtime issues, retry
creating/starting the monitor using the direct udevadm invocation instead of
bailing out; modify newUdevMonitorCommand() or the startup logic to detect when
the returned *exec.Cmd wraps nsenter (e.g., command path or args include
"nsenter") and on Start error attempt to construct a fallback cmd that directly
runs "udevadm monitor --udev --subsystem-match=block" (or the existing
non-nsenter branch) and Start that new cmd, logging both attempts and only
returning when both fail so the caller can decide whether to disable udevEvents.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 66551c15-ae94-4369-90d8-ba53af1d34c8

📥 Commits

Reviewing files that changed from the base of the PR and between 3f95af7 and a2ddd79.

📒 Files selected for processing (2)
  • Dockerfile.diskmaker.rhel7
  • pkg/diskmaker/discovery/monitor.go

Comment thread pkg/diskmaker/discovery/monitor.go
Comment thread Dockerfile.diskmaker.rhel7 Outdated
COPY config/manifests /manifests

RUN yum install -y e2fsprogs xfsprogs && yum clean all && rm -rf /var/cache/yum
RUN yum install -y e2fsprogs xfsprogs systemd-udev && yum clean all && rm -rf /var/cache/yum
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using nsenter, then udevadm does not need to be installed inside the container.

Comment thread pkg/diskmaker/discovery/monitor.go Outdated
Comment on lines +124 to +126
if _, pathErr := exec.LookPath("nsenter"); pathErr == nil {
return exec.Command("nsenter", "--mount=/proc/1/ns/mnt", "--", "udevadm", "monitor", "-u", "-k", "-s", "block")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should either discover very precisely if it's in a container or not or have a cmdline option to use stdbuf below (having nsenter as the default). The /proc/1/ns/mnt check above will pass on any recent Linux, even on the host.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the CRC cluster at least running without nsenter doesn't work:

udevadm monitor stderr: Running in chroot, ignoring request
W0405 18:15:35.460151 2162284 discovery.go:103] disabling udev monitoring

You are right, we probably don't need udevadm in the container then.

Copy link
Copy Markdown
Contributor

@jsafrane jsafrane Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second thoughts: what's wrong with calling nsenter even on the host and for local development / debugging? It's not necessary, but it should work. Should we remove the nsenter check + stdbuf fallback below?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the check. PTAL

@gnufied gnufied force-pushed the fix-udev-monitoring branch from a2ddd79 to 6c7841b Compare April 13, 2026 15:57
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/diskmaker/discovery/monitor.go`:
- Around line 61-63: The current check if cmd == nil in the monitoring setup
incorrectly warns about requiring nsenter but then returns, conflicting with a
later comment about falling back to direct udevadm monitor; fix this by either
constructing a direct udevadm command when nsenter is unavailable (i.e., build
cmd to run "udevadm monitor --udev --kernel" or the equivalent used elsewhere
and continue) or change the warning/logging to clearly state that device
monitoring is being disabled and return; update the branch handling around the
cmd variable in the monitor setup (the block guarding cmd == nil and the later
fallback code) so the behavior and message are consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 49671fba-ce89-4963-90d8-0d6090fedf26

📥 Commits

Reviewing files that changed from the base of the PR and between a2ddd79 and 6c7841b.

📒 Files selected for processing (1)
  • pkg/diskmaker/discovery/monitor.go

Comment thread pkg/diskmaker/discovery/monitor.go
@gnufied gnufied force-pushed the fix-udev-monitoring branch from 6c7841b to f8dcb15 Compare April 13, 2026 16:10
@gnufied
Copy link
Copy Markdown
Member Author

gnufied commented Apr 13, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 13, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@gnufied: This pull request references Jira Issue OCPBUGS-81715, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 13, 2026

@gnufied: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jsafrane
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2026
@gnufied
Copy link
Copy Markdown
Member Author

gnufied commented Apr 14, 2026

/verified by @gnufied

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 14, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@gnufied: This PR has been marked as verified by @gnufied.

Details

In response to this:

/verified by @gnufied

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 73a071f into openshift:main Apr 14, 2026
9 checks passed
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@gnufied: Jira Issue OCPBUGS-81715: All pull requests linked via external trackers have merged:

All linked pull requests have the verified tag. Jira Issue OCPBUGS-81715 has been moved to the VERIFIED state.

Details

In response to this:

https://redhat.atlassian.net/browse/OCPBUGS-81715

Summary by CodeRabbit

  • Refactor
  • Improved disk device monitoring with safer process management and better shutdown coordination
  • Emits clearer runtime warnings when required system capabilities aren’t available and skips monitoring gracefully
  • Streams diagnostic output from the device monitor into warnings for faster troubleshooting
  • More reliable and responsive device detection in containerized and isolated environments when supported
  • Better resource handling and graceful error recovery during device discovery and monitoring

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gnufied
Copy link
Copy Markdown
Member Author

gnufied commented Apr 16, 2026

/cherry-pick release-4.22

@openshift-cherrypick-robot
Copy link
Copy Markdown

@gnufied: new pull request created: #612

Details

In response to this:

/cherry-pick release-4.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants