feat: RDMA naming infra changes #67

Open
ggoklani wants to merge 1 commit into linux-system-roles:main from ggoklani:rdma_naming_setup
ggoklani (Collaborator) commented Feb 5, 2026

Enhancement:
Extended RDMA setup to install DOCA OFED on RHEL-like 9 x86_64 and to configure Azure persistent RDMA device naming (script + systemd service + udev-triggered activation).
Reason:
DOCA/OFED is required for Mellanox/NVIDIA networking stack support on RHEL 9.
Persistent RDMA naming prevents RDMA device name drift across reboots/hardware events on Azure, improving stability for HPC workloads.
Result:
When hpc_install_rdma: true:
On RHEL-like 9 x86_64 (non-ostree): installs DOCA host RPM (from hpc_doca_host_rpm_url) and then installs doca-ofed.
On Azure (system_vendor == "Microsoft Corporation"): deploys /usr/sbin/azure_persistent_rdma_naming.sh, azure_persistent_rdma_naming.service, and a udev rule to trigger naming on InfiniBand add/change events.

Before:
ibv_devices
    device             node GUID
    ------          ----------------
    mlx5_0          00155dfffe340078
    mlx5_1          000d3afffe7d1412

After:
ibv_devices
    device             node GUID
    ------          ----------------
    mlx5_ib0        00155dfffe340078
    mlx5_an0        000d3afffe7d1412
    
sudo systemctl status azure_persistent_rdma_naming_monitor.service
● azure_persistent_rdma_naming_monitor.service - Azure persistent RDMA naming Monitor
     Loaded: loaded (/etc/systemd/system/azure_persistent_rdma_naming_monitor.service; enabled; preset: disabled)
     Active: active (running) since Fri 2026-02-06 08:10:47 UTC; 3min 39s ago
   Main PID: 1806 (bash)
      Tasks: 2 (limit: 2266275)
     Memory: 700.0K
        CPU: 252ms
     CGroup: /system.slice/azure_persistent_rdma_naming_monitor.service
             ├─1806 bash /usr/sbin/azure_persistent_rdma_naming_monitor.sh
             └─3079 sleep 60

Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: Started Azure persistent RDMA naming Monitor.
[azureuser@gaurav-hpc-rdma-002 ~]$ sudo systemctl status azure_persistent_rdma_naming.service
○ azure_persistent_rdma_naming.service - Azure persistent RDMA naming
     Loaded: loaded (/etc/systemd/system/azure_persistent_rdma_naming.service; enabled; preset: disabled)
     Active: inactive (dead) since Fri 2026-02-06 08:10:47 UTC; 3min 54s ago
    Process: 2133 ExecStart=/usr/sbin/azure_persistent_rdma_naming.sh (code=exited, status=0/SUCCESS)
   Main PID: 2133 (code=exited, status=0/SUCCESS)
        CPU: 83ms

Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: Starting Azure persistent RDMA naming...
Feb 06 08:10:47 gaurav-hpc-rdma-002 azure_persistent_rdma_naming.sh[2260]: mlx5_an0
Feb 06 08:10:47 gaurav-hpc-rdma-002 azure_persistent_rdma_naming.sh[2268]: mlx5_ib0
Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: azure_persistent_rdma_naming.service: Deactivated successfully.
Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: Finished Azure persistent RDMA naming.
    
Issue Tracker Tickets (Jira or BZ if any):

Summary by Sourcery

Add DOCA RDMA setup for RHEL 9 x86_64 and introduce Azure-specific persistent RDMA device naming infrastructure.

New Features:

  • Install DOCA host RPM and doca-ofed as part of RDMA setup on non-ostree RHEL 9 x86_64 systems.
  • Configure Azure-specific persistent RDMA device naming via a script, systemd service, and udev rules, with defaults and configuration variables.
  • Expose new role variables for DOCA host RPM URL, enabling Azure persistent RDMA naming, and rdma_rename path with documented defaults.

Enhancements:

  • Extend RDMA handlers to trigger udev for InfiniBand subsystem changes to support the new persistent naming flow.

Documentation:

  • Document the new DOCA RPM URL, Azure persistent RDMA naming toggle, and rdma_rename path variables in the role README.

Summary by Sourcery

Extend RDMA setup to install DOCA/OFED on RHEL 9 x86_64 and add Azure-specific infrastructure for persistent RDMA device naming.

New Features:

  • Install DOCA host RPM and doca-ofed automatically as part of RDMA setup on non-ostree RHEL-like 9 x86_64 systems.
  • Configure Azure-specific persistent RDMA device naming via scripts, systemd services, and udev rules to keep RDMA device names stable.
  • Introduce role variables for DOCA host RPM URL, Azure persistent RDMA naming enablement, and the rdma_rename binary path with sensible defaults.

Enhancements:

  • Add a udev handler to trigger InfiniBand subsystem events to support the new persistent RDMA naming flow.

Documentation:

  • Document the new DOCA RPM URL, Azure persistent RDMA naming toggle, and rdma_rename path variables in the role README.

sourcery-ai bot commented Feb 5, 2026

Reviewer's Guide

Extends the HPC RDMA role to install NVIDIA DOCA/OFED on RHEL 9 x86_64 and adds Azure-specific persistent RDMA device naming via Ansible-driven scripts, systemd services, udev rules, and new role variables/defaults.

Sequence diagram for Azure persistent RDMA naming on InfiniBand device change

sequenceDiagram
    participant InfiniBandDevice
    participant udevd
    participant systemd
    participant azure_persistent_rdma_naming_service
    participant azure_persistent_rdma_naming_sh
    participant ibdev2netdev
    participant ibv_devinfo
    participant rdma_rename

    InfiniBandDevice->>udevd: Device add/change (subsystem infiniband)
    udevd->>udevd: Match 99-azure-persistent-rdma-naming.rules
    udevd->>systemd: Set ENV SYSTEMD_WANTS=azure_persistent_rdma_naming.service
    systemd->>azure_persistent_rdma_naming_service: Start oneshot service
    azure_persistent_rdma_naming_service->>azure_persistent_rdma_naming_sh: ExecStart /usr/sbin/azure_persistent_rdma_naming.sh

    azure_persistent_rdma_naming_sh->>ibdev2netdev: ibdev2netdev -v
    ibdev2netdev-->>azure_persistent_rdma_naming_sh: List RDMA devices
    azure_persistent_rdma_naming_sh->>ibv_devinfo: ibv_devinfo -d old_device (per device)
    ibv_devinfo-->>azure_persistent_rdma_naming_sh: link_layer (InfiniBand/Ethernet)

    alt link_layer == InfiniBand
        azure_persistent_rdma_naming_sh->>rdma_rename: rdma_rename old_device NAME_FIXED mlx5_ibN
    else link_layer == Ethernet
        azure_persistent_rdma_naming_sh->>rdma_rename: rdma_rename old_device NAME_FIXED mlx5_anN
    else other link_layer
        azure_persistent_rdma_naming_sh->>azure_persistent_rdma_naming_sh: Log unknown device type
    end

    azure_persistent_rdma_naming_sh-->>azure_persistent_rdma_naming_service: Exit 0
    azure_persistent_rdma_naming_service-->>systemd: oneshot complete (inactive)
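
The renaming step in the diagram above can be sketched as a dry run. This is a minimal illustration, not the script from the PR: the function name plan_renames and the "device link_layer" input format are assumptions made for the example, and the commands are only printed, never executed. The rdma_rename argument order is copied from the diagram.

```shell
#!/usr/bin/env bash
# Dry-run sketch: reads "<device> <link_layer>" pairs on stdin and prints the
# rdma_rename command that would be run for each device.
# plan_renames and its input format are hypothetical, for illustration only.
plan_renames() {
  local ib_index=0 an_index=0 device link_layer
  while read -r device link_layer; do
    case "$link_layer" in
      InfiniBand)
        echo "rdma_rename $device NAME_FIXED mlx5_ib${ib_index}"
        ib_index=$((ib_index + 1))
        ;;
      Ethernet)
        echo "rdma_rename $device NAME_FIXED mlx5_an${an_index}"
        an_index=$((an_index + 1))
        ;;
      *)
        # Unknown link layers are reported on stderr and left unchanged.
        echo "unknown link layer '$link_layer' for $device" >&2
        ;;
    esac
  done
}

printf '%s\n' 'mlx5_0 InfiniBand' 'mlx5_1 Ethernet' | plan_renames
```

Each link layer keeps its own counter, which is why the two devices from the PR description end up as mlx5_ib0 and mlx5_an0 rather than sharing one index.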

Sequence diagram for Azure persistent RDMA naming monitor and remediation

sequenceDiagram
    participant systemd
    participant azure_persistent_rdma_naming_monitor_service
    participant azure_persistent_rdma_naming_monitor_sh
    participant ibdev2netdev
    participant ibv_devinfo
    participant azure_persistent_rdma_naming_service

    systemd->>azure_persistent_rdma_naming_monitor_service: Start monitor (Restart=always)
    azure_persistent_rdma_naming_monitor_service->>azure_persistent_rdma_naming_monitor_sh: ExecStart /usr/sbin/azure_persistent_rdma_naming_monitor.sh

    loop Every 60 seconds
        azure_persistent_rdma_naming_monitor_sh->>ibdev2netdev: ibdev2netdev -v
        ibdev2netdev-->>azure_persistent_rdma_naming_monitor_sh: List RDMA devices
        azure_persistent_rdma_naming_monitor_sh->>ibv_devinfo: ibv_devinfo checks (implicit)
        alt Found device name without an or ib pattern
            azure_persistent_rdma_naming_monitor_sh->>systemd: systemctl enable azure_persistent_rdma_naming.service
            azure_persistent_rdma_naming_monitor_sh->>azure_persistent_rdma_naming_service: systemctl restart azure_persistent_rdma_naming.service
            azure_persistent_rdma_naming_service->>azure_persistent_rdma_naming_service: Run naming script
            azure_persistent_rdma_naming_service-->>azure_persistent_rdma_naming_monitor_sh: Service completes
            azure_persistent_rdma_naming_monitor_sh->>azure_persistent_rdma_naming_monitor_sh: sleep 60 and break inner loop
        else All device names match persistent scheme
            azure_persistent_rdma_naming_monitor_sh->>azure_persistent_rdma_naming_monitor_sh: sleep 60
        end
    end

    note over azure_persistent_rdma_naming_monitor_service,systemd: Restart=always ensures monitor keeps running

Flow diagram for Ansible-based DOCA/OFED install and Azure persistent RDMA naming setup

flowchart LR
    A_start([Start RDMA role tasks])

    A_start --> B_check_doca{RHEL-like distro AND ansible_distribution_major_version == 9 AND ansible_architecture == x86_64 AND not __hpc_server_is_ostree}

    B_check_doca -- Yes --> C_set_doca_path[Set __hpc_doca_rpm_path to /tmp/basename of __hpc_doca_host_rpm_url]
    C_set_doca_path --> D_download_doca[Download DOCA host RPM from __hpc_doca_host_rpm_url]
    D_download_doca --> E_install_doca_rpm[Install DOCA host RPM via dnf]
    E_install_doca_rpm --> F_dnf_clean[dnf clean all]
    F_dnf_clean --> G_install_doca_ofed[Install doca-ofed via dnf with retries]
    B_check_doca -- No --> H_skip_doca[Skip DOCA host RPM and doca-ofed install]

    G_install_doca_ofed --> I_check_azure_naming{hpc_enable_azure_persistent_rdma_naming}
    H_skip_doca --> I_check_azure_naming

    I_check_azure_naming -- No --> Z_end([Continue with remaining RDMA role tasks])
    I_check_azure_naming -- Yes --> J_check_vendor{ansible_system_vendor == Microsoft_Corporation}

    J_check_vendor -- No --> K_debug_skip[Debug: Skipping Azure persistent RDMA naming on non-Azure system]
    K_debug_skip --> Z_end

    J_check_vendor -- Yes --> L_set_rdma_rename_path[Set __hpc_rdma_rename_path_effective from __hpc_rdma_rename_path]
    L_set_rdma_rename_path --> M_install_naming_script[Template azure_persistent_rdma_naming.sh to /usr/sbin]
    M_install_naming_script --> N_install_naming_service[Template azure_persistent_rdma_naming.service to /etc/systemd/system]
    N_install_naming_service --> O_install_udev_rule[Template 99-azure-persistent-rdma-naming.rules to /etc/udev/rules.d]
    O_install_udev_rule --> P_notify_handlers[Notify Reload_udev and Trigger_udev_for_infiniband]
    P_notify_handlers --> Q_enable_start_naming[Enable and start azure_persistent_rdma_naming.service]

    Q_enable_start_naming --> R_install_monitor_script[Template azure_persistent_rdma_naming_monitor.sh to /usr/sbin]
    R_install_monitor_script --> S_install_monitor_service[Template azure_persistent_rdma_naming_monitor.service to /etc/systemd/system]
    S_install_monitor_service --> T_enable_start_monitor[Enable and start azure_persistent_rdma_naming_monitor.service]
    T_enable_start_monitor --> Z_end

    subgraph Handlers
        U_reload_udev[Reload udev: udevadm control --reload]
        V_trigger_ib[Trigger udev infiniband: udevadm trigger --subsystem-match=infiniband]
    end

    P_notify_handlers -.-> U_reload_udev
    P_notify_handlers -.-> V_trigger_ib
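
The DOCA gate at the top of the flowchart can be expressed as a small predicate. This is only a sketch of the condition, assuming the four facts are passed in as plain strings; the function names should_install_doca and doca_rpm_path and the example URL are hypothetical and do not come from the role.

```shell
#!/usr/bin/env bash
# Returns 0 only when all four conditions from the flowchart hold.
# Arguments (hypothetical): rhel_like major_version architecture is_ostree
should_install_doca() {
  local rhel_like="$1" major="$2" arch="$3" is_ostree="$4"
  [ "$rhel_like" = "true" ] && [ "$major" = "9" ] &&
    [ "$arch" = "x86_64" ] && [ "$is_ostree" != "true" ]
}

# Mirrors "Set __hpc_doca_rpm_path to /tmp/basename of __hpc_doca_host_rpm_url".
doca_rpm_path() {
  printf '/tmp/%s\n' "${1##*/}"
}

if should_install_doca true 9 x86_64 false; then
  echo "would download to $(doca_rpm_path 'https://example.invalid/repo/doca-host.rpm')"
fi
```

Any single failing condition (for example an aarch64 host or an ostree system) skips both the host RPM and the doca-ofed install, matching the "No" branch of the flowchart.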

File-Level Changes

Change: Add conditional DOCA host RPM and doca-ofed installation for RHEL-like 9 x86_64 systems as part of RDMA setup.
Details:
  • Gate DOCA installation on non-ostree, Red Hat–like distro, RHEL major version 9, and x86_64 architecture.
  • Download DOCA host RPM from configurable URL into /tmp and install it via dnf with GPG checks disabled.
  • Clean DNF metadata after installing the DOCA repo RPM to avoid cache issues.
  • Install doca-ofed via dnf command with verbose logging, retry logic, and idempotent change detection based on stdout.
Files:
  • tasks/main.yml

Change: Introduce Azure persistent RDMA device naming infrastructure (script, monitor, systemd services, and udev rule) controlled by a new role toggle.
Details:
  • Add top-level Ansible task block guarded by hpc_enable_azure_persistent_rdma_naming to configure Azure RDMA naming.
  • Short-circuit configuration on non-Azure systems using system_vendor fact and emit a debug message when skipped.
  • On Azure, set an effective rdma_rename path fact and template-install azure_persistent_rdma_naming.sh, systemd unit, udev rule, monitor script, and monitor unit into standard locations.
  • Enable and start azure_persistent_rdma_naming.service (oneshot) and azure_persistent_rdma_naming_monitor.service (long-running) with daemon-reload in the tasks.
  • Define udev rule that uses SYSTEMD_WANTS to trigger azure_persistent_rdma_naming.service on InfiniBand add/change events instead of RUN= handlers.
  • Add a new handler that runs udevadm trigger --subsystem-match=infiniband and notify it from the udev rule installation task.
Files:
  • tasks/main.yml
  • handlers/main.yml
  • templates/rdma/azure_persistent_rdma_naming.sh.j2
  • templates/rdma/azure_persistent_rdma_naming_monitor.sh.j2
  • templates/rdma/azure_persistent_rdma_naming.service.j2
  • templates/rdma/azure_persistent_rdma_naming_monitor.service.j2
  • templates/rdma/99-azure-persistent-rdma-naming.rules.j2

Change: Add and document new role variables and defaults for DOCA and Azure RDMA naming behavior.
Details:
  • Introduce __hpc_rdma_rename_path and __hpc_doca_host_rpm_url internal vars with sensible defaults in vars/main.yml.
  • Extend __hpc_rdma_packages comment to clarify that it is the base RDMA stack and tools, without altering the existing package list.
  • Add hpc_enable_azure_persistent_rdma_naming default (true) to defaults/main.yml to control Azure RDMA naming setup.
  • Document hpc_doca_host_rpm_url, hpc_enable_azure_persistent_rdma_naming, and hpc_rdma_rename_path in README.md with descriptions, defaults, and types.
Files:
  • vars/main.yml
  • defaults/main.yml
  • README.md

@ggoklani ggoklani force-pushed the rdma_naming_setup branch 4 times, most recently from 800c574 to e91834d Compare February 6, 2026 08:18
@ggoklani ggoklani marked this pull request as ready for review February 6, 2026 08:53
@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In azure_persistent_rdma_naming.sh, the command-availability checks exit inconsistently: the ibdev2netdev check uses a bare exit (which returns the status of the last command run, here the echo), while the ibv_devinfo check exits 0 explicitly; consider making both exits explicit and consistent so callers can reliably interpret failure vs. intentional skip.
  • The device-name checks in azure_persistent_rdma_naming_monitor.sh use substring matches (*"an"* / *"ib"*), which can produce false positives; tightening this to explicit prefixes like mlx5_an* / mlx5_ib* (or a clearly defined pattern) will reduce accidental matches and unnecessary service restarts.
  • For installing doca-ofed, using the ansible.builtin.dnf module instead of the raw command: dnf ... would make the task more idempotent and Ansible-native (e.g., leverage state: present and built-in retry/error handling instead of parsing stdout for "Nothing to do").
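
The prefix-matching suggestion above could look roughly like the following. This is a sketch under assumptions: the helper name has_persistent_name is hypothetical, and ibp1s0 is used only as an example of a name that contains "ib" without following the mlx5_ib scheme.

```shell
#!/usr/bin/env bash
# Succeeds only for names that start with mlx5_an or mlx5_ib followed by a digit.
# A substring test such as *"ib"* would also accept names like ibp1s0.
has_persistent_name() {
  case "$1" in
    mlx5_an[0-9]* | mlx5_ib[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

for name in mlx5_ib0 mlx5_an0 mlx5_0 ibp1s0; do
  if has_persistent_name "$name"; then
    echo "$name: already persistent"
  else
    echo "$name: needs rename"
  fi
done
```

Anchoring the pattern at the start of the name means the monitor restarts the naming service only for devices that are actually outside the scheme.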
## Individual Comments

### Comment 1
<location> `templates/rdma/azure_persistent_rdma_naming.sh.j2:15-17` </location>
<code_context>
+an_index=0
+ib_index=0
+
+if ! command -v ibdev2netdev >/dev/null 2>&1; then
+  echo "ibdev2netdev not found; ensure RDMA tools are installed."
+  exit
+fi
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Use an explicit exit code when ibdev2netdev is missing so the unit's result does not depend on the preceding command.

A bare `exit` returns the status of the last command executed, which here is the `echo` rather than `command -v`, so the script normally still exits 0; the intent is only implicit, though, and the result would change if the `echo` failed or another command were added before `exit`. Since this is a "feature not present" case (like the `ibv_devinfo` path below), please use an explicit `exit 0` here so that systems without RDMA tooling are never marked as failed.
</issue_to_address>
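
A minimal sketch of the explicit-exit pattern, assuming the check is wrapped in a helper (the name skip_if_missing and the tool name used in the demo call are hypothetical). In bash, a bare `exit` takes the status of the last command executed, so writing the code out removes any dependence on what ran just before it.

```shell
#!/usr/bin/env bash
# Runs the availability check in a subshell so that `exit` ends only the check.
skip_if_missing() {
  local tool="$1"
  (
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found; skipping RDMA naming."
      exit 0 # explicit: a missing tool is an intentional skip, not a failure
    fi
    echo "$tool found."
  )
}

skip_if_missing no_such_rdma_tool_for_demo
echo "status: $?"
```

With an explicit code, a oneshot unit that runs such a script finishes cleanly on hosts without RDMA tooling, and the same pattern can be reused for the `ibv_devinfo` check.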

### Comment 2
<location> `templates/rdma/azure_persistent_rdma_naming_monitor.sh.j2:25-27` </location>
<code_context>
+
+while true; do
+  for device in $(ibdev2netdev -v | sort -n | cut -f2 -d' '); do
+    if [[ "${device}" != *"an"* && "${device}" != *"ib"* ]]; then
+      systemctl enable azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
+      systemctl restart azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
+      sleep 60
+      break
</code_context>

<issue_to_address>
**suggestion (performance):** Avoid repeatedly calling `systemctl enable` in the hot path of the monitor loop.

Within the loop, the first non-conforming device found on each pass triggers both `systemctl enable` and `systemctl restart` on `azure_persistent_rdma_naming.service`. The repeated `enable` is redundant and adds avoidable overhead and potential systemd contention. Consider relying on Ansible to enable the unit ahead of time, or enabling it once outside the `while true` loop, and only restarting inside the loop.

Suggested implementation:

```
if ! command -v ibv_devinfo >/dev/null 2>&1; then
  echo "ibv_devinfo not found; skipping RDMA naming monitor."
  exit 0
fi

# Enable once up front; avoid repeating this in the hot path of the monitor loop.
systemctl enable azure_persistent_rdma_naming.service >/dev/null 2>&1 || true

while true; do

```

```
  for device in $(ibdev2netdev -v | sort -n | cut -f2 -d' '); do
    if [[ "${device}" != *"an"* && "${device}" != *"ib"* ]]; then
      systemctl restart azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
      sleep 60
      break
    fi
  done

```
</issue_to_address>

spetrosi (Collaborator) commented Feb 6, 2026

By the way, if your code is AI-assisted, that should be mentioned in the commit message.

@ggoklani ggoklani force-pushed the rdma_naming_setup branch 3 times, most recently from 060b07e to 5ad72c9 Compare February 6, 2026 14:46
tasks/main.yml Outdated
- Trigger udev for infiniband

- name: Apply notified handlers (reload systemd/udev if needed) before starting services
  meta: flush_handlers
Contributor

Note that this will cause all pending notified handlers to run (in the order in which they are defined), including handlers notified earlier in this role and possibly by other tasks in the playbook that calls this role, which may cause unintended behavior. I don't know if it will be a problem, but it will be hard to test all of the real-world combinations.

Contributor

Handlers are typically used in the case where "I need this task to run at some point after all of the other tasks in this role are complete, and I don't care when". This doesn't seem like one of those cases.

@ggoklani ggoklani force-pushed the rdma_naming_setup branch 2 times, most recently from a7d009a to ff14ebc Compare February 9, 2026 06:47
@ggoklani ggoklani requested review from richm and spetrosi February 9, 2026 07:31
Collaborator

@spetrosi spetrosi left a comment

Check ansible-lint errors please

3 participants