feat: RDMA naming infra changes #67
Reviewer's Guide
Extends the HPC RDMA role to install NVIDIA DOCA/OFED on RHEL 9 x86_64 and adds Azure-specific persistent RDMA device naming via Ansible-driven scripts, systemd services, udev rules, and new role variables/defaults.
Sequence diagram for Azure persistent RDMA naming on InfiniBand device change
sequenceDiagram
participant InfiniBandDevice
participant udevd
participant systemd
participant azure_persistent_rdma_naming_service
participant azure_persistent_rdma_naming_sh
participant ibdev2netdev
participant ibv_devinfo
participant rdma_rename
InfiniBandDevice->>udevd: Device add/change (subsystem infiniband)
udevd->>udevd: Match 99-azure-persistent-rdma-naming.rules
udevd->>systemd: Set ENV SYSTEMD_WANTS=azure_persistent_rdma_naming.service
systemd->>azure_persistent_rdma_naming_service: Start oneshot service
azure_persistent_rdma_naming_service->>azure_persistent_rdma_naming_sh: ExecStart /usr/sbin/azure_persistent_rdma_naming.sh
azure_persistent_rdma_naming_sh->>ibdev2netdev: ibdev2netdev -v
ibdev2netdev-->>azure_persistent_rdma_naming_sh: List RDMA devices
azure_persistent_rdma_naming_sh->>ibv_devinfo: ibv_devinfo -d old_device (per device)
ibv_devinfo-->>azure_persistent_rdma_naming_sh: link_layer (InfiniBand/Ethernet)
alt link_layer == InfiniBand
azure_persistent_rdma_naming_sh->>rdma_rename: rdma_rename old_device NAME_FIXED mlx5_ibN
else link_layer == Ethernet
azure_persistent_rdma_naming_sh->>rdma_rename: rdma_rename old_device NAME_FIXED mlx5_anN
else other link_layer
azure_persistent_rdma_naming_sh->>azure_persistent_rdma_naming_sh: Log unknown device type
end
azure_persistent_rdma_naming_sh-->>azure_persistent_rdma_naming_service: Exit 0
azure_persistent_rdma_naming_service-->>systemd: oneshot complete (inactive)
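The renaming step in the diagram above can be sketched in shell as follows. This is a minimal dry-run illustration, not the script from the PR: the function name `plan_renames` and the `<old_device> <link_layer>` input format are assumptions made for the example, and the per-type counters are assumed to start at 0 as in the `an_index`/`ib_index` variables quoted in the review. It only prints the `rdma_rename` commands it would run.

```shell
# Sketch: choose a persistent name for each RDMA device from its link layer.
# Input on stdin: one "<old_device> <link_layer>" pair per line (hypothetical
# format; the real script derives these from ibdev2netdev -v and ibv_devinfo -d).
plan_renames() {
    an_index=0
    ib_index=0
    while read -r old_device link_layer; do
        case "${link_layer}" in
            InfiniBand)
                echo "rdma_rename ${old_device} NAME_FIXED mlx5_ib${ib_index}"
                ib_index=$((ib_index + 1))
                ;;
            Ethernet)
                echo "rdma_rename ${old_device} NAME_FIXED mlx5_an${an_index}"
                an_index=$((an_index + 1))
                ;;
            *)
                # Unknown link layer: log and leave the device name untouched.
                echo "Unknown device type for ${old_device}: ${link_layer}" >&2
                ;;
        esac
    done
}
```

Feeding it `mlx5_0 InfiniBand` and `mlx5_1 Ethernet` yields one rename to `mlx5_ib0` and one to `mlx5_an0`, which matches the before/after listing in the PR description.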
Sequence diagram for Azure persistent RDMA naming monitor and remediation
sequenceDiagram
participant systemd
participant azure_persistent_rdma_naming_monitor_service
participant azure_persistent_rdma_naming_monitor_sh
participant ibdev2netdev
participant ibv_devinfo
participant azure_persistent_rdma_naming_service
systemd->>azure_persistent_rdma_naming_monitor_service: Start monitor (Restart=always)
azure_persistent_rdma_naming_monitor_service->>azure_persistent_rdma_naming_monitor_sh: ExecStart /usr/sbin/azure_persistent_rdma_naming_monitor.sh
loop Every 60 seconds
azure_persistent_rdma_naming_monitor_sh->>ibdev2netdev: ibdev2netdev -v
ibdev2netdev-->>azure_persistent_rdma_naming_monitor_sh: List RDMA devices
azure_persistent_rdma_naming_monitor_sh->>ibv_devinfo: ibv_devinfo checks (implicit)
alt Found device name without an or ib pattern
azure_persistent_rdma_naming_monitor_sh->>systemd: systemctl enable azure_persistent_rdma_naming.service
azure_persistent_rdma_naming_monitor_sh->>azure_persistent_rdma_naming_service: systemctl restart azure_persistent_rdma_naming.service
azure_persistent_rdma_naming_service->>azure_persistent_rdma_naming_service: Run naming script
azure_persistent_rdma_naming_service-->>azure_persistent_rdma_naming_monitor_sh: Service completes
azure_persistent_rdma_naming_monitor_sh->>azure_persistent_rdma_naming_monitor_sh: sleep 60 and break inner loop
else All device names match persistent scheme
azure_persistent_rdma_naming_monitor_sh->>azure_persistent_rdma_naming_monitor_sh: sleep 60
end
end
note over azure_persistent_rdma_naming_monitor_service,systemd: Restart=always ensures monitor keeps running
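One pass of the monitor loop above could look roughly like this. The helper name `first_unnamed_device` is hypothetical; it mirrors the substring check described in the diagram (a name containing neither `an` nor `ib` counts as not yet renamed). The loop itself is left commented out so the sketch has no side effects.

```shell
# Print the first device name on stdin that contains neither "an" nor "ib".
# Returns 0 if such a device was found, 1 if every name already conforms.
first_unnamed_device() {
    while read -r device; do
        [ -n "${device}" ] || continue      # ignore blank lines
        case "${device}" in
            *an*|*ib*) ;;                    # looks like a persistent name
            *) echo "${device}"; return 0 ;;
        esac
    done
    return 1
}

# Monitor loop (illustrative only; requires RDMA tools and systemd):
# while true; do
#     if ibdev2netdev -v | sort -n | cut -f2 -d' ' | first_unnamed_device >/dev/null; then
#         systemctl restart azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
#     fi
#     sleep 60
# done
```

Separating the check from the restart keeps the decision logic testable without touching systemd.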
Flow diagram for Ansible-based DOCA/OFED install and Azure persistent RDMA naming setup
flowchart LR
A_start([Start RDMA role tasks])
A_start --> B_check_doca{RHEL-like distro AND
ansible_distribution_major_version == 9 AND
ansible_architecture == x86_64 AND
not __hpc_server_is_ostree}
B_check_doca -- Yes --> C_set_doca_path[Set __hpc_doca_rpm_path to /tmp/basename of __hpc_doca_host_rpm_url]
C_set_doca_path --> D_download_doca[Download DOCA host RPM from __hpc_doca_host_rpm_url]
D_download_doca --> E_install_doca_rpm[Install DOCA host RPM via dnf]
E_install_doca_rpm --> F_dnf_clean[dnf clean all]
F_dnf_clean --> G_install_doca_ofed[Install doca-ofed via dnf with retries]
B_check_doca -- No --> H_skip_doca[Skip DOCA host RPM and doca-ofed install]
G_install_doca_ofed --> I_check_azure_naming{hpc_enable_azure_persistent_rdma_naming}
H_skip_doca --> I_check_azure_naming
I_check_azure_naming -- No --> Z_end([Continue with remaining RDMA role tasks])
I_check_azure_naming -- Yes --> J_check_vendor{ansible_system_vendor == Microsoft_Corporation}
J_check_vendor -- No --> K_debug_skip[Debug: Skipping Azure persistent RDMA naming on non-Azure system]
K_debug_skip --> Z_end
J_check_vendor -- Yes --> L_set_rdma_rename_path[Set __hpc_rdma_rename_path_effective from __hpc_rdma_rename_path]
L_set_rdma_rename_path --> M_install_naming_script[Template azure_persistent_rdma_naming.sh to /usr/sbin]
M_install_naming_script --> N_install_naming_service[Template azure_persistent_rdma_naming.service to /etc/systemd/system]
N_install_naming_service --> O_install_udev_rule[Template 99-azure-persistent-rdma-naming.rules to /etc/udev/rules.d]
O_install_udev_rule --> P_notify_handlers[Notify Reload_udev and Trigger_udev_for_infiniband]
P_notify_handlers --> Q_enable_start_naming[Enable and start azure_persistent_rdma_naming.service]
Q_enable_start_naming --> R_install_monitor_script[Template azure_persistent_rdma_naming_monitor.sh to /usr/sbin]
R_install_monitor_script --> S_install_monitor_service[Template azure_persistent_rdma_naming_monitor.service to /etc/systemd/system]
S_install_monitor_service --> T_enable_start_monitor[Enable and start azure_persistent_rdma_naming_monitor.service]
T_enable_start_monitor --> Z_end
subgraph Handlers
U_reload_udev[Reload udev: udevadm control --reload]
V_trigger_ib[Trigger udev infiniband: udevadm trigger --subsystem-match=infiniband]
end
P_notify_handlers -.-> U_reload_udev
P_notify_handlers -.-> V_trigger_ib
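The first two decisions in the flowchart can be restated as small shell helpers. This is only an illustration of the gating logic and path derivation; the role itself expresses them as Ansible conditions and variables. The function names, the yes/no flag arguments, and the example URL are all hypothetical.

```shell
# Gate for the DOCA install, following the flowchart's four conditions.
# $1: "yes" if the distro is RHEL-like   $2: distribution major version
# $3: architecture                       $4: "yes" if the host is ostree-based
should_install_doca() {
    [ "$1" = "yes" ] && [ "$2" = "9" ] && [ "$3" = "x86_64" ] && [ "$4" != "yes" ]
}

# Local download path: /tmp/ plus the basename of the RPM URL.
doca_rpm_path() {
    echo "/tmp/$(basename "$1")"
}

# Example with a placeholder URL (not a real repository location):
# doca_rpm_path "https://example.com/repo/doca-host.rpm"
```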
800c574 to
e91834d
Hey - I've found 2 issues, and left some high level feedback:
- In `azure_persistent_rdma_naming.sh`, the command-availability checks exit with different codes (`ibdev2netdev` exits with default/non-zero while `ibv_devinfo` exits 0); consider making both exits explicit and consistent so callers can reliably interpret failure vs. intentional skip.
- The device-name checks in `azure_persistent_rdma_naming_monitor.sh` use substring matches (`*"an"*` / `*"ib"*`), which can produce false positives; tightening this to explicit prefixes like `mlx5_an*` / `mlx5_ib*` (or a clearly defined pattern) will reduce accidental matches and unnecessary service restarts.
- For installing `doca-ofed`, using the `ansible.builtin.dnf` module instead of the raw `command: dnf ...` would make the task more idempotent and Ansible-native (e.g., leverage `state: present` and built-in retry/error handling instead of parsing stdout for "Nothing to do").
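To make the second point concrete, a stricter name check might look like the sketch below. The function name `is_persistent_name` and the exact pattern (the `mlx5_ib` or `mlx5_an` prefix followed by at least one digit) are assumptions for illustration; the PR may settle on a different pattern.

```shell
# Return 0 only for names that start with mlx5_ib or mlx5_an followed by a digit.
is_persistent_name() {
    case "$1" in
        mlx5_ib[0-9]*|mlx5_an[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the substring check, a name such as `ibp1s0` would be treated as already renamed because it contains `ib`; the prefix check rejects it.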
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `azure_persistent_rdma_naming.sh`, the command-availability checks exit with different codes (`ibdev2netdev` exits with default/non-zero while `ibv_devinfo` exits 0); consider making both exits explicit and consistent so callers can reliably interpret failure vs. intentional skip.
- The device-name checks in `azure_persistent_rdma_naming_monitor.sh` use substring matches (`*"an"*` / `*"ib"*`), which can produce false positives; tightening this to explicit prefixes like `mlx5_an*` / `mlx5_ib*` (or a clearly defined pattern) will reduce accidental matches and unnecessary service restarts.
- For installing `doca-ofed`, using the `ansible.builtin.dnf` module instead of the raw `command: dnf ...` would make the task more idempotent and Ansible-native (e.g., leverage `state: present` and built-in retry/error handling instead of parsing stdout for "Nothing to do").
## Individual Comments
### Comment 1
<location> `templates/rdma/azure_persistent_rdma_naming.sh.j2:15-17` </location>
<code_context>
+an_index=0
+ib_index=0
+
+if ! command -v ibdev2netdev >/dev/null 2>&1; then
+ echo "ibdev2netdev not found; ensure RDMA tools are installed."
+ exit
+fi
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Use an explicit success exit code when ibdev2netdev is missing to avoid failing the unit unnecessarily.
This bare `exit` will return the non-zero status from `command -v`, causing the systemd unit to be marked failed when `ibdev2netdev` is not installed. Since this is a "feature not present" case (like the `ibv_devinfo` path below), please exit with `0` here to avoid spurious failures on systems without RDMA tooling.
</issue_to_address>
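One way to make the exit codes explicit and consistent, sketched under the assumption that a missing tool should count as "feature not present" for both commands. The helper name `have_tool` is hypothetical. A bare `exit` reuses the status of whichever command ran last, so the result depends on the preceding line; spelling out the code removes that ambiguity.

```shell
# Return 0 if the named tool is on PATH; otherwise log a message and return 1.
have_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found; ensure RDMA tools are installed."
        return 1
    fi
}

# In the naming script, both checks would then exit with the same explicit code:
# have_tool ibdev2netdev || exit 0
# have_tool ibv_devinfo  || exit 0
```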
### Comment 2
<location> `templates/rdma/azure_persistent_rdma_naming_monitor.sh.j2:25-27` </location>
<code_context>
+
+while true; do
+ for device in $(ibdev2netdev -v | sort -n | cut -f2 -d' '); do
+ if [[ "${device}" != *"an"* && "${device}" != *"ib"* ]]; then
+ systemctl enable azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
+ systemctl restart azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
+ sleep 60
+ break
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid repeatedly calling `systemctl enable` in the hot path of the monitor loop.
Within the loop, every non-conforming device triggers both `systemctl enable` and `systemctl restart` on `azure_persistent_rdma_naming.service`. The repeated `enable` is redundant and adds avoidable overhead and potential systemd contention. Consider relying on Ansible to enable the unit ahead of time, or enabling it once outside the `while true` loop, and only restarting inside the loop.
Suggested implementation:
```
if ! command -v ibv_devinfo >/dev/null 2>&1; then
echo "ibv_devinfo not found; skipping RDMA naming monitor."
exit 0
fi
# Enable once up front; avoid repeating this in the hot path of the monitor loop.
systemctl enable azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
while true; do
```
```
for device in $(ibdev2netdev -v | sort -n | cut -f2 -d' '); do
if [[ "${device}" != *"an"* && "${device}" != *"ib"* ]]; then
systemctl restart azure_persistent_rdma_naming.service >/dev/null 2>&1 || true
sleep 60
break
fi
done
```
</issue_to_address>
By the way, if your code was written with an AI assistant, that should be mentioned in the commit message.
060b07e to
5ad72c9
tasks/main.yml (outdated)
    - Trigger udev for infiniband

- name: Apply notified handlers (reload systemd/udev if needed) before starting services
  meta: flush_handlers
NOTE that this will cause all handlers notified so far to run at this point, including handlers notified earlier in this role and possibly by other tasks in the playbook that calls this role, which may cause unintended behavior. I don't know if it will be a problem, but it will be hard to test all of the real-world combinations.
Handlers are typically used in the case where "I need this task to run at some point after all of the other tasks in this role are complete, and I don't care when". This doesn't seem like one of those cases.
a7d009a to
ff14ebc
spetrosi left a comment
Check ansible-lint errors please
ff14ebc to
cf08d1c
Enhancement:
Extended RDMA setup to install DOCA OFED on RHEL-like 9 x86_64 and to configure Azure persistent RDMA device naming (script + systemd service + udev-triggered activation).
Reason:
DOCA/OFED is required for Mellanox/NVIDIA networking stack support on RHEL 9.
Persistent RDMA naming prevents RDMA device name drift across reboots/hardware events on Azure, improving stability for HPC workloads.
Result:
When hpc_install_rdma: true:
On RHEL-like 9 x86_64 (non-ostree): installs DOCA host RPM (from hpc_doca_host_rpm_url) and then installs doca-ofed.
On Azure (system_vendor == "Microsoft Corporation"): deploys /usr/sbin/azure_persistent_rdma_naming.sh, azure_persistent_rdma_naming.service, and a udev rule to trigger naming on InfiniBand add/change events.
Before:
ibv_devices
device node GUID
------ ----------------
mlx5_0 00155dfffe340078
mlx5_1 000d3afffe7d1412
After:
ibv_devices
device node GUID
------ ----------------
mlx5_ib0 00155dfffe340078
mlx5_an0 000d3afffe7d1412
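A quick way to confirm the result shown above is to pull the device names out of the `ibv_devices` output and compare them against the expected scheme. The helper below is a hypothetical sketch; it assumes the two header lines shown in the listing and reads the command output from stdin.

```shell
# Extract device names from `ibv_devices` output on stdin.
# Skips the two header lines ("device node GUID" and the dashed rule).
rdma_device_names() {
    awk 'NR > 2 && NF >= 2 { print $1 }'
}

# Typical use on a host with RDMA tools installed:
# ibv_devices | rdma_device_names
```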
sudo systemctl status azure_persistent_rdma_naming_monitor.service
● azure_persistent_rdma_naming_monitor.service - Azure persistent RDMA naming Monitor
Loaded: loaded (/etc/systemd/system/azure_persistent_rdma_naming_monitor.service; enabled; preset: disabled)
Active: active (running) since Fri 2026-02-06 08:10:47 UTC; 3min 39s ago
Main PID: 1806 (bash)
Tasks: 2 (limit: 2266275)
Memory: 700.0K
CPU: 252ms
CGroup: /system.slice/azure_persistent_rdma_naming_monitor.service
├─1806 bash /usr/sbin/azure_persistent_rdma_naming_monitor.sh
└─3079 sleep 60
Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: Started Azure persistent RDMA naming Monitor.
[azureuser@gaurav-hpc-rdma-002 ~]$ sudo systemctl status azure_persistent_rdma_naming.service
○ azure_persistent_rdma_naming.service - Azure persistent RDMA naming
Loaded: loaded (/etc/systemd/system/azure_persistent_rdma_naming.service; enabled; preset: disabled)
Active: inactive (dead) since Fri 2026-02-06 08:10:47 UTC; 3min 54s ago
Process: 2133 ExecStart=/usr/sbin/azure_persistent_rdma_naming.sh (code=exited, status=0/SUCCESS)
Main PID: 2133 (code=exited, status=0/SUCCESS)
CPU: 83ms
Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: Starting Azure persistent RDMA naming...
Feb 06 08:10:47 gaurav-hpc-rdma-002 azure_persistent_rdma_naming.sh[2260]: mlx5_an0
Feb 06 08:10:47 gaurav-hpc-rdma-002 azure_persistent_rdma_naming.sh[2268]: mlx5_ib0
Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: azure_persistent_rdma_naming.service: Deactivated successfully.
Feb 06 08:10:47 gaurav-hpc-rdma-002 systemd[1]: Finished Azure persistent RDMA naming.
Issue Tracker Tickets (Jira or BZ if any):
Summary by Sourcery
Add DOCA RDMA setup for RHEL 9 x86_64 and introduce Azure-specific persistent RDMA device naming infrastructure.