OCPBUGS-86571: templates: disable IPv4 DAD to fix nodeip-configuration race #6098
mkowalski wants to merge 1 commit into
RHEL 10 enables IPv4 Duplicate Address Detection (DAD / ACD) by default
in NetworkManager. The ACD probing takes ~3 seconds, during which the
IPv4 address remains in tentative state and is not visible to
applications querying interface addresses.
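A minimal Python sketch of what "not visible to applications" means for a consumer that picks node IPs. It assumes input shaped like the addr_info entries from iproute2's JSON output (ip -j addr show <iface>), where a tentative address carries a "tentative": true flag. usable_families is a hypothetical helper, not code from nodeip-configuration. This sketch does not settle whether an IPv4 address under ACD probing appears flagged as tentative or is simply missing from the list, so it treats both cases as unusable.

```python
"""Sketch: which address families are usable on an interface right now.

Assumes input shaped like the "addr_info" list from `ip -j addr show <iface>`.
`usable_families` is a hypothetical helper, not part of any OpenShift component.
"""


def usable_families(addr_info: list[dict]) -> set[str]:
    """Return the families ("inet"/"inet6") with at least one usable global address."""
    families = set()
    for addr in addr_info:
        # Skip link-local and host-scoped addresses; node IP selection wants global ones.
        if addr.get("scope") != "global":
            continue
        # An address still running DAD/ACD is not usable yet.
        if addr.get("tentative", False):
            continue
        families.add(addr["family"])
    return families


# Snapshot while IPv6 has finished DAD and IPv4 is still probing. IPv4 is shown
# as absent here; a tentative-flagged IPv4 entry would be skipped the same way.
snapshot = [
    {"family": "inet6", "local": "fd00::10", "prefixlen": 64, "scope": "global"},
    {"family": "inet6", "local": "fe80::1", "prefixlen": 64, "scope": "link"},
]
print(sorted(usable_families(snapshot)))  # ['inet6']
```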
This introduces a race condition in dual-stack baremetal clusters where
nodeip-configuration.service starts before the IPv4 address is assigned.
The service only sees the IPv6 address (which completes DAD faster) and
configures kubelet with IPv6-only, despite the interface eventually
getting both addresses.
Timeline observed on affected nodes:
  T+0s  Interface up, IPv6 tentative, IPv4 ACD probing starts
  T+2s  IPv6 DAD complete, IPv4 still probing;
        nodeip-configuration.service runs → sees only IPv6 → writes
        IPv6-only config
  T+4s  IPv4 ACD complete (too late)
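The race can be replayed as a toy model. This Python sketch assumes nodeip-configuration.service takes a single snapshot of usable addresses when it starts and never re-checks. The ready times come from the timeline above; families_seen, node_ip_mode and the second scenario (IPv4 ready at T+0s once ACD is disabled) are illustrative assumptions, not the service's real logic.

```python
"""Sketch: replaying the observed timeline to show why kubelet ends up IPv6-only.

Assumptions (hypothetical, for illustration only):
- ready_times maps each family to the second at which its address becomes usable.
- The service snapshots addresses once, at service_start, and never re-checks.
"""


def families_seen(ready_times: dict[str, float], service_start: float) -> set[str]:
    """Families whose address is usable at the moment the service runs."""
    return {fam for fam, t in ready_times.items() if t <= service_start}


def node_ip_mode(seen: set[str]) -> str:
    """Classify the configuration a one-shot snapshot would produce."""
    if seen == {"ipv4", "ipv6"}:
        return "dual-stack"
    if seen == {"ipv6"}:
        return "IPv6-only"
    if seen == {"ipv4"}:
        return "IPv4-only"
    return "no address"


SERVICE_START = 2.0  # the service runs around T+2s in the observed timeline

# RHEL 10 default: IPv4 waits for ACD, IPv6 finishes DAD first.
with_acd = {"ipv6": 2.0, "ipv4": 4.0}
# With ipv4.dad-timeout=0: IPv4 is assigned immediately.
without_acd = {"ipv6": 2.0, "ipv4": 0.0}

print(node_ip_mode(families_seen(with_acd, SERVICE_START)))     # IPv6-only
print(node_ip_mode(families_seen(without_acd, SERVICE_START)))  # dual-stack
```

The model only captures the ordering problem; it says nothing about how the real service enumerates addresses.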
Fix this by adding a global NetworkManager drop-in that sets
ipv4.dad-timeout=0, restoring the RHEL 9 behavior where IPv4 addresses
are assigned immediately without ACD probing.
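The fix relies on how NetworkManager resolves ipv4.dad-timeout: 0 disables ACD, a positive value is a probe timeout in milliseconds, and -1 defers to a default that a [connection] section in NetworkManager.conf or a conf.d drop-in can supply. The Python sketch below models that precedence under those assumptions. The function names and the ASSUMED_BUILTIN_MS placeholder are hypothetical; this PR only states that the RHEL 10 default is non-zero.

```python
"""Sketch of how an ipv4.dad-timeout value is resolved for a connection.

Modeled on NetworkManager's documented semantics: 0 disables DAD/ACD, a positive
value is a timeout in milliseconds, and -1 defers to a default. The inputs
global_default_ms (a [connection] override from a drop-in) and
builtin_default_ms (the compiled-in default) are hypothetical stand-ins.
"""
from typing import Optional

ASSUMED_BUILTIN_MS = 3000  # placeholder; the real RHEL 10 value is not stated here


def effective_dad_timeout_ms(
    profile_value: int,
    global_default_ms: Optional[int],
    builtin_default_ms: int,
) -> int:
    """Resolve the ACD timeout that would apply to one connection profile."""
    if profile_value < -1:
        raise ValueError("ipv4.dad-timeout must be -1, 0, or a positive number")
    if profile_value != -1:
        return profile_value  # an explicit per-profile value wins
    if global_default_ms is not None:
        return global_default_ms  # e.g. supplied by a conf.d drop-in
    return builtin_default_ms


def acd_enabled(timeout_ms: int) -> bool:
    """ACD probing runs only for a positive timeout."""
    return timeout_ms > 0


# Profile left at -1, no drop-in: falls through to the built-in default.
print(acd_enabled(effective_dad_timeout_ms(-1, None, ASSUMED_BUILTIN_MS)))  # True
# Same profile with a drop-in setting ipv4.dad-timeout=0: ACD disabled.
print(acd_enabled(effective_dad_timeout_ms(-1, 0, ASSUMED_BUILTIN_MS)))  # False
# A profile with its own explicit value is not affected by the drop-in.
print(effective_dad_timeout_ms(500, 0, ASSUMED_BUILTIN_MS))  # 500
```

One consequence of this precedence: a global drop-in only changes profiles that leave ipv4.dad-timeout at -1, while a profile with an explicit value keeps it.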
Fixes: https://issues.redhat.com/browse/OCPBUGS-86571
Generated-by: OpenClaw OpenClaw 2026.5.12 (f066dd2)
AI-model: claude-opus-4.6
Signed-off-by: Mateusz Kowalski <mko@redhat.com>
Pipeline controller notification: This repository is configured in: LGTM mode.
Walkthrough: A NetworkManager configuration template file is populated to deploy
Changes: NetworkManager IPv4 DAD Configuration
Estimated code review effort: 1 (Trivial), ~3 minutes
Pre-merge checks: 15 passed
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: mkowalski
Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
@mkowalski: This pull request references Jira Issue OCPBUGS-86571, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@mkowalski: This pull request references Jira Issue OCPBUGS-86571, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
/jira backport release-4.22
@mkowalski: The following backport issues have been created:
Queuing cherrypicks to the requested branches to be created after this PR merges:
@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@mkowalski: all tests passed!
/hold
It's not necessarily what I want. Ref.: https://redhat-internal.slack.com/archives/C04M1SH1VNZ/p1779891220519739
@mkowalski: Closed this PR.
@mkowalski: This pull request references Jira Issue OCPBUGS-86571. The bug has been updated to no longer refer to the pull request using the external bug tracker.
Summary
RHEL 10 enables IPv4 DAD (Duplicate Address Detection / ACD) by default in NetworkManager. The ACD probing takes ~3 seconds, during which the IPv4 address is in tentative state and invisible to applications.
This causes a race condition in dual-stack baremetal clusters where nodeip-configuration.service starts before IPv4 is assigned, sees only IPv6, and configures kubelet with IPv6-only, despite the interface eventually getting both addresses.
Root Cause
RHEL 10 changed the default ipv4.dad-timeout from 0 (disabled) to a non-zero value, enabling ACD probing for all IPv4 addresses. IPv6 DAD completes faster (~2s), so nodeip-configuration.service runs in the window where IPv6 is ready but IPv4 is still probing.
Fix
Add a global NetworkManager drop-in (/etc/NetworkManager/conf.d/01-no-dad.conf) that sets ipv4.dad-timeout=0, restoring the RHEL 9 behavior. This follows the same pattern as the existing 01-ipv6.conf drop-in.
Fixes: https://issues.redhat.com/browse/OCPBUGS-86571
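A sketch of the drop-in itself, rendered and parsed back with Python's configparser as a sanity check. It assumes the file uses NetworkManager.conf's [connection] section, which holds global defaults for connection properties; the actual template contents from this PR are not reproduced here, and render_no_dad_dropin is a hypothetical helper.

```python
"""Sketch: rendering and sanity-checking the global no-DAD drop-in.

Assumption: the drop-in sets a global connection default through the
[connection] section of NetworkManager.conf. render_no_dad_dropin is a
hypothetical helper, not the template mechanism used by this repository.
"""
import configparser

DROPIN_PATH = "/etc/NetworkManager/conf.d/01-no-dad.conf"


def render_no_dad_dropin() -> str:
    """Return drop-in text that disables IPv4 ACD for profiles left at the default."""
    return "[connection]\nipv4.dad-timeout=0\n"


def dad_timeout_from_dropin(text: str) -> int:
    """Parse the drop-in as INI and return the ipv4.dad-timeout value."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("connection", "ipv4.dad-timeout")


if __name__ == "__main__":
    content = render_no_dad_dropin()
    print(f"# {DROPIN_PATH}")
    print(content, end="")
    print(dad_timeout_from_dropin(content))  # 0
```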
🤖 This PR was created by OpenClaw on behalf of @mkowalski.