Add symlink-based LUN device path resolver for Azure NVMe support #402
Open
s4heid wants to merge 1 commit into cloudfoundry:main from
Problem
Azure v6+ VM sizes (Dv6, Dasv6, Dadsv6, Ev6, etc.) use NVMe disk controllers instead of SCSI. The existing `scsiLunDevicePathResolver` discovers data disks by scanning `/sys/bus/vmbus/devices/` and `/sys/class/scsi_host/`, paths that do not exist on NVMe VMs. This means the bosh-agent cannot resolve ephemeral or persistent disks on NVMe hardware, blocking adoption of v6+ VM sizes.

Solution
Introduce a configurable, infrastructure-agnostic symlink-based resolver that works for both SCSI and NVMe by leveraging udev-managed symlinks (provided by `azure-vm-utils`).

Two new resolvers:
- `SymlinkLunDevicePathResolver`: resolves disks via `<basePath>/<LUN>` symlinks (e.g. `/dev/disk/azure/data/by-lun/1` → `/dev/nvme0n3`). Polls at a 100 ms interval until the symlink and its target exist, or times out.
- `FallbackDevicePathResolver`: generic compositor. Tries a primary resolver first; on failure, delegates to a secondary resolver.

When `LunDeviceSymlinkPath` is set in `LinuxOptions`, the symlink resolver wraps the existing `DevicePathResolutionType`-selected resolver as a fallback, but tries the symlink path first regardless of whether the type is `"scsi"` or anything else.

Additionally, fixes an NVMe-related regex in `findRootDevicePathAndNumber()` where the NVMe pattern didn't handle multi-digit numbers (e.g. `/dev/nvme0n12p2`).

Backward Compatibility
Note
This change is not expected to cause any breaking changes and can be merged without requiring additional modifications.
However, the symlink resolver requires certain dependencies to be in place in order to function properly.
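To make the symlink-first, fallback-second behavior from the Solution section concrete, here is a minimal Go sketch. It is not the code from this PR: the `DevicePathResolver` interface, the struct names, the `resolverFunc` adapter, and `demoTimeout` are all hypothetical and simplified for illustration. Only the 100 ms poll interval and the `<basePath>/<LUN>` layout come from the description.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// DevicePathResolver is a hypothetical, simplified interface for this sketch;
// the real bosh-agent interface has a different signature.
type DevicePathResolver interface {
	Resolve(lun string) (string, error)
}

const pollInterval = 100 * time.Millisecond // interval stated in the PR description
const demoTimeout = 300 * time.Millisecond  // assumption: short timeout for the demo

// symlinkResolver polls <basePath>/<lun> until the symlink and its target exist.
type symlinkResolver struct {
	basePath string
	timeout  time.Duration
}

func (r symlinkResolver) Resolve(lun string) (string, error) {
	link := filepath.Join(r.basePath, lun)
	deadline := time.Now().Add(r.timeout)
	for {
		// EvalSymlinks fails when the link or its target is missing.
		if target, err := filepath.EvalSymlinks(link); err == nil {
			return target, nil
		}
		if time.Now().After(deadline) {
			return "", fmt.Errorf("timed out resolving %s", link)
		}
		time.Sleep(pollInterval)
	}
}

// fallbackResolver tries primary first and delegates to secondary on failure.
type fallbackResolver struct {
	primary, secondary DevicePathResolver
}

func (f fallbackResolver) Resolve(lun string) (string, error) {
	path, primaryErr := f.primary.Resolve(lun)
	if primaryErr == nil {
		return path, nil
	}
	path, secondaryErr := f.secondary.Resolve(lun)
	if secondaryErr != nil {
		return "", fmt.Errorf("primary: %v; secondary: %w", primaryErr, secondaryErr)
	}
	return path, nil
}

// resolverFunc adapts a plain function; it stands in for the pre-existing resolver.
type resolverFunc func(lun string) (string, error)

func (fn resolverFunc) Resolve(lun string) (string, error) { return fn(lun) }

func main() {
	// Simulate a udev-managed by-lun directory inside a temp dir.
	dir, err := os.MkdirTemp("", "by-lun")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "fake-nvme0n3")
	if err := os.WriteFile(target, nil, 0o600); err != nil {
		panic(err)
	}
	if err := os.Symlink(target, filepath.Join(dir, "1")); err != nil {
		panic(err)
	}

	legacy := resolverFunc(func(lun string) (string, error) {
		return "/dev/sdc", nil // pretend result of the pre-existing resolver
	})
	r := fallbackResolver{
		primary:   symlinkResolver{basePath: dir, timeout: demoTimeout},
		secondary: legacy,
	}

	viaSymlink, err := r.Resolve("1") // symlink exists: resolved by the primary
	fmt.Println(viaSymlink, err)
	viaFallback, err := r.Resolve("7") // no symlink: falls back after the timeout
	fmt.Println(viaFallback, err)
}
```

In this sketch the fallback path pays the full symlink timeout before the secondary resolver runs, which is the trade-off of trying the symlink first.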
- When `LunDeviceSymlinkPath` is empty (default), behavior is identical to before.
- When `LunDeviceSymlinkPath` is set, the existing resolver (e.g. SCSI sysfs scanner) becomes the fallback. If the symlink path doesn't exist or times out, the original resolver is tried.
- [...] `lun` + `host_device_id`; the symlink resolver only uses `lun`.
- [...] `LunDeviceSymlinkPath`.

Dependencies
- Requires `azure-vm-utils` to be installed in the stemcell, which provides the udev rules that create `/dev/disk/azure/data/by-lun/<LUN>` symlinks for both SCSI and NVMe VMs. See: Add Azure Gen2 and NVMe support for stemcells (bosh-linux-stemcell-builder#469)
- [...] `LunDeviceSymlinkPath` in the `agent.json`. TBD
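One way to check that the udev dependency is in place on a VM is to list the by-lun directory and resolve each entry. The Go sketch below does that; the helper name `listLunSymlinks` is hypothetical, and the assumption that every entry under the directory is named after a LUN follows from the path layout described above rather than from the `azure-vm-utils` documentation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// listLunSymlinks maps each entry name under basePath (assumed to be a LUN
// number) to the device path its symlink resolves to.
func listLunSymlinks(basePath string) (map[string]string, error) {
	entries, err := os.ReadDir(basePath)
	if err != nil {
		return nil, fmt.Errorf("reading %s: %w", basePath, err)
	}
	result := make(map[string]string, len(entries))
	for _, e := range entries {
		target, err := filepath.EvalSymlinks(filepath.Join(basePath, e.Name()))
		if err != nil {
			return nil, fmt.Errorf("resolving LUN %s: %w", e.Name(), err)
		}
		result[e.Name()] = target
	}
	return result, nil
}

func main() {
	// Path taken from the PR description; absent on machines without the udev rules.
	const basePath = "/dev/disk/azure/data/by-lun"

	luns, err := listLunSymlinks(basePath)
	if err != nil {
		fmt.Println("symlinks not available:", err)
		return
	}

	// Sort names so the output order is deterministic.
	names := make([]string, 0, len(luns))
	for name := range luns {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("LUN %s -> %s\n", name, luns[name])
	}
}
```

If the directory is missing, the symlink resolver would time out on every lookup and the agent would rely on the fallback resolver alone.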
Self-Validation
Tested on Azure VMs with 2 Azure data disks (ephemeral LUN 0 + persistent LUN 1), deployed via BOSH with a zookeeper release:
Standard_D4ads_v6 (NVMe with temp disk)
NVMe-only VM. All data disks on `MSFT NVMe Accelerator v1.0` controller. Local temp disk on separate `Microsoft NVMe Direct Disk v2` controller.

Devices: `/dev/nvme0n1`, `/dev/nvme0n2`, `/dev/nvme0n3`, `/dev/nvme1n1`

Agent log confirms symlink resolver handles all disk operations:
Standard_D4as_v6 (NVMe)
NVMe-only VM without local temp disk. Same controller layout as D4ads_v6 minus the NVMe Direct Disk.
Devices: `/dev/nvme0n1`, `/dev/nvme0n2`, `/dev/nvme0n3`

Standard_D4as_v5 (SCSI)
SCSI VM. Data disks on VMBus SCSI controller (`{f8b3781b-1e82-4818-a1c3-63d806ec15bb}`).

Devices: `/dev/sda`, `/dev/sdb`, `/dev/sdc`

Agent log confirms symlink resolver works as fast path on SCSI too. No SCSI sysfs fallback needed:
Standard_D4as_v4 (SCSI)
SCSI VM with local temp disk. Note the non-sequential device letter assignment typical of v4 VMs. The OS disk is `/dev/sdb`, temp disk is `/dev/sdc`, and data disks are `/dev/sda` and `/dev/sdd`.

Devices: `/dev/sdb`, `/dev/sdc`, `/dev/sda`, `/dev/sdd`

Agent log confirms symlink resolver works as fast path. No SCSI sysfs fallback needed:
This demonstrates that the symlink resolver handles the notoriously unstable SCSI device letter ordering on older Azure VMs, where the OS disk isn't `/dev/sda` and data disk LUN 0 isn't `/dev/sdb`. The symlink path `/dev/disk/azure/data/by-lun/0` → `/dev/sda` is stable regardless of enumeration order.

Standard_E4as_v4 (SCSI)
Memory-optimized E-series SCSI VM with local temp disk. Exhibits the same non-sequential device letter assignment as D4as_v4: OS disk is `/dev/sdb`, temp disk is `/dev/sdc`, data disks are `/dev/sda` and `/dev/sdd`.

Devices: `/dev/sdb`, `/dev/sdc`, `/dev/sda`, `/dev/sdd`

Agent log confirms symlink resolver works as fast path. No SCSI sysfs fallback needed:
Standard_E4as_v6 (NVMe)
Memory-optimized E-series NVMe VM without local temp disk. All disks exposed via `MSFT NVMe Accelerator v1.0` controller. NVMe namespace IDs map directly: OS = nsid 1, data LUN N = nsid N+2.

Devices: `/dev/nvme0n1`, `/dev/nvme0n2`, `/dev/nvme0n3`

Agent log confirms symlink resolver resolves NVMe devices directly; no SCSI sysfs paths involved:
Standard_F4as_v6 (NVMe)
Compute-optimized F-series NVMe VM without local temp disk. Same NVMe controller and namespace mapping as E4as_v6, confirming behavior is consistent across VM families.
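The namespace mapping referenced here (stated for E4as_v6 as OS = nsid 1, data LUN N = nsid N+2) can be written as a one-line helper. This is only a sketch of what was observed on the tested VMs: the function name `nvmeDeviceForLUN` is hypothetical, the `nvme0` controller index is taken from the device names listed in this description, and the mapping is not a documented Azure guarantee, which is why the resolver relies on symlinks instead.

```go
package main

import "fmt"

// nvmeDeviceForLUN applies the mapping observed in these tests:
// OS disk = nsid 1, data LUN N = nsid N+2, all on controller nvme0.
// Assumption: this holds only for the VM sizes tested above.
func nvmeDeviceForLUN(lun int) string {
	return fmt.Sprintf("/dev/nvme0n%d", lun+2)
}

func main() {
	// The tests used LUN 0 (ephemeral) and LUN 1 (persistent).
	for lun := 0; lun <= 1; lun++ {
		fmt.Printf("LUN %d -> %s\n", lun, nvmeDeviceForLUN(lun))
	}
	// Prints:
	// LUN 0 -> /dev/nvme0n2
	// LUN 1 -> /dev/nvme0n3
}
```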
Devices: `/dev/nvme0n1`, `/dev/nvme0n2`, `/dev/nvme0n3`

Agent log confirms symlink resolver works identically on F-series: