Skip to content

Generate CDI specification including additional GIDs#630

Merged
elezar merged 5 commits intoNVIDIA:mainfrom
elezar:CNT-4739/additional-gids
Feb 19, 2026
Merged

Generate CDI specification including additional GIDs#630
elezar merged 5 commits intoNVIDIA:mainfrom
elezar:CNT-4739/additional-gids

Conversation

@elezar
Copy link
Member

@elezar elezar commented Aug 5, 2024

Includes the change from #1655 (which could be reviewed separately).

This change adds logic to include the AdditionalGIDs required for a device node in a generated CDI specification.

When a device node requires specific group membership for access, a non-root user running in a container (e.g. using the -u Docker command line argument) may not have access to the device.

In the v0.7.0 CDI specification, support was added for specifying additional GIDs in the CDI specification and these changes extract the required information from the device nodes on the host if available and update the (non-root) additional GIDs in the container.

Since this MAY affect the MINIMUM required CDI spec version, a feature flag is provided to allow users to opt-out of this. In: /etc/nvidia-container-runtime/config.toml

[features]
   no-additional-gids-for-device-nodes = true

or using the nvidia-ctk config CLI command:

sudo nvidia-ctk config --in-place --set features.no-additional-gids-for-device-nodes

It is also exposed as an nvcdi feature flag (no-additional-gids-for-device-nodes) which can be set at construction of the CDI spec generator:

nvcdi.WithFeatureFlag(nvcdi.FeatureNoAdditionalGIDsForDeviceNodes)

When using nvidia-ctk cdi generate to generate CDI specs:

nvidia-ctk cdi generate --feature-flag=no-additional-gids-for-device-nodes

In the case of the nvidia-cdi-refresh.service updating /etc/nvidia-container-toolkit/nvidia-cdi-refresh.env to include:

NVIDIA_CTK_CDI_GENERATE_FEATURE_FLAGS=no-additional-gids-for-device-nodes

will also ensure that a spec without the additional fields is generated.

The runtime support for the v0.7.0 spec is as follows:

@elezar
Copy link
Member Author

elezar commented Aug 5, 2024

Note: This updates the default spec version to v0.7.0 since the DRM devices generally require additional GIDs. As such, this should probably be an opt-in (or at least an opt-out) feature.

@elezar elezar requested review from cdesiniotis and klueska August 5, 2024 12:07
@elezar elezar self-assigned this Aug 12, 2024
@elezar elezar force-pushed the CNT-4739/additional-gids branch from 74441f4 to 285ea44 Compare August 21, 2024 09:30
@elezar elezar requested a review from tariq1890 August 21, 2024 09:36
@elezar elezar marked this pull request as ready for review August 21, 2024 09:36
@elezar elezar force-pushed the CNT-4739/additional-gids branch from 285ea44 to 141772c Compare August 22, 2024 12:21
@elezar elezar requested review from klueska and tariq1890 August 22, 2024 12:22
@elezar elezar force-pushed the CNT-4739/additional-gids branch 3 times, most recently from fab0c6d to 5d55813 Compare August 22, 2024 16:17
@elezar elezar added this to the v1.18.0 milestone Jun 27, 2025

This comment was marked as outdated.

@ArangoGutierrez ArangoGutierrez added the needs-rebase Pull request has merge conflicts label Jul 8, 2025
@elezar elezar modified the milestones: v1.18.0, next-minor Aug 25, 2025
@elezar elezar removed the needs-rebase Pull request has merge conflicts label Sep 16, 2025
@elezar elezar marked this pull request as draft September 16, 2025 12:42
@elezar elezar modified the milestones: next-minor, v1.19.0 Nov 6, 2025
@elezar elezar modified the milestones: v1.19.0, next-minor Jan 20, 2026
@elezar elezar force-pushed the CNT-4739/additional-gids branch from 5d55813 to baa308b Compare February 5, 2026 20:33
@elezar elezar force-pushed the CNT-4739/additional-gids branch from baa308b to ab620ac Compare February 6, 2026 12:50
@elezar elezar modified the milestones: next-minor, v1.19.0 Feb 6, 2026
@elezar elezar force-pushed the CNT-4739/additional-gids branch from ab620ac to 1198939 Compare February 6, 2026 14:48
@elezar elezar marked this pull request as ready for review February 6, 2026 14:48
@elezar elezar marked this pull request as draft February 6, 2026 14:49
@coveralls
Copy link

coveralls commented Feb 6, 2026

Pull Request Test Coverage Report for Build 22136305391

Details

  • 73 of 119 (61.34%) changed or added relevant lines in 16 files are covered.
  • 20 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.08%) to 39.245%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/nvcdi/lib-nvml.go 0 1 0.0%
pkg/nvcdi/mig-device-nvml.go 0 1 0.0%
internal/edits/device.go 23 25 92.0%
pkg/nvcdi/lib-wsl.go 0 2 0.0%
pkg/nvcdi/management.go 0 2 0.0%
internal/modifier/discover.go 14 18 77.78%
internal/modifier/cdi.go 0 6 0.0%
pkg/nvcdi/transform/deduplicate.go 11 17 64.71%
internal/edits/edits.go 6 28 21.43%
Files with Coverage Reduction New Missed Lines %
internal/edits/edits.go 20 8.82%
Totals Coverage Status
Change from base Build 22134051897: 0.08%
Covered Lines: 5769
Relevant Lines: 14700

💛 - Coveralls

@elezar elezar force-pushed the CNT-4739/additional-gids branch from 1198939 to 4382fee Compare February 6, 2026 15:13
@elezar elezar force-pushed the CNT-4739/additional-gids branch 2 times, most recently from a2251f9 to b3662cf Compare February 12, 2026 13:23
@elezar elezar marked this pull request as ready for review February 12, 2026 13:25
Comment on lines 126 to 128
if permissionsForOther := dn.FileMode.Perm(); permissionsForOther&06 != 0 {
return []uint32{*dn.GID}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow this. The device is world readable or writeable if (dn.FileMode.Perm() & 06) != 0. Don't we want to add the group id for the opposite condition when the device is NOT world readable AND world writeable?

In my mind, this should be re-written as:

if rwPermissionsForOther := (dn.FileMode.Perm() & 06); rwPermissionsForOther != 06 {
  return []uint32{*dn.GID}
}

but let me know if I am misunderstanding.

Copy link
Member Author

@elezar elezar Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. You're correct. The logic is not complete. Thanks for your suggestion. Let me also add some unit tests.

Update: After adding the unit test, I decided to also introduce two helpers (isWorldReadable, isWorldWritable` for readability).

pkg/nvcdi/api.go Outdated
FeatureDisableMultipleCSVDevices = FeatureFlag("disable-multiple-csv-devices")

// NoAdditionalGIDsForDeviceNodes disables the injection of additional GIDs
// for a device node when the node is not readable by the user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before. Should we mention both read and write permissions?
s/not readable by the user/not readable and writeable by the user/

@elezar elezar force-pushed the CNT-4739/additional-gids branch 5 times, most recently from a5067df to 11b4d88 Compare February 17, 2026 08:51
This change removes the creation of a ContainerEdits modifier
when constructing a modifier from a Discoverer. Instead, the
created ContainerEdits are applied directly to the incomming
OCI runtime spec.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Factory interface to the edits package. This can be
used to construct CDI container edits. Currently an EmptyFactory is the
only implementation and replaces direct calls to NewContainerEdits.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a FromDiscoverer method to the edits.Factory
interface. This allows for a configurable way to generate CDI
container edits from a discoverer.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for injecting additional GIDs using the internal
CDI representations. (Applicable for Tegra-based systems and /dev/dri devices)
This is disabled by default, but can be opted in to by setting the
features.allow-additional-gids feature flag.

This can also be done by running

sudo nvidia-ctk config --in-place --set features.allow-additional-gids

Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a no-additional-gids-for-device-nodes feature flag to the
nvidia-container-runtime feature flags as well as the nvcdi feature flags.

When enabled, the check to add additional GIDs for devices that are not
world-read/writeable is skipped.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the CNT-4739/additional-gids branch from 11b4d88 to e559d52 Compare February 18, 2026 10:37
@elezar elezar merged commit 7bdb4e5 into NVIDIA:main Feb 19, 2026
16 checks passed
@elezar elezar deleted the CNT-4739/additional-gids branch February 19, 2026 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

Comments