Cgroup adopt#2
Open
reboss wants to merge 16 commits into
Implements middleware to extract peer credentials (PID, UID, GID) from Unix socket connections via the SO_PEERCRED socket option. This provides the foundation for per-user enforcement features like cgroup adoption.

- Linux implementation uses SO_PEERCRED to extract credentials from the connection's fd
- Non-Linux platforms get a no-op implementation
- Credentials are stored in request context for downstream handlers
- Gracefully handles non-Unix socket connections (TCP, etc.)

Signed-off-by: John Robbins <John.Robbins@amd.com>
Tests verify:
- Context value storage and retrieval of peer credentials
- Middleware structure and handler wrapping
- Graceful handling of missing credentials

Signed-off-by: John Robbins <John.Robbins@amd.com>
Implements DeriveParentFromPid() to read /proc/<pid>/cgroup and extract the appropriate cgroup parent slice for container placement.

- Supports both cgroup v1 and v2 formats
- Prioritizes systemd slices over other controllers
- Extracts deepest .slice component (e.g., user-1000.slice)
- Falls back gracefully for non-systemd setups

Signed-off-by: John Robbins <John.Robbins@amd.com>
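As a rough illustration of the parsing this commit describes, the sketch below picks a hierarchy path from `/proc/<pid>/cgroup` content and then extracts the deepest `.slice` component. The function names and the exact priority order (cgroup v2 entry, then `name=systemd`, then the first v1 entry) are assumptions made for the example, not the PR's actual DeriveParentFromPid() logic.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// cgroupPath picks one hierarchy path from the text of /proc/<pid>/cgroup.
// Each line has the form "hierarchy-ID:controller-list:path".
// Assumed priority: the cgroup v2 entry ("0::<path>"), then the v1
// "name=systemd" entry, then the first other v1 entry.
func cgroupPath(content string) (string, error) {
	var systemd, first string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		parts := strings.SplitN(strings.TrimSpace(sc.Text()), ":", 3)
		if len(parts) != 3 {
			continue // skip blank or malformed lines
		}
		id, controllers, path := parts[0], parts[1], parts[2]
		switch {
		case id == "0" && controllers == "":
			return path, nil
		case controllers == "name=systemd" && systemd == "":
			systemd = path
		case first == "":
			first = path
		}
	}
	if systemd != "" {
		return systemd, nil
	}
	if first != "" {
		return first, nil
	}
	return "", errors.New("no cgroup entry found")
}

// deepestSlice returns the last path component ending in ".slice",
// or "" when the path contains none (for example the root cgroup).
func deepestSlice(path string) string {
	comps := strings.Split(path, "/")
	for i := len(comps) - 1; i >= 0; i-- {
		if strings.HasSuffix(comps[i], ".slice") {
			return comps[i]
		}
	}
	return ""
}

func main() {
	path, err := cgroupPath("0::/user.slice/user-1000.slice/session-3.scope\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(path, deepestSlice(path))
}
```

Parsing from a string rather than opening the file directly keeps the logic testable without a live /proc.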
Tests verify:
- Cgroup v2 format parsing (unified hierarchy)
- Cgroup v1 format parsing (systemd and cpu controllers)
- Extraction of deepest .slice component
- Error handling for invalid/empty files
- Edge cases (root cgroup, no slice components)

Signed-off-by: John Robbins <John.Robbins@amd.com>
Adds new boolean config field to enable user cgroup adoption. When enabled, containers will automatically inherit their creator's cgroup parent based on the API client's process ID. This is the configuration knob for the cgroup adoption enforcement feature, allowing administrators to enable/disable it via daemon.json or CLI flags. Signed-off-by: John Robbins <John.Robbins@amd.com>
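To make the configuration knob concrete, here is a small sketch of how such a boolean might be decoded from a daemon.json-style document. The struct name and the JSON key `adopt-user-cgroups` are assumptions inferred from the CLI flag name; the PR text does not state the key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// daemonConfig is a hypothetical stand-in for the daemon config struct.
// The JSON key below is assumed to mirror the --adopt-user-cgroups flag.
type daemonConfig struct {
	AdoptUserCgroups bool `json:"adopt-user-cgroups,omitempty"`
}

// parseConfig decodes a daemon.json-style document. A missing key leaves
// the field at its zero value, so the feature stays off by default.
func parseConfig(data []byte) (daemonConfig, error) {
	var cfg daemonConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]byte(`{"adopt-user-cgroups": true}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("adoption enabled:", cfg.AdoptUserCgroups)
}
```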
Adds CLI flag registration for the cgroup adoption feature. Administrators can enable it via:

dockerd --adopt-user-cgroups

Placed near other security and cgroup-related flags for logical grouping.

Signed-off-by: John Robbins <John.Robbins@amd.com>
Adds peer credential middleware to the middleware chain, positioned after version middleware and before authorization middleware. The middleware extracts peer credentials from Unix socket connections and makes them available in request context for downstream handlers and enforcement logic. Signed-off-by: John Robbins <John.Robbins@amd.com>
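The ordering matters because any later middleware that wants the credentials only sees them if the peer credential middleware has already run. The sketch below illustrates this with plain `net/http` handlers; the `middleware` type, `chain`, `withPeerCred`, and the rejecting `authz` stage are all hypothetical and deliberately simpler than the daemon's real middleware interface.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// peerCredKey is a hypothetical context key for the stored credential.
type peerCredKey struct{}

type middleware func(http.Handler) http.Handler

// chain wraps h so that the first middleware listed runs first.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// withPeerCred stores a fixed UID in the request context. A real
// implementation would read it from the connection instead.
func withPeerCred(uid uint32) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			ctx := context.WithValue(r.Context(), peerCredKey{}, uid)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

// authz stands in for a later stage that needs the credential and
// rejects the request when it is absent.
func authz(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, ok := r.Context().Value(peerCredKey{}).(uint32); !ok {
			http.Error(w, "no peer credentials", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one request through the given chain and returns the status.
func statusFor(mws ...middleware) int {
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	chain(final, mws...).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println("peercred before authz:", statusFor(withPeerCred(1000), authz))
	fmt.Println("authz before peercred:", statusFor(authz, withPeerCred(1000)))
}
```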
Add placeholder tests for daemon-level cgroup adoption enforcement. Tests are marked as Skip until the applyCgroupAdoption method is implemented in the next commit.

Tests will verify:
- Adoption works when enabled with valid peer credentials
- Error when peer credentials missing
- Rejection of user-specified cgroup parent
- Acceptance of matching cgroup parent values

Signed-off-by: John Robbins <John.Robbins@amd.com>
Adds applyCgroupAdoption() method that:
- Extracts peer credentials from request context
- Derives cgroup parent from the client's PID
- Enforces that containers cannot override the cgroup parent
- Rejects requests with error if user tries to set different parent

Integrated into adaptContainerSettings() which is called during container creation. Only enforces when AdoptUserCgroups config is enabled.

The enforcement is strict: containers MUST run under their creator's cgroup when adoption is enabled, with no exceptions.

Signed-off-by: John Robbins <John.Robbins@amd.com>
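The decision rule described in this commit can be summarized in a few lines. The sketch below is an assumption-laden illustration: `resolveCgroupParent` is a hypothetical helper, it returns a plain error rather than the daemon's InvalidParameter error type, and the error text is borrowed from the verification steps in the PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// errParentOverride reuses the message quoted in the PR's verification steps.
var errParentOverride = errors.New("cannot set cgroup parent when --adopt-user-cgroups is enabled")

// resolveCgroupParent decides which cgroup parent a new container gets.
//   - enabled:   whether the adoption feature is switched on
//   - requested: the cgroup parent from the create request ("" if unset)
//   - derived:   the cgroup parent derived from the client's PID
func resolveCgroupParent(enabled bool, requested, derived string) (string, error) {
	if !enabled {
		return requested, nil // feature off: leave the request untouched
	}
	if derived == "" {
		return "", errors.New("no cgroup parent could be derived for the client")
	}
	if requested != "" && requested != derived {
		return "", errParentOverride // strict: a differing parent is rejected
	}
	return derived, nil
}

func main() {
	parent, err := resolveCgroupParent(true, "", "/user.slice/user-1000.slice")
	fmt.Println(parent, err)
	_, err = resolveCgroupParent(true, "/custom/parent", "/user.slice/user-1000.slice")
	fmt.Println(err)
}
```

Keeping the rule in a pure function like this makes the four cases from the placeholder tests (enabled, disabled, override rejected, matching parent accepted) straightforward to cover.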
- TestCgroupAdoptionEnabled: Verifies containers inherit creator's cgroup
- TestCgroupAdoptionDisabled: Verifies feature is off by default
- TestCgroupAdoptionUserOverrideRejected: Verifies enforcement (rejects non-matching parent)
- TestCgroupAdoptionMatchingParentAccepted: Allows matching parent override
- TestCgroupAdoptionNoPeerCredentials: Handles missing peer credentials gracefully

Tests require root and use daemon.New() test harness.

Signed-off-by: John Robbins <John.Robbins@amd.com>
Add ConnContext to http.Server to store the net.Conn in request context. This is required for the peer credential middleware to access the underlying Unix socket file descriptor and read the SO_PEERCRED socket option. Without this, r.Context().Value(http.LocalAddrContextKey) returns nil and peer credentials cannot be extracted. Signed-off-by: John Robbins <John.Robbins@amd.com>
Change cgroup adoption to use the entire cgroup hierarchy path instead of extracting only the deepest .slice component. This ensures containers properly inherit resource limits from SLURM jobs, systemd scopes, and other complex cgroup hierarchies. For example, a SLURM job's cgroup: /system.slice/slurmstepd.scope/job_123/step_0/user/task_0 will now be adopted in full, rather than just "system.slice".
Add middleware to extract UID/GID/PID from Unix socket connections using SO_PEERCRED. This allows API handlers to identify the client process for security and resource management features. The middleware stores credentials in the request context using PeerCredKey, and uses a custom PeerConnKey to avoid conflicts with http.LocalAddrContextKey which gets overwritten by the HTTP stack.
Register the peer credential middleware in the API server chain and configure ConnContext to store connections in the request context. Uses middleware.PeerConnKey to avoid conflicts with the standard http.LocalAddrContextKey which the HTTP stack overwrites with the local address value.
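A minimal sketch of the ConnContext wiring described here, using `httptest` so it runs without a real daemon. `connKey` and `peerConnKey` are hypothetical stand-ins for the PR's middleware.PeerConnKey; the test server listens on loopback TCP rather than a Unix socket, so the handler only demonstrates that the connection is reachable from the request context.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
)

// connKey is an unexported type, so this key cannot collide with
// http.LocalAddrContextKey or with keys from other packages.
type connKey struct{}

// peerConnKey is a hypothetical stand-in for middleware.PeerConnKey.
var peerConnKey = connKey{}

// saveConn has the signature http.Server.ConnContext expects and stores
// the accepted connection in the context every request on it will see.
func saveConn(ctx context.Context, c net.Conn) context.Context {
	return context.WithValue(ctx, peerConnKey, c)
}

// connSeenByHandler starts a test server with ConnContext set and reports
// the network type of the connection the handler finds in its context.
func connSeenByHandler() (string, error) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if c, ok := r.Context().Value(peerConnKey).(net.Conn); ok {
			fmt.Fprint(w, c.LocalAddr().Network())
			return
		}
		fmt.Fprint(w, "missing")
	})
	srv := httptest.NewUnstartedServer(handler)
	srv.Config.ConnContext = saveConn
	srv.Start()
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	network, err := connSeenByHandler()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("handler saw connection of type:", network)
}
```

With a Unix socket listener, the same handler could type-assert the stored value to *net.UnixConn and pass it to the SO_PEERCRED lookup.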
Update cgroup adoption tests to expect full path instead of deepest .slice component. Add test for SLURM cgroup hierarchy.

Add tests for peer credential middleware to verify:
- Middleware handles missing connections gracefully
- PeerConnKey is distinct from http.LocalAddrContextKey
- Credentials are properly stored in context
- What I did
Added support for automatic cgroup adoption via a new daemon configuration option, --adopt-user-cgroups. When enabled, containers automatically inherit their creator's cgroup parent instead of running under the default Docker cgroup. This enables better resource isolation and accounting in multi-user environments where users should not be able to escape their systemd resource constraints.

This feature is particularly useful in:
- How I did it
The implementation follows a clean architectural pattern:
1. Peer credential extraction - Created middleware (daemon/server/middleware/peercred_linux.go) that extracts UID/GID/PID from Unix socket connections using the SO_PEERCRED socket option and stores them in the request context.
2. Cgroup derivation - Added utility (pkg/cgroups/adoption_linux.go) that reads /proc/<pid>/cgroup and parses both cgroup v1 and v2 formats to derive the user's cgroup parent. The initial commits extract the deepest .slice component; a later commit in this PR switches to adopting the full cgroup hierarchy path.
3. Daemon configuration - Added AdoptUserCgroups bool field to daemon config with corresponding --adopt-user-cgroups CLI flag.
4. Enforcement at daemon layer - Modified adaptContainerSettings() in daemon/daemon_unix.go to call applyCgroupAdoption() when the feature is enabled. The enforcement uses a strict model: if a user tries to specify a different cgroup parent than the adopted one, the request is rejected with an InvalidParameter error.
5. Platform support - Linux-specific implementations with no-op stubs for other platforms.
All code follows test-driven development with comprehensive unit and integration tests.
- How to verify it
Start dockerd with the feature enabled:
Check your current cgroup:
Create a container:
docker run -d --name test nginx
Verify the container inherited your cgroup parent:
Verify enforcement - attempting to override should fail:
docker run --cgroup-parent /custom/parent nginx # Should error: "cannot set cgroup parent when --adopt-user-cgroups is enabled"
Run the test suite:
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)