feat(platform): cgroup-aware CPU/memory detection in detect_system_linux #365
Open
yangsec888 wants to merge 1 commit into
Detect the effective CPU quota and memory limit from cgroup v2 or v1
files rather than always reporting host totals. Inside a container,
`sysconf(_SC_NPROCESSORS_ONLN)` and `sysinfo()` return the host's
numbers — which makes downstream consumers (e.g. cbm_default_worker_count)
over-provision workers, exhaust the cgroup's memory cap, and trigger
OOMKills.
The new Linux path:
1. Reads `<root>/cpu.max` (v2) or `<root>/cpu/cpu.cfs_{quota,period}_us`
(v1) and computes effective CPUs as ceil(quota/period).
2. Reads `<root>/memory.max` (v2) or
`<root>/memory/memory.limit_in_bytes` (v1) and treats "max" /
near-ULLONG_MAX as "unlimited".
3. Takes min(cgroup, host) for both, so a mis-mounted cgroup that
reports something larger than the host can't push us above true
hardware. Falls back cleanly to host when no cgroup files exist.
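The CPU half of the steps above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names `sketch_cpus_from_cpu_max` and `sketch_effective_cpus` are hypothetical, and the sketch only covers the v2 `cpu.max` text format ("<quota> <period>" or "max <period>").

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the PR's actual code): parse the contents of a
 * cgroup v2 cpu.max file, e.g. "200000 100000" or "max 100000".
 * Returns ceil(quota / period), or -1 when the quota is unlimited or the
 * text cannot be parsed. */
static int sketch_cpus_from_cpu_max(const char *text) {
    long long quota = 0;
    long long period = 0;

    if (text == NULL || strncmp(text, "max", 3) == 0) {
        return -1; /* "max" means no CPU quota is set */
    }
    if (sscanf(text, "%lld %lld", &quota, &period) != 2) {
        return -1; /* malformed content */
    }
    if (quota <= 0 || period <= 0) {
        return -1; /* treat nonsensical values as "no limit" */
    }
    /* Integer ceiling division: 150000/100000 rounds up to 2. */
    return (int)((quota + period - 1) / period);
}

/* Hypothetical clamp: use the cgroup value only when it is a real limit,
 * and never report more CPUs than the host has. */
static int sketch_effective_cpus(int cgroup_cpus, int host_cpus) {
    if (cgroup_cpus <= 0) {
        return host_cpus; /* no cgroup limit: host fallback */
    }
    return cgroup_cpus < host_cpus ? cgroup_cpus : host_cpus;
}
```

With a host count of 32, a quota of "200000 100000" yields 2 effective CPUs, while "max 100000" falls back to 32.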
Helpers are exposed via `src/foundation/system_info_internal.h` (an
internal-only header, alongside the existing pipeline_internal.h
precedent) so tests can drive them against a fake `/sys/fs/cgroup`
tree without depending on the runtime environment.
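To illustrate why a root-path parameter makes these helpers testable, here is a rough sketch of a memory-limit reader plus the fixture helpers a test might use. All `sketch_*` names are hypothetical, and the 2^62 "unlimited" threshold is an assumption made for this example; the PR's real helpers and threshold may differ.

```c
#define _XOPEN_SOURCE 700 /* for mkdtemp */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumption for this sketch: any value at or above 2^62 bytes is a
 * "no limit" sentinel. */
#define SKETCH_UNLIMITED_THRESHOLD (1ULL << 62)

/* Read the first line of <root>/<rel> into buf. Returns 0 on success. */
static int sketch_read_line(const char *root, const char *rel, char *buf, size_t cap) {
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", root, rel);
    if (n < 0 || (size_t)n >= sizeof(path)) return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char *ok = fgets(buf, (int)cap, f);
    fclose(f);
    if (ok == NULL) return -1;
    buf[strcspn(buf, "\r\n")] = '\0'; /* strip the trailing newline */
    return 0;
}

/* Memory limit in bytes under `root` (v2 file first, then v1).
 * Returns 0 when the limit is unlimited, missing, or unparsable. */
static unsigned long long sketch_cgroup_mem_limit(const char *root) {
    char line[64];
    if (sketch_read_line(root, "memory.max", line, sizeof(line)) != 0 &&
        sketch_read_line(root, "memory/memory.limit_in_bytes", line, sizeof(line)) != 0) {
        return 0; /* no cgroup files: caller falls back to host RAM */
    }
    if (strcmp(line, "max") == 0) return 0;
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(line, &end, 10);
    if (errno != 0 || end == line || *end != '\0') return 0;
    return v >= SKETCH_UNLIMITED_THRESHOLD ? 0 : v;
}

/* Fixture: create a fresh fake cgroup root from a template such as
 * "/tmp/fake_cgroup_XXXXXX" (modified in place). Returns 0 on success. */
static int sketch_make_fake_root(char *tmpl) {
    return mkdtemp(tmpl) != NULL ? 0 : -1;
}

/* Fixture: write `text` to <root>/<rel>, creating one parent directory
 * level (e.g. "memory/") when `rel` contains a slash. */
static int sketch_write_fixture(const char *root, const char *rel, const char *text) {
    char path[4096];
    const char *slash = strchr(rel, '/');
    if (slash != NULL) {
        int m = snprintf(path, sizeof(path), "%s/%.*s", root, (int)(slash - rel), rel);
        if (m < 0 || (size_t)m >= sizeof(path)) return -1;
        if (mkdir(path, 0700) != 0 && errno != EEXIST) return -1;
    }
    int n = snprintf(path, sizeof(path), "%s/%s", root, rel);
    if (n < 0 || (size_t)n >= sizeof(path)) return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    int rc = (fputs(text, f) == EOF) ? -1 : 0;
    if (fclose(f) != 0) rc = -1;
    return rc;
}
```

A test case would call `sketch_make_fake_root` on a writable template, write `memory.max` or `memory/memory.limit_in_bytes` with `sketch_write_fixture`, and then check the value returned by `sketch_cgroup_mem_limit` for that root, all without touching the real `/sys/fs/cgroup`.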
Adds 11 Linux-only tests covering:
- v2 cpu.max integer quota + ceil rounding + "max" unlimited
- v1 cfs_quota_us/cfs_period_us + -1 unlimited
- v2 memory.max integer + "max"
- v1 memory.limit_in_bytes + near-ULLONG_MAX unlimited sentinel
- Missing cgroup files (host-fallback path)
macOS and BSD detection are unchanged. Windows is unaffected.
Refs DeusData#363
Summary
Closes #363 in conjunction with #364: this PR is the auto-detect half of cgroup-awareness; #364 is the `CBM_WORKERS` env-override escape hatch. They're independent and can land in either order.

Today `detect_system_linux()` reports host CPU count via `sysconf(_SC_NPROCESSORS_ONLN)` and host RAM via `sysinfo()`. Inside a container, neither reflects the cgroup's effective quota, so `cbm_default_worker_count` over-provisions workers and the SQLite mmap budget can exceed the cgroup memory cap (see #363 for the OOMKill story).

This PR makes the Linux detection path cgroup-aware:

- Read `<root>/cpu.max` (v2) or `<root>/cpu/cpu.cfs_quota_us` + `cpu.cfs_period_us` (v1), compute `ceil(quota / period)`. `"max"` and `-1` quotas mean "no limit, fall back to sysconf".
- Read `<root>/memory.max` (v2) or `<root>/memory/memory.limit_in_bytes` (v1). `"max"` is unlimited; cgroup v1's near-`ULLONG_MAX` sentinel (`PAGE_COUNTER_MAX`) is also treated as unlimited.
- Take `min(cgroup, host)` for both, so a mis-mounted cgroup that reports something bigger than the host can't push us above true hardware.

The helpers live in a new internal header `src/foundation/system_info_internal.h` (same pattern as the existing `pipeline_internal.h`) so tests can drive them against a fake `/sys/fs/cgroup` tree without depending on the runtime environment.

Why this matters
Symptoms in the downstream consumer (sast-ai-app) before the workaround landed:

- `limits.memory: 2Gi` on a 32-core node spawned 32 indexing workers.
- `OOMKilled` mid-index, watcher restart loop, 20Gi PVC grew anyway.

Operationally this was patched downstream by quadrupling pod memory and pre-creating PVCs at 4× the size CBM "thought" it needed. With this PR, the cgroup limits flow through `cbm_default_worker_count` naturally and the over-provisioning disappears.

Test plan
11 new Linux-only tests in `tests/test_platform.c`, each creating a fresh `mkdtemp` tree and exercising one detection path:

- `cgroup_v2_cpu_quota`: `cpu.max = "200000 100000"` → 2
- `cgroup_v2_cpu_quota_rounds_up`: `cpu.max = "150000 100000"` → ceil(1.5) = 2
- `cgroup_v2_cpu_unlimited`: `cpu.max = "max 100000"` → -1
- `cgroup_v1_cpu_quota`: cfs_quota_us/cfs_period_us = 200000/100000 → 2
- `cgroup_v1_cpu_unlimited`: cfs_quota_us = -1 → -1
- `cgroup_no_cpu_files`: empty tmp dir → -1
- `cgroup_v2_mem`: `memory.max = "2147483648"` → 2 GiB
- `cgroup_v2_mem_unlimited`: `memory.max = "max"` → 0
- `cgroup_v1_mem`: `memory.limit_in_bytes = "1073741824"` → 1 GiB
- `cgroup_v1_mem_unlimited_sentinel`: `memory.limit_in_bytes = "9223372036854775807"` → 0
- `cgroup_no_mem_files`: empty tmp dir → 0

Local verification
Verified on macOS (Apple Silicon, clang 17). The 11 Linux tests are `#ifdef __linux__`-guarded and skipped on macOS, so end-to-end Linux validation lives in upstream CI.

- `scripts/build.sh` (`-Wall -Wextra -Werror`): clean, 47s.
- `scripts/test.sh`: 3553 passed, 1 failed. The single failure is `search_code_multi_word` in `tests/test_mcp.c:694`, which already fails on plain `upstream/main` at the same SHA, is unrelated to this PR, and also shows up in recent nightly soak failures.
- `clang-format` + `cppcheck`: clean on all three touched files. (The remaining `clang-format` diff at `system_info.c:97/99` is in BSD code I didn't touch; it reflects an Apple clang-format 17 vs. CI clang-format disagreement that already exists on `upstream/main`.)

Files
- `src/foundation/system_info.c`: new cgroup helpers; `detect_system_linux` rewritten with the safety clamps. macOS/BSD/Windows paths untouched. (+117/-6)
- `src/foundation/system_info_internal.h`: new internal header declaring the cgroup helpers for tests. (+44)
- `tests/test_platform.c`: 11 new Linux-only tests + small tmp-dir/fixture helpers. (+179)

Relationship to #364
These two are deliberately independent:

- #364 (`CBM_WORKERS` env override) lets operators force a specific value when they want to leave additional headroom below the cgroup cap (or above it, on bare metal).

If both land, the precedence is: `CBM_WORKERS` env > cgroup auto-detect > host fallback, which matches the precedence shape we use for other `CBM_*` knobs.
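The precedence chain could look roughly like this. It is a sketch under assumptions: `sketch_resolve_workers` is a hypothetical name, the parsing rule for `CBM_WORKERS` (a positive decimal integer, anything else ignored) is a guess at what #364 does, and the sketch equates worker count with CPU count, which may not match how `cbm_default_worker_count` derives its value.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical resolver illustrating: env override > cgroup > host.
 *   env_value   raw CBM_WORKERS string, or NULL when unset
 *   cgroup_cpus effective CPUs from the cgroup, or -1 when unlimited
 *   host_cpus   CPU count reported by the host */
static int sketch_resolve_workers(const char *env_value, int cgroup_cpus, int host_cpus) {
    /* 1. Explicit override wins when it parses as a positive integer
     *    (assumed parsing rule, not confirmed by #364). */
    if (env_value != NULL && *env_value != '\0') {
        errno = 0;
        char *end = NULL;
        long v = strtol(env_value, &end, 10);
        if (errno == 0 && end != env_value && *end == '\0' && v > 0 && v <= INT_MAX) {
            return (int)v;
        }
    }
    /* 2. cgroup auto-detect, clamped so it never exceeds the host. */
    if (cgroup_cpus > 0) {
        return cgroup_cpus < host_cpus ? cgroup_cpus : host_cpus;
    }
    /* 3. Host fallback, with a floor of one worker. */
    return host_cpus > 0 ? host_cpus : 1;
}
```

A caller would pass `getenv("CBM_WORKERS")` as the first argument. Note that the override is deliberately not clamped here, which mirrors the "or above it, on bare metal" case described above.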