feat(platform): cgroup-aware CPU/memory detection in detect_system_linux #365

Open
yangsec888 wants to merge 1 commit into DeusData:main from yangsec888:feat/cgroup-aware-detection


Summary

Closes #363 in conjunction with #364: this PR is the auto-detect half of cgroup-awareness; #364 is the CBM_WORKERS env-override escape hatch. They're independent and can land in either order.

Today detect_system_linux() reports host CPU count via sysconf(_SC_NPROCESSORS_ONLN) and host RAM via sysinfo(). Inside a container, neither reflects the cgroup's effective quota, so cbm_default_worker_count over-provisions workers and the SQLite mmap budget can exceed the cgroup memory cap — see #363 for the OOMKill story.

This PR makes the Linux detection path cgroup-aware:

  1. CPUs: read <root>/cpu.max (v2) or <root>/cpu/cpu.cfs_quota_us + cpu.cfs_period_us (v1), compute ceil(quota / period). "max" and -1 quotas mean "no limit, fall back to sysconf".
  2. Memory: read <root>/memory.max (v2) or <root>/memory/memory.limit_in_bytes (v1). "max" is unlimited; cgroup v1's near-LLONG_MAX sentinel (derived from PAGE_COUNTER_MAX) is also treated as unlimited.
  3. Safety clamp: both axes take min(cgroup, host), so a mis-mounted cgroup that reports something bigger than the host can't push us above true hardware.
  4. Fallback: missing cgroup files → host values, exactly as before. Bare-metal hosts and non-Linux platforms are unchanged.
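
The four steps above can be sketched roughly as follows. This is an illustration only, not the code in this PR: every `sketch_*` name is hypothetical, the helpers take already-read file contents rather than paths, and the 2^62 "unlimited" threshold is an assumption chosen to cover the v1 sentinel on any page size.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: effective CPUs from a quota/period pair.
 * Returns ceil(quota / period), or -1 for "no limit" (quota <= 0). */
static int sketch_cpus_from_quota(long long quota, long long period) {
    if (quota <= 0 || period <= 0) {
        return -1; /* v1 uses quota == -1 for unlimited */
    }
    long long cpus = quota / period + (quota % period != 0); /* integer ceil */
    return cpus > INT_MAX ? INT_MAX : (int)cpus;
}

/* Hypothetical: parse v2 cpu.max contents ("<quota> <period>" or "max <period>"). */
static int sketch_parse_cpu_max(const char *text) {
    long long quota = 0, period = 0;
    assert(text != NULL);
    if (strncmp(text, "max", 3) == 0) {
        return -1; /* unlimited: caller falls back to sysconf */
    }
    if (sscanf(text, "%lld %lld", &quota, &period) != 2) {
        return -1; /* malformed: treat like a missing file */
    }
    return sketch_cpus_from_quota(quota, period);
}

/* Hypothetical: parse memory.max (v2) or memory.limit_in_bytes (v1).
 * Returns 0 for "unlimited", otherwise the limit in bytes. */
static unsigned long long sketch_parse_mem_limit(const char *text) {
    char *end = NULL;
    unsigned long long v;
    assert(text != NULL);
    if (strncmp(text, "max", 3) == 0) {
        return 0;
    }
    errno = 0;
    v = strtoull(text, &end, 10);
    if (end == text || errno != 0) {
        return 0;
    }
    /* Assumed threshold: anything at or above 2^62 is the v1 sentinel. */
    return v >= (1ULL << 62) ? 0 : v;
}

/* Safety clamp: min(cgroup, host), with host as the fallback. */
static int sketch_clamp_cpus(int cgroup_cpus, int host_cpus) {
    if (cgroup_cpus <= 0) return host_cpus;
    return cgroup_cpus < host_cpus ? cgroup_cpus : host_cpus;
}

static unsigned long long sketch_clamp_mem(unsigned long long cgroup_bytes,
                                           unsigned long long host_bytes) {
    if (cgroup_bytes == 0) return host_bytes;
    return cgroup_bytes < host_bytes ? cgroup_bytes : host_bytes;
}
```

For the pod described in #363, `sketch_clamp_cpus(sketch_parse_cpu_max("200000 100000"), 32)` would yield 2 instead of 32.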

The helpers live in a new internal header src/foundation/system_info_internal.h (same pattern as the existing pipeline_internal.h) so tests can drive them against a fake /sys/fs/cgroup tree without depending on the runtime environment.

Why this matters

Symptoms in the downstream consumer (sast-ai-app) before the workaround landed:

  • Pod with limits.memory: 2Gi on a 32-core node spawned 32 indexing workers.
  • Each worker's mmap window scaled with host RAM rather than the cgroup cap.
  • Result: OOMKilled mid-index, watcher restart loop, 20Gi PVC grew anyway.

Operationally this was patched downstream by quadrupling pod memory and pre-creating PVCs at 4× the size CBM "thought" it needed. With this PR, the cgroup limits flow through cbm_default_worker_count naturally and the over-provisioning disappears.

Test plan

11 new Linux-only tests in tests/test_platform.c, each creating a fresh mkdtemp tree and exercising one detection path:

  • cgroup_v2_cpu_quota: cpu.max = "200000 100000" → 2
  • cgroup_v2_cpu_quota_rounds_up: cpu.max = "150000 100000" → ceil(1.5) = 2
  • cgroup_v2_cpu_unlimited: cpu.max = "max 100000" → -1
  • cgroup_v1_cpu_quota: cfs_quota_us/cfs_period_us = 200000/100000 → 2
  • cgroup_v1_cpu_unlimited: cfs_quota_us = -1 → -1
  • cgroup_no_cpu_files: empty tmp dir → -1
  • cgroup_v2_mem: memory.max = "2147483648" → 2 GiB
  • cgroup_v2_mem_unlimited: memory.max = "max" → 0
  • cgroup_v1_mem: memory.limit_in_bytes = "1073741824" → 1 GiB
  • cgroup_v1_mem_unlimited_sentinel: memory.limit_in_bytes = "9223372036854775807" → 0
  • cgroup_no_mem_files: empty tmp dir → 0
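
A minimal sketch of the fixture pattern these tests rely on: build a fake cgroup root with mkdtemp, write one file, run detection against that root, then clean up. The names below (`sketch_write_file`, `sketch_cpu_from_root`, `sketch_run_case`) are hypothetical stand-ins and not the helpers declared in system_info_internal.h; only the v2 cpu.max path is shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixture helper: write `contents` to <dir>/<name>. 0 on success. */
static int sketch_write_file(const char *dir, const char *name, const char *contents) {
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    FILE *f;
    int rc;
    if (n < 0 || (size_t)n >= sizeof path) return -1;
    f = fopen(path, "w");
    if (!f) return -1;
    rc = (fputs(contents, f) == EOF) ? -1 : 0;
    if (fclose(f) != 0) rc = -1;
    return rc;
}

/* Hypothetical stand-in for the helper under test: read <root>/cpu.max and
 * return ceil(quota / period), or -1 if the file is missing or unlimited. */
static int sketch_cpu_from_root(const char *root) {
    char path[512], buf[64];
    long long quota = 0, period = 0;
    FILE *f;
    int n = snprintf(path, sizeof path, "%s/cpu.max", root);
    if (n < 0 || (size_t)n >= sizeof path) return -1;
    f = fopen(path, "r");
    if (!f) return -1; /* no cgroup file: caller uses host values */
    if (!fgets(buf, sizeof buf, f)) { fclose(f); return -1; }
    fclose(f);
    /* "max <period>" fails the first %lld conversion, so it maps to -1 too. */
    if (sscanf(buf, "%lld %lld", &quota, &period) != 2) return -1;
    if (quota <= 0 || period <= 0) return -1;
    return (int)(quota / period + (quota % period != 0));
}

/* One test case: fresh temp tree, optional cpu.max, detect, clean up. */
static int sketch_run_case(const char *cpu_max_contents) {
    char root[] = "/tmp/cgroup_sketch_XXXXXX";
    char path[512];
    int result;
    if (mkdtemp(root) == NULL) return -2; /* fixture failure, not a result */
    if (cpu_max_contents && sketch_write_file(root, "cpu.max", cpu_max_contents) != 0) {
        rmdir(root);
        return -2;
    }
    result = sketch_cpu_from_root(root);
    snprintf(path, sizeof path, "%s/cpu.max", root);
    unlink(path); /* harmless if the file was never created */
    rmdir(root);
    return result;
}
```

Passing `NULL` to `sketch_run_case` leaves the directory empty, which corresponds to the cgroup_no_cpu_files case.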

Local verification

Verified on macOS (Apple Silicon, clang 17). The 11 Linux tests are #ifdef __linux__-guarded and skipped on macOS, so end-to-end Linux validation lives in upstream CI.

  • Build (scripts/build.sh, -Wall -Wextra -Werror): clean, 47s.
  • Test (scripts/test.sh): 3553 passed, 1 failed. The single failure is search_code_multi_word in tests/test_mcp.c:694, which already fails on plain upstream/main at the same SHA, is unrelated to this PR, and also shows up in recent nightly soak failures.
  • Lint (clang-format + cppcheck): clean on all three touched files. (The remaining clang-format diff at system_info.c:97/99 is in BSD code I didn't touch; it reflects an Apple clang-format 17 vs. CI clang-format disagreement that exists on upstream/main already.)

Files

  • src/foundation/system_info.c — new cgroup helpers; detect_system_linux rewritten with the safety clamps. macOS/BSD/Windows paths untouched. (+117/-6)
  • src/foundation/system_info_internal.h — new internal header declaring the cgroup helpers for tests. (+44)
  • tests/test_platform.c — 11 new Linux-only tests + small tmp-dir/fixture helpers. (+179)

Relationship to #364

These two are deliberately independent: this PR is the auto-detect half, and #364 is the CBM_WORKERS env-override escape hatch.

If both land, the precedence is: CBM_WORKERS env > cgroup auto-detect > host fallback, which matches the precedence shape we use for other CBM_* knobs.
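
The precedence chain can be illustrated with a short sketch. It assumes the worker count is simply the resolved CPU count and that an invalid CBM_WORKERS value is ignored; neither is confirmed by this PR or #364, and `sketch_resolve_workers` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical resolver: env override > cgroup auto-detect > host fallback.
 * env_value:   the CBM_WORKERS string, or NULL when unset
 * cgroup_cpus: effective CPUs from the cgroup, or -1 when there is no quota
 * host_cpus:   CPU count reported by the host */
static int sketch_resolve_workers(const char *env_value, int cgroup_cpus, int host_cpus) {
    if (env_value != NULL && *env_value != '\0') {
        char *end = NULL;
        long v;
        errno = 0;
        v = strtol(env_value, &end, 10);
        if (errno == 0 && end != env_value && *end == '\0' && v > 0 && v <= INT_MAX) {
            return (int)v; /* explicit override wins */
        }
        /* Assumed behavior: an unparsable override falls through. */
    }
    if (cgroup_cpus > 0) {
        /* Same min(cgroup, host) clamp as the detection path. */
        return cgroup_cpus < host_cpus ? cgroup_cpus : host_cpus;
    }
    return host_cpus;
}
```

A caller would pass `getenv("CBM_WORKERS")` as the first argument.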

Detect the effective CPU quota and memory limit from cgroup v2 or v1
files rather than always reporting host totals. Inside a container,
`sysconf(_SC_NPROCESSORS_ONLN)` and `sysinfo()` return the host's
numbers — which makes downstream consumers (e.g. cbm_default_worker_count)
over-provision workers, exhaust the cgroup's memory cap, and trigger
OOMKills.

The new Linux path:

  1. Reads `<root>/cpu.max` (v2) or `<root>/cpu/cpu.cfs_{quota,period}_us`
     (v1) and computes effective CPUs as ceil(quota/period).
  2. Reads `<root>/memory.max` (v2) or
     `<root>/memory/memory.limit_in_bytes` (v1) and treats "max" /
     near-LLONG_MAX as "unlimited".
  3. Takes min(cgroup, host) for both, so a mis-mounted cgroup that
     reports something larger than the host can't push us above true
     hardware. Falls back cleanly to host when no cgroup files exist.

Helpers are exposed via `src/foundation/system_info_internal.h` (an
internal-only header, alongside the existing pipeline_internal.h
precedent) so tests can drive them against a fake `/sys/fs/cgroup`
tree without depending on the runtime environment.

Adds 11 Linux-only tests covering:
  - v2 cpu.max integer quota + ceil rounding + "max" unlimited
  - v1 cfs_quota_us/cfs_period_us + -1 unlimited
  - v2 memory.max integer + "max"
  - v1 memory.limit_in_bytes + near-LLONG_MAX unlimited sentinel
  - Missing cgroup files (host-fallback path)

macOS and BSD detection are unchanged. Windows is unaffected.

Refs DeusData#363

Linked issue (may be closed by merging this pull request):

Linux: cbm_system_info / cbm_default_worker_count don't respect cgroup CPU/memory limits
