Linux: cbm_system_info / cbm_default_worker_count don't respect cgroup CPU/memory limits #363

@yangsec888

Summary

When CBM runs inside a Linux container with cgroup CPU and/or memory
limits (the normal Kubernetes / Docker / Nomad / systemd-slice case),
cbm_system_info() and cbm_default_worker_count() report
node-level CPU count and RAM, not the cgroup's effective limits.
This produces three concrete operator-visible problems:

  1. cbm_default_worker_count(initial=true) returns
    sysconf(_SC_NPROCESSORS_ONLN), which on Linux is the number of
    online host CPUs. A 1-vCPU container scheduled onto a 16-core
    node spawns ~16 indexing workers, each with its own per-worker
    buffers (AST stacks, tree-sitter parsers, slab allocators).
    In a memory-constrained container this is the dominant OOMKill
    driver — far more so than g_budget (which is observability-only;
    cbm_mem_over_budget() and cbm_vmem_over_budget() are declared
    but never called).
  2. The mem.init budget_mb=… log line is computed from the host's
    sysinfo(2).totalram, so a 2 GiB-limited pod on a 62 GB node
    logs budget_mb=31000, which is alarming and confuses incident
    response. (Cosmetic, but compounds 1.)
  3. The over-budget worker count also magnifies the blast radius of
    #317 (dump-phase OOM crash on large TS monorepo) and #334 (silent
    index corruption after rapid kill/restart), both of which we have
    reproduced in the wild on Kubernetes pods.

Where in the source

Verified against main at HEAD 22153563cd1072b4f79e0e27113f6e0dea3abc1a:

Suggested fix shape

Two reads in detect_system_linux(), both with safe fallbacks to
the existing host-scoped values:

  1. CPU count — read cgroup v2 cpu.max (preferred) or v1
    cpu.cfs_quota_us / cpu.cfs_period_us. If the result is
    max / unlimited / parse error, fall back to
    sysconf(_SC_NPROCESSORS_ONLN).
  2. Memory — read cgroup v2 memory.max (preferred) or v1
    memory.limit_in_bytes. If the result is max / unlimited or
    exceeds host total, fall back to sysinfo(2).totalram * mem_unit.

Both files live at well-known paths (/sys/fs/cgroup/... for v2,
/sys/fs/cgroup/<controller>/... for v1) and are simple text
reads. No new dependencies.
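
As an illustration of how small the memory read is, here is a hedged sketch of the parsing step. The function name is hypothetical and not part of CBM; it takes the file contents as a string so the caller decides which path to read, and it assumes the host total has already been computed from sysinfo(2).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not CBM code). text is the content of cgroup v2
 * memory.max or cgroup v1 memory.limit_in_bytes; host_total is the host
 * RAM in bytes (sysinfo totalram * mem_unit). Returns the cgroup limit
 * when it is a real limit below the host total, otherwise host_total. */
uint64_t effective_mem_bytes(const char *text, uint64_t host_total) {
    assert(text != NULL);
    if (strncmp(text, "max", 3) == 0) return host_total; /* v2 unlimited */
    char *end = NULL;
    errno = 0;
    unsigned long long limit = strtoull(text, &end, 10);
    if (errno != 0 || end == text || limit == 0) return host_total;
    /* v1 reports "unlimited" as a very large number, so anything at or
     * above the host total is treated as no limit. */
    return limit < host_total ? (uint64_t)limit : host_total;
}
```

With a 2 GiB limit on a 62 GB host, effective_mem_bytes("2147483648\n", 62000000000ULL) yields 2147483648, which is the value the budget logger would then see instead of the host total.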

A reasonable cap for safety: min(cgroup_cpu, host_cpu) and
min(cgroup_mem, host_mem). This is robust against
mis-mounted-cgroups edge cases.

Compatible with existing env knobs

Once cgroup awareness lands, an explicit env-override remains useful
for ops who want to tune below cgroup limits (e.g. leave headroom
for sibling processes in the same container). A companion PR (filed
alongside this issue) adds CBM_WORKERS for that case. The two
together give: env > cgroup > host as the precedence chain,
matching the CBM_SQLITE_MMAP_SIZE precedent from commit 093707c.
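
The env > cgroup > host chain could look something like the sketch below. The function names are hypothetical, and the handling of CBM_WORKERS (a positive decimal integer, with anything else ignored) is an assumption made for illustration; the companion PR may define it differently.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not CBM code): returns a positive worker count
 * parsed from s, or 0 if s is NULL, empty, non-numeric, or out of range. */
int parse_worker_override(const char *s) {
    if (s == NULL || *s == '\0') return 0;
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX) return 0;
    return (int)v;
}

/* Precedence: explicit env value, then cgroup limit, then host count.
 * cgroup_cpus == 0 means "no cgroup limit detected". */
int resolve_worker_count(const char *env_value, int cgroup_cpus, int host_cpus) {
    assert(host_cpus > 0);
    int from_env = parse_worker_override(env_value);
    if (from_env > 0) return from_env;
    if (cgroup_cpus > 0 && cgroup_cpus < host_cpus) return cgroup_cpus;
    return host_cpus;
}
```

A caller would pass getenv("CBM_WORKERS") as env_value. Keeping the resolver free of direct getenv() calls is a design choice that makes the precedence easy to unit test.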

Out of scope

  • macOS, BSD, Windows containerization stories (none of them have
    the same sysconf/sysinfo problem in practice).
  • mimalloc tuning. mimalloc respects its own arena settings; the
    request here is just to compute the inputs CBM passes to its
    worker scheduler and budget logger correctly.
