Skip to content

feat(platform): CBM_WORKERS env override for cbm_default_worker_count#364

Open
yangsec888 wants to merge 1 commit into
DeusData:mainfrom
yangsec888:feat/cbm-workers-env-override
Open

feat(platform): CBM_WORKERS env override for cbm_default_worker_count#364
yangsec888 wants to merge 1 commit into
DeusData:mainfrom
yangsec888:feat/cbm-workers-env-override

Conversation

@yangsec888
Copy link
Copy Markdown

What

Adds a single env-knob, CBM_WORKERS, that explicitly sets the
worker count returned by cbm_default_worker_count(). When unset,
behaviour is unchanged.

Why

In containerized deployments, cbm_default_worker_count(initial=true)
returns sysconf(_SC_NPROCESSORS_ONLN) (host-CPU count, not the
container's CPU quota). On a 1-vCPU pod scheduled onto a 16-core
node cbm spawns ~16 indexing workers, each with its own per-worker
buffers — the dominant OOMKill driver in container deployments
(see companion issue #363 for the broader cgroup-awareness ask).

A small env knob solves the immediate operator pain without taking
on the scope of full cgroup detection, and it follows the same
shape as the existing CBM_* env overrides: explicit override >
implicit detection.

Patch

src/foundation/system_info.c:

int cbm_default_worker_count(bool initial) {
    /* CBM_WORKERS env override (clamped to [1, CBM_WORKERS_MAX]).
     * Useful inside containers where sysconf(_SC_NPROCESSORS_ONLN)
     * reports host CPUs rather than the cgroup's effective CPU quota.
     * Same precedence shape as other CBM_* env overrides:
     * explicit override > implicit detection. */
    char buf[CBM_SZ_32];
    if (cbm_safe_getenv("CBM_WORKERS", buf, sizeof(buf), NULL) != NULL) {
        long n = strtol(buf, NULL, CBM_DECIMAL_BASE);
        if (n >= MIN_WORKERS && n <= CBM_WORKERS_MAX) {
            return (int)n;
        }
        cbm_log_warn("workers.env.invalid", "value", buf,
                     "fallback", "sysconf");
    }

    cbm_system_info_t info = cbm_system_info();
    // ... unchanged from today ...
}

CBM_WORKERS_MAX = 256 is added to the file-local enum; happy to
drop the upper clamp if you'd rather trust the operator, or move
the constant somewhere shared. cbm_safe_getenv, CBM_DECIMAL_BASE,
CBM_SZ_32, SKIP_ONE, MIN_WORKERS, and cbm_log_warn all
already exist (see src/foundation/platform.h, constants.h, and
log.h).

Tests

tests/test_platform.c gains three cases inside the existing
SUITE(platform):

  1. platform_default_workers_env_overrideCBM_WORKERS=4 is honored for both initial=true and initial=false.
  2. platform_default_workers_env_invalid0, -1, 9999, and not-a-number all fall back to the sysconf-derived default (and emit workers.env.invalid at WARN).
  3. platform_default_workers_env_unset — when unset, cbm_default_worker_count(true) == cbm_system_info().total_cores, preserving today's contract.

Docs

One-line addition to the Environment Variables table in
README.md:

CBM_WORKERS — Override the parallel-indexing worker count
returned by cbm_default_worker_count. Useful inside containers
where sysconf(_SC_NPROCESSORS_ONLN) reports host CPUs rather
than the cgroup's effective quota. Range 1–256; invalid values
are ignored with a warning.

Behaviour when unset

Identical to today. No deployment that doesn't set CBM_WORKERS
sees any difference.

Verification on my side

I have not run scripts/build.sh / scripts/test.sh /
scripts/lint.sh locally because the full ASan + UBSan setup
isn't installed on my machine. Happy to iterate quickly on any
CI feedback (lint, clang-tidy, cppcheck, format) — the patch is
small enough that a turnaround should be fast.

Related

Adds a single env-knob, CBM_WORKERS, that explicitly sets the worker
count returned by cbm_default_worker_count(). When unset, behaviour
is unchanged.

In containerized deployments, cbm_default_worker_count(initial=true)
returns sysconf(_SC_NPROCESSORS_ONLN) (host-CPU count, not the
container's CPU quota). On a 1-vCPU pod scheduled onto a 16-core node
cbm spawns ~16 indexing workers, each with its own per-worker buffers
— the dominant OOMKill driver in container deployments. The broader
ask (full cgroup awareness for cbm_system_info / mem.init budget
logs) lives in a sibling issue; this knob is the smaller, lower-risk
path that ships independently.

Same precedence shape as the existing CBM_* env overrides: explicit
override > implicit detection. Range clamped to [1, 256]; invalid
values are ignored and the function logs `workers.env.invalid` at
WARN before falling through to the sysconf-derived default.

Files touched:
- src/foundation/system_info.c — env probe + clamp + warn-and-fall-through
- tests/test_platform.c — 3 new TEST cases covering override, invalid
  fallback, and unset baseline
- README.md — CBM_WORKERS row in the Environment Variables table

Related: cgroup-aware detection issue (filed alongside this PR).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant