perf(memory): per-AggOp memory profile tool + worst-offender report #68

@petrpan26

Description

The Phase 12.9 CI tripwire at crates/beava-core/tests/per_entity_size_dump.rs enforces size_of::<AggOp>() ≤ 80 B — but that only measures the stack-inlined slot. Variants that box state (Box<UDDSketch>, Box<TrendResidualState>, etc.) still cost 80 B inline PLUS unbounded heap. The static bytes_per_entity_p99 = 7000 placeholder in /metrics (crates/beava-server/src/http_admin.rs:124) is a guess; we don't actually know which ops dominate.
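To make the gap concrete, here is a minimal sketch of why a `size_of` tripwire cannot see boxed state. The `Op` enum and `heap_bytes` helper are hypothetical stand-ins, not the real `AggOp` or any beava API; the point is only that two values with identical inline size can own very different amounts of heap.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical stand-in for AggOp; the real enum has many more variants.
#[allow(dead_code)]
enum Op {
    Count(u64),
    Samples(Box<Vec<f64>>),
}

// Heap bytes owned by a value: the boxed Vec header plus its element buffer.
// Allocator bookkeeping is ignored, so this is a lower bound.
fn heap_bytes(op: &Op) -> usize {
    match op {
        Op::Count(_) => 0,
        Op::Samples(v) => size_of::<Vec<f64>>() + v.capacity() * size_of::<f64>(),
    }
}

fn main() {
    let small = Op::Samples(Box::new(Vec::with_capacity(4)));
    let large = Op::Samples(Box::new(Vec::with_capacity(4096)));

    // The inline slot is identical for both values...
    assert_eq!(size_of_val(&small), size_of_val(&large));

    // ...while the heap footprint differs by orders of magnitude.
    println!("inline: {} B", size_of_val(&small));
    println!("heap (small): {} B", heap_bytes(&small));
    println!("heap (large): {} B", heap_bytes(&large));
}
```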

Build a per-op profiler that takes a populated AggOp variant and returns:

  • stack_bytes — always 80 (the enum slot)
  • heap_bytes — recursive into Box / Vec / HashMap / sketch internals
  • breakdown — structured per data-structure-type within state (so we can see "Sum spends N bytes on Box<WindowedOp> overhead, UDDSketch spends X bytes on bucket map, EWMA spends Y bytes on Welford triple")
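One possible shape for the three return fields is sketched below, assuming a small `HeapSize` trait that recurses through `Box`, `Vec`, and `HashMap`. Every name here (`HeapSize`, `Part`, `OpProfile`, `profile`, and the two-variant `Op`) is hypothetical and chosen for illustration. The `HashMap` figure is a lower bound, since it ignores the table's control bytes and allocator overhead.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// One breakdown row: a label and the heap bytes attributed to it.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Part { label: &'static str, bytes: usize }

#[allow(dead_code)]
#[derive(Debug)]
struct OpProfile { stack_bytes: usize, heap_bytes: usize, breakdown: Vec<Part> }

/// Heap bytes owned by a value, excluding its own inline size.
trait HeapSize { fn heap_size(&self) -> usize; }

impl HeapSize for f64 { fn heap_size(&self) -> usize { 0 } }
impl HeapSize for u64 { fn heap_size(&self) -> usize { 0 } }

impl<T: HeapSize> HeapSize for Box<T> {
    // The boxed value itself lives on the heap, plus whatever it owns.
    fn heap_size(&self) -> usize { size_of::<T>() + (**self).heap_size() }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        self.capacity() * size_of::<T>() + self.iter().map(|x| x.heap_size()).sum::<usize>()
    }
}

impl<K, V: HeapSize> HeapSize for HashMap<K, V> {
    // Lower bound: slot storage only, no control bytes.
    fn heap_size(&self) -> usize {
        self.capacity() * size_of::<(K, V)>() + self.values().map(|v| v.heap_size()).sum::<usize>()
    }
}

// Hypothetical two-variant stand-in for AggOp.
#[allow(dead_code)]
enum Op { Sum(Box<f64>), Quantile(Box<HashMap<u64, u64>>) }

fn profile(op: &Op) -> OpProfile {
    let breakdown = match op {
        Op::Sum(b) => vec![Part { label: "Box<f64>", bytes: b.heap_size() }],
        Op::Quantile(m) => vec![Part { label: "bucket map", bytes: m.heap_size() }],
    };
    let heap_bytes = breakdown.iter().map(|p| p.bytes).sum();
    OpProfile { stack_bytes: size_of::<Op>(), heap_bytes, breakdown }
}

fn main() {
    let p = profile(&Op::Sum(Box::new(1.0)));
    println!("{:?}", p);
}
```

A trait keeps the recursion local to each container type, so adding a new sketch type means writing one `impl` rather than touching the profiler's match arms.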

Run it against the fraud-team workload (the realistic 14-op / 110-feature mix per fraud-team.json config) at steady state and emit a sorted table.
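The sorted table could be produced along these lines. This is a sketch only: the `Row` struct, the column set, and `render_table` are assumptions, and the numbers in `main` are placeholders rather than measurements.

```rust
// Hypothetical per-op aggregate collected from the workload run.
struct Row {
    op: String,
    instances: usize,
    heap_bytes: usize,
}

// Render rows as a Markdown table, heaviest op first.
fn render_table(mut rows: Vec<Row>) -> String {
    rows.sort_by(|a, b| b.heap_bytes.cmp(&a.heap_bytes));
    let mut out = String::from("| op | instances | heap bytes |\n|---|---:|---:|\n");
    for r in &rows {
        out.push_str(&format!("| {} | {} | {} |\n", r.op, r.instances, r.heap_bytes));
    }
    out
}

fn main() {
    // Placeholder values, not real profile output.
    let rows = vec![
        Row { op: "op_a".to_string(), instances: 1, heap_bytes: 10 },
        Row { op: "op_b".to_string(), instances: 1, heap_bytes: 20 },
    ];
    print!("{}", render_table(rows));
}
```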

Suspected offenders to verify

  • sum, mean, count — small state, but each may pay an outsized Box pointer overhead relative to useful bytes. Specific concern that motivated this issue.
  • Sketches (UDDSketch, HLL, count-min, bloom) — known-heavy but with workload-dependent variance.
  • Windowed wrappers — every windowed op adds a Vec of bucket states; the overhead amortizes badly for short windows.
  • TrendResidual, BurstCount — already flagged in v0.1 deferrals for borderline boxing (would drop the AggOp floor 80→64 B if all heavy variants box out).
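The boxing trade-off in the last bullet can be illustrated with a toy pair of enums. `Heavy`, `InlineOp`, and `BoxedOp` are hypothetical and their sizes are not the real `AggOp` numbers; the sketch only shows that an enum is sized by its largest inline variant, so moving a heavy variant behind a `Box` shrinks every instance at the cost of one pointer plus one heap allocation for that variant.

```rust
use std::mem::size_of;

// Hypothetical heavy inline state: 9 * 8 = 72 bytes.
#[allow(dead_code)]
struct Heavy([u64; 9]);

// The heavy variant is stored inline, so every value pays for it.
#[allow(dead_code)]
enum InlineOp {
    Count(u64),
    Trend(Heavy),
}

// The heavy variant is boxed, so the enum only needs room for a pointer.
#[allow(dead_code)]
enum BoxedOp {
    Count(u64),
    Trend(Box<Heavy>),
}

fn main() {
    println!("inline enum: {} B", size_of::<InlineOp>());
    println!("boxed enum:  {} B", size_of::<BoxedOp>());
    // For tiny state the ratio inverts: a Box<f64> spends a full pointer
    // (plus allocator overhead) to reach a payload of the same size.
    println!(
        "Box<f64>: {} B pointer for {} B payload",
        size_of::<Box<f64>>(),
        size_of::<f64>()
    );
}
```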

Done when

  • A new binary at crates/beava-bench/src/bin/memprofile.rs runs the profile against the fraud-team config + writes memory-profile-fraud-team.md with the sorted op-by-op table.
  • The top 5 memory offenders have a concrete byte breakdown + a one-line recommendation per op (keep / box smaller / restructure).
  • An assertion verifies Σ per-op heap bytes ≈ /metrics bytes_per_entity_p99 so reality and the Prometheus value stay coherent (separate bug if they diverge).
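The coherence check in the last bullet might look like the sketch below. The issue does not define how close "≈" must be, so the relative tolerance parameter and the `coherent` helper are assumptions; the 7000 in `main` is the current static placeholder, and the per-op values are made up.

```rust
// Returns true when the summed per-op heap bytes are within `rel_tol`
// (relative error) of the value reported by /metrics.
// `rel_tol` is an assumed knob; the issue does not specify a tolerance.
fn coherent(per_op_heap: &[usize], reported: usize, rel_tol: f64) -> bool {
    let total: usize = per_op_heap.iter().sum();
    if reported == 0 {
        return total == 0;
    }
    let diff = (total as f64 - reported as f64).abs();
    diff / reported as f64 <= rel_tol
}

fn main() {
    // Placeholder per-op totals, not real measurements.
    let per_op = [4000, 2000, 1100];
    let reported = 7000; // static bytes_per_entity_p99 placeholder
    assert!(coherent(&per_op, reported, 0.05), "profile and /metrics diverge");
    println!("coherent within 5%");
}
```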

Sibling work this unlocks

Replacing the static bytes_per_entity_p99 placeholder with live dynamic sampling (the Phase 12.8 D-04 deferral) becomes trivial once this profiler is built. File that separately after this lands and shows what to instrument.

~250 LOC + the generated report. Cohort Track-1 sized — meaty perf measurement work, no architectural decisions, plays directly to the pointer-overhead accounting question.

Metadata

Labels: area: server (Rust server / core / runtime-core crates)