The Phase 12.9 CI tripwire at crates/beava-core/tests/per_entity_size_dump.rs enforces size_of::<AggOp>() ≤ 80 B — but that only measures the stack-inlined slot. Variants that box state (Box<UDDSketch>, Box<TrendResidualState>, etc.) still cost 80 B inline PLUS unbounded heap. The static bytes_per_entity_p99 = 7000 placeholder in /metrics (crates/beava-server/src/http_admin.rs:124) is a guess; we don't actually know which ops dominate.
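A minimal sketch of why the size_of tripwire cannot see heap cost. ToyOp, ToySketch, heap_bytes and sketch are hypothetical stand-ins, not the real AggOp or UDDSketch from beava-core; the point is only that two values of the same variant share one inline size while owning very different heap allocations.

```rust
use std::mem::size_of;

// Hypothetical stand-ins: the real AggOp and UDDSketch live in beava-core.
struct ToySketch {
    buckets: Vec<u64>,
}

#[allow(dead_code)]
enum ToyOp {
    Count(u64),
    Quantile(Box<ToySketch>),
}

/// Heap bytes owned by the op: the boxed struct plus its bucket buffer.
/// Allocator bookkeeping is not visible from safe Rust and is not counted.
fn heap_bytes(op: &ToyOp) -> usize {
    match op {
        ToyOp::Count(_) => 0,
        ToyOp::Quantile(s) => {
            size_of::<ToySketch>() + s.buckets.capacity() * size_of::<u64>()
        }
    }
}

/// Build a boxed sketch with at least `capacity` preallocated buckets.
fn sketch(capacity: usize) -> ToyOp {
    ToyOp::Quantile(Box::new(ToySketch {
        buckets: Vec::with_capacity(capacity),
    }))
}

fn main() {
    let (small, large) = (sketch(4), sketch(4096));
    // size_of reports the same inline slot no matter what the Box points at.
    println!("inline slot: {} B for every variant", size_of::<ToyOp>());
    println!(
        "heap: small = {} B, large = {} B",
        heap_bytes(&small),
        heap_bytes(&large)
    );
}
```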
Build a per-op profiler that takes a populated AggOp variant and returns:
- stack_bytes: always 80 (the enum slot)
- heap_bytes: recursive into Box / Vec / HashMap / sketch internals
- breakdown: structured per data-structure-type within state (so we can see "Sum spends N bytes on Box<WindowedOp> overhead, UDDSketch spends X bytes on bucket map, EWMA spends Y bytes on Welford triple")
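One possible shape for the three return fields, as a sketch under stated assumptions. OpProfile, ToyOp, ToySketch and profile are hypothetical names; the real profiler would match on the actual AggOp variants. The HashMap figure is a lower bound, since it counts key/value slots only and not the table's internal control bytes.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// One profiled op: inline slot, total heap, and a per-structure breakdown.
#[derive(Debug)]
struct OpProfile {
    stack_bytes: usize,
    heap_bytes: usize,
    breakdown: Vec<(&'static str, usize)>,
}

// Hypothetical stand-ins for the real state types.
struct ToySketch {
    buckets: HashMap<i32, u64>,
}

enum ToyOp {
    Sum(Box<f64>),
    Windowed(Vec<f64>),
    Quantile(Box<ToySketch>),
}

/// Walk one op and attribute its heap bytes to named structures.
fn profile(op: &ToyOp) -> OpProfile {
    let breakdown = match op {
        ToyOp::Sum(_) => vec![("Box<f64> payload", size_of::<f64>())],
        ToyOp::Windowed(b) => {
            vec![("Vec<f64> buffer", b.capacity() * size_of::<f64>())]
        }
        ToyOp::Quantile(s) => vec![
            ("Box<ToySketch> payload", size_of::<ToySketch>()),
            // Lower bound: entry slots only, no hash-table metadata.
            ("bucket map entries", s.buckets.capacity() * size_of::<(i32, u64)>()),
        ],
    };
    let heap_bytes: usize = breakdown.iter().map(|(_, n)| n).sum();
    OpProfile {
        stack_bytes: size_of::<ToyOp>(),
        heap_bytes,
        breakdown,
    }
}

fn main() {
    let mut buckets = HashMap::new();
    buckets.insert(0, 1u64);
    let ops = [
        ToyOp::Sum(Box::new(1.0)),
        ToyOp::Windowed(Vec::with_capacity(60)),
        ToyOp::Quantile(Box::new(ToySketch { buckets })),
    ];
    for op in &ops {
        println!("{:?}", profile(op));
    }
}
```

Keeping heap_bytes as the sum of the breakdown entries means the two can never disagree, which makes the report easier to audit.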
Run it against the fraud-team workload (the realistic 14-op / 110-feature mix per fraud-team.json config) at steady state and emit a sorted table.
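The sorted table could be produced along these lines. render_table and its row layout are assumptions for illustration, and the byte counts in main are made-up placeholders, not measurements from the fraud-team workload.

```rust
/// Render (op, stack_bytes, heap_bytes) rows as a Markdown table,
/// largest heap consumer first.
fn render_table(mut rows: Vec<(String, usize, usize)>) -> String {
    rows.sort_by(|a, b| b.2.cmp(&a.2));
    let mut out = String::from("| op | stack_bytes | heap_bytes |\n|---|---:|---:|\n");
    for (op, stack, heap) in &rows {
        out.push_str(&format!("| {op} | {stack} | {heap} |\n"));
    }
    out
}

fn main() {
    // Placeholder numbers only; real rows would come from the profiler.
    let rows = vec![
        ("sum".to_string(), 80, 16),
        ("quantile".to_string(), 80, 4096),
        ("ewma".to_string(), 80, 24),
    ];
    print!("{}", render_table(rows));
}
```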
Suspected offenders to verify
- sum, mean, count: small state, but each may pay an outsized Box pointer overhead relative to useful bytes. This is the specific concern that motivated this issue.
- Sketches (UDDSketch, HLL, count-min, bloom): known-heavy but with workload-dependent variance.
- Windowed wrappers: every windowed op adds a Vec of bucket states; the overhead amortizes badly for short windows.
- TrendResidual, BurstCount: already flagged in v0.1 deferrals for borderline boxing (would drop the AggOp floor 80→64 B if all heavy variants box out).
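To make the pointer-overhead and short-window concerns concrete, here is a back-of-the-envelope sketch. overhead_ratio is a hypothetical helper, f64 stands in for the real per-bucket state, and allocator per-allocation overhead is ignored, so real ratios would be somewhat worse.

```rust
use std::mem::size_of;

/// Fraction of a footprint that is bookkeeping rather than useful state.
fn overhead_ratio(bookkeeping: usize, payload: usize) -> f64 {
    bookkeeping as f64 / (bookkeeping + payload) as f64
}

fn main() {
    // Boxed scalar: one pointer of bookkeeping for one f64 of state.
    let boxed = overhead_ratio(size_of::<Box<f64>>(), size_of::<f64>());
    println!("Box<f64>: {:.0}% pointer overhead", boxed * 100.0);

    // Windowed wrapper: one Vec header per op, amortized over its buckets.
    for buckets in [1usize, 4, 60] {
        let r = overhead_ratio(size_of::<Vec<f64>>(), buckets * size_of::<f64>());
        println!("{buckets:>3} buckets: {:.0}% header overhead", r * 100.0);
    }
}
```

The header cost is fixed per op, so its share shrinks only as the bucket count grows, which is the amortization problem for short windows.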
Done when
- A new binary at crates/beava-bench/src/bin/memprofile.rs runs the profile against the fraud-team config + writes memory-profile-fraud-team.md with the sorted op-by-op table.
- The top 5 memory offenders have a concrete byte breakdown + a one-line recommendation per op (keep / box smaller / restructure).
- An assertion verifies Σ per-op heap bytes ≈ /metrics bytes_per_entity_p99 so reality and the Prometheus value stay coherent (separate bug if they diverge).
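The issue does not say how close "≈" has to be, so the check below treats the tolerance as an explicit parameter. coherent is a hypothetical helper and the 25% figure in main is an assumption chosen for illustration, not a project threshold.

```rust
/// True when `measured` is within `tolerance` of `reported`,
/// where tolerance is a fraction of `reported` (0.25 means 25%).
fn coherent(measured: u64, reported: u64, tolerance: f64) -> bool {
    if reported == 0 {
        return measured == 0;
    }
    let diff = measured.abs_diff(reported) as f64;
    diff / reported as f64 <= tolerance
}

fn main() {
    let reported = 7000; // the static placeholder currently served by /metrics
    let measured = 6400; // made-up profiler total, for illustration only
    assert!(
        coherent(measured, reported, 0.25),
        "profiler total {measured} B diverges from reported {reported} B"
    );
    println!("profiler total and /metrics value agree within tolerance");
}
```

Since the /metrics value is a p99 across entities, the measured side would presumably need to be the p99 of per-entity sums rather than a single entity's total.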
Sibling work this unlocks
Replacing the static bytes_per_entity_p99 placeholder with live dynamic sampling (the Phase 12.8 D-04 deferral) becomes trivial once this profiler is built — file separately after this lands and points at what to instrument.
~250 LOC + the generated report. Cohort Track-1 sized — meaty perf measurement work, no architectural decisions, plays directly to the pointer-overhead accounting question.