Enrich process telemetry with binary memory metrics and fix stack_id in OTEL resource #3621
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3621 +/- ##
===========================================
- Coverage 89.20% 67.86% -21.35%
===========================================
Files 25 15 -10
Lines 2520 557 -1963
Branches 641 0 -641
===========================================
- Hits 2248 378 -1870
+ Misses 270 179 -91
+ Partials 2 0 -2
d419585 to
a2322ac
.DS_store
test body via graphql
✅ Deploy Preview for electric-next ready!
Claude Code Review

Summary
Iteration 7 adds one new commit (

What's Working Well

Issues Found

Critical (Must Fix)
None.

Important (Should Fix)
None.

Suggestions (Nice to Have)
Unused variable in lux test (carry-over from iteration 4)
File:
File:
Hardcoded
File:

Issue Conformance
No linked issue; acceptable for this observability improvement.

Previous Review Status

Review iteration: 7 | 2026-04-14
3392db7 to
6eea74f
put_new would silently drop stack_id when the user provides a custom :resource map. Use Map.put_new on the existing resource map instead, so stack_id is always included without overwriting user-provided keys.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
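The fix described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the module `ResourceSketch`, the function `with_stack_id/2`, and the assumption that the options arrive as a map with an optional `:resource` key are all hypothetical.

```elixir
defmodule ResourceSketch do
  # Hypothetical helper, not taken from the PR.
  # Assumes `opts` is a map that may carry a user-supplied :resource map.
  def with_stack_id(opts, stack_id) when is_map(opts) do
    resource =
      opts
      |> Map.get(:resource, %{})
      # Map.put_new/3 only writes the key when it is absent, so a
      # user-provided :stack_id is preserved and a missing one is filled in.
      |> Map.put_new(:stack_id, stack_id)

    Map.put(opts, :resource, resource)
  end
end
```

Because the `put_new` is applied to the resource map itself rather than to the enclosing options, a custom `:resource` no longer suppresses the `stack_id` attribute.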
The fallback branch when info[:binary] is not a list was returning a map with only proc_mem, but the reduce function expects binary_mem, ref_count_sum, and num_binaries keys too. This would cause a KeyError if the branch were ever reached.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
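A rough sketch of the shape this fix implies: both branches return a map with the same four keys, so a downstream reducer can read any of them safely. The module and function names are hypothetical. The `{id, size, ref_count}` tuple layout is the one `Process.info(pid, :binary)` currently returns, which the Erlang documentation notes may change without notice.

```elixir
defmodule BinInfoSketch do
  # Hypothetical summariser, not taken from the PR.
  # `info` is anything supporting info[:key] access, e.g. the keyword list
  # returned by Process.info(pid, [:memory, :binary]).
  def summarise(info) do
    proc_mem = info[:memory] || 0

    case info[:binary] do
      bins when is_list(bins) ->
        Enum.reduce(bins, empty(proc_mem), fn {_id, size, ref_count}, acc ->
          %{
            acc
            | binary_mem: acc.binary_mem + size,
              ref_count_sum: acc.ref_count_sum + ref_count,
              num_binaries: acc.num_binaries + 1
          }
        end)

      _not_a_list ->
        # Fallback returns the full key set, not just :proc_mem.
        empty(proc_mem)
    end
  end

  defp empty(proc_mem) do
    %{proc_mem: proc_mem, binary_mem: 0, ref_count_sum: 0, num_binaries: 0}
  end
end
```

For a live process, something like `self() |> Process.info([:memory, :binary]) |> BinInfoSketch.summarise()` would produce one such map.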
process.memory.* now only exports total, sorted by aggregated process memory. The new process.bin_memory.* metric sorts process groups by referenced binary memory and exports: total, max_bin_count, avg_bin_count, max_ref_count, avg_ref_count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract shared top_by/3 to deduplicate top_memory and top_bin_memory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wip
The new limit includes all process groups whose aggregated memory is at least n bytes. This is useful for binary memory telemetry where a percentage-of-total doesn't make sense due to refc binary double-counting. Also makes take_until_target accept the low cutoff as an argument instead of using the hardcoded @min_group_memory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
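To illustrate the `{:at_least_bytes, n}` limit, here is a small sketch under stated assumptions: process groups are plain maps carrying `:proc_mem` and `:binary_mem`, and the cutoff is compared against whichever key the list is sorted by. `LimitSketch` and `filter_at_least/3` are made-up names and do not reflect the PR's actual `top_by/3` or `take_until_target` implementations.

```elixir
defmodule LimitSketch do
  # Hypothetical illustration, not taken from the PR.
  # Sorts groups by `sort_key` (largest first) and keeps every group whose
  # value for that same key is at least `n` bytes.
  def filter_at_least(groups, sort_key, {:at_least_bytes, n})
      when sort_key in [:proc_mem, :binary_mem] and is_integer(n) do
    groups
    |> Enum.sort_by(&Map.fetch!(&1, sort_key), :desc)
    # The list is sorted descending, so we can stop at the first group
    # that falls below the cutoff.
    |> Enum.take_while(&(Map.fetch!(&1, sort_key) >= n))
  end
end
```

An absolute byte threshold sidesteps the double-counting problem mentioned above: a refc binary referenced by several processes inflates any "total" that a percentage limit would be measured against, whereas a per-group cutoff does not depend on that total.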
…rget

When sorting by binary_mem, the low cutoff and accumulator should compare against binary_mem, not proc_mem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sort order test now independently verifies each list is sorted by its respective key and asserts the top entries differ.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ETRY_INIT_DELAY

The default remains 30s for production. Integration tests set it to 1s so stack metrics are exported quickly enough for test assertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
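One way an environment-driven delay with a 30s default could be read is sketched below. This is an assumption-heavy illustration: the module `InitDelaySketch` is hypothetical, and the sketch assumes the variable holds a whole number of milliseconds, which may not match the value format the project actually accepts.

```elixir
defmodule InitDelaySketch do
  # Default of 30 seconds, expressed in milliseconds.
  @default_ms 30_000

  # Hypothetical parser, not taken from the PR.
  # ASSUMPTION: the value is a non-negative integer number of milliseconds.
  # `env` defaults to the real environment; tests can pass a plain map.
  def init_delay_ms(env \\ System.get_env()) do
    raw = Map.get(env, "ELECTRIC_STACK_TELEMETRY_INIT_DELAY", "")

    case Integer.parse(raw) do
      {ms, ""} when ms >= 0 -> ms
      # Unset, empty, negative, or non-numeric values fall back to the default.
      _ -> @default_ms
    end
  end
end
```

Passing the environment as an argument keeps the function pure, so a test can exercise the 1s case without mutating process-wide state.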
6eea74f to
9db1cc8
CI seems to be having trouble with this configuration option
COPY mix.* /builder/electric/
RUN mix deps.get
RUN MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 mix deps.compile
For some reason this causes trouble on CI runners occasionally. I can revert it just before merging.
This reverts commit e8c948d.
Summary
New binary memory metrics: Add separate process.bin_memory.* metrics that track off-heap (refc) binary memory per process group, including total bytes, max/avg binary count, and max/avg reference count. This gives visibility into binary memory pressure that was previously invisible: process heap memory alone doesn't show refc binary accumulation.

Fix stack_id in OTEL resource: Stack telemetry was missing stack_id in its OTEL resource attributes, making it impossible to distinguish stacks in the metrics backend. It is now merged into the resource so all stack metrics carry the identifier.

Configurable stack telemetry init delay: New ELECTRIC_STACK_TELEMETRY_INIT_DELAY env var replaces the hardcoded 30s delay, enabling faster metric emission in tests and tunable startup behavior in production.

Richer Processes module: Generalized top_memory_by_type into top_by(sort_key), supporting sorting by either process memory or binary memory. Added {:at_least_bytes, n} limit variant. Each process group now returns proc_mem, binary_mem, max_bin_count, avg_bin_count, max_ref_count, and avg_ref_count.

Expanded integration test: Replaced stack-telemetry.lux with otel-export.lux, which verifies both application-level metrics (including the new process.bin_memory.* family) and stack-level metrics (including stack_id in the resource and LSN/storage metrics).

CI: Cache lux to avoid re-fetching it on every integration test run.
Test plan
electric-telemetry unit tests pass (including new tests for top_bin_memory_by_type and {:at_least_bytes, n})
sync-service compiles cleanly
otel-export.lux integration test via CI