Enrich process telemetry with binary memory metrics and fix stack_id in OTEL resource by alco · Pull Request #3621 · electric-sql/electric

alco · 2025-12-17T12:56:58Z

Summary

New binary memory metrics: Add separate process.bin_memory.* metrics that track off-heap (refc) binary memory per process group, including total bytes, max/avg binary count, and max/avg reference count. This exposes binary memory pressure that was previously invisible: process heap memory alone doesn't reflect refc binary accumulation.
Fix stack_id in OTEL resource: Stack telemetry was missing stack_id in its OTEL resource attributes, making it impossible to distinguish stacks in the metrics backend. It is now merged into the resource, so all stack metrics carry the identifier.
Configurable stack telemetry init delay: A new ELECTRIC_STACK_TELEMETRY_INIT_DELAY env var replaces the hardcoded 30s delay, enabling faster metric emission in tests and tunable startup behavior in production.
Richer Processes module: Generalized top_memory_by_type into top_by(sort_key), which supports sorting by either process memory or binary memory. Added a {:at_least_bytes, n} limit variant. Each process group now reports proc_mem, binary_mem, max_bin_count, avg_bin_count, max_ref_count, and avg_ref_count.
Expanded integration test: Replaced stack-telemetry.lux with otel-export.lux, which verifies both application-level metrics (including the new process.bin_memory.* family) and stack-level metrics (including stack_id in the resource and LSN/storage metrics).
CI: Cache lux to avoid re-fetching it on every integration test run.
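To make the per-group fields above concrete, here is a minimal sketch of how one process group's per-process stats could be collapsed into them. It is not the PR's code: the module name `BinMemSummarySketch` and the per-process map shape (`:proc_mem`, `:binary_mem`, `:num_binaries`, `:ref_count_sum`, `:max_ref_count`) are assumptions, and so are the exact definitions of the averages (binaries per process, refcount per binary). Summing `binary_mem` across processes counts a shared binary once per referencing process, which is the double-counting the review and commit messages refer to.

```elixir
defmodule BinMemSummarySketch do
  # Hypothetical sketch: each per-process map is assumed to carry
  # :proc_mem, :binary_mem, :num_binaries, :ref_count_sum and :max_ref_count.
  def summarize([_ | _] = procs) do
    num_procs = length(procs)
    total_bins = sum_of(procs, :num_binaries)

    %{
      proc_mem: sum_of(procs, :proc_mem),
      # A binary shared by N processes in the group is counted N times here
      binary_mem: sum_of(procs, :binary_mem),
      # Largest number of binaries referenced by any single process
      max_bin_count: max_of(procs, :num_binaries),
      # Mean number of binaries per process in the group
      avg_bin_count: total_bins / num_procs,
      # Highest refcount seen on any binary in the group
      max_ref_count: max_of(procs, :max_ref_count),
      # Mean refcount across all binaries the group references
      avg_ref_count:
        if(total_bins == 0, do: 0.0, else: sum_of(procs, :ref_count_sum) / total_bins)
    }
  end

  defp sum_of(procs, key), do: procs |> Enum.map(&Map.fetch!(&1, key)) |> Enum.sum()
  defp max_of(procs, key), do: procs |> Enum.map(&Map.fetch!(&1, key)) |> Enum.max()
end
```

The function only accepts a non-empty list, since a group with no processes has no meaningful averages.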

Test plan

electric-telemetry unit tests pass (including new tests for top_bin_memory_by_type and {:at_least_bytes, n})
sync-service compiles cleanly
otel-export.lux integration test via CI

codecov · 2025-12-17T12:57:34Z

Codecov Report

❌ Patch coverage is 72.00000% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.86%. Comparing base (0bbb377) to head (5b9694e).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
...ry/lib/electric/telemetry/application_telemetry.ex	0.00%	12 Missing ⚠️
...tric-telemetry/lib/electric/telemetry/processes.ex	94.59%	2 Missing ⚠️

❗ A different number of reports was uploaded for BASE (0bbb377) and HEAD (5b9694e).

HEAD has 14 fewer uploads than BASE.

Flag	BASE (0bbb377)	HEAD (5b9694e)
unit-tests	5	1
packages/y-electric	1	0
typescript	5	0
packages/experimental	1	0
packages/start	1	0
packages/react-hooks	1	0
packages/typescript-client	1	0

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #3621       +/-   ##
===========================================
- Coverage   89.20%   67.86%   -21.35%     
===========================================
  Files          25       15       -10     
  Lines        2520      557     -1963     
  Branches      641        0      -641     
===========================================
- Hits         2248      378     -1870     
+ Misses        270      179       -91     
+ Partials        2        0        -2

Flag	Coverage Δ
electric-telemetry	`67.86% <72.00%> (?)`
elixir	`67.86% <72.00%> (?)`
packages/experimental	`?`
packages/react-hooks	`?`
packages/start	`?`
packages/typescript-client	`?`
packages/y-electric	`?`
typescript	`?`
unit-tests	`67.86% <72.00%> (-21.35%)`	⬇️

Flags with carried forward coverage won't be shown.

claude · 2026-03-20T15:51:50Z

.DS_store
.vscode
**/.vitest-temp.json
/dist/
/packages/sync-service/state/
/website/.netlify
/website/.vitepress/cache
/website/.vitepress/dist
/website/public/openapi.html
build
caching/nginx_cache
file.jsonl
integration-tests/lux_logs/
integration-tests/lux/
integration-tests/_storage
json_files
node_modules
shape-data.json
test-dbs
tsconfig.tsbuildinfo
wal
/shapes
.sst
sst-.log
*/deps/
**/junit
**/coverage
response.tmp
.claude
!website/.claude/commands
!website/.claude/skills
_artifacts

claude · 2026-03-20T15:59:31Z

test body via graphql

netlify · 2026-03-23T16:30:03Z

✅ Deploy Preview for electric-next ready!

Name	Link
🔨 Latest commit	`0e3579d`
🔍 Latest deploy log	https://app.netlify.com/projects/electric-next/deploys/69c16a4e4710b90008919f8d
😎 Deploy Preview	https://deploy-preview-3621--electric-next.netlify.app

claude · 2026-03-25T14:45:22Z

Claude Code Review

Summary

Iteration 7 adds one new commit (5b9694e24): reverts the Dockerfile change from iteration 6, restoring MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4. The revert was prompted by a reviewer question and confirmed by the author as a deliberate choice ahead of merge. No new issues introduced; the three carry-over suggestions remain open (all optional).

What's Working Well

The revert is a reasonable call: it leaves the Dockerfile's MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 setting untouched by this PR, so the intermittent CI trouble the author attributed to it can be investigated separately. The inline discussion (magnetised → alco) captures the rationale clearly.
All prior fixes (binary memory metrics, stack_id in OTEL resource, configurable ELECTRIC_STACK_TELEMETRY_INIT_DELAY) remain intact and unaffected by the revert.

Issues Found

Critical (Must Fix)

None.

Important (Should Fix)

None.

Suggestions (Nice to Have)

Unused variable in lux test (carry-over from iteration 4)

File: integration-tests/tests/otel-export.lux:27 — `fn i ->` should be `fn _ ->` to silence the Elixir compiler's unused-variable warning.

:infinity as arithmetic sentinel (carry-over from iteration 4)

File: packages/electric-telemetry/lib/electric/telemetry/processes.ex:65 — worth a one-line comment explaining that in Erlang term order every number compares less than every atom, so `running_total >= :infinity` is never true.
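The term-order behaviour the reviewer wants documented can be demonstrated in isolation. This is a sketch, not the code at processes.ex:65: `CutoffSketch.reached?/2` is a hypothetical helper where `target` is either an integer byte count or the atom `:infinity`.

```elixir
defmodule CutoffSketch do
  # In Erlang/Elixir term order, every number sorts before every atom.
  # Comparing a running total against :infinity is therefore always false,
  # which makes :infinity act as "never stop because of the running total".
  def reached?(running_total, target), do: running_total >= target
end
```

No integer or float, however large, satisfies `>= :infinity`, so the sentinel works without a special case, at the cost of relying on cross-type comparison, which is exactly why a comment helps.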

Hardcoded 1.000000 in lux assertion (carry-over from iterations 2–4)

File: integration-tests/tests/otel-export.lux:79 — a comment explaining why avg_ref_count is exactly 1.0 would help future readers.

Issue Conformance

No linked issue — acceptable for this observability improvement.

Previous Review Status

✅ Fixed (iter 3): off_heap_strings OOM.
✅ Fixed (iter 3): Binary double-counting comment at processes.ex:46-48.
✅ Fixed (iter 3): Timeout reset in _macros.luxinc.
✅ New (iter 5): ELECTRIC_STACK_TELEMETRY_INIT_DELAY — correctly implemented.
✅ New (iter 5): stack_id in OTEL resource via Map.put_new — correctly implemented.
✅ New (iter 7): Removal of MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 reverted (variable restored in the Dockerfile). A sensible CI stability choice; rationale documented in the inline PR thread.
⏳ Still open (suggestion): fn _ -> at otel-export.lux:27.
⏳ Still open (suggestion): :infinity comment at processes.ex:65.
⏳ Still open (suggestion): 1.000000 comment at otel-export.lux:79.

Review iteration: 7 | 2026-04-14

put_new would silently drop stack_id when the user provides a custom :resource map. Use Map.put_new on the existing resource map instead, so stack_id is always included without overwriting user-provided keys. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
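A minimal sketch of the fix this commit describes: start from whatever resource map the user supplied and add `stack_id` with `Map.put_new/3`. The option layout (`opts` as a keyword list with an optional `:resource` map), the atom key `:stack_id`, and the module name are assumptions for illustration only.

```elixir
defmodule ResourceSketch do
  # Hypothetical option shape: opts may carry a user-supplied :resource map.
  # Map.put_new/3 adds :stack_id only when the user has not set it already,
  # and keeps every other user-provided resource attribute intact.
  def resource_with_stack_id(opts, stack_id) do
    opts
    |> Keyword.get(:resource, %{})
    |> Map.put_new(:stack_id, stack_id)
  end
end
```

By contrast, something like `Keyword.put_new(opts, :resource, %{stack_id: stack_id})` (one plausible reading of the truncated commit message) returns `opts` unchanged whenever a `:resource` key already exists, so `stack_id` silently disappears.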

The fallback branch when info[:binary] is not a list was returning a map with only proc_mem, but the reduce function expects binary_mem, ref_count_sum, and num_binaries keys too. This would cause a KeyError if the branch were ever reached. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
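The shape of that fix can be sketched with a hypothetical per-process helper. `Process.info/2` with a list of items returns a keyword list (or `nil` for a dead process); per the Erlang docs, the `:binary` item is a list of `{BinaryId, BinarySize, BinaryRefcCount}` tuples but is explicitly subject to change, which is why a non-list fallback exists at all. The module name and the `max_ref_count` key are assumptions; the other keys follow the commit message.

```elixir
defmodule ProcBinInfoSketch do
  # Hypothetical helper, not the PR's code.
  def proc_stats(pid) do
    case Process.info(pid, [:memory, :binary]) do
      nil -> nil
      info -> from_info(info)
    end
  end

  def from_info(info) do
    proc_mem = Keyword.get(info, :memory, 0)

    case info[:binary] do
      bins when is_list(bins) ->
        # Each entry is {binary_id, size_in_bytes, refcount}
        refcs = for {_id, _size, refc} <- bins, do: refc

        %{
          proc_mem: proc_mem,
          binary_mem: Enum.sum(for {_id, size, _refc} <- bins, do: size),
          ref_count_sum: Enum.sum(refcs),
          max_ref_count: Enum.reduce(refcs, 0, &max/2),
          num_binaries: length(bins)
        }

      _ ->
        # Fallback returns every key the reducer reads, so accessing
        # binary_mem, ref_count_sum or num_binaries later cannot raise KeyError.
        %{proc_mem: proc_mem, binary_mem: 0, ref_count_sum: 0, max_ref_count: 0, num_binaries: 0}
    end
  end
end
```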

process.memory.* now only exports total, sorted by aggregated process memory. The new process.bin_memory.* metric sorts process groups by referenced binary memory and exports: total, max_bin_count, avg_bin_count, max_ref_count, avg_ref_count. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract shared top_by/3 to deduplicate top_memory and top_bin_memory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wip
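The idea behind a shared `top_by/3` can be sketched as a single ranking function parameterised by the field to sort on. The argument order and meaning here (`groups` as a map of group name to stats, `sort_key`, and a plain `count`) are hypothetical; the PR's actual `top_by/3` signature is not shown on this page.

```elixir
defmodule TopBySketch do
  # Hypothetical shape: groups maps a group name to its aggregated stats;
  # sort_key selects which field drives the ranking.
  def top_by(groups, sort_key, count) when sort_key in [:proc_mem, :binary_mem] do
    groups
    |> Enum.sort_by(fn {_group, stats} -> Map.fetch!(stats, sort_key) end, :desc)
    |> Enum.take(count)
  end
end
```

Ranking the same groups by `:proc_mem` and by `:binary_mem` can put different groups on top, which is what the PR's sort-order test asserts.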

The new limit includes all process groups whose aggregated memory is at least n bytes. This is useful for binary memory telemetry where a percentage-of-total doesn't make sense due to refc binary double-counting. Also makes take_until_target accept the low cutoff as an argument instead of using the hardcoded @min_group_memory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
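A hedged sketch of the two limit styles this commit describes, operating on groups already sorted descending by `sort_key`. The function names, the `target`/`low_cutoff` parameters, and the exact stopping rule of `take_until_target` are assumptions inferred from the commit messages, not the PR's implementation.

```elixir
defmodule LimitSketch do
  # {:at_least_bytes, n}: keep every group whose sort_key value is >= n.
  # Input is assumed sorted descending, so take_while stops at the first miss.
  def apply_limit(sorted, sort_key, {:at_least_bytes, n}) do
    Enum.take_while(sorted, fn {_group, stats} -> Map.fetch!(stats, sort_key) >= n end)
  end

  # Hypothetical target-style limit: take groups until the running total of
  # sort_key reaches target, stopping early at groups below low_cutoff.
  # Both comparisons use sort_key, not a hardcoded proc_mem field.
  def take_until_target(sorted, sort_key, target, low_cutoff) do
    sorted
    |> Enum.reduce_while({[], 0}, fn {_group, stats} = group, {acc, total} ->
      value = Map.fetch!(stats, sort_key)

      if total >= target or value < low_cutoff do
        {:halt, {acc, total}}
      else
        {:cont, {[group | acc], total + value}}
      end
    end)
    |> elem(0)
    |> Enum.reverse()
  end
end
```

Passing `low_cutoff` as an argument, rather than reading a module attribute, lets callers pick a different floor for process memory and binary memory.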

…ETRY_INIT_DELAY The default remains 30s for production. Integration tests set it to 1s so stack metrics are exported quickly enough for test assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
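A minimal sketch of reading such a setting with a 30-second default and using it to schedule the first emission. Only the variable name and the 30s/1s values come from this page; treating the value as a plain integer number of milliseconds, the module name, and the `:init_stack_telemetry` message are assumptions, and Electric's real parsing may differ.

```elixir
defmodule InitDelaySketch do
  # Hypothetical: the value is parsed as an integer number of milliseconds.
  @env_var "ELECTRIC_STACK_TELEMETRY_INIT_DELAY"
  @default_ms 30_000

  def init_delay_ms(env \\ System.get_env()) do
    case Map.get(env, @env_var) do
      nil -> @default_ms
      value -> value |> String.trim() |> String.to_integer()
    end
  end

  # Sends a (hypothetical) :init_stack_telemetry message to the caller after the delay.
  def schedule_init(env \\ System.get_env()) do
    Process.send_after(self(), :init_stack_telemetry, init_delay_ms(env))
  end
end
```

Passing the environment map explicitly keeps the function testable without mutating global state.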

magnetised

excellent stuff.

magnetised · 2026-04-14T12:33:59Z


 COPY mix.* /builder/electric/
 RUN mix deps.get
-RUN MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 mix deps.compile


curious, why remove?

For some reason this causes trouble on CI runners occasionally. I can revert it just before merging.

alco mentioned this pull request Dec 17, 2025

Cache lux on CI #3605

Closed

alco self-assigned this Dec 18, 2025

alco mentioned this pull request Dec 18, 2025

Improve BEAM Runtime Observability for Electric #3237

Closed

alco removed their assignment Mar 2, 2026

erik-the-implementer force-pushed the alco/stack-telemetry-stack-id branch from d419585 to a2322ac on March 20, 2026 15:44

alco added the claude label Mar 20, 2026

alco force-pushed the alco/stack-telemetry-stack-id branch 2 times, most recently from 3392db7 to 6eea74f on April 13, 2026 15:42

alco and others added 17 commits April 14, 2026 12:09

Verify that stack telemetry includes stack_id in the OTEL resource

9319bd3

Include stack_id in the OTEL resource for stack telemetry

cba11fc

Include binary_mem and average ref count in top process metrics

cda4795

Add application metrics to the otel telemetry integration test

26a1859

Reduce mem usage in the integration test

31e1048

Fix lingering timeout problem in otel_collector's e2e tests

db1659a

Cache lux on CI

2e05371

Include average number of binaries in top memory-heavy process metrics

01429a3

Fix electric-telemetry unit tests

931beea

Add changesets

0faba58

mix format

00de98c

Include max_bin_count and max_ref_count for each process group

82f9df5

Use sort_key for cutoff comparison and running total in take_until_ta…

933d22c

…rget When sorting by binary_mem, the low cutoff and accumulator should compare against binary_mem, not proc_mem. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alco and others added 3 commits April 14, 2026 12:54

Add tests for top_bin_memory_by_type and {:at_least_bytes, n} limit

a9b3fb8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move Processes imports to module level and improve sort order test

22a670f

The sort order test now independently verifies each list is sorted by its respective key and asserts the top entries differ. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alco force-pushed the alco/stack-telemetry-stack-id branch from 6eea74f to 9db1cc8 on April 14, 2026 10:55

alco changed the title ~~Include binary_mem and average ref count in top process metrics~~ Enrich process telemetry with binary memory metrics and fix stack_id in OTEL resource Apr 14, 2026

Remove MIX_OS_DEPS_COMPILE_PARTITION_COUNT from the Dockerfile

e8c948d

CI seems to be having trouble with this configuration option

magnetised approved these changes Apr 14, 2026

Revert "Remove MIX_OS_DEPS_COMPILE_PARTITION_COUNT from the Dockerfile"

5b9694e

This reverts commit e8c948d.

alco merged commit e9db22c into main Apr 14, 2026
16 of 17 checks passed

alco deleted the alco/stack-telemetry-stack-id branch April 14, 2026 13:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enrich process telemetry with binary memory metrics and fix stack_id in OTEL resource#3621

Enrich process telemetry with binary memory metrics and fix stack_id in OTEL resource#3621
alco merged 22 commits into main from alco/stack-telemetry-stack-id
3 participants

Conversation

alco commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

codecov Bot commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify Bot commented Mar 23, 2026

✅ Deploy Preview for electric-next ready!

Uh oh!

claude Bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Summary

What's Working Well

Issues Found

Critical (Must Fix)

Important (Should Fix)

Suggestions (Nice to Have)

Issue Conformance

Previous Review Status

Uh oh!

magnetised left a comment

Choose a reason for hiding this comment

Uh oh!

magnetised Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

alco Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alco commented Dec 17, 2025 •

edited

Loading

codecov Bot commented Dec 17, 2025 •

edited

Loading

claude Bot commented Mar 20, 2026 •

edited

Loading

claude Bot commented Mar 20, 2026 •

edited

Loading

claude Bot commented Mar 25, 2026 •

edited

Loading