Skip to content

wall profiler: track live PCPs via an intrusive list#341

Open
szegedi wants to merge 1 commit into
mainfrom
szegedi/no-live-context-tracking
Open

wall profiler: track live PCPs via an intrusive list#341
szegedi wants to merge 1 commit into
mainfrom
szegedi/no-live-context-tracking

Conversation

@szegedi
Copy link
Copy Markdown

@szegedi szegedi commented May 27, 2026

Summary

Replaces the unordered_set<PersistentContextPtr*> that the wall profiler uses to track live PersistentContextPtr (PCP) instances with an intrusive doubly-linked list threaded through the PCPs themselves. Same shutdown semantics — ~WallProfiler still deletes any PCP that V8 hasn't reclaimed yet — but with materially less per-PCP overhead.

Stacked on top of #322 (thread-local active profiler) and depends on the thread-local already being in place.

Why

liveContextPtrs_ exists because, at process exit, V8's CPED map can still hold PCPs that V8's late-shutdown finalizers haven't reached yet. LSAN's leak check fires before those finalizers, so without an explicit cleanup pass each surviving PCP gets reported as a leak (we verified this on node:22-bookworm with --experimental-async-context-frame: a baseline build with this PR's liveContextPtrs_ removed and no replacement leaks ~288 bytes across ~8 PCP allocations per run). The set is a fine tool for the cleanup job, but it carries hash-table overhead unrelated to that job.

What changed

  • New: each PersistentContextPtr carries a WallProfiler* const profiler_, a PersistentContextPtr** pprev_, and a PersistentContextPtr* next_. The two list pointers form a standard intrusive doubly-linked list rooted at WallProfiler::liveContextPtrHead_.
  • ~PersistentContextPtr unlinks itself when pprev_ != nullptr. That predicate also serves as proof that profiler_ is still valid, so the destructor dereferences it directly to decrement the live counter — no thread-local lookup, no ID compare.
  • ~WallProfiler walks the list, nulls each PCP's pprev_ (so the soon-to-fire destructor short-circuits the unlink instead of poking at the head pointer in dying memory), then deletes each PCP. ~node::ObjectWrap clears the V8 weak callback during that delete, so the dangling internal-field pointer left behind in the wrap object stays inert even if V8 GCs the wrap later.
  • GetMetrics reads a std::atomic<size_t> liveContextPtrCount_ updated alongside the list link/unlink, replacing the old liveContextPtrs_.size(). usedAsyncContextCount / totalAsyncContextCount keep their existing semantics (both report the live count, just like before).

Memory savings

Per live PCP, the tracking cost goes from:

Before (unordered_set) After (intrusive list)
Inside the PCP object 8 B (back-pointer to set) 8 B (back-pointer) + 16 B (two list pointers) = 24 B
Outside the PCP object ~40 B (hash node + bucket slot in the set) 0 B
Total ~48 B/PCP 24 B/PCP

A net saving of ~24 bytes per live async context. The set's per-bucket fixed overhead also goes away, though that's negligible compared to the per-element savings on real workloads.

The list operations are O(1) (insert-at-head, unlink-by-pointer), matching the previous set's amortized characteristics.

Verification

Ran the project's Linux ASAN+LSAN suite (npm run test:js-asan) in a node:22-bookworm container with --node-option=experimental-async-context-frame so the CPED-mode tests actually execute (without that flag the tests gate themselves out and the code path under test never runs in CI). Compared baseline (this PR reverted) vs patched:

Tests LSAN summaries
Baseline 95 passing / 1 pending 40528 B / 75936 B leaked — all V8/Node internal
Patched 95 passing / 1 pending 40528 B / 79648 B leaked — all V8/Node internal

Zero references to PersistentContextPtr, WallProfiler, dd_pprof.node, or wall.cc in the patched LSAN output. The remaining leaks are pure V8/Node-internal cleanup-on-exit noise, present in identical shape on the baseline (small magnitude differences are noise — V8's final GC state at exit is non-deterministic).

Test plan

  • npm run build (macOS Release): clean
  • npm run test:js (macOS): 96 passing
  • npm run test:js-asan (Linux container, Node 22 + --experimental-async-context-frame): 95 passing, 1 pending, no PCP/WallProfiler/dd_pprof leak reports
  • CI passes

Jira: PROF-14791

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

Overall package size

Self size: 2.08 MB
Deduped: 2.44 MB
No deduping: 2.44 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@szegedi szegedi force-pushed the szegedi/no-live-context-tracking branch from 4c8ad2f to 05cfbdf Compare May 28, 2026 11:41
@datadog-prod-us1-3
Copy link
Copy Markdown

datadog-prod-us1-3 Bot commented May 28, 2026

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 1 Pipeline job failed

Pull Request Labels | label   View in Datadog   GitHub Actions

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Label error. Requires exactly 1 of: semver-patch, semver-minor, semver-major.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 867c9bc | Docs | Datadog PR Page | Give us feedback!

@szegedi szegedi force-pushed the szegedi/no-live-context-tracking branch 2 times, most recently from 5cf6964 to 466e69c Compare May 28, 2026 12:48
@szegedi szegedi force-pushed the szegedi/thread-local-profiler branch 2 times, most recently from 2a0942a to 4910ad6 Compare May 28, 2026 13:28
@szegedi szegedi force-pushed the szegedi/no-live-context-tracking branch from a498ae9 to 2173255 Compare May 28, 2026 13:34
Base automatically changed from szegedi/thread-local-profiler to main May 28, 2026 14:43
PersistentContextPtr instances were registered in an unordered_set on
the WallProfiler so that ~WallProfiler could delete any PCPs that V8
had not GC'd yet — otherwise they show up as leaks under LSAN at
process exit, since LSAN's check runs before V8's late-shutdown
finalizers. The hash-set was heavier than that single use case
deserved: every live PCP cost a back-pointer into the set (8 bytes)
plus a separate hash node + bucket overhead (~40 bytes) external to
the object.

Replace it with an intrusive doubly-linked list threaded through the
PCPs themselves. Each PCP carries:

  WallProfiler* const profiler_      // 8 bytes
  PersistentContextPtr** pprev_      // 8 bytes
  PersistentContextPtr*  next_       // 8 bytes

~PCP unlinks itself when pprev_ != nullptr — the "still on a live
profiler's list" predicate. ~WallProfiler walks the list and nulls
each pprev_ before delete, so the destructor that fires as part of
the walk becomes a no-op rather than poking at the dying profiler's
head pointer. Outside of teardown, pprev_ != nullptr is also the
proof that the back-pointer is still valid, so ~PCP can dereference
profiler_ to decrement its counter without any thread-local or ID
lookup.

PCP-tracking memory drops from ~48 bytes/PCP (8-byte back-pointer
into set + ~40 bytes of hash node / bucket) to 24 bytes/PCP (back-
pointer + two list pointers), a saving of ~24 bytes per live async
context.

Replace GetMetrics's liveContextPtrs_.size() read with an atomic
counter incremented/decremented alongside the list link/unlink so the
metric path doesn't depend on the container. usedAsyncContextCount /
totalAsyncContextCount keep their existing semantics (both report the
live count, as before).
@szegedi szegedi force-pushed the szegedi/no-live-context-tracking branch from 2173255 to 867c9bc Compare May 28, 2026 14:48
@nsavoire
Copy link
Copy Markdown

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@szegedi szegedi added the semver-patch Bug or security fixes, mainly label May 29, 2026
// decremented in its destructor when it unlinks from the list (i.e., while
// the profiler is still alive). PCPs cleaned up by ~WallProfiler's walk
// skip the decrement — the counter is going away with the profiler anyway.
std::atomic<size_t> liveContextPtrCount_{0};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that liveContextPtrCount_ is only accessed by isolate thread and not from a signal handler, thus it could be a plain size_t.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only worry is GC callbacks, but I'll try to confirm from source code that they always run on the isolate thread (I know that some V8 GC stages can be multithreaded.)

// ~PCP. pprev_ == nullptr is the "detached" sentinel — set by
// ~WallProfiler before deleting us, so our unlink becomes a no-op and we
// don't poke at the dying profiler's memory.
PersistentContextPtr** pprev_ = nullptr;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the claim that pprev (double pointer) enables O(1) unlink is dubious. Maybe better explain the benefits vs using a single pointer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-patch Bug or security fixes, mainly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants