wall profiler: track live PCPs via an intrusive list#341
Conversation
Overall package sizeSelf size: 2.08 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
4c8ad2f to
05cfbdf
Compare
|
5cf6964 to
466e69c
Compare
2a0942a to
4910ad6
Compare
a498ae9 to
2173255
Compare
PersistentContextPtr instances were registered in an unordered_set on the WallProfiler so that ~WallProfiler could delete any PCPs that V8 had not GC'd yet — otherwise they show up as leaks under LSAN at process exit, since LSAN's check runs before V8's late-shutdown finalizers. The hash-set was heavier than that single use case deserved: every live PCP cost a back-pointer into the set (8 bytes) plus a separate hash node + bucket overhead (~40 bytes) external to the object. Replace it with an intrusive doubly-linked list threaded through the PCPs themselves. Each PCP carries: WallProfiler* const profiler_ // 8 bytes PersistentContextPtr** pprev_ // 8 bytes PersistentContextPtr* next_ // 8 bytes ~PCP unlinks itself when pprev_ != nullptr — the "still on a live profiler's list" predicate. ~WallProfiler walks the list and nulls each pprev_ before delete, so the destructor that fires as part of the walk becomes a no-op rather than poking at the dying profiler's head pointer. Outside of teardown, pprev_ != nullptr is also the proof that the back-pointer is still valid, so ~PCP can dereference profiler_ to decrement its counter without any thread-local or ID lookup. PCP-tracking memory drops from ~48 bytes/PCP (8-byte back-pointer into set + ~40 bytes of hash node / bucket) to 24 bytes/PCP (back- pointer + two list pointers), a saving of ~24 bytes per live async context. Replace GetMetrics's liveContextPtrs_.size() read with an atomic counter incremented/decremented alongside the list link/unlink so the metric path doesn't depend on the container. usedAsyncContextCount / totalAsyncContextCount keep their existing semantics (both report the live count, as before).
2173255 to
867c9bc
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
| // decremented in its destructor when it unlinks from the list (i.e., while | ||
| // the profiler is still alive). PCPs cleaned up by ~WallProfiler's walk | ||
| // skip the decrement — the counter is going away with the profiler anyway. | ||
| std::atomic<size_t> liveContextPtrCount_{0}; |
There was a problem hiding this comment.
I think that liveContextPtrCount_ is only accessed by isolate thread and not from a signal handler, thus it could be a plain size_t.
There was a problem hiding this comment.
My only worry is GC callbacks, but I'll try to confirm from source code that they always run on the isolate thread (I know that some V8 GC stages can be multithreaded.)
| // ~PCP. pprev_ == nullptr is the "detached" sentinel — set by | ||
| // ~WallProfiler before deleting us, so our unlink becomes a no-op and we | ||
| // don't poke at the dying profiler's memory. | ||
| PersistentContextPtr** pprev_ = nullptr; |
There was a problem hiding this comment.
nit: the claim that pprev (double pointer) enables O(1) unlink is dubious. Maybe better explain the benefits vs using a single pointer.
Summary
Replaces the
unordered_set<PersistentContextPtr*>that the wall profiler uses to track livePersistentContextPtr(PCP) instances with an intrusive doubly-linked list threaded through the PCPs themselves. Same shutdown semantics —~WallProfilerstill deletes any PCP that V8 hasn't reclaimed yet — but with materially less per-PCP overhead.Stacked on top of #322 (thread-local active profiler) and depends on the thread-local already being in place.
Why
liveContextPtrs_exists because, at process exit, V8's CPED map can still hold PCPs that V8's late-shutdown finalizers haven't reached yet. LSAN's leak check fires before those finalizers, so without an explicit cleanup pass each surviving PCP gets reported as a leak (we verified this onnode:22-bookwormwith--experimental-async-context-frame: a baseline build with this PR'sliveContextPtrs_removed and no replacement leaks ~288 bytes across ~8 PCP allocations per run). The set is a fine tool for the cleanup job, but it carries hash-table overhead unrelated to that job.What changed
PersistentContextPtrcarries aWallProfiler* const profiler_, aPersistentContextPtr** pprev_, and aPersistentContextPtr* next_. The two list pointers form a standard intrusive doubly-linked list rooted atWallProfiler::liveContextPtrHead_.~PersistentContextPtrunlinks itself whenpprev_ != nullptr. That predicate also serves as proof thatprofiler_is still valid, so the destructor dereferences it directly to decrement the live counter — no thread-local lookup, no ID compare.~WallProfilerwalks the list, nulls each PCP'spprev_(so the soon-to-fire destructor short-circuits the unlink instead of poking at the head pointer in dying memory), thendeletes each PCP.~node::ObjectWrapclears the V8 weak callback during that delete, so the dangling internal-field pointer left behind in the wrap object stays inert even if V8 GCs the wrap later.GetMetricsreads astd::atomic<size_t> liveContextPtrCount_updated alongside the list link/unlink, replacing the oldliveContextPtrs_.size().usedAsyncContextCount/totalAsyncContextCountkeep their existing semantics (both report the live count, just like before).Memory savings
Per live PCP, the tracking cost goes from:
unordered_set)A net saving of ~24 bytes per live async context. The set's per-bucket fixed overhead also goes away, though that's negligible compared to the per-element savings on real workloads.
The list operations are O(1) (insert-at-head, unlink-by-pointer), matching the previous set's amortized characteristics.
Verification
Ran the project's Linux ASAN+LSAN suite (
npm run test:js-asan) in anode:22-bookwormcontainer with--node-option=experimental-async-context-frameso the CPED-mode tests actually execute (without that flag the tests gate themselves out and the code path under test never runs in CI). Compared baseline (this PR reverted) vs patched:Zero references to
PersistentContextPtr,WallProfiler,dd_pprof.node, orwall.ccin the patched LSAN output. The remaining leaks are pure V8/Node-internal cleanup-on-exit noise, present in identical shape on the baseline (small magnitude differences are noise — V8's final GC state at exit is non-deterministic).Test plan
npm run build(macOS Release): cleannpm run test:js(macOS): 96 passingnpm run test:js-asan(Linux container, Node 22 +--experimental-async-context-frame): 95 passing, 1 pending, no PCP/WallProfiler/dd_pprof leak reportsJira: PROF-14791