Skip to content

fix(inputs): close libuv loop after run() returns (fixes flaky macOS SIGBUS)#789

Open
leoparente wants to merge 3 commits into
developfrom
fix/macos-uvloop-close-crash
Open

fix(inputs): close libuv loop after run() returns (fixes flaky macOS SIGBUS)#789
leoparente wants to merge 3 commits into
developfrom
fix/macos-uvloop-close-crash

Conversation

@leoparente

Copy link
Copy Markdown
Contributor

Problem

The macOS CI job (unit-tests-mac) flakily fails the Test step with unit-tests-input-dnstap and/or unit-tests-input-flow crashing with a Bus error / SIGSEGV. It is non-deterministic on the runner and became much more frequent on the macOS 26 image.

Examples:

Catch2 randomizes test-case order, so the reported crashing case varies between runs — a sign the fault is in shared lifecycle code, not one test's logic.

Root cause

Reproduced locally on macOS 26 (30/30 crashes) and captured under lldb:

  • io-loop thread: EXC_BAD_ACCESS (address=0x10) in uv__io_polluv_run
  • main thread: blocked in DnstapInputStream::stop()std::thread::join()

Each libuv-based input's async_event stop callback runs on the loop thread, inside uv_run, and called _io_loop->close() (uv_loop_close). uv_loop_close() is only valid after uv_run() has returned — calling it from within a running callback frees the loop's internal poll structures, so when the callback returns uv__io_poll keeps iterating freed memory and the process dies. It is timing-dependent, hence the flakiness.

Fix

Keep _io_loop->stop() in the async callback (so uv_run() returns promptly, even with active client handles), but defer _io_loop->close() to the io thread, after run() returns — never inside the running loop. The same anti-pattern existed in all four libuv-based inputs and is fixed in each:

  • src/inputs/dnstap/DnstapInputStream.cpp (tcp + unix socket paths)
  • src/inputs/flow/FlowInputStream.cpp
  • src/inputs/netprobe/NetProbeInputStream.cpp
  • src/inputs/netprobe/PingProbe.cpp

(netprobe carried the same latent bug but wasn't crashing yet; fixed for consistency.)

Verification

Rebuilt and stress-ran the affected binaries from the ctest working directory on macOS 26:

binary before after
unit-tests-input-dnstap 30/30 crash (SIGBUS) 20/20 pass, 0 crashes
unit-tests-input-flow crashes 30/30 pass, 0 crashes
unit-tests-input-netprobe (latent) 20/20 pass, 0 crashes

No behavior change beyond the lifecycle ordering; the streams still stop cleanly and join their io thread.

🤖 Generated with Claude Code

…nc callback

The async stop callback (run on the loop thread inside uv_run) called
_io_loop->close() (uv_loop_close). Closing the loop while uv_run is still on
the stack frees structures that uv__io_poll keeps iterating, so when the
callback returns uv__io_poll dereferences freed memory and the process dies
with SIGSEGV/SIGBUS. It is timing-dependent, which is why CI saw it as a flaky
macOS crash in unit-tests-input-dnstap / unit-tests-input-flow (deterministic
on the macOS 26 runner).

Stop the loop in the callback (so uv_run returns) but defer uv_loop_close() to
the io thread, after run() returns. Applied to all four libuv-based inputs
(dnstap, flow, netprobe receiver + stream).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a libuv lifecycle bug in several uvw/libuv-based inputs by ensuring the event loop is only closed after run() returns, avoiding loop closure from within a running libuv callback (a known cause of flaky macOS SIGBUS/SIGSEGV crashes).

Changes:

  • Remove *_io_loop->close() calls from async_event stop callbacks (while uv_run() is still on the stack).
  • Add *_io_loop->close() on the IO thread immediately after *_io_loop->run() returns, across all affected inputs.
  • Add inline comments documenting the libuv constraint and why closure must be deferred.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/inputs/netprobe/PingProbe.cpp Defers loop close to IO thread after run() returns; stop callback only stops loop.
src/inputs/netprobe/NetProbeInputStream.cpp Defers loop close to IO thread after run() returns; stop callback stops/closes timer then stops loop.
src/inputs/flow/FlowInputStream.cpp Defers loop close to IO thread after run() returns; stop callback stops/closes timer & UDP handle then stops loop.
src/inputs/dnstap/DnstapInputStream.cpp Defers loop close to IO thread after run() returns for both TCP and unix-socket loops; stop callbacks stop/close server + timer then stop loop.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/inputs/dnstap/DnstapInputStream.cpp
Comment thread src/inputs/dnstap/DnstapInputStream.cpp
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

LCOV of commit 574f846 during Debug Builds #179

  lines......: 82.3% (14293 of 17362 lines)
  functions..: 73.0% (1448 of 1984 functions)
  branches...: no data found

Files changed coverage rate: n/a

Full coverage report

…loop is quiescent

Follow-up to the loop-close fix, addressing review feedback: when stop() runs
while a dnstap client is connected, the server/timer were closed but the
accepted client handles in _tcp_sessions/_unix_sessions were not, so the loop
was not quiescent and uv_loop_close() would return EBUSY (and uvw frees the
uv_loop_t regardless). Add FrameSessionData::close_handle() and close every
connected client on the loop thread in the async stop callback before stopping
the loop. Also close the PingReceiver heartbeat timer, which was never closed.

dnstap and flow loops are now provably quiescent at close() time. (NetProbe
probe handles are still stopped from the control thread via probe->stop(); a
loop-thread shutdown for those is a separate pre-existing concern.)
@leoparente leoparente requested a review from Copilot June 26, 2026 12:50
@leoparente

Copy link
Copy Markdown
Contributor Author

@codex review

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread src/inputs/netprobe/NetProbeInputStream.cpp Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d0e4005f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/inputs/netprobe/NetProbeInputStream.cpp
…op is quiescent

Address review feedback on the netprobe shutdown path. probe->stop() was
called from the control thread and did not close every uvw handle: TcpProbe
left _client and _interval_timer open, and PingProbe left _internal_timer
open. Those handles stayed active in the loop's handle queue, so when the io
thread reached _io_loop->close() uv_loop_close() returned UV_EBUSY (and uvw
frees the uv_loop_t regardless).

Make each probe's stop() close all handles it owns (guarded against
double-close), and invoke probe->stop() from inside the async stop callback on
the loop thread instead of from the control thread. This both leaves the loop
quiescent for close() and fixes the pre-existing cross-thread uv_close on probe
handles. All three input loops (dnstap, flow, netprobe) are now quiescent at
close() time.
@leoparente leoparente requested a review from Copilot June 26, 2026 14:10
@leoparente

Copy link
Copy Markdown
Contributor Author

@codex review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants