Run @parcel/watcher in a self-healing child process #250
Open
ymichael wants to merge 3 commits into
Run the recursive filesystem watcher in a forked child process behind the BB_WATCHER_SUBPROCESS flag (off by default, zero behavior change when off). When the child dies or stops answering liveness pings, the parent SIGKILLs it, reclaiming leaked inotify fds and parked threads wholesale (which in-process recovery cannot do), then respawns it and replays subscriptions, so a parcel inotify EINTR crash/hang degrades to a transparent restart instead of taking down the daemon. This mirrors how VS Code isolates the same library (forked watcher process, restart on exit).

- Swap RootSubscription's direct @parcel/watcher import for a backend accessor (the single runtime chokepoint); default stays the real in-process watcher.
- Add the subprocess backend: parent proxy (subscription registry, liveness ping, respawn + replay, bounded restart budget), child handler, JSON-safe IPC protocol, and fork channel.
- Close the restart gap: replayed subscriptions carry a rescan flag so the new child re-emits the root's current entries and callers reconcile to on-disk state.
- Emit the child as its own daemon bundle (bb-parcel-watcher-child.mjs); @parcel/watcher stays an external runtime require.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
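The liveness-ping rule in this commit (kill the child once it stops answering) could be sketched roughly as below. This is an illustration only, not the PR's code: `LivenessMonitor`, its method names, and the timeout value are hypothetical, and the clock and kill function are injected so the logic can be exercised without forking a real process.

```typescript
// Hypothetical sketch: class name, method names, and timeout are assumptions.
type Kill = (signal: "SIGKILL") => void;

class LivenessMonitor {
  private readonly timeoutMs: number;
  private readonly kill: Kill;
  private lastPongAt: number;
  private killed = false;

  constructor(timeoutMs: number, kill: Kill, now: number) {
    this.timeoutMs = timeoutMs;
    this.kill = kill;
    this.lastPongAt = now; // treat spawn time as the first sign of life
  }

  /** Record a pong received from the child. */
  onPong(now: number): void {
    this.lastPongAt = now;
  }

  /**
   * Called on each ping tick. If the child has been silent for longer than
   * the timeout, SIGKILL it exactly once and report that a respawn is due.
   */
  tick(now: number): boolean {
    if (this.killed) return false;
    if (now - this.lastPongAt <= this.timeoutMs) return false;
    this.killed = true;
    this.kill("SIGKILL");
    return true;
  }
}
```

In a real proxy the `kill` argument would presumably wrap the forked child's own kill call, and a fresh monitor would be created for each respawned child.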
Remove the BB_WATCHER_SUBPROCESS flag. The daemon now installs the subprocess-isolated watcher backend at startup, so it is the actual behavior rather than an opt-in. Unit tests inject a fake watcher and stay on the in-process backend, so they can still mock parcel directly.

Critically, recover from the bug instead of only containing it: a watch-error from the child (parcel's shared inotify backend dying on EINTR) now recycles the whole child. The SIGKILL lets the OS reclaim the leaked inotify fds and parked threads, and the respawn re-arms every watch on a fresh backend, so watches self-heal instead of going permanently dead. Respawn/recycle events are logged through the daemon's pino logger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
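One way to picture "the respawn re-arms every watch" is a parent-side registry that can regenerate subscribe messages for a fresh child. The sketch below is a guess at the shape, not the PR's implementation: `SubscriptionRegistry`, `ParentMessage`, and the field names (including `rescan`) are hypothetical, and the messages are kept to plain JSON-safe data.

```typescript
// Hypothetical message shapes; the PR's real IPC protocol is not shown.
type ParentMessage =
  | { kind: "subscribe"; id: number; root: string; rescan: boolean }
  | { kind: "unsubscribe"; id: number };

class SubscriptionRegistry {
  private nextId = 1;
  private subs: { id: number; root: string }[] = [];

  /** Register a watch root and return the id used on the IPC channel. */
  add(root: string): number {
    const id = this.nextId;
    this.nextId += 1;
    this.subs.push({ id, root });
    return id;
  }

  /** Forget a watch so it is not replayed to future children. */
  remove(id: number): boolean {
    const before = this.subs.length;
    this.subs = this.subs.filter((s) => s.id !== id);
    return this.subs.length !== before;
  }

  /**
   * Messages for a freshly spawned child: every live subscription, flagged
   * rescan so the child re-emits current entries and callers can reconcile.
   */
  replayMessages(): ParentMessage[] {
    return this.subs.map((s): ParentMessage => ({
      kind: "subscribe",
      id: s.id,
      root: s.root,
      rescan: true,
    }));
  }
}
```

After a recycle (exit, ping timeout, or watch-error), the parent would send each message from `replayMessages()` to the new child once it is ready.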
A multi-agent QA pass on this branch surfaced several merge-blocking defects; this commit fixes the critical + high-severity ones (all with regression tests).

- PACKAGING (critical): the published bb-app `files` whitelist and the startup artifact check both omitted bb-parcel-watcher-child.mjs, so an `npx bb-app` install would throw on the first subscription and have permanently dead file watching. Add it to both.
- SHUTDOWN (high): the proxy was never disposed and its ping interval was not unref'd, so a graceful daemon shutdown hung with the child orphaned, the same hang/leak class this change exists to prevent. Dispose the backend in shutdownRuntimes and unref the ping/respawn timers.
- REPLAY vs pathExists (high): a transient missing path during respawn produced a child subscribe-failed that RootSubscription classified as TERMINAL (no retry), permanently killing the watch. Surface replay subscribe failures as the recoverable rescan signal so RootSubscription re-establishes via its existence-gated, backed-off retry.
- RESTART LOOP (high): a permanent give-up after a fixed restart budget plus a no-backoff respawn loop could kill all watches for the daemon's lifetime. Replace with capped exponential backoff that resets when a child proves healthy, so the watcher always recovers and never permanently gives up.
- DOUBLE-SUBSCRIBE (high): a subscribe landing in the spawn->ready window was sent eagerly AND replayed on ready, orphaning one parcel watch (leaked inotify fd) and double-delivering events. Gate the eager send on childReady so replay-on-ready is the single source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
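The restart-loop fix describes capped exponential backoff that resets once a child proves healthy. A minimal sketch of that policy, assuming hypothetical names (`RestartBackoff`, `nextDelayMs`, `reset`) and caller-chosen base and cap values, might look like this. It omits jitter, and what counts as "healthy" is left to the caller.

```typescript
// Hypothetical sketch: names and parameters are assumptions, not the PR's code.
class RestartBackoff {
  private readonly baseMs: number;
  private readonly capMs: number;
  private attempt = 0;

  constructor(baseMs: number, capMs: number) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  /** Delay before the next respawn: doubles per consecutive failure, up to the cap. */
  nextDelayMs(): number {
    const delay = Math.min(this.capMs, this.baseMs * Math.pow(2, this.attempt));
    this.attempt += 1;
    return delay;
  }

  /** Call when a child proves healthy so the next failure starts from the base delay. */
  reset(): void {
    this.attempt = 0;
  }
}
```

Because the delay is capped rather than the attempt count bounded, the caller never reaches a state where it stops retrying, which matches the "never permanently gives up" goal stated above.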
Why
The host daemon's recursive filesystem watcher (`@parcel/watcher`) throws on a benign `poll()` EINTR in its inotify backend instead of retrying. One throw kills the shared native backend, leaks its inotify fds + threads, and can hang the daemon (requiring a manual restart); watches also die silently. There is no upstream fix to wait for: 2.5.6 is latest, issue #141 is open with no PR, and the bug is unchanged on the default branch.

VS Code depends on the same library/version for recursive watching and runs it in a separate forked process it restarts on exit. This PR brings that pattern to bb.
What
Run `@parcel/watcher` in a forked child process, installed by the daemon at startup (no flag). When the child dies, stops answering liveness pings, or reports a backend error (EINTR), the parent `SIGKILL`s it (the OS reclaims the leaked fds/threads wholesale) and respawns it, replaying subscriptions. The bug self-heals instead of taking down the daemon or leaving watches dead.

- `RootSubscription` (the only runtime importer of parcel) calls a backend accessor; the daemon installs the subprocess backend, while tests stay in-process and mock parcel directly. Everything above `RootSubscription` is unchanged.
- The parent proxy sends `subscribe`/`unsubscribe` over IPC, holds the subscription registry, pings for liveness, and recovers on death/wedge/backend-error.
- Replay subscribe failures surface as the recoverable rescan signal, so `RootSubscription` re-establishes via its existence-gated retry.
- The child is its own daemon bundle (`bb-parcel-watcher-child.mjs`), in the `files` whitelist + startup artifact check; it exits itself when the parent IPC disconnects (no orphan). On shutdown the daemon disposes the proxy (SIGKILL child + unref timers) so the event loop drains.

QA
Two adversarial multi-agent review passes (decompose → independent skeptics refute each finding → dynamic probes). The first found the happy path solid but surfaced 5 critical/high merge-blockers, since fixed in this PR (each with a regression test):
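- The published `bb-app` tarball omitted the child bundle (`files` + artifact check) → permanently dead watching for `npx` users.
- The proxy was never disposed and its ping interval was not `unref`'d → daemon hung on graceful shutdown with the child orphaned.
- Replay vs. the `pathExists` gate → a transient missing path during respawn became a permanently dead watch (terminal, no retry).
- A fixed restart budget plus a no-backoff respawn loop → could kill all watches for the daemon's lifetime.
- A subscribe landing in the spawn->ready window was sent eagerly and replayed on ready → one orphaned parcel watch (leaked inotify fd) and double-delivered events.

A second focused re-QA of the fixes found no new defects (overall risk: low).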
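The double-subscribe fix from the third commit (gate the eager send on `childReady` so replay-on-ready is the single source) can be illustrated with a small sketch. The class and method names here (`ReadyGatedProxy`, `onChildSpawned`, `onChildReady`) are hypothetical, and the IPC send is injected as a plain function rather than a real fork channel.

```typescript
// Hypothetical sketch: names and message shape are assumptions, not the PR's code.
type SubscribeMessage = { kind: "subscribe"; id: number; root: string };
type Send = (msg: SubscribeMessage) => void;

class ReadyGatedProxy {
  private readonly send: Send;
  private readonly registry: SubscribeMessage[] = [];
  private childReady = false;

  constructor(send: Send) {
    this.send = send;
  }

  /** A new child was forked but has not yet reported ready. */
  onChildSpawned(): void {
    this.childReady = false;
  }

  /** The child reported ready: replay every registered subscription once. */
  onChildReady(): void {
    this.childReady = true;
    this.registry.forEach((msg) => this.send(msg));
  }

  /**
   * Always record the subscription, but only send it eagerly when the child
   * is already ready. In the spawn->ready window the replay covers it, so the
   * child never receives the same subscribe twice.
   */
  subscribe(id: number, root: string): void {
    const msg: SubscribeMessage = { kind: "subscribe", id, root };
    this.registry.push(msg);
    if (this.childReady) this.send(msg);
  }
}
```

The design choice is that the registry, not the call site, is the source of truth for what the child should be watching; the eager send is only an optimization for the steady state.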
Testing
- `@bb/host-watcher` tests (9 proxy tests incl. crash/EINTR-recycle/ping-wedge/backoff/recoverable-replay/no-double-subscribe), 355 `@bb/host-daemon`, 37 `bb-app`; typecheck + full daemon build + `check-bundles` green.
- `SIGKILL` child → respawn (new pid) → events resume with no caller action; and `dispose()` reaps the child and the event loop drains on its own (shutdown no longer hangs).

Deferred follow-ups (medium/low, not blocking)
- `refs/heads/*`) and gap-window deletions aren't reconciled until the next real fs event (short, self-correcting). A precise fix is parcel's `getEventsSince` snapshot API.
- `/proc`; point it at the child pid.

🤖 Generated with Claude Code