feat(sync-service): adaptive poll-based wait_until under StatusMonitor congestion #4376
alco wants to merge 13 commits into
…r congestion

When the StatusMonitor's waiter set crosses the congestion threshold, new wait_until/3 callers now poll service_status/1 via PollWait.until/3 instead of enqueuing into the StatusMonitor mailbox. This bounds StatusMonitor mailbox growth under burst load while keeping the low-latency GenServer.call path for the uncongested common case.

The fast path is unchanged: the :active short-circuit, the :waiting + :read_only short-circuit, and :sleeping returning :conn_sleeping when not blocking. Congestion is consulted only on the not-ready and sleeping+blocking branches, via the shared dispatch_wait/3 chokepoint.

The ETS status table is switched from :protected to :public so tests can force the congestion flag directly. Reads (congested?/1) already went through ETS outside the GenServer process, so only write access changes.
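The dispatch described in this commit message can be modelled roughly as follows. This is an illustrative Python sketch, not the Elixir implementation. The callables read_status, congested, poll_wait and blocking_call are hypothetical stand-ins for service_status/1, congested?/1, PollWait.until/3 and the GenServer.call waiter path. The string statuses and the level parameter are simplifying assumptions about how the :waiting + :read_only short-circuit works.

```python
from typing import Callable, Tuple

# Illustrative model only; names and status values are assumptions.
Result = Tuple[str, str]


def dispatch_wait(congested: Callable[[], bool],
                  poll_wait: Callable[[], Result],
                  blocking_call: Callable[[], Result]) -> Result:
    """Shared chokepoint: congested callers poll, others block on the server."""
    if congested():
        return poll_wait()      # stand-in for PollWait.until/3 on service_status/1
    return blocking_call()      # stand-in for the GenServer.call waiter path


def wait_until(level: str,
               read_status: Callable[[], str],
               congested: Callable[[], bool],
               poll_wait: Callable[[], Result],
               blocking_call: Callable[[], Result],
               block_on_conn_sleeping: bool = False) -> Result:
    # Cheap read in the caller, analogous to the ETS status lookup.
    status = read_status()

    # Fast path: none of these branches consults the congestion flag.
    if status == "active":
        return ("ok", "active")
    if status == "read_only" and level == "read_only":
        return ("ok", "read_only")
    if status == "sleeping" and not block_on_conn_sleeping:
        return ("error", "conn_sleeping")

    # Not ready, or sleeping while the caller asked to block.
    return dispatch_wait(congested, poll_wait, blocking_call)
```

Because the congestion check lives only inside dispatch_wait, an already-active stack still costs a single status read.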
✅ Deploy Preview for electric-next ready!
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #4376      +/-   ##
==========================================
+ Coverage   55.84%   59.56%   +3.72%
==========================================
  Files         245      302      +57
  Lines       24847    29061    +4214
  Branches     6878     7859     +981
==========================================
+ Hits        13876    17311    +3435
- Misses      10957    11733     +776
- Partials       14       17       +3
Flags with carried forward coverage won't be shown.
Claude Code Review

Summary

Iteration-2 review. Most of the iteration-1 feedback is addressed (dedup with

Previous Review Status

Addressed:
Still open from iteration 1:
Issues Found

Important (Should Fix)

1. File:
Issue: The previous
Impact: When the monitor takes K retries to come up during boot, total wait ≈
Suggested fix: Compute a per-attempt remaining timeout from the PollWait deadline (or pass

2. File:
Issue: The PR title/body/changeset describe a single change: route the not-ready branch to
Impact: Reviewers landing on the changeset/PR description won't know to look at the non-congested path. If the timeout overshoot in #1 turns out to be a regression, the bisect target will be misleading.
Suggested fix: Either split patch 12 into its own PR with its own changeset entry, or update the PR description + changeset to call out the

Suggestions (Nice to Have)

3. File:
The floor was raised 1 → 10 in patch 11, presumably to match
Suggested fix: Either document the floor in the moduledoc (it's a hidden surprise for callers configuring small initial intervals), or apply the floor only in the StatusMonitor caller (e.g. pass

4. File:
After PollWait times out, this function does another

5. Test-only cast still part of public dispatch. File:
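The suggested fix for issue 1 is to derive each attempt's timeout from one shared deadline. In general form, that pattern looks like the Python sketch below. call_with_deadline and the attempt callback are hypothetical names, not code from the PR. The clock parameter is injectable only so the behaviour is easy to check.

```python
import time
from typing import Callable, Optional


def call_with_deadline(attempt: Callable[[float], Optional[str]],
                       timeout_s: float,
                       clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Retry `attempt` until it returns a value or one shared deadline passes.

    Each attempt is handed only the time left until the deadline, rather than a
    fresh full `timeout_s`. Provided each attempt honours the budget it is given,
    retries cannot stretch the total wait past `timeout_s`.
    Returns None on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        result = attempt(remaining)
        if result is not None:
            return result
```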
Issue Conformance

PR body covers the adaptive-polling work but does not mention the

The two related issues (#4266 umbrella, #4371 specific) are referenced in the PR body but not linked formally; adding

Review iteration: 2 | 2026-05-21
…add congested_threshold spec
…iter + forced flag
Summary
Implements #4371: replace the mailbox-based StatusMonitor.wait_until/3 not-ready path with adaptive per-process polling once the StatusMonitor's waiter set crosses a congestion threshold. Bottleneck 2 of the thundering-herd umbrella (#4266).

- Electric.PollWait primitive: per-process bounded polling with exponential backoff + jitter.
- StatusMonitor: an ETS-backed boolean that flips to true once MapSet.size(state.waiters) >= 100 and clears back to false once the set drains to 0. Wired into all three transitions: enqueue (handle_call({:wait_until, …})), bulk drain (maybe_reply_to_waiters/1), and single-timeout drain (:timeout_waiter handle_info).
- wait_until/3 dispatch: the fast path is unchanged. The not-ready and sleeping+block_on_conn_sleeping branches consult congested?/1. state.waiters.

Fast path for already-active stacks remains a single ETS read in the caller process.
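Per-process bounded polling with exponential backoff and jitter, as named for Electric.PollWait, can be sketched in Python as follows. The function name, the parameter names and every default are assumptions for illustration, not the values PollWait uses: 10 ms initial interval, 2x growth, 1 s cap, ±20% jitter and a 1 ms minimum sleep. Passing None for the timeout stands in for :infinity.

```python
import random
import time
from typing import Callable, Optional, Tuple


def poll_until(check: Callable[[], Optional[object]],
               timeout_ms: Optional[float],
               initial_ms: float = 10.0,
               factor: float = 2.0,
               max_ms: float = 1000.0,
               jitter: float = 0.2,
               min_sleep_ms: float = 1.0,
               sleep: Callable[[float], None] = lambda ms: time.sleep(ms / 1000),
               clock: Callable[[], float] = lambda: time.monotonic() * 1000,
               rng: Optional[random.Random] = None) -> Tuple[str, object]:
    """Poll `check` until it returns a non-None value or the timeout elapses.

    timeout_ms=None plays the role of :infinity: only readiness ends the loop.
    """
    if rng is None:
        rng = random.Random()
    deadline = None if timeout_ms is None else clock() + timeout_ms
    interval = initial_ms
    while True:
        value = check()
        if value is not None:
            return ("ok", value)
        now = clock()
        if deadline is not None and now >= deadline:
            return ("error", "timeout")
        # Jitter (+/- `jitter` fraction) keeps many pollers from waking in lockstep.
        delay = interval * (1 + rng.uniform(-jitter, jitter))
        # Floor guards against zero or negative sleeps.
        delay = max(delay, min_sleep_ms)
        if deadline is not None:
            # Never sleep past the deadline.
            delay = min(delay, deadline - now)
        sleep(delay)
        # Exponential growth, capped at max_ms.
        interval = min(interval * factor, max_ms)
```

Each caller runs its own loop against a cheap status read. Waiting callers therefore add no messages to the StatusMonitor mailbox.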
Test coverage highlights:
- PollWait: ready-immediately, timeout, monotonic backoff growth under per-call opts, jitter never produces zero/negative sleeps, :infinity timeout terminates only on ready.
- handle_call enqueue, clears via readiness-drain, clears via :timeout_waiter-drain, stays unset below threshold, gracefully returns false when the table doesn't exist.
- {:ok, :active} on readiness flip, returns {:ok, :read_only} on metadata-ready, returns {:error, _} on timeout, sleeping branch short-circuits before the congestion check, uncongested callers continue to use the GenServer.call path.
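The flag transitions these tests cover amount to a small hysteresis on the waiter count:

- set on enqueue
- cleared by a readiness drain, or by :timeout_waiter draining the set
- left unset below the threshold
- read as false when the table is missing

Below is a Python model in which a dict stands in for the ETS table. Only the ">= 100 sets, drain to 0 clears" rule comes from the PR. WaiterSet, congested and the method names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set

# Threshold from the PR description; everything else here is a hypothetical model.
CONGESTION_THRESHOLD = 100


class WaiterSet:
    """Toy model of StatusMonitor's waiter bookkeeping plus the congestion flag."""

    def __init__(self, flags: Dict[str, bool], stack_id: str,
                 threshold: int = CONGESTION_THRESHOLD) -> None:
        self.flags = flags            # dict standing in for the ETS status table
        self.stack_id = stack_id
        self.threshold = threshold
        self.waiters: Set[Hashable] = set()

    def _update_flag(self) -> None:
        if len(self.waiters) >= self.threshold:
            self.flags[self.stack_id] = True
        elif not self.waiters:
            self.flags[self.stack_id] = False
        # Between 1 and threshold - 1 waiters the flag keeps its previous value.

    def enqueue(self, waiter: Hashable) -> None:
        """Like the wait_until handle_call: register a blocked caller."""
        self.waiters.add(waiter)
        self._update_flag()

    def drain_all(self) -> List[Hashable]:
        """Like maybe_reply_to_waiters/1 once the stack is ready: reply to everyone."""
        replied = list(self.waiters)
        self.waiters.clear()
        self._update_flag()
        return replied

    def timeout_waiter(self, waiter: Hashable) -> None:
        """Like the :timeout_waiter handle_info: drop one caller."""
        self.waiters.discard(waiter)
        self._update_flag()


def congested(flags: Optional[Dict[str, bool]], stack_id: str) -> bool:
    """Reader side; a missing table or entry reads as not congested."""
    return bool(flags is not None and flags.get(stack_id, False))
```

Because the flag clears only at zero waiters, it does not flap while the waiter count hovers around the threshold.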