Skip to content

[SLOP(claude-opus-4-8-high)] perf(universaldb): batch leader apply and fold follower commit round-trips#5342

Open
MasterPtato wants to merge 1 commit into
stack/slopfix-docs-self-hosting-promote-postgres-from-experimental-to-recommended-oss-multi-node-backend-zsmysqukfrom
stack/slop-claude-opus-4-8-high-perf-universaldb-batch-leader-apply-and-fold-follower-commit-round-trips-vtrtqpyp
Open

[SLOP(claude-opus-4-8-high)] perf(universaldb): batch leader apply and fold follower commit round-trips#5342
MasterPtato wants to merge 1 commit into
stack/slopfix-docs-self-hosting-promote-postgres-from-experimental-to-recommended-oss-multi-node-backend-zsmysqukfrom
stack/slop-claude-opus-4-8-high-perf-universaldb-batch-leader-apply-and-fold-follower-commit-round-trips-vtrtqpyp

Conversation

@MasterPtato

@MasterPtato MasterPtato commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

No description provided.

@MasterPtato

MasterPtato commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@claude

claude Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

PR Review: perf(universaldb): batch leader apply and fold follower commit round-trips

This PR eliminates per-row Postgres round-trips in the leader's drain loop via three changes: (1) a new in-memory Overlay that folds a whole batch of winners before writing the merged write-set in O(1) statements, (2) splitting the lead loop into sibling renew_loop/drain_loop tasks so lease renewal can never be starved by a slow drain, and (3) a fast reply path where the leader encodes the commit outcome directly in the NOTIFY payload so followers resolve without a status SELECT. The fold logic and test coverage are thorough and the correctness reasoning is sound.

Two issues survived verification.


1. Rolling-update protocol break — 250 ms latency per commit during upgrade window

File: engine/packages/universaldb/src/driver/postgres/commit.rsparse_reply (≈ line 258) and engine/packages/universaldb/src/driver/postgres/resolver/mod.rsnotify_after_commit (≈ line 452)

The old leader sent just r.id.to_string() (e.g. "123") as the NOTIFY payload. The new parse_reply expects "<id>:committed:<version>" or "<id>:conflict". When an old-code leader replies to a new-code follower, parse_reply splits on :, gets a single segment, calls parts.next() for the verb and receives None — so the reply is silently treated as "not for me" and discarded. The follower then rides the 250 ms backstop poll for every commit for the entire upgrade window. There is no version negotiation or backward-compat branch.

Failure scenario: Any rolling deploy where some nodes are on old code and some are on new code causes every follower commit to add up to RESULT_POLL_INTERVAL = 250 ms of latency. Commits still succeed via the backstop poll (correctness is preserved), but under load this can noticeably spike p50/p99 tail latencies during the upgrade.

Suggested fix: In parse_reply, if the second segment is absent (no colon), fall back to matching the whole payload as a bare integer and treat it as a committed reply (the old semantics). This is a one-liner backward-compat guard that can be removed after all nodes are on the new code:

match parts.next() {
    Some("committed") => Some(ReplyOutcome::Committed),
    Some("conflict")  => Some(ReplyOutcome::Conflict),
    None              => Some(ReplyOutcome::Committed), // old-leader bare-id payload
    _                 => None,
}

Or document the expected upgrade path (drain queues before rolling).


2. RUST_LOG=universaldb::driver::postgres=debug committed to production template

Files: self-host/compose/template/src/docker-compose.ts (≈ line 295), self-host/dev-multinode/docker-compose.yml (lines 182, 223, 262)

The engine's tracing init (build_filter_from_spec) pre-bakes a global info directive, so this directive does not silence other modules — other logs still flow at info. However, it permanently enables debug-level universaldb output in every production deployment generated from this template. Debug logs include per-batch "udb leader processed commit batch" emissions and per-renewal "udb leader renewed lease" lines which will fire on every batch and every RENEW_INTERVAL tick indefinitely.

This looks like a temporary debugging aid that was not meant to be committed to the production template. The dev-multinode compose is fine for development, but the template change propagates to production configs on the next pnpm start regeneration.


Minor / low-priority notes (no action required)

  • drain_loop first-tick behaviour: poll_interval is not pre-consumed (unlike renew_loop which explicitly calls interval.tick().await before its loop). On each busy-to-idle transition, the select! fires the poll interval immediately, causing one extra no-op drain_batch call. This is one cheap empty-queue SELECT + rollback per idle transition — not a correctness issue, just a minor inconsistency with renew_loop's explicit tick-consume pattern.

  • Stale read_seq_high doc comment in failover.rs: Lines 54–55 say "A freshly elected leader must continue from at least this value, never regress below it." The PR intentionally breaks this invariant (and explains at length why it was wrong). No test assertion enforces it, so tests still pass, but the doc comment now contradicts the new recovery_floor logic and could mislead future authors.

@NathanFlurry NathanFlurry changed the title [SLOP(claude-opus-4-8-high)] perf(universaldb): batch leader apply and fold follower commit round-trips perf(universaldb): batch leader apply and fold follower commit round-trips Jun 26, 2026
@MasterPtato MasterPtato force-pushed the stack/slop-claude-opus-4-8-high-perf-universaldb-batch-leader-apply-and-fold-follower-commit-round-trips-vtrtqpyp branch from 234bfb3 to 099acee Compare June 29, 2026 23:18
@MasterPtato MasterPtato changed the title perf(universaldb): batch leader apply and fold follower commit round-trips [SLOP(claude-opus-4-8-high)] perf(universaldb): batch leader apply and fold follower commit round-trips Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant