ACE-186: mtree init publication ordering + exclude metadata from spock by danolivo · Pull Request #119 · pgEdge/ace

danolivo · 2026-05-15T12:09:28Z

Summary

Fixes the SQLSTATE 42704 - publication "ace_mtree_pub" does not exist error during ace mtree table-diff, plus two BuildMtree bugs surfaced while testing.

Root causes

Init ordering. MtreeInit was creating the replication slot before the publication was committed to WAL, so the slot's consistent_point preceded the publication. pgoutput's get_publication_oid then failed during change-callback replay. This is a pure pgoutput correctness issue that applies to any logical-replication setup, spock or vanilla.
Spock DDL replication. ACE's CREATE TABLE / CREATE PUBLICATION / CREATE OR REPLACE FUNCTION statements on its private pgedge_ace schema were being DDL-replicated to peer nodes. That produced a cascade of cross-node interactions:

ace_cdc_metadata got auto-added to peer nodes' default repsets via spock's CREATE TABLE event trigger,
leaking per-node LSNs cross-node and clobbering start_lsn.
CREATE OR REPLACE FUNCTION pgedge_ace.bytea_xor raced on pg_proc when n1's build replicated to n2 while
n2's build ran the same statement (SQLSTATE XX000 "tuple concurrently updated").
n2's SetupPublication DROP+CREATE replicated back to n1 and landed in n1's pgoutput stream mid-replay,
producing the original 42704.

Fix shape

Init-ordering split (commit 1): Split MtreeInit into three phases per node — Phase A commits schema + metadata table + publication and captures the publication's commit LSN; Phase B creates the slot (consistent point is now strictly after the publication); Phase C persists slot/start_lsn/pub_commit_lsn. A new pub_commit_lsn column (additive, ALTER TABLE ... ADD IF NOT EXISTS) lets processReplicationStream refuse to start a stream whose start_lsn predates it, surfacing an actionable error rather than the cryptic 42704. Empty pub_commit_lsn (legacy metadata rows) skips the check with a warning.

Spock DDL suppression (commit 5): Set spock.enable_ddl_replication = off and spock.include_ddl_repset = off as connection startup options on every ACE pool in applyRuntimeParams. ACE issues DDL exclusively against pgedge_ace and the ace_mtree_pub publication — never against user objects — so suppressing replication of its own DDL is correct unconditionally. Both GUCs are PGC_USERSET in spock so connection-level options override cluster defaults; both have dotted names so vanilla-PG instances without spock accept them as placeholder custom vars with no effect (dual-mode safe).

acquired conn to be released — but the tx's conn won't be released until the deferred tx.Rollback runs, and that defer cannot run until the function returns, which is blocked by pool.Close(). Classic defer-vs-explicit-close deadlock. Any failure inside BuildMtree's per-node body hung the process and swallowed the error. Wrapped the per-iteration body in a closure so defers are scoped per-iteration and fire in LIFO order (tx.Rollback first, then pool.Close).

BuildMtree redundant DDL (commit 4): createMtreeObjects was calling CreateSchema and CreateXORFunction at the start of every node's build, leftover from the original BuildMtree (July 2025, predating MtreeInit). MtreeInit's Phase A creates both, and BuildMtree implicitly requires init to have run. The redundancy was harmless on vanilla PG but racy on spock — the redundant CREATE OR REPLACE FUNCTION was the specific call hitting the XX000 race above. Removed both redundant calls. CREATE TABLE IF NOT EXISTS for ace_mtree_metadata stays — IF NOT EXISTS short-circuits without touching pg_class.

Regression coverage

tests/integration/cdc_init_ordering_test.go:

TestMtreeInitSlotIsAfterPublication — asserts the slot consistent point ≥ publication commit LSN on
every node after init.
TestProcessReplicationStreamRejectsStaleStartLSN — corrupts start_lsn to a value below pub_commit_lsn
and asserts UpdateFromCDC returns the new guard's actionable error.

Both run against the existing spock cluster fixture; no fixture changes required.

Upgrade notes

Existing broken installs are not auto-repaired. Users hitting the 42704 must run:

ace mtree teardown -d '<schema.table>'
ace mtree init -d
ace mtree build -d '<schema.table>'

If teardown itself fails, drop manually on every node:

DROP TABLE IF EXISTS pgedge_ace.ace_mtree__

CASCADE;
DROP TABLE IF EXISTS pgedge_ace.ace_cdc_metadata CASCADE;
DROP TABLE IF EXISTS pgedge_ace.ace_mtree_metadata CASCADE;
DROP PUBLICATION IF EXISTS ace_mtree_pub;
SELECT pg_drop_replication_slot('ace_mtree_slot') WHERE EXISTS (SELECT 1 FROM pg_replication_slots WHERE
slot_name = 'ace_mtree_slot');

--skip-cdc remains a valid interim workaround for pre-upgrade clusters.

coderabbitai · 2026-05-15T12:09:40Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds pub_commit_lsn to CDC metadata, captures publication commit LSN during node init, enforces a publication-commit ordering guard at replication stream startup, isolates metadata writes from Spock repsets via session-scoped repair mode and connection GUCs, and adds integration tests validating ordering and guards.

Changes

CDC Publication-Commit Ordering and Isolation

Layer / File(s)	Summary
SQL Templates and Query Functions for Publication-Commit Tracking `db/queries/templates.go`, `db/queries/queries.go`	New SQL templates and query functions add `pub_commit_lsn` support, `InitCDCMetadata` upsert with `pub_commit_lsn`, `CurrentWalInsertLSN`, and update `getCDCMetadata` and the metadata table schema to include `pub_commit_lsn`.
Node Initialization with Publication-Commit Capture `internal/consistency/mtree/merkle.go`	`MtreeInit` delegates per-node setup to `initOneNode`, which captures `pub_commit_lsn` during publication creation, creates replication slots outside transactions, and persists metadata under session-scoped `spock.repair_mode` with cleanup on failures.
Publication-Commit Guard and Metadata-Update Behavior `internal/infra/cdc/listen.go`	Stream processing now reads `pub_commit_lsn`, warns on legacy empty values, rejects stale `start_lsn` values older than `pub_commit_lsn`, and removes confirmed_flush_lsn-based backward-move logic.
Spock Helper Functions and Connection Params `internal/infra/cdc/setup.go`, `internal/infra/db/auth.go`	Adds `ExcludeMetadataFromSpockRepsets` and `SetSpockRepairMode` helpers (no-ops if Spock absent) and injects Spock GUCs (`spock.enable_ddl_replication=off`, `spock.include_ddl_repset=off`) on ACE connections.
Test Updates and Integration Tests `tests/integration/cdc_busy_table_test.go`, `tests/integration/cdc_init_ordering_test.go`	Adjusted call sites for expanded `GetCDCMetadata` signature and added integration tests asserting slot `confirmed_flush_lsn >= pub_commit_lsn` and that the replication stream rejects stale `start_lsn` relative to publication commit LSN.

Poem

Little rabbit hums beneath the nodes,
I mark the commit LSN where history grows.
Repair mode snug, the metadata stays near,
Repsets drop their tables, quiet and clear.
Tests hop in to prove the order here. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'ACE-186: mtree init publication ordering + exclude metadata from spock' directly addresses the two root causes fixed: publication ordering in MtreeInit and excluding metadata from Spock replication, matching the primary changes across all modified files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description clearly explains the root causes (init ordering and Spock DDL replication), the fix approach across multiple commits, regression test coverage, and upgrade notes—all directly related to the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ace-186

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codacy-production · 2026-05-15T12:10:49Z

Up to standards ✅

🟢 Issues 2 medium

Results:
2 new issues

Category Results

Complexity 2 medium

View in Codacy

🟢 Metrics 8 complexity · 4 duplication

Metric Results

Complexity 8

Duplication 4

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

db/queries/templates.go (1)
376-387: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Split this multi-statement template into separate Exec calls.

The template emits CREATE TABLE ...; ALTER TABLE ... as a single SQL string. pgx v5 uses the extended protocol by default, which does not support multiple statements in a single Exec() call. This will fail at runtime. Either split into two separate templates with two Exec calls, or explicitly configure the connection to use QueryExecModeSimpleProtocol (not found in the codebase).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@db/queries/templates.go` around lines 376 - 387, The template
CreateCDCMetadataTable currently emits two SQL statements ("CREATE TABLE ...;"
and "ALTER TABLE ...;") in one string which fails under pgx v5 extended
protocol; split this into two separate templates and Exec calls: keep the CREATE
TABLE statement in CreateCDCMetadataTable (or a renamed
createAceCDCMetadataTable) and move the ALTER TABLE ... ADD COLUMN IF NOT EXISTS
pub_commit_lsn into a new template (e.g., AddPubCommitLsnToAceCDCMetadata) and
call Exec for each template separately, referencing {{aceSchema}} and
ace_cdc_metadata in both so they run sequentially at migration time.

🧹 Nitpick comments (1)

internal/infra/cdc/setup.go (1)
90-99: 🏗️ Heavy lift

Consider constraining the API signature to enforce session pinning.

All current call sites correctly use pinned transactions (tx from pool.Begin()), but the function signature accepts a generic queries.DBQuerier. While the wrapper at db/queries/queries.go:3007 documents that Set(true) and Set(false) must pair on the same session, the generic API allows future callers to pass a pool handle and inadvertently separate the toggles across sessions, poisoning one connection.

Consider adding compile-time enforcement (e.g., a type wrapper or interface constraint) to make session-pinning explicit and prevent misuse.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/infra/cdc/setup.go` around lines 90 - 99, SetSpockRepairMode
currently accepts a generic queries.DBQuerier which lets callers pass a pool
handle and accidentally toggle repair mode on different sessions; change the API
to require a session-bound type (e.g., accept the transaction/session interface
used by pool.Begin() such as a queries.DBTx or a new interface like
queries.SessionQuerier) so SetSpockRepairMode(ctx context.Context, session
queries.SessionQuerier, on bool) enforces compile-time session pinning; update
all callers to pass the tx returned from Begin() and call
queries.CheckSpockInstalled and queries.SetSpockRepairMode on that session so
Set(true)/Set(false) always occur on the same connection.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/consistency/mtree/merkle.go`:
- Around line 866-870: The call to queries.CurrentWalInsertLSN that sets
pubCommitLSN is happening while Phase A's transaction (tx) is still open; move
the WAL LSN capture so it occurs after Phase A commits and before Phase B starts
so the read reflects the commit LSN. Concretely: finish and commit the Phase A
transaction first, then call queries.CurrentWalInsertLSN (using an appropriate
context/connection outside the committed tx or a new read-only tx) to set
pubCommitLSN, and only then proceed to Phase B logic (the code that relies on
pubCommitLSN and uses tx/slot-guard checks).

In `@internal/infra/cdc/listen.go`:
- Around line 467-489: After calling SetSpockRepairMode(processingCtx, tx, true)
you must immediately schedule cleanup so session-scoped repair_mode is always
disabled; add defer SetSpockRepairMode(processingCtx, tx, false) right after the
successful true call (before calling queries.UpdateCDCMetadata or committing tx)
so any early return or error path will run the disable. Update or remove the
later explicit SetSpockRepairMode(..., false) calls as needed to avoid
double-disabling or handle their errors gracefully; ensure the deferred call
runs regardless of conn.Close or tx.Commit failures and continues to return the
connection to the pool in a neutral state.

---

Outside diff comments:
In `@db/queries/templates.go`:
- Around line 376-387: The template CreateCDCMetadataTable currently emits two
SQL statements ("CREATE TABLE ...;" and "ALTER TABLE ...;") in one string which
fails under pgx v5 extended protocol; split this into two separate templates and
Exec calls: keep the CREATE TABLE statement in CreateCDCMetadataTable (or a
renamed createAceCDCMetadataTable) and move the ALTER TABLE ... ADD COLUMN IF
NOT EXISTS pub_commit_lsn into a new template (e.g.,
AddPubCommitLsnToAceCDCMetadata) and call Exec for each template separately,
referencing {{aceSchema}} and ace_cdc_metadata in both so they run sequentially
at migration time.

---

Nitpick comments:
In `@internal/infra/cdc/setup.go`:
- Around line 90-99: SetSpockRepairMode currently accepts a generic
queries.DBQuerier which lets callers pass a pool handle and accidentally toggle
repair mode on different sessions; change the API to require a session-bound
type (e.g., accept the transaction/session interface used by pool.Begin() such
as a queries.DBTx or a new interface like queries.SessionQuerier) so
SetSpockRepairMode(ctx context.Context, session queries.SessionQuerier, on bool)
enforces compile-time session pinning; update all callers to pass the tx
returned from Begin() and call queries.CheckSpockInstalled and
queries.SetSpockRepairMode on that session so Set(true)/Set(false) always occur
on the same connection.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5811b76d-c669-4f27-abcf-7fa388daaab6

📥 Commits

Reviewing files that changed from the base of the PR and between a1b3388 and d0e483c.

📒 Files selected for processing (7)

db/queries/queries.go
db/queries/templates.go
internal/consistency/mtree/merkle.go
internal/infra/cdc/listen.go
internal/infra/cdc/setup.go
tests/integration/cdc_busy_table_test.go
tests/integration/cdc_init_ordering_test.go

ibrarahmad

Diagnosis and ordering invariant look sound; tests cover the right invariants. Notes inline. PR-level nit: the body says "two commits (fix-first, then test)" but the branch has four; either update the description or rebase back to two for a bisect-clean history.

…ming) Round of fixes for the human review comments on PR #119: - flushMetadata (listen.go:542): same pool-poisoning hole as Site 1. Port to the borrowed-*pgxpool.Conn pattern so Set(false) is deferred on the live session before Release. This site fires every 10 s on the periodic-flush path AND on shutdown, so it was the highest-risk remaining leak. - MtreeInit Phase C (merkle.go:884): same hole. Borrow the conn, defer Set(false) on it, run the InitCDCMetadata write through a tx opened on the same conn. - Guard scope comment (listen.go:103): clarify that the pub_commit_lsn guard is a tripwire for partial-state-on-upgrade and for cross-node leaks whose LSN lands below local pub_commit_lsn — NOT a general leak detector. The actual defenses against cross-node leak are repset exclusion + repair_mode wrapping every metadata write. Reader-of-the-future shouldn't mistake this for an end-to-end check. - Test timing (cdc_init_ordering_test.go:186): bump the sentinel-propagation window from 3 s to 10 s. 3 s was tight under load on slow CI; 10 s gives Spock apply latency real headroom without making the test materially slower in the common case. Two non-blocking comments deliberately deferred to follow-ups: - CheckSpockInstalled cache on the pool (~12 RTTs/min savings) - Hoist 'ace_cdc_metadata' relname into a shared constant CodeRabbit's pubCommitLSN-timing finding remains skipped: the invariant chain captured_mid_tx < commit_lsn(Phase A) <= slot.consistent_point holds for freshly-created slots, which is the only state Phase B produces. The hypothetical "broken slot whose consistent_point sits between captured and commit" cannot arise from this code path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/infra/cdc/listen.go`:
- Around line 568-575: The deferred reset of spock.repair_mode in flushMetadata
currently uses the caller's cancelable ctx which can be canceled before defer
runs; change the defer to use a non-cancelable context (like the pattern in
processReplicationStream) so the reset always runs: keep the initial call
SetSpockRepairMode(ctx, c, true) as-is, then create a non-cancelable context
(e.g. ctxNoCancel := context.WithoutCancel(ctx)) and call
SetSpockRepairMode(ctxNoCancel, c, false) inside the defer; reference
SetSpockRepairMode and flushMetadata to locate and update the defer.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 582e9095-53ab-40a8-a71b-ea48d3c87de7

📥 Commits

Reviewing files that changed from the base of the PR and between d0e483c and 5862fa7.

📒 Files selected for processing (3)

internal/consistency/mtree/merkle.go
internal/infra/cdc/listen.go
tests/integration/cdc_init_ordering_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

tests/integration/cdc_init_ordering_test.go
internal/consistency/mtree/merkle.go

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/integration/cdc_init_ordering_test.go (1)

184-186: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale comment: mentions "3 s window" but code uses 10 seconds.

The comment references the original 3-second timeout, but the deadline was bumped to 10 seconds per prior review feedback.

📝 Suggested fix

 // assertSentinelDoesNotPropagate writes a sentinel start_lsn on n1 and
-// verifies it never appears on n2 within a 3 s window. Under bug 2 the
+// verifies it never appears on n2 within a 10 s window. Under bug 2 the
 // sentinel would propagate via Spock apply within seconds.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/cdc_init_ordering_test.go` around lines 184 - 186, Update
the stale comment above assertSentinelDoesNotPropagate to reflect the current 10
second timeout (instead of "3 s window"); locate the comment that mentions "3 s
window" in the assertSentinelDoesNotPropagate test and change it to state "10 s
window" or "10-second window" so it matches the test's actual deadline.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/integration/cdc_init_ordering_test.go`:
- Around line 184-186: Update the stale comment above
assertSentinelDoesNotPropagate to reflect the current 10 second timeout (instead
of "3 s window"); locate the comment that mentions "3 s window" in the
assertSentinelDoesNotPropagate test and change it to state "10 s window" or
"10-second window" so it matches the test's actual deadline.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c2d4b8b7-a44a-434f-9397-23eb7b69ae3e

📥 Commits

Reviewing files that changed from the base of the PR and between 5862fa7 and 5f50b9a.

📒 Files selected for processing (6)

db/queries/queries.go
db/queries/templates.go
internal/consistency/mtree/merkle.go
internal/infra/cdc/listen.go
internal/infra/cdc/setup.go
tests/integration/cdc_init_ordering_test.go

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@db/queries/queries.go`:
- Around line 2919-2928: GetCDCMetadata currently fails when the
ace_cdc_metadata table lacks pub_commit_lsn; modify the function
(GetCDCMetadata) to detect Postgres "undefined_column" (SQLSTATE 42703) from the
db.QueryRow(...).Scan(...) error and treat it as a non-fatal condition: if the
error is a Postgres error with Code == "42703" (use errors.As to check for the
driver-specific error type such as *pgconn.PgError or *pq.Error depending on the
codebase), return slotName, startLSN, tables and an empty pubCommitLSN with nil
error so callers can run the legacy fallback; otherwise return the original
error as before. Ensure the detection targets the error code "42703"
specifically.

In `@internal/infra/cdc/listen.go`:
- Around line 113-126: Update the recovery guidance text in the ace_cdc_metadata
LSN checks so it includes "ace mtree build" alongside "ace mtree teardown" and
"ace mtree init"; specifically change the message passed to logger.Warn (when
pubCommitLSNStr == "") and the error returned in the startLSN < pubCommitLSN
branch where the fmt.Errorf is created (these are the strings used in
logger.Warn, logger.Error and the fmt.Errorf in listen.go) to list "ace mtree
teardown", "ace mtree init", and "ace mtree build" in that order so users are
directed to run build after init.

In `@internal/infra/cdc/setup.go`:
- Around line 68-71: The schema-qualified table name constructed in the variable
qualified (currently fmt.Sprintf("%s.ace_cdc_metadata", aceSchema)) must use
proper SQL identifier quoting so Spock sees the exact relation; update the code
that builds qualified before calling queries.RepsetRemoveTable(ctx, db, set,
qualified) to quote aceSchema (and the ace_cdc_metadata identifier) with the
project’s identifier-quoting helper (or a safe QuoteIdentifier function) and
then pass the quoted qualified name (e.g., quotedSchema + "." + quotedTable) to
RepsetRemoveTable so identifiers with capitals or special chars resolve
correctly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 81b5d968-23d1-46f0-bf8b-f18b6c962f24

📥 Commits

Reviewing files that changed from the base of the PR and between 5f50b9a and 51b5c1c.

📒 Files selected for processing (7)

db/queries/queries.go
db/queries/templates.go
internal/consistency/mtree/merkle.go
internal/infra/cdc/listen.go
internal/infra/cdc/setup.go
tests/integration/cdc_busy_table_test.go
tests/integration/cdc_init_ordering_test.go

Fixes 'publication "ace_mtree_pub" does not exist' (SQLSTATE 42704) during 'ace mtree table-diff'. Init order: the replication slot was being created before the publication was committed to WAL. The slot's consistent point therefore preceded the publication, and pgoutput's get_publication_oid failed during change-callback replay. MtreeInit is now split into three phases per node: Phase A commits schema + metadata table + publication and captures the publication commit LSN; Phase B creates the slot (its consistent point is now strictly after the publication); Phase C persists slot/start_lsn/ pub_commit_lsn. A new pub_commit_lsn column is added to ace_cdc_metadata (additive, ALTER TABLE IF NOT EXISTS). processReplicationStream refuses to open a stream whose start_lsn is older than pub_commit_lsn, returning an actionable error instead of silently rewinding to a pre-publication LSN. Empty pub_commit_lsn (legacy metadata rows) skips the check with a warning. Existing broken installs are not auto-repaired; users must run 'ace mtree teardown' + 'ace mtree init' + 'ace mtree build' after upgrade. '--skip-cdc' remains a valid interim workaround for pre-upgrade clusters.

Two integration tests against the spock cluster fixture: - TestMtreeInitSlotIsAfterPublication: after MtreeInit, every node's slot consistent_point must be >= the publication's commit LSN recorded in ace_cdc_metadata.pub_commit_lsn. This is the ordering invariant the 3-phase init split restores. - TestProcessReplicationStreamRejectsStaleStartLSN: corrupts n1's start_lsn to a value below pub_commit_lsn and asserts UpdateFromCDC returns the new guard's actionable error rather than the cryptic SQLSTATE 42704 that pgoutput would otherwise emit.

BuildMtree's per-node body opens a tx with defer tx.Rollback in the outer function scope, then on any error explicitly calls pool.Close() and returns the error. pool.Close() waits on puddle's WaitGroup for every acquired conn to be released — but the conn the tx holds will not be released until the deferred tx.Rollback runs, and that defer cannot run until the function returns, which is being blocked by pool.Close(). Classic Go defer-vs-explicit-close deadlock: the process hangs and the underlying error never surfaces. The shape repeats at every error site inside the per-iteration body (AlterPublicationAddTable, GetCDCMetadata, UpdateCDCMetadata, createMtreeObjects, insertBlockRanges, computeLeafHashes, buildParentNodes, tx.Commit). All eight are reachable; any one being hit hangs the build. Wrap the per-iteration body in a closure so defers are scoped to one iteration. defer pool.Close lands above defer tx.Rollback in registration order, so LIFO fires tx.Rollback first (releases the conn) then pool.Close (no acquired conns to wait for). The explicit pool.Close calls inside each error branch go away — defers handle it uniformly on success and failure. Confirmed via SIGQUIT goroutine dump showing main blocked in puddle.Pool.Close.deferwrap1 → WaitGroup.Wait, called from BuildMtree's createMtreeObjects error branch. The same defer-stacking pattern exists in the loop further down BuildMtree (around the UpdateMtree block) and in a few other *.Close sites elsewhere in this file. They don't deadlock today because either no tx is held or the order happens to be safe, but the closure pattern would make them uniformly correct. Left for a follow-up so this commit stays focused on the reproduced bug.

createMtreeObjects has called CreateSchema and CreateXORFunction at the start of every node's build iteration since the original BuildMtree commit (a05445e, July 2025) — back when build was the entry point and had to set up the schema and XOR function itself. When CDC-based MtreeInit landed (fd02c30, August 2025), those calls became redundant: init's Phase A creates both, and BuildMtree implicitly requires init to have run (it consumes the CDC publication, slot, and metadata that init sets up). The build-side calls were never cleaned up. The redundancy was harmless on vanilla PG. On Spock-replicated clusters it surfaces as SQLSTATE XX000 "tuple concurrently updated": CREATE OR REPLACE FUNCTION unconditionally writes pg_proc, and Spock's DDL apply of n1's BuildMtree replicating to n2 races with n2's BuildMtree running the same statement. The race was masked before the previous commit by the BuildMtree pool.Close deadlock that swallowed the error and hung the process. CREATE TABLE IF NOT EXISTS for ace_mtree_metadata stays — IF NOT EXISTS short-circuits without touching pg_class when the relation already exists, so the same race doesn't apply.

ACE issues DDL exclusively against its own pgedge_ace schema and the ace_mtree_pub publication — never against user objects — and every piece of that DDL is intentionally per-node. Letting Spock replicate it produces a chain of cross-node interactions: - ace_cdc_metadata gets auto-added to peer nodes' default repsets by Spock's CREATE TABLE event trigger, leaking per-node LSNs cross-node. - CREATE OR REPLACE FUNCTION on pgedge_ace.bytea_xor races on pg_proc when n1's build replicates to n2 while n2's build runs the same statement (SQLSTATE XX000). - n2's SetupPublication DROP+CREATE replicates back to n1, landing in n1's pgoutput stream mid-replay and producing the original SQLSTATE 42704 "publication does not exist" the PR was meant to fix. Set spock.enable_ddl_replication = off and spock.include_ddl_repset = off as connection startup options on every ACE pool. Both are PGC_USERSET in Spock, so connection-level options override cluster defaults. Both are dotted custom GUCs, so PostgreSQL accepts them on vanilla-PG instances without Spock loaded — the parameters become placeholder custom variables with no effect, keeping the dual-mode native-PG path intact. Once this is verified in production, the per-DDL workarounds in earlier commits (post-init repset cleanup sweep, SetSpockRepairMode wrapping at every metadata-write site, ExcludeMetadataFromSpockRepsets itself) all become removable. Leaving them in place for now as defense-in-depth.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

internal/infra/cdc/listen.go (1)
113-126: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include ace mtree build in the recovery guidance.

The supplied reviewer notes say broken installs need teardown, init, and build. Stopping the warning/error text at init still leaves users with an unbuilt tree after they follow the message.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/infra/cdc/listen.go` around lines 113 - 126, The user-facing
recovery guidance strings in the error and warning paths (logger.Warn call and
the fmt.Errorf returned when startLSN < pubCommitLSN in listen.go) currently
instruct to run 'ace mtree teardown' and 'ace mtree init' but omit the required
'ace mtree build' step; update both the logger.Warn message (the call to
logger.Warn referencing ace_cdc_metadata and publication) and the returned
fmt.Errorf message (the error constructed when startLSN < pubCommitLSN) to
append "and 'ace mtree build'" so the guidance reads "run 'ace mtree teardown',
'ace mtree init' and 'ace mtree build' to recover".
internal/consistency/mtree/merkle.go (1)
861-869: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't persist a pre-commit WAL position as pub_commit_lsn.

CurrentWalInsertLSN is still called while Phase A is open. The commit record lands later, so a stale start_lsn that falls between this captured value and the real commit LSN will bypass the new guard and can still reproduce SQLSTATE 42704. Capture the WAL position only after Phase A commits.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/consistency/mtree/merkle.go` around lines 861 - 869, The code
captures pubCommitLSN via CurrentWalInsertLSN while Phase A is still open,
risking a pre-commit WAL position; move the capture so it runs after the Phase A
transaction is committed: call tx.Commit(m.Ctx) first (or ensure the transaction
has committed) and only then call queries.CurrentWalInsertLSN(m.Ctx, tx) to set
pubCommitLSN, updating any error handling and the surrounding logic in merkle.go
(references: pubCommitLSN, CurrentWalInsertLSN, tx.Commit) so the persisted
pub_commit_lsn reflects the post-commit WAL position.
db/queries/queries.go (1)
2919-2928: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle pre-migration ace_cdc_metadata layouts before reading pub_commit_lsn.

This path still hard-fails on upgraded clusters whose metadata table predates the new column: the SELECT fails before callers can treat pubCommitLSN == "" as the legacy case. Catch undefined_column and retry the legacy 3-column read, or run the additive migration before every read.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@db/queries/queries.go` around lines 2919 - 2928, GetCDCMetadata currently
fails when the pub_commit_lsn column is missing on older schemas; modify
GetCDCMetadata to detect the undefined_column error (SQLSTATE 42703 or error
text containing "undefined_column") returned from db.QueryRow when using
SQLTemplates.GetCDCMetadata, and on that specific error re-run a legacy 3-column
query (e.g. render and QueryRow using a legacy template like
SQLTemplates.GetCDCMetadataLegacy or a SQL string selecting only slot_name,
start_lsn, tables), Scan into slotName, startLSN, tables, set pubCommitLSN = ""
and return nil; for any other error continue to return the original error.
Ensure you reference GetCDCMetadata, RenderSQL, SQLTemplates.GetCDCMetadata and
the DBQuerier.QueryRow/Scan calls when implementing the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/consistency/mtree/merkle.go`:
- Around line 1453-1535: During the per-node loop you only defer pool.Close()
for the node handled in the closure, leaving other entries in the pools map open
on early return; fix by removing each pool from the pools map once you take
ownership and also add an outer cleanup after the loop to close any remaining
pools. Specifically, inside the closure after obtaining pool :=
pools[nodeInfo["Name"].(string)] and deferring pool.Close(), call delete(pools,
nodeInfo["Name"].(string)) so the map no longer retains that pool; then after
the for _, nodeInfo := range m.ClusterNodes loop finishes (or on error return
path), iterate remaining entries in pools and call Close() on each to ensure
none are leaked. Refer to symbols: pools, nodeInfo, pool.Close(), the
per-iteration closure around the loop and m.ClusterNodes.

---

Duplicate comments:
In `@db/queries/queries.go`:
- Around line 2919-2928: GetCDCMetadata currently fails when the pub_commit_lsn
column is missing on older schemas; modify GetCDCMetadata to detect the
undefined_column error (SQLSTATE 42703 or error text containing
"undefined_column") returned from db.QueryRow when using
SQLTemplates.GetCDCMetadata, and on that specific error re-run a legacy 3-column
query (e.g. render and QueryRow using a legacy template like
SQLTemplates.GetCDCMetadataLegacy or a SQL string selecting only slot_name,
start_lsn, tables), Scan into slotName, startLSN, tables, set pubCommitLSN = ""
and return nil; for any other error continue to return the original error.
Ensure you reference GetCDCMetadata, RenderSQL, SQLTemplates.GetCDCMetadata and
the DBQuerier.QueryRow/Scan calls when implementing the change.

In `@internal/consistency/mtree/merkle.go`:
- Around line 861-869: The code captures pubCommitLSN via CurrentWalInsertLSN
while Phase A is still open, risking a pre-commit WAL position; move the capture
so it runs after the Phase A transaction is committed: call tx.Commit(m.Ctx)
first (or ensure the transaction has committed) and only then call
queries.CurrentWalInsertLSN(m.Ctx, tx) to set pubCommitLSN, updating any error
handling and the surrounding logic in merkle.go (references: pubCommitLSN,
CurrentWalInsertLSN, tx.Commit) so the persisted pub_commit_lsn reflects the
post-commit WAL position.

In `@internal/infra/cdc/listen.go`:
- Around line 113-126: The user-facing recovery guidance strings in the error
and warning paths (logger.Warn call and the fmt.Errorf returned when startLSN <
pubCommitLSN in listen.go) currently instruct to run 'ace mtree teardown' and
'ace mtree init' but omit the required 'ace mtree build' step; update both the
logger.Warn message (the call to logger.Warn referencing ace_cdc_metadata and
publication) and the returned fmt.Errorf message (the error constructed when
startLSN < pubCommitLSN) to append "and 'ace mtree build'" so the guidance reads
"run 'ace mtree teardown', 'ace mtree init' and 'ace mtree build' to recover".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1d519589-d288-4641-bff9-3afbd4610262

📥 Commits

Reviewing files that changed from the base of the PR and between 51b5c1c and 93511c1.

📒 Files selected for processing (8)

db/queries/queries.go
db/queries/templates.go
internal/consistency/mtree/merkle.go
internal/infra/cdc/listen.go
internal/infra/cdc/setup.go
internal/infra/db/auth.go
tests/integration/cdc_busy_table_test.go
tests/integration/cdc_init_ordering_test.go

✅ Files skipped from review due to trivial changes (1)

internal/infra/db/auth.go

BuildMtree pre-builds a pool per node up front in the row-estimate loop, then hands the populated map to the per-node build loop which closes each pool only as its own iteration completes. If a later iteration errors out and exits, pools for nodes that loop never reached stay open for the rest of the process lifetime — a real leak in HTTP API server invocations, mostly invisible under CLI invocation where the process exits anyway. Add a defer at the top of the function that closes every entry in the pools map on exit. pgxpool.Pool.Close uses sync.Once internally, so the per-iteration closes still take effect promptly and this defer is a no-op for pools already closed by them. Pre-existing leak; the closure refactor for the pool.Close deadlock didn't introduce or remove it.

The pub_commit_lsn column is added by MtreeInit's Phase A via the additive ALTER TABLE in CreateCDCMetadataTable. On clusters upgraded to this binary without re-running init — say, going straight to mtree build or mtree table-diff — the table still has the old 3-column layout. The previous SELECT referenced pub_commit_lsn as a column, so PostgreSQL raised SQLSTATE 42703 ("column does not exist") before the listen.go fallback for an empty pub_commit_lsn could fire. CDC startup / build aborted instead of warning and continuing. Extract pub_commit_lsn via to_jsonb(row) ->> 'pub_commit_lsn' instead of a direct column reference. The SQL parses against any schema version; on legacy rows the JSON object has no key for the column, ->> returns NULL, COALESCE produces the empty string the listen.go guard already treats as "invariant uncheckable — warn and skip". The next MtreeInit still backfills the real column via the additive ALTER TABLE in CreateCDCMetadataTable.

The two recovery messages in processReplicationStream told users to run 'ace mtree teardown' and 'ace mtree init' but stopped there. After init, the merkle tree itself is empty; a subsequent table-diff would either fail or produce incorrect results. 'ace mtree build' is the final step that repopulates the tree and makes table-diff usable again. Updated both messages to spell out the full three-step sequence.

danolivo requested a review from mason-sharp May 15, 2026 12:09

danolivo changed the title ~~fix: mtree init publication ordering + exclude metadata from spock~~ ACE-186: mtree init publication ordering + exclude metadata from spock May 15, 2026

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

Comment thread internal/consistency/mtree/merkle.go

Comment thread internal/infra/cdc/listen.go Outdated

ibrarahmad self-requested a review May 16, 2026 02:20

ibrarahmad reviewed May 16, 2026

View reviewed changes

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread internal/infra/cdc/listen.go Outdated

danolivo force-pushed the ace-186 branch 3 times, most recently from dbd3724 to dcc6d9c Compare May 19, 2026 09:42

danolivo requested a review from ibrarahmad May 19, 2026 10:06

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

danolivo force-pushed the ace-186 branch from 5f50b9a to 04b11af Compare May 20, 2026 06:51

ibrarahmad approved these changes May 21, 2026

View reviewed changes

mason-sharp force-pushed the ace-186 branch from 04b11af to 51b5c1c Compare May 24, 2026 23:35

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

Comment thread db/queries/queries.go

Comment thread internal/infra/cdc/listen.go Outdated

Comment thread internal/infra/cdc/setup.go Outdated

Andrei Lepikhov and others added 5 commits May 24, 2026 19:04

mason-sharp force-pushed the ace-186 branch from 51b5c1c to 93511c1 Compare May 25, 2026 02:14

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

Comment thread internal/consistency/mtree/merkle.go

mason-sharp added 3 commits May 24, 2026 19:28

Conversation

danolivo commented May 15, 2026 • edited by mason-sharp Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root causes

Fix shape

Regression coverage

Upgrade notes

Uh oh!

coderabbitai Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

codacy-production Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ibrarahmad left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

danolivo commented May 15, 2026 •

edited by mason-sharp

Loading

coderabbitai Bot commented May 15, 2026 •

edited

Loading

codacy-production Bot commented May 15, 2026 •

edited

Loading

ibrarahmad left a comment •

edited

Loading