Fix fill skipping old rows when dest has mirrored data by stefanmb · Pull Request #22 · workos/pgslice

stefanmb · 2026-02-19T22:51:54Z

Summary

When enable_mirroring runs before fill, the mirroring trigger copies recent rows into the intermediate table. When fill then runs without --start, it calls destTable.maxId() which returns a high ID (from the mirrored row). Fill queries WHERE id > <high_id>, finds nothing, and returns "nothing to fill". All old rows below that ID are silently skipped.

Root cause

The non-swapped, no---start branch assumed dest was empty and used destTable.maxId() as a resume cursor. With mirroring active, dest already has scattered high-ID rows, so this cursor skips everything below.

Fix

1. Start from source min ID instead of dest max ID (src/filler.ts)

The non-swapped, no---start branch now uses sourceTable.minId(tx, timeFilter) instead of destTable.maxId(tx), matching the pattern already used by Synchronizer.init().

2. Track source progress independently in batch loop (src/filler.ts, src/types.ts)

Restructured the batch CTE into two stages (source_batch SELECT, then inserted INSERT from it). The loop now terminates on sourceCount === 0 (no more source rows) instead of rowsInserted === 0 (no rows inserted). Without this, re-running fill after a partial completion would hit an all-duplicate first batch, RETURNING would produce 0 rows, and the loop would stop before reaching uncopied rows.

3. Fix minId() ordering for non-monotonic PKs (src/table.ts)

Backs out the ORDER BY time LIMIT 1 optimization from #21. That optimization was fast when a time index exists, but assumes PK/time monotonicity and can skip rows with smaller PKs and later timestamps (ULID skew, backfills, manual timestamps). We now order by PK to preserve correctness. A new test ("fills rows with non-monotonic PK and created_at") demonstrates the #21 failure mode where ordering by time picks a higher PK and skips an in-range row. This also fixes Synchronizer.init() which calls the same method.

Resume behavior change

Previously, re-running fill without --start would auto-resume from destTable.maxId(). This was already broken when mirroring was active (the common case), since mirrored rows inflated the max ID and caused fill to skip everything below it.

Now, fill without --start always starts from the beginning of the source table. ON CONFLICT DO NOTHING makes re-scanning already-copied rows cheap (no writes). For very large tables, resume explicitly using the endId from the batch log:

/* batch 1: 10000 rows inserted, endId=10000 */
/* batch 2: 10000 rows inserted, endId=20000 */
/* batch 3: 10000 rows inserted, endId=30000 */  <-- interrupted here

Resume with --start 30000.

Files changed

File	Change
`src/filler.ts`	Start from source min ID; two-CTE batch with `sourceCount` termination
`src/types.ts`	Added `sourceCount` to `FillBatchResult`
`src/table.ts`	`minId()` orders by PK only; docstring updated
`src/pgslice.ts`	Initialize fill/synchronize inside transaction
`src/commands/fill.ts`	Batch log includes `rowsInserted` and `endId`
`src/commands/fill.test.ts`	Updated assertions for new log format
`src/fill.test.ts`	New tests for mirrored data, partial fill resume, non-monotonic PKs
`README.md`	Document fill startup scan and resume guidance

Test plan

All 300 tests pass (297 existing + 3 new) — prior run
New test: "fills old rows when dest has recent mirrored row" — the original bug
New test: "copies remaining rows when first batch is all duplicates" — partial fill resume
New test: "fills rows with non-monotonic PK and created_at" — demonstrates the Optimize minId query and improve test coverage #21 failure mode

When enable_mirroring runs before fill, the mirroring trigger copies recent rows into the intermediate table. Fill without --start called destTable.maxId() which returned a high ID from the mirrored row, causing WHERE id > <high_id> to find nothing and skip all old rows. - Start from sourceTable.minId() instead of destTable.maxId() - Restructure batch CTE to track source progress independently, preventing early termination on all-duplicate batches - Fix minId() to order by PK only (not time+PK) for non-monotonic PKs - Add endId to CLI batch log output for resume support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This avoids 30s timeouts on large table startup scans and documents resume guidance. Co-authored-by: Cursor <cursoragent@cursor.com>

Let fill and synchronize minId lookups use the session timeout and drop the README note about temporarily raising statement_timeout. Co-authored-by: Cursor <cursoragent@cursor.com>

mthadley

Looks good! Just a few small changes before approving.

src/fill.test.ts

src/sql-utils.ts

stefanmb and others added 2 commits February 19, 2026 17:51

Raise startup statement timeout for minId lookups.

7ec7ab8

This avoids 30s timeouts on large table startup scans and documents resume guidance. Co-authored-by: Cursor <cursoragent@cursor.com>

stefanmb force-pushed the fix-fill-mirrored-data branch from 8049136 to 7ec7ab8 Compare February 20, 2026 15:05

Remove startup statement timeout override

8064b02

Let fill and synchronize minId lookups use the session timeout and drop the README note about temporarily raising statement_timeout. Co-authored-by: Cursor <cursoragent@cursor.com>

stefanmb marked this pull request as ready for review February 20, 2026 19:01

stefanmb requested a review from mthadley February 20, 2026 19:01

mthadley requested changes Feb 20, 2026

View reviewed changes

src/fill.test.ts Outdated Show resolved Hide resolved

src/sql-utils.ts Outdated Show resolved Hide resolved

review comments

5e929a8

mthadley approved these changes Feb 20, 2026

View reviewed changes

stefanmb merged commit 7796a2f into master Feb 20, 2026
7 checks passed

stefanmb deleted the fix-fill-mirrored-data branch February 20, 2026 20:07

claude bot added the claude-code-assisted label Feb 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix fill skipping old rows when dest has mirrored data#22

Fix fill skipping old rows when dest has mirrored data#22
stefanmb merged 4 commits intomasterfrom
fix-fill-mirrored-data

stefanmb commented Feb 19, 2026 •

edited

Loading

Uh oh!

mthadley left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

stefanmb commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Resume behavior change

Files changed

Test plan

Uh oh!

mthadley left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

stefanmb commented Feb 19, 2026 •

edited

Loading