Skip to content

Fix fill skipping old rows when dest has mirrored data#22

Merged
stefanmb merged 4 commits intomasterfrom
fix-fill-mirrored-data
Feb 20, 2026
Merged

Fix fill skipping old rows when dest has mirrored data#22
stefanmb merged 4 commits intomasterfrom
fix-fill-mirrored-data

Conversation

@stefanmb
Copy link

@stefanmb stefanmb commented Feb 19, 2026

Summary

When enable_mirroring runs before fill, the mirroring trigger copies recent rows into the intermediate table. When fill then runs without --start, it calls destTable.maxId() which returns a high ID (from the mirrored row). Fill queries WHERE id > <high_id>, finds nothing, and returns "nothing to fill". All old rows below that ID are silently skipped.

Root cause

The non-swapped, no---start branch assumed dest was empty and used destTable.maxId() as a resume cursor. With mirroring active, dest already has scattered high-ID rows, so this cursor skips everything below.

Fix

1. Start from source min ID instead of dest max ID (src/filler.ts)

The non-swapped, no---start branch now uses sourceTable.minId(tx, timeFilter) instead of destTable.maxId(tx), matching the pattern already used by Synchronizer.init().

2. Track source progress independently in batch loop (src/filler.ts, src/types.ts)

Restructured the batch CTE into two stages (source_batch SELECT, then inserted INSERT from it). The loop now terminates on sourceCount === 0 (no more source rows) instead of rowsInserted === 0 (no rows inserted). Without this, re-running fill after a partial completion would hit an all-duplicate first batch, RETURNING would produce 0 rows, and the loop would stop before reaching uncopied rows.

3. Fix minId() ordering for non-monotonic PKs (src/table.ts)

Backs out the ORDER BY time LIMIT 1 optimization from #21. That optimization was fast when a time index exists, but assumes PK/time monotonicity and can skip rows with smaller PKs and later timestamps (ULID skew, backfills, manual timestamps). We now order by PK to preserve correctness. A new test ("fills rows with non-monotonic PK and created_at") demonstrates the #21 failure mode where ordering by time picks a higher PK and skips an in-range row. This also fixes Synchronizer.init() which calls the same method.

Resume behavior change

Previously, re-running fill without --start would auto-resume from destTable.maxId(). This was already broken when mirroring was active (the common case), since mirrored rows inflated the max ID and caused fill to skip everything below it.

Now, fill without --start always starts from the beginning of the source table. ON CONFLICT DO NOTHING makes re-scanning already-copied rows cheap (no writes). For very large tables, resume explicitly using the endId from the batch log:

/* batch 1: 10000 rows inserted, endId=10000 */
/* batch 2: 10000 rows inserted, endId=20000 */
/* batch 3: 10000 rows inserted, endId=30000 */  <-- interrupted here

Resume with --start 30000.

Files changed

File Change
src/filler.ts Start from source min ID; two-CTE batch with sourceCount termination
src/types.ts Added sourceCount to FillBatchResult
src/table.ts minId() orders by PK only; docstring updated
src/pgslice.ts Initialize fill/synchronize inside transaction
src/commands/fill.ts Batch log includes rowsInserted and endId
src/commands/fill.test.ts Updated assertions for new log format
src/fill.test.ts New tests for mirrored data, partial fill resume, non-monotonic PKs
README.md Document fill startup scan and resume guidance

Test plan

  • All 300 tests pass (297 existing + 3 new) — prior run
  • New test: "fills old rows when dest has recent mirrored row" — the original bug
  • New test: "copies remaining rows when first batch is all duplicates" — partial fill resume
  • New test: "fills rows with non-monotonic PK and created_at" — demonstrates the Optimize minId query and improve test coverage #21 failure mode

stefanmb and others added 2 commits February 19, 2026 17:51
When enable_mirroring runs before fill, the mirroring trigger copies
recent rows into the intermediate table. Fill without --start called
destTable.maxId() which returned a high ID from the mirrored row,
causing WHERE id > <high_id> to find nothing and skip all old rows.

- Start from sourceTable.minId() instead of destTable.maxId()
- Restructure batch CTE to track source progress independently,
  preventing early termination on all-duplicate batches
- Fix minId() to order by PK only (not time+PK) for non-monotonic PKs
- Add endId to CLI batch log output for resume support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This avoids 30s timeouts on large table startup scans and documents resume guidance.

Co-authored-by: Cursor <cursoragent@cursor.com>
@stefanmb stefanmb force-pushed the fix-fill-mirrored-data branch from 8049136 to 7ec7ab8 Compare February 20, 2026 15:05
Let fill and synchronize minId lookups use the session timeout and drop
the README note about temporarily raising statement_timeout.

Co-authored-by: Cursor <cursoragent@cursor.com>
@stefanmb stefanmb marked this pull request as ready for review February 20, 2026 19:01
@stefanmb stefanmb requested a review from mthadley February 20, 2026 19:01
Copy link
Collaborator

@mthadley mthadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few small changes before approving.

@stefanmb stefanmb merged commit 7796a2f into master Feb 20, 2026
7 checks passed
@stefanmb stefanmb deleted the fix-fill-mirrored-data branch February 20, 2026 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants