Skip to content

Fix migration memory leak: old collection meta doc not deleted (#7791)#7792

Open
OskarD wants to merge 1 commit intopubkey:masterfrom
OskarD:fix/migration-meta-doc-leak-7791
Open

Fix migration memory leak: old collection meta doc not deleted (#7791)#7792
OskarD wants to merge 1 commit intopubkey:masterfrom
OskarD:fix/migration-meta-doc-leak-7791

Conversation

@OskarD
Copy link

@OskarD OskarD commented Feb 5, 2026

This PR contains:

  • A BUGFIX

Describe the problem you have without this PR

Two bugs in the migration-schema plugin combine to create a memory leak (~100 MB/min) that worsens on every app restart (issue #7791).

Bug 1: After startMigration() migrates all documents, it tries to delete the old collection meta doc using a reference cached at the start of the function. If the doc's _rev has changed (due to concurrent writes or storage internals), the writeSingle() fails with a conflict. The conflict handler only ignores already-deleted docs, so the meta doc persists. On every restart, mustMigrate() finds this orphaned doc and returns true.

Bug 2: migratePromise() calls startMigration() without await (fire-and-forget, by design for multi-tab leader election). When combined with Bug 1, a stale DONE status doc from a previous migration run satisfies the Promise.race immediately, while the orphaned startMigration() creates leaked storage instances and subscriptions. The fire-and-forget also means any errors thrown from startMigration() become unhandled promise rejections.

Changes

  1. Replace single-attempt meta doc deletion with a conflict-retry loop that re-fetches the doc before each attempt (same pattern used by updateStatus() and addConnectedStorageToCollection()).
  2. Add .catch(() => {}) to the fire-and-forget startMigration() call to prevent unhandled promise rejections. Errors still propagate via the status observable.

Todos

  • Tests
  • Typings

…migration (pubkey#7791)

The old collection meta doc deletion failed with a _rev conflict because
the cached reference from the start of startMigration() became stale by
the time deletion ran. The conflict was not retried, leaving the meta doc
orphaned. On every app restart, mustMigrate() found this doc and returned
true, spinning up leaked storage instances and subscriptions (~100 MB/min).

Fix: replace single-attempt deletion with a conflict-retry loop that
re-fetches the doc before each attempt. Also add .catch() to the
fire-and-forget startMigration() call in migratePromise() to prevent
unhandled promise rejections.
@OskarD
Copy link
Author

OskarD commented Feb 5, 2026

Note on .catch(() => {}) (line 585)

The original code called this.startMigration(batchSize) as fire-and-forget without await or .catch(). Errors before the inner try/catch (e.g. createStorageInstance(), leader election) were already unhandled promise rejections. The .catch(() => {}) makes this explicit and prevents runtime crashes in strict environments.

A better fix would wrap the entire startMigration() body in the existing try/catch so early failures also set status = ERROR and surface through the status observable. That way migratePromise() would resolve with ERROR instead of hanging. But that is a separate improvement beyond the scope of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments