Skip to content

feat(pg/catalog): add DropsFromDiff for fast destructive-change detection#21

Draft
vsai12 wants to merge 5 commits intomainfrom
feat/drops-from-diff-helper
Draft

feat(pg/catalog): add DropsFromDiff for fast destructive-change detection#21
vsai12 wants to merge 5 commits intomainfrom
feat/drops-from-diff-helper

Conversation

@vsai12
Copy link
Copy Markdown
Contributor

@vsai12 vsai12 commented Apr 10, 2026

Problem

A Bytebase Cloud customer running bytebase-action check --declarative against a ~650-file exported schema gets a consistent 504 (OOMKill) on a 2 GiB Cloud Run pod. Root cause: pgSDLDropAdvices in Bytebase calls catalog.GenerateMigration to build a full migration plan, then throws away everything except the drop ops. At scale (~20k table-equivalent metadata objects), GenerateMigration takes ~46 seconds and allocates heavily — this is the dominant cost that pushes the pod past its memory limit.

The drop-advice function only needs to know what will be dropped. It doesn't need DDL text, topological ordering, or any of the CREATE/ALTER ops that make up the bulk of GenerateMigration's work.

Measured phase timings (from a standalone benchmark at customer scale)

Current tables catalog.LoadSDL(current) Full pgSDLDropAdvices
5,000 1.27 s 2.75 s
10,000 4.32 s 12.1 s
20,000 15.6 s 60.6 s

GenerateMigration accounts for ~46s of the 60.6s total at 20k tables (the rest is LoadSDL).

Solution

Add a new exported function:

func DropsFromDiff(from, to *Catalog, diff *SchemaDiff) *MigrationPlan

It returns the exact same set of destructive MigrationOps that GenerateMigration would produce for the same inputs, with three documented differences:

  1. SQL is always empty — callers needing DDL text must call GenerateMigration.
  2. Warning is always emptyGenerateMigration populates warnings on some destructive ops (e.g., table recreates); DropsFromDiff omits them.
  3. Ops are sorted lexicographically (SchemaName, ObjectName, ParentObject, Type) for determinism, not topologically — the expensive dependency-driven topo sort is skipped.

All other metadata fields (Type, SchemaName, ObjectName, ParentObject, Phase, ObjType, ObjOID, Priority) are populated identically to what GenerateMigration sets.

Coverage

18 dropsForX helpers covering every Drop* op type in GenerateMigration:

Category Op Type Source mirrored Notes
Schema OpDropSchema migration_schema.go
Extension OpDropExtension migration_extension.go
Enum OpDropType migration_enum.go OID via resolveTypeOIDByName
Domain OpDropType migration_domain.go OID via entry.From.TypeOID
Range OpDropType migration_range.go
Composite OpDropType migration_composite.go ObjType='r' (composites are relations)
Sequence OpDropSequence migration_sequence.go isIdentitySequence filter preserved
Function OpDropFunction migration_function.go entry.Identity for overload disambiguation; DiffModify+signatureChanged
Table (drop) OpDropTable migration_table.go RelKind 'r'/'p' filter
Table (recreate) OpDropTable migration_partition.go RelKind flip / inheritance change
Column OpDropColumn migration_column.go
CHECK cascade OpDropConstraint migration_column.go Column type change cascades to CHECK constraints
Constraint OpDropConstraint migration_constraint.go DiffDrop + DiffModify; skips ConstraintTrigger
View/Matview OpDropView migration_view.go 5 emission sites incl. matview-always-DROP
Index OpDropIndex migration_index.go Skips constraint-backed indexes
Trigger OpDropTrigger migration_trigger.go DiffDrop + DiffModify
Policy OpDropPolicy migration_policy.go DiffModify only when CmdType/Permissive changed
Dependent views OpDropView migration.go BFS over from.deps for transitive view cascade

Why duplicate logic instead of refactoring generate*DDL?

A refactor that splits each generate*DDL into decide + emit would touch 11 existing files (~1k lines of churn). This PR is a low-risk additive change: two new files, zero modifications to existing code. The differential test pins both paths together so drift fails loudly. Once validated in production, a follow-up can fold the duplication.

Test plan

  • Per-category regression tests — one per drop category, including edge cases (identity-sequence filter, function overload disambiguation, matview-always-drop, partitioned table RelKind, constraint-trigger skip, etc.)
  • Precondition assertions on DiffModify path tests to prevent vacuous passes
  • Differential test (TestDropsFromDiff_DifferentialAgainstGenerateMigration) — builds a non-trivial fixture, runs both GenerateMigration and DropsFromDiff, compares 15 drop ops across 11 categories field-by-field. Zero mismatches.
  • Race detector clean (go test -race)
  • Full omni test suite passes (go test ./... -count=1 -short)

Caller (Bytebase side)

Once this lands, a small follow-up in bytebase/bytebase rewires pgSDLDropAdvices in backend/plugin/schema/pg/sdl_migration_omni.go (~5 lines changed) to call DropsFromDiff instead of buildMigrationPlan. The existing extractDropAdvicesFromPlan helper works unchanged since DropsFromDiff returns *MigrationPlan.

🤖 Generated with Claude Code

vsai12 and others added 4 commits April 8, 2026 23:05
DropsFromDiff returns a MigrationPlan containing only the destructive
operations that GenerateMigration would produce for the same inputs,
without building SQL text or running the dependency-driven topological
sort. Callers that only need to know *what* will be dropped (e.g.,
destructive-change advisories) can skip GenerateMigration's cost — on
large schemas this can save tens of seconds and over a gigabyte of
transient heap.

All metadata fields (Type, SchemaName, ObjectName, ParentObject, Phase,
ObjType, ObjOID, Priority) are populated identically to what
GenerateMigration sets; only SQL is left empty (documented contract).
Operations are sorted lexicographically by (SchemaName, ObjectName,
ParentObject, Type) for determinism instead of topologically.

This commit adds the helper skeleton plus 8 of ~17 producer arms,
each mirroring the DiffDrop (and where applicable DiffModify) arm of
its corresponding generate*DDL function field-for-field:

  - dropsForSchemas       (migration_schema.go)
  - dropsForEnums         (migration_enum.go)
  - dropsForDomains       (migration_domain.go)
  - dropsForRanges        (migration_range.go)
  - dropsForComposites    (migration_composite.go)  [ObjType='r', not 't']
  - dropsForSequences     (migration_sequence.go)   [isIdentitySequence filter]
  - dropsForFunctions     (migration_function.go)   [entry.Identity, signatureChanged]
  - dropsForTables        (migration_table.go)      [RelKind 'r'/'p' only]

Remaining producers (table recreate, columns + cascaded checks,
constraints, triggers, views/matviews, dependent-view cascade) plus
the differential test against GenerateMigration will land in follow-up
commits.

Tests: per-category regression tests using a shared assertSingleDropOp
helper and a table-driven pattern for the trivial categories. Function
drops have dedicated subtests for DiffDrop, overload disambiguation,
DiffModify via return-type change (with precondition assertions to
prevent vacuous passes), and body-only changes. No existing tests
touched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…erential test

Add the remaining drop producers to DropsFromDiff, completing coverage
of all 21 drop-emission sites across omni's generate*DDL functions:

  - dropsForTableRecreates  (migration_partition.go — RelKind/inheritance change)
  - dropsForColumns         (migration_column.go — DiffDrop)
  - dropsForCheckCascades   (migration_column.go — CHECK cascade from type change)
  - dropsForConstraints     (migration_constraint.go — DiffDrop + DiffModify)
  - dropsForViews           (migration_view.go — 5 emission sites including matview modify)
  - dropsForTriggers        (migration_trigger.go — DiffDrop + DiffModify)
  - dropsForDependentViews  (migration.go — BFS over from.deps for transitive view cascade)

The dependent-view cascade is the most complex producer: it walks the
unexported catalog dep graph to find views that transitively depend on
tables undergoing column type changes, column drops, RelKind changes,
or inheritance changes, then deduplicates against ops already emitted
by dropsForViews.

Adds a differential test that builds a non-trivial fixture (schema
drop, column type change with CHECK cascade, matview modification,
function drop, enum drop, sequence drop, trigger drop, dependent-view
cascade), runs both GenerateMigration and DropsFromDiff on the same
input, and asserts the drop subsets match field-by-field (12 ops across
8 categories, zero mismatches). This test is the primary correctness
gate preventing the duplicated decide logic from drifting.

Full omni test suite passes (go test ./... -count=1 -short), race
detector clean on pg/catalog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fix doc comment

Addresses review findings:

1. Add three missing dropsForX helpers that GenerateMigration covers but
   DropsFromDiff previously omitted:

   - dropsForExtensions  (migration_extension.go — DiffDrop)
   - dropsForIndexes     (migration_index.go — DiffDrop + DiffModify, skipping
                          constraint-backed indexes)
   - dropsForPolicies    (migration_policy.go — DiffDrop + DiffModify when
                          CmdType or Permissive changed)

2. Fix DropsFromDiff doc comment: document Warning as a third difference
   (was "two differences", now correctly says "three differences").

3. Broaden the differential test fixture to include extensions, explicit
   indexes, and RLS policies. The test now compares 15 drop ops across
   11 categories with zero mismatches.

Per-category regression tests added for all three new producers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address code quality findings:

1. Collapse identical DiffDrop/DiffModify case bodies in
   dropsForConstraints and dropsForTriggers — both arms produce the
   same drop op (modify = DROP old + ADD new; we only emit the drop).

2. Extract dropViewOp helper (mirrors dropFunctionOp pattern) to
   replace 5 identical OpDropView struct literals across dropsForViews
   and dropsForDependentViews.

3. Add explanatory comment on dropsForConstraints documenting why
   OpDropConstraint has two different field conventions in the same
   file (dropsForCheckCascades puts table name in ObjectName with
   PhaseMain; dropsForConstraints puts constraint name in ObjectName
   with ParentObject=tableName and PhasePre). Both mirror their
   respective omni sources faithfully.

Net -44 lines of production code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vsai12 vsai12 requested review from h3n4l and rebelice April 10, 2026 04:37
Adds a defensive nil check on the DiffDrop branch of dropsForIndexes,
matching the pattern used by every other dropsForX helper. The omni
source (migration_index.go) also lacks this check, so this is not a
correctness fix — purely a consistency improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vsai12
Copy link
Copy Markdown
Contributor Author

vsai12 commented Apr 10, 2026

I don't know how useful this will be, but since I had the agents running it for a customer problem (that had a different root cause), I'll just leave it here for reference in the future in case we need it later.

@h3n4l h3n4l marked this pull request as draft April 10, 2026 06:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant