Developer: s4gor
Github: https://github.com/s4gor


Schema Sync - Architecture Summary

Project Structure

schema-sync/
├── Cargo.toml              # Project configuration
├── README.md               # User-facing documentation
├── DESIGN.md               # Detailed design rationale
├── ARCHITECTURE.md         # This file - architecture overview
├── .gitignore              # Git ignore rules
├── src/
│   ├── lib.rs              # Main library entry point with architecture diagram
│   ├── adapters.rs         # Database adapter traits (DatabaseAdapter, SchemaInspector, MigrationRunner)
│   ├── diff.rs             # Schema diff calculation and representation
│   ├── engine.rs           # Main engine orchestration
│   ├── errors.rs           # Error types
│   ├── planner.rs          # Migration planning
│   ├── executor.rs         # Migration execution
│   ├── snapshot.rs         # Schema snapshot system
│   └── cli.rs              # CLI types and context
└── examples/
    ├── basic_usage.rs       # Basic sync example
    ├── dry_run_mode.rs      # Dry-run mode example
    └── ci_validation.rs     # CI validation example

Core Abstractions

1. DatabaseAdapter Trait

Purpose: Main entry point for database operations.

Key Methods:

  • inspector() → Box<dyn SchemaInspector>
  • migration_runner() → Box<dyn MigrationRunner>
  • database_type() → &str
  • test_connection() → Result<()>

Why it exists: Factory pattern for creating inspectors and runners. Enables multi-database support.
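The factory shape described above could look roughly like the following. This is a minimal sketch, not the crate's actual definitions: `SyncError`, the single-method stub traits, `InMemoryAdapter`, and `describe` are hypothetical names introduced only for illustration.

```rust
use std::fmt;

// Hypothetical error type; the real crate defines its own in errors.rs.
#[derive(Debug)]
pub struct SyncError(pub String);

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for SyncError {}

// Stub traits standing in for the real SchemaInspector and MigrationRunner.
pub trait SchemaInspector {
    fn name(&self) -> &str;
}

pub trait MigrationRunner {
    fn name(&self) -> &str;
}

// The adapter is a factory: it hands out an inspector and a runner.
pub trait DatabaseAdapter {
    fn inspector(&self) -> Box<dyn SchemaInspector>;
    fn migration_runner(&self) -> Box<dyn MigrationRunner>;
    fn database_type(&self) -> &str;
    fn test_connection(&self) -> Result<(), SyncError>;
}

pub struct InMemoryInspector;
impl SchemaInspector for InMemoryInspector {
    fn name(&self) -> &str {
        "in-memory inspector"
    }
}

pub struct InMemoryRunner;
impl MigrationRunner for InMemoryRunner {
    fn name(&self) -> &str {
        "in-memory runner"
    }
}

// Hypothetical adapter used only for this example.
pub struct InMemoryAdapter {
    pub reachable: bool,
}

impl DatabaseAdapter for InMemoryAdapter {
    fn inspector(&self) -> Box<dyn SchemaInspector> {
        Box::new(InMemoryInspector)
    }
    fn migration_runner(&self) -> Box<dyn MigrationRunner> {
        Box::new(InMemoryRunner)
    }
    fn database_type(&self) -> &str {
        "in-memory"
    }
    fn test_connection(&self) -> Result<(), SyncError> {
        if self.reachable {
            Ok(())
        } else {
            Err(SyncError("connection refused".to_string()))
        }
    }
}

// Code written against the trait never names a concrete database.
pub fn describe(adapter: &dyn DatabaseAdapter) -> String {
    format!("{} via {}", adapter.database_type(), adapter.inspector().name())
}
```

Because `describe` takes `&dyn DatabaseAdapter`, supporting another database only requires another implementation of the trait.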

2. SchemaInspector Trait

Purpose: Read-only schema introspection.

Key Methods:

  • inspect_schema(tenant) → Result<SchemaSnapshot>
  • schema_exists(tenant) → Result<bool>
  • list_tenants() → Result<Vec<TenantContext>>

Why it exists:

  • Enables audit mode without write permissions
  • Allows dry-run mode to calculate diffs without locks
  • Supports testing with mock inspectors
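To illustrate the mock-inspector point, here is a small sketch of a read-only inspector backed by an in-memory map. The simplified `SchemaSnapshot` (table names only), the `String` error type, and `MockInspector` itself are assumptions made for this example; only the three method names come from the list above.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TenantContext {
    pub tenant_id: String,
}

// Simplified snapshot: just table names. The real type holds much more.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SchemaSnapshot {
    pub tables: Vec<String>,
}

// Every method takes &self: an inspector never needs write access.
pub trait SchemaInspector {
    fn inspect_schema(&self, tenant: &TenantContext) -> Result<SchemaSnapshot, String>;
    fn schema_exists(&self, tenant: &TenantContext) -> Result<bool, String>;
    fn list_tenants(&self) -> Result<Vec<TenantContext>, String>;
}

// Hypothetical mock: tenant id -> canned snapshot.
#[derive(Default)]
pub struct MockInspector {
    schemas: BTreeMap<String, SchemaSnapshot>,
}

impl MockInspector {
    pub fn with_tenant(mut self, tenant_id: &str, tables: &[&str]) -> Self {
        let snapshot = SchemaSnapshot {
            tables: tables.iter().map(|t| t.to_string()).collect(),
        };
        self.schemas.insert(tenant_id.to_string(), snapshot);
        self
    }
}

impl SchemaInspector for MockInspector {
    fn inspect_schema(&self, tenant: &TenantContext) -> Result<SchemaSnapshot, String> {
        self.schemas
            .get(&tenant.tenant_id)
            .cloned()
            .ok_or_else(|| format!("no schema for tenant {}", tenant.tenant_id))
    }

    fn schema_exists(&self, tenant: &TenantContext) -> Result<bool, String> {
        Ok(self.schemas.contains_key(&tenant.tenant_id))
    }

    fn list_tenants(&self) -> Result<Vec<TenantContext>, String> {
        Ok(self
            .schemas
            .keys()
            .map(|id| TenantContext { tenant_id: id.clone() })
            .collect())
    }
}
```

A test can then build `MockInspector::default().with_tenant("acme", &["users"])` and hand it to any code that accepts a `SchemaInspector`, with no database involved.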

3. MigrationRunner Trait

Purpose: Execute schema changes.

Key Methods:

  • execute_migration(tenant, plan) → Result<MigrationResult>
  • acquire_lock(tenant, timeout) → Result<Box<dyn LockGuard>>
  • validate_migration(tenant, plan) → Result<()>

Why it exists:

  • Pluggable migration engines (SQL files, Rust code, external tools)
  • Different strategies per database type
  • Testing with mock runners
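One way to read the `acquire_lock` signature is that the returned guard releases the lock when it is dropped. The sketch below assumes that RAII interpretation; `MockRunner`, `MockGuard`, the simplified plan and result types, and the `String` errors are hypothetical, and the mock ignores the timeout entirely.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::time::Duration;

pub struct TenantContext {
    pub tenant_id: String,
}

pub struct MigrationPlan {
    pub steps: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub struct MigrationResult {
    pub steps_applied: usize,
}

// Marker trait: holding the guard means holding the lock.
pub trait LockGuard {}

pub trait MigrationRunner {
    fn execute_migration(&self, tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<MigrationResult, String>;
    fn acquire_lock(&self, tenant: &TenantContext, timeout: Duration)
        -> Result<Box<dyn LockGuard>, String>;
    fn validate_migration(&self, tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<(), String>;
}

// Hypothetical guard: removes the tenant from the shared set on drop.
struct MockGuard {
    tenant_id: String,
    held: Arc<Mutex<HashSet<String>>>,
}

impl LockGuard for MockGuard {}

impl Drop for MockGuard {
    fn drop(&mut self) {
        if let Ok(mut held) = self.held.lock() {
            held.remove(&self.tenant_id);
        }
    }
}

#[derive(Default)]
pub struct MockRunner {
    held: Arc<Mutex<HashSet<String>>>,
}

impl MigrationRunner for MockRunner {
    fn execute_migration(&self, tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<MigrationResult, String> {
        self.validate_migration(tenant, plan)?;
        // A real runner would apply each step to the database here.
        Ok(MigrationResult { steps_applied: plan.steps.len() })
    }

    fn acquire_lock(&self, tenant: &TenantContext, _timeout: Duration)
        -> Result<Box<dyn LockGuard>, String> {
        let mut held = self.held.lock().map_err(|_| "lock registry poisoned".to_string())?;
        // This mock fails immediately instead of waiting for the timeout.
        if !held.insert(tenant.tenant_id.clone()) {
            return Err(format!("tenant {} is already locked", tenant.tenant_id));
        }
        Ok(Box::new(MockGuard {
            tenant_id: tenant.tenant_id.clone(),
            held: Arc::clone(&self.held),
        }))
    }

    fn validate_migration(&self, _tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<(), String> {
        if plan.steps.is_empty() {
            Err("plan has no steps".to_string())
        } else {
            Ok(())
        }
    }
}
```

Tying release to `Drop` means a panic or early return during migration still frees the tenant's lock.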

4. Planner Trait

Purpose: Create executable migration plans from schema diffs.

Key Methods:

  • create_plan(current, target, diff) → Result<MigrationPlan>
  • validate_plan(plan) → Result<()>

Why it exists:

  • Dry-run mode can show what would happen
  • Validation of plans before execution
  • Different planning strategies (safe ordering, dependency resolution)
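As an illustration of a planning strategy with safe ordering, the sketch below orders additive steps before destructive ones and lets validation reject drops unless they are explicitly allowed. `AdditiveFirstPlanner`, the `Step` enum, and the table-only `SchemaDiff` are hypothetical simplifications, not the crate's types.

```rust
// Placeholder: this simple planner only needs the diff.
pub struct SchemaSnapshot;

#[derive(Debug, Default)]
pub struct SchemaDiff {
    pub tables_added: Vec<String>,
    pub tables_removed: Vec<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Step {
    CreateTable(String),
    DropTable(String),
}

#[derive(Debug)]
pub struct MigrationPlan {
    pub steps: Vec<Step>,
    pub warnings: Vec<String>,
}

pub trait Planner {
    fn create_plan(&self, current: &SchemaSnapshot, target: &SchemaSnapshot, diff: &SchemaDiff)
        -> Result<MigrationPlan, String>;
    fn validate_plan(&self, plan: &MigrationPlan) -> Result<(), String>;
}

// Hypothetical strategy: creates first, drops last, drops gated by a flag.
pub struct AdditiveFirstPlanner {
    pub allow_drops: bool,
}

impl Planner for AdditiveFirstPlanner {
    fn create_plan(&self, _current: &SchemaSnapshot, _target: &SchemaSnapshot, diff: &SchemaDiff)
        -> Result<MigrationPlan, String> {
        // Sort names so the same diff always yields the same plan.
        let mut added = diff.tables_added.clone();
        let mut removed = diff.tables_removed.clone();
        added.sort();
        removed.sort();

        let mut steps = Vec::new();
        let mut warnings = Vec::new();
        for name in added {
            steps.push(Step::CreateTable(name));
        }
        for name in removed {
            warnings.push(format!("dropping table {} is destructive", name));
            steps.push(Step::DropTable(name));
        }
        Ok(MigrationPlan { steps, warnings })
    }

    fn validate_plan(&self, plan: &MigrationPlan) -> Result<(), String> {
        let has_drop = plan.steps.iter().any(|s| matches!(s, Step::DropTable(_)));
        if has_drop && !self.allow_drops {
            Err("plan contains destructive steps".to_string())
        } else {
            Ok(())
        }
    }
}
```

Since planning produces a plain value, a dry run can print `plan.steps` and `plan.warnings` without touching the database.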

5. Executor Trait

Purpose: Orchestrate the execution of migration plans.

Key Methods:

  • execute(tenant, plan, runner) → Result<ExecutionResult>
  • dry_run(tenant, plan, runner) → Result<ExecutionResult>

Why it exists:

  • Different execution strategies (transactional, non-transactional)
  • Progress reporting
  • Retry logic
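The difference between `execute` and `dry_run` can be sketched as follows: both validate the plan through the runner, but only `execute` asks the runner to apply it. The reduced `MigrationRunner` trait, `SimpleExecutor`, `CountingRunner`, and the fields of `ExecutionResult` are assumptions for this example.

```rust
use std::cell::Cell;

pub struct TenantContext {
    pub tenant_id: String,
}

pub struct MigrationPlan {
    pub steps: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub struct ExecutionResult {
    pub steps_run: usize,
    pub dry_run: bool,
}

// Reduced runner trait: only what this executor needs.
pub trait MigrationRunner {
    fn validate_migration(&self, tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<(), String>;
    fn execute_migration(&self, tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<usize, String>;
}

pub trait Executor {
    fn execute(&self, tenant: &TenantContext, plan: &MigrationPlan, runner: &dyn MigrationRunner)
        -> Result<ExecutionResult, String>;
    fn dry_run(&self, tenant: &TenantContext, plan: &MigrationPlan, runner: &dyn MigrationRunner)
        -> Result<ExecutionResult, String>;
}

// Hypothetical non-transactional executor without retries.
pub struct SimpleExecutor;

impl Executor for SimpleExecutor {
    fn execute(&self, tenant: &TenantContext, plan: &MigrationPlan, runner: &dyn MigrationRunner)
        -> Result<ExecutionResult, String> {
        runner.validate_migration(tenant, plan)?;
        let steps_run = runner.execute_migration(tenant, plan)?;
        Ok(ExecutionResult { steps_run, dry_run: false })
    }

    fn dry_run(&self, tenant: &TenantContext, plan: &MigrationPlan, runner: &dyn MigrationRunner)
        -> Result<ExecutionResult, String> {
        // Validation only: the runner is never asked to apply anything.
        runner.validate_migration(tenant, plan)?;
        Ok(ExecutionResult { steps_run: 0, dry_run: true })
    }
}

// Test double that counts how often a migration was actually applied.
#[derive(Default)]
pub struct CountingRunner {
    pub executions: Cell<usize>,
}

impl MigrationRunner for CountingRunner {
    fn validate_migration(&self, _tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<(), String> {
        if plan.steps.is_empty() {
            Err("empty plan".to_string())
        } else {
            Ok(())
        }
    }

    fn execute_migration(&self, _tenant: &TenantContext, plan: &MigrationPlan)
        -> Result<usize, String> {
        self.executions.set(self.executions.get() + 1);
        Ok(plan.steps.len())
    }
}
```

A transactional or retrying executor would implement the same trait and wrap the `execute_migration` call differently.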

6. DiffCalculator Trait

Purpose: Calculate differences between schema snapshots.

Key Methods:

  • calculate_diff(from, to) → SchemaDiff

Why it exists:

  • Different diff algorithms
  • Three-way merge support
  • Conflict detection
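A basic two-way diff algorithm might compare snapshots by table name, as in the sketch below. The flattened `SchemaSnapshot` (table name to column list), the flat `SchemaDiff`, and `NameBasedDiff` are hypothetical; a real implementation would descend into columns and constraints.

```rust
use std::collections::BTreeMap;

// Simplified snapshot: table name -> column names.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SchemaSnapshot {
    pub tables: BTreeMap<String, Vec<String>>,
}

impl SchemaSnapshot {
    pub fn with_table(mut self, name: &str, columns: &[&str]) -> Self {
        self.tables
            .insert(name.to_string(), columns.iter().map(|c| c.to_string()).collect());
        self
    }
}

#[derive(Debug, PartialEq, Eq, Default)]
pub struct SchemaDiff {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

impl SchemaDiff {
    pub fn is_empty(&self) -> bool {
        self.added.is_empty() && self.removed.is_empty() && self.modified.is_empty()
    }
}

pub trait DiffCalculator {
    fn calculate_diff(&self, from: &SchemaSnapshot, to: &SchemaSnapshot) -> SchemaDiff;
}

// Hypothetical algorithm: match tables by name, compare column lists.
pub struct NameBasedDiff;

impl DiffCalculator for NameBasedDiff {
    fn calculate_diff(&self, from: &SchemaSnapshot, to: &SchemaSnapshot) -> SchemaDiff {
        let mut diff = SchemaDiff::default();
        for (name, columns) in &to.tables {
            match from.tables.get(name) {
                None => diff.added.push(name.clone()),
                // Column order matters here because columns are a Vec.
                Some(old) if old != columns => diff.modified.push(name.clone()),
                Some(_) => {}
            }
        }
        for name in from.tables.keys() {
            if !to.tables.contains_key(name) {
                diff.removed.push(name.clone());
            }
        }
        diff
    }
}
```

Note that the method returns `SchemaDiff` directly rather than a `Result`, matching the signature listed above: diffing two in-memory values has no I/O that could fail.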

7. SnapshotStore Trait

Purpose: Store and retrieve schema snapshots.

Key Methods:

  • store(tenant, snapshot) → Result<()>
  • get_latest(tenant) → Result<Option<SchemaSnapshot>>
  • get_by_hash(tenant, hash) → Result<Option<SchemaSnapshot>>
  • list(tenant) → Result<Vec<SchemaSnapshot>>
  • compare(tenant, hash_a, hash_b) → Result<SchemaDiff>

Why it exists:

  • Multiple storage backends (filesystem, database, version control)
  • Version history
  • Deterministic versioning
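The simplest storage backend keeps per-tenant version history in memory. The sketch below covers four of the five methods and leaves out `compare`. It assumes `store` takes `&mut self` and that each snapshot carries a precomputed `hash` field; both are guesses for illustration, as is `InMemoryStore`.

```rust
use std::collections::BTreeMap;

pub struct TenantContext {
    pub tenant_id: String,
}

// Simplified snapshot with a caller-supplied content hash.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SchemaSnapshot {
    pub hash: String,
    pub tables: Vec<String>,
}

pub trait SnapshotStore {
    fn store(&mut self, tenant: &TenantContext, snapshot: SchemaSnapshot) -> Result<(), String>;
    fn get_latest(&self, tenant: &TenantContext) -> Result<Option<SchemaSnapshot>, String>;
    fn get_by_hash(&self, tenant: &TenantContext, hash: &str)
        -> Result<Option<SchemaSnapshot>, String>;
    fn list(&self, tenant: &TenantContext) -> Result<Vec<SchemaSnapshot>, String>;
}

// Hypothetical backend: tenant id -> snapshots in insertion order.
#[derive(Default)]
pub struct InMemoryStore {
    history: BTreeMap<String, Vec<SchemaSnapshot>>,
}

impl SnapshotStore for InMemoryStore {
    fn store(&mut self, tenant: &TenantContext, snapshot: SchemaSnapshot) -> Result<(), String> {
        let versions = self.history.entry(tenant.tenant_id.clone()).or_default();
        // Storing a hash that is already present is a no-op.
        if versions.iter().all(|s| s.hash != snapshot.hash) {
            versions.push(snapshot);
        }
        Ok(())
    }

    fn get_latest(&self, tenant: &TenantContext) -> Result<Option<SchemaSnapshot>, String> {
        Ok(self
            .history
            .get(&tenant.tenant_id)
            .and_then(|versions| versions.last().cloned()))
    }

    fn get_by_hash(&self, tenant: &TenantContext, hash: &str)
        -> Result<Option<SchemaSnapshot>, String> {
        Ok(self
            .history
            .get(&tenant.tenant_id)
            .and_then(|versions| versions.iter().find(|s| s.hash == hash).cloned()))
    }

    fn list(&self, tenant: &TenantContext) -> Result<Vec<SchemaSnapshot>, String> {
        Ok(self.history.get(&tenant.tenant_id).cloned().unwrap_or_default())
    }
}
```

Keying the history by tenant id means one tenant's lookups can never return another tenant's snapshots.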

Data Structures

SchemaSnapshot

Normalized, database-agnostic representation of a schema.

Properties:

  • Deterministic: Same schema always produces same snapshot
  • Order-independent (uses HashMaps)
  • Database-agnostic

Contains:

  • Tables (with columns, constraints, indexes)
  • Views
  • Functions
  • Types
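The determinism and order-independence properties can be shown with a small sketch. Equality on `HashMap` already ignores insertion order, but iteration order does not, so a stable fingerprint needs a canonical (sorted) form first. The reduced `Table` and `SchemaSnapshot` shapes, `canonical_form`, and the FNV-1a style `fingerprint` are assumptions for this example, not the crate's hashing scheme.

```rust
use std::collections::HashMap;

// Simplified table: column name -> normalized type name.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Table {
    pub columns: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SchemaSnapshot {
    pub tables: HashMap<String, Table>,
}

impl SchemaSnapshot {
    pub fn add_column(&mut self, table: &str, column: &str, ty: &str) {
        self.tables
            .entry(table.to_string())
            .or_default()
            .columns
            .insert(column.to_string(), ty.to_string());
    }

    // One line per table and per column, sorted, so the text does not
    // depend on HashMap iteration order.
    pub fn canonical_form(&self) -> String {
        let mut lines = Vec::new();
        for (table_name, table) in &self.tables {
            lines.push(table_name.clone());
            for (column, ty) in &table.columns {
                lines.push(format!("{}.{}:{}", table_name, column, ty));
            }
        }
        lines.sort();
        lines.join("\n")
    }

    // FNV-1a style 64-bit hash over the canonical text (illustrative only).
    pub fn fingerprint(&self) -> String {
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in self.canonical_form().bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        format!("{:016x}", hash)
    }
}
```

Two snapshots built from the same columns in a different order compare equal and share a fingerprint, which is what makes the fingerprint usable as a version identifier.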

SchemaDiff

Represents differences between two snapshots.

Structure: Hierarchical (schema → table → column → constraint)

Contains:

  • Tables: added, removed, modified
  • Views: added, removed, modified
  • Functions: added, removed, modified
  • Types: added, removed, modified

MigrationPlan

Executable sequence of operations to transform schema.

Structure: Ordered steps with dependencies

Contains:

  • Steps (ordered operations)
  • Estimated duration
  • Downtime requirements
  • Warnings

TenantContext

Explicit tenant scoping for all operations.

Properties:

  • Single field: tenant_id: String
  • Required for all operations
  • Prevents cross-tenant leakage
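A minimal sketch of the type follows. The single `tenant_id: String` field comes from the description above; the validating constructor and the `lock_key` helper (including its key format) are hypothetical additions that show how a function signature can require a tenant.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TenantContext {
    pub tenant_id: String,
}

impl TenantContext {
    // Assumed validation: reject empty or whitespace-only ids.
    pub fn new(tenant_id: impl Into<String>) -> Result<Self, String> {
        let tenant_id = tenant_id.into();
        if tenant_id.trim().is_empty() {
            return Err("tenant_id must not be empty".to_string());
        }
        Ok(Self { tenant_id })
    }
}

// Hypothetical helper: it cannot be called without a TenantContext,
// so every lock key is scoped to exactly one tenant.
pub fn lock_key(tenant: &TenantContext, resource: &str) -> String {
    format!("schema-sync:{}:{}", tenant.tenant_id, resource)
}
```

Passing `&TenantContext` instead of a bare `&str` keeps a table name or connection string from being passed where a tenant id is expected.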

Extension Points

Adding a New Database Type

  1. Implement DatabaseAdapter
  2. Implement SchemaInspector (convert DB schema → SchemaSnapshot)
  3. Implement MigrationRunner (convert MigrationPlan → DB SQL)
  4. Use with engine

No changes needed to: Engine, planner, executor, diff calculator.
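Steps 2 and 3 are the two conversions a new adapter has to provide. The sketch below shows both directions under simplified assumptions: `ColumnRow` stands in for rows read from a database catalog, lowercasing type names stands in for normalization, and `Step` with `step_to_sql` stands in for rendering plan steps as SQL. All of these names are hypothetical, and the identifier quoting does not escape embedded quotes.

```rust
use std::collections::BTreeMap;

// Hypothetical row shape, as an inspector might read it from a catalog.
pub struct ColumnRow {
    pub table: String,
    pub column: String,
    pub data_type: String,
}

// Simplified snapshot: table -> column -> normalized type.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SchemaSnapshot {
    pub tables: BTreeMap<String, BTreeMap<String, String>>,
}

// Step 2: database-specific rows -> database-agnostic snapshot.
pub fn rows_to_snapshot(rows: &[ColumnRow]) -> SchemaSnapshot {
    let mut snapshot = SchemaSnapshot::default();
    for row in rows {
        snapshot
            .tables
            .entry(row.table.clone())
            .or_default()
            // Assumed normalization: lowercase the type name.
            .insert(row.column.clone(), row.data_type.to_lowercase());
    }
    snapshot
}

// Hypothetical plan steps.
pub enum Step {
    AddColumn { table: String, column: String, data_type: String },
    DropTable { table: String },
}

// Step 3: database-agnostic plan step -> SQL text for this database.
pub fn step_to_sql(step: &Step) -> String {
    match step {
        Step::AddColumn { table, column, data_type } => {
            format!("ALTER TABLE \"{}\" ADD COLUMN \"{}\" {}", table, column, data_type)
        }
        Step::DropTable { table } => format!("DROP TABLE \"{}\"", table),
    }
}
```

Everything between these two functions (diffing, planning, orchestration) works on `SchemaSnapshot` and plan values only, which is why it does not need to change.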

Adding a New Migration Strategy

  1. Implement MigrationRunner
  2. Convert MigrationPlan to your format
  3. Execute using your tool

Examples: SQL file migrations, diesel migrations, sqlx migrations.

Adding a New Snapshot Storage Backend

  1. Implement SnapshotStore
  2. Store/retrieve SchemaSnapshot in your backend
  3. Use with engine

Examples: filesystem, database, S3, version control.

Adding a New Planning Strategy

  1. Implement Planner
  2. Create MigrationPlan from SchemaDiff
  3. Use with engine

Example: Safe planner for zero-downtime migrations.

Adding a New Diff Algorithm

  1. Implement DiffCalculator
  2. Calculate SchemaDiff from two SchemaSnapshots
  3. Use with engine

Example: Three-way merge calculator.

Operation Modes

Sync Mode

Implementation: Engine::sync_tenant(tenant, target, execute=true)

Behavior: Calculate diff, create plan, execute plan.

Dry-Run Mode

Implementation: Engine::sync_tenant(tenant, target, execute=false)

Behavior: Calculate diff, create plan, validate plan, return diff without executing.
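Sync and dry-run share one code path and differ only in the `execute` flag. The sketch below models a schema as a set of table names to show that flow: diff, plan, then apply only when asked. It is not the real `Engine::sync_tenant`; the free function, `SyncReport`, and the `tables` helper are hypothetical.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub struct SyncReport {
    pub already_in_sync: bool,
    pub planned_steps: Vec<String>,
    pub executed: bool,
}

// Helper for building a toy schema (a set of table names).
pub fn tables(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|n| n.to_string()).collect()
}

pub fn sync_tenant(
    current: &mut BTreeSet<String>,
    target: &BTreeSet<String>,
    execute: bool,
) -> SyncReport {
    // 1. Calculate the diff.
    let to_create: Vec<String> = target.difference(&*current).cloned().collect();
    let to_drop: Vec<String> = current.difference(target).cloned().collect();

    // 2. Create the plan: creates first, then drops.
    let mut planned_steps = Vec::new();
    for name in &to_create {
        planned_steps.push(format!("create {}", name));
    }
    for name in &to_drop {
        planned_steps.push(format!("drop {}", name));
    }
    let already_in_sync = planned_steps.is_empty();

    // 3. Execute only in sync mode; dry-run stops after planning.
    let executed = execute && !already_in_sync;
    if executed {
        for name in to_create {
            current.insert(name);
        }
        for name in &to_drop {
            current.remove(name);
        }
    }

    SyncReport { already_in_sync, planned_steps, executed }
}
```

Calling it with `execute = false` returns the same `planned_steps` as a real run would, while leaving `current` untouched.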

Validation Mode (CI)

Implementation:

  • Engine::sync_tenant(tenant, target, execute=false) for all tenants
  • Check already_in_sync flag
  • Exit non-zero if any tenant has already_in_sync=false

Behavior: Verify all tenants match expected schema.
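The three steps above can be sketched as a loop that turns per-tenant reports into a process exit code. The closure parameter stands in for a dry-run call to `Engine::sync_tenant`; `validate_all`, the reduced `SyncReport`, and the choice to treat errors as failures are assumptions for this example.

```rust
pub struct TenantContext {
    pub tenant_id: String,
}

// Reduced report: only the flag that validation mode checks.
pub struct SyncReport {
    pub already_in_sync: bool,
}

// Returns (exit_code, failure_messages). `dry_run_sync` stands in for
// a dry-run sync of one tenant against the expected schema.
pub fn validate_all<F>(tenants: &[TenantContext], mut dry_run_sync: F) -> (i32, Vec<String>)
where
    F: FnMut(&TenantContext) -> Result<SyncReport, String>,
{
    let mut failures = Vec::new();
    for tenant in tenants {
        match dry_run_sync(tenant) {
            Ok(report) if report.already_in_sync => {}
            Ok(_) => failures.push(format!("{}: schema drift", tenant.tenant_id)),
            // Assumption: a tenant that cannot be checked fails the build.
            Err(e) => failures.push(format!("{}: {}", tenant.tenant_id, e)),
        }
    }
    let exit_code = if failures.is_empty() { 0 } else { 1 };
    (exit_code, failures)
}
```

A CI binary would print the failure messages and pass the code to `std::process::exit`. Checking every tenant before exiting, rather than stopping at the first mismatch, gives a complete drift report in one run.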

Audit Mode

Implementation: Use SchemaInspector directly, no MigrationRunner.

Behavior: Read-only inspection, no changes allowed.

Design Decisions

Why Traits Over Enums?

Traits allow:

  • Multiple implementations to coexist
  • Pluggable components
  • Testing with mocks
  • Extension without modification

Why Separate Inspector and Runner?

  • Inspector can be used without runner (audit mode)
  • Different migration strategies can be implemented
  • Testing is easier with mock implementations

Why Separate Planner and Executor?

  • Dry-run mode can plan without executing
  • Validation of plans before execution
  • Different execution strategies

Why Snapshot System?

  • Enables diffing schema version A vs B
  • Supports version control integration
  • Allows deterministic versioning

Why TenantContext Everywhere?

  • Type safety: Can't accidentally operate on wrong tenant
  • Makes tenant isolation explicit
  • Supports batch operations
  • Enables per-tenant locking

Future Implementation Tasks

Phase 1: Core Implementations

  • Default Planner implementation
  • Default Executor implementation
  • Default DiffCalculator implementation
  • File-based SnapshotStore implementation

Phase 2: PostgreSQL Support

  • PostgresAdapter implementation
  • PostgresInspector implementation
  • PostgresMigrationRunner implementation

Phase 3: CLI

  • CLI argument parsing
  • Mode handling (sync, diff, validate, audit)
  • Output formatting (text, JSON)
  • Exit codes for CI

Phase 4: Additional Databases

  • MySQL support
  • SQLite support

Phase 5: Advanced Features

  • Three-way merge support
  • Conflict detection
  • Zero-downtime migration strategies
  • Progress reporting
  • Audit trail

Testing Strategy

Unit Tests

  • Mock implementations of all traits
  • Test each component in isolation

Integration Tests

  • Test database (testcontainers or in-memory)
  • Test full sync flow
  • Test error handling and rollback

Property Tests

  • Deterministic snapshots
  • Plan correctness (executing the plan against the current schema yields the target)

Conclusion

This architecture puts inspection, diffing, planning, execution, and snapshot storage behind separate traits. New databases, migration strategies, and storage backends can therefore be added without modifying the engine, and each component can be tested in isolation with mocks.

The key insight: design for extension first. Every abstraction exists to enable a future feature, not just to solve the current problem.