Skip to content

Normalize catalog tables: drop denormalized snapshot/source columns from domain tables #576

@ako

Description

@ako

Problem

Every domain table in mdl/catalog/tables.go (~30 tables) carries the same eight columns:

ProjectId, ProjectName,
SnapshotId, SnapshotDate, SnapshotSource,
SourceId, SourceBranch, SourceRevision

The projects and snapshots lookup tables already exist as proper normalized tables — six of those columns (ProjectName, SnapshotDate, SnapshotSource, SourceId, SourceBranch, SourceRevision) are pure denormalization of snapshots.

This costs:

  • Storage — six redundant TEXT columns × ~30 tables × N rows.
  • Write code — every builder_*.go threads snapshot metadata into every INSERT (~70 references across 16 builder files; builder_modules.go alone has 24).
  • Drift risk — a row's SnapshotDate can disagree with snapshots.SnapshotDate if a future builder forgets to update it. No single source of truth.
  • Schema noisetables.go is 1082 lines, much of it the same six column lines repeated.

Investigation

No external consumers of the denormalized columns:

Location Hits Relevant?
mdl/catalog/ (Go) ~70 yes — builders + views
mdl/linter/, mdl/executor/, mdl-examples/ a few no — unrelated ProjectName Go fields / microflow parameters
reference/mendix-repl/templates/ (user skills) 0 nothing user-facing depends on these
docs/ (excluding archived proposals) 0 undocumented for users

Proposed Approach

  1. Drop the six denormalized columns from every domain table. Keep ProjectId and SnapshotId as FKs.
  2. ProjectId stays alongside SnapshotId so direct project filtering doesn't force a JOIN — pragmatic, one column of denormalization for ergonomics on the most common filter.
  3. Replace each table with a view to preserve query compatibility: rename entitiesentities_data, then create entities as SELECT e.*, s.ProjectName, s.SnapshotDate, … FROM entities_data e LEFT JOIN snapshots s …. Existing queries — including the objects UNION view and ad-hoc WHERE SnapshotSource = 'git' queries — keep working unchanged.

Out of scope

  • No change to Mendix BSON, MPR format, or any Mendix version. Pure SQLite-side refactor of the local catalog cache.
  • Catalog is a cache, not source-of-truth: existing caches regenerate on next build (bump schemaVersion).

Proposal

Full design, file-by-file plan, test plan, and open questions in docs/11-proposals/PROPOSAL_normalize_catalog_tables.md (to be merged with this issue's branch).

Open questions

  1. In-place ALTER TABLE migration vs. bump schemaVersion and regenerate?
  2. Naming for the underlying storage table — entities_data, _entities, entities_raw?
  3. objects UNION view performance with 20+ JOINs — measure on a large fixture before merging.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions