From de191c74dd1483ad478b5937aa0599f241eae61e Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 10 Jun 2026 16:51:13 -0500
Subject: [PATCH] docs(#1454): document in-memory lineage auto-heal + its limitation

Docs companion to datajoint-python #1467, which adds an in-memory check at
@schema decoration time: when an already-declared table's heading shows any
PK attribute with lineage=None, the table's ~lineage rows are auto-refreshed.
Healthy schemas pay zero extra DB queries; the refresh only fires when the
symptom is present.

- src/explanation/semantic-matching.md: new "The ~lineage table" section
  explaining how the table is maintained, the in-memory check on decoration
  (new in 2.3), and explicitly calling out the limitation
  (stale-but-non-None entries require manual rebuild). Adds the
  dj.migrate.rebuild_lineage(schema, dry_run) alternative for users who
  want a preview before applying.
- src/reference/specs/semantic-matching.md: version-added admonition on
  "Rebuilding Lineage" updated to describe the in-memory check rather than
  an unconditional refresh. The "When you still need to call this
  explicitly" list now leads with the stale-but-non-None case (the primary
  scenario the auto-heal can't reach), production-mode suppression, and
  cross-schema upstream changes.

Slated for DataJoint 2.3 alongside datajoint-python #1467.
---
 src/explanation/semantic-matching.md     | 39 ++++++++++++++++++++++++
 src/reference/specs/semantic-matching.md | 13 ++++++--
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/src/explanation/semantic-matching.md b/src/explanation/semantic-matching.md
index bdd5d040..88c24fac 100644
--- a/src/explanation/semantic-matching.md
+++ b/src/explanation/semantic-matching.md
@@ -158,6 +158,45 @@ This means you're trying to join tables that have a namesake attribute (`id`) wi
 """
 ```
 
+## The `~lineage` table
+
+Lineage is stored per-schema in a hidden table called `~lineage`. DataJoint writes one row per `(table, attribute)` pair recording the lineage string for that attribute, and consults `~lineage` whenever it needs to compare lineages across query expressions.
+
+`~lineage` is maintained as part of normal table declaration:
+
+- Declaring a new table inserts its lineage rows as the last step of `Table.declare()`.
+- When `@schema(MyTable)` runs on an already-declared table, DataJoint checks the heading's loaded lineage values for the symptom of missing rows — any primary-key attribute with `lineage=None`. The check is in memory, against values already loaded when the heading was constructed, so a healthy schema pays zero additional database queries on re-decoration. If the check fires, DataJoint refreshes the table's `~lineage` rows from the current foreign-key definition. *(New in DataJoint 2.3.)*
+
+This automatic check addresses the load-bearing failure mode: a partial declare or an upgrade that left some rows missing entirely. **It does not auto-heal stale-but-non-None entries** (e.g. older DataJoint versions that wrote lineage strings in a slightly different format). Those still require a manual rebuild — see below.
+
+### Manual lineage rebuild
+
+When the in-memory check is not enough — stale rows that exist but carry the wrong value, schemas whose tables have not been re-decorated under DataJoint 2.3+, or schemas in `create_tables=False` production mode — call `schema.rebuild_lineage()` or the equivalent migration helper:
+
+```python
+import datajoint as dj
+from datajoint import migrate
+
+schema = dj.Schema("my_schema")
+
+# Option 1: instance method
+schema.rebuild_lineage()
+
+# Option 2: migration helper with dry-run preview
+migrate.rebuild_lineage(schema, dry_run=True)  # show what would change
+migrate.rebuild_lineage(schema, dry_run=False)  # apply
+```
+
+This is also the recovery path when a join fails with:
+
+```
+DataJointError: Cannot join on attribute `X`: lineage missing on one side
+(None vs ...). This usually indicates a stale `~lineage` entry from an older
+DataJoint version or an incomplete declare. Run `schema.rebuild_lineage()` ...
+```
+
+`rebuild_lineage()` is safe to run on a healthy schema — it's idempotent and produces the same end state regardless of starting condition.
+
 ## Examples
 
 ### Valid Join (Shared Lineage)
diff --git a/src/reference/specs/semantic-matching.md b/src/reference/specs/semantic-matching.md
index 094edfd0..fa8ed1b2 100644
--- a/src/reference/specs/semantic-matching.md
+++ b/src/reference/specs/semantic-matching.md
@@ -65,7 +65,10 @@ class Student(dj.Manual):
 
 ### Rebuilding Lineage for Existing Schemas
 
-If you have existing schemas created before DataJoint 2.0, rebuild their lineage tables:
+!!! version-added "New in 2.3"
+    `@schema` decoration now performs an **in-memory check** for missing `~lineage` rows when a table is already declared. The check inspects the heading's loaded lineage values (no extra DB queries on healthy schemas); if any primary-key attribute has `lineage=None`, the table's rows are refreshed automatically from the current foreign-key definition. This auto-heals the most common failure mode (partial declares, upgrades that left rows missing) without user action. The explicit `rebuild_lineage()` utility remains the path for cases the in-memory check cannot reach (see below).
+
+If your schema has any of the limitations below — or you want to apply a clean reset — rebuild explicitly:
 
 ```python
 import datajoint as dj
@@ -73,7 +76,7 @@ import datajoint as dj
 # Connect and get your schema
 schema = dj.Schema('my_database')
 
-# Rebuild lineage (do this once per schema)
+# Rebuild lineage
 schema.rebuild_lineage()
 
 # Restart Python kernel to pick up changes
@@ -81,6 +84,12 @@ schema.rebuild_lineage()
 
 **Important**: If your schema references tables in other schemas, rebuild those upstream schemas first.
 
+**When you still need to call this explicitly:**
+
+- **Stale-but-non-None rows.** The 2.3 in-memory check fires only on missing rows (`lineage=None`). DataJoint versions older than 2.3 that wrote lineage strings in a different format leave non-None rows that look healthy to the check but no longer match what current code computes. Symptom: a join error of the form `different lineages (a.b.c vs a.b.c)` (values look right but compare unequal). Fix: `rebuild_lineage()`.
+- **Production mode (`create_tables=False`).** The auto-heal is suppressed when the schema disallows table creation, to keep production deployments read-only. Migrations to production should rebuild lineage before flipping the flag.
+- **Cross-schema upstream changes.** When an upstream schema's lineage was rebuilt, downstream schemas don't automatically pick up the new strings until they're re-decorated. Running `rebuild_lineage()` on the downstream schema propagates the change.
+
 ---
 
 ## API Reference
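
Review note: the decoration-time check this patch documents can be pictured as a pure predicate over the already-loaded heading. The sketch below is not DataJoint code and is not part of the patch. `Attribute`, `needs_lineage_refresh`, and the lineage strings are hypothetical stand-ins chosen only to illustrate the rule "refresh when any primary-key attribute has `lineage=None`", and why a stale-but-non-None entry is not detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for one heading attribute (not DataJoint's class)."""
    name: str
    in_key: bool            # True for primary-key attributes
    lineage: Optional[str]  # lineage string as loaded with the heading, or None


def needs_lineage_refresh(attributes: list[Attribute]) -> bool:
    """Return True if any primary-key attribute has no lineage loaded.

    Works only on in-memory values, so it issues no database queries.
    """
    return any(a.in_key and a.lineage is None for a in attributes)


# Placeholder lineage strings; the real format is defined by DataJoint.
healthy = [Attribute("subject_id", True, "a.b.c"),
           Attribute("notes", False, None)]    # non-key attribute is ignored
missing = [Attribute("subject_id", True, None)]
stale = [Attribute("subject_id", True, "a.B.c")]  # wrong value, but not None

print(needs_lineage_refresh(healthy))  # False
print(needs_lineage_refresh(missing))  # True
print(needs_lineage_refresh(stale))    # False: the documented limitation
```

Under these assumptions, only the `missing` case triggers a refresh. The `stale` case passes the check, which is why the docs direct users to an explicit `rebuild_lineage()` for it.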