39 changes: 39 additions & 0 deletions src/explanation/semantic-matching.md
@@ -158,6 +158,45 @@ This means you're trying to join tables that have a namesake attribute (`id`) wi
"""
```

## The `~lineage` table

Lineage is stored per-schema in a hidden table called `~lineage`. DataJoint writes one row per `(table, attribute)` pair recording the lineage string for that attribute, and consults `~lineage` whenever it needs to compare lineages across query expressions.

`~lineage` is maintained as part of normal table declaration:

- Declaring a new table inserts its lineage rows as the last step of `Table.declare()`.
- When `@schema(MyTable)` runs on an already-declared table, DataJoint checks the heading's loaded lineage values for the symptom of missing rows: any primary-key attribute with `lineage=None`. The check runs in memory, against values that were loaded when the heading was constructed, so a healthy schema incurs no additional database queries on re-decoration. If the check fires, DataJoint refreshes the table's `~lineage` rows from the current foreign-key definition. *(New in DataJoint 2.3.)*

This automatic check addresses the most common failure mode: a partial declare or an upgrade that left some rows missing entirely. **It does not auto-heal stale-but-non-None entries** (e.g. rows written by older DataJoint versions that used a slightly different lineage string format). Those still require a manual rebuild, described under "Manual lineage rebuild".
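The trigger condition described above can be sketched in a few lines. This is a minimal illustration, not DataJoint's implementation: the `Attr` class, the `needs_lineage_refresh` function, and the example lineage strings are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for one heading attribute (not a DataJoint class)."""
    name: str
    in_key: bool             # True if the attribute belongs to the primary key
    lineage: Optional[str]   # lineage string loaded with the heading, or None


def needs_lineage_refresh(attrs: Iterable[Attr]) -> bool:
    """Return True if any primary-key attribute has no lineage loaded."""
    return any(a.in_key and a.lineage is None for a in attrs)


# Illustrative headings; the lineage string format here is an assumption.
healthy = [Attr("subject_id", True, "lab.subject.subject_id"),
           Attr("note", False, None)]
missing = [Attr("subject_id", True, None)]
stale = [Attr("subject_id", True, "old-format-string")]

print(needs_lineage_refresh(healthy))  # no primary-key attribute lacks lineage
print(needs_lineage_refresh(missing))  # a primary-key attribute has lineage=None
print(needs_lineage_refresh(stale))    # non-None but outdated value is not detected
```

The `stale` case is the one the paragraph above warns about: the value is present, so a check that only looks for `None` has nothing to react to.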

### Manual lineage rebuild

When the in-memory check is not enough (stale rows that exist but carry the wrong value, schemas whose tables have not been re-decorated under DataJoint 2.3+, or schemas in `create_tables=False` production mode), call `schema.rebuild_lineage()` or the equivalent migration helper:

```python
import datajoint as dj
from datajoint import migrate

schema = dj.Schema("my_schema")

# Option 1: instance method
schema.rebuild_lineage()

# Option 2: migration helper with dry-run preview
migrate.rebuild_lineage(schema, dry_run=True)   # show what would change
migrate.rebuild_lineage(schema, dry_run=False)  # apply
```

This is also the recovery path when a join fails with:

```
DataJointError: Cannot join on attribute `X`: lineage missing on one side
(None vs ...). This usually indicates a stale `~lineage` entry from an older
DataJoint version or an incomplete declare. Run `schema.rebuild_lineage()` ...
```

`rebuild_lineage()` is safe to run on a healthy schema — it's idempotent and produces the same end state regardless of starting condition.
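To make the idempotence claim concrete, here is a toy model of a rebuild over plain dictionaries. It is only a sketch under stated assumptions: `plan_changes` and `apply_rebuild` are hypothetical helpers, the `(table, attribute)` keys and lineage strings are made up, and nothing here reflects how `rebuild_lineage()` is implemented internally.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (table, attribute)
Change = Tuple[Key, Optional[str], Optional[str]]  # (key, old value, new value)


def plan_changes(stored: Dict[Key, str], computed: Dict[Key, str]) -> List[Change]:
    """List every row a rebuild would add, rewrite, or drop (a dry-run preview)."""
    keys = sorted(set(stored) | set(computed))
    return [(k, stored.get(k), computed.get(k))
            for k in keys if stored.get(k) != computed.get(k)]


def apply_rebuild(stored: Dict[Key, str], computed: Dict[Key, str]) -> Dict[Key, str]:
    """Toy rebuild: the end state is whatever the current definitions imply."""
    return dict(computed)


# What the current table definitions imply (illustrative values).
computed = {
    ("session", "subject_id"): "lab.subject.subject_id",
    ("session", "session_id"): "lab.session.session_id",
}
# A damaged starting state: one stale row, one row missing entirely.
damaged = {("session", "subject_id"): "old-format-string"}

after_first = apply_rebuild(damaged, computed)
after_second = apply_rebuild(after_first, computed)

print(len(plan_changes(damaged, computed)))  # rows the first run would touch
print(plan_changes(after_first, computed))   # nothing left to change on a second run
```

In this model the second run has an empty plan and the result does not depend on whether the starting state was damaged, empty, or already healthy, which is the property the sentence above describes.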

## Examples

### Valid Join (Shared Lineage)
13 changes: 11 additions & 2 deletions src/reference/specs/semantic-matching.md
@@ -65,22 +65,31 @@ class Student(dj.Manual):

### Rebuilding Lineage for Existing Schemas

If you have existing schemas created before DataJoint 2.0, rebuild their lineage tables:
!!! version-added "New in 2.3"
    `@schema` decoration now performs an **in-memory check** for missing `~lineage` rows when a table is already declared. The check inspects the heading's loaded lineage values (no extra DB queries on healthy schemas); if any primary-key attribute has `lineage=None`, the table's rows are refreshed automatically from the current foreign-key definition. This auto-heals the most common failure mode (partial declares, upgrades that left rows missing) without user action. The explicit `rebuild_lineage()` utility remains the path for cases the in-memory check cannot reach (see below).

If your schema has any of the limitations below — or you want to apply a clean reset — rebuild explicitly:

```python
import datajoint as dj

# Connect and get your schema
schema = dj.Schema('my_database')

# Rebuild lineage (do this once per schema)
# Rebuild lineage
schema.rebuild_lineage()

# Restart Python kernel to pick up changes
```

**Important**: If your schema references tables in other schemas, rebuild those upstream schemas first.
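One way to get the upstream-first order right when several schemas reference each other is to sort them topologically before rebuilding. The sketch below assumes you maintain the dependency map yourself; `rebuild_order` and the schema names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def rebuild_order(upstream: Dict[str, Iterable[str]]) -> List[str]:
    """Order schema names so each comes after every schema it references."""
    return list(TopologicalSorter(upstream).static_order())


# Hypothetical map: schema name -> names of the schemas it references.
deps = {
    "lab": [],
    "ephys": ["lab"],
    "analysis": ["ephys", "lab"],
}

order = rebuild_order(deps)
print(order)

# With a live connection, the rebuild itself would then follow that order:
#
#     import datajoint as dj
#     for name in order:
#         dj.Schema(name).rebuild_lineage()
```

`TopologicalSorter` raises `graphlib.CycleError` if the map contains a cycle, which is a useful early signal that the dependency map was written down incorrectly.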

**When you still need to call this explicitly:**

- **Stale-but-non-None rows.** The 2.3 in-memory check fires only on missing rows (`lineage=None`). DataJoint versions older than 2.3 that wrote lineage strings in a different format leave non-None rows that look healthy to the check but no longer match what current code computes. Symptom: a join error of the form `different lineages (a.b.c vs a.b.c)` (values look right but compare unequal). Fix: `rebuild_lineage()`.
- **Production mode (`create_tables=False`).** The auto-heal is suppressed when the schema disallows table creation, to keep production deployments read-only. Migrations to production should rebuild lineage before flipping the flag.
- **Cross-schema upstream changes.** When an upstream schema's lineage was rebuilt, downstream schemas don't automatically pick up the new strings until they're re-decorated. Running `rebuild_lineage()` on the downstream schema propagates the change.

---

## API Reference