Refine Entity Integrity: formal definition + two required components #187

Open
dimitri-yatsenko wants to merge 1 commit into main from refine/entity-integrity-formal-definition
Context

The Entity Integrity concept page describes the 1:1 correspondence between
real-world entities and database records, but it does not state the
definition formally. In particular, it understates that entity integrity
requires two components working together (a real-world identification
process plus a database uniqueness constraint) and leaves the bidirectional
nature of the correspondence implicit.

This PR opens the page with a precise formal definition, names the two
required components explicitly, and reorganizes around that framing.

The source of the formal definition is
datajoint/datajoint-book,
where the 1:1 bidirectional framing and the external-process requirement are
worked out at textbook length (book/20-concepts/04-integrity.md and
book/30-design/018-primary-key.md). This PR brings that precision into the
official docs in platform-doc voice.

What changes

  • New opening: "Entity integrity is the guarantee of a one-to-one
    correspondence between real-world entities and their representations in
    the database." Followed by the two bidirectional bullets and an explicit
    statement that a primary-key constraint alone is not sufficient.

  • New "Two required components" section (table format): external
    identification process + database uniqueness constraint. Names what each
    does and where each lives.

  • Elevated "Explicit keys, no auto-increment" subsection under
    primary-key requirements, with the four reasons (identification before
    insertion, duplicate detection, composite keys, reproducibility). Was
    buried before; now stated as a direct consequence of the formal
    definition.

  • New "Partial entity integrity" section for applications that need only
    one direction of the 1:1 mapping (record→entity uniqueness only, or
    entity→record completeness only).

  • New "When no natural key exists" section covering the multi-step
    token-issuance pattern (generate, deliver, require, trust the external
    process).

  • New "What the database can and cannot do" table as the closing
    conceptual statement.

  • Tightened and reorganized the existing Schema Dimensions content
    (preserved with light edits — same diagrams, same examples).

  • Refreshed See-also links.
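
To make the "Explicit keys, no auto-increment" item concrete for reviewers, here is a minimal sketch of the duplicate-detection reason. It uses Python's standard-library `sqlite3` rather than DataJoint so it runs without a server; the table names, column names, and the identifier `S-17` are hypothetical and do not come from the page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one keyed by an auto-increment surrogate,
# one keyed by an identifier assigned outside the database.
conn.execute(
    "CREATE TABLE subject_auto (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
conn.execute(
    "CREATE TABLE subject_explicit (subject_id TEXT PRIMARY KEY, name TEXT)"
)

# Entering the same subject twice under auto-increment silently
# produces two records, because each insert gets a fresh id.
conn.execute("INSERT INTO subject_auto (name) VALUES (?)", ("Rex",))
conn.execute("INSERT INTO subject_auto (name) VALUES (?)", ("Rex",))
auto_rows = conn.execute("SELECT COUNT(*) FROM subject_auto").fetchone()[0]

# With an explicit key, the second insert collides on the primary key,
# so the duplicate is detected at insertion time.
conn.execute("INSERT INTO subject_explicit VALUES (?, ?)", ("S-17", "Rex"))
try:
    conn.execute("INSERT INTO subject_explicit VALUES (?, ?)", ("S-17", "Rex"))
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True
explicit_rows = conn.execute("SELECT COUNT(*) FROM subject_explicit").fetchone()[0]
```

The explicit-key table only helps because the subject was identified before insertion; the key value has to exist in the real world first.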

Net change: +240 / -161 lines on one file.

Voice and audience

The original page was already in the docs section; this revision keeps that
voice. Where the book uses textbook devices (admonitions, learning
objectives, dropdowns, tab-set SQL alongside DataJoint), this revision
prefers tight prose suitable for the dj-core reference docs.

…nents

Open the page with a precise formal definition: entity integrity is the
guarantee of a bidirectional one-to-one correspondence between real-world
entities and their database representations.

Make explicit that this requires TWO components, not one:
- A real-world identification process (external) — establishes the
  reliable association between physical entities and their identifiers.
- A database uniqueness constraint (internal) — enforces, via the
  primary key, that no two records share the same identifier.

Neither component alone is sufficient. The database can enforce
uniqueness but cannot create it; the real-world process is what links a
physical entity to its identifier in the first place.
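
As a rough illustration of why neither component is sufficient alone, the sketch below uses Python's standard-library `sqlite3` as a stand-in for any relational backend. The `mouse` table and the identifiers `M001` and `M002` are hypothetical, chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal component: the primary key is the uniqueness constraint.
conn.execute("CREATE TABLE mouse (mouse_id TEXT PRIMARY KEY, sex TEXT)")
conn.execute("INSERT INTO mouse VALUES ('M001', 'F')")

# The database rejects a second record with the same identifier.
try:
    conn.execute("INSERT INTO mouse VALUES ('M001', 'F')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# External component missing: suppose the same animal is mistakenly
# relabeled 'M002' in the lab. The constraint is satisfied, so the
# database accepts a second record for one real-world entity.
conn.execute("INSERT INTO mouse VALUES ('M002', 'F')")
record_count = conn.execute("SELECT COUNT(*) FROM mouse").fetchone()[0]
```

The second insert succeeds even though it breaks the one-to-one correspondence, which is the sense in which the database enforces uniqueness but cannot create it.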

Additional structural improvements:
- Elevate the "no auto-increment / explicit keys" rule from a buried
  paragraph to its own subsection under primary-key requirements, with
  the four reasons (identification before insertion, duplicate
  detection, composite keys, reproducibility).
- Add a "Partial entity integrity" section for applications that need
  only one direction of the 1:1 mapping (record-to-entity only, or
  entity-to-record only).
- Add a "When no natural key exists" section covering the
  multi-step token-issuance pattern.
- Add an explicit "What the database can and cannot do" table at the
  end of the conceptual material.
- Tighten and reorganize the existing Schema Dimensions content
  (preserved with light edits).
- Refresh See-also links to point to the related concept pages.
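
One possible reading of the token-issuance pattern (generate, deliver, require, trust the external process) is sketched below. This is an assumption about how the steps fit together, not the page's implementation: it uses standard-library `sqlite3` and `secrets`, and the `participant` and `response` tables and both helper functions are hypothetical.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE participant (token TEXT PRIMARY KEY, enrolled_on TEXT)")
conn.execute(
    "CREATE TABLE response ("
    " token TEXT REFERENCES participant(token),"
    " question INTEGER,"
    " answer TEXT,"
    " PRIMARY KEY (token, question))"
)

def issue_token(enrolled_on):
    """Generate a token, register it, and return it for delivery."""
    token = secrets.token_hex(8)  # step 1: generate
    conn.execute("INSERT INTO participant VALUES (?, ?)", (token, enrolled_on))
    return token                  # step 2: deliver to the real-world entity

def record_response(token, question, answer):
    """Step 3: require the token on every later interaction."""
    conn.execute("INSERT INTO response VALUES (?, ?, ?)", (token, question, answer))

token = issue_token("2024-01-01")
record_response(token, 1, "yes")

# A token that was never issued fails the foreign-key check.
try:
    record_response("not-issued", 1, "yes")
    unknown_token_rejected = False
except sqlite3.IntegrityError:
    unknown_token_rejected = True

response_count = conn.execute("SELECT COUNT(*) FROM response").fetchone()[0]
```

Step 4 has no code: the database can verify that a token was issued, but whether the holder is the entity it was delivered to depends entirely on the external process.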

Source for the formal definition: datajoint/datajoint-book, where the
1:1 bidirectional framing and the external-process requirement are
worked out at textbook length (book/20-concepts/04-integrity.md and
book/30-design/018-primary-key.md). This PR brings that precision into
the official docs in platform-doc voice.