Expand Relational Workflow Model concept page by dimitri-yatsenko · Pull Request #184 · datajoint/datajoint-docs

dimitri-yatsenko · 2026-06-13T17:43:09Z

Context

The current Relational Workflow Model (RWM) concept page (src/explanation/relational-workflow-model.md) is understated relative to the model's significance. It reads as a brief positioning statement rather than as an entry point that lands the structural argument for a reader who already knows informatics (databases, FK graphs, ER modeling, workflow managers, lakehouses).

This PR expands the page to function as that entry point — the audience pictured is a knowledgeable peer (e.g., an infrastructure architect from pharma R&D evaluating where DataJoint sits in the landscape they already know).

Changes

- Lead with the three-interpretations taxonomy (Codd / Chen / RWM) and the computational substrate framing from the DataJoint 2.0 preprint (Yatsenko & Nguyen, 2026, arXiv:2602.16585).
- Name the surrounding tool categories explicitly and what each is silent on:
  - File-based workflow systems (CWL, Snakemake, Nextflow): fragment provenance across the filesystem
  - Task orchestrators (Airflow, Argo, Prefect): agnostic to data structure
  - Data catalogs (DataHub, Atlan, Marquez): describe data after it lands
  - Lakehouses (Delta, Iceberg, Hudi): treat computation as external
- Add a worked-example pipeline (Mouse → Session → Scan → AverageFrame → Segmentation → Fluorescence, with SegmentationParam as Lookup) rendered as a mermaid diagram with tier-color classes.
- Add a "deliberate trade-off" section that acknowledges the legitimate strengths of decoupled architectures and frames DataJoint's coupling as a chosen trade-off, drawn directly from Section 5 of the preprint.
- Add a "substrate consequences" section that covers:
  - Provenance and lineage as structural properties of the substrate (mapping to W3C PROV / OpenLineage is translation, not reconstruction)
  - The five agent-substrate properties from the preprint: self-describing, safe by default, explicit dependencies, idempotent, observable
- Preserve the existing detailed sections (table tiers, master-part, workflow normalization, entity integrity, query algebra with closure, transactions vs transformations) under a "Beneath the model" header for readers who want the structural detail.
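
For reviewers who want the worked-example pipeline in concrete form, here is a minimal sketch of its dependency structure in plain Python. It is not DataJoint code and does not reproduce the page's content. The chain of tables comes from the description above; attaching SegmentationParam as a parent of Segmentation is an assumption made for illustration, and the helper names `computation_order` and `upstream` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table lists the tables it depends on.
# The linear chain is taken from the PR description; the edge from
# SegmentationParam into Segmentation is an assumption for illustration.
PIPELINE = {
    "Mouse": [],
    "Session": ["Mouse"],
    "Scan": ["Session"],
    "AverageFrame": ["Scan"],
    "SegmentationParam": [],
    "Segmentation": ["AverageFrame", "SegmentationParam"],
    "Fluorescence": ["Segmentation"],
}


def computation_order(pipeline):
    """Return table names so that every table follows its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def upstream(pipeline, table):
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    stack = list(pipeline[table])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(pipeline[parent])
    return seen


if __name__ == "__main__":
    print(computation_order(PIPELINE))
    print(sorted(upstream(PIPELINE, "Fluorescence")))
```

In this sketch, walking `upstream` from Fluorescence recovers its lineage from the declared dependencies alone, which is the sense in which the description treats provenance as a structural property rather than something reconstructed afterwards.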

Net change

+177 / -112 lines; one file.

Sources

Yatsenko & Nguyen, 2026 — DataJoint 2.0 whitepaper (computational substrate, four innovations, substrate properties for agents, deliberate-trade-off discussion)
Yatsenko et al., 2018 — original theoretical formalization (relational workflow model, query algebra)

Notes for reviewers

The mermaid diagram uses tier-color classDefs. If the docs site's mermaid theme overrides these, we may need to drop colors or adapt to the site theme.
Cross-references in "See also" all resolve against the current src/explanation/ and src/how-to/ content.
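
To make the tier-color fallback in the first note concrete, here is a small sketch that renders a diagram either with or without classDefs. It is not the diagram from the PR. The tier assignments (other than SegmentationParam being a Lookup), the color values, the edge from SegmentationParam to Segmentation, and the function name `render_mermaid` are all assumptions for illustration.

```python
# Edges of the worked-example pipeline; the SegmentationParam edge is assumed.
EDGES = [
    ("Mouse", "Session"),
    ("Session", "Scan"),
    ("Scan", "AverageFrame"),
    ("AverageFrame", "Segmentation"),
    ("SegmentationParam", "Segmentation"),
    ("Segmentation", "Fluorescence"),
]

# Hypothetical tier per table; only SegmentationParam's tier is stated in the PR.
TIERS = {
    "Mouse": "manual",
    "Session": "manual",
    "Scan": "manual",
    "SegmentationParam": "lookup",
    "AverageFrame": "computed",
    "Segmentation": "computed",
    "Fluorescence": "computed",
}

# Hypothetical colors; the real page may use different values.
TIER_STYLE = {
    "manual": "fill:#cfc,stroke:#393",
    "lookup": "fill:#eee,stroke:#666",
    "computed": "fill:#fcc,stroke:#933",
}


def render_mermaid(edges, tiers, styles=None):
    """Build mermaid flowchart text; omit all class lines when styles is None."""
    lines = ["flowchart TD"]
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    if styles:
        for tier, style in styles.items():
            lines.append(f"    classDef {tier} {style}")
        for table, tier in tiers.items():
            lines.append(f"    class {table} {tier}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mermaid(EDGES, TIERS, TIER_STYLE))  # colored variant
    print(render_mermaid(EDGES, TIERS))              # theme-neutral fallback
```

Calling `render_mermaid(EDGES, TIERS)` without styles yields the uncolored diagram, which is one way to handle the "drop colors" option if the site theme overrides the classDefs.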

The previous intro understated the model's significance. The expansion positions the RWM for an informatics-knowledgeable reader:

- Lead with the three-interpretations taxonomy (Codd / Chen / RWM) and the computational-substrate framing from the DataJoint 2.0 preprint.
- Name the surrounding tool categories explicitly (CWL/Snakemake/Nextflow, Airflow/Argo/Prefect, DataHub/Atlan/Marquez, Delta/Iceberg/Hudi) and what each is silent on.
- Add a worked example pipeline (Mouse > Session > Scan > AverageFrame > Segmentation > Fluorescence, with SegmentationParam as Lookup) rendered as a mermaid diagram with tier colors.
- Add a "deliberate trade-off" section addressing the legitimate strengths of decoupled architectures and why DataJoint accepts coupling.
- Add a substrate-consequences section: provenance and lineage as structural properties (mapping to W3C PROV / OpenLineage is translation, not reconstruction), and the five agent-substrate properties (self-describing, safe by default, explicit dependencies, idempotent, observable) from the preprint.
- Preserve the existing detailed sections (table tiers, master-part, normalization, entity integrity, query algebra, transactions vs transformations) under a "Beneath the model" header for readers who want the structural detail.

dimitri-yatsenko requested review from MilagrosMarin, esutlie and ttngu207 June 13, 2026 17:43

This was referenced Jun 13, 2026

Add two deeper concept pages: Schema as a Workflow Specification + Comparison to Workflow Languages #185

Open

WIP: trim deliberate-trade-off prose from RWM page (depends on #184 + #185) #186

Draft

dimitri-yatsenko wants to merge 1 commit into main from expand/relational-workflow-model-intro
