Add two deeper concept pages: Schema as a Workflow Specification + Comparison to Workflow Languages #185
Open
dimitri-yatsenko wants to merge 2 commits into
…mparison to Workflow Languages
Two new pages under Concepts > Data Model that follow from the
Relational Workflow Model overview and address the informed-reader
questions the overview page cannot answer in its scope:
1. Schema as a Workflow Specification
- Names the Relational Workflow Model as DataJoint's major innovation
- Describes the schema as a formal language: grammar (annotated DDL
excerpt for the Scan / AverageFrame / SegmentationParam /
Segmentation pipeline), typed semantics (three-condition existence
rule for a Computed row), the make() contract recording the git
hash of the producing code, the five-operator algebra with
closure, the type system, populate() as the self-healing engine
that brings the world into compliance with the schema, and
machine-readability / export pathways (DOT, Mermaid, YAML, JSON,
W3C PROV, OpenLineage, PROV-O, workflow-language conversion).
- Closes with the schema-as-control-plane framing (parallel to
routing tables in a network control plane).
2. Comparison to Workflow Languages
- Fair, structural comparison against CWL, Snakemake, Nextflow
(file-based workflows) and Airflow, Argo, Prefect, Dagster (task
orchestrators). Adjacent categories (data catalogs, lakehouses)
noted but flagged as solving different problems.
- Side-by-side table across nine concerns (data structure, types,
FK integrity, computation, execution order, provenance, drift
detection, query interface, retry semantics).
- What workflow languages offer, what they omit, DataJoint's
deliberate trade-off (paraphrasing Section 5 of Yatsenko & Nguyen
2026).
- Convertibility: any CWL workflow translates mechanically to a
DataJoint schema and back, with the data-structure layer the
workflow language omits supplied on conversion. GATK WGS pipeline
used as the empirical reference.
- "When to choose what" guidance including the "use both" pattern
(DataJoint inside an Airflow / Argo / Prefect orchestration).
Nav: both pages inserted under Concepts > Data Model after Relational
Workflow Model and before Entity Integrity, in mkdocs.yaml.
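The grammar described in item 1 (attributes above and below a `---` separator, `->` foreign keys) can be illustrated with a small stdlib-only sketch. It does not use the DataJoint library. The table names come from the text above, but every attribute name and type is a hypothetical placeholder, and `parse_definition`, `execution_order`, and `to_mermaid` are illustrative helpers rather than DataJoint functions. The sketch only shows how foreign-key lines are enough to derive an execution order and a Mermaid diagram.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Table names are taken from the PR text; all attribute names and types
# below are invented for illustration and are not the documented excerpt.
DEFINITIONS = {
    "Scan": """
        scan_id : int
        ---
        scan_path : varchar(255)
    """,
    "AverageFrame": """
        -> Scan
        ---
        average_frame : longblob
    """,
    "SegmentationParam": """
        seg_param_id : int
        ---
        threshold : float
    """,
    "Segmentation": """
        -> AverageFrame
        -> SegmentationParam
        ---
        mask : longblob
    """,
}


def parse_definition(text):
    """Split a definition into primary-key lines, secondary lines, and parents."""
    primary, secondary, parents = [], [], []
    target = primary  # lines above '---' belong to the primary key
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("---"):
            target = secondary  # everything after the separator is secondary
            continue
        if line.startswith("->"):
            parents.append(line[2:].strip())  # foreign key to a parent table
        target.append(line)
    return {"primary": primary, "secondary": secondary, "parents": parents}


def dependency_graph(definitions):
    """Map each table name to the set of tables it references."""
    return {
        name: set(parse_definition(text)["parents"])
        for name, text in definitions.items()
    }


def execution_order(definitions):
    """Return table names with every parent listed before its children."""
    return list(TopologicalSorter(dependency_graph(definitions)).static_order())


def to_mermaid(definitions):
    """Render the foreign-key graph as Mermaid flowchart text."""
    lines = ["graph TD"]
    for child, parents in dependency_graph(definitions).items():
        for parent in sorted(parents):
            lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)
```

Calling `execution_order(DEFINITIONS)` yields an order in which Scan precedes AverageFrame and both AverageFrame and SegmentationParam precede Segmentation; the relative order of unrelated tables is not fixed.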
…ines
Cohesion pass after adding Schema as a Workflow Specification and
Comparison to Workflow Languages:
- Nav (mkdocs.yaml): move the two new pages to the end of the Data
Model group so the progression reads
paradigm > components > synthesis > comparison:
Relational Workflow Model > Entity Integrity > Normalization >
Computation Model > Schema as a Workflow Specification >
Comparison to Workflow Languages.
- Concepts index (explanation/index.md): add cards for both new pages.
- FAQ (faq.md): the "Is DataJoint a Workflow Management System?"
answer was duplicating the Comparison page; trim it to a
two-paragraph pointer to the new page.
- Data Pipelines (data-pipelines.md): the "Comparing Approaches"
table was a mini version of the new Comparison page; trim to a
short paragraph + pointer.
Context
The Relational Workflow Model concept page (overview / paradigm) and the
component pages under Concepts > Data Model (Entity Integrity,
Normalization, Computation Model) leave two reader needs unmet:
- … reader asks for the grammar, the typed semantics, the algebra, and the
machine-readable surface. This was Hal Stern's question on the June 12 call:
"Python is not a formal spec — is there a grammar? Can it be published
as YAML? Is there an API set for it?"
- … already know? A fair structural comparison against CWL, Snakemake,
Nextflow, Airflow, Argo, Prefect, and Dagster, with guidance on when
each fits.
This PR adds two new pages that close those gaps and integrates them
with the existing concept set.
Changes
New pages
explanation/schema-as-workflow-specification.md (~1,150 words)
- Names the Relational Workflow Model as DataJoint's major innovation
and positions the schema as the formal language expressing it
- Annotated DDL excerpt (Scan, AverageFrame, SegmentationParam,
Segmentation) showing the --- separator, -> foreign keys, codec
types, tier decoration
- make() as a typed function, git-hash code provenance per row
- populate() brings the world into compliance with the schema
- Machine-readability / export pathways: DOT, Mermaid, YAML, JSON,
W3C PROV, OpenLineage, PROV-O, workflow-language conversion
- … observable (parallel to network routing tables)
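explanation/comparison-to-workflow-languages.md (~870 words)
- Structural comparison against file-based workflows (CWL,
Snakemake, Nextflow) and task orchestrators (Airflow, Argo, Prefect,
Dagster), with adjacent categories (data catalogs, lakehouses) noted
but separated
- Side-by-side table across nine concerns (data structure, types, FK
integrity, computation spec, execution order, provenance, drift
detection, query interface, retry/idempotence)
- What workflow languages offer, what they omit, DataJoint's deliberate
trade-off (paraphrased from Yatsenko & Nguyen 2026 Section 5)
- Convertibility: any CWL workflow translates mechanically to a
DataJoint schema and back; DataJoint adds the data-structure layer
that workflow languages omit; GATK WGS example referenced
- "When to choose what" guidance including the "use both" pattern
(DataJoint inside an Airflow / Argo / Prefect orchestration)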
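The make() and populate() bullets, along with the retry/idempotence row of the comparison table, can be pictured with a toy in-memory model. This is a minimal sketch and not the DataJoint API: `ToyComputed` is a hypothetical class, rows live in plain dicts, and `CODE_VERSION` is a placeholder string standing in for the git hash that the page says is recorded per row.

```python
CODE_VERSION = "0000000"  # hypothetical placeholder for a git commit hash


class ToyComputed:
    """In-memory stand-in for a computed table; not the DataJoint API."""

    def __init__(self, upstream, compute):
        self.upstream = upstream  # dict: key -> upstream row
        self.compute = compute    # pure function of one upstream row
        self.rows = {}            # dict: key -> computed row

    def make(self, key):
        """Compute one row and record which code version produced it."""
        self.rows[key] = {
            "result": self.compute(self.upstream[key]),
            "code_version": CODE_VERSION,
        }

    def populate(self):
        """Call make() for every upstream key that has no row yet.

        Returns the number of rows computed, so a second call on an
        already complete table returns 0.
        """
        missing = [key for key in self.upstream if key not in self.rows]
        for key in missing:
            self.make(key)
        return len(missing)
```

With two upstream rows, the first `populate()` computes both, a second call computes nothing, and deleting one computed row causes the next call to recompute only that row. That is the sense in which the schema, rather than a task log, determines what work remains.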
Integration with existing concept set
- Nav (mkdocs.yaml): place the two new pages at the end of the
Data Model group so the progression reads
paradigm > components > synthesis > comparison:
RWM > Entity Integrity > Normalization > Computation Model >
Schema as a Workflow Specification > Comparison to Workflow Languages.
- Concepts index (explanation/index.md): cards added for both new pages.
- FAQ (faq.md): the "Is DataJoint a Workflow Management System?"
answer overlapped substantively with the new Comparison page; trimmed
it to a two-paragraph pointer.
- Data Pipelines (data-pipelines.md): the "Comparing Approaches"
table was a mini-version of the new Comparison page; trimmed to a
short paragraph + pointer.
Merge order with PR #184
Both new pages cross-reference the expanded Relational Workflow Model
page from PR #184. Suggested merge order:
1. PR #184
2. This PR (#185)
If merged in the opposite order, the new pages still resolve their links
correctly; the cross-references just read against the older, shorter
RWM page until #184 lands.