hyperpolymath · May 16, 2026
diff --git a/docs/decisions/0010-provenance-forks-are-first-class.adoc b/docs/decisions/0010-provenance-forks-are-first-class.adoc
@@ -0,0 +1,177 @@
+= Architecture Decision Record: 0010-provenance-forks-are-first-class
+<!-- SPDX-License-Identifier: PMPL-1.0-or-later -->
+<!-- Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk> -->
+
+# 10. Provenance forks are first-class; prevent duplicates, not divergence
+
+Date: 2026-05-16
+
+## Status
+
+Proposed (design + failing test; tracks #31 and #32)
+
+## Context
+
+Two issues, taken together, design the provenance chain into a
+structure that *cannot represent a forked history*:
+
+* **#31 (V-L2-L1)** — a per-entity write lock. The current
+  implementation in `src/tier1/provenance.rs::append_provenance`
+  already wraps read-head + insert + update-head in a
+  `BEGIN IMMEDIATE` transaction, and `verisimdb_provenance_chain_head`
+  holds exactly one `head_hash` per `entity_id`. The chain head is a
+  *single scalar*: there is no way to record that an entity has two
+  valid tips.
+* **#32 (V-L2-L2)** — proposes
+  `CREATE UNIQUE INDEX ux_provenance_chain
+  ON verisimdb_provenance_log(entity_id, previous_hash)`.
+  This makes it *structurally impossible* to insert a second row that
+  chains from the same predecessor.
+
+Both issues frame forks as purely adversarial ("a second writer that
+sneaks past the lock"). That framing is incomplete. Legitimate
+divergence is a real, expected event in this system:
+
+* **Partitioned / replicated / offline writers.** Both the
+  threat-model doc (section 5, OQ-2 external anchoring) and ADR-0006
+  (simulation semantics) anticipate replicated and sandbox writers.
+  Two honest writers that are network-partitioned both legitimately
+  extend the chain from the last shared tip. When they reconcile,
+  *both* branches are true history and must be retained for audit and
+  later merge.
+* **Simulation / what-if branches** (Simulation octad, ADR-0006). A
+  what-if branch is, by construction, a provenance fork from a real
+  entity's chain.
+
+The integrity property we actually want is **tamper-evidence and
+no silent loss**, not **linearity**. The current/proposed design
+inverts this: a `UNIQUE INDEX(entity_id, previous_hash)` does not
+*detect* a fork — it *rejects the second row at insert time*. The
+second writer's legitimate history is never recorded. The system
+cannot answer "did this entity's history diverge?" because the
+divergent row was thrown away. **A fork that cannot be written
+cannot be detected or audited. That is the integrity defect.**
+
+A hash chain that forbids forks is equivalent to claiming the world
+never partitions. It does.
+
+## Decision
+
+**Provenance forks are a first-class, representable state. The
+storage layer prevents *duplicate* records; it does not prevent
+*divergent* ones. Detection and reconciliation of forks is an
+explicit, queryable operation, not an insert-time rejection.**
+
+### 1. Schema (#32): duplicate-prevention, not fork-prevention
+
+Do **not** add `UNIQUE INDEX(entity_id, previous_hash)`. Instead:
+
+* The `hash` column is already `PRIMARY KEY`. Because the hash
+  preimage is domain-tagged and covers `previous_hash`, `entity_id`,
+  `operation`, `actor`, the canonical timestamp, `before_snapshot`
+  and `transformation` (see ADR-0002 / #27), an *exact duplicate*
+  record necessarily collides on `hash` and is already rejected.
+  This is the correct duplicate guard: it forbids re-inserting the
+  *same* entry while saying nothing about two *different* entries
+  that share a `previous_hash`.
+* Add a non-unique index so child lookup by predecessor is O(log n):
++
+[source,sql]
+----
+CREATE INDEX IF NOT EXISTS idx_provenance_predecessor
+  ON verisimdb_provenance_log(entity_id, previous_hash);
+----
++
+Two children of the same predecessor are two rows with the same
+`(entity_id, previous_hash)` and distinct `hash` — a `GROUP BY
+... HAVING COUNT(*) > 1` over this index is the fork query.
+
+### 2. Chain head (#31): a set of heads, not a scalar
+
+`verisimdb_provenance_chain_head` becomes
+`verisimdb_provenance_chain_heads` keyed by `(entity_id, head_hash)`:
+an entity may have one head (linear, the common case) or several
+(forked). `append_provenance` keeps its `BEGIN IMMEDIATE`
+transaction — serialisation is still desirable to prevent *racing
+duplicate* appends from one node — but:
+
+* It takes the parent tip explicitly (or, for the linear
+  fast-path, the unique current head if exactly one exists).
+* On insert it **removes the parent hash from the head set and
+  adds the new hash**, so a normal append stays linear.
+* A deliberate fork (`append_provenance_fork(... from_hash ...)`)
+  inserts a new entry whose `previous_hash` is a *non-tip*
+  ancestor, and *adds* a head without removing one. The entity now
+  has two heads; both persist.
+
+### 3. Detection / query surface
+
+Add `fork_points(conn, entity_id) -> Vec<ForkPoint>` returning every
+predecessor hash with >1 child, and extend `verify_chain` to verify
+*per-branch* (walk each head back to genesis; every branch must be
+internally hash-consistent) rather than assuming one linear walk.
+
+### 4. Data migration for existing sidecars
+
+* The single-head `verisimdb_provenance_chain_head(entity_id PK,
+  head_hash)` is migrated to `verisimdb_provenance_chain_heads(
+  entity_id, head_hash, PRIMARY KEY(entity_id, head_hash))` by an
+  idempotent `CREATE TABLE IF NOT EXISTS ... ; INSERT ... SELECT`
+  copy guarded by a `sqlite_master` existence check. The old table
+  is left in place for one release, then dropped, so the migration
+  that ships with this change has no destructive step.
+* **No `UNIQUE INDEX(entity_id, previous_hash)` is ever created**,
+  so there is no risk of an existing sidecar that legitimately
+  already contains a fork failing to open. (Had #32 shipped first,
+  this migration would have to detect and quarantine such rows; by
+  not shipping it we avoid that hazard entirely — note this in the
+  #32 thread.)
+* `verisimdb_provenance_log` itself is unchanged (same columns,
+  same `hash` PK). Existing rows remain valid and verifiable.
+
+### 5. Test plan (the failing test ships in this branch)
+
+`tests/provenance_fork_test.rs`:
+
+* `fork_can_be_written_and_both_branches_persist` writes genesis +
+  child A, then a *second* child B chained from the genesis
+  (the fork). Assert: the genesis entry has two children (A and
+  B) and the entity has 2 chain heads. **This test fails today**
+  (`append_provenance` cannot express "chain from a non-tip
+  ancestor"; there is no multi-head table; with #32 applied the
+  second insert would be a constraint violation).
+* `fork_points_detects_the_divergence` — after writing the fork,
+  `fork_points(conn, entity)` returns the genesis hash as a fork
+  point with two children.
+* `each_branch_verifies_independently` — `verify_chain` returns
+  `true` for a forked entity (each branch is hash-consistent),
+  proving divergence is not conflated with tampering.
+* Retained guard: `exact_duplicate_entry_is_rejected` — inserting
+  a byte-identical entry twice fails on the `hash` PK (the
+  duplicate guard the unique index was *trying* to provide,
+  achieved correctly).
+
+## Consequences
+
+* The provenance model can represent and audit reality (partitions,
+  replicas, simulation branches) instead of silently discarding the
+  losing writer's history.
+* "Single-writer per entity" stops being a *correctness* requirement
+  and becomes a *policy* a deployment may opt into (reject on >1
+  head) — enforced in application code, not welded into the schema.
+* Slightly more complex head bookkeeping and a per-branch
+  `verify_chain`. Acceptable: linearity was never a security
+  property, only an availability assumption.
+* The threat-model doc (`docs/theory/provenance-threat-model.adoc`
+  section "Single writer") and README §"Provenance Tracking" must be
+  updated to describe forks as detected-and-retained rather than
+  prevented. (Tracked as follow-up in the implementing PR.)
+
+## References
+
+* #31 (V-L2-L1) — write-path lock
+* #32 (V-L2-L2) — proposed unique index (this ADR declines it)
+* #26 / PR #103 — provenance type dedup (unblocker; landed first)
+* ADR-0002 — domain-tagged hash preimage (the real duplicate guard)
+* ADR-0006 — simulation semantics (a legitimate fork source)
+* `docs/theory/provenance-threat-model.adoc` §5
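
Note on Decision §2: the head-set bookkeeping can be sketched in memory as below. This is only an illustration of the intended semantics under stated assumptions, not the implementing PR's code. `HeadSet`, `append`, `append_fork`, `head_count` and `is_linear` are hypothetical names; the real design keeps heads in `verisimdb_provenance_chain_heads` and updates them inside the `BEGIN IMMEDIATE` transaction.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for `verisimdb_provenance_chain_heads`:
/// one set of head hashes per entity.
#[derive(Default)]
pub struct HeadSet {
    heads: HashMap<String, HashSet<String>>,
}

impl HeadSet {
    /// Linear append: the parent stops being a tip and the new hash
    /// takes its place. `parent` is `None` for a genesis entry.
    pub fn append(&mut self, entity: &str, parent: Option<&str>, new_hash: &str) {
        let set = self.heads.entry(entity.to_string()).or_default();
        if let Some(p) = parent {
            set.remove(p);
        }
        set.insert(new_hash.to_string());
    }

    /// Deliberate fork from a non-tip ancestor: add a head, remove none.
    pub fn append_fork(&mut self, entity: &str, new_hash: &str) {
        self.heads
            .entry(entity.to_string())
            .or_default()
            .insert(new_hash.to_string());
    }

    /// Number of live branch tips recorded for the entity.
    pub fn head_count(&self, entity: &str) -> usize {
        self.heads.get(entity).map_or(0, |s| s.len())
    }

    /// Opt-in single-writer policy check: true while at most one head exists.
    pub fn is_linear(&self, entity: &str) -> bool {
        self.head_count(entity) <= 1
    }
}
```

With this shape, genesis followed by a normal append leaves one head, and a fork from genesis leaves two. A deployment that wants the single-writer policy from the Consequences section can check `is_linear` in application code instead of relying on a schema constraint.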
diff --git a/tests/provenance_fork_test.rs b/tests/provenance_fork_test.rs
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+// Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>
+//
+// FAILING-BY-DESIGN test for the fork-impossibility defect
+// (#31 + #32, see docs/decisions/0010-provenance-forks-are-first-class.adoc).
+//
+// This test encodes the *desired* behaviour: a legitimate provenance
+// fork (two valid children of the same predecessor — e.g. two
+// network-partitioned honest writers, or a simulation branch) must be
+// representable, persisted, and detectable.
+//
+// It is EXPECTED TO FAIL on `main` today, because:
+//   * `verisimdb_provenance_chain_head` has `entity_id` as PRIMARY KEY,
+//     so an entity can only ever record ONE head — the second branch's
+//     head is silently overwritten (INSERT OR REPLACE).
+//   * there is no fork-aware append / detection surface.
+//   * if #32's `UNIQUE INDEX(entity_id, previous_hash)` were applied,
+//     the second child insert would additionally fail with a
+//     constraint violation.
+//
+// The implementing PR for #31/#32 makes this test pass (multi-head
+// table + fork-aware append + `fork_points`). Until then it documents
+// the defect in executable form.
+//
+// It compiles against the *current* public surface so CI exercises it
+// rather than ignoring it; the assertions — not the compile — are what
+// fail.
+
+use rusqlite::{params, Connection};
+use verisimiser::abi::ProvenanceEntry;
+use verisimiser::tier1::provenance::{append_provenance, init_sidecar_schema};
+
+fn open_sidecar() -> Connection {
+    let conn = Connection::open_in_memory().expect("open in-memory sidecar");
+    init_sidecar_schema(&conn).expect("init sidecar schema");
+    conn
+}
+
+/// Count chain heads recorded for an entity. Today this can only ever
+/// be 0 or 1 because `entity_id` is the PRIMARY KEY of the head table;
+/// the target design records one row per live branch tip.
+fn head_count(conn: &Connection, entity_id: &str) -> i64 {
+    conn.query_row(
+        "SELECT COUNT(*) FROM verisimdb_provenance_chain_head WHERE entity_id = ?1",
+        [entity_id],
+        |r| r.get(0),
+    )
+    .expect("count chain heads")
+}
+
+/// Number of rows in the log whose `previous_hash` is `parent` — i.e.
+/// how many children that node has. > 1 ==> a fork at `parent`.
+fn child_count(conn: &Connection, entity_id: &str, parent: &str) -> i64 {
+    conn.query_row(
+        "SELECT COUNT(*) FROM verisimdb_provenance_log \
+         WHERE entity_id = ?1 AND previous_hash = ?2",
+        params![entity_id, parent],
+        |r| r.get(0),
+    )
+    .expect("count children")
+}
+
+#[test]
+fn fork_can_be_written_and_both_branches_persist() {
+    let mut conn = open_sidecar();
+    let entity = "account:42";
+
+    // Genesis + one normal child via the supported linear path.
+    let genesis = append_provenance(
+        &mut conn, entity, "accounts", "insert", "alice", None, None,
+    )
+    .expect("genesis append");
+    let _branch_a = append_provenance(
+        &mut conn, entity, "accounts", "update", "alice", None, None,
+    )
+    .expect("branch A append");
+
+    // A second, legitimate writer (partitioned from the first) extends
+    // the chain from the SAME genesis entry, no longer the tip once
+    // branch A exists: a fork. There is no supported API for "chain
+    // from this specific ancestor" yet, so we build the entry as the
+    // target `append_provenance_fork` will and write it directly. The
+    // hash is canonical and the row is valid: honest history, not tampering.
+    let ts = chrono::Utc::now();
+    let branch_b_hash = ProvenanceEntry::compute_hash(
+        &genesis, entity, "update", "bob", &ts, None, None,
+    );
+    conn.execute(
+        "INSERT INTO verisimdb_provenance_log \
+         (hash, previous_hash, entity_id, table_name, operation, actor, \
+          timestamp, before_snapshot, transformation) \
+         VALUES (?1, ?2, ?3, 'accounts', 'update', 'bob', ?4, NULL, NULL)",
+        params![branch_b_hash, genesis, entity, ts.to_rfc3339()],
+    )
+    .expect("fork row insert (fails here once #32 unique index is added)");
+
+    // The target design also records branch B's head. Today entity_id
+    // is the head table's PK, so this plain INSERT is rejected; the
+    // error is ignored so the head_count assertion reports the defect.
+    let _ = conn.execute(
+        "INSERT INTO verisimdb_provenance_chain_head (entity_id, head_hash) \
+         VALUES (?1, ?2)",
+        params![entity, branch_b_hash],
+    );
+
+    // --- Desired-behaviour assertions (the head check FAILS on main) ---
+    // Both children of genesis must be retained: this is a true fork.
+    // (Holds on main today; #32's unique index would reject branch B.)
+    assert_eq!(
+        child_count(&conn, entity, &genesis),
+        2,
+        "genesis must have two children (branch A + branch B) — the \
+         fork must be representable, not silently collapsed",
+    );
+
+    // The entity now has two live branch tips; both must be tracked.
+    assert_eq!(
+        head_count(&conn, entity),
+        2,
+        "a forked entity must record one head per branch; today the \
+         single-row-per-entity head table cannot express this (#31)",
+    );
+}
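
Note for the follow-up test `fork_points_detects_the_divergence`: the detection logic it exercises can be sketched over in-memory rows as below. This is a sketch under assumptions, not the planned `fork_points(conn, entity_id)` implementation. `LogRow` is a hypothetical row shape, the return type is simplified to tuples rather than `ForkPoint`, and treating a genesis row as having no predecessor (`None`) is an assumption about how the log encodes it.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape: only the columns the fork query needs.
pub struct LogRow<'a> {
    pub entity_id: &'a str,
    pub hash: &'a str,
    /// Assumed to be `None` for a genesis entry.
    pub previous_hash: Option<&'a str>,
}

/// Returns every predecessor hash of `entity_id` that has more than one
/// child, paired with its child hashes in input order. This mirrors
/// grouping the log by `(entity_id, previous_hash)` and keeping only the
/// groups with more than one row.
pub fn fork_points<'a>(
    rows: &[LogRow<'a>],
    entity_id: &str,
) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut children: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for row in rows.iter().filter(|r| r.entity_id == entity_id) {
        if let Some(parent) = row.previous_hash {
            children.entry(parent).or_default().push(row.hash);
        }
    }
    children
        .into_iter()
        .filter(|(_, kids)| kids.len() > 1)
        .collect()
}
```

For the scenario in the test above (genesis with children A and B), this yields a single fork point at the genesis hash with two children, while a purely linear entity yields an empty list.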