hack-ink · yvette-carlisle · Jun 27, 2026
diff --git a/apps/elf-eval/fixtures/real_world_memory/production_ops/authority_plane_recovery_drill.json b/apps/elf-eval/fixtures/real_world_memory/production_ops/authority_plane_recovery_drill.json
diff --git a/apps/elf-eval/src/bin/real_world_job_benchmark.rs b/apps/elf-eval/src/bin/real_world_job_benchmark.rs
diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs
diff --git a/docs/evidence/2026-06-27-authority-recovery-drill-drift-audit.md b/docs/evidence/2026-06-27-authority-recovery-drill-drift-audit.md
@@ -0,0 +1,94 @@
+---
+type: Drift Audit
+title: "Authority Recovery Drill Drift Audit"
+description: "Drift audit for production-ops authority recovery drill benchmark artifacts and reports."
+resource: docs/evidence/2026-06-27-authority-recovery-drill-drift-audit.md
+status: active
+authority: evidence
+owner: docs
+last_verified: 2026-06-27
+tags:
+  - docs
+  - evidence
+  - benchmarking
+  - production-ops
+source_refs:
+  - https://linear.app/hackink/issue/XY-1119
+code_refs:
+  - apps/elf-eval/src/bin/real_world_job_benchmark.rs
+  - apps/elf-eval/fixtures/real_world_memory/production_ops/authority_plane_recovery_drill.json
+  - docs/spec/real_world_agent_memory_benchmark_v1.md
+  - docs/runbook/benchmarking/real_world_agent_memory_benchmark.md
+related:
+  - docs/spec/real_world_agent_memory_benchmark_v1.md
+  - docs/runbook/benchmarking/real_world_agent_memory_benchmark.md
+drift_watch:
+  - apps/elf-eval/src/bin/real_world_job_benchmark.rs
+  - apps/elf-eval/fixtures/real_world_memory/production_ops/
+  - docs/spec/real_world_agent_memory_benchmark_v1.md
+---
+# Authority Recovery Drill Drift Audit
+
+Purpose: Anchor the production-ops authority recovery drill report contract to the
+runner, fixture, and documentation surfaces.
+Read this when: You need evidence for backup/PITR, idempotent outbox replay, Qdrant
+rebuild completeness, degraded read, migration repair, dead-letter handling, and
+RPO/RTO reporting in the real-world memory benchmark.
+Not this document: Live production restore proof, private-corpus quality, hosted HA,
+or multi-region failover evidence.
+
+## Watched Claims
+
+- `elf.authority_recovery_drill/v1` is a benchmark artifact under
+  `adapter_response.answer.recovery_drills[]`.
+- The runner validates drill topology, failure injections, backup/PITR restored
+  evidence, degraded-read labels with visible source-of-truth records, RPO/RTO
+  measurements that meet targets, matching authority record counts for source,
+  journal, memory, knowledge, proposal, trace, and audit planes, preserved source
+  refs and lifecycle history, idempotent outbox replay without duplicate writes,
+  Qdrant rebuild completeness without missing vectors or errors, applied migration
+  repair, and dead-letter handling.
+- Reports expose drill counters through
+  `operational_evidence.authority_recovery`, including backup/PITR-restored,
+  record-count-preservation, and predicate-gated drill pass counts.
+- The checked-in fixture is local synthetic evidence only. It does not prove private
+  corpus quality, provider-backed behavior, hosted HA, standby failover, or
+  multi-region SLA.
+
+## Evidence Anchors
+
+- `apps/elf-eval/src/bin/real_world_job_benchmark.rs` defines and validates
+  `AuthorityRecoveryDrillArtifact` and aggregates
+  `OperationalAuthorityRecoveryReport`.
+- `apps/elf-eval/fixtures/real_world_memory/production_ops/authority_plane_recovery_drill.json`
+  encodes one production-ops job with topology, degraded-read labels, RPO/RTO,
+  matching before/after authority record counts, replay, rebuild, migration repair,
+  and dead-letter evidence.
+- `docs/spec/real_world_agent_memory_benchmark_v1.md` defines the artifact schema and
+  production-ops/report semantics.
+- `docs/runbook/benchmarking/real_world_agent_memory_benchmark.md` routes operators to
+  the production-ops command and describes the authority recovery drill coverage.
+
+## Reverse Checks
+
+- Run `cargo make real-world-memory-production-ops` to parse the fixture and render
+  the production-ops report.
+- Run `cargo make check-docs` after docs changes.
+
+## Verdict
+
+pass
+
+## Required Updates
+
+- If recovery drill fields change, update the runner structs, fixture, benchmark
+  spec, runbook, and this audit together.
+- If a live Docker recovery drill is added later, preserve the fixture/local evidence
+  boundary and add separate live evidence instead of reclassifying this fixture.
+
+## Citations
+
+- `apps/elf-eval/src/bin/real_world_job_benchmark.rs`
+- `apps/elf-eval/fixtures/real_world_memory/production_ops/authority_plane_recovery_drill.json`
+- `docs/spec/real_world_agent_memory_benchmark_v1.md`
+- `docs/runbook/benchmarking/real_world_agent_memory_benchmark.md`
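
Note on the watched claim about matching authority record counts: the sketch below shows one way such a check could look. It is a minimal illustration, not the runner's code. `PlaneCount`, `AuthorityRecordCounts`, `drifted_planes`, and `preserved` are hypothetical names; only the seven plane names and the `source_refs_preserved` / `lifecycle_history_preserved` flags come from the audit and spec text in this diff.

```rust
/// Before/after record counts for one authority plane (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PlaneCount {
    pub before: u64,
    pub after: u64,
}

/// Hypothetical mirror of `authority_record_counts`; not the runner's real struct.
#[derive(Debug, Clone)]
pub struct AuthorityRecordCounts {
    pub source: PlaneCount,
    pub journal: PlaneCount,
    pub memory: PlaneCount,
    pub knowledge: PlaneCount,
    pub proposal: PlaneCount,
    pub trace: PlaneCount,
    pub audit: PlaneCount,
    pub source_refs_preserved: bool,
    pub lifecycle_history_preserved: bool,
}

impl AuthorityRecordCounts {
    /// Names of planes whose post-recovery count differs from the pre-fault count.
    pub fn drifted_planes(&self) -> Vec<&'static str> {
        let planes = [
            ("source", self.source),
            ("journal", self.journal),
            ("memory", self.memory),
            ("knowledge", self.knowledge),
            ("proposal", self.proposal),
            ("trace", self.trace),
            ("audit", self.audit),
        ];
        planes
            .iter()
            .filter(|(_, count)| count.before != count.after)
            .map(|(name, _)| *name)
            .collect()
    }

    /// True only when every plane matches and both preservation flags are set.
    pub fn preserved(&self) -> bool {
        self.drifted_planes().is_empty()
            && self.source_refs_preserved
            && self.lifecycle_history_preserved
    }
}
```

Returning the drifted plane names rather than a bare boolean is a design choice for this sketch: it lets a failure message say which plane lost or gained records.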
diff --git a/docs/evidence/index.md b/docs/evidence/index.md
@@ -27,5 +27,7 @@ Routes to: Drift audits and evidence concepts under `docs/evidence/`.
   suppression boundaries.
 - `2026-06-27-work-journal-drift-audit.md`: Drift audit for Work Journal
   source-adjacent capture, readback, redaction, and promotion-boundary behavior.
+- `2026-06-27-authority-recovery-drill-drift-audit.md`: Drift audit for
+  production-ops authority recovery drill benchmark artifacts and reports.
 - `external_memory_pattern_radar_latest.md`: Latest weekly external memory pattern
   radar summary.
diff --git a/docs/log.md b/docs/log.md
@@ -140,3 +140,7 @@ logs.
   Work Journal oracle fields, report rates, and hard-fail counters for redaction,
   rejected-option, inferred-step, journal-authority, and janitor false-promotion
   boundaries.
+- Added the XY-1119 authority recovery drill production-ops slice, defining
+  `elf.authority_recovery_drill/v1` report artifacts, validating topology, degraded
+  reads, RPO/RTO, authority record counts, idempotent outbox replay, Qdrant rebuild,
+  migration repair, and dead-letter handling, and linking the drift audit.
diff --git a/docs/runbook/benchmarking/real_world_agent_memory_benchmark.md b/docs/runbook/benchmarking/real_world_agent_memory_benchmark.md
@@ -6,7 +6,7 @@ resource: docs/runbook/benchmarking/real_world_agent_memory_benchmark.md
 status: active
 authority: procedural
 owner: runbook
-last_verified: 2026-06-23
+last_verified: 2026-06-27
 tags:
   - docs
   - runbook
@@ -192,10 +192,12 @@ including the retrieval-quality slice below. The suite currently encodes:
   source-id preservation, evidence binding, no secret leakage, and fixture-backed
   capture/integration boundary classification.
 - `production_ops`: interrupted generated backfill resume, backup/restore plus
-  cold-start readback, resource-envelope interpretation, public-proxy
-  production-private addendum readback, pinned OpenViking local embedding
-  runtime/wrong-result classification, missing private manifest `blocked`
-  classification, and provider credential boundary `blocked` classification.
+  cold-start readback, recoverable authority-plane drill evidence over source,
+  journal, memory, knowledge, proposal, trace, and audit records,
+  resource-envelope interpretation, public-proxy production-private addendum readback,
+  pinned OpenViking local embedding runtime/wrong-result classification, missing
+  private manifest `blocked` classification, and provider credential boundary
+  `blocked` classification.
 - `personalization`: scoped stable preference correction without temporary or
   cross-project preference leakage.
 - `core_archival_memory`: core block attachment, scope, provenance, stale-core
@@ -705,10 +707,24 @@ The production-ops fixtures live under
 `apps/elf-eval/fixtures/real_world_memory/production_ops/`. They encode user-job
 readback over existing public benchmark and restore evidence: interrupted backfill
 resume from checkpoint, clean-run comparison, backup/restore readback, Qdrant rebuild
-from Postgres-held vectors, cold-start search recovery, and resource-envelope
-interpretation. The P4 slice also encodes the operator-approved public-proxy
-production-private addendum and emits `elf.operational_evidence_gates/v1` so local
-fixture, public-proxy, private-corpus, and provider-backed evidence remain separate.
+from Postgres-held vectors, cold-start search recovery, recoverable authority-plane
+drills, and resource-envelope interpretation. Authority recovery drills use
+`elf.authority_recovery_drill/v1` under `adapter_response.answer.recovery_drills[]`
+to report topology, failure injection, backup/PITR, degraded-read labels, RPO/RTO
+targets and measurements, matching before/after authority record counts, idempotent
+outbox replay, Qdrant rebuild completeness, migration repair, and dead-letter
+handling. The runner fails drills whose predicates are false: backup/PITR must be
+restored, source-of-truth records must stay visible during degraded reads, RPO/RTO
+measurements must meet targets, authority counts/source refs/lifecycle history must
+be preserved, outbox replay must be idempotent without duplicate writes, Qdrant
+rebuilds must complete without missing vectors or errors, migration repair must be
+applied, and dead-letter rows must be handled. The generated
+`operational_evidence.authority_recovery` report includes backup/PITR-restored,
+record-count-preservation, and per-predicate recovery counters; a drill counts as
+passed only when its job passes and every recovery predicate holds. The P4 slice also
+encodes the operator-approved public-proxy production-private addendum and emits
+`elf.operational_evidence_gates/v1` so local fixture, public-proxy, private-corpus,
+and provider-backed evidence remain separate.
 
 The same slice deliberately keeps non-pass boundaries typed. A missing private
 production manifest is `blocked`, unavailable provider credentials are `blocked`, and
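
Note on the runbook's pass-count rule (a drill pass requires both a passing job and successful recovery predicates): the aggregation could be sketched as follows. This is an assumption-laden illustration, not `OperationalAuthorityRecoveryReport`. `DrillOutcome`, `AuthorityRecoveryCounters`, `aggregate`, and every field name are hypothetical.

```rust
/// Hypothetical per-drill summary produced after validation.
#[derive(Debug, Clone, Copy)]
pub struct DrillOutcome {
    pub job_passed: bool,
    pub backup_pitr_restored: bool,
    pub record_counts_preserved: bool,
    pub all_predicates_hold: bool,
}

/// Hypothetical counters in the spirit of `operational_evidence.authority_recovery`.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AuthorityRecoveryCounters {
    pub drill_count: usize,
    pub backup_pitr_restored_count: usize,
    pub record_count_preserved_count: usize,
    pub drill_pass_count: usize,
}

/// Fold drill outcomes into report counters.
pub fn aggregate(outcomes: &[DrillOutcome]) -> AuthorityRecoveryCounters {
    let mut counters = AuthorityRecoveryCounters::default();
    for outcome in outcomes {
        counters.drill_count += 1;
        counters.backup_pitr_restored_count += usize::from(outcome.backup_pitr_restored);
        counters.record_count_preserved_count +=
            usize::from(outcome.record_counts_preserved);
        // A drill pass is gated on BOTH the job status and the drill predicates.
        counters.drill_pass_count +=
            usize::from(outcome.job_passed && outcome.all_predicates_hold);
    }
    counters
}
```

Keeping the per-predicate counters independent of the pass counter means a report can still show, for example, that backups were restored in drills that failed for another reason.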

diff --git a/docs/spec/real_world_agent_memory_benchmark_v1.md b/docs/spec/real_world_agent_memory_benchmark_v1.md
@@ -6,15 +6,15 @@ resource: docs/spec/real_world_agent_memory_benchmark_v1.md
 status: active
 authority: normative
 owner: spec
-last_verified: 2026-06-23
+last_verified: 2026-06-27
 tags:
   - docs
   - spec
 source_refs: []
 code_refs:
   - Makefile.toml
   - apps/elf-eval/src/bin/real_world_job_benchmark.rs
-  - apps/elf-eval/fixtures/real_world_memory/
+  - apps/elf-eval/fixtures/real_world_memory/production_ops/authority_plane_recovery_drill.json
 related: []
 drift_watch:
   - docs/spec/real_world_agent_memory_benchmark_v1.md
@@ -451,6 +451,40 @@ untraced section count. Rebuild results are acceptable only when repeated output
 deterministic enough for regression comparison or every allowed variance is explicitly
 reported.
 
+### Optional `adapter_response.answer.recovery_drills`
+
+Production-ops fixtures MAY include authority recovery drill artifacts in
+`corpus.adapter_response.answer.recovery_drills[]`. These artifacts use schema
+`elf.authority_recovery_drill/v1` and are fixture/report evidence, not proof of a
+multi-region or hosted HA topology.
+
+Each recovery drill MUST include:
+
+- `drill_id`, `contract_schema`, and `generated_at`;
+- `topology` with the authority store, derived indexes, adapters, and failover
+  boundary;
+- one or more `failure_injections` with target, fault, timestamps, and evidence refs;
+- `backup_pitr` with backup reference, PITR target, `restored = true`, and evidence
+  refs;
+- `degraded_read` with unavailable derived indexes or adapters labeled separately
+  from visible source-of-truth records, and `source_of_truth_visible = true`;
+- `rpo` and `rto` targets and measured seconds with evidence refs, where measured
+  seconds are less than or equal to the target seconds;
+- `authority_record_counts` for `source`, `journal`, `memory`, `knowledge`,
+  `proposal`, `trace`, and `audit`, including matching before/after counts plus
+  `source_refs_preserved = true` and `lifecycle_history_preserved = true`;
+- `outbox_replay` with `idempotent = true`, zero duplicate writes, and evidence refs;
+- `qdrant_rebuild` with `complete = true`, zero missing vectors, zero errors, and
+  evidence refs;
+- `migration_repair` with `applied = true` and evidence refs;
+- `dead_letter` with handled count greater than or equal to dead-letter count and
+  evidence refs.
+
+A recovery drill MUST NOT claim failover unless a standby or replacement authority
+service is actually part of the topology. Qdrant and document indexes remain derived
+and rebuildable; degraded read must label unavailable derived indexes or adapters
+without hiding Postgres source-of-truth records.
+
 ### `negative_traps`
 
 Negative traps MUST be explicit so systems are tested against realistic memory failure
@@ -638,7 +672,7 @@ Suite ids are stable public names. Each suite MUST contain at least one
 | `source_library` | Preserve long-form source records and citable excerpts without silently promoting them to memory. | Capture a long document; hydrate a source_ref excerpt; preserve a social/thread source boundary. | Source ids, canonical source metadata, source_ref hydration pointers, verified excerpts, explicit no-autopromotion boundary. | answer_correctness, evidence_grounding, lifecycle_behavior, trap_avoidance. | PageIndex, ELF. |
 | `operator_debugging_ux` | Show whether a wrong or ambiguous memory result can be debugged without raw store spelunking. | Explain why a result ranked first; inspect a trace; identify which stage dropped expected evidence. | Trace bundle, retrieval trajectory, candidate metrics, viewer or CLI readback. | debuggability, evidence_grounding, workflow_helpfulness, answer_correctness. | claude-mem, qmd, agentmemory, ELF. |
 | `capture_integration` | Evaluate how accurately work observations become usable memory across agents and tools. | Capture a session decision; exclude private spans; import external agent observations. | Hook/import logs, write policy audits, excluded spans, resulting note ids. | answer_correctness, evidence_grounding, trap_avoidance, lifecycle_behavior. | agentmemory, claude-mem, memsearch, mem0. |
-| `production_ops` | Prove safe operation under backup, restore, backfill, cold start, resource, and credential boundaries. | Resume interrupted import; restore from backup; report missing private manifest as bounded caveat. | Command/report artifacts, resource envelope, checkpoint state, failure guard evidence. | lifecycle_behavior, latency_resource, uncertainty_handling, evidence_grounding. | ELF, qmd, memsearch, LangGraph. |
+| `production_ops` | Prove safe operation under backup, restore, backfill, cold start, authority recovery, resource, and credential boundaries. | Resume interrupted import; restore from backup; report missing private manifest as bounded caveat; report authority-plane degraded read and replay drills. | Command/report artifacts, resource envelope, checkpoint state, failure guard evidence, authority record counts, RPO/RTO measurements, degraded-read labels. | lifecycle_behavior, latency_resource, uncertainty_handling, evidence_grounding. | ELF, qmd, memsearch, LangGraph. |
 | `personalization` | Apply user/project preferences correctly without leaking across scopes or overfitting stale preferences. | Remember preferred response style; avoid using another project tenant's note; update a preference. | Scoped memory ids, preference versions, tenant/project/agent context, negative cross-scope traps. | personalization_fit, trap_avoidance, evidence_grounding, answer_correctness. | mem0, Letta, agentmemory, ELF. |
 | `core_archival_memory` | Verify always-loaded core memory behavior separately from archival note search and derived retrieval indexes. | Read an attached core block; enforce core block scope; detect stale core state from archival evidence; fall back to archival notes; recover a decision from core routing plus archival rationale. | Core block ids, attachment ids, read_profile/scope metadata, source_ref and audit history, archival note evidence ids, stale-core traps, and explicit no-Qdrant-core-block boundary evidence. | answer_correctness, evidence_grounding, trap_avoidance, lifecycle_behavior, workflow_helpfulness. | Letta, ELF. |
 | `context_trajectory` | Measure staged context trajectory, hierarchy selection, and recursive/context expansion without converting setup or retrieval preconditions into trajectory wins. | Explain whether a staged trajectory can be scored; identify selected hierarchy nodes; report recursive expansion paths and pruned branches. | Same-corpus expected evidence ids, matched/missing evidence ids, stage artifacts, selected hierarchy nodes, rejected siblings or decoys, expansion paths, pruned branches, comparable ELF trace/session artifacts when a comparison is claimed. | answer_correctness, evidence_grounding, trap_avoidance, debuggability, workflow_helpfulness. | OpenViking, ELF, qmd. |
@@ -690,10 +724,16 @@ Reports MUST include:
   separating `local_fixture`, `public_proxy`, `private_corpus`, and
   `provider_backed` tiers. The gates MUST report tier status, job counts, pass and
   typed non-pass counts, mean latency, cost summary, resource-envelope counts,
-  cold-start/restore/Qdrant-rebuild counts, typed blocker reasons, and explicit
-  booleans for whether private-corpus or provider-backed pass claims are allowed.
-  Local fixture and public-proxy passes MUST NOT satisfy private-corpus or
-  provider-backed proof.
+  cold-start/restore/Qdrant-rebuild counts, authority recovery drill counts (a
+  drill pass requires the job to pass and every drill predicate above to succeed),
+  topology coverage, failure-injection counts, degraded-read label counts, visible
+  source-of-truth counts, backup/PITR restored counts, RPO/RTO target and met counts,
+  authority record-count preservation counts, source-ref and lifecycle preservation
+  counts, idempotent replay counts, complete Qdrant rebuild counts, migration repair
+  counts, dead-letter handling counts, typed blocker reasons, and explicit booleans
+  for whether private-corpus or provider-backed pass claims are allowed. Local
+  fixture and public-proxy passes MUST NOT satisfy private-corpus or provider-backed
+  proof.
 - run id, runner version, corpus profile, job ids, suite ids, project adapter metadata;
 - per-job status, normalized score, hard-fail hits, evidence ids used, trap ids used;
 - per-job `answer_type`, required caveat/refusal flags, and whether an unknown answer
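
Note on the spec's per-drill MUST list: the scalar predicates (restored backup, visible source of truth, RPO/RTO within target, idempotent replay, complete Qdrant rebuild, applied migration repair, handled dead letters) can be expressed as a table of named checks. The sketch below is hypothetical throughout: `Objective`, `DrillChecks`, and `failed_predicates` are illustrative names, the struct is flattened for brevity, and authority record counts and evidence refs are left out.

```rust
/// Target and measured seconds for an RPO or RTO objective (hypothetical shape).
pub struct Objective {
    pub target_seconds: u64,
    pub measured_seconds: u64,
}

/// Flattened, hypothetical view of the scalar fields the spec requires per drill.
pub struct DrillChecks {
    pub restored: bool,
    pub source_of_truth_visible: bool,
    pub rpo: Objective,
    pub rto: Objective,
    pub outbox_idempotent: bool,
    pub outbox_duplicate_writes: u64,
    pub qdrant_complete: bool,
    pub qdrant_missing_vectors: u64,
    pub qdrant_errors: u64,
    pub migration_applied: bool,
    pub dead_letter_count: u64,
    pub dead_letter_handled: u64,
}

/// Return the names of predicates that do not hold, in spec order.
pub fn failed_predicates(d: &DrillChecks) -> Vec<&'static str> {
    let checks = [
        ("backup_pitr", d.restored),
        ("degraded_read", d.source_of_truth_visible),
        // Measured seconds must be less than or equal to the target seconds.
        ("rpo", d.rpo.measured_seconds <= d.rpo.target_seconds),
        ("rto", d.rto.measured_seconds <= d.rto.target_seconds),
        ("outbox_replay", d.outbox_idempotent && d.outbox_duplicate_writes == 0),
        (
            "qdrant_rebuild",
            d.qdrant_complete && d.qdrant_missing_vectors == 0 && d.qdrant_errors == 0,
        ),
        ("migration_repair", d.migration_applied),
        // Handled count must be greater than or equal to the dead-letter count.
        ("dead_letter", d.dead_letter_handled >= d.dead_letter_count),
    ];
    checks
        .iter()
        .filter(|(_, ok)| !*ok)
        .map(|(name, _)| *name)
        .collect()
}
```

An empty result means every listed predicate holds; a non-empty result names the failures, which could feed typed hard-fail reasons if the runner reports them that way.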