feat(cutover-prep): reconciliation + dry-run + runbook + monitoring + T+90 mailout#53

Merged
themightychris merged 8 commits into main from feat/cutover-prep on May 17, 2026

@themightychris
Member

Summary

Final plan in the modernization DAG — ships the playbook + tooling for the cutover event itself. Closes plans/cutover-prep.md.

  • Scripts: apps/api/scripts/reconcile.ts (supersedes the narrower reconcile-private-store.ts), cutover-dry-run.ts (end-to-end importer + count diff + smoke checks), cutover-mailout.ts (T+90 Resend bulk send, --dry-run mandatory for CI)
  • Operational docs: docs/operations/cutover.md (T-7 → T+180d runbook), cutover-announcement.md (Slack + email templates), cutover-rollback.md (rollback plan), monitoring.md (UptimeRobot + log-webhook playbook)
  • Snapshot CI workflow: .github/workflows/snapshot.yml runs scrub-data weekly, force-pushes to codeforphilly-data-snapshot, tags snapshot-<year>-q<n>-scrubbed (closes out the deferral from public-snapshot-scrub)
  • Tests: three new test files cover orphan flagging, fixture row-counting, recipient selection, HTML escaping, and snapshot-workflow YAML validity

Test plan

  • npm run lint clean
  • npm run type-check clean across api/web/shared
  • npm run build clean
  • npm test clean (api 213 / web 30 / shared 52)
  • Reconcile script flags an orphan + fixes a missing newsletter token in a fixture (apps/api/tests/reconcile.test.ts)
  • Dry-run script runs against the laddr fixture with no errors and emits matching per-sheet count diffs (apps/api/tests/cutover-dry-run.test.ts)
  • T+90 mailout works in --dry-run mode with the recipient-selection rules (apps/api/tests/cutover-mailout.test.ts)
  • Snapshot workflow YAML is valid GHA syntax (apps/api/tests/snapshot-workflow.test.ts)
  • Dry-run runs end-to-end against staging — deferred (no staging cluster yet; see plan Notes)
  • 100 laddr URLs resolve via redirects — deferred (same)
  • SAML byte-for-byte continuity — deferred (same)
  • Rollback procedure rehearsed — deferred (same)
  • Post-cutover monitoring alarms verified — deferred (same)

Cluster-dependent validation rolls up under a single follow-up issue chained off #36 (deploy cluster stand-up).

Adds three cutover-prep scripts in apps/api/scripts/:

- reconcile.ts walks the public people sheet + private store and flags
  orphans (both directions), inconsistent newsletter state, and drained
  LegacyPasswordCredentials. --fix mode regenerates missing unsubscribe
  tokens and deletes drained credentials. Supersedes the narrower
  reconcile-private-store.ts (whose scope is fully absorbed here).
- cutover-dry-run.ts orchestrates an end-to-end rehearsal: imports a
  mysqldump, compares per-table row counts vs. per-sheet imported counts,
  and optionally smokes a staging target (10 random Persons/Projects,
  legacy-id redirects, SAML metadata, OAuth start, health probes).
- cutover-mailout.ts collects unclaimed Persons with valid emails and
  sends a single reminder via Resend. --dry-run is mandatory for CI.
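
For readers unfamiliar with the "orphans (both directions)" check in reconcile.ts, here is a minimal sketch of the idea. The record shapes (PublicPerson, PrivateProfile) and the findOrphans helper are hypothetical and not taken from the script; the real sheet and private-store schemas are not shown in this PR.

```typescript
// Hypothetical shapes: the actual people sheet and private store
// schemas are assumptions made for illustration only.
interface PublicPerson {
  id: string;
}

interface PrivateProfile {
  personId: string;
}

interface OrphanReport {
  // Public people with no matching private record.
  publicWithoutPrivate: string[];
  // Private records pointing at a person that no longer exists publicly.
  privateWithoutPublic: string[];
}

// Compare the two id sets in both directions and return sorted id lists.
function findOrphans(
  people: PublicPerson[],
  profiles: PrivateProfile[],
): OrphanReport {
  const publicIds = new Set(people.map((p) => p.id));
  const privateIds = new Set(profiles.map((p) => p.personId));
  return {
    publicWithoutPrivate: Array.from(publicIds)
      .filter((id) => !privateIds.has(id))
      .sort(),
    privateWithoutPublic: Array.from(privateIds)
      .filter((id) => !publicIds.has(id))
      .sort(),
  };
}

const report = findOrphans(
  [{ id: "a" }, { id: "b" }],
  [{ personId: "b" }, { personId: "c" }],
);
console.log(report);
```

Whether a public person without a private record counts as an error in the real script is an assumption here; the sketch only shows the two-way set difference.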

Tests cover each script against fixtures: orphan flagging, newsletter
repair, drained-credential cleanup, fixture row-count parsing, recipient
selection, and HTML escaping in the email body.
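
The recipient selection and HTML escaping covered by the mailout tests can be sketched roughly as follows. The Person shape, the email regex, and the helper names (selectRecipients, escapeHtml, renderGreeting) are assumptions for illustration; the actual rules in cutover-mailout.ts may differ, and no Resend call is shown.

```typescript
// Hypothetical Person shape; the real model is not shown in this PR.
interface Person {
  name: string;
  email: string | null;
  claimed: boolean;
}

// Deliberately loose syntax check (assumption): one "@", no whitespace,
// and a dot somewhere in the domain part.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Keep only unclaimed people whose email passes the syntax check.
function selectRecipients(people: Person[]): Person[] {
  return people.filter(
    (p) => !p.claimed && p.email !== null && EMAIL_RE.test(p.email),
  );
}

// Escape the five characters that matter in HTML text and attributes.
// "&" is replaced first so already-escaped output is not double-handled.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// User-controlled values are escaped before interpolation into the body.
function renderGreeting(p: Person): string {
  return `<p>Hi ${escapeHtml(p.name)},</p>`;
}

const recipients = selectRecipients([
  { name: "Ada <admin>", email: "ada@example.org", claimed: false },
  { name: "Bob", email: "bob@example.org", claimed: true },
  { name: "Cy", email: "not-an-email", claimed: false },
  { name: "Di", email: null, claimed: false },
]);
console.log(recipients.map(renderGreeting));
```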
…ck plan

Three new operational docs under docs/operations/:

- cutover.md sequences the T-7 → T+180d timeline: announce + DNS TTL
  drop, staging rehearsal at T-3, production import at T-1, the T-0
  cutover playbook with explicit point-of-no-return marker, the
  monitoring window at T+1h, reconciliation at T+7d, and the T+90 /
  T+180 closeout tasks.
- cutover-announcement.md ships Slack + email templates for each
  step in the timeline, plus the maintenance-page HTML used during
  the DNS flip.
- cutover-rollback.md spells out when to roll back, how (the four-step
  DNS-back / writes-back / scale-zero / Slack-notice sequence), and
  the much-uglier partial-write rollback for the case where the
  point-of-no-return has been crossed.

The yaml package is used by tests/snapshot-workflow.test.ts to validate the
snapshot workflow YAML. It is pulled in explicitly so the test doesn't rely on
yaml being a transitive dep that could disappear in a future bump.

Command: npm install --save-dev -w apps/api yaml

- .github/workflows/snapshot.yml runs apps/api/scripts/scrub-data.ts
  every Sunday at 03:00 UTC, force-pushes the anonymized result to
  codeforphilly-data-snapshot, and tags the run with
  snapshot-<year>-q<n>-scrubbed. Manual override via workflow_dispatch
  with an optional seed input. This closes out the "how it gets
  invoked in CI" piece that public-snapshot-scrub deferred.
- apps/api/tests/snapshot-workflow.test.ts validates the workflow's
  YAML parses, the schedule cron is present, the scrub-data script is
  invoked, and the action versions match the rest of CI.
- docs/operations/monitoring.md documents the four monitoring signals
  to wire pre-cutover (UptimeRobot liveness + readiness, log webhook
  to #alerts, push-daemon error pings) and what we deliberately don't
  monitor at v1.
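
To illustrate the snapshot-<year>-q<n>-scrubbed tag naming used by the snapshot workflow above, here is a small sketch of how such a tag could be derived from a run date. The snapshotTag helper is hypothetical, and computing the quarter from the UTC calendar month is an assumption; the workflow's actual derivation is not shown in this PR.

```typescript
// Hypothetical helper: derive the tag name from the run date.
// Assumes calendar quarters in UTC (Jan-Mar = q1, ..., Oct-Dec = q4).
function snapshotTag(date: Date): string {
  const year = date.getUTCFullYear();
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1;
  return `snapshot-${year}-q${quarter}-scrubbed`;
}

console.log(snapshotTag(new Date(Date.UTC(2026, 4, 17)))); // snapshot-2026-q2-scrubbed
```

Under this sketch, every weekly run within the same quarter yields the same tag name.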
…ntion

- The script is now apps/api/scripts/reconcile.ts (broader scope absorbed
  from reconcile-private-store.ts by cutover-prep).
- Prometheus is deferred — replace with the actual production alert
  path (Pino warn+ → log webhook → #alerts Slack), pointing at the new
  docs/operations/monitoring.md.
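
As a rough illustration of the "Pino warn+ → log webhook → #alerts" path, the filtering step might look like the sketch below. It relies on Pino's default numeric levels (warn = 40, error = 50, fatal = 60) and assumes the webhook accepts a Slack-style { text } payload; the toAlert helper and the record shape are hypothetical and not taken from the codebase.

```typescript
// Pino's default numeric levels: trace 10, debug 20, info 30,
// warn 40, error 50, fatal 60.
const WARN_LEVEL = 40;

// Assumed payload shape: a Slack-style incoming-webhook message.
interface AlertPayload {
  text: string;
}

// Turn one newline-delimited JSON log line into an alert payload,
// or return null when the line is malformed or below warn.
function toAlert(line: string): AlertPayload | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null; // not JSON: ignore rather than alert
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const rec = parsed as { level?: unknown; msg?: unknown };
  if (typeof rec.level !== "number" || rec.level < WARN_LEVEL) return null;

  const label = rec.level >= 60 ? "fatal" : rec.level >= 50 ? "error" : "warn";
  const msg = typeof rec.msg === "string" ? rec.msg : "(no message)";
  return { text: `[${label}] ${msg}` };
}

console.log(toAlert('{"level":50,"msg":"push-daemon failed","time":1}'));
console.log(toAlert('{"level":30,"msg":"request completed","time":2}'));
```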

themightychris merged commit 1de6998 into main on May 17, 2026
1 check passed
themightychris deleted the feat/cutover-prep branch on May 17, 2026 03:35