feat(cutover-prep): reconciliation + dry-run + runbook + monitoring + T+90 mailout #53
Merged
Adds three cutover-prep scripts in apps/api/scripts/:
- reconcile.ts walks the public people sheet + private store and flags orphans (both directions), inconsistent newsletter state, and drained LegacyPasswordCredentials. --fix mode regenerates missing unsubscribe tokens and deletes drained credentials. Supersedes the narrower reconcile-private-store.ts (whose scope is fully absorbed here).
- cutover-dry-run.ts orchestrates an end-to-end rehearsal: imports a mysqldump, compares per-table row counts vs. per-sheet imported counts, and optionally smokes a staging target (10 random Persons/Projects, legacy-id redirects, SAML metadata, OAuth start, health probes).
- cutover-mailout.ts collects unclaimed Persons with valid emails and sends a single reminder via Resend. --dry-run is mandatory for CI.

Tests cover each script against fixtures: orphan flagging, newsletter repair, drained-credential cleanup, fixture row-count parsing, recipient selection, and HTML escaping in the email body.
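As a rough illustration of the two comparisons described above (orphans in both directions, and source vs. imported row counts), a minimal sketch might look like the following. The record shapes, the function names `findOrphans` and `diffRowCounts`, and the one-to-one table/sheet naming are all assumptions made for the example; the actual scripts in this PR may be structured quite differently.

```typescript
// Hypothetical record shapes: the real sheet and private-store schemas are not shown in this PR.
interface PublicPerson { id: string }
interface PrivateRecord { personId: string }

interface OrphanReport {
  publicWithoutPrivate: string[]; // in the public sheet, missing from the private store
  privateWithoutPublic: string[]; // in the private store, missing from the public sheet
}

// Flags orphans in both directions by comparing the two id sets.
function findOrphans(people: PublicPerson[], records: PrivateRecord[]): OrphanReport {
  const publicIds = new Set(people.map((p) => p.id));
  const privateIds = new Set(records.map((r) => r.personId));
  return {
    publicWithoutPrivate: Array.from(publicIds).filter((id) => !privateIds.has(id)).sort(),
    privateWithoutPublic: Array.from(privateIds).filter((id) => !publicIds.has(id)).sort(),
  };
}

interface CountMismatch { name: string; source: number; imported: number }

// Compares per-table source counts against per-sheet imported counts.
// Assumes (for illustration only) that tables and sheets share the same names.
function diffRowCounts(
  sourceCounts: Record<string, number>,
  importedCounts: Record<string, number>,
): CountMismatch[] {
  const names = new Set([...Object.keys(sourceCounts), ...Object.keys(importedCounts)]);
  return Array.from(names)
    .sort()
    .map((name) => ({
      name,
      source: sourceCounts[name] ?? 0, // a name missing on one side counts as zero rows
      imported: importedCounts[name] ?? 0,
    }))
    .filter((row) => row.source !== row.imported);
}
```

Returning sorted arrays keeps the report stable across runs, which makes fixture-based assertions simpler.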
…ck plan

Three new operational docs under docs/operations/:
- cutover.md sequences the T-7 → T+180d timeline: announce + DNS TTL drop, staging rehearsal at T-3, production import at T-1, the T-0 cutover playbook with explicit point-of-no-return marker, the monitoring window at T+1h, reconciliation at T+7d, and the T+90 / T+180 closeout tasks.
- cutover-announcement.md ships Slack + email templates for each step in the timeline, plus the maintenance-page HTML used during the DNS flip.
- cutover-rollback.md spells out when to roll back, how (the four-step DNS-back / writes-back / scale-zero / Slack-notice sequence), and the much-uglier partial-write rollback for the case where the point-of-no-return has been crossed.
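To make the timeline concrete, here is a small sketch that turns a chosen T-0 into calendar timestamps for each milestone. It assumes the unlabeled offsets (T-7, T-3, T-1, T+90, T+180) are in days, and the `MILESTONES` table and `buildSchedule` helper are hypothetical names, not something the runbook defines.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

interface Milestone { label: string; offsetMs: number; task: string }

// Offsets relative to T-0, taken from the timeline above.
// Assumption: offsets without a unit are days.
const MILESTONES: ReadonlyArray<Milestone> = [
  { label: "T-7", offsetMs: -7 * DAY_MS, task: "announce + DNS TTL drop" },
  { label: "T-3", offsetMs: -3 * DAY_MS, task: "staging rehearsal" },
  { label: "T-1", offsetMs: -1 * DAY_MS, task: "production import" },
  { label: "T-0", offsetMs: 0, task: "cutover" },
  { label: "T+1h", offsetMs: HOUR_MS, task: "monitoring window" },
  { label: "T+7d", offsetMs: 7 * DAY_MS, task: "reconciliation" },
  { label: "T+90", offsetMs: 90 * DAY_MS, task: "closeout" },
  { label: "T+180", offsetMs: 180 * DAY_MS, task: "closeout" },
];

interface ScheduledMilestone { label: string; at: string; task: string }

// Resolves each milestone to an ISO-8601 UTC timestamp for a given T-0.
function buildSchedule(t0: Date): ScheduledMilestone[] {
  return MILESTONES.map((m) => ({
    label: m.label,
    at: new Date(t0.getTime() + m.offsetMs).toISOString(),
    task: m.task,
  }));
}
```

Working in UTC milliseconds sidesteps daylight-saving shifts; a real schedule would likely also want local-time rendering for the announcements.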
Used by tests/snapshot-workflow.test.ts to validate the snapshot workflow YAML. Pulling it in explicitly so the test doesn't rely on yaml being a transitive dep that could disappear in a future bump. Command: npm install --save-dev -w apps/api yaml
- .github/workflows/snapshot.yml runs apps/api/scripts/scrub-data.ts every Sunday at 03:00 UTC, force-pushes the anonymized result to codeforphilly-data-snapshot, and tags the run with snapshot-<year>-q<n>-scrubbed. Manual override via workflow_dispatch with an optional seed input. This closes out the "how it gets invoked in CI" piece that public-snapshot-scrub deferred.
- apps/api/tests/snapshot-workflow.test.ts validates the workflow's YAML parses, the schedule cron is present, the scrub-data script is invoked, and the action versions match the rest of CI.
- docs/operations/monitoring.md documents the four monitoring signals to wire pre-cutover (UptimeRobot liveness + readiness, log webhook to #alerts, push-daemon error pings) and what we deliberately don't monitor at v1.
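The tag format and schedule above can be sketched as follows. This assumes `q<n>` means the calendar quarter of the run date in UTC, and that the cron line is the standard five-field expression for Sunday 03:00; neither the workflow's exact cron string nor how it derives the tag is shown here, so `SNAPSHOT_CRON` and `snapshotTag` are illustrative names only.

```typescript
// Five-field cron: minute hour day-of-month month day-of-week (0 = Sunday).
// GitHub Actions evaluates schedules in UTC, so this reads "Sundays at 03:00 UTC".
// Assumption: the real workflow may spell the expression differently.
const SNAPSHOT_CRON = "0 3 * * 0";

// Builds a tag of the form snapshot-<year>-q<n>-scrubbed.
// Assumption: <n> is the calendar quarter (1-4) of the run date in UTC.
function snapshotTag(runDate: Date): string {
  const year = runDate.getUTCFullYear();
  const quarter = Math.floor(runDate.getUTCMonth() / 3) + 1; // getUTCMonth() is 0-based
  return `snapshot-${year}-q${quarter}-scrubbed`;
}
```

Under this reading, every weekly run within a quarter produces the same tag name.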
…ntion

- The script is now apps/api/scripts/reconcile.ts (broader scope absorbed from reconcile-private-store.ts by cutover-prep).
- Prometheus is deferred; replace with the actual production alert path (Pino warn+ → log webhook → #alerts Slack), pointing at the new docs/operations/monitoring.md.
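The "Pino warn+ → log webhook → #alerts" path can be pictured as a filter over JSON log lines. The sketch below relies on Pino's default numeric levels (warn is 40) and on the `{ text }` body that Slack incoming webhooks accept; the `toAlertPayload` function and the message format are assumptions for illustration, and the actual forwarding mechanism is not described in this PR.

```typescript
// Pino's default numeric levels: info = 30, warn = 40, error = 50, fatal = 60.
const PINO_WARN = 40;

// Minimal Slack incoming-webhook body.
interface SlackPayload { text: string }

// Turns one newline-delimited JSON log line into an alert payload,
// or returns null when the line is unparseable or below warn.
// Hypothetical helper: the real log webhook may filter and format differently.
function toAlertPayload(logLine: string): SlackPayload | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(logLine);
  } catch {
    return null; // not JSON, nothing to forward
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const entry = parsed as { level?: unknown; msg?: unknown };
  if (typeof entry.level !== "number" || entry.level < PINO_WARN) return null;

  const msg = typeof entry.msg === "string" ? entry.msg : "(no message)";
  return { text: `[level ${entry.level}] ${msg}` };
}
```

Keeping the filter as a pure function means the threshold can be tested without a network call.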
9 tasks
Summary
Final plan in the modernization DAG — ships the playbook + tooling for the cutover event itself. Closes plans/cutover-prep.md.
- apps/api/scripts/reconcile.ts (supersedes the narrower reconcile-private-store.ts), cutover-dry-run.ts (end-to-end importer + count diff + smoke checks), cutover-mailout.ts (T+90 Resend bulk send, --dry-run mandatory for CI)
- docs/operations/cutover.md (T-7 → T+180d runbook), cutover-announcement.md (Slack + email templates), cutover-rollback.md (rollback plan), monitoring.md (UptimeRobot + log-webhook playbook)
- .github/workflows/snapshot.yml runs scrub-data weekly, force-pushes to codeforphilly-data-snapshot, tags snapshot-<year>-q<n>-scrubbed (closes out the deferral from public-snapshot-scrub)

Test plan
- npm run lint clean
- npm run type-check clean across api/web/shared
- npm run build clean
- npm test clean (api 213 / web 30 / shared 52)
- … (apps/api/tests/reconcile.test.ts)
- … (apps/api/tests/cutover-dry-run.test.ts)
- … --dry-run mode with the recipient-selection rules (apps/api/tests/cutover-mailout.test.ts)
- … (apps/api/tests/snapshot-workflow.test.ts)

Cluster-dependent validation rolls up under a single follow-up issue chained off #36 (deploy cluster stand-up).
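For reviewers who want a feel for what the mailout tests exercise, recipient selection and HTML escaping could be sketched like this. The `Person` shape, the deliberately loose email pattern, and both function names are assumptions for the example; the script's real rules for a "valid" email are not spelled out in this PR.

```typescript
// Hypothetical shape: the real Person model is not shown in this PR.
interface Person { name: string; email: string | null; claimed: boolean }

// Deliberately loose check (something@something.tld); an assumption, not the script's rule.
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Keeps only unclaimed Persons whose email passes the simple check.
function selectRecipients(people: Person[]): Person[] {
  return people.filter(
    (p) => !p.claimed && p.email !== null && SIMPLE_EMAIL.test(p.email.trim()),
  );
}

// Escapes the five characters that are significant in HTML text and attributes.
// The ampersand is replaced first so already-produced entities are not double-escaped.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Escaping matters here because Person names are user-supplied and end up interpolated into the email body.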