Skip to content

feat(server): central audited chokepoint for customer-data drops + CI guard (truehomie hardening)#50

Merged
mastermanas805 merged 2 commits into
masterfrom
feat/drop-chokepoint-audit-guard
Jun 6, 2026
Merged

feat(server): central audited chokepoint for customer-data drops + CI guard (truehomie hardening)#50
mastermanas805 merged 2 commits into
masterfrom
feat/drop-chokepoint-audit-guard

Conversation

@mastermanas805

Copy link
Copy Markdown
Member

What & why

Addresses the OPEN root-cause class of the truehomie-db DROP incident (2026-06-03): an active Pro customer's Postgres DB + role were dropped on the shared postgres-customers cluster by an unidentified path with NO audit_log row. The provisioner was a dumb executorDeprovisionResource dropped whatever (token, provider_resource_id, resource_type) it received and kept no record of its own.

Full enumeration + design + ranked root-cause hypotheses: docs/ci/DATA-INTEGRITY-DROP-PATH-AUDIT.md (docs repo).

Changes

  1. guardedDrop chokepoint (internal/server/drop_chokepoint.go) — the single sanctioned wrapper for a customer-data destruction. All 8 backend Deprovision dispatches in DeprovisionResource (postgres/redis/mongo/queue × shared/dedicated) now route through it. It emits, before the drop:
    • a structured event=provisioner.drop audit log line: token, provider_resource_id, resource_type, backend, request_id, caller (gRPC peer addr — the attribution that was missing in the incident). Always-on, app-layer equivalent of the cluster log_statement='ddl' trap.
    • instant_provisioner_drop_total{resource_type,backend,outcome} (eager) — an abnormal drop rate is now alertable (the incident was a burst of un-attributed drops).
  2. CI guard (internal/server/drop_guard_test.go) — AST-iterating (rule 18). Walks the provisioner source and fails the build if any DROP DATABASE/ROLE/USER SQL literal, ACL DELUSER, mongo dropDatabase, or Database(...).Drop(...) call appears outside a sanctioned deprovision function reached through guardedDrop. A new un-audited drop path cannot merge. Proven non-vacuous (TestDropGuard_FlagsUnsanctionedSite) and comment-safe (TestDropGuard_IgnoresComments); does not flag Collection.Drop (sentinel cleanup, not customer data).

Surgical / safe

Behaviour of what gets purged in the TTL reaper / team-deletion / user-DELETE flows is unchanged — the chokepoint only ADDs the audit + metric around the existing dispatch. The breaker wrapping is preserved. No proto change, no platform-DB access, no flag needed.

The larger fix for invariant (3) — a DeprovisionRequest.deletion_intent proto field + provisioner terminality enforcement (needs platform-DB read, fail-closed) — is designed + filed, not rushed, to avoid breaking the legitimate purges that drop active rows by design.

Verification

  • make gate: green (build + vet + go test ./... -short -count=1 -p 1, all packages).
  • New tests (9): chokepoint ok/error/breaker-open outcomes + metric assertions, caller attribution, and the CI guard (real-tree scan + synthetic positive + comment negative).
  • Rule 25: alert + dashboard tile + catalog row for instant_provisioner_drop_total ship in the matching infra PR.

🤖 Generated with Claude Code

… guard

Addresses the OPEN truehomie-db DROP incident root-cause class: an active
customer's DB+role were dropped by an unidentified path with NO audit trail.
The provisioner was a dumb executor — DeprovisionResource dropped whatever
token it was handed and kept no record of its own.

- guardedDrop (drop_chokepoint.go): the single sanctioned wrapper for a
  customer-data destruction. Every backend Deprovision dispatch in
  DeprovisionResource now routes through it. Emits a structured
  `event=provisioner.drop` audit log line (token, provider_resource_id,
  resource_type, backend, request_id, caller from gRPC peer) BEFORE the drop
  + instant_provisioner_drop_total{resource_type,backend,outcome}. This is the
  always-on, app-layer equivalent of the cluster's log_statement='ddl' trap.

- drop_guard_test.go: AST-iterating CI guard (rule 18) that FAILS the build if
  any DROP DATABASE/ROLE/USER SQL literal, ACL DELUSER, mongo dropDatabase, or
  Database().Drop() call appears outside a sanctioned deprovision function
  reached through guardedDrop. A new un-audited drop path cannot merge. Proven
  non-vacuous: a synthetic un-sanctioned DROP is flagged; a commented DROP and
  a Collection.Drop (not customer data) are not.

Behaviour of WHAT gets purged in the TTL reaper / team-deletion / user-DELETE
flows is unchanged — the chokepoint only ADDs the audit line + metric around
the existing dispatch. The larger deletion-intent proto field + terminality
enforcement (needs platform-DB read, flag-gated) is designed + filed in
docs/ci/DATA-INTEGRITY-DROP-PATH-AUDIT.md.

make gate: green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mastermanas805 mastermanas805 enabled auto-merge (squash) June 6, 2026 13:37
…002)

staticcheck QF1002: prefer errors.Is over == for error comparison; also more
correct for a potentially-wrapped circuit.ErrOpen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mastermanas805 mastermanas805 merged commit 11475eb into master Jun 6, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant