feat(server): central audited chokepoint for customer-data drops + CI guard (truehomie hardening) #50
Merged
… guard
Addresses the root-cause class of the OPEN truehomie-db DROP incident: an
active customer's DB+role were dropped by an unidentified path with NO audit trail.
The provisioner was a dumb executor — DeprovisionResource dropped whatever
token it was handed and kept no record of its own.
- guardedDrop (drop_chokepoint.go): the single sanctioned wrapper for a
customer-data destruction. Every backend Deprovision dispatch in
DeprovisionResource now routes through it. Emits a structured
`event=provisioner.drop` audit log line (token, provider_resource_id,
resource_type, backend, request_id, caller from the gRPC peer) BEFORE the drop,
and records instant_provisioner_drop_total{resource_type,backend,outcome}.
This is the always-on, app-layer equivalent of the cluster's
log_statement='ddl' trap.
- drop_guard_test.go: AST-iterating CI guard (rule 18) that FAILS the build if
any DROP DATABASE/ROLE/USER SQL literal, ACL DELUSER, mongo dropDatabase, or
Database().Drop() call appears outside a sanctioned deprovision function
reached through guardedDrop. A new un-audited drop path cannot merge. Proven
non-vacuous: a synthetic un-sanctioned DROP is flagged; a commented DROP and
a Collection.Drop (not customer data) are not.
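A minimal sketch of the kind of AST walk the rule-18 guard describes. It uses only the standard library's `go/ast`, `go/parser`, and `go/token`. Assumptions: `findUnsanctionedDrops`, `Finding`, and the allowlist-by-function-name approach are hypothetical. The real guard also checks that a sanctioned function is reached through `guardedDrop`, and this sketch does not model that.

Because the sketch inspects only string literals and call expressions in the parsed AST, comments are never matched. `Collection(...).Drop(...)` is also skipped, because only a `Drop` on the result of `Database(...)` is flagged.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// Finding is one customer-data drop site outside the sanctioned set (hypothetical type).
type Finding struct {
	Func string // enclosing top-level function ("" for package-level declarations)
	Line int
	What string
}

// dropPatterns are matched against upper-cased, whitespace-normalised string literals.
var dropPatterns = []string{"DROP DATABASE", "DROP ROLE", "DROP USER", "ACL DELUSER", "DROPDATABASE"}

// dropKind reports what kind of drop a node is, or "" if it is not one.
func dropKind(n ast.Node) string {
	switch x := n.(type) {
	case *ast.BasicLit:
		if x.Kind != token.STRING {
			return ""
		}
		s, err := strconv.Unquote(x.Value)
		if err != nil {
			return ""
		}
		// Collapse runs of whitespace so "drop  role" still matches "DROP ROLE".
		norm := strings.Join(strings.Fields(strings.ToUpper(s)), " ")
		for _, p := range dropPatterns {
			if strings.Contains(norm, p) {
				return "literal " + p
			}
		}
	case *ast.CallExpr:
		// Flag X.Database(...).Drop(...), but not X.Collection(...).Drop(...).
		if sel, ok := x.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "Drop" {
			if inner, ok := sel.X.(*ast.CallExpr); ok {
				if isel, ok := inner.Fun.(*ast.SelectorExpr); ok && isel.Sel.Name == "Database" {
					return "call Database(...).Drop(...)"
				}
			}
		}
	}
	return ""
}

// findUnsanctionedDrops parses one Go source file and returns every drop site
// whose enclosing top-level function is not in sanctioned.
func findUnsanctionedDrops(filename, src string, sanctioned map[string]bool) ([]Finding, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0) // comments are not parsed
	if err != nil {
		return nil, err
	}
	var out []Finding
	for _, decl := range file.Decls {
		fn := ""
		if fd, ok := decl.(*ast.FuncDecl); ok {
			fn = fd.Name.Name
		}
		ast.Inspect(decl, func(n ast.Node) bool {
			if what := dropKind(n); what != "" && !sanctioned[fn] {
				out = append(out, Finding{Func: fn, Line: fset.Position(n.Pos()).Line, What: what})
			}
			return true
		})
	}
	return out, nil
}
```

In a `_test.go` file, you would run this over each provisioner source file and report every `Finding` with `t.Errorf`, so the build fails when one appears.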
Behaviour of WHAT gets purged in the TTL reaper / team-deletion / user-DELETE
flows is unchanged — the chokepoint only ADDs the audit line + metric around
the existing dispatch. The larger deletion-intent proto field + terminality
enforcement (needs platform-DB read, flag-gated) is designed + filed in
docs/ci/DATA-INTEGRITY-DROP-PATH-AUDIT.md.
make gate: green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…002) staticcheck QF1002: prefer errors.Is over == for error comparison; also more correct for a potentially wrapped circuit.ErrOpen.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
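Why `errors.Is` matters for a wrapped sentinel: once any layer wraps the breaker error with `fmt.Errorf("...: %w", err)`, an `==` comparison silently stops matching. The illustration below is minimal. `ErrOpen` is a local stand-in for the repo's `circuit.ErrOpen`, and `callThroughBreaker` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOpen is a local stand-in for circuit.ErrOpen (hypothetical here).
var ErrOpen = errors.New("circuit breaker is open")

// callThroughBreaker simulates a layer that adds context by wrapping the sentinel with %w.
func callThroughBreaker() error {
	return fmt.Errorf("deprovision postgres/shared: %w", ErrOpen)
}

// isBreakerOpenEq is the fragile form: == matches only the exact sentinel value.
func isBreakerOpenEq(err error) bool { return err == ErrOpen }

// isBreakerOpen walks the %w chain, so a wrapped ErrOpen still matches.
func isBreakerOpen(err error) bool { return errors.Is(err, ErrOpen) }
```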
What & why
Addresses the OPEN root-cause class of the truehomie-db DROP incident (2026-06-03): an active Pro customer's Postgres DB + role were dropped on the shared `postgres-customers` cluster by an unidentified path with NO `audit_log` row. The provisioner was a dumb executor: `DeprovisionResource` dropped whatever `(token, provider_resource_id, resource_type)` it received and kept no record of its own. Full enumeration + design + ranked root-cause hypotheses: `docs/ci/DATA-INTEGRITY-DROP-PATH-AUDIT.md` (docs repo).
Changes
- `guardedDrop` chokepoint (`internal/server/drop_chokepoint.go`): the single sanctioned wrapper for a customer-data destruction. All 8 backend `Deprovision` dispatches in `DeprovisionResource` (postgres/redis/mongo/queue × shared/dedicated) now route through it. It emits, before the drop:
  - an `event=provisioner.drop` audit log line with `token`, `provider_resource_id`, `resource_type`, `backend`, `request_id`, and `caller` (the gRPC peer address, which is the attribution that was missing in the incident). This is the always-on, app-layer equivalent of the cluster `log_statement='ddl'` trap.
  - `instant_provisioner_drop_total{resource_type,backend,outcome}` (eager): an abnormal drop rate is now alertable (the incident was a burst of un-attributed drops).
- CI guard (`internal/server/drop_guard_test.go`): AST-iterating (rule 18). It walks the provisioner source and fails the build if any `DROP DATABASE/ROLE/USER` SQL literal, `ACL DELUSER`, mongo `dropDatabase`, or `Database(...).Drop(...)` call appears outside a sanctioned deprovision function reached through `guardedDrop`. A new un-audited drop path cannot merge. The guard is proven non-vacuous (`TestDropGuard_FlagsUnsanctionedSite`) and comment-safe (`TestDropGuard_IgnoresComments`). It does not flag `Collection.Drop` (sentinel cleanup, not customer data).
Surgical / safe
Behaviour of what gets purged in the TTL reaper / team-deletion / user-DELETE flows is unchanged — the chokepoint only ADDs the audit + metric around the existing dispatch. The breaker wrapping is preserved. No proto change, no platform-DB access, no flag needed.
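This is a minimal sketch of the wrapper's shape under stated assumptions. `DropTarget`, `dropCounter`, and this `guardedDrop` signature are illustrative, not the code in `drop_chokepoint.go`. The existing dispatch, including its breaker wrapping, is passed in unchanged as a closure, so the wrapper can only add the audit line and the metric around it.

Several other details are also assumptions:
- The audit sink takes the signature of `(*slog.Logger).Info`, so a real logger method can be passed in.
- `caller` is passed in as a string here, while the PR derives it from the gRPC peer address.
- A stdlib map stands in for the Prometheus counter.
- "Eager" is read as pre-initialising every label series at zero.
- The outcome label values `ok` and `error` are hypothetical.

```go
package main

import "sync"

// DropTarget names one customer-data destruction (hypothetical type; the
// fields mirror the audit fields listed in this PR).
type DropTarget struct {
	Token              string
	ProviderResourceID string
	ResourceType       string // postgres, redis, mongo, queue
	Backend            string // shared, dedicated
}

// dropCounter is a stdlib stand-in for a counter vector labelled
// {resource_type, backend, outcome}.
type dropCounter struct {
	mu     sync.Mutex
	counts map[[3]string]int
}

// newDropCounter pre-populates every label combination at zero, so each series
// exists before the first drop and a rate alert has data to evaluate.
func newDropCounter(resourceTypes, backends []string) *dropCounter {
	c := &dropCounter{counts: make(map[[3]string]int)}
	for _, rt := range resourceTypes {
		for _, b := range backends {
			for _, o := range []string{"ok", "error"} {
				c.counts[[3]string{rt, b, o}] = 0
			}
		}
	}
	return c
}

func (c *dropCounter) inc(rt, backend, outcome string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[3]string{rt, backend, outcome}]++
}

func (c *dropCounter) value(rt, backend, outcome string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[3]string{rt, backend, outcome}]
}

// guardedDrop writes the audit line BEFORE calling drop, runs the existing
// dispatch unchanged, records the outcome, and returns the dispatch's error.
func guardedDrop(audit func(msg string, args ...any), m *dropCounter,
	t DropTarget, requestID, caller string, drop func() error) error {
	audit("customer data drop",
		"event", "provisioner.drop",
		"token", t.Token,
		"provider_resource_id", t.ProviderResourceID,
		"resource_type", t.ResourceType,
		"backend", t.Backend,
		"request_id", requestID,
		"caller", caller,
	)
	err := drop()
	outcome := "ok"
	if err != nil {
		outcome = "error"
	}
	m.inc(t.ResourceType, t.Backend, outcome)
	return err
}
```

With a real logger, the call site would pass the method value `logger.Info` of a `*slog.Logger` as `audit`. The `drop` argument would be the existing breaker-wrapped `Deprovision` call.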
The larger fix for invariant (3), a `DeprovisionRequest.deletion_intent` proto field plus provisioner terminality enforcement (needs a platform-DB read, fail-closed), is designed and filed, not rushed, to avoid breaking the legitimate purges that drop active rows by design.
Verification
make gate: green (build + vet + `go test ./... -short -count=1 -p 1`, all packages).
… `instant_provisioner_drop_total` ship in the matching infra PR.
🤖 Generated with Claude Code