
feat(dropguard): name-convention guard on every customer-data drop + migration-safety CI gate (truehomie D3)#56

Merged
mastermanas805 merged 5 commits into master from fix/d3-drop-path-audit-gate
Jun 10, 2026



@mastermanas805


What

Task D3 (WAVE2-HANDOFF-2026-06-11) — layer 2 of the truehomie-db DROP hardening. Layer 1 (PR #50, guardedDrop) made every sanctioned drop auditable; this PR makes a mis-targeted drop unexecutable, and adds a migration-safety CI gate.

  1. internal/dropguard — validates the target identity of every customer-data destruction: naming token (chokepoint) + final db/user identifiers (SQL sites). Refuses empty/garbage tokens, SQL metacharacters, system databases (postgres, template0/1, instant_customers, instant_platform, mongo admin/local/config) and admin roles (instanode_admin, instant_cust, doadmin, redis default). Deliberately permissive on token shape (uuid, dashless hex, pool-*, e2e cohort tokens all pass) so no legitimate legacy deprovision wedges — this repo auto-deploys.
  2. Guard wiring (refusal = provisioner.drop.refused structured log + instant_provisioner_drop_total{outcome="refused"} + error to caller):
    • server.guardedDrop — poolident-resolved naming token validated BEFORE backend dispatch (covers postgres/redis/mongo/queue, shared + dedicated).
    • postgres local.Deprovision, cleanupProvisionPartial (rollback path that does NOT pass the chokepoint), dedicated.deprovisionLocal — FINAL constructed dbName/username validated right before DROP DATABASE/DROP USER.
    • mongo Deprovision — token guard before connect; per-candidate name guard (refused canonical = hard error, refused legacy candidate = skip).
    • redis local + dedicated — ACL DELUSER of any non-tenant-shaped username skipped (DELUSER default would brick the shared pod).
  3. Migration-safety CI gate: scripts/check-migration-safety.sh (+ ci.yml step, copy-able to other repos): destructive DDL (DROP TABLE/DATABASE/SCHEMA/COLUMN/ROLE/USER, TRUNCATE, DELETE FROM without WHERE) in migrations/*.sql fails CI unless the file carries -- destructive: acknowledged <reason>. 6-fixture self-test runs first in CI.

Behavior change

NONE for legitimate flows — TestGuardedDrop_LegitimateForms_StillExecute asserts every real token shape still reaches the backend. Only invalid/system-targeted drops now fail (they previously executed).

Rule-17 coverage block

Symptom:        unidentified, non-audited DROP DATABASE/ROLE on postgres-customers (truehomie 2026-06-03, root cause OPEN)
Enumeration:    rg -in 'DROP DATABASE|DROP USER|DROP ROLE|DROP OWNED|dropdb|DropDatabase|dropUser|FLUSHDB|FLUSHALL|\.Drop\(|DELUSER' across provisioner + (read-only) api/worker/common @ origin/master
Sites found:    11 destructive sites in provisioner (pg local x2 incl. rollback, pg dedicated, pg k8s ns-delete, neon API, mongo db+user, redis DELUSER local+dedicated, redis SCAN+DEL, queue/storage ns-delete) + 2 in api (providers/db, providers/nosql — dev-only direct path, read-only repo, flagged in report)
Sites touched:  8 (all SQL/command drop sites in provisioner) + chokepoint covers the rest (k8s namespace deletes derive from the validated token); api sites NOT touched (read-only repo — escalated in task report)
Coverage test:  internal/server/drop_guard_test.go (AST guard, pre-existing) + dropguard 100% unit coverage + per-site refusal tests in postgres/mongo/redis/server packages
Live verified:  awaiting auto-deploy on merge; `/healthz` SHA check + `provisioner.drop` log line on next TTL reap will confirm (no destructive prod action taken)

Test evidence

  • make gate — green (build + vet + go test ./... -short -count=1)
  • golangci-lint run on changed packages — 0 issues
  • internal/dropguard — 100.0% statement coverage; all changed lines in other packages covered (verified via go tool cover against the diff; remaining uncovered ranges are pre-existing branches outside this diff)
  • mongo loop-branch tests verified against a live local mongo:7; redis tests run without a live server (unreachable-client + nil-client seams)
  • scripts/check-migration-safety.sh --self-test — 6/6 fixtures
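
The gate itself is a shell script. As an illustration of the rule it enforces, the following Go sketch re-implements the check under stated assumptions: the regular expressions, the comment stripping, the requirement that the acknowledgement carries a non-empty reason, and the function names are all hypothetical. The fixtures in main are illustrative and are not the six fixtures shipped in the repo.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	dropRe     = regexp.MustCompile(`(?i)\bDROP\s+(TABLE|DATABASE|SCHEMA|COLUMN|ROLE|USER)\b`)
	truncateRe = regexp.MustCompile(`(?i)\bTRUNCATE\b`)
	deleteRe   = regexp.MustCompile(`(?i)\bDELETE\s+FROM\b`)
	whereRe    = regexp.MustCompile(`(?i)\bWHERE\b`)
	// The acknowledgement must sit on its own comment line and carry a reason.
	ackRe = regexp.MustCompile(`(?im)^[ \t]*--[ \t]*destructive:[ \t]*acknowledged[ \t]+\S`)
)

// stripLineComments drops "-- ..." comments so commented-out DDL is not flagged.
// Heuristic only: a "--" inside a string literal would also be cut.
func stripLineComments(sql string) string {
	var b strings.Builder
	for _, line := range strings.Split(sql, "\n") {
		if i := strings.Index(line, "--"); i >= 0 {
			line = line[:i]
		}
		b.WriteString(line)
		b.WriteByte('\n')
	}
	return b.String()
}

// destructiveStatements returns the statements that match the destructive rule.
func destructiveStatements(sql string) []string {
	var found []string
	for _, stmt := range strings.Split(stripLineComments(sql), ";") {
		s := strings.TrimSpace(stmt)
		switch {
		case s == "":
			// nothing to check
		case dropRe.MatchString(s), truncateRe.MatchString(s):
			found = append(found, s)
		case deleteRe.MatchString(s) && !whereRe.MatchString(s):
			found = append(found, s)
		}
	}
	return found
}

// checkMigration fails when destructive DDL appears without an acknowledgement.
func checkMigration(sql string) error {
	if ackRe.MatchString(sql) {
		return nil
	}
	if bad := destructiveStatements(sql); len(bad) > 0 {
		return fmt.Errorf("unacknowledged destructive DDL: %q", bad[0])
	}
	return nil
}

func main() {
	fixtures := []string{
		"CREATE TABLE t (id int);",
		"DROP TABLE t;",
		"-- destructive: acknowledged retiring legacy table\nDROP TABLE t;",
		"DELETE FROM t WHERE id = 1;",
		"DELETE FROM t;",
	}
	for _, f := range fixtures {
		fmt.Printf("%q -> %v\n", f, checkMigration(f))
	}
}
```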

🤖 Generated with Claude Code

…migration-safety CI gate (truehomie D3)

Layer 2 of the truehomie-db DROP hardening (layer 1 = the guardedDrop audit
chokepoint, PR #50). The chokepoint made every sanctioned drop AUDITABLE;
this makes a MIS-TARGETED drop UNEXECUTABLE:

- internal/dropguard: charset+denylist validation of naming tokens and final
  db/user identifiers. Refuses empty/garbage tokens, SQL metacharacters,
  system databases (postgres, template0/1, instant_customers,
  instant_platform, mongo admin/local/config) and admin roles
  (instanode_admin, instant_cust, doadmin, postgres, redis "default").
  Deliberately permissive on token SHAPE (uuid, dashless hex, pool-*, e2e
  cohorts all pass) so no legitimate legacy deprovision can wedge — this
  repo auto-deploys.
- server.guardedDrop: validates the poolident-resolved naming token BEFORE
  dispatch; refusal = error + `provisioner.drop.refused` log event +
  instant_provisioner_drop_total{outcome="refused"}.
- postgres local Deprovision + cleanupProvisionPartial (non-chokepoint
  rollback path), dedicated deprovisionLocal: validate the FINAL constructed
  dbName/username via validateDropTargets right before DROP.
- mongo Deprovision: token guard before connect + per-candidate name guard
  (refused canonical = hard error; refused legacy candidate = skip).
- redis local/dedicated DELUSER: skip any non-tenant-shaped username
  (ACL DELUSER "default" would brick the shared pod).
- scripts/check-migration-safety.sh + ci.yml step: destructive DDL
  (DROP TABLE/DATABASE/SCHEMA/COLUMN/ROLE/USER, TRUNCATE, DELETE FROM
  without WHERE) in migrations/*.sql fails CI unless the file carries
  `-- destructive: acknowledged <reason>`. Self-test fixture suite included.

No behavior change for legitimate deprovision flows (regression tests assert
uuid/dashless/pool/e2e token shapes still reach the backend). New refusal
branches are unit-tested in every package; dropguard itself is 100% covered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
mastermanas805 enabled auto-merge (squash) June 10, 2026 19:28
Manas Srivastava and others added 4 commits June 11, 2026 01:04
…atch

pool.deprovisionBacking is the ONE customer-infra drop dispatch that does not
pass through server.guardedDrop (no gRPC request). Give it the same contract:
dropguard naming-token validation (refusal = provisioner.drop.refused) and
the `provisioner.drop` audit event emitted BEFORE the backend executes, with
caller="pool_reaper" and the proto-style resource_type strings so one NR
query covers RPC drops and pool-reap drops alike. Without this, every pool
reap of a failed postgres item would page the new customer-db-destructive-ddl
NR alert as an unsanctioned DROP (and was itself an un-audited drop path).

Test fixture fix: fakeRows.Scan filled every string dest with "postgres",
which dropguard now (correctly) refuses as a pool naming token — fill
positionally so pool_token carries a valid pool-token shape.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…not postgres's 63

The live-cluster CI suite provisions tokens like tok<nano>_<TestName>
(>63 bytes with the usr_ prefix) — postgres truncates such identifiers
CONSISTENTLY on CREATE and DROP, so they round-trip fine in reality, and the
63-byte refusal wedged their Deprovision (4 coverage-job test failures).
Keep the cap purely as an absurdity bound at 128. Verified the four failing
live tests pass against a real postgres 16.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…v/provisioner into fix/d3-drop-path-audit-gate
mastermanas805 merged commit 33f41a2 into master Jun 10, 2026
12 checks passed