obs(orphan-db-sweep): alert + dashboard + catalog for audit-only orphan-DB sweep#68

Merged
mastermanas805 merged 1 commit into master from obs/orphan-db-sweep-candidates
Jun 8, 2026

@mastermanas805

Rule-25 observability for the two metrics introduced in worker #102 (the audit-only orphan-DB / redis-namespace sweep): instant_orphan_db_sweep_candidates_total{kind} + instant_orphan_db_sweep_candidates_current{kind}.

The sweep is flag-gated OFF (ORPHAN_DB_SWEEP_ENABLED), so the gauge stays 0 until enabled; these surface the ~25-orphan drain backlog so the candidate list can be reviewed before any destructive enablement. The destructive arm (ORPHAN_DB_SWEEP_DESTRUCTIVE_ENABLED) reclaims only via the audited provisioner DeprovisionResource chokepoint — never a raw DROP (truehomie, 2026-06-03).
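As a rough illustration of the gating described above, the two flags might be set as container environment variables. This is a sketch under assumptions: only the flag names come from this PR; the `env` placement, the string values, and the default for the destructive flag are hypothetical.

```yaml
# Hypothetical worker container env; only the two flag names come from the PR.
env:
  - name: ORPHAN_DB_SWEEP_ENABLED
    value: "false"   # sweep is off per the PR, so the candidates gauge stays 0
  - name: ORPHAN_DB_SWEEP_DESTRUCTIVE_ENABLED
    value: "false"   # assumed default; flip only after the candidate list is reviewed
```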

What's here

  • k8s/prometheus-rules.yaml: instant-worker-orphan-db-sweep group; max(...candidates_current) by (kind) > 25 for 1h → P2 (warning). Only fires once the sweep flag is on.
  • newrelic/alerts/orphan-db-sweep-backlog.json — NRQL alert (max candidates_current, faceted by kind, ABOVE 25).
  • newrelic/dashboards/instanode-reliability.json — backlog-by-kind line tile.
  • observability/METRICS-CATALOG.md — 2 catalog rows.
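
For readers who want to picture the Prometheus rule listed above, a minimal sketch could look like the following. The group name, metric name, threshold, `for` duration, and warning severity come from this PR; the alert name, annotation text, and the plain rules-file layout (rather than, say, a PrometheusRule resource) are assumptions and may differ from the committed file.

```yaml
# Sketch only: alert name, annotations, and file layout are assumptions,
# not copied from k8s/prometheus-rules.yaml.
groups:
  - name: instant-worker-orphan-db-sweep
    rules:
      - alert: OrphanDbSweepBacklogHigh   # hypothetical alert name
        # Backlog per kind; the gauge stays 0 until the sweep flag is enabled.
        expr: max(instant_orphan_db_sweep_candidates_current) by (kind) > 25
        for: 1h
        labels:
          severity: warning   # P2
        annotations:
          summary: "Orphan-DB sweep backlog above 25 for kind {{ $labels.kind }}"
```

Aggregating with `max ... by (kind)` collapses any per-instance series into one value per kind, so the alert yields at most one firing series for each candidate kind.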

Apply

infra has no auto-apply (rule 15) — operator applies prometheus-rules.yaml + imports the NR alert/dashboard. validate.yml (yamllint + kubeconform) is the gate; YAML/JSON validated locally.

Completes the rule-25 obligation for worker #102.

🤖 Generated with Claude Code

…an-DB sweep metrics

Rule-25 observability for worker #102's two metrics (instant_orphan_db_sweep_candidates_total{kind} + _current{kind}). The audit-only orphan-DB/redis-namespace sweep is flag-gated OFF, so the gauge stays 0 until enabled; these surface the ~25-orphan drain backlog so the candidate list can be reviewed before any destructive enablement (which routes only through the audited provisioner chokepoint — never a raw DROP, truehomie).

- k8s/prometheus-rules.yaml: instant-worker-orphan-db-sweep group, candidates_current > 25 for 1h -> P2.

- newrelic/alerts/orphan-db-sweep-backlog.json + dashboard tile + 2 METRICS-CATALOG rows.

infra has no auto-apply (rule 15) — operator applies prometheus-rules + imports the NR alert/dashboard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 merged commit 6a6064c into master Jun 8, 2026
3 checks passed
@mastermanas805 mastermanas805 deleted the obs/orphan-db-sweep-candidates branch June 8, 2026 18:40