feat(governance): add distributional drift detection (PSI, KS) #53

Open

Hopelynconsult wants to merge 2 commits into develop from feature/governance-drift-detector

Conversation

@Hopelynconsult (Collaborator)

Summary

  • Adds governance/drift_detector.py — distributional drift checks that compare recent prediction/input windows against a reference baseline.
  • Two non-parametric methods: Population Stability Index (PSI) and two-sample Kolmogorov-Smirnov.
  • Per-feature DriftResult rolled up into a DriftReport. Designed to plug into the prediction-history JSONL written by the anomaly detector (feat(governance): add anomaly detection for inference outputs #35).
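
For reviewers who want the shape without opening the diff, a minimal sketch of the two dataclasses. Field names here are illustrative, not necessarily the exact ones in the module:

```python
# Illustrative shape only; field names are assumptions, not the PR's exact API.
import json
from dataclasses import asdict, dataclass, field

@dataclass
class DriftResult:
    feature: str      # feature the test ran on
    method: str       # "psi" or "ks"
    statistic: float  # PSI value or KS D-statistic
    severity: str     # "stable" | "moderate" | "severe"

@dataclass
class DriftReport:
    results: list = field(default_factory=list)  # list of DriftResult

    def to_json(self) -> str:
        # Mirrors the CalibrationReport-style JSON serialisation.
        return json.dumps({"results": [asdict(r) for r in self.results]}, indent=2)
```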

Why this is distinct from the anomaly detector

The anomaly detector (#35) flags individual predictions whose features sit outside historical norms — point anomalies. A model can have zero point anomalies and still be silently drifting if the distribution of its predictions has shifted (e.g., post-monsoon Sentinel-2 statistics on a deforestation model). PSI/KS catch that distributional shift over a window.

What's in the PR

  • DriftResult (per-feature) and DriftReport (multi-feature) dataclasses with JSON serialisation, mirroring governance.calibration.CalibrationReport style.
  • population_stability_index() — bins on the reference's quantiles (canonical PSI), industry-standard severity bands (< 0.1 stable, 0.1–0.25 moderate, >= 0.25 severe). Constant-reference fallback to a single bin.
  • kolmogorov_smirnov() — supremum CDF gap with an asymptotic p-value computed from the standard Kolmogorov series. Avoids pulling scipy into evaluation-time deps. Both tests are sketched after this list.
  • detect_drift() — one-shot entrypoint that takes feature-name → values dicts for both windows and runs PSI or KS per feature.
  • write_drift_report() — persistence alongside model cards / calibration reports.
  • 13 unit tests: identical vs shifted distributions, both methods, per-feature severity isolation, constant-reference edge case, validation (non-finite, empty, feature mismatch, unknown method), JSON round-trip.
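
A condensed, hedged sketch of the two tests as described above. It is numpy-based, and the signatures, default bin count, and epsilon smoothing are assumptions; the real implementations live in governance/drift_detector.py:

```python
# Hedged sketch; signatures, bin count, and epsilon smoothing are assumptions.
import math
import numpy as np

def population_stability_index(reference, current, bins=10, eps=1e-6):
    ref = np.asarray(reference, dtype=float)
    cur = np.asarray(current, dtype=float)
    # Canonical PSI: bin edges come from the *reference* quantiles.
    edges = np.unique(np.quantile(ref, np.linspace(0.0, 1.0, bins + 1)))
    if edges.size < 2:                      # constant reference: single bin
        edges = np.array([0.0, 0.0])
    edges[0], edges[-1] = -np.inf, np.inf   # open-ended outer bins
    ref_frac = np.histogram(ref, edges)[0] / ref.size
    cur_frac = np.histogram(cur, edges)[0] / cur.size
    ref_frac = np.clip(ref_frac, eps, None)  # smooth empty bins
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

def psi_severity(psi, stable=0.10, moderate=0.25):
    # Industry-standard bands: < 0.1 stable, 0.1-0.25 moderate, >= 0.25 severe.
    return "stable" if psi < stable else "moderate" if psi < moderate else "severe"

def kolmogorov_smirnov(reference, current, terms=100):
    ref = np.sort(np.asarray(reference, dtype=float))
    cur = np.sort(np.asarray(current, dtype=float))
    # Supremum gap between the two empirical CDFs, checked at every sample point.
    grid = np.concatenate([ref, cur])
    d = float(np.max(np.abs(
        np.searchsorted(ref, grid, side="right") / ref.size
        - np.searchsorted(cur, grid, side="right") / cur.size)))
    # Asymptotic p-value from the Kolmogorov series:
    #   Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2),
    # with lam = sqrt(n_eff) * d.
    n_eff = ref.size * cur.size / (ref.size + cur.size)
    lam = math.sqrt(n_eff) * d
    if lam < 0.05:  # series oscillates for tiny lam; p is ~1 there anyway
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return d, float(min(max(p, 0.0), 1.0))
```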

Plugging into the existing pipeline

The anomaly detector already writes a JSONL of per-prediction features (mean_confidence, std_confidence, positive_fraction, entropy). Wiring PSI on those four features over a rolling window is a 30-line script — left as a follow-up so this PR stays focused.
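
A hypothetical version of that follow-up script, for concreteness only. The history path is the one named above; detect_drift()'s keyword arguments, the window sizes, and write_drift_report()'s signature are assumptions:

```python
# Hypothetical follow-up wiring, not part of this PR.
import json
from pathlib import Path

from governance.drift_detector import detect_drift, write_drift_report

FEATURES = ("mean_confidence", "std_confidence", "positive_fraction", "entropy")

def window(rows, start, stop):
    # Pivot a slice of the JSONL history into feature-name -> values.
    sliced = rows[start:stop]
    return {f: [r[f] for r in sliced] for f in FEATURES}

history = Path("outputs/anomalies/history.jsonl")  # written by the anomaly detector (#35)
rows = [json.loads(line) for line in history.read_text().splitlines() if line.strip()]

reference = window(rows, 0, 500)     # baseline window
current = window(rows, -500, None)   # most recent window
report = detect_drift(reference, current, method="psi")  # keyword name assumed
write_drift_report(report)           # exact signature assumed
```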

Follow-ups (out of scope here)

  • A scheduled CI job that reads the last N days of outputs/anomalies/history.jsonl, picks a baseline window, and emits a drift report.
  • A drift_score threshold added to scripts/governance_ci_gate.py so a release fails if any monitored feature is in the severe band.
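
Roughly what the second follow-up could look like; the report location and JSON field names are assumptions, not the module's actual output:

```python
# Hypothetical addition to scripts/governance_ci_gate.py (follow-up, not in
# this PR); the report path and field names are assumptions.
import json
import sys
from pathlib import Path

report = json.loads(Path("outputs/drift/drift_report.json").read_text())
severe = [r["feature"] for r in report["results"] if r["severity"] == "severe"]
if severe:
    sys.exit(f"governance gate: severe drift in {', '.join(severe)}")
print("governance gate: no severe drift")
```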

Test plan

  • pytest tests/test_drift_detector.py -q → 13 passed
  • Reviewer: confirm the severity bands (PSI_STABLE=0.10, PSI_MODERATE=0.25, KS significance 0.05) are the right defaults for our use case before we wire them into the CI gate.
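
For flavour, one illustrative test in the spirit of the shifted-distribution cases (not copied from the test file; field names assumed as in the sketches above):

```python
# Illustrative only, in the spirit of tests/test_drift_detector.py.
import random

from governance.drift_detector import detect_drift

def test_shifted_distribution_is_severe():
    rng = random.Random(0)
    reference = {"mean_confidence": [rng.gauss(0.8, 0.05) for _ in range(1000)]}
    current = {"mean_confidence": [rng.gauss(0.5, 0.05) for _ in range(1000)]}
    report = detect_drift(reference, current, method="psi")
    # Severity field name assumed, as in the dataclass sketch above.
    assert report.results[0].severity == "severe"
```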

🤖 Generated with Claude Code

Complement to the per-point anomaly detector (#35): the anomaly detector
flags individual predictions whose features fall outside historical
norms; this module compares the *distribution* of recent predictions (or
inputs) against a reference baseline and flags drift even when no single
prediction is anomalous.

Two non-parametric tests:
- Population Stability Index over reference quantile bins. PSI < 0.1
  stable, 0.1-0.25 moderate, >= 0.25 severe (industry-standard rule of
  thumb).
- Two-sample Kolmogorov-Smirnov, with the asymptotic p-value computed
  from the standard Kolmogorov series so we don't pull in scipy at
  evaluation time.

Both run per-feature; a DriftReport aggregates per-feature DriftResults
so callers (CI gate, monitoring dashboards) decide their own aggregation
policy. Designed to plug into the prediction-history JSONL emitted by
the anomaly detector so drift can run as a scheduled CI step over the
last N days of production predictions.

- DriftResult / DriftReport dataclasses with JSON serialisation
- detect_drift() one-shot entrypoint covering both methods
- write_drift_report() for persistence alongside model cards
- 13 tests covering identical/shifted distributions, both methods,
  per-feature severity, edge cases (constant reference, non-finite,
  empty windows), feature mismatch validation, and JSON round-trip