feat(governance): add distributional drift detection (PSI, KS) #53
Open

Hopelynconsult wants to merge 2 commits into
Conversation
Complement to the per-point anomaly detector (#35): the anomaly detector flags individual predictions whose features fall outside historical norms; this module compares the *distribution* of recent predictions (or inputs) against a reference baseline and flags drift even when no single prediction is anomalous.

Two non-parametric tests:

- Population Stability Index over reference quantile bins. PSI < 0.1 stable, 0.1-0.25 moderate, > 0.25 severe (industry-standard rule of thumb).
- Two-sample Kolmogorov-Smirnov, with the asymptotic p-value computed from the standard Kolmogorov series so we don't pull in scipy at evaluation time.

Both run per-feature; a `DriftReport` aggregates per-feature `DriftResult`s so callers (CI gate, monitoring dashboards) decide their own aggregation policy. Designed to plug into the prediction-history JSONL emitted by the anomaly detector, so drift detection can run as a scheduled CI step over the last N days of production predictions.

- `DriftResult` / `DriftReport` dataclasses with JSON serialisation
- `detect_drift()` one-shot entrypoint covering both methods
- `write_drift_report()` for persistence alongside model cards
- 13 tests covering identical/shifted distributions, both methods, per-feature severity, edge cases (constant reference, non-finite values, empty windows), feature mismatch validation, and JSON round-trip
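To make the two tests concrete, here is a minimal sketch of canonical PSI over reference quantile bins and the scipy-free two-sample KS with the Kolmogorov-series p-value. The function names `psi()` and `ks_two_sample()` and their signatures are illustrative assumptions, not the PR's actual API:

```python
import math
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index over the reference's quantile bins (sketch)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges come from the *reference* quantiles (canonical PSI).
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the reference range
    ref_frac = np.histogram(reference, edges)[0] / reference.size
    cur_frac = np.histogram(current, edges)[0] / current.size
    # Clip to a small epsilon so empty bins don't produce log(0).
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

def ks_two_sample(reference, current, terms=100):
    """Two-sample KS statistic with an asymptotic p-value from the
    Kolmogorov series -- no scipy needed (sketch)."""
    x = np.sort(np.asarray(reference, dtype=float))
    y = np.sort(np.asarray(current, dtype=float))
    n, m = x.size, y.size
    # Supremum gap between the two empirical CDFs, evaluated on the pooled sample.
    pooled = np.concatenate([x, y])
    d = float(np.max(np.abs(np.searchsorted(x, pooled, side="right") / n
                            - np.searchsorted(y, pooled, side="right") / m)))
    lam = d * math.sqrt(n * m / (n + m))
    if lam == 0.0:
        return d, 1.0  # identical ECDFs: no evidence against the null
    # Asymptotic null distribution: p ~ 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, terms + 1))
    return d, min(max(p, 0.0), 1.0)
```

A one-sigma mean shift puts PSI well into the severe band and drives the KS p-value to effectively zero, while same-distribution windows stay in the stable band.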
Summary
`governance/drift_detector.py`: distributional drift checks that compare recent prediction/input windows against a reference baseline. Per-feature `DriftResult`s are rolled up into a `DriftReport`. Designed to plug into the prediction-history JSONL written by the anomaly detector (feat(governance): add anomaly detection for inference outputs, #35).

Why this is distinct from the anomaly detector
The anomaly detector (#35) flags individual predictions whose features sit outside historical norms — point anomalies. A model can have zero point anomalies and still be silently drifting if the distribution of its predictions has shifted (e.g., post-monsoon Sentinel-2 statistics on a deforestation model). PSI/KS catch that distributional shift over a window.
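A toy numeric sketch of that failure mode (synthetic numbers invented here, not output from the PR's detector): every recent value sits within three sigma of the baseline, so a per-point z-score check stays silent, yet the window mean has clearly moved.

```python
import numpy as np

rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, 5000)   # reference prediction statistics
recent = rng.normal(1.2, 0.4, 500)      # shifted, narrower recent window

mu, sigma = baseline.mean(), baseline.std()
z = np.abs((recent - mu) / sigma)
n_point_anomalies = int((z > 3.0).sum())  # almost certainly 0: no point looks unusual
mean_shift = abs(recent.mean() - mu)      # ~1.2 sigma: clear distributional drift
print(n_point_anomalies, round(mean_shift, 2))
```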
What's in the PR
- `DriftResult` (per-feature) and `DriftReport` (multi-feature) dataclasses with JSON serialisation, mirroring the `governance.calibration.CalibrationReport` style.
- `population_stability_index()`: bins on the reference's quantiles (canonical PSI), with industry-standard severity bands (`< 0.1` stable, `0.1-0.25` moderate, `>= 0.25` severe). Falls back to a single bin for a constant reference.
- `kolmogorov_smirnov()`: supremum CDF gap with an asymptotic p-value computed from the standard Kolmogorov series. Avoids pulling scipy into evaluation-time deps.
- `detect_drift()`: one-shot entrypoint that takes feature-name → values dicts for both windows and runs PSI or KS per feature.
- `write_drift_report()`: persistence alongside model cards / calibration reports.

Plugging into the existing pipeline
The anomaly detector already writes a JSONL of per-prediction features (`mean_confidence`, `std_confidence`, `positive_fraction`, `entropy`). Wiring PSI on those four features over a rolling window is a 30-line script, left as a follow-up so this PR stays focused.

Follow-ups (out of scope here)
- A scheduled job that reads `outputs/anomalies/history.jsonl`, picks a baseline window, and emits a drift report.
- A `drift_score` threshold added to `scripts/governance_ci_gate.py` so a release fails if any monitored feature is in the severe band.

Test plan
- `pytest tests/test_drift_detector.py -q` → 13 passed
- Reviewers: please confirm the severity thresholds (`PSI_STABLE=0.10`, `PSI_MODERATE=0.25`, KS significance `0.05`) are the right defaults for our use case before we wire them into the CI gate.

🤖 Generated with Claude Code