Follow-up from review of #53.
Main concern — PSI drifted flag fires on "moderate" drift.
In detect_drift (PSI branch):
drifted=psi >= PSI_STABLE, # PSI_STABLE = 0.10
Combined with _psi_severity, a PSI of e.g. 0.12 will be marked drifted=True with severity="moderate". Since report.any_drifted is what consumers will gate on (CI, alerts, etc.), this means "moderate" drift triggers alerts — which conflicts with the docstring's canonical guidance that >0.25 is the significant-drift threshold.
The existing tests pass either way because they only assert severity in {"moderate", "severe"} and any_drifted is True.
Proposal:
- Expose a configurable threshold parameter on
detect_drift, e.g. psi_drift_threshold: float = PSI_MODERATE.
- Default to
psi >= PSI_MODERATE so any_drifted only fires on significant drift, while still reporting severity for the moderate range.
- Update / add tests to lock in both branches (moderate-but-not-drifted, severe-and-drifted).
Smaller items (can fold into the same PR or split out):
- KS p-value with ties. The asymptotic Kolmogorov formula assumes continuous distributions; with heavy ties (e.g. constant-vs-constant or coarse discrete data) p-values can be biased. Add a docstring note or short comment in
kolmogorov_smirnov calling this out.
- Magic constant in KS severity.
severe = p_value < (ks_significance / 5) — promote 5 (or the resulting factor) to a named constant like KS_SEVERE_FACTOR = 5 with a brief justification, or expose it as a parameter.
- Double-sort micro-opt in KS.
kolmogorov_smirnov sorts ref and cur twice each (once via np.searchsorted(np.sort(ref), ...), once implicitly via combined). Hoist ref_sorted = np.sort(ref) / cur_sorted = np.sort(cur) once. Cosmetic only.
Follow-up from review of #53.
Main concern — PSI
driftedflag fires on "moderate" drift.In
detect_drift(PSI branch):Combined with
_psi_severity, a PSI of e.g. 0.12 will be markeddrifted=Truewithseverity="moderate". Sincereport.any_driftedis what consumers will gate on (CI, alerts, etc.), this means "moderate" drift triggers alerts — which conflicts with the docstring's canonical guidance that >0.25 is the significant-drift threshold.The existing tests pass either way because they only assert
severity in {"moderate", "severe"}andany_drifted is True.Proposal:
detect_drift, e.g.psi_drift_threshold: float = PSI_MODERATE.psi >= PSI_MODERATEsoany_driftedonly fires on significant drift, while still reportingseverityfor the moderate range.Smaller items (can fold into the same PR or split out):
kolmogorov_smirnovcalling this out.severe = p_value < (ks_significance / 5)— promote5(or the resulting factor) to a named constant likeKS_SEVERE_FACTOR = 5with a brief justification, or expose it as a parameter.kolmogorov_smirnovsortsrefandcurtwice each (once vianp.searchsorted(np.sort(ref), ...), once implicitly viacombined). Hoistref_sorted = np.sort(ref)/cur_sorted = np.sort(cur)once. Cosmetic only.