feat(governance): add anomaly detection for inference outputs #35
Merged
Hybrid detector that combines an Isolation Forest fitted on rolling prediction history with a statistical fallback (z-score + IQR fences) for the cold-start case. Persists feature history to JSONL and emits anomaly reports for human review.
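
The statistical fallback named above can be illustrated with a minimal sketch. This is not the code from anomaly_detector.py: the function name `statistical_flags`, the cut-offs (|z| > 3 and 1.5 × IQR), and the rule that either check alone is enough to flag a value are all assumptions made for illustration.

```python
from statistics import fmean, pstdev, quantiles


def statistical_flags(history, value, z_thresh=3.0, iqr_k=1.5):
    """Flag `value` against `history` using a z-score and IQR fences.

    Hypothetical helper: the name, thresholds and OR-combination are
    illustrative assumptions, not the PR's actual implementation.
    """
    if len(history) < 4:
        raise ValueError("need at least 4 historical values")

    # z-score check; a constant history (sigma == 0) cannot produce a z-score.
    mu = fmean(history)
    sigma = pstdev(history)
    z_flag = sigma > 0 and abs(value - mu) / sigma > z_thresh

    # IQR fences: anything beyond k * IQR outside the quartiles is flagged.
    q1, _, q3 = quantiles(history, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - iqr_k * iqr, q3 + iqr_k * iqr
    iqr_flag = value < lower or value > upper

    return {"z_score": z_flag, "iqr": iqr_flag, "anomalous": z_flag or iqr_flag}


history = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]
typical = statistical_flags(history, 10.0)
outlier = statistical_flags(history, 50.0)
```

With this history, `typical["anomalous"]` is `False` and `outlier["anomalous"]` is `True`. A hybrid detector would presumably call something like this only while the history is too short to fit the Isolation Forest.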
femi23
approved these changes
May 5, 2026
Collaborator
femi23
left a comment
Hybrid IF + statistical fallback with rolling-history persistence is the right shape — this fills the gap I left in #41 (alert_generator) where I wanted a smarter signal than 'value > threshold'. The feature vector (mean / std / positive_fraction / entropy) is small enough that fitting IF on 50 samples is cheap, which is what we need at our throughput.
Approving.
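
For readers unfamiliar with the four-feature vector mentioned in this review, here is a rough sketch of how such a summary could be computed from one prediction's probabilities. The function names, the 0.5 positive threshold, and the use of mean per-element binary entropy are assumptions; the PR's actual definitions may differ.

```python
import math
from statistics import fmean, pstdev


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli(p) variable."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def extract_features(probs, threshold=0.5):
    """Summarise one prediction as mean / std / positive_fraction / entropy.

    Hypothetical helper: `probs` is assumed to be a flat list of
    probabilities in [0, 1], for example a flattened segmentation mask.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    return {
        "mean": fmean(probs),
        "std": pstdev(probs),
        # Share of elements predicted positive at the assumed threshold.
        "positive_fraction": sum(p > threshold for p in probs) / len(probs),
        # Average per-element uncertainty; high when probabilities sit near 0.5.
        "entropy": fmean([binary_entropy(p) for p in probs]),
    }


uncertain = extract_features([0.5, 0.5, 0.5, 0.5])
confident = extract_features([0.0, 1.0])
```

A maximally uncertain prediction (`uncertain`) has entropy ln 2 and zero spread, while a fully confident one (`confident`) has near-zero entropy. A vector this small keeps each history row to four floats.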
Hopelynconsult
added a commit
that referenced
this pull request
May 17, 2026
Complement to the per-point anomaly detector (#35): the anomaly detector flags individual predictions whose features fall outside historical norms; this module compares the *distribution* of recent predictions (or inputs) against a reference baseline and flags drift even when no single prediction is anomalous.

Two non-parametric tests:
- Population Stability Index over reference quantile bins. PSI < 0.1 stable, 0.1-0.25 moderate, > 0.25 severe (industry-standard rule of thumb).
- Two-sample Kolmogorov-Smirnov, with the asymptotic p-value computed from the standard Kolmogorov series so we don't pull in scipy at evaluation time.

Both run per-feature; a DriftReport aggregates per-feature DriftResults so callers (CI gate, monitoring dashboards) decide their own aggregation policy.

Designed to plug into the prediction-history JSONL emitted by the anomaly detector so drift can run as a scheduled CI step over the last N days of production predictions.

- DriftResult / DriftReport dataclasses with JSON serialisation
- detect_drift() one-shot entrypoint covering both methods
- write_drift_report() for persistence alongside model cards
- 13 tests covering identical/shifted distributions, both methods, per-feature severity, edge cases (constant reference, non-finite, empty windows), feature mismatch validation, and JSON round-trip
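
As an illustration of the two tests described in this commit message, the sketch below computes PSI over reference quantile bins and a two-sample KS statistic with an asymptotic p-value from the Kolmogorov series, using only the standard library. All function names, the 10-bin default, the epsilon floor for empty bins, and the effective-sample-size scaling sqrt(nm/(n+m)) are assumptions; the module's real detect_drift() may be built differently, and this sketch does not handle non-finite values.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index over bins cut at the reference quantiles."""
    if len(reference) < 2 or not current:
        raise ValueError("need at least 2 reference values and 1 current value")
    # Interior bin edges; set() collapses duplicates from a near-constant reference.
    edges = sorted(set(quantiles(reference, n=n_bins, method="inclusive")))

    def proportions(xs):
        counts = [0] * (len(edges) + 1)
        for x in xs:
            counts[bisect_right(edges, x)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(xs), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def psi_severity(value):
    """Map a PSI value onto the rule-of-thumb bands quoted above."""
    if value < 0.1:
        return "stable"
    if value <= 0.25:
        return "moderate"
    return "severe"


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_s, cur_s = sorted(reference), sorted(current)
    n, m = len(ref_s), len(cur_s)
    return max(
        abs(bisect_right(ref_s, x) / n - bisect_right(cur_s, x) / m)
        for x in ref_s + cur_s
    )


def kolmogorov_sf(lam, max_terms=100, tol=1e-12):
    """Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2), clamped to [0, 1]."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here and Q is numerically 1
    total = 0.0
    for k in range(1, max_terms + 1):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < tol:
            break
    return min(max(total, 0.0), 1.0)


def ks_pvalue(d, n, m):
    """Asymptotic two-sample p-value for KS statistic d with sample sizes n, m."""
    return kolmogorov_sf(math.sqrt(n * m / (n + m)) * d)


reference = [float(i) for i in range(100)]
shifted = [x + 50.0 for x in reference]
d = ks_statistic(reference, shifted)
report = {
    "psi": psi(reference, shifted),
    "ks": d,
    "p_value": ks_pvalue(d, len(reference), len(shifted)),
}
```

For the shifted sample above, PSI lands in the severe band and the KS p-value is far below 0.05, while comparing `reference` with itself gives a PSI of 0 and a p-value of 1.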
Goldokpa
pushed a commit
that referenced
this pull request
May 17, 2026
Summary
- src/climatevision/governance/anomaly_detector.py: a hybrid detector that combines an Isolation Forest fitted on rolling prediction history with a statistical fallback (z-score + IQR fences) for the cold-start case, before there is enough history to fit IF.
- Persists feature history to JSONL and emits anomaly reports (write_anomaly_report) for human review of flagged predictions.
- […] governance/__init__.py.

Why
Sprint deliverable: "Create anomaly detection for inference inputs and outputs - flag unusual predictions for human review." Pairs with the upcoming /api/anomalies endpoint and the audit-logger PR.

Test plan
- pytest tests/test_anomaly_detector.py: 6/6 pass locally
- […] min_history_for_iforest

Notes for reviewers
- […] develop. No overlap with PR #29 (feat(governance): add SHAP explainability for segmentation predictions; merged) or PR #30 (feat(governance): add regional bias audit framework for model fairness; open).
- […] AnomalyDetector(contamination=...).