feat: anomaly detection engine — rolling baseline alerts for cost, tokens, errors by vivekchand · Pull Request #370 · vivekchand/clawmetry

vivekchand · 2026-03-26T17:31:34Z

Summary

Implements a proper rolling-baseline anomaly detection engine with 4 signal types, configurable thresholds, and a live dashboard panel.

What's new

Backend: `_get_anomaly_status()`

- Computes a 7-day rolling hourly baseline for each signal (not just daily)
- 4 signals with sensible default thresholds:
  - `cost_per_hour`: 2x threshold
  - `token_velocity`: 3x threshold
  - `error_rate`: 2.5x threshold
  - `latency_p95`: 2.5x threshold (skipped if insufficient data)
- Severity: `ok` / `warning` / `high`
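
For reviewers, the per-signal check can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the use of a plain mean as the baseline, the `MIN_SAMPLES` cutoff, the `HIGH_FACTOR` rule for `high`, and the function name `classify` are all assumptions made for illustration. Only the signal names, the threshold multipliers, and the three severity levels come from the description above.

```python
from statistics import mean

# Default multipliers as listed in the PR description.
THRESHOLDS = {
    "cost_per_hour": 2.0,
    "token_velocity": 3.0,
    "error_rate": 2.5,
    "latency_p95": 2.5,
}

MIN_SAMPLES = 24   # assumption: fewest hourly samples needed for a baseline
HIGH_FACTOR = 2.0  # assumption: "high" starts at 2x the warning threshold


def classify(signal, current, hourly_history):
    """Compare the current hourly value to a 7-day rolling hourly baseline.

    Returns None when the signal has to be skipped (too little data or a
    non-positive baseline), otherwise a dict describing the signal.
    """
    window = hourly_history[-7 * 24:]  # keep at most the last 7 days of hours
    if len(window) < MIN_SAMPLES:
        return None
    baseline = mean(window)  # assumption: baseline is a simple mean
    if baseline <= 0:
        return None  # ratio would be undefined
    threshold = THRESHOLDS[signal]
    ratio = current / baseline
    if ratio >= threshold * HIGH_FACTOR:
        severity = "high"
    elif ratio >= threshold:
        severity = "warning"
    else:
        severity = "ok"
    return {
        "current": current,
        "baseline": baseline,
        "ratio": round(ratio, 2),
        "threshold": threshold,
        "severity": severity,
    }
```

With a flat history of 1.0 per hour, a current `cost_per_hour` of 2.5 would come out as `warning` under these assumptions, while the same value for `token_velocity` would stay `ok` because its multiplier is 3x.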

Backend: `/api/alerts/anomaly-status` endpoint

Returns live signal status with current, baseline, ratio, threshold, and severity per signal.
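
As an illustration of how a client might read this response, here is a small sketch. The per-signal fields (`current`, `baseline`, `ratio`, `threshold`, `severity`) are the ones named above; the top-level `signals` key, the sample values, and the helper `anomalous_signals` are hypothetical and may not match the actual payload.

```python
import json

# Hypothetical payload: only the per-signal field names come from the PR text.
SAMPLE = json.loads("""
{
  "signals": {
    "cost_per_hour": {"current": 1.8, "baseline": 0.6, "ratio": 3.0,
                      "threshold": 2.0, "severity": "warning"},
    "error_rate": {"current": 0.01, "baseline": 0.01, "ratio": 1.0,
                   "threshold": 2.5, "severity": "ok"}
  }
}
""")


def anomalous_signals(payload):
    """Return the sorted names of signals whose severity is not 'ok'."""
    signals = payload.get("signals", {})  # assumption: top-level "signals" key
    return sorted(
        name for name, sig in signals.items() if sig.get("severity") != "ok"
    )
```

An empty list from `anomalous_signals` would correspond to the "All Clear" state in the dashboard panel.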

`_budget_monitor_loop()` integration

- Uses `_get_anomaly_status()` to fire `_fire_alert()` for any crossing signal
- Dispatches `cost_spike` webhook for cost anomalies
- Legacy daily anomaly check preserved for backward compat
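
The integration can be sketched as below. The real signatures of `_fire_alert()` and the webhook dispatcher are not shown in this description, so the sketch takes them as injected callbacks; `process_anomalies`, `fire_alert`, and `dispatch_webhook` are hypothetical names, and the status is assumed to be a dict keyed by signal name.

```python
def process_anomalies(status, fire_alert, dispatch_webhook):
    """Fire an alert for every signal that crossed its threshold.

    `status` maps signal name -> dict with a "severity" key, or None for a
    skipped signal. Returns the names of the signals that triggered alerts.
    """
    fired = []
    for name, sig in status.items():
        if sig is None or sig["severity"] == "ok":
            continue  # skipped or healthy signal, nothing to do
        fire_alert(name, sig)
        if name == "cost_per_hour":
            # Cost anomalies additionally dispatch the cost_spike webhook.
            dispatch_webhook("cost_spike", sig)
        fired.append(name)
    return fired
```

Passing the callbacks in keeps the sketch testable without a running monitor loop; in the PR the equivalent logic lives inside `_budget_monitor_loop()`.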

Frontend: ⚡ Anomaly Status panel

Added in the System Health sidebar (overview tab):

- One row per signal: icon + label + current vs baseline + ratio badge
- Green/amber/red row background by severity
- "All Clear" banner when no anomalies
- Auto-refreshes every 60 seconds

Closes

Closes #301

- Track per-turn model attribution in `_compute_transcript_analytics()`
- Record model switching events with session context and timestamps
- Add `/api/usage/model-attribution` endpoint returning:
  - Turn distribution by model with percentages
  - Model switching events (from/to, session, timestamp)
  - Per-session model breakdown with multi-model detection
  - Primary model identification
- Add Model Attribution section to Tokens tab UI:
  - Horizontal bar chart showing turn share per model
  - Per-session models table (recent 30 sessions, multi-model flag)
  - Model switch events list with from→to arrows
  - Switch event count badge

- Add `/api/cron/health` endpoint: aggregates run history, success rate, sparkline data, and cost per job from local `cron/runs/*.jsonl` files
- Add `/api/cron/<job_id>/kill` endpoint: kill switch that disables a cron job immediately via gateway or local file fallback
- Add Cron Health Monitor panel in dashboard UI:
  - Per-job success rate % over last 7 days (color-coded)
  - 7-day sparkline bar chart (green=ok, red=fail)
  - Run count, duration, total cost columns
  - Collapsible run history table per job
  - Kill switch button to disable a running/scheduled job
- Health Monitor button in crons tab refresh bar

…kens, errors

- Add `_get_anomaly_status()`: computes 7-day rolling hourly baseline for 4 signals: `cost_per_hour` (2x threshold), `token_velocity` (3x), `error_rate` (2.5x), `latency_p95` (2.5x)
- Add `/api/alerts/anomaly-status` endpoint returning severity ok/warning/high per signal
- Hook rolling-baseline detector into `_budget_monitor_loop()` to fire alerts automatically
- Keep legacy daily anomaly check for backward compat
- Add '⚡ Anomaly Status' panel in System Health section (overview tab)
  - One row per signal: icon + label + current vs baseline + ratio badge
  - Green/amber/red row background by severity; 'All Clear' state when ok
  - `startAnomalyStatusRefresh()` polls every 60s

closes #301

vivekchand added 3 commits March 23, 2026 09:12

vivekchand wants to merge 3 commits into main from feat/anomaly-detection
