diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7333934be..f86bef163 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,9 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

-### Added — G5 citation-verifier observability T1+T2 (v6.8.6 / v6.8.7, 2026-05-12, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124))
+### Added — G5 citation-verifier observability T1+T2 (v6.8.6 / v6.8.7 / v6.8.7.1, 2026-05-12, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124) + [#127](https://github.com/Number531/Legal-API/pull/127))

-Two-tier observability remediation closing the regulator gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent. Built on the production-fidelity A/B baseline established the same day (Exa 96.8% / Anthropic 96.1%, PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)).
+Two-tier observability remediation closing the regulator gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent, plus a pre-deploy telemetry-alignment fix (v6.8.7.1) before the first deploy. Built on the production-fidelity A/B baseline established the same day (Exa 96.8% / Anthropic 96.1%, PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)).

 **v6.8.6 T1 — regulator persistence (PR [#122](https://github.com/Number531/Legal-API/pull/122))**:
 - New `citation_verdicts` junction table (dual-path: `migrations/015_*.sql` + `CITATION_VERDICTS_DDL` in postgres.js + `ensureHookSchema()` call). FK ON DELETE CASCADE on reports + sessions; UNIQUE (report_id, footnote_id) for idempotent upsert; 3 indexes.
@@ -19,18 +19,23 @@ Two-tier observability remediation closing the regulator gap (T1) and ops/SLO ga
 - `client-audit-export` skill ships `citation_verdicts__csv.gz` + `citation_verification_certificate__csv.gz` in the per-client regulator-handoff WORM bundle.

 **v6.8.7 T2 — telemetry + alerts (PR [#124](https://github.com/Number531/Legal-API/pull/124))**:
-- 4 Prometheus series in sdkMetrics.js: `citation_verifier_confirmation_rate_pct` (Gauge, `mode` label), `citation_verifier_{confirmed,unconfirmed,errors}_total` (Counters). 13 series total; bounded enums prevent explosion.
+- 4 Prometheus series in sdkMetrics.js: `claude_citation_verifier_confirmation_rate_pct` (Gauge, `mode` label), `claude_citation_verifier_{confirmed,unconfirmed,errors}_total` (Counters). 13 series total; bounded enums prevent explosion. *(Renamed from un-prefixed `citation_verifier_*` in v6.8.7.1; see below.)*
 - Recording in `hookDBBridge.persistState()` from `state_data.verification_results` (in-hand at SubagentStop; no race with T1's fire-and-forget INSERT).
 - Structured log `event=citation_verifier_completed` with counts/mode/duration/turns/tool-call counts.
 - 3 alert rules in `prometheus/alerts.yml`: rate <90% sustained 1h (WARN), <80% 30m (CRIT), error spike >50/15m (WARN). Thresholds calibrated against measured baseline + 7pp WARN margin.
 - Bundled fix: corrected pre-existing `access_log` SELECT in audit-report endpoint that queried non-existent columns (`actor`/`action`/`accessed_at`) — these had been silently failing via `.catch()` since Wave 3 shipped.

-**Operator skills aligned (PR [#125](https://github.com/Number531/Legal-API/pull/125))**:
-- `post-deploy-verify` Tier 2 — new V6 check probes `/metrics` for 4 new series + companion `v6-citation-verdicts-presence.sql`.
-- `infrastructure-health` Tier 3 — `citation_verifier_*` PromQL guidance + new `references/citation-verifier-telemetry.md` runbook (PromQL dashboards, Cloud Logging schema, T1↔T2 reconciliation SQL, alert response by severity).
+**v6.8.7.1 — telemetry alignment fix (PR [#127](https://github.com/Number531/Legal-API/pull/127))**:
+- **Metric prefix correction** — renamed all 4 T2 series from un-prefixed `citation_verifier_*` to `claude_citation_verifier_*` to match the codebase-wide `claude_*` convention used by every other Prometheus series (`claude_exa_ab_*`, `claude_kg_build_*`, `claude_subagent_duration_ms`, `claude_hook_*`, `claude_tool_*`, etc.). PromQL filters scoped to `claude_*` would have missed the original names. Safe rename without dual-emit since PR #124 had not yet deployed to a live Prometheus TSDB.
+- **OTel trace correlation** — `event=citation_verifier_completed` structured log now includes `otel_trace_id` via `sdkTracing.getActiveTraceId()`. Operators can now pivot from a Cloud Logging entry → Cloud Trace span by trace_id. Matches Exa A3 pattern (`event_data.exa_a3.otel_trace_id` from PR [#115](https://github.com/Number531/Legal-API/pull/115)). Helper returns `null` when OTel disabled — safe fallback.
+- Functional smoke test confirmed all 4 renamed metrics expose correct HELP+TYPE+value lines with labels intact; V6 post-deploy grep pattern updated accordingly.
+
+**Operator skills aligned (PR [#125](https://github.com/Number531/Legal-API/pull/125), updated in PR [#127](https://github.com/Number531/Legal-API/pull/127))**:
+- `post-deploy-verify` Tier 2 — new V6 check probes `/metrics` for 4 new series + companion `v6-citation-verdicts-presence.sql`. V6 grep pattern updated to `claude_citation_verifier_*` in PR #127.
+- `infrastructure-health` Tier 3 — `claude_citation_verifier_*` PromQL guidance + new `references/citation-verifier-telemetry.md` runbook (PromQL dashboards, Cloud Logging schema with `otel_trace_id` pivot, T1↔T2 reconciliation SQL, alert response by severity).
 - `session-diagnostics` — new Section 11 (G5 forensics) with 5 forensic SQL queries (verdict distribution, state-vs-table reconciliation, per-batch failure pattern, citation→source provenance).

-**Risk**: 3/10 combined (2/10 T1 + 1/10 T2). Pattern mirrors Wave 2 + Exa A3 telemetry — battle-tested. Independent review on both PRs surfaced zero blockers.
+**Risk**: 3/10 combined (2/10 T1 + 1/10 T2 + 1/10 v6.8.7.1). Pattern mirrors Wave 2 + Exa A3 telemetry — battle-tested. Independent review on T1 and T2 PRs surfaced zero blockers; v6.8.7.1 verified by functional smoke test (register + emit + scrape) confirming all 4 renamed metrics expose correct HELP/TYPE/value lines.

 **Pre-existing bug surfaced (not in scope)**: schema-doc-validator caught that `source_chunk_embeddings` does NOT have a `metadata` JSONB column despite Wave 2 code at `hookDBBridge.js:454` assuming it does. The `.catch()` masks the error; citation-source-link enrichment silently falls back to bare candidates. Flagged as separate investigation.

diff --git a/super-legal-mcp-refactored/CHANGELOG.md b/super-legal-mcp-refactored/CHANGELOG.md
index b4da042a9..0c9f09f23 100644
--- a/super-legal-mcp-refactored/CHANGELOG.md
+++ b/super-legal-mcp-refactored/CHANGELOG.md
@@ -4,7 +4,7 @@ All notable changes to the Super Legal MCP Server are documented in this file.

 ## [Unreleased]

-### Added — G5 citation-verifier observability T1+T2 (v6.8.6, v6.8.7, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + this PR)
+### Added — G5 citation-verifier observability T1+T2 (v6.8.6, v6.8.7, v6.8.7.1, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124) + [#127](https://github.com/Number531/Legal-API/pull/127))

 Two-tier observability remediation closing the regulator-facing gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent. Validated against the just-shipped production-fidelity A/B baseline (Exa 96.8% / Anthropic 96.1%, 2026-05-12).

@@ -18,9 +18,9 @@ Two-tier observability remediation closing the regulator-facing gap (T1) and ops

 #### v6.8.7 T2 — Telemetry + alerts (this PR)

-- **4 Prometheus series** in sdkMetrics.js: `claude_citation_verifier_confirmation_rate_pct` (Gauge), `claude_citation_verifier_confirmed_total` + `claude_citation_verifier_unconfirmed_total` (Counters, `mode` label), `claude_citation_verifier_errors_total` (Counter, `reason` label). 13 series total; bounded enums prevent explosion. (v6.8.7.1 prefix correction — names landed pre-deploy, no live consumer impact.)
+- **4 Prometheus series** in sdkMetrics.js: `claude_citation_verifier_confirmation_rate_pct` (Gauge), `claude_citation_verifier_confirmed_total` + `claude_citation_verifier_unconfirmed_total` (Counters, `mode` label), `claude_citation_verifier_errors_total` (Counter, `reason` label). 13 series total; bounded enums prevent explosion. *(Renamed from un-prefixed `citation_verifier_*` in v6.8.7.1 — see below.)*
 - **Recording site** — `hookDBBridge.persistState()` immediately after JSON.parse of state file, before agent_states INSERT. Source: `state_data.verification_results` (in-hand; no race with T1's fire-and-forget verdict INSERT).
-- **Structured log emission** — `logInfo('citation_verifier_completed', {...})` with full counts, mode, duration_ms, turns_used, tool-call counts.
+- **Structured log emission** — `logInfo('citation_verifier_completed', {...})` with full counts, mode, duration_ms, turns_used, tool-call counts (`otel_trace_id` added in v6.8.7.1).
 - **3 alert rules** in `prometheus/alerts.yml`: `CitationVerifierConfirmationRateLow` (rate<90% sustained 1h, WARN), `CitationVerifierConfirmationRateCritical` (rate<80% sustained 30m, CRIT), `CitationVerifierErrorSpike` (>50 errors in 15m, WARN).
 - **Documentation** — new §9.2 in `docs/metrics-catalog.md` with full metric inventory, mode-label semantics, cardinality budget, baseline values, alert thresholds.

@@ -28,6 +28,16 @@ Two-tier observability remediation closing the regulator-facing gap (T1) and ops

 - **`access_log` SELECT column bug** (pre-existing) — audit-report endpoint queried non-existent `actor`/`action`/`accessed_at`; corrected to real columns from ACCESS_LOG_DDL (`requester`/`purpose_code`/`created_at` etc.). Previously the `.catch(() => ([]))` silently swallowed the error; access_log has been empty in audit-reports since Wave 3 shipped. Fix unblocks T1's new INSERTs from actually showing up in regulator bundles.

+#### v6.8.7.1 — Telemetry alignment fix (PR [#127](https://github.com/Number531/Legal-API/pull/127))
+
+Pre-deploy correction of two consistency gaps in T2 vs. other observability avenues. Shipped same day before any `/deploy` ran with PR #124 metrics; no dual-emit required, no live-consumer impact.
+
+- **Metric prefix alignment** — renamed all 4 T2 metric `name:` strings from un-prefixed `citation_verifier_*` to `claude_citation_verifier_*` to match the codebase-wide `claude_*` convention (used by `claude_exa_ab_*`, `claude_kg_build_*`, `claude_subagent_duration_ms`, `claude_hook_*`, `claude_tool_*`, `claude_document_conversion_*`, etc.). Operators running PromQL filters scoped to `claude_*` would have missed the original names. Recording function names (`recordCitationVerifierRate`, etc.) preserved.
+- **OTel trace correlation in structured log** — `event=citation_verifier_completed` Cloud Logging entry now includes `otel_trace_id` via `sdkTracing.getActiveTraceId()`. Operators can now pivot from a log entry → Cloud Trace span by trace ID. Matches Exa A3 pattern (`event_data.exa_a3.otel_trace_id` from PR [#115](https://github.com/Number531/Legal-API/pull/115)). Helper returns `null` when OTel disabled — safe fallback.
+- **Files (8)**: sdkMetrics.js, hookDBBridge.js, prometheus/alerts.yml, docs/metrics-catalog.md, `.claude/skills/post-deploy-verify/scripts/verify-tier2.sh` (V6 grep pattern), `.claude/skills/infrastructure-health/SKILL.md`, `.claude/skills/infrastructure-health/references/citation-verifier-telemetry.md`, this CHANGELOG.
+- **Validation** — `node -c` on both JS files; YAML parses for new alert block; grep across `.claude/skills/`, `src/`, `docs/`, `prometheus/` returns **zero stale unprefixed metric refs**; functional smoke test (register + emit + scrape) confirms 4 metrics expose 8/8 HELP+TYPE lines with correct labels; V6 grep pattern matches 8 lines in actual `/metrics` output; `schema-doc-validator` truth still shows 40 metrics (renamed series captured); `getActiveTraceId` import resolves and returns `null` safely with no active span.
+- **Risk**: 1/10. Pure renaming + additive log field. No dual-emit needed.
+
 #### Risk

 T1 = 2/10 (pattern shipped 4 times prior, near-zero base rate of failure). T2 = 1/10 (pure additive metrics + alerts). Combined = 3/10. No schema migrations beyond T1's `citation_verdicts`. No flag flips. No hot-path code in T2 (single guarded conditional in persistState).