19 changes: 12 additions & 7 deletions CHANGELOG.md
@@ -7,9 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added — G5 citation-verifier observability T1+T2 (v6.8.6 / v6.8.7, 2026-05-12, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124))
### Added — G5 citation-verifier observability T1+T2 (v6.8.6 / v6.8.7 / v6.8.7.1, 2026-05-12, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124) + [#127](https://github.com/Number531/Legal-API/pull/127))

Two-tier observability remediation closing the regulator gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent. Built on the production-fidelity A/B baseline established the same day (Exa 96.8% / Anthropic 96.1%, PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)).
Two-tier observability remediation closing the regulator gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent, plus a pre-deploy telemetry-alignment fix (v6.8.7.1) before the first deploy. Built on the production-fidelity A/B baseline established the same day (Exa 96.8% / Anthropic 96.1%, PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)).

**v6.8.6 T1 — regulator persistence (PR [#122](https://github.com/Number531/Legal-API/pull/122))**:
- New `citation_verdicts` junction table (dual-path: `migrations/015_*.sql` + `CITATION_VERDICTS_DDL` in postgres.js + `ensureHookSchema()` call). FK ON DELETE CASCADE on reports + sessions; UNIQUE (report_id, footnote_id) for idempotent upsert; 3 indexes.
@@ -19,18 +19,23 @@ Two-tier observability remediation closing the regulator gap (T1) and ops/SLO ga
- `client-audit-export` skill ships `citation_verdicts__csv.gz` + `citation_verification_certificate__csv.gz` in the per-client regulator-handoff WORM bundle.
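
The `UNIQUE (report_id, footnote_id)` constraint is what lets the same footnote verdict be written twice without creating a duplicate row. A minimal sketch of the statement shape, assuming PostgreSQL `ON CONFLICT` syntax; `buildVerdictUpsert` and the `verdict` column are hypothetical names for illustration and are not taken from `migrations/015_*.sql`:

```javascript
// Only report_id and footnote_id are named in the changelog.
// Any extra column passed in (e.g. 'verdict') is a hypothetical placeholder.
function buildVerdictUpsert(extraColumns) {
  const keyColumns = ['report_id', 'footnote_id'];
  const columns = [...keyColumns, ...extraColumns];

  // Positional parameters: $1, $2, ...
  const placeholders = columns.map((_, i) => `$${i + 1}`);

  // On a key collision, overwrite the non-key columns with the incoming values.
  const updates = extraColumns.map((c) => `${c} = EXCLUDED.${c}`);
  const conflictAction =
    updates.length > 0 ? `DO UPDATE SET ${updates.join(', ')}` : 'DO NOTHING';

  return (
    `INSERT INTO citation_verdicts (${columns.join(', ')}) ` +
    `VALUES (${placeholders.join(', ')}) ` +
    `ON CONFLICT (report_id, footnote_id) ${conflictAction}`
  );
}

console.log(buildVerdictUpsert(['verdict']));
```

Running the same statement twice with the same key pair leaves one row, which is the property a retried write path needs.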

**v6.8.7 T2 — telemetry + alerts (PR [#124](https://github.com/Number531/Legal-API/pull/124))**:
- 4 Prometheus series in sdkMetrics.js: `citation_verifier_confirmation_rate_pct` (Gauge, `mode` label), `citation_verifier_{confirmed,unconfirmed,errors}_total` (Counters). 13 series total; bounded enums prevent explosion.
- 4 Prometheus series in sdkMetrics.js: `claude_citation_verifier_confirmation_rate_pct` (Gauge, `mode` label), `claude_citation_verifier_{confirmed,unconfirmed,errors}_total` (Counters). 13 series total; bounded enums prevent explosion. *(Renamed from un-prefixed `citation_verifier_*` in v6.8.7.1; see below.)*
- Recording in `hookDBBridge.persistState()` from `state_data.verification_results` (in-hand at SubagentStop; no race with T1's fire-and-forget INSERT).
- Structured log `event=citation_verifier_completed` with counts/mode/duration/turns/tool-call counts.
- 3 alert rules in `prometheus/alerts.yml`: rate <90% sustained 1h (WARN), <80% 30m (CRIT), error spike >50/15m (WARN). Thresholds calibrated against measured baseline + 7pp WARN margin.
- Bundled fix: corrected pre-existing `access_log` SELECT in audit-report endpoint that queried non-existent columns (`actor`/`action`/`accessed_at`) — these had been silently failing via `.catch()` since Wave 3 shipped.
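
The two rate thresholds above can be read as a simple classification of the confirmation-rate gauge. The sketch below is an illustration only: it assumes the rate is `confirmed / (confirmed + unconfirmed)` expressed as a percentage (the changelog does not state the formula), and it ignores the "sustained 1h / 30m" windows, which Prometheus evaluates in the alert rule rather than in application code. `confirmationRatePct` and `classifyRate` are hypothetical names.

```javascript
// Assumption: errors are not part of the denominator.
function confirmationRatePct(confirmed, unconfirmed) {
  const total = confirmed + unconfirmed;
  if (total === 0) return null; // nothing was verified, so there is no rate
  return (confirmed * 100) / total;
}

// Thresholds from the changelog: below 80% is CRIT, below 90% is WARN.
function classifyRate(ratePct) {
  if (ratePct === null) return 'NO_DATA';
  if (ratePct < 80) return 'CRIT';
  if (ratePct < 90) return 'WARN';
  return 'OK';
}

console.log(classifyRate(confirmationRatePct(96, 4)));  // OK
console.log(classifyRate(confirmationRatePct(85, 15))); // WARN
console.log(classifyRate(confirmationRatePct(79, 21))); // CRIT
```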

**Operator skills aligned (PR [#125](https://github.com/Number531/Legal-API/pull/125))**:
- `post-deploy-verify` Tier 2 — new V6 check probes `/metrics` for 4 new series + companion `v6-citation-verdicts-presence.sql`.
- `infrastructure-health` Tier 3 — `citation_verifier_*` PromQL guidance + new `references/citation-verifier-telemetry.md` runbook (PromQL dashboards, Cloud Logging schema, T1↔T2 reconciliation SQL, alert response by severity).
**v6.8.7.1 — telemetry alignment fix (PR [#127](https://github.com/Number531/Legal-API/pull/127))**:
- **Metric prefix correction** — renamed all 4 T2 series from un-prefixed `citation_verifier_*` to `claude_citation_verifier_*` to match the codebase-wide `claude_*` convention used by every other Prometheus series (`claude_exa_ab_*`, `claude_kg_build_*`, `claude_subagent_duration_ms`, `claude_hook_*`, `claude_tool_*`, etc.). PromQL filters scoped to `claude_*` would have missed the original names. Safe rename without dual-emit since PR #124 had not yet deployed to a live Prometheus TSDB.
- **OTel trace correlation** — `event=citation_verifier_completed` structured log now includes `otel_trace_id` via `sdkTracing.getActiveTraceId()`. Operators can now pivot from a Cloud Logging entry → Cloud Trace span by trace_id. Matches Exa A3 pattern (`event_data.exa_a3.otel_trace_id` from PR [#115](https://github.com/Number531/Legal-API/pull/115)). Helper returns `null` when OTel disabled — safe fallback.
- Functional smoke test confirmed all 4 renamed metrics expose correct HELP+TYPE+value lines with labels intact; V6 post-deploy grep pattern updated accordingly.
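
The trace-correlation change amounts to adding one nullable field to the structured log payload. A hedged sketch of that shape: `buildCompletionLog` is a hypothetical helper, the trace-ID getter is passed in as a stand-in for `sdkTracing.getActiveTraceId()`, and the field names other than `event`, `otel_trace_id`, `duration_ms`, and `turns_used` are illustrative.

```javascript
// getActiveTraceId is injected so the sketch runs without an OTel SDK.
// Per the changelog, the real helper returns null when OTel is disabled.
function buildCompletionLog(fields, getActiveTraceId) {
  const traceId = getActiveTraceId() ?? null; // normalize undefined to null
  return {
    ...fields,
    event: 'citation_verifier_completed',
    otel_trace_id: traceId,
  };
}

// OTel disabled: the field is present but null, so log consumers see a stable schema.
console.log(buildCompletionLog({ confirmed: 30, unconfirmed: 1, duration_ms: 1200 }, () => null));

// OTel enabled: the trace ID (illustrative value) lets an operator pivot to the span.
console.log(buildCompletionLog({ confirmed: 30, unconfirmed: 1 }, () => '4bf92f3577b34da6a3ce929d0e0e4736'));
```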

**Operator skills aligned (PR [#125](https://github.com/Number531/Legal-API/pull/125), updated in PR [#127](https://github.com/Number531/Legal-API/pull/127))**:
- `post-deploy-verify` Tier 2 — new V6 check probes `/metrics` for 4 new series + companion `v6-citation-verdicts-presence.sql`. V6 grep pattern updated to `claude_citation_verifier_*` in PR #127.
- `infrastructure-health` Tier 3 — `claude_citation_verifier_*` PromQL guidance + new `references/citation-verifier-telemetry.md` runbook (PromQL dashboards, Cloud Logging schema with `otel_trace_id` pivot, T1↔T2 reconciliation SQL, alert response by severity).
- `session-diagnostics` — new Section 11 (G5 forensics) with 5 forensic SQL queries (verdict distribution, state-vs-table reconciliation, per-batch failure pattern, citation→source provenance).
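
The V6 check described above looks for the four renamed series in `/metrics` output. The actual check is a grep pattern in a shell script; the sketch below shows the same idea in JavaScript, assuming the standard Prometheus text exposition format (`# HELP <name> ...` and `# TYPE <name> ...` lines). `missingSeries` and the sample text are hypothetical.

```javascript
const EXPECTED_SERIES = [
  'claude_citation_verifier_confirmation_rate_pct',
  'claude_citation_verifier_confirmed_total',
  'claude_citation_verifier_unconfirmed_total',
  'claude_citation_verifier_errors_total',
];

// Returns the expected series that lack either a HELP or a TYPE line.
function missingSeries(metricsText) {
  const lines = metricsText.split('\n');
  return EXPECTED_SERIES.filter((name) => {
    const hasHelp = lines.some((l) => l.startsWith(`# HELP ${name} `));
    const hasType = lines.some((l) => l.startsWith(`# TYPE ${name} `));
    return !(hasHelp && hasType);
  });
}

// Illustrative scrape containing only one of the four series.
const sample = [
  '# HELP claude_citation_verifier_confirmed_total Confirmed citations',
  '# TYPE claude_citation_verifier_confirmed_total counter',
  'claude_citation_verifier_confirmed_total 12',
].join('\n');

console.log(missingSeries(sample)); // the three series absent from the sample
```

An empty result corresponds to the 8 HELP+TYPE lines (4 series, 2 lines each) that the smoke test reports.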

**Risk**: 3/10 combined (2/10 T1 + 1/10 T2). Pattern mirrors Wave 2 + Exa A3 telemetry — battle-tested. Independent review on both PRs surfaced zero blockers.
**Risk**: 3/10 combined (2/10 T1 + 1/10 T2 + 1/10 v6.8.7.1). Pattern mirrors Wave 2 + Exa A3 telemetry — battle-tested. Independent review on T1 and T2 PRs surfaced zero blockers; v6.8.7.1 verified by functional smoke test (register + emit + scrape) confirming all 4 renamed metrics expose correct HELP/TYPE/value lines.

**Pre-existing bug surfaced (not in scope)**: schema-doc-validator caught that `source_chunk_embeddings` does NOT have a `metadata` JSONB column despite Wave 2 code at `hookDBBridge.js:454` assuming it does. The `.catch()` masks the error; citation-source-link enrichment silently falls back to bare candidates. Flagged as separate investigation.

16 changes: 13 additions & 3 deletions super-legal-mcp-refactored/CHANGELOG.md
@@ -4,7 +4,7 @@ All notable changes to the Super Legal MCP Server are documented in this file.

## [Unreleased]

### Added — G5 citation-verifier observability T1+T2 (v6.8.6, v6.8.7, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + this PR)
### Added — G5 citation-verifier observability T1+T2 (v6.8.6, v6.8.7, v6.8.7.1, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124) + [#127](https://github.com/Number531/Legal-API/pull/127))

Two-tier observability remediation closing the regulator-facing gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent. Validated against the just-shipped production-fidelity A/B baseline (Exa 96.8% / Anthropic 96.1%, 2026-05-12).

@@ -18,16 +18,26 @@ Two-tier observability remediation closing the regulator-facing gap (T1) and ops

#### v6.8.7 T2 — Telemetry + alerts (this PR)

- **4 Prometheus series** in sdkMetrics.js: `claude_citation_verifier_confirmation_rate_pct` (Gauge), `claude_citation_verifier_confirmed_total` + `claude_citation_verifier_unconfirmed_total` (Counters, `mode` label), `claude_citation_verifier_errors_total` (Counter, `reason` label). 13 series total; bounded enums prevent explosion. (v6.8.7.1 prefix correction — names landed pre-deploy, no live consumer impact.)
- **4 Prometheus series** in sdkMetrics.js: `claude_citation_verifier_confirmation_rate_pct` (Gauge), `claude_citation_verifier_confirmed_total` + `claude_citation_verifier_unconfirmed_total` (Counters, `mode` label), `claude_citation_verifier_errors_total` (Counter, `reason` label). 13 series total; bounded enums prevent explosion. *(Renamed from un-prefixed `citation_verifier_*` in v6.8.7.1 — see below.)*
- **Recording site** — `hookDBBridge.persistState()` immediately after JSON.parse of state file, before agent_states INSERT. Source: `state_data.verification_results` (in-hand; no race with T1's fire-and-forget verdict INSERT).
- **Structured log emission** — `logInfo('citation_verifier_completed', {...})` with full counts, mode, duration_ms, turns_used, tool-call counts.
- **Structured log emission** — `logInfo('citation_verifier_completed', {...})` with full counts, mode, duration_ms, turns_used, tool-call counts (`otel_trace_id` added in v6.8.7.1).
- **3 alert rules** in `prometheus/alerts.yml`: `CitationVerifierConfirmationRateLow` (rate<90% sustained 1h, WARN), `CitationVerifierConfirmationRateCritical` (rate<80% sustained 30m, CRIT), `CitationVerifierErrorSpike` (>50 errors in 15m, WARN).
- **Documentation** — new §9.2 in `docs/metrics-catalog.md` with full metric inventory, mode-label semantics, cardinality budget, baseline values, alert thresholds.

#### Bundled fix (T2 PR)

- **`access_log` SELECT column bug** (pre-existing) — audit-report endpoint queried non-existent `actor`/`action`/`accessed_at`; corrected to real columns from ACCESS_LOG_DDL (`requester`/`purpose_code`/`created_at` etc.). Previously the `.catch(() => ([]))` silently swallowed the error; access_log has been empty in audit-reports since Wave 3 shipped. Fix unblocks T1's new INSERTs from actually showing up in regulator bundles.
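
Because the `.catch(() => ([]))` returned an empty array instead of surfacing the SQL error, the wrong column names went unnoticed. One way such a mismatch could be caught earlier is a small test-time guard that compares the selected columns against the DDL's column set. This is a sketch and not part of the PR: `unknownColumns` is a hypothetical helper, and the column set is partial, since the changelog names three real columns and then says "etc."

```javascript
// Partial list: only the columns the changelog names explicitly.
const ACCESS_LOG_COLUMNS = new Set(['requester', 'purpose_code', 'created_at']);

// Returns every selected column that the known set does not contain.
function unknownColumns(selected, known) {
  return selected.filter((column) => !known.has(column));
}

// The columns the old query asked for: none of them exist.
console.log(unknownColumns(['actor', 'action', 'accessed_at'], ACCESS_LOG_COLUMNS));

// The corrected columns: nothing is flagged.
console.log(unknownColumns(['requester', 'purpose_code', 'created_at'], ACCESS_LOG_COLUMNS));
```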

#### v6.8.7.1 — Telemetry alignment fix (PR [#127](https://github.com/Number531/Legal-API/pull/127))

Pre-deploy correction of two consistency gaps in T2 vs. other observability avenues. Shipped same day before any `/deploy` ran with PR #124 metrics; no dual-emit required, no live-consumer impact.

- **Metric prefix alignment** — renamed all 4 T2 metric `name:` strings from un-prefixed `citation_verifier_*` to `claude_citation_verifier_*` to match the codebase-wide `claude_*` convention (used by `claude_exa_ab_*`, `claude_kg_build_*`, `claude_subagent_duration_ms`, `claude_hook_*`, `claude_tool_*`, `claude_document_conversion_*`, etc.). Operators running PromQL filters scoped to `claude_*` would have missed the original names. Recording function names (`recordCitationVerifierRate`, etc.) preserved.
- **OTel trace correlation in structured log** — `event=citation_verifier_completed` Cloud Logging entry now includes `otel_trace_id` via `sdkTracing.getActiveTraceId()`. Operators can now pivot from a log entry → Cloud Trace span by trace ID. Matches Exa A3 pattern (`event_data.exa_a3.otel_trace_id` from PR [#115](https://github.com/Number531/Legal-API/pull/115)). Helper returns `null` when OTel disabled — safe fallback.
- **Files (8)**: sdkMetrics.js, hookDBBridge.js, prometheus/alerts.yml, docs/metrics-catalog.md, `.claude/skills/post-deploy-verify/scripts/verify-tier2.sh` (V6 grep pattern), `.claude/skills/infrastructure-health/SKILL.md`, `.claude/skills/infrastructure-health/references/citation-verifier-telemetry.md`, this CHANGELOG.
- **Validation** — `node -c` on both JS files; YAML parses for new alert block; grep across `.claude/skills/`, `src/`, `docs/`, `prometheus/` returns **zero stale unprefixed metric refs**; functional smoke test (register + emit + scrape) confirms 4 metrics expose 8/8 HELP+TYPE lines with correct labels; V6 grep pattern matches 8 lines in actual `/metrics` output; `schema-doc-validator` truth still shows 40 metrics (renamed series captured); `getActiveTraceId` import resolves and returns `null` safely with no active span.
- **Risk**: 1/10. Pure renaming + additive log field. No dual-emit needed.
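
The "zero stale unprefixed metric refs" check has one subtlety: the structured log event `citation_verifier_completed` is deliberately unprefixed and must not be flagged. A sketch of a scanner that handles this, assuming the four metric suffixes listed in this changelog; the PR itself used grep, and `findStaleMetricRefs` is a hypothetical name.

```javascript
const METRIC_SUFFIXES = [
  'confirmation_rate_pct',
  'confirmed_total',
  'unconfirmed_total',
  'errors_total',
];

// Matches `citation_verifier_<suffix>` or the glob form `citation_verifier_*`,
// but only when not preceded by a word character (so `claude_...` is skipped).
const STALE_METRIC = new RegExp(
  `(?<![A-Za-z0-9_])citation_verifier_(?:(?:${METRIC_SUFFIXES.join('|')})\\b|\\*)`,
  'g'
);

function findStaleMetricRefs(text) {
  return text.match(STALE_METRIC) ?? [];
}

const doc = [
  'rate(citation_verifier_errors_total[15m])',        // stale: flagged
  'rate(claude_citation_verifier_errors_total[15m])', // renamed: ignored
  'event=citation_verifier_completed',                // log event: ignored
].join('\n');

console.log(findStaleMetricRefs(doc)); // [ 'citation_verifier_errors_total' ]
```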

#### Risk

T1 = 2/10 (pattern shipped 4 times prior, near-zero base rate of failure). T2 = 1/10 (pure additive metrics + alerts). Combined = 3/10. No schema migrations beyond T1's `citation_verdicts`. No flag flips. No hot-path code in T2 (single guarded conditional in persistState).