From 373ff808323ddaf1dac0ba6fdf1bff6c4f55b68d Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Tue, 12 May 2026 10:57:41 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20T1+T2=20residual=20gaps=20=E2=80=94=20a?=
 =?UTF-8?q?pi-reference=20+=20root=20CHANGELOG=20+=20README=20+=20backup?=
 =?UTF-8?q?=20verification?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Final cleanup of doc surface area touched by T1 (PR #122) + T2 (PR #124) + skill alignment (PR #125). Four residual gaps closed:

1. docs/api-reference.md §3 (/api/session/:sessionKey/audit-report):
   - Response shape now includes citation_verification_certificate + citation_verdicts + verdict_summary fields shipped in T1
   - access_log column shape corrected from non-existent (user_id/endpoint/method/accessed_at) to real ACCESS_LOG_DDL columns (requester/resource_type/resource_key/purpose_code/ip_address/created_at) — matches T2's SELECT fix
   - citations column shape corrected (match_method → matched_via, confidence_score → confidence)
   - report_version bumped 1.0 → 1.1 with version history block

2. Service README.md line 614:
   - "5 audit tables" → "8 audit surfaces" with full list including citation_verification_certificate + citation_verdicts
   - Cross-link to docs/api-reference.md §3

3. Root CHANGELOG.md [Unreleased]:
   - New T1+T2 entry mirroring service-level changelog detail
   - Cross-references PRs #122, #124, #125 + baseline PRs #118, #119
   - Notes pre-existing source_chunk_embeddings.metadata bug flagged for separate investigation

4. client-backup-restore SKILL.md table inventory + restore verification:
   - citation_verdicts added to v7.0.0+ verification list
   - Documents that backup is automatic via pg_dump (FK CASCADE handles dependent rows); restore should confirm row counts post-restore
   - Notes "zero rows acceptable" for sessions pre-v6.8.6 or that never ran citation-websearch-verifier

## Validation

schema-doc-validator clean on all 4 files:
- Truth: 34 tables / 40 metrics / 16 alerts / 67 endpoints (unchanged)
- Zero new violations introduced
- access_log columns now match DDL in api-reference.md
- citation_verdicts column shape matches DDL in client-backup-restore

## Risk: 0/10

Pure documentation alignment. No code, no schema, no flags. All surfaces were either describing nonexistent state (access_log) or missing information added by T1/T2 (verdict fields, table inventory).
---
 .claude/skills/client-backup-restore/SKILL.md |  2 ++
 CHANGELOG.md                                  | 27 ++++++++++++++
 super-legal-mcp-refactored/README.md          |  2 +-
 .../docs/api-reference.md                     | 35 ++++++++++++++++---
 4 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/.claude/skills/client-backup-restore/SKILL.md b/.claude/skills/client-backup-restore/SKILL.md
index 3f4d1c12f..be41cd6bf 100644
--- a/.claude/skills/client-backup-restore/SKILL.md
+++ b/.claude/skills/client-backup-restore/SKILL.md
@@ -196,12 +196,14 @@ gcloud sql backups restore {backup_id} \
 - `code_executions` (now with 13+ reproducibility columns: model_id, llm_name, anthropic_request_id, anthropic_message_id, system_prompt_hash, python_code, container_id, tool_use_id, stop_reason, turn_count, pause_count, refusal_detected, etc.) — required for byte-replay envelope per EU AI Act Art. 15
 - `code_execution_inputs` — data lineage junction (small table, 1-5 rows per execution)
 - `citation_source_links` — citation→source bridge with confidence scores (1 row per matched citation)
+- `citation_verdicts` — per-footnote G5 verdicts (v6.8.6 T1, PR #122). 1 row per verified footnote; ~300-500 rows per memo session that ran citation-websearch-verifier. FK ON DELETE CASCADE on reports + sessions — backed up automatically via pg_dump; no manual handling needed.
 - `hook_audit_log` — now includes `bridge_metadata` JSONB column with `git_sha + sdk_version + container_id + system_prompt_hash` (regulator-replay envelope)
 
 Restore verification (Phase 4) should confirm these row counts post-restore for v7.0.0+ deployments:
 - `SELECT COUNT(*) FROM transcript_events` matches pre-backup count
 - `SELECT COUNT(*) FROM code_executions WHERE model_id IS NOT NULL` matches pre-backup count (NULL model_id = pre-v6.8.4 row, allowed)
 - `SELECT COUNT(*) FROM citation_source_links` matches pre-backup count
+- `SELECT COUNT(*) FROM citation_verdicts` matches pre-backup count (zero rows is acceptable for sessions that ran before v6.8.6 OR that never invoked citation-websearch-verifier)
 - `SELECT event_data->'bridge_metadata' IS NOT NULL FROM hook_audit_log WHERE tool_name='run_python_analysis'` — bridge_metadata preserved on restore
 
 ## Storage Locations
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 34ddbf91f..7333934be 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added — G5 citation-verifier observability T1+T2 (v6.8.6 / v6.8.7, 2026-05-12, PRs [#122](https://github.com/Number531/Legal-API/pull/122) + [#124](https://github.com/Number531/Legal-API/pull/124))
+
+Two-tier observability remediation closing the regulator gap (T1) and ops/SLO gap (T2) on the G5 citation-verifier subagent. Built on the production-fidelity A/B baseline established the same day (Exa 96.8% / Anthropic 96.1%, PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)).
+
+**v6.8.6 T1 — regulator persistence (PR [#122](https://github.com/Number531/Legal-API/pull/122))**:
+- New `citation_verdicts` junction table (dual-path: `migrations/015_*.sql` + `CITATION_VERDICTS_DDL` in postgres.js + `ensureHookSchema()` call). FK ON DELETE CASCADE on reports + sessions; UNIQUE (report_id, footnote_id) for idempotent upsert; 3 indexes.
+- Parser promoted: `test/sdk/_lib/certificateParser.mjs` → `src/utils/certificateParser.js`.
+- Fire-and-forget batch INSERT in `persistReport` mirrors Wave 2 `citation_source_links` pattern (single round-trip for ~500 footnotes).
+- `/api/session/:sessionKey/audit-report` v1.1 returns `citation_verification_certificate` (full markdown + verdict_summary) and `citation_verdicts` (per-footnote array). Access logged to `access_log`.
+- `client-audit-export` skill ships `citation_verdicts__csv.gz` + `citation_verification_certificate__csv.gz` in the per-client regulator-handoff WORM bundle.
+
+**v6.8.7 T2 — telemetry + alerts (PR [#124](https://github.com/Number531/Legal-API/pull/124))**:
+- 4 Prometheus series in sdkMetrics.js: `citation_verifier_confirmation_rate_pct` (Gauge, `mode` label), `citation_verifier_{confirmed,unconfirmed,errors}_total` (Counters). 13 series total; bounded enums prevent explosion.
+- Recording in `hookDBBridge.persistState()` from `state_data.verification_results` (in-hand at SubagentStop; no race with T1's fire-and-forget INSERT).
+- Structured log `event=citation_verifier_completed` with counts/mode/duration/turns/tool-call counts.
+- 3 alert rules in `prometheus/alerts.yml`: rate <90% sustained 1h (WARN), <80% 30m (CRIT), error spike >50/15m (WARN). Thresholds calibrated against measured baseline + 7pp WARN margin.
+- Bundled fix: corrected pre-existing `access_log` SELECT in audit-report endpoint that queried non-existent columns (`actor`/`action`/`accessed_at`) — these had been silently failing via `.catch()` since Wave 3 shipped.
+
+**Operator skills aligned (PR [#125](https://github.com/Number531/Legal-API/pull/125))**:
+- `post-deploy-verify` Tier 2 — new V6 check probes `/metrics` for 4 new series + companion `v6-citation-verdicts-presence.sql`.
+- `infrastructure-health` Tier 3 — `citation_verifier_*` PromQL guidance + new `references/citation-verifier-telemetry.md` runbook (PromQL dashboards, Cloud Logging schema, T1↔T2 reconciliation SQL, alert response by severity).
+- `session-diagnostics` — new Section 11 (G5 forensics) with 5 forensic SQL queries (verdict distribution, state-vs-table reconciliation, per-batch failure pattern, citation→source provenance).
+
+**Risk**: 3/10 combined (2/10 T1 + 1/10 T2). Pattern mirrors Wave 2 + Exa A3 telemetry — battle-tested. Independent review on both PRs surfaced zero blockers.
+
+**Pre-existing bug surfaced (not in scope)**: schema-doc-validator caught that `source_chunk_embeddings` does NOT have a `metadata` JSONB column despite Wave 2 code at `hookDBBridge.js:454` assuming it does. The `.catch()` masks the error; citation-source-link enrichment silently falls back to bare candidates. Flagged as separate investigation.
+
 ### Added — Citation-verifier A/B test harnesses (test-only, 2026-05-12, PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119))
 
 Empirical validation of the production `EXA_WEB_TOOLS=true` config (live in `flags.env` since 2026-04-18 PR #76 but never directly measured). Test-only; no production code touched.
diff --git a/super-legal-mcp-refactored/README.md b/super-legal-mcp-refactored/README.md
index 1f73d7e1b..e21cae372 100644
--- a/super-legal-mcp-refactored/README.md
+++ b/super-legal-mcp-refactored/README.md
@@ -611,7 +611,7 @@ End-to-end machinery for EU AI Act Art. 12 (logging), Art. 13 (transparency), Ar
 
 **Reproducibility metadata** (`hook_audit_log.event_data->'bridge_metadata'`): `sdk_version`, `git_sha`, `beta_headers`, `bridge_version`, `otel_trace_id` — enables byte-exact replay of historical executions.
 
-**Regulator-facing endpoint**: `GET /api/session/:sessionKey/audit-report` aggregates 5 audit tables (code_executions, bridge_metadata, citations, human_interventions, access_log) as JSON or CSV (`?format=csv`). Auth via existing `cookieAuth`. See [docs/runbooks/v6.8.5-audit-export.md](docs/runbooks/v6.8.5-audit-export.md) for operator workflow.
+**Regulator-facing endpoint**: `GET /api/session/:sessionKey/audit-report` (v1.1 since v6.8.6 T1, PR [#122](https://github.com/Number531/Legal-API/pull/122)) aggregates **8 audit surfaces** as JSON or CSV (`?format=csv`): session lifecycle, `code_executions`, `bridge_metadata`, `citations` (citation_source_links), `citation_verification_certificate` (full G5 markdown + verdict_summary), `citation_verdicts` (per-footnote array), `human_interventions`, and `access_log`. Auth via existing `cookieAuth`. See [docs/runbooks/v6.8.5-audit-export.md](docs/runbooks/v6.8.5-audit-export.md) and [docs/api-reference.md §3](docs/api-reference.md) for operator workflow + full response shape.
 
 **PII redaction** (GDPR Art. 17): `redactSessionEventData(sessionId)` in `src/utils/retentionManager.js` walks `hook_audit_log.event_data` JSONB and scrubs sensitive paths (tool_input.{code,text,url,query,email,…}, error.context.*, python_code, stdout, raw_output) with `[REDACTED]` markers. Idempotent. Audit shape preserved (regulator sees "an event happened" — not the content). Invoke before `DELETE FROM sessions` for compliance erasure.
 
diff --git a/super-legal-mcp-refactored/docs/api-reference.md b/super-legal-mcp-refactored/docs/api-reference.md
index c79f0f428..61b151084 100644
--- a/super-legal-mcp-refactored/docs/api-reference.md
+++ b/super-legal-mcp-refactored/docs/api-reference.md
@@ -175,12 +175,34 @@ Two access-audit middlewares wrap subsets of read paths:
     { "tool_use_id": "...", "bridge_metadata": { "git_sha": "...", "sdk_version": "...", "container_id": "...", "system_prompt_hash": "..." }, "created_at": "..." }
   ],
   "citations": [
-    { "report_id": "...", "citation_marker": "[12]", "source_hash": "...", "match_method": "url_exact", "confidence_score": 1.00, "matched_at": "..." }
+    { "report_id": "...", "citation_marker": "[12]", "source_hash": "...", "confidence": 1.00, "matched_via": "url", "report_type": "memo", "report_key": "..." }
+  ],
+  "citation_verification_certificate": {
+    "report_id": "uuid",
+    "agent_type": "citation-websearch-verifier",
+    "certificate_text": "## CERTIFICATION STATUS: PASS\n\n**Confirmation Rate:** 96.8% (358 confirmed / 370 verifiable footnotes)\n\n...",
+    "created_at": "...",
+    "word_count": 12450,
+    "verdict_summary": {
+      "total": 370,
+      "confirmed": 358,
+      "unconfirmed": 12,
+      "errors": 0,
+      "skipped": 0,
+      "pass_with_note": 0,
+      "paywalled": 0,
+      "confirmation_rate": 0.9676
+    }
+  },
+  "citation_verdicts": [
+    { "footnote_id": "^43", "footnote_row": 43, "citation_text": "...", "source_type": "case law", "verification_method": "Exa fetch_document", "verdict": "CONFIRMED", "paywalled": false, "notes": "", "created_at": "..." }
   ],
   "human_interventions": [],
   "access_log": [
-    { "user_id": 42, "endpoint": "/api/db/sessions/.../report/...", "method": "GET", "accessed_at": "..." }
-  ]
+    { "requester": "user@example.com", "resource_type": "session_data", "resource_key": "...", "purpose_code": "regulator_audit", "ip_address": "...", "created_at": "..." }
+  ],
+  "generated_at": "...",
+  "report_version": "1.1"
 }
 ```
 
@@ -188,9 +210,14 @@ Two access-audit middlewares wrap subsets of read paths:
 **Response (400)**: `{ "error": "Invalid session key format" }`
 **Response (503)**: `{ "error": "Database not configured" }`
 
+### Version history
+
+- **v1.0** (initial) — sessions, code_executions, bridge_metadata, citations, human_interventions, access_log
+- **v1.1** (v6.8.6 T1, PR [#122](https://github.com/Number531/Legal-API/pull/122)) — added `citation_verification_certificate` (full G5 markdown + parsed summary) and `citation_verdicts` (per-footnote verdict array sourced from `citation_verdicts` table). Closes EU AI Act Art. 12/13 query-reconstruction for the verification layer. `access_log` columns corrected to match `ACCESS_LOG_DDL` (requester/resource_type/resource_key/purpose_code/ip_address/created_at) in v6.8.7 T2, PR [#124](https://github.com/Number531/Legal-API/pull/124).
+
 **Note on PII**: redaction fires at offboarding-time via `retentionManager.redactSessionEventData()` per GDPR Art. 17, NOT at admin read-time. Regulators require full audit data; redaction is the erasure boundary, not the access boundary.
 
-**Reference**: `docs/runbooks/v6.8.5-audit-export.md` for response interpretation and CSV format.
+**Reference**: `docs/runbooks/v6.8.5-audit-export.md` for response interpretation and CSV format. T1+T2 cross-reference: `docs/metrics-catalog.md` §9.2 for the parallel Prometheus telemetry path.
 
 ### `GET /api/db/sessions/:sessionKey/transcript`