feat(g5-observability): T1 — citation_verdicts table + audit endpoint + WORM export (v6.8.6) by Number531 · Pull Request #122 · Number531/Legal-API

Number531 · 2026-05-12T14:10:51Z

Summary

T1 of three-tier G5 citation-verifier observability remediation. Closes the regulator-facing gap surfaced by the recent audit (post-PR #121): per-footnote verdicts from citation-websearch-verifier are now queryable in SQL and exposed in the audit-report endpoint + WORM regulator-handoff bundle, rather than buried in markdown.

Without this PR, a regulator asking "prove footnote [^VII.34] was verified" can see the tool calls (hook_audit_log) and the source (citation_source_links), but not the verdict the verifier ultimately assigned — undermining the 96.8% citation accuracy claim the GTM docs (just shipped in PR #121) lean on for EU AI Act Art. 12/13 transparency.

What ships

| Component | Detail |
| --- | --- |
| `citation_verdicts` table | New junction table mirroring Wave 2 `citation_source_links` pattern. FK to `reports(id)` and `sessions(id)`, both ON DELETE CASCADE. UNIQUE (report_id, footnote_id) for idempotent re-parse. Three indexes covering session+verdict, method, and report fetch paths. |
| Dual-path schema | `migrations/015_*.sql` + `CITATION_VERDICTS_DDL` in `postgres.js` + call in `ensureHookSchema()` |
| Parser promotion | `test/sdk/_lib/certificateParser.mjs` (PR #119) copied to `src/utils/certificateParser.js`. Test harness still imports from `_lib/` to keep PR #119 fixtures green. |
| Fire-and-forget persistReport hook | When `reportType==='qa' && reportKey==='citation-verification-certificate'` lands in `reports`, the parser runs in `backgroundTasks` and writes verdicts via a single batch `INSERT ... VALUES` (one round-trip for up to ~500 footnotes). Idempotent upsert. |
| Audit endpoint extension | `/api/session/:sessionKey/audit-report` now returns `citation_verification_certificate` (full markdown + summary stats: confirmation rate, confirmed/unconfirmed/error/skip/pass_with_note/paywalled counts) and `citation_verdicts` (per-footnote array). `report_version` 1.0 → 1.1. Access logged to `access_log` (Wave 3). |
| WORM bundle inclusion | `client-audit-export` now ships `citation_verdicts__csv.gz` + `citation_verification_certificate__csv.gz` in the regulator-handoff bundle (both session-scoped and date-range modes). |
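
The single-round-trip batch write described for the persistReport hook can be sketched as a pure query builder. This is a minimal illustration, not the code in `hookDBBridge.js`: the helper name `buildVerdictUpsert`, the `verdict` and `method` columns, and the row shape are assumptions. Only the table name, the `(report_id, footnote_id)` uniqueness key, and the multi-row `INSERT ... VALUES` come from the description above.

```javascript
// Hypothetical helper: builds one parameterized multi-row upsert.
// Columns other than report_id / session_id / footnote_id are assumed.
const COLUMNS = ['report_id', 'session_id', 'footnote_id', 'verdict', 'method'];

function buildVerdictUpsert(rows) {
  if (rows.length === 0) return null; // nothing to write, skip the round-trip

  const values = [];
  const tuples = rows.map((row, i) => {
    const base = i * COLUMNS.length;
    COLUMNS.forEach((col) => values.push(row[col] ?? null));
    // Placeholders are numbered $1..$N across the whole statement.
    const placeholders = COLUMNS.map((_, j) => `$${base + j + 1}`);
    return `(${placeholders.join(', ')})`;
  });

  const text =
    `INSERT INTO citation_verdicts (${COLUMNS.join(', ')}) ` +
    `VALUES ${tuples.join(', ')} ` +
    // Re-parsing the same certificate overwrites rather than duplicates.
    `ON CONFLICT (report_id, footnote_id) DO UPDATE ` +
    `SET verdict = EXCLUDED.verdict, method = EXCLUDED.method`;

  return { text, values };
}
```

The returned `{ text, values }` object matches the query-config shape that node-postgres accepts, if that is the driver in use. At five columns per row, 500 footnotes need 2,500 bind parameters in one statement.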

Pattern precedent

This is the Wave 2 citation_source_links pattern, line-for-line:

- Table additive, FK ON DELETE CASCADE, UNIQUE for idempotent upsert
- Fire-and-forget via backgroundTasks Set in persistReport
- Try/catch around parser, non-fatal on failure
- Audit endpoint exposure with .catch(() => ({ rows: [] })) graceful fallback
- WORM-bundle CSV export
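
The fire-and-forget step in this pattern can be sketched as follows. This is an illustration under stated assumptions, not the `persistReport` implementation: `scheduleVerdictParse`, `parseCertificate`, `writeVerdicts`, and `logWarn` are hypothetical names, and the dependencies are injected so the sketch runs without a database. The `reportType` / `reportKey` guard and the `backgroundTasks` Set are taken from the description above.

```javascript
// Tracks in-flight background work so shutdown code could await it.
const backgroundTasks = new Set();

// Hypothetical hook: the caller does not await the returned promise.
function scheduleVerdictParse(report, { parseCertificate, writeVerdicts, logWarn }) {
  // Guard from the PR description: only the QA certificate triggers parsing.
  if (report.reportType !== 'qa' ||
      report.reportKey !== 'citation-verification-certificate') {
    return null;
  }

  const task = (async () => {
    try {
      const verdicts = parseCertificate(report.content);
      if (verdicts.length > 0) await writeVerdicts(report.id, verdicts);
    } catch (err) {
      // Non-fatal: the report row itself is already persisted.
      logWarn(`verdict parse failed: ${err.message}`);
    }
  })();

  // Register the promise, then drop it from the set once it settles.
  backgroundTasks.add(task);
  task.finally(() => backgroundTasks.delete(task));
  return task;
}
```

Because the try/catch sits inside the task, the tracked promise never rejects, so a parser or write failure cannot surface as an unhandled rejection in the request path.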

That pattern has shipped clean across Wave 1, Wave 2, Wave 3, and Exa A3 (v7.6.0-v7.6.2). The only Wave 2 incident was a missing-DDL hotfix caught in 24h; this PR uses dual-path (migration + ensureHookSchema) which prevents that.

Risk: 2/10

- Strictly additive table (no existing-schema mutation)
- Endpoint adds JSON fields (no field removal/rename); report_version bump signals consumers
- All new queries gracefully fall back to [] on stale schemas
- Parser is pure logic (no I/O); failure is silent + non-fatal
- Batch INSERT avoids per-row round-trip serialization (the only hot-path concern from plan review)
- WORM bundle change is durable (Object Lock means the SKILL.md update ships permanently from this point), but the change is additive; existing bundles unaffected
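
The stale-schema fallback can be illustrated with a small wrapper. This is a sketch, assuming a `db` object with a promise-returning `query(text, params)` method; `fetchVerdicts` is a hypothetical name, and the selected columns are limited to ones named in this PR.

```javascript
// Hypothetical query wrapper around any promise-returning db.query().
async function fetchVerdicts(db, sessionId) {
  const result = await db
    .query(
      'SELECT footnote_id, verdict FROM citation_verdicts WHERE session_id = $1',
      [sessionId]
    )
    // A missing table (stale schema) or any other query error
    // degrades to "no rows" instead of failing the audit report.
    .catch(() => ({ rows: [] }));
  return result.rows;
}
```

The trade-off is that the same catch also hides genuine query bugs. The T2 commit message on this page reports exactly that for the `access_log` SELECT, so logging inside the catch may be worth considering.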

Test plan

- DDL applies cleanly on fresh DB (CREATE TABLE + 3 indexes idempotent)
- DDL applies cleanly on production-shaped DB with existing reports table (FK ON DELETE CASCADE wires correctly)
- After a session runs, verify SELECT verdict, COUNT(*) FROM citation_verdicts WHERE session_id = ? returns expected distribution
- /api/session/:sessionKey/audit-report returns non-null citation_verification_certificate and populated citation_verdicts array
- client-audit-export --session <key> produces citation_verdicts__csv.gz + citation_verification_certificate__csv.gz in bundle
- Malformed/empty cert → parser returns silently, no rows written, no error log spam
- Stale schema (table missing) → endpoint returns 200 with empty arrays, audit report still generates
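
When checking the verdict distribution and the endpoint's summary stats by hand, a small aggregation helper is enough. The sketch below is illustrative only: `summarizeVerdicts` is a hypothetical name, the six verdict strings are assumed to match the count names listed for the audit endpoint, and the confirmation rate is assumed to be confirmed footnotes over all footnotes. The PR does not state the exact formula.

```javascript
// Assumed verdict vocabulary, taken from the summary-count names in this PR.
const VERDICTS = ['confirmed', 'unconfirmed', 'error', 'skip', 'pass_with_note', 'paywalled'];

function summarizeVerdicts(rows) {
  const counts = Object.fromEntries(VERDICTS.map((v) => [v, 0]));
  for (const { verdict } of rows) {
    // Unknown verdict strings are ignored in the per-verdict counts.
    if (VERDICTS.includes(verdict)) counts[verdict] += 1;
  }
  const total = rows.length;
  // Assumed definition: confirmed / all footnotes, as a percentage
  // rounded to one decimal place; null when there is nothing to rate.
  const confirmation_rate_pct =
    total === 0 ? null : Math.round((counts.confirmed / total) * 1000) / 10;
  return { total, ...counts, confirmation_rate_pct };
}
```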

Follow-ups (separate PRs)

- T2 (~1d): Prometheus metrics (citation_verifier_confirmation_rate_pct, confirmed_total, unconfirmed_total, errors_total) + logInfo('citation_verifier_completed') + 3 alert rules. Risk score 1/10.
- T3 (~0.5d + measurement): OTel span wrap via SubagentStart/Stop in hookSSEBridge + _v3 tool metrics with agent_type label (dual-emit 7d). Risk score 3/10, gated on Prometheus headroom measurement.

🤖 Generated with Claude Code

… + WORM export (v6.8.6)

Closes the regulator-facing gap on the G5 citation verification layer. Per-footnote verdicts emitted by citation-websearch-verifier are now queryable in SQL rather than embedded in markdown, closing EU AI Act Art. 12/13 query-reconstruction.

Pattern: mirrors Wave 2 citation_source_links junction-table + parser + fire-and-forget persistReport hook + Wave 3 audit-report endpoint exposure + client-audit-export WORM bundle inclusion.

## Files (8)

Schema (dual-path):

- migrations/015_citation-verdicts.{up,down}.sql: new
- src/db/postgres.js: CITATION_VERDICTS_DDL + call in ensureHookSchema

Parser:

- src/utils/certificateParser.js: promoted from test/sdk/_lib/ (test harness still imports from _lib to avoid breaking PR #119)

Population:

- src/utils/hookDBBridge.js: fire-and-forget verdict-parse hook in persistReport, guarded by reportType==='qa' && reportKey=== 'citation-verification-certificate'. Batch INSERT via VALUES list (single round-trip for up to ~500 footnotes). Idempotent upsert on (report_id, footnote_id).

Endpoint:

- src/server/dbFrontendRouter.js: /api/session/:sessionKey/audit-report gains citation_verification_certificate (full markdown + summary) and citation_verdicts (per-footnote array) fields. report_version bumped 1.0 → 1.1. Access logged to access_log when certificate returned (Wave 3 Art. 12 audit trail).

Skill:

- .claude/skills/client-audit-export/SKILL.md + scripts/range-query.py: new CSV exports citation_verdicts__csv.gz and citation_verification_certificate__csv.gz land in the per-client regulator-handoff WORM bundle (both session-scoped and date-range).

## Risk

2/10, purely additive. Zero hot-path schema changes. All paths guarded by try/catch + graceful-fallback. Parser is pure logic (no I/O). Mirrors a pattern that has shipped clean four times.

## Tests

Syntax-checked all 4 JS files + Python script. Parser logic already covered by PR #119 fixtures.

… log (v6.8.7) (#124)

Telemetry tier built on T1 (PR #122). Closes the ops/SLO gap: G5 citation verifier now emits Prometheus series + structured log per SubagentStop, with 3 alert rules calibrated against the production-fidelity A/B baseline (Exa 96.8% / Anthropic 96.1%, 2026-05-12).

## Components

Metrics (src/utils/sdkMetrics.js), 4 series:

- citation_verifier_confirmation_rate_pct (Gauge, mode label)
- citation_verifier_confirmed_total (Counter, mode label)
- citation_verifier_unconfirmed_total (Counter, mode label)
- citation_verifier_errors_total (Counter, reason label, 5 bounded values)

Cardinality budget: 13 series total. Bounded enums prevent explosion.

Recording site (src/utils/hookDBBridge.js persistState):

- Inserted after JSON.parse of state file, before agent_states INSERT.
- Source: state_data.verification_results (in-hand; no race with T1's fire-and-forget verdict INSERT).
- Bulk .inc(count) calls, no loop.
- All wrapped in try/catch with non-fatal log.

Structured log (src/utils/sdkLogger.js consumer):

- logInfo('citation_verifier_completed', {...}) per agent stop.
- Includes counts, mode, duration, turns_used, tool-call counts.
- Cloud Logging query: jsonPayload.event="citation_verifier_completed"

Alerts (prometheus/alerts.yml), 3 rules:

- CitationVerifierConfirmationRateLow: rate<90% sustained 1h (WARN)
- CitationVerifierConfirmationRateCritical: rate<80% sustained 30m (CRIT)
- CitationVerifierErrorSpike: >50 errors in 15m (WARN)

Thresholds calibrated against measured baseline + 7pp WARN margin.

Documentation (docs/metrics-catalog.md):

- New §9.2 with full inventory, cardinality budget, baseline values, alert thresholds, cross-references to T1 + A/B report.

## Bundled fix

access_log SELECT in dbFrontendRouter.js audit-report endpoint queried non-existent columns (actor/action/accessed_at). The .catch silently returned [] so access_log has been empty in regulator bundles since Wave 3 shipped. Corrected to ACCESS_LOG_DDL columns. Unblocks T1's INSERTs from showing up in regulator audit reports.

## Risk: 1/10

Pure additive metrics + alerts + doc. Single guarded conditional in persistState. No schema migration. No flag flip. No hot-path code. Pattern mirrors Exa A3 telemetry (PR #114, shipped clean).

## Tests

- node -c on 3 JS files: pass
- YAML parse of new alert rules block: pass (pre-existing strict-YAML issue with multi-line histogram_quantile() in ClaudeLatencyRegression is upstream, not introduced by this PR; promtool handles it)

…rts (#125)

Updates four operator skills to know about the new v6.8.6 T1 (citation_verdicts table + audit endpoint + WORM export) and v6.8.7 T2 (4 Prometheus metrics + 3 alerts + structured log) surface area shipped in PRs #122 + #124.

## Changes

post-deploy-verify (Tier 2 V6 check):

- verify-tier2.sh: new V6 check probing /metrics for the 4 citation_verifier_* series. PASSED on 4/4 HELP|TYPE lines present (value-agnostic; populates after first G5 run). WARNING on partial/zero.
- scripts/queries/v6-citation-verdicts-presence.sql: 3-query DB-side check (schema shape, verdict distribution, per-session confirmation rate).
- SKILL.md description updated V1-V4 → V1-V6.

infrastructure-health (Tier 3 metric watch):

- SKILL.md execution block extended with citation_verifier_* PromQL guidance (90% WARN / 80% CRIT thresholds matching alerts) + companion log filter + T1↔T2 reconciliation reference.
- references/citation-verifier-telemetry.md: new operator runbook covering quick health check, PromQL dashboards, Cloud Logging schema, T1↔T2 reconciliation SQL, alert response runbook by severity.

session-diagnostics (Section 11 forensics):

- SKILL.md report-section list extended with Section 11 (G5 forensics) describing certificate + verdict-table + state_data cross-check.
- references/citation-verifier-forensics.md: 5 forensic SQL queries (detection, certificate metadata, verdict distribution, state-vs-table reconciliation, per-batch failure pattern, citation→source provenance) with example output formatting for the diagnostic report.

## Validation

All new docs pass schema-doc-validator:

- 34 tables / 40 metrics / 16 alerts in truth (post-T1+T2 expected counts)
- All SQL queries use real columns from current DDL
- All metric/alert names match registrations in sdkMetrics.js / alerts.yml

One pre-existing assumption surfaced during validation: source_chunk_embeddings does NOT have a 'metadata' JSONB column (despite a Wave 2 query in hookDBBridge.js that .catch's the error). Source URL lookup removed from the forensic query; existing Wave 2 code's assumption flagged for separate investigation (not introduced by this PR).

## Risk: 1/10

Pure docs + verification scripts. No runtime behavior change. New V6 check is read-only curl + jq + grep. New reference docs are reader-only.
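
The V6 presence check is described as curl + jq + grep. The same logic can be sketched in JavaScript for illustration. This is not `verify-tier2.sh`: `checkSeriesPresence` is a hypothetical name, and it assumes a series counts as present when it has either a `# HELP` or a `# TYPE` line in the Prometheus text output. The series names are the ones listed for T2. A later commit message on this page reports a rename to the `claude_citation_verifier_*` prefix, so the list would need updating after that patch.

```javascript
// Series names as listed in the T2 commit message (pre-rename).
const EXPECTED_SERIES = [
  'citation_verifier_confirmation_rate_pct',
  'citation_verifier_confirmed_total',
  'citation_verifier_unconfirmed_total',
  'citation_verifier_errors_total',
];

// Hypothetical re-expression of the V6 check: value-agnostic, it only
// looks at metadata lines, so it passes before the first G5 run.
function checkSeriesPresence(metricsText, expected = EXPECTED_SERIES) {
  const declared = new Set();
  for (const line of metricsText.split('\n')) {
    // Metadata lines look like "# HELP <name> ..." or "# TYPE <name> ...".
    const match = /^# (?:HELP|TYPE) (\S+)/.exec(line);
    if (match) declared.add(match[1]);
  }
  const found = expected.filter((name) => declared.has(name));
  return {
    found: found.length,
    expected: expected.length,
    status: found.length === expected.length ? 'PASSED' : 'WARNING',
  };
}
```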

… backup verification (#126)

Final cleanup of doc surface area touched by T1 (PR #122) + T2 (PR #124) + skill alignment (PR #125). Four residual gaps closed:

1. docs/api-reference.md §3 (/api/session/:sessionKey/audit-report):
   - Response shape now includes citation_verification_certificate + citation_verdicts + verdict_summary fields shipped in T1
   - access_log column shape corrected from non-existent (user_id/endpoint/method/accessed_at) to real ACCESS_LOG_DDL columns (requester/resource_type/resource_key/purpose_code/ip_address/created_at); matches T2's SELECT fix
   - citations column shape corrected (match_method → matched_via, confidence_score → confidence)
   - report_version bumped 1.0 → 1.1 with version history block
2. Service README.md line 614:
   - "5 audit tables" → "8 audit surfaces" with full list including citation_verification_certificate + citation_verdicts
   - Cross-link to docs/api-reference.md §3
3. Root CHANGELOG.md [Unreleased]:
   - New T1+T2 entry mirroring service-level changelog detail
   - Cross-references PRs #122, #124, #125 + baseline PRs #118, #119
   - Notes pre-existing source_chunk_embeddings.metadata bug flagged for separate investigation
4. client-backup-restore SKILL.md table inventory + restore verification:
   - citation_verdicts added to v7.0.0+ verification list
   - Documents that backup is automatic via pg_dump (FK CASCADE handles dependent rows); restore should confirm row counts post-restore
   - Notes "zero rows acceptable" for sessions pre-v6.8.6 or that never ran citation-websearch-verifier

## Validation

schema-doc-validator clean on all 4 files:

- Truth: 34 tables / 40 metrics / 16 alerts / 67 endpoints (unchanged)
- Zero new violations introduced
- access_log columns now match DDL in api-reference.md
- citation_verdicts column shape matches DDL in client-backup-restore

## Risk: 0/10

Pure documentation alignment. No code, no schema, no flags. All surfaces were either describing nonexistent state (access_log) or missing information added by T1/T2 (verdict fields, table inventory).

…128)

Updates both root + service CHANGELOGs to document the v6.8.7.1 patch:

- Metric prefix rename: citation_verifier_* → claude_citation_verifier_*
- otel_trace_id correlation in citation_verifier_completed log

Root CHANGELOG:

- Top-level header now references all 4 PRs (#122, #124, #125, #127)
- v6.8.7 T2 bullet shows renamed metric names with backref to v6.8.7.1
- New v6.8.7.1 sub-section under T2 with full rationale + validation summary
- Operator skills section notes V6 grep pattern updated in PR #127
- Risk score updated to 3/10 combined (2/10 T1 + 1/10 T2 + 1/10 v6.8.7.1)

Service CHANGELOG:

- Top-level header includes v6.8.7.1 and PR #127
- v6.8.7 T2 metric bullet renamed with backref note
- Structured log bullet notes otel_trace_id added in v6.8.7.1
- New v6.8.7.1 subsection: rationale, file list, validation gates, smoke test results, risk score

Both docs in pending-updates/ described work that fully shipped:

- exa-a3-augmentor-refactor-spec.md → v7.5.0 via PR #108 (2026-05-09)
- exa-a3-improvements-plan.md → v7.3.0 → v7.6.2 via PRs #108-#115

Production all-treatment in flags.env since 2026-05-11. EXA_WEB_TOOLS=true graduated 2026-05-12 (Issue #41 closed). G5 observability (T1+T2+v6.8.7.1) shipped same day via PRs #122/#124/#125/#126/#127/#128.

Both docs retained in pending-updates/ for architectural + implementation reference; original "Draft" / "Active" status now annotated with SHIPPED marker and PR cross-references. No functional change; zero risk.

… + 14 commits)

Brings the branch up to date with main, including:

- v6.8.6 T1: citation_verdicts table + audit endpoint + WORM export (#122)
- v6.8.7 T2: Prometheus metrics + alerts + structured log (#124)
- v6.8.7.1: telemetry alignment fix (#127)
- PRs #121, #125, #126, #128–#131 (skill updates, exa-a3 docs, Sonnet-deep A/B experiment + KEEP_SONNET verdict)

Conflict resolution:

- prometheus/alerts.yml: kept BOTH alert blocks (XLSX SLOs + G5 citation-verifier alerts are independent concerns)
- .claude/skills/post-deploy-verify/SKILL.md: branch's V6 row renumbered → V7 so main's V6 (G5 citation-verifier) keeps its slot; both verification rows coexist

Auto-merged additively (no manual edit needed):

- src/db/postgres.js: CITATION_VERDICTS_DDL (main, line 459) + XLSX_RENDERS_* + Wave 3 schemas (branch, lines 1113+) coexist; ensureCitationVerdictsSchema wired into ensureHookSchema (line 1173)
- src/utils/sdkMetrics.js: G5 metrics + xlsx metrics coexist
- src/utils/hookDBBridge.js: citation_verdicts persistence path added alongside xlsx persistence; both schemas write independently

Migrations directory now has two 015s side-by-side:

- 015_citation-verdicts.{up,down}.sql (main)
- 015_human-interventions-metadata.{up,down}.sql (branch)

A follow-up commit will renumber the branch's 015→016, 016→017, 017→018 to slot AFTER main's 015 cleanly. Migrations are documentation-only (runtime applicator is ensure*Schema()), so this is an audit-trail fix, not a production breakage.

Verification:

- All conflict markers resolved (verified via grep)
- Auto-merged JS files syntax-clean (node --check)
- Unit suite: 167 / 0 / 2, identical to pre-merge (DB-gated; confirms no semantic breakage in shared modules)

Branch is now ready for migration renumber + PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves the migration-numbering collision introduced by main's 015_citation-verdicts (PR #122) landing concurrently with this branch's 015_human-interventions-metadata. After the merge commit ac5a6cd brought both 015s side-by-side, this commit slots the branch's three migrations cleanly AFTER main's 015 to restore a single linear sequence:

- 015_citation-verdicts (main, kept)
- 016_human-interventions-metadata (was 015 on branch)
- 017_xlsx-renders (was 016 on branch)
- 018_xlsx-renders-generated-columns (was 017 on branch)

Renames are pure `git mv`: file contents unchanged except the self-referential `-- 0NN_…sql` header comment on line 1 of each file (annotated with "renumbered from 0(N-1) post-merge with origin/main 015_citation-verdicts" for audit trail).

Updated references:

- src/db/postgres.js: 2 comments referencing migrations/016 and migrations/017 updated to 017 and 018 respectively
- docs/pending-updates/excel-code-execution.md: 1 reference
- docs/pending-updates/excel-code-execution-phase2-plan.md: 4 refs
- docs/pending-updates/excel-code-execution-isolation-test-plan.md: 9 refs
- docs/pending-updates/excel-code-execution-preflight.md: 4 refs

Verification:

- grep -rn for old migration numbers → zero non-annotated matches
- node --check src/db/postgres.js → OK
- node test/sdk/xlsx-renderer-integration.test.js → 167 / 0 / 2 (pre-renumber baseline preserved; test code path didn't touch migration filenames anyway, since the runtime applicator is ensure*Schema())

This is the audit-trail fix called out in the prior merge commit; branch is now PR-ready against main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
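
A collision like the duplicate 015 prefix can be detected mechanically before merge. The following is a minimal sketch, not a tool from this repository: `findPrefixCollisions` is a hypothetical name, and it assumes migration files follow the `NNN_slug.{up,down}.sql` naming seen in the commit messages above.

```javascript
// Hypothetical check: reports numeric prefixes shared by more than one slug.
function findPrefixCollisions(filenames) {
  const byPrefix = new Map();
  for (const name of filenames) {
    // Assumed naming: "<digits>_<slug>.up.sql" or "<digits>_<slug>.down.sql".
    const match = /^(\d+)_(.+)\.(?:up|down)\.sql$/.exec(name);
    if (!match) continue; // ignore files that are not migrations
    const [, prefix, slug] = match;
    if (!byPrefix.has(prefix)) byPrefix.set(prefix, new Set());
    byPrefix.get(prefix).add(slug);
  }
  // The up/down pair of one migration shares a slug, so only
  // prefixes with two or more distinct slugs count as collisions.
  return [...byPrefix]
    .filter(([, slugs]) => slugs.size > 1)
    .map(([prefix, slugs]) => ({ prefix, slugs: [...slugs].sort() }));
}
```

Run against the post-merge directory listing, this would flag prefix `015` with both slugs. After the renumber it would return an empty array.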

Number531 mentioned this pull request May 12, 2026

feat(g5-observability): T2 — Prometheus metrics + alerts + structured log (v6.8.7) #123

Closed

6 tasks

Number531 merged commit ba83df0 into main May 12, 2026

Number531 deleted the feat/citation-verdicts-t1 branch May 12, 2026 14:25

Number531 mentioned this pull request May 12, 2026

feat(g5-observability): T2 — Prometheus metrics + alerts + structured log (v6.8.7) #124

Merged

6 tasks

Number531 mentioned this pull request May 12, 2026

docs(skills): align operator skills with T1+T2 telemetry, schema, alerts #125

Merged

3 tasks

Number531 mentioned this pull request May 12, 2026

docs: T1+T2 residual gaps — api-reference + root CHANGELOG + README + backup verification #126

Merged

Number531 mentioned this pull request May 12, 2026

docs(changelog): reflect v6.8.7.1 telemetry alignment fix (PR #127) #128

Merged

This was referenced May 12, 2026

WebFetch → Hybrid fetch_document Conversion (v1.2.0) #38

Closed

Feature flag: EXA_WEB_TOOLS — track rollout and graduation #41

Closed

Number531 mentioned this pull request May 12, 2026

experiment: Sonnet-deep vs Haiku-deep A/B — KEEP_SONNET verdict + production gap findings #130

Merged

Number531 mentioned this pull request May 15, 2026

feat: xlsx renderer pipeline + post-merge alignment (XLSX_RENDERER=false in prod) #132

Merged

Number531 merged 1 commit into main from feat/citation-verdicts-t1
