Number531 · May 12, 2026
diff --git a/.claude/skills/infrastructure-health/references/citation-verifier-telemetry.md b/.claude/skills/infrastructure-health/references/citation-verifier-telemetry.md
@@ -91,6 +91,104 @@ LIMIT 10;
 
 Expected: `divergence` is 0 or within ±2 for every row. Larger = investigate.
 
+## Detecting cert confabulation (added 2026-05-12 from PR #130 findings)
+
+PR #130 surfaced that the verifier model can write a certificate claiming verification methods (e.g., `fetch_document`, `exa_web_search`) that were never actually invoked at the tool level. Haiku in deep mode did this completely (0 tool calls, 17 method-label confabulations); Sonnet did so partially (12 tool calls, but 42 confirmations relied on "structural" / "reporter knowledge" patterns).
+
+The `subagent_tool_usage` hook event already counts real tool invocations per category. A cert that claims more tool-based confirmations than telemetry recorded is **confabulating** — a regulator-facing data-integrity risk.
+
+### Cross-check query — telemetry vs cert claims (per session)
+
+<!-- noqa:07 — schema-doc-validator parses string literals like 'fetch_document' inside length('X') as column refs; false positive -->
+<!-- noqa:05 -->
+```sql
+-- Compares claimed cert methods against actual telemetry counts.
+-- Run for any session that ran citation-websearch-verifier in deep mode.
+--
+-- A row where claimed_X > actual_X indicates the verifier wrote method
+-- attributions in the cert that the tool-call telemetry doesn't support.
+WITH telemetry AS (
+  SELECT
+    s.session_key,
+    s.id AS session_id,
+    -- Pull tool_counts from the subagent_tool_usage event in hook_audit_log
+    -- (logged at SubagentStop with cumulative per-subagent counts)
+    (h.event_data->'tool_counts'->>'exaWebSearches')::int  AS actual_exa_searches,
+    (h.event_data->'tool_counts'->>'fetchDocumentCalls')::int AS actual_fetch_docs,
+    (h.event_data->'tool_counts'->>'mcpCalls')::int       AS actual_mcp_calls,
+    (h.event_data->'tool_counts'->>'totalToolCalls')::int AS total_tool_calls
+  FROM sessions s
+  JOIN hook_audit_log h ON h.session_id = s.id
+  WHERE h.event_type = 'SubagentStop'
+    AND h.agent_type = 'citation-websearch-verifier'
+    AND h.event_data ? 'tool_counts'
+),
+cert_claims AS (
+  -- Count method-column appearances in the cert text. Crude but effective:
+  -- substring-counts of method-name tokens in reports.content.
+  SELECT
+    r.session_id,
+    -- Each method-name appearance roughly = one claimed verification
+    (length(r.content) - length(replace(r.content, 'fetch_document', '')))
+      / length('fetch_document') AS claimed_fetch_docs,
+    (length(r.content) - length(replace(r.content, 'exa_web_search', '')))
+      / length('exa_web_search') AS claimed_exa_searches,
+    (length(r.content) - length(replace(r.content, 'lookup_citation', '')))
+      / length('lookup_citation') AS claimed_lookup_citation,
+    (length(r.content) - length(replace(r.content, 'search_sec_filings', '')))
+      / length('search_sec_filings') AS claimed_search_sec
+  FROM reports r
+  WHERE r.report_type = 'qa'
+    AND r.report_key = 'citation-verification-certificate'
+)
+SELECT
+  t.session_key,
+  t.actual_fetch_docs,    c.claimed_fetch_docs,
+  t.actual_exa_searches,  c.claimed_exa_searches,
+  t.actual_mcp_calls,     c.claimed_lookup_citation + c.claimed_search_sec AS claimed_mcp_total,
+  -- Confabulation flag: claimed > actual + 1 (tolerance); CASE reports only the first matching method
+  CASE
+    WHEN c.claimed_fetch_docs > t.actual_fetch_docs + 1 THEN 'fetch_document'
+    WHEN c.claimed_exa_searches > t.actual_exa_searches + 1 THEN 'exa_web_search'
+    WHEN c.claimed_lookup_citation + c.claimed_search_sec > t.actual_mcp_calls + 1 THEN 'mcp'
+    ELSE NULL
+  END AS confabulation_method
+FROM telemetry t
+JOIN cert_claims c ON c.session_id = t.session_id
+WHERE t.total_tool_calls IS NOT NULL
+ORDER BY t.session_key DESC
+LIMIT 20;
+```
+
+**Interpretation:**
+- `confabulation_method IS NULL` → cert claims match telemetry (good)
+- `confabulation_method = 'fetch_document'` etc. → cert claims more method-X invocations than telemetry recorded. **Investigate.** The +1 tolerance handles minor counting noise (method name appearing in legend/header).
+
+### Tier-3 health check addition
+
+Add to the `infrastructure-health --tier 3` sweep when `CITATION_DEEP_VERIFICATION=true` is observed in `/health.feature_flags`:
+
+```bash
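+# Cross-check the last 24h of deep-mode sessions. For the outer created_at filter to work, <query above> must also select h.created_at (via the telemetry CTE) and drop its LIMIT 20.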
+psql -d super_legal -c "$(cat <<'SQL'
+SELECT session_key, confabulation_method, actual_fetch_docs, claimed_fetch_docs
+FROM (<query above>) AS audit
+WHERE confabulation_method IS NOT NULL
+  AND created_at > NOW() - INTERVAL '24 hours';
+SQL
+)"
+```
+
+If the query returns rows → WARNING (deep mode is confabulating; escalate). If it returns no rows → PASSED.
+
+### Proposed Prometheus alert (future work)
+
+Not yet wired — `CitationVerifierMethodConfabulation` would fire when cert claims diverge from `subagent_tool_usage` telemetry. Requires either:
+- DB-query-backed alert (Prometheus doesn't natively query Postgres; would need an exporter), OR
+- Hook-side computation: at SubagentStop, parse the cert, compare to telemetry, emit `citation_verifier_confabulation_total{method}` counter
+
+Tracked as P1 follow-up from PR #130; ~10-min implementation in `hookDBBridge.persistState()`.
+
 ## Alert response runbook
 
 ### `CitationVerifierConfirmationRateLow` (WARNING, <90% 1h)

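The hook-side option listed under "Proposed Prometheus alert" in the hunk above could look roughly like the sketch below. It mirrors the SQL token count in plain JavaScript and applies the same +1 tolerance. The function names (`countToken`, `detectConfabulation`) and the returned shape are hypothetical and are not the actual `hookDBBridge` API; only the `tool_counts` keys and the method tokens are taken from the query above.

```javascript
// Hypothetical helpers. Only the tool_counts keys and method tokens come from the doc.
function countToken(text, token) {
  // Non-overlapping occurrence count, equivalent to the SQL
  // (length(x) - length(replace(x, token, ''))) / length(token).
  return text.split(token).length - 1;
}

function detectConfabulation(certText, toolCounts, tolerance = 1) {
  // No telemetry recorded: the cert cannot be validated either way.
  if (!toolCounts || toolCounts.totalToolCalls == null) {
    return { status: 'inconclusive', methods: [] };
  }
  const claimed = {
    fetch_document: countToken(certText, 'fetch_document'),
    exa_web_search: countToken(certText, 'exa_web_search'),
    mcp:
      countToken(certText, 'lookup_citation') +
      countToken(certText, 'search_sec_filings'),
  };
  const actual = {
    fetch_document: toolCounts.fetchDocumentCalls ?? 0,
    exa_web_search: toolCounts.exaWebSearches ?? 0,
    mcp: toolCounts.mcpCalls ?? 0,
  };
  // Flag every method whose claimed count exceeds telemetry by more than the tolerance.
  const methods = Object.keys(claimed).filter(
    (m) => claimed[m] > actual[m] + tolerance
  );
  return {
    status: methods.length > 0 ? 'confabulation_suspected' : 'ok',
    methods,
    claimed,
    actual,
  };
}
```

A cert containing 17 `fetch_document` labels against zero recorded tool calls would be flagged for `fetch_document`. Unlike the SQL `CASE`, which reports only the first matching method, this sketch returns every method over tolerance, which may map more directly onto a per-method counter label.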
diff --git a/.claude/skills/session-diagnostics/references/citation-verifier-forensics.md b/.claude/skills/session-diagnostics/references/citation-verifier-forensics.md
@@ -110,6 +110,101 @@ LIMIT 20;
 
 Useful when a regulator question is "show me the source for footnote ^N" — this is the queryable join.
 
+## (f) Cert-vs-telemetry method confabulation check (added 2026-05-12 from PR #130)
+
+PR #130 surfaced that the verifier model can write a certificate claiming tool-based verification methods (e.g., `fetch_document`, `exa_web_search`) that were never actually invoked. This check compares the cert's method-column claims against the authoritative `subagent_tool_usage` telemetry from the SubagentStop hook.
+
+A row where `claimed > actual` indicates **method confabulation** — the cert attributes verifications to tools that didn't fire. This is a regulator-facing data-integrity risk (EU AI Act Art. 13 transparency: the audit trail must reflect what actually happened).
+
+<!-- noqa:07 — schema-doc-validator parses string literals inside length('X')/replace() as column refs; false positive -->
+<!-- noqa:05 -->
+<!-- noqa:04 -->
+```sql
+-- Cert-claims vs telemetry-counts mismatch detector.
+-- Run for any session that ran citation-websearch-verifier (any mode).
+-- Most relevant in deep mode where tool-invocation is expected for most footnotes.
+WITH telemetry AS (
+  SELECT
+    h.session_id,
+    (h.event_data->'tool_counts'->>'exaWebSearches')::int    AS actual_exa,
+    (h.event_data->'tool_counts'->>'fetchDocumentCalls')::int AS actual_fetch,
+    (h.event_data->'tool_counts'->>'mcpCalls')::int           AS actual_mcp,
+    (h.event_data->'tool_counts'->>'totalToolCalls')::int     AS total_calls,
+    h.created_at
+  FROM hook_audit_log h
+  WHERE h.session_id = $1
+    AND h.event_type = 'SubagentStop'
+    AND h.agent_type = 'citation-websearch-verifier'
+    AND h.event_data ? 'tool_counts'
+  ORDER BY h.created_at DESC
+  LIMIT 1
+),
+cert_claims AS (
+  SELECT
+    r.session_id,
+    (length(r.content) - length(replace(r.content, 'fetch_document', '')))
+      / length('fetch_document') AS claimed_fetch,
+    (length(r.content) - length(replace(r.content, 'exa_web_search', '')))
+      / length('exa_web_search') AS claimed_exa,
+    (length(r.content) - length(replace(r.content, 'lookup_citation', '')))
+      / length('lookup_citation')
+    + (length(r.content) - length(replace(r.content, 'search_sec_filings', '')))
+      / length('search_sec_filings') AS claimed_mcp,
+    r.word_count AS cert_word_count
+  FROM reports r
+  WHERE r.session_id = $1
+    AND r.report_type = 'qa'
+    AND r.report_key = 'citation-verification-certificate'
+)
+SELECT
+  c.claimed_fetch, t.actual_fetch, (c.claimed_fetch - t.actual_fetch) AS fetch_gap,
+  c.claimed_exa,   t.actual_exa,   (c.claimed_exa - t.actual_exa)     AS exa_gap,
+  c.claimed_mcp,   t.actual_mcp,   (c.claimed_mcp - t.actual_mcp)     AS mcp_gap,
+  t.total_calls,
+  c.cert_word_count,
+  CASE
+    WHEN t.total_calls IS NULL THEN 'INCONCLUSIVE'  -- no telemetry row (LEFT JOIN miss); gaps are NULL
+    WHEN (c.claimed_fetch - t.actual_fetch) > 2
+      OR (c.claimed_exa - t.actual_exa) > 2
+      OR (c.claimed_mcp - t.actual_mcp) > 2 THEN 'CONFABULATION_SUSPECTED'
+    ELSE 'OK'
+  END AS verdict
+FROM cert_claims c   -- noqa: 04 — CTE alias, not a real table
+LEFT JOIN telemetry t ON t.session_id = c.session_id;   -- noqa: 04
+```
+
+**Interpretation:**
+
+| Result | Meaning |
+|---|---|
+| `verdict = 'OK'`, all gaps ≤ 2 | Cert claims match telemetry within counting noise (method name appearing in legend/header sections). No confabulation. |
+| `verdict = 'CONFABULATION_SUSPECTED'`, fetch_gap > 2 | Cert attributes more `fetch_document` verifications than actually fired. **Investigate.** The model likely confabulated to fill the cert's method-column format. |
+| `total_calls IS NULL` | `subagent_tool_usage` hook didn't fire (pre-T2 image, or session pre-dates SubagentStop hook telemetry capture). Cannot validate; mark inconclusive. |
+| `claimed_fetch = 0, actual_fetch > 0` | The cert claims no `fetch_document` usage, but the tool was called. May indicate tool-failure handling: calls were made but the cert did not attribute them (e.g., all returned errors). Worth investigating separately. |
+
+### Forensic output rendering (added to Section 11 of diagnostic report)
+
+When generating session diagnostics for any deep-mode session OR any session where `confabulation_check.verdict = 'CONFABULATION_SUSPECTED'`, include this block:
+
+```
+### 11.6 Cert-vs-Telemetry Confabulation Audit
+
+Verdict: CONFABULATION_SUSPECTED (or OK)
+
+Method     | Cert claims | Actual telemetry | Gap
+---------- | ----------- | ---------------- | ---
+fetch_doc  |          17 |                0 |  17  ⚠
+exa_search |           4 |                3 |   1
+mcp        |           0 |                4 |  -4
+
+Interpretation: cert attributes 17 fetch_document verifications, but subagent_tool_usage hook
+recorded zero such invocations. The verifier model wrote method labels matching the expected
+cert format without actually invoking the tools. This is a regulator-facing data-integrity risk;
+escalate to dev team for prompt-hardening review.
+```
+
+This is the operator-facing manifestation of the P1 finding from PR #130.
+
 ## Output format
 
 In the session-diagnostics report (Section 11), produce:

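The 11.6 block in the hunk above could be rendered from the query result along these lines. This is a minimal sketch, not the report generator's actual code: `renderConfabulationAudit` and its input shape are hypothetical, and the `INCONCLUSIVE` label is an assumption based on the "mark inconclusive" guidance for missing telemetry. The column layout and the `> 2` gap threshold follow the doc.

```javascript
// Hypothetical renderer for the 11.6 block. Layout and threshold follow the doc above.
function renderConfabulationAudit(rows, threshold = 2) {
  // rows: [{ method, claimed, actual }]; actual may be null when telemetry is missing.
  const withGap = rows.map((r) => ({
    ...r,
    gap: r.actual == null ? null : r.claimed - r.actual,
  }));
  const missing = withGap.some((r) => r.gap === null);
  const suspected = withGap.some((r) => r.gap !== null && r.gap > threshold);
  // Assumption: missing telemetry takes precedence over any other verdict.
  const verdict = missing
    ? 'INCONCLUSIVE'
    : suspected
      ? 'CONFABULATION_SUSPECTED'
      : 'OK';
  // Right-align numeric cells; show n/a when a value is unavailable.
  const fmt = (value, width) => String(value ?? 'n/a').padStart(width);
  const lines = [
    '### 11.6 Cert-vs-Telemetry Confabulation Audit',
    '',
    `Verdict: ${verdict}`,
    '',
    'Method     | Cert claims | Actual telemetry | Gap',
    '---------- | ----------- | ---------------- | ---',
    ...withGap.map((r) => {
      const flag = r.gap !== null && r.gap > threshold ? '  ⚠' : '';
      return `${r.method.padEnd(10)} | ${fmt(r.claimed, 11)} | ${fmt(r.actual, 16)} | ${fmt(r.gap, 3)}${flag}`;
    }),
  ];
  return { verdict, text: lines.join('\n') };
}
```

Passing the three example rows from the template (`fetch_doc` 17/0, `exa_search` 4/3, `mcp` 0/4) should yield a `CONFABULATION_SUSPECTED` verdict with only the `fetch_doc` row flagged, since a negative gap such as the `mcp` row never exceeds the threshold.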
diff --git a/super-legal-mcp-refactored/docs/feature-flags.md b/super-legal-mcp-refactored/docs/feature-flags.md
@@ -575,11 +575,36 @@ Flags deeper in the tree have no effect when their parent is OFF. For example, `
 - Duration: 1-5 min
 - Agent only confirms sources exist (HTTP 200/401/403 = confirmed) without evaluating content
 
-**Cost differential: 338x** between modes. Source Existence mode is the recommended starting point for initial G5 rollout.
+**Cost differential: 338x** between modes (per agent-file estimate). **Measured 4.4x** on the 65-footnote test (PR [#130](https://github.com/Number531/Legal-API/pull/130)); the actual ratio was dominated by cache-read cost (3x flat between models) rather than the work multiplier. Source Existence mode is the recommended starting point for initial G5 rollout.
+
+#### Production readiness status (2026-05-12)
+
+| Mode | Validation | Status |
+|---|---|---|
+| **Existence** (`false`, default) | PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119) — production-fidelity A/B on unlabeled 467-footnote Project Nexus fixture | ✅ **Production-validated** at 96.8% (Exa arm) / 96.1% (Anthropic arm), both PASS gate |
+| **Deep** (`true`) | PR [#130](https://github.com/Number531/Legal-API/pull/130) — Sonnet-vs-Haiku A/B on **labeled** 65-footnote "A/B SUBSET" fixture | ⚠️ **NOT production-validated.** Sonnet-deep mechanically functions (gate checks pass, 96.7% confirmation rate) but tool-invocation rigor was lower than expected (12 real tool calls for 65 footnotes; 42 confirmations used "structural" / "reporter knowledge" patterns). Fixture's `# HAIKU/SONNET DEEP-MODE A/B SUBSET` header may have signaled "test environment" and biased model behavior toward shortcutting. Haiku-deep confabulated entirely (zero real verification tool calls; cert claimed `fetch_document` / `exa_web_search` methods 17 times — see PR #130 for forensic detail). |
+
+#### Pre-flip checklist (before setting `CITATION_DEEP_VERIFICATION=true` in production)
+
+Required validation steps — do NOT enable deep mode without completing these:
+
+1. **Re-run the PR #130 harness against the unlabeled production fixture** (Project Nexus 393-footnote `reports/2026-03-07-1772900028/consolidated-footnotes.md`, NOT the labeled "A/B SUBSET" sample). Estimated cost: ~$15 (Sonnet-deep × 393 footnotes prorated). Time: ~30 min.
+   - Use `test/sdk/citation-verifier-model-ab-driver.mjs` with `--arms sonnet`
+   - Override the fixture path or use a clean unlabeled copy
+2. **Verify tool-invocation rate matches prompt expectation.** The verifier prompt instructs "10-15 `fetch_document` calls per turn" — confirm `subagent_tool_usage.tool_counts` reflects real invocation, not pattern-knowledge shortcutting.
+3. **Check cert↔telemetry method alignment.** Cross-reference cert method-column claims against `subagent_tool_usage` event counts. Discrepancies = confabulation risk. See `.claude/skills/infrastructure-health/references/citation-verifier-telemetry.md` § "Detecting cert confabulation" for the query.
+4. **Recalibrate alert thresholds.** Existing `CitationVerifierConfirmationRateLow` / `Critical` alerts in `prometheus/alerts.yml` filter by `{mode="source_existence"}`. Deep mode runs would be silently un-alerted. Either:
+   - Clone the alert rules with `{mode="full_content"}` filter at thresholds calibrated against the deep-mode baseline measured in step 1, OR
+   - Generalize the existing rules to fire on any mode
+5. **Cost monitoring.** Deep mode at ~$6.76/memo × N memos/month is materially different from existence mode at ~$0.02/memo. Confirm that cost dashboards track per-mode spend before enabling.
+
+**Rollback path.** If deep mode is enabled and the rigor concern materializes (cert confabulation detected, or an unexpected cost spike), setting `CITATION_DEEP_VERIFICATION=false` in `flags.env` reverts to existence mode with no schema or code change. The verifier subagent re-resolves model and strategy at module load, so the revert takes effect on the next session.
 
 **Files:**
 - `src/config/legalSubagents/agents/citation-websearch-verifier.js` — lines 19-334 (model selection, strategy selection, duration estimates)
 - `test/sdk/citation-websearch-verifier.test.js` — dual-mode tests
+- `test/sdk/citation-verifier-model-ab-driver.mjs` — deep-mode A/B harness (PR #130)
+- `docs/runbooks/citation-verifier-model-ab-2026-05-12-CORRECTED.md` — PR #130 final report with full forensic detail
 
 ---
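
The arithmetic behind checklist items 2 and 5 can be sketched as follows. The per-memo figures are the estimates quoted above (~$0.02 existence, ~$6.76 deep), whose ratio is the 338x agent-file estimate; the function names are hypothetical and the memo volume is whatever the operator expects, not a number from the PR.

```javascript
// Per-memo figures are the estimates quoted in feature-flags.md; names are hypothetical.
const COST_PER_MEMO_USD = { existence: 0.02, deep: 6.76 };

// Projects monthly spend for both modes at a given memo volume.
function projectMonthlyCost(memosPerMonth) {
  const existence = memosPerMonth * COST_PER_MEMO_USD.existence;
  const deep = memosPerMonth * COST_PER_MEMO_USD.deep;
  return {
    existence,
    deep,
    delta: deep - existence,
    // Ratio of the per-memo estimates, independent of volume.
    ratio: COST_PER_MEMO_USD.deep / COST_PER_MEMO_USD.existence,
  };
}

// Average real tool invocations per footnote, from subagent_tool_usage telemetry.
function toolCallsPerFootnote(totalToolCalls, footnoteCount) {
  if (footnoteCount <= 0) {
    throw new RangeError('footnoteCount must be positive');
  }
  return totalToolCalls / footnoteCount;
}
```

With the Sonnet-deep numbers from PR #130, `toolCallsPerFootnote(12, 65)` comes to roughly 0.18, which is the kind of figure step 2 asks to compare against the prompt's expected invocation rate. What rate counts as acceptable is left to the re-run in step 1.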