Number531 · May 12, 2026
diff --git a/.claude/skills/client-audit-export/SKILL.md b/.claude/skills/client-audit-export/SKILL.md
@@ -95,7 +95,7 @@ Compatible with `shasum -a 256 -c manifest.txt` for verification.
 - **`encrypted_value` exclusion** — hard-coded in `range-query.py`'s SELECT clause. Adding a new PII column requires explicit allow-list change here AND update to `references/art-13-fields.md`.
 - **Manifest regeneration** — manifest is rebuilt every run. Operators verifying chain-of-custody check the file hash against the manifest, not the upload date.
 - **WORM upload** — files land in the per-client WORM bucket with Object Lock. Cannot be modified or deleted by anyone (including project owner) until lock period elapses.
-- **A3 query capture verification (v7.6.1)** — `range-query.py` counts `hook_audit_log` rows with `event_data ? 'exa_a3'` over the export window and writes the count into the manifest. Zero rows during a window with active A3 traffic = forensic gap; investigate `hookDBBridge` wiring (PR #114 defect 4.3.2). The full `event_data.exa_a3` JSONB (additional_queries[], query_count, ab_arm, ab_outcome) is included in the standard `hook_audit_log__csv.gz` export, satisfying EU AI Act Art. 12 query-reconstruction requirements.
+- **A3 query capture verification (v7.6.1)** — `range-query.py` counts `hook_audit_log` rows with `event_data ? 'exa_a3'` over the export window and writes the count into the manifest. **Interpretation depends on flag state during the window**: (a) if `EXA_ADDITIONAL_QUERIES=true` was active and ≥1 session ran, zero rows = forensic gap → investigate `hookDBBridge` wiring (PR #114 defect 4.3.2); (b) if `EXA_ADDITIONAL_QUERIES=false` during the window, zero rows is expected and no action is needed (do not file incident). When non-zero, the full `event_data.exa_a3` JSONB (`additional_queries[]`, `query_count`, `ab_arm`, `ab_outcome`, `otel_trace_id`) is included in the standard `hook_audit_log__csv.gz` export, satisfying EU AI Act Art. 12 query-reconstruction requirements.
 
 ## Output report
 

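The interpretation rule in the A3 query capture bullet of `client-audit-export/SKILL.md` can be read as a small decision table. The sketch below is illustrative only: `A3Finding` and `interpret_a3_count` are hypothetical names, not part of `range-query.py`, and the inputs (flag state and session count for the window) are assumed to be known to the operator. The flag-on, zero-sessions case is not covered by the bullet; treating it as "expected empty" is an assumption.

```python
from enum import Enum


class A3Finding(Enum):
    OK = "ok"                          # rows captured; nothing to investigate
    EXPECTED_EMPTY = "expected_empty"  # zero rows, and none were expected
    FORENSIC_GAP = "forensic_gap"      # zero rows although A3 traffic ran


def interpret_a3_count(row_count: int, flag_enabled: bool, sessions_run: int) -> A3Finding:
    """Classify the manifest's exa_a3 row count for one export window.

    flag_enabled: whether EXA_ADDITIONAL_QUERIES=true was active in the window.
    sessions_run: number of sessions that ran in the window.
    """
    if row_count > 0:
        return A3Finding.OK
    if flag_enabled and sessions_run >= 1:
        # Case (a): investigate hookDBBridge wiring.
        return A3Finding.FORENSIC_GAP
    # Case (b): flag off. Flag on with zero sessions is also mapped here,
    # which is an assumption rather than something the bullet states.
    return A3Finding.EXPECTED_EMPTY
```

For example, `interpret_a3_count(0, flag_enabled=False, sessions_run=12)` yields `A3Finding.EXPECTED_EMPTY`, so no incident would be filed.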
diff --git a/.claude/skills/deploy/SKILL.md b/.claude/skills/deploy/SKILL.md
@@ -37,6 +37,16 @@ Report these to the user:
 - Health status and feature flags (from script output)
 - Any warnings or issues encountered
 
+### Production flag verification
+
+`curl -s http://34.26.70.60:3001/health | jq '.feature_flags'` should show the Exa production state established in `flags.env`:
+
+- `EXA_WEB_TOOLS=true` — production-locked since 2026-04-18 (PR [#76](https://github.com/Number531/Legal-API/pull/76)); validated 2026-05-12 via PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119) (96.8% Exa vs 96.1% Anthropic citation-verifier rate, both PASS gate).
+- `EXA_ADDITIONAL_QUERIES=true` — all-treatment rollout in production `flags.env` since 2026-05-11.
+- `EXA_ADDITIONAL_QUERIES_AB_SAMPLE=0.0` — all-treatment (set to `0.5` only for staging A/B windows; see `docs/runbooks/exa-a3-ab-staging.md`).
+
+If either boolean flag reads `false`, or `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` is non-zero, after a deploy, the deploy regressed an environment variable; do not advance traffic until reconciled.
+
 ## Pre-Deploy Checks
 
 Before running the script, verify these prerequisites. The script validates Docker but cannot fix auth or project issues.

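The production flag verification added to `deploy/SKILL.md` could be automated along the following lines. This is a minimal sketch, not the deploy script's actual behavior: `find_flag_regressions` and `EXPECTED_FLAGS` are hypothetical names, and the shape of the `feature_flags` object (booleans and numbers, possibly serialized as strings) is an assumption about the `/health` response.

```python
from __future__ import annotations

# Expected production state, taken from the three bullets in the deploy skill.
EXPECTED_FLAGS = {
    "EXA_WEB_TOOLS": True,
    "EXA_ADDITIONAL_QUERIES": True,
    "EXA_ADDITIONAL_QUERIES_AB_SAMPLE": 0.0,
}


def _normalize(value):
    """Convert string-serialized flags ("true", "0.5") to bool or float."""
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        try:
            return float(lowered)
        except ValueError:
            return value
    return value


def _matches(expected, actual) -> bool:
    actual = _normalize(actual)
    if isinstance(expected, bool):
        return isinstance(actual, bool) and actual is expected
    # Numeric flag: reject booleans explicitly, since False == 0.0 in Python.
    if isinstance(actual, bool):
        return False
    if isinstance(actual, (int, float)):
        return float(actual) == expected
    return False


def find_flag_regressions(feature_flags: dict) -> list[str]:
    """Return names of flags that are missing or differ from production state."""
    return [
        name
        for name, expected in EXPECTED_FLAGS.items()
        if name not in feature_flags or not _matches(expected, feature_flags[name])
    ]
```

A non-empty result would correspond to the "do not advance traffic until reconciled" condition; the parsed JSON from the health endpoint would be passed in as `feature_flags`.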
diff --git a/.claude/skills/subagent-scaffold/SKILL.md b/.claude/skills/subagent-scaffold/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: subagent-scaffold
-description: Generate a new Claude Agent SDK subagent across all 7 mandatory wiring files. Mirrors the equity-analyst canonical template — agent file in legalSubagents/agents/, index.js import + registration tuple, _promptConstants.js CAPABILITY constant, domainMcpServers.js SUBAGENT_DOMAIN_MAP entry, hookSSEBridge.js classifyAgent map, optional p0GateHook.js RESEARCH_AGENTS Set, catalogDisplay/agentClassifications.js + agentDisplayMeta.js. Triggers — subagent scaffold, new subagent, generate agent, /subagent-scaffold. Supports flags — --name <kebab-name>, --phase research|synthesis|qa, --domains <comma-list>, --keywords <comma-list>, --a3-eligible (auto-include EXA_ADDITIONAL_QUERIES_GUIDANCE for subagents that use Exa-routable tools).
+description: Generate a new Claude Agent SDK subagent across all 7 mandatory wiring files. Mirrors the equity-analyst canonical template — agent file in legalSubagents/agents/, index.js import + registration tuple, _promptConstants.js CAPABILITY constant, domainMcpServers.js SUBAGENT_DOMAIN_MAP entry, hookSSEBridge.js classifyAgent map, optional p0GateHook.js RESEARCH_AGENTS Set, catalogDisplay/agentClassifications.js + agentDisplayMeta.js. Triggers — subagent scaffold, new subagent, generate agent, /subagent-scaffold. Supports flags — --name <kebab-name>, --phase research|synthesis|qa, --domains <comma-list>, --keywords <comma-list>, --a3-eligible (RECOMMENDED for --phase research; auto-includes EXA_ADDITIONAL_QUERIES_GUIDANCE — pre-wires the orchestrator query-variation prompt for Exa-routable tools).
 ---
 
 # Subagent Scaffold — Generate a New Agent SDK Subagent

diff --git a/super-legal-mcp-refactored/company-strategy/enterprise-necessities.md b/super-legal-mcp-refactored/company-strategy/enterprise-necessities.md
@@ -763,7 +763,9 @@ All clients use the BaseHybridClient pattern: native API first, automatic fallba
 | `ENABLE_GEMINI_FILTERING` | false | Gemini-based content filtering |
 | `PTAB_PERMISSIVE_MODE` | false | Lenient PTAB API error handling |
 | `ENHANCED_SUMMARY_QUERIES` | true | Enhanced summary generation |
-| `EXA_WEB_TOOLS` | true | Exa-powered web search tools in agent context |
+| `EXA_WEB_TOOLS` | true | Exa-powered web search tools (`fetch_document`, `exa_web_search`) replacing Anthropic `WebFetch`/`WebSearch`. **Production-locked since 2026-04-18 (PR [#76](https://github.com/Number531/Legal-API/pull/76)); validated 2026-05-12 via production-fidelity A/B (PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119)): 96.8% Exa vs 96.1% Anthropic on 467-footnote citation-verifier fixture, both PASS gate.** |
+| `EXA_ADDITIONAL_QUERIES` | true (production all-treatment since 2026-05-11) | Orchestrator-authored `additionalQueries` forwarded to Exa /search across 20 high-traffic MCP tools (v7.1.0 → v7.6.2) |
+| `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` | 0.0 | A/B split for staging quality-lift measurement (set to 0.5 for balanced split) |
 | `RAW_SOURCE_ARCHIVE` | true | Content-addressed raw source capture + SHA-256 hashing for audit traceability (v6.0.0) |
 | `PROMPT_INJECTION_DETECTION` | true | Regex-based injection detection on tool outputs (v6.0.0) |
 | `SLA_TELEMETRY` | true | Per-tool latency histograms + 7-day SLA dashboard (v6.0.0) |

diff --git a/super-legal-mcp-refactored/company-strategy/gtm-buyer-intelligence.md b/super-legal-mcp-refactored/company-strategy/gtm-buyer-intelligence.md
@@ -506,7 +506,7 @@ Every PE fund, investment bank, and law firm should ask these 10 questions of an
 | **5** | **Can you show the complete audit trail for how a specific conclusion was reached?** | No. Conversational interface with no disclosed provenance chain. | Partial. Shows source documents but no full provenance chain through intermediate reasoning. | Yes. Session-level traceability: agent → database queried → API response → fact registry → section draft → QA score → remediation → final memorandum. |
 | **6** | **Does your system test for completeness — not just accuracy of what it produces?** | No disclosed completeness testing. BigLaw Bench tests answer quality on provided tasks only. | No disclosed mechanism for detecting gaps in its own research scope. | Phase 2 research review includes explicit completeness checks. QA "Completeness" dimension (10 points) scores whether all material issues are addressed. |
 | **7** | **What is your citation validation methodology?** | BigLaw Bench reports 68% source reliability — 32% unreliable. No disclosed methodology for improvement. | Links to Westlaw sources. No disclosed programmatic citation validation independent of the model. | Three-layer: (1) Bluebook standards in agent prompts, (2) programmatic Python validation independent of LLM, (3) QA "Citation Quality" dimension (12 points — highest-weighted single dimension). |
-| **8** | **How many regulatory databases does your system query directly — not via web search?** | Not disclosed. No disclosed direct API integrations with government databases. | Westlaw + Practical Law (Thomson Reuters proprietary). No disclosed government database integrations. | 50+ database integrations via 134 domain-specific tools: SEC EDGAR, CourtListener, USPTO, FDA, EPA, Federal Register, GovInfo, BLS, ClinicalTrials.gov, USAspending, SAM.gov, ECB, ECHR, EUR-Lex, EPO, and more. |
+| **8** | **How many regulatory databases does your system query directly — not via web search?** | Not disclosed. No disclosed direct API integrations with government databases. | Westlaw + Practical Law (Thomson Reuters proprietary). No disclosed government database integrations. | 50+ database integrations via 134 domain-specific tools: SEC EDGAR, CourtListener, USPTO, FDA, EPA, Federal Register, GovInfo, BLS, ClinicalTrials.gov, USAspending, SAM.gov, ECB, ECHR, EUR-Lex, EPO, and more. Web-source citations (state-court rules, agency enforcement bulletins) route through Exa's MCP tools; blind A/B-validated 2026-05-12 at 96.8% confirmation (358/370) on a 467-footnote production fixture (PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119)), passing the production gate. |
 | **9** | **Does your system enforce least-privilege access for internal components?** | Not disclosed. | Not disclosed. | Yes. 25 domain-scoped MCP servers partition the 134-tool catalog. Each specialist agent receives only tools relevant to its domain (84-93% reduction). |
 | **10** | **Will you submit to an independent, blinded evaluation against human expert work product?** | Not disclosed. BigLaw Bench is self-designed and self-scored. | Participated in VLAIR (third-party) but tests task completion, not memorandum-grade output. | Yes. Independent blind evaluation on roadmap: law school professors design rubric, retired M&A partners score anonymized output. Architecture is built to survive this test. |
 

diff --git a/super-legal-mcp-refactored/company-strategy/gtm-positioning-strategy.md b/super-legal-mcp-refactored/company-strategy/gtm-positioning-strategy.md
@@ -135,6 +135,7 @@ USER QUERY + DOCUMENTS
 | Words per memorandum | 100,000+ | vs. 10-30 pages from traditional firms (varies by deal complexity) |
 | Citations per memorandum | 523+ unique | Bluebook 22nd Edition format |
 | Citation verification rate | 99%+ | Against live databases; uncertainties explicitly tagged for user review |
+| Citation websearch verifier (G5) | 96.8% confirmed (Exa) / 96.1% (Anthropic), both PASS production gate | Independent blind A/B on 467-footnote production fixture, 2026-05-12 (PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)). Exa MCP tools (`fetch_document`, `exa_web_search`) are the production-default verifier path since 2026-04-18. |
 | Time to delivery | 2 hours 47 minutes | vs. 6-8 weeks for equivalent manual diligence |
 | Client configurability | 11 surfaces | Domain selection, database selection, agent roster, depth parameters, QA thresholds, certification floors, risk tolerance, deliverable format, remediation cycles, tool-level toggles, model routing |
 | Subscription pricing | $400K+/month | Deal infrastructure, not per-seat SaaS |

diff --git a/super-legal-mcp-refactored/company-strategy/gtm-sales-playbook.md b/super-legal-mcp-refactored/company-strategy/gtm-sales-playbook.md
@@ -517,7 +517,7 @@ These four objections come up in nearly every compliance-conscious buyer convers
 
 **Q1: "How do you handle EU AI Act?"**
 
-> Articles 12-15 are mapped row-by-row to shipping artifacts. Article 12 logging is `hook_audit_log` + `access_log`. Article 13 transparency is the audit-export endpoint at `/api/session/:sessionKey/audit-report`. Article 14 human oversight is the admin router (`/halt`, `/override`, `/legal-hold`) with everything written to `human_interventions`. Article 15 reproducibility is the byte-replay envelope on every code execution — `system_prompt_hash + python_code + git_sha + sdk_version + container_id + anthropic_request_id`. We don't claim "AI Act ready"; we ship the artifacts. Bring your auditor.
+> Articles 12-15 are mapped row-by-row to shipping artifacts. Article 12 logging is `hook_audit_log` + `access_log`. Article 13 transparency is the audit-export endpoint at `/api/session/:sessionKey/audit-report`. Article 14 human oversight is the admin router (`/halt`, `/override`, `/legal-hold`) with everything written to `human_interventions`. Article 15 reproducibility is the byte-replay envelope on every code execution — `system_prompt_hash + python_code + git_sha + sdk_version + container_id + anthropic_request_id`. Citation reproducibility extends through the verification layer itself: every footnote is checked against live regulatory/government databases via Exa MCP tools (production-default since 2026-04-18; blind A/B-validated 2026-05-12 at 96.8% confirmation, 358/370, on a 467-footnote fixture; PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119)). We don't claim "AI Act ready"; we ship the artifacts. Bring your auditor.
 
 **Q2: "What about GDPR data deletion?"**
 

diff --git a/super-legal-mcp-refactored/company-strategy/system-design.md b/super-legal-mcp-refactored/company-strategy/system-design.md
@@ -371,7 +371,7 @@ The P0 agent runs in a **dedicated agentQuery** before the main orchestrator, wi
 | `withWrite` | Read, Grep, Glob, Write, Edit |
 | `withWriteAndWeb` | Read, Grep, Glob, Write, Edit, WebFetch*, WebSearch* |
 
-*When EXA_WEB_TOOLS=true: WebFetch → fetch_document, WebSearch → exa_web_search (Exa-powered MCP tools)
+*Production config (EXA_WEB_TOOLS=true since 2026-04-18, PR [#76](https://github.com/Number531/Legal-API/pull/76)): WebFetch → fetch_document, WebSearch → exa_web_search (Exa-powered MCP tools). Validated 2026-05-12 (PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)) — Exa arm 358/370 = 96.8% vs Anthropic arm 340/354 = 96.1% on 467-footnote citation-verifier fixture; both PASS production gate.
 
 ### 4.3 Agent Model Selection
 
@@ -495,7 +495,7 @@ The `SessionStart` hook performs similar recovery on session resume, checking fo
 
 ## 6. Tool & Domain Architecture
 
-### 6.1 Tool Inventory (148 Tools across 30 Domains, +2 with EXA_WEB_TOOLS)
+### 6.1 Tool Inventory (150 Tools across 30 Domains — Exa MCP tools production-default since 2026-04-18)
 
 | Domain | Tool Count | Primary API | Example Tools |
 |--------|-----------|-------------|---------------|
@@ -528,8 +528,8 @@ The `SessionStart` hook performs similar recovery on session resume, checking fo
 | `analysis` | 1 | Exa comprehensive | comprehensive_legal_entity_analysis |
 | `filing` | 1 | Internal | draft_legal_filing |
 | `state-statutes` | 1 | Exa web search | search_state_statute |
-| `direct-fetch` | 1 | Exa `/contents` | fetch_document (conditional: EXA_WEB_TOOLS) |
-| `exa-search` | 1 | Exa search API | exa_web_search (conditional: EXA_WEB_TOOLS) |
+| `direct-fetch` | 1 | Exa `/contents` | fetch_document (production-default; replaces WebFetch) |
+| `exa-search` | 1 | Exa search API | exa_web_search (production-default; replaces WebSearch) |
 | `code-execution` | 1 | Anthropic sandbox | run_python_analysis (conditional) |
 
 ### 6.2 Hybrid Client Pattern
@@ -560,7 +560,7 @@ Behind `SCOPED_MCP_SERVERS=false` (default OFF):
 
 | Mode | MCP Servers | Tools Per Agent | Tool Name Pattern |
 |------|-------------|----------------|-------------------|
-| **Monolithic** (default) | 1 (`super-legal-tools`) | ~98 (all, +2 with EXA_WEB_TOOLS) | `mcp__super-legal-tools__search_sec_filings` |
+| **Monolithic** (default) | 1 (`super-legal-tools`) | ~150 (includes fetch_document + exa_web_search as production-default tools) | `mcp__super-legal-tools__search_sec_filings` |
 | **Scoped** | 25+ domain servers | 4-21 (per agent) | `mcp__sec__search_sec_filings` |
 
 **Subagent-to-Domain Mapping** (when scoped):
@@ -1880,7 +1880,9 @@ These invariants are why the v6.8.5 audit-export endpoint can claim byte-faithfu
 | `CITATION_CHAT` | `true` | Session-scoped RAG Q&A with Anthropic Citations API (requires EMBEDDING_PERSISTENCE) |
 | `KNOWLEDGE_GRAPH` | `true` | 10-phase KG extraction, provenance chains, force-graph visualization, graph Q&A (requires EMBEDDING_PERSISTENCE + HOOK_DB_PERSISTENCE) |
 | `AUTH_ENABLED` | `true` | Cookie-based authentication with bcrypt password hashing |
-| `EXA_WEB_TOOLS` | `true` | Exa-powered fetch_document + exa_web_search replacing WebFetch/WebSearch |
+| `EXA_WEB_TOOLS` | `true` | Exa-powered fetch_document + exa_web_search replacing WebFetch/WebSearch. **Production-locked since 2026-04-18 (PR [#76](https://github.com/Number531/Legal-API/pull/76)); validated 2026-05-12 via PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119) — 96.8% Exa vs 96.1% Anthropic on 467-footnote fixture, both PASS gate.** |
+| `EXA_ADDITIONAL_QUERIES` | `false` (production: `true`, all-treatment) | Orchestrator-authored `additionalQueries` forwarded to Exa /search across 20 high-traffic MCP tools (v7.1.0 → v7.6.2). All-treatment rollout in production `flags.env` since 2026-05-11. |
+| `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` | `0.0` | A/B split fraction for staging quality-lift measurement (set to `0.5` for balanced split). |
 | `PROMPT_ENHANCEMENT` | `true` | Intake research pre-phase for short queries (< 1000 chars) |
 | `RAW_SOURCE_ARCHIVE` | `true` | Content-addressed raw source capture + SHA-256 hashing for audit traceability (v6.0.0) |
 | `PROMPT_INJECTION_DETECTION` | `true` | Regex-based injection detection on tool outputs — OWASP LLM Top 10 (v6.0.0) |
@@ -2485,7 +2487,7 @@ When `DOCUMENT_PROCESSING=true`, two sequential `agentQuery()` calls run (P0 + m
 | 1 | ~~**PostgreSQL Migration**~~ | **SHIPPED** (v4.0.0, Issue #30) | 5-tier DB enhancements | Cross-session analytics, ACID storage. Hook-to-DB bridge with 6 tables, 16+ indexes, frontend query router. |
 | 2 | **JSON Structured Reports** | [#10](https://github.com/Number531/Legal-API/issues/10) | ~3,000 lines, 18 files | Zod schemas for all 42 subagent outputs. Enables frontend rendering of structured findings, machine-readable risk tables, and API consumption of research results. |
 | 3 | **Document Processing (P0 Enable)** | [#8](https://github.com/Number531/Legal-API/issues/8) | Flag flip + integration testing | `DOCUMENT_PROCESSING=true` — P0 pre-wave subagent for client document upload and extraction. Code complete, empirical evidence in spec that prompt-only steering fails without enforcement gate. |
-| 4 | ~~**G5 Citation WebSearch Verification**~~ | **SHIPPED** (v3.7.4, flag now `true` by default) | 60 tests passing | Independent websearch verification of every footnote before final synthesis. Dual-mode (existence haiku / full-content sonnet). Tiered hybrid strategy (WebFetch -> Exa MCP -> Anthropic WebSearch). W5-004 tag downgrade pipeline. |
+| 4 | ~~**G5 Citation WebSearch Verification**~~ | **SHIPPED** (v3.7.4, flag now `true` by default; Exa-primary since 2026-04-18) | 60 tests passing + production-fidelity A/B validation (2026-05-12, PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119): 96.8% Exa vs 96.1% Anthropic, both PASS) | Independent websearch verification of every footnote before final synthesis. Dual-mode (existence haiku / full-content sonnet). Tiered hybrid strategy with **Exa MCP tools (`fetch_document`, `exa_web_search`) as the production-default verifier path** — Anthropic `WebSearch`/`WebFetch` retained as SDK-level fallback only. W5-004 tag downgrade pipeline. |
 | 5 | ~~**Database Enhancements (5-Tier)**~~ | **SHIPPED** (v4.0.0, Issue #30) | 5 tiers implemented | Hook-to-DB bridge persisting sessions, agent audit, gate checks, tool calls, code execution, remediation tracking. `HOOK_DB_PERSISTENCE=true`. |
 | 6 | **Files API Chart Extraction** | Merged to main (v4.1.0) | Feature-flagged | Charts extracted from code execution sandbox via `files-api-2025-04-14` beta. Persisted to `reports/{session}/charts/`. Two flags: `FILES_API_CHART_EXTRACTION`, `CHART_PERSISTENCE`. |