Number531 · May 7, 2026
diff --git a/.claude/skills/api-integration/SKILL.md b/.claude/skills/api-integration/SKILL.md
@@ -7,12 +7,13 @@ description: Build and integrate a new API client into the Super Legal MCP platf
 
 ## Overview
 
-This skill integrates a new API data source into the Super Legal MCP platform following the canonical pattern used by all 36 existing clients. The process produces a fully operational hybrid client with native-first routing, Exa two-phase fallback (search + /contents enrichment), circuit breaker protection, caching, observability, and frontend catalog display.
+This skill integrates a new API data source into the Super Legal MCP platform following the canonical pattern used by all 38 existing clients. The process produces a fully operational hybrid client with native-first routing, Exa two-phase fallback (search + /contents enrichment), circuit breaker protection, caching, observability, and frontend catalog display.
 
-**Current platform state** (update these counts after each integration):
-- API clients: 36
-- Base MCP domains: 33 (+conditional: code-execution, direct-fetch, exa-search)
-- Tool schemas: 149+
+**Current platform state** (v7.0.1; update these counts after each integration):
+- API clients: 38 (+FMP equity-research gated by FMP_ENABLED, +DirectFetch)
+- Base MCP domains: 34 (+conditional: code-execution, direct-fetch, exa-search, equities)
+- Tool schemas: 197 (161 base + 36 FMP equity tools when FMP_ENABLED=true)
+- For high-precision integrations (live financial data, regulatory APIs), follow the FMP-derived empirical-first methodology — see Phase 1.5 (Empirical Capture & Probing) and Phase 1.6 (Endpoint Classification) below.
 - Production entry point: `Dockerfile:59` → `bootstrap.js` → `claude-sdk-server.js` → `clientRegistry.js`
 - `EnhancedLegalMcpServer.js` is legacy/local-dev only — do NOT wire new clients there
 

diff --git a/.claude/skills/client-backup-restore/SKILL.md b/.claude/skills/client-backup-restore/SKILL.md
@@ -187,10 +187,23 @@ gcloud sql backups restore {backup_id} \
 
 | Type | What's included | Size estimate | Duration |
 |---|---|---|---|
-| `full` | Database + reports directory | ~150-300 MB | 3-5 min |
-| `database-only` | Cloud SQL export (all tables + data) | ~30-80 MB | 1-2 min |
+| `full` | Database + reports directory | ~200-400 MB (v7.0.0+: includes transcript_events ~700KB-1MB/session) | 3-6 min |
+| `database-only` | Cloud SQL export (all tables + data) | ~30-100 MB (larger on v7.0.0+ deployments due to transcript_events) | 1-3 min |
 | `reports-only` | Reports directory (sessions, raw sources) | ~100-250 MB | 2-4 min |
 
+**v7.0.0+ database-only scope** — Cloud SQL export captures all tables, including the new compliance/observability tables introduced in v7.0.0:
+- `transcript_events` — full SSE event history per session (~700KB-1MB per session × N sessions). **Largest growth vector** at 10K+ sessions.
+- `code_executions` (now with 13+ reproducibility columns: model_id, llm_name, anthropic_request_id, anthropic_message_id, system_prompt_hash, python_code, container_id, tool_use_id, stop_reason, turn_count, pause_count, refusal_detected, etc.) — required for byte-replay envelope per EU AI Act Art. 15
+- `code_execution_inputs` — data lineage junction (small table, 1-5 rows per execution)
+- `citation_source_links` — citation→source bridge with confidence scores (1 row per matched citation)
+- `hook_audit_log` — now includes `bridge_metadata` JSONB column with `git_sha + sdk_version + container_id + system_prompt_hash` (regulator-replay envelope)
+
+Restore verification (Phase 4) should confirm these row counts post-restore for v7.0.0+ deployments:
+- `SELECT COUNT(*) FROM transcript_events` matches pre-backup count
+- `SELECT COUNT(*) FROM code_executions WHERE model_id IS NOT NULL` matches pre-backup count (NULL model_id = pre-v6.8.4 row, allowed)
+- `SELECT COUNT(*) FROM citation_source_links` matches pre-backup count
+- `SELECT event_data->'bridge_metadata' IS NOT NULL FROM hook_audit_log WHERE tool_name='run_python_analysis'` — bridge_metadata preserved on restore
+
 ## Storage Locations
 
 All backups stored in the client's WORM bucket:

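The row-count checks added to restore verification in the client-backup-restore change above can be sketched as a snapshot-and-compare helper. This is a minimal sketch, not the skill's actual script: the function names `capture_counts` and `compare_counts`, the snapshot file format (`label count` per line), and the connection-string argument are assumptions made for illustration. Only the three `COUNT(*)` queries come from the diff.

```bash
# Hypothetical helper: print "label count" lines for the v7.0.0 tables.
# Requires psql; $1 is a connection string (assumed to be available to the caller).
capture_counts() {
  conn="$1"
  printf 'transcript_events %s\n' \
    "$(psql "$conn" -tAc 'SELECT COUNT(*) FROM transcript_events')"
  printf 'code_executions_with_model %s\n' \
    "$(psql "$conn" -tAc 'SELECT COUNT(*) FROM code_executions WHERE model_id IS NOT NULL')"
  printf 'citation_source_links %s\n' \
    "$(psql "$conn" -tAc 'SELECT COUNT(*) FROM citation_source_links')"
}

# Hypothetical helper: compare a pre-backup snapshot ($1) with a post-restore
# snapshot ($2). Prints one line per problem and returns non-zero if any differ.
compare_counts() {
  awk '
    FILENAME == ARGV[1] { before[$1] = $2; next }
    {
      seen[$1] = 1
      if (before[$1] != $2) {
        printf "MISMATCH %s: before=%s after=%s\n", $1, before[$1], $2
        bad = 1
      }
    }
    END {
      for (t in before) {
        if (!(t in seen)) { printf "MISSING %s after restore\n", t; bad = 1 }
      }
      exit bad
    }
  ' "$1" "$2"
}
```

A possible usage, assuming a `$CONN` variable: run `capture_counts "$CONN" > before.txt` ahead of the backup, `capture_counts "$CONN" > after.txt` after the restore, then `compare_counts before.txt after.txt`. The `bridge_metadata` check is not a count and would need a separate query.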
diff --git a/.claude/skills/client-offboarding/SKILL.md b/.claude/skills/client-offboarding/SKILL.md
@@ -55,7 +55,12 @@ bash /Users/ej/Super-Legal/.claude/skills/client-offboarding/scripts/offboard-cl
 
 **Step 7**: Verify archives — checks that archive files exist in GCS and have non-zero size. Reports checksums.
 
-**Step 6.5**: Archive Wave 3 audit tables as dedicated CSV artifacts — `access_log` (EU AI Act Article 12 read-side evidence) and `human_interventions` (EU AI Act Article 14 operator governance evidence) exported via `psql COPY TO STDOUT` + gzip to `gs://super-legal-worm-{client_id}/archive/{table}-{date}.csv.gz`. Cloud SQL's native `gcloud sql export csv` doesn't support table-level `--query` filtering, so the script uses psql directly against the connection string resolved from Secret Manager. Runs AFTER archive verification (Step 7) and BEFORE any destructive deletion (Phase 3) — these tables must survive the DB drop as standalone legal records. Legacy clients predating Wave 3 (tables don't exist) gracefully skip via `2>/dev/null || warn`. v6.5.1+ instances also have `hook_audit_log WHERE event_type = 'KGBuild'` entries — include in archive for KG build audit trail. Requires v6.6.0+ for complete telemetry (background task tracking, pool survival).
+**Step 6.5**: Archive compliance audit tables as dedicated CSV artifacts. **Wave 3 tables**: `access_log` (EU AI Act Article 12 read-side evidence) and `human_interventions` (EU AI Act Article 14 operator governance evidence). **v7.0.0 tables** (added in this scope):
+- `transcript_events` — full SSE event history per session (~700KB-1MB per session × N sessions; may be largest archive file). Required for byte-faithful session-reload audit if regulator queries any session history.
+- `citation_source_links` — citation→raw-source bridge with confidence scores. Required for hallucination audit (any citation with `confidence < 0.85` flagged for QA review at session-time should be reproducible from this archive).
+- `code_execution_inputs` — data lineage junction linking code executions to upstream subagent reports/embeddings/KG nodes. Required for EU AI Act Art. 15 reproducibility chain ("which subagent's output drove this DCF result?").
+
+All exported via `psql COPY TO STDOUT` + gzip to `gs://super-legal-worm-{client_id}/archive/{table}-{date}.csv.gz`. Cloud SQL's native `gcloud sql export csv` doesn't support table-level `--query` filtering, so the script uses psql directly against the connection string resolved from Secret Manager. Runs AFTER archive verification (Step 7) and BEFORE any destructive deletion (Phase 3) — these tables must survive the DB drop as standalone legal records. Legacy clients predating each table gracefully skip via `2>/dev/null || warn`. v6.5.1+ instances also have `hook_audit_log WHERE event_type = 'KGBuild'` entries — include in archive for KG build audit trail. **v7.0.0+ instances** have `hook_audit_log` rows with `bridge_metadata` JSONB (`git_sha + sdk_version + container_id + system_prompt_hash`) — these are the regulator-replay envelope and MUST be preserved in the audit log archive. Requires v6.6.0+ for complete telemetry (background task tracking, pool survival).
 
 ### Phase 3: Resource Deletion (DESTRUCTIVE — requires --confirm)
 

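The per-table export described in Step 6.5 of the client-offboarding change above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the contents of the offboarding script: the function names, the `public` schema, the `WITH CSV HEADER` option, and the use of `gsutil cp -` for the upload are assumptions. It also swaps the `2>/dev/null || warn` skip for an explicit existence check, so that a missing table never produces an empty object in the WORM bucket.

```bash
# Build the archive object path from the pattern quoted in Step 6.5.
# The date string is passed in as-is; its format is not specified in the diff.
archive_uri() {
  printf 'gs://super-legal-worm-%s/archive/%s-%s.csv.gz\n' "$1" "$2" "$3"
}

# Hypothetical warning helper (writes to stderr).
warn() {
  printf 'WARN: %s\n' "$*" >&2
}

# True when the table exists. Assumes the tables live in the public schema.
table_exists() {
  [ "$(psql "$1" -tAc "SELECT to_regclass('public.$2') IS NOT NULL")" = "t" ]
}

# Export one table as gzipped CSV and stream it to the archive location.
# Args: connection string, client id, table name, date string.
archive_table() {
  conn="$1"; client_id="$2"; table="$3"; date_tag="$4"
  if ! table_exists "$conn" "$table"; then
    warn "skipping ${table}: table not present on this instance"
    return 0
  fi
  (
    set -o pipefail   # fail the pipeline if psql or gzip fails, not only the upload
    psql "$conn" -c "COPY ${table} TO STDOUT WITH CSV HEADER" \
      | gzip -c \
      | gsutil cp - "$(archive_uri "$client_id" "$table" "$date_tag")"
  ) || warn "export failed for ${table}"
}
```

A caller could loop over `access_log`, `human_interventions`, `transcript_events`, `citation_source_links`, and `code_execution_inputs`, invoking `archive_table` once per table. Legacy instances skip the tables they lack.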
diff --git a/.claude/skills/code-execution-models/SKILL.md b/.claude/skills/code-execution-models/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: code-execution-models
-description: Add new financial models to the code execution sandbox catalog. Use when the user asks to "add a model", "create a financial model", "add [model name] to code execution", "new analysis model", or wants to expand the PE/IB/M&A quantitative analysis toolkit. The sandbox runs Claude-generated Python (pandas, numpy, scipy, sklearn, matplotlib, seaborn) via the Anthropic code_execution_20260120 tool to produce structured JSON results, charts (PNG), and formatted tables. Currently 45 models across 13 categories. Also use when the user says "/code-execution-models".
+description: Add new financial models to the code execution sandbox catalog. Use when the user asks to "add a model", "create a financial model", "add [model name] to code execution", "new analysis model", or wants to expand the PE/IB/M&A quantitative analysis toolkit. The sandbox runs Claude-generated Python (pandas, numpy, scipy, sklearn, matplotlib, seaborn) via the Anthropic code_execution_20260120 tool to produce structured JSON results, charts (PNG), and formatted tables. Currently 56 models across 13 categories (M46–M55, M58 added in v7.0.0 for FMP equity research, gated by FMP_ENABLED). Also use when the user says "/code-execution-models".
 ---
 
 # Code Execution Models — Add Financial Analysis Model
@@ -159,7 +159,7 @@ Append to the `CODE_EXECUTION_MODELS` array after the last entry:
 }
 ```
 
-**Format guidelines** (match existing 45 models):
+**Format guidelines** (match existing 56 models):
 - `description`: 100-300 words, business-context-rich, mentions charts/tables produced
 - `methodology`: cites specific standards (ASC, IRC, academic papers) with thresholds
 - `outputFormat`: explicitly states chart types and table formats to generate

diff --git a/.claude/skills/deploy/SKILL.md b/.claude/skills/deploy/SKILL.md
@@ -137,6 +137,42 @@ gcloud compute instances add-access-config $INSTANCE --zone=us-east1-c --access-
 gcloud compute ssh $INSTANCE --zone=us-east1-c --command='docker restart $(docker ps -q | head -1)'
 ```
 
+### Variant: MIG instance replacement mid-retries
+
+**Observed on**: 2026-05-06 v7.0.1 deploy.
+
+**Symptom**: Step 7 retries 5x with `Could not fetch resource: super-legal-staging-XXXX` even though the script logs `IP is RESERVED` and proceeds to `Attempt 1/5: Assigning ...`. The log line keeps showing the SAME instance name across all 5 attempts. Meanwhile, `gcloud compute instances list` reveals a DIFFERENT instance name is actually running.
+
+**Root cause**: The MIG terminated the instance the script was targeting (e.g., `super-legal-staging-0239`) and rolled forward to a new one (e.g., `super-legal-staging-bzx4`) DURING step 7's retry budget. The script captured the original instance name in step 6 and did not re-resolve it on each retry. Every `add-access-config` call hits a deleted resource.
+
+**Detection between retries**:
+```bash
+gcloud compute instances list --filter='name~super-legal-staging AND status=RUNNING' --format='value(name)'
+```
+If this returns a different instance name than what the script's log shows, the variant has triggered.
+
+**Manual recovery on the new instance**:
+```bash
+NEW_INSTANCE=$(gcloud compute instances list --filter='name~super-legal-staging AND status=RUNNING' --format='value(name)' | head -1)
+gcloud compute instances delete-access-config $NEW_INSTANCE --zone=us-east1-c --access-config-name=external-nat --quiet
+sleep 10
+gcloud compute instances add-access-config $NEW_INSTANCE --zone=us-east1-c --access-config-name=external-nat --address=34.26.70.60 --quiet
+sed -i '' '/compute\./d' ~/.ssh/google_compute_known_hosts
+gcloud compute ssh $NEW_INSTANCE --zone=us-east1-c --command='docker restart $(docker ps -q | head -1)'
+```
+
+Wait 60s, then verify via `curl http://34.26.70.60:3001/health`.
+
+**Future deploy.sh hardening** (not yet implemented): Step 7's retry loop should re-resolve the instance name on each attempt:
+```bash
+for attempt in 1 2 3 4 5; do
+  # Re-resolve on every attempt: the MIG may have replaced the instance since the last try
+  INSTANCE=$(gcloud compute instances list --filter='name~super-legal-staging AND status=RUNNING' --format='value(name)' | head -1)
+  if [ -n "$INSTANCE" ]; then
+    gcloud compute instances add-access-config "$INSTANCE" --zone=us-east1-c --access-config-name=external-nat --address=34.26.70.60 --quiet && break
+  fi
+  sleep 30
+done
+```
+Filed as v7.0.x follow-up code change.
+
 ### Docker push transient broken pipe
 
 **Observed on**: 2026-04-27 v6.7.0 deploy.

diff --git a/.claude/skills/infrastructure-health/SKILL.md b/.claude/skills/infrastructure-health/SKILL.md
@@ -2,7 +2,7 @@
 name: infrastructure-health
 description: >
   Tiered infrastructure health monitoring for Super Legal MCP platform. Monitors GCE instances,
-  PostgreSQL/pgvector, Anthropic API circuit breakers, 36 API clients, Gemini embedding
+  PostgreSQL/pgvector, Anthropic API circuit breakers, 38 API clients (incl. FMP equity-research, gated), Gemini embedding
   service, memory trends, EPO OAuth tokens, Prometheus alerts, session hygiene, API key expiration,
   Docker image drift, and dependency vulnerabilities. Triggers on: "infrastructure health",
   "health check", "infra status", "system health", "check infrastructure", "run health checks",
@@ -135,7 +135,7 @@ Read these subskill references:
 - [references/dependency-vulnerabilities.md](references/dependency-vulnerabilities.md) — npm audit
 
 ### Execution
-1. Fetch `<base_url>/metrics` and check for circuit breaker trips, high error rates. Wave 4 metrics to verify: `claude_subagent_duration_ms`, `claude_api_client_results_total` (check for `outcome="zero_results"`), `claude_document_conversion_duration_ms`, `claude_document_conversion_errors_total`, `claude_embedding_duration_ms`, `claude_gate_check_results_total`, `claude_kg_build_total` (check for `status="error"` or `status="skipped_breaker"`), `claude_kg_build_duration_ms`
+1. Fetch `<base_url>/metrics` and check for circuit breaker trips, high error rates. Wave 4 metrics to verify: `claude_subagent_duration_ms`, `claude_api_client_results_total` (check for `outcome="zero_results"` and `fetch_source` distribution — `exa_fallback` dominating for FMP tools indicates `FMP_API_KEY` issues), `claude_document_conversion_duration_ms`, `claude_document_conversion_errors_total`, `claude_embedding_duration_ms`, `claude_gate_check_results_total`, `claude_kg_build_total` (check for `status="error"` or `status="skipped_breaker"`), `claude_kg_build_duration_ms`. **v7.0.0 metrics to verify**: `claude_hook_persistence_failures_total` (any non-`unknown` reason = data loss vector), `claude_hook_circuit_breaker_state` (any value ≥2 = persistence skipping), `claude_code_execution_failures_total` by reason, `claude_hook_invocations_total` (success path counter — should grow during active sessions), `claude_tool_invocations_v2_total` (replaces deprecated v1; verify both still emitting during dual-emission window). **OTel sampler check**: container env `OTEL_TRACES_SAMPLER_ARG` — `1.0` indicates verification window, `0.1` is steady-state. See `references/prometheus-alerts.md` for full alert rule + remediation table.
 2. Run `scripts/pg-health.sh` for session hygiene and table sizes
 3. Calculate days until SAM_GOV_API_KEY expiry (set 2026-02-11, 90-day lifetime → ~2026-05-12)
 4. Run `scripts/docker-drift.sh` (requires gcloud auth — skip gracefully if unavailable)

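Two of the v7.0.0 metric checks added to step 1 of the infrastructure-health change above can be sketched as filters over the `/metrics` text output. This is a rough sketch under assumptions: the function names are hypothetical, the `reason` label name is inferred from the wording of the diff, and each sample line is assumed to end with its value (no trailing timestamp).

```bash
# Print circuit-breaker series whose value is >= 2 (persistence skipping).
# Reads Prometheus text exposition on stdin.
breaker_skipping() {
  awk '/^claude_hook_circuit_breaker_state[{ ]/ && ($NF + 0) >= 2'
}

# Print persistence-failure series with a reason other than "unknown"
# and a non-zero count. Reads Prometheus text exposition on stdin.
persistence_failures() {
  awk '/^claude_hook_persistence_failures_total[{ ]/ &&
       $0 !~ /reason="unknown"/ && ($NF + 0) > 0'
}
```

A possible usage is `curl -fsS "<base_url>/metrics" | breaker_skipping`. Empty output from both filters means neither condition was observed in that scrape.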
diff --git a/.claude/skills/infrastructure-health/references/postgresql.md b/.claude/skills/infrastructure-health/references/postgresql.md
@@ -1,7 +1,10 @@
 # PostgreSQL Health — Subskill Reference
 
+**Version**: v7.0.1 (2026-05-06)
+
 ## Connection
-- Pool max: `PG_POOL_MAX` env (default: 10)
+- Pool max: `PG_POOL_MAX` env (default: **15** — bumped from 10 in v7.0.0 for 33% burst margin during simultaneous live stream + 3-rebuild reconciliation + transcript flush)
+- `statement_timeout`: 120,000 ms (preserved — extending was found unnecessary and risky during v6.8.0 audit)
 - Connection string: `PG_CONNECTION_STRING` or `DATABASE_URL`
 - Extension: pgvector (required when `EMBEDDING_PERSISTENCE=true`)
 
@@ -10,11 +13,15 @@
 |-------|---------|----------------|
 | sessions | Session tracking | 1 row per pipeline run |
 | reports | Report versions | ~10-20 rows per session |
-| hook_audit_log | Agent activity audit | 100-500 rows per session |
+| hook_audit_log | Agent activity audit | 100-500 rows per session; v7.0.0 adds `bridge_metadata` JSONB + `tool_use_id` columns |
+| code_executions | Per-`run_python_analysis` execution audit | 1 row per code execution; **v7.0.0 adds reproducibility columns** (model_id, llm_name, anthropic_request_id, anthropic_message_id, input/output/cache tokens, system_prompt_hash, python_code, container_id, tool_use_id, stop_reason, turn_count, pause_count, refusal_detected) |
+| code_execution_inputs | **v7.0.0** — data lineage junction linking each code execution to upstream subagent reports/embeddings/KG nodes | 1-5 rows per code execution |
+| transcript_events | **v7.0.0** — full-fidelity SSE event capture (`migrations/012_transcript-events.up.sql`); buffered batch insert | ~4,000-6,000 rows per 30-50 min session; **~700KB-1MB storage per session** |
+| citation_source_links | **v7.0.0** — citation→source bridge with fuzzy matching (URL exact / URL fuzzy / title fuzzy / embedding cosine) + confidence score | 1 row per memo footnote matched |
 | report_embeddings | pgvector embeddings | ~50-100 chunks per report |
 | agent_states | Agent lifecycle | ~40 rows per session |
 | source_writes | Wave 3 WAL — raw source persistence reconciliation | 1 row per raw source capture; hourly reconciler |
-| access_log | Wave 3 — EU AI Act Art. 12 read-side audit | 1 row per `/api/sessions/:id/*` read (fire-and-forget) |
+| access_log | Wave 3 — EU AI Act Art. 12 read-side audit | 1 row per `/api/sessions/:id/*` read (fire-and-forget); v7.0.0 audit-export reads also logged |
 | human_interventions | Wave 3 — EU AI Act Art. 14 operator governance audit | 0-5 rows per session (admin actions only) |
 | pii_mappings | Wave 3 — GDPR Art. 17 pseudonymization backing store | 0-N per session when PII detected |