Merged
2 changes: 1 addition & 1 deletion .claude/skills/client-audit-export/SKILL.md
@@ -95,7 +95,7 @@ Compatible with `shasum -a 256 -c manifest.txt` for verification.
- **`encrypted_value` exclusion** — hard-coded in `range-query.py`'s SELECT clause. Adding a new PII column requires explicit allow-list change here AND update to `references/art-13-fields.md`.
- **Manifest regeneration** — manifest is rebuilt every run. Operators verifying chain-of-custody check the file hash against the manifest, not the upload date.
- **WORM upload** — files land in the per-client WORM bucket with Object Lock. Cannot be modified or deleted by anyone (including project owner) until lock period elapses.
- **A3 query capture verification (v7.6.1)** — `range-query.py` counts `hook_audit_log` rows with `event_data ? 'exa_a3'` over the export window and writes the count into the manifest. Zero rows during a window with active A3 traffic = forensic gap; investigate `hookDBBridge` wiring (PR #114 defect 4.3.2). The full `event_data.exa_a3` JSONB (additional_queries[], query_count, ab_arm, ab_outcome) is included in the standard `hook_audit_log__csv.gz` export, satisfying EU AI Act Art. 12 query-reconstruction requirements.
- **A3 query capture verification (v7.6.1)** — `range-query.py` counts `hook_audit_log` rows with `event_data ? 'exa_a3'` over the export window and writes the count into the manifest. **Interpretation depends on flag state during the window**: (a) if `EXA_ADDITIONAL_QUERIES=true` was active and ≥1 session ran, zero rows = forensic gap; investigate `hookDBBridge` wiring (PR #114 defect 4.3.2); (b) if `EXA_ADDITIONAL_QUERIES=false` during the window, zero rows is expected and no action is needed (do not file an incident). When non-zero, the full `event_data.exa_a3` JSONB (`additional_queries[]`, `query_count`, `ab_arm`, `ab_outcome`, `otel_trace_id`) is included in the standard `hook_audit_log__csv.gz` export, satisfying EU AI Act Art. 12 query-reconstruction requirements.
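
The interpretation rule in the A3 bullet can be sketched as a small decision function. This is an illustrative sketch only, not code from `range-query.py`: the names `A3Verdict` and `classify_a3_capture` are hypothetical, and the handling of cases the bullet does not cover (flag on with zero sessions, or rows present while the flag is off) is an assumption.

```python
from enum import Enum


class A3Verdict(Enum):
    """Hypothetical labels for the outcomes described in the A3 bullet."""
    CAPTURED = "captured"            # non-zero rows: exa_a3 data is in the export
    EXPECTED_ZERO = "expected_zero"  # zero rows and nothing to investigate
    FORENSIC_GAP = "forensic_gap"    # zero rows despite active A3 traffic


def classify_a3_capture(row_count: int, flag_enabled: bool, sessions_run: int) -> A3Verdict:
    """Classify the manifest's exa_a3 row count for one export window.

    row_count    -- count of hook_audit_log rows with the exa_a3 key
    flag_enabled -- whether EXA_ADDITIONAL_QUERIES=true was active in the window
    sessions_run -- number of sessions that ran in the window
    """
    if row_count > 0:
        return A3Verdict.CAPTURED
    if flag_enabled and sessions_run >= 1:
        # Case (a): flag on, traffic present, nothing captured.
        return A3Verdict.FORENSIC_GAP
    # Case (b): flag off. Assumption: flag on with zero sessions is also benign.
    return A3Verdict.EXPECTED_ZERO
```

An operator would file an incident only on `FORENSIC_GAP`; the other two outcomes need no follow-up.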

## Output report

10 changes: 10 additions & 0 deletions .claude/skills/deploy/SKILL.md
@@ -37,6 +37,16 @@ Report these to the user:
- Health status and feature flags (from script output)
- Any warnings or issues encountered

### Production flag verification

`curl http://34.26.70.60:3001/health | jq '.feature_flags'` should show the Exa production state established in `flags.env`:

- `EXA_WEB_TOOLS=true` — production-locked since 2026-04-18 (PR [#76](https://github.com/Number531/Legal-API/pull/76)); validated 2026-05-12 via PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119) (96.8% Exa vs 96.1% Anthropic citation-verifier rate, both PASS gate).
- `EXA_ADDITIONAL_QUERIES=true` — all-treatment rollout in production `flags.env` since 2026-05-11.
- `EXA_ADDITIONAL_QUERIES_AB_SAMPLE=0.0` — all-treatment (set to `0.5` only for staging A/B windows; see `docs/runbooks/exa-a3-ab-staging.md`).

If either boolean flag reads `false`, or `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` is anything other than `0.0`, after a deploy, the deploy regressed an environment variable; do not advance traffic until it is reconciled.
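
The check above can also be scripted so that a deploy fails fast on a regressed flag. The following is a minimal sketch, assuming the `/health` response is JSON with a `feature_flags` object keyed by the environment-variable names listed above; that shape, and the helper names `flag_mismatches` and `_normalize`, are assumptions and not taken from the deploy script.

```python
import json

# Expected production state, copied from the three flags listed above.
EXPECTED_FLAGS = {
    "EXA_WEB_TOOLS": True,
    "EXA_ADDITIONAL_QUERIES": True,
    "EXA_ADDITIONAL_QUERIES_AB_SAMPLE": 0.0,
}


def _normalize(value):
    """Coerce env-style strings ("true", "0.0") to bool/float for comparison."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        try:
            return float(lowered)
        except ValueError:
            return value
    return value


def flag_mismatches(health_json: str) -> dict:
    """Return {flag: {"expected": ..., "actual": ...}} for every flag that deviates."""
    flags = json.loads(health_json).get("feature_flags", {})
    mismatches = {}
    for name, expected in EXPECTED_FLAGS.items():
        actual = flags.get(name)  # a missing flag counts as a mismatch
        if _normalize(actual) != _normalize(expected):
            mismatches[name] = {"expected": expected, "actual": actual}
    return mismatches
```

Feeding it the body returned by the `curl` command and treating a non-empty result as a hard stop mirrors the "do not advance traffic" rule.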

## Pre-Deploy Checks

Before running the script, verify these prerequisites. The script validates Docker but cannot fix auth or project issues.
2 changes: 1 addition & 1 deletion .claude/skills/subagent-scaffold/SKILL.md
@@ -1,6 +1,6 @@
---
name: subagent-scaffold
description: Generate a new Claude Agent SDK subagent across all 7 mandatory wiring files. Mirrors the equity-analyst canonical template — agent file in legalSubagents/agents/, index.js import + registration tuple, _promptConstants.js CAPABILITY constant, domainMcpServers.js SUBAGENT_DOMAIN_MAP entry, hookSSEBridge.js classifyAgent map, optional p0GateHook.js RESEARCH_AGENTS Set, catalogDisplay/agentClassifications.js + agentDisplayMeta.js. Triggers — subagent scaffold, new subagent, generate agent, /subagent-scaffold. Supports flags — --name <kebab-name>, --phase research|synthesis|qa, --domains <comma-list>, --keywords <comma-list>, --a3-eligible (auto-include EXA_ADDITIONAL_QUERIES_GUIDANCE for subagents that use Exa-routable tools).
description: Generate a new Claude Agent SDK subagent across all 7 mandatory wiring files. Mirrors the equity-analyst canonical template — agent file in legalSubagents/agents/, index.js import + registration tuple, _promptConstants.js CAPABILITY constant, domainMcpServers.js SUBAGENT_DOMAIN_MAP entry, hookSSEBridge.js classifyAgent map, optional p0GateHook.js RESEARCH_AGENTS Set, catalogDisplay/agentClassifications.js + agentDisplayMeta.js. Triggers — subagent scaffold, new subagent, generate agent, /subagent-scaffold. Supports flags — --name <kebab-name>, --phase research|synthesis|qa, --domains <comma-list>, --keywords <comma-list>, --a3-eligible (RECOMMENDED for --phase research; auto-includes EXA_ADDITIONAL_QUERIES_GUIDANCE — pre-wires the orchestrator query-variation prompt for Exa-routable tools).
---

# Subagent Scaffold — Generate a New Agent SDK Subagent
@@ -763,7 +763,9 @@ All clients use the BaseHybridClient pattern: native API first, automatic fallba
| `ENABLE_GEMINI_FILTERING` | false | Gemini-based content filtering |
| `PTAB_PERMISSIVE_MODE` | false | Lenient PTAB API error handling |
| `ENHANCED_SUMMARY_QUERIES` | true | Enhanced summary generation |
| `EXA_WEB_TOOLS` | true | Exa-powered web search tools in agent context |
| `EXA_WEB_TOOLS` | true | Exa-powered web search tools (`fetch_document`, `exa_web_search`) replacing Anthropic `WebFetch`/`WebSearch`. **Production-locked since 2026-04-18 (PR [#76](https://github.com/Number531/Legal-API/pull/76)); validated 2026-05-12 via production-fidelity A/B (PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119)): 96.8% Exa vs 96.1% Anthropic on 467-footnote citation-verifier fixture, both PASS gate.** |
| `EXA_ADDITIONAL_QUERIES` | true (production all-treatment since 2026-05-11) | Orchestrator-authored `additionalQueries` forwarded to Exa /search across 20 high-traffic MCP tools (v7.1.0 → v7.6.2) |
| `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` | 0.0 | A/B split for staging quality-lift measurement (set to 0.5 for balanced split) |
| `RAW_SOURCE_ARCHIVE` | true | Content-addressed raw source capture + SHA-256 hashing for audit traceability (v6.0.0) |
| `PROMPT_INJECTION_DETECTION` | true | Regex-based injection detection on tool outputs (v6.0.0) |
| `SLA_TELEMETRY` | true | Per-tool latency histograms + 7-day SLA dashboard (v6.0.0) |
@@ -506,7 +506,7 @@ Every PE fund, investment bank, and law firm should ask these 10 questions of an
| **5** | **Can you show the complete audit trail for how a specific conclusion was reached?** | No. Conversational interface with no disclosed provenance chain. | Partial. Shows source documents but no full provenance chain through intermediate reasoning. | Yes. Session-level traceability: agent → database queried → API response → fact registry → section draft → QA score → remediation → final memorandum. |
| **6** | **Does your system test for completeness — not just accuracy of what it produces?** | No disclosed completeness testing. BigLaw Bench tests answer quality on provided tasks only. | No disclosed mechanism for detecting gaps in its own research scope. | Phase 2 research review includes explicit completeness checks. QA "Completeness" dimension (10 points) scores whether all material issues are addressed. |
| **7** | **What is your citation validation methodology?** | BigLaw Bench reports 68% source reliability — 32% unreliable. No disclosed methodology for improvement. | Links to Westlaw sources. No disclosed programmatic citation validation independent of the model. | Three-layer: (1) Bluebook standards in agent prompts, (2) programmatic Python validation independent of LLM, (3) QA "Citation Quality" dimension (12 points — highest-weighted single dimension). |
| **8** | **How many regulatory databases does your system query directly — not via web search?** | Not disclosed. No disclosed direct API integrations with government databases. | Westlaw + Practical Law (Thomson Reuters proprietary). No disclosed government database integrations. | 50+ database integrations via 134 domain-specific tools: SEC EDGAR, CourtListener, USPTO, FDA, EPA, Federal Register, GovInfo, BLS, ClinicalTrials.gov, USAspending, SAM.gov, ECB, ECHR, EUR-Lex, EPO, and more. |
| **8** | **How many regulatory databases does your system query directly — not via web search?** | Not disclosed. No disclosed direct API integrations with government databases. | Westlaw + Practical Law (Thomson Reuters proprietary). No disclosed government database integrations. | 50+ database integrations via 134 domain-specific tools: SEC EDGAR, CourtListener, USPTO, FDA, EPA, Federal Register, GovInfo, BLS, ClinicalTrials.gov, USAspending, SAM.gov, ECB, ECHR, EUR-Lex, EPO, and more. Web-source citations (state-court rules, agency enforcement bulletins) route through Exa's MCP tools — blind A/B-validated 2026-05-12 at 96.8% confirmation (358/370 in the Exa arm) on the 467-footnote production fixture (PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119)), both arms PASS production gate. |
| **9** | **Does your system enforce least-privilege access for internal components?** | Not disclosed. | Not disclosed. | Yes. 25 domain-scoped MCP servers partition the 134-tool catalog. Each specialist agent receives only tools relevant to its domain (84-93% reduction). |
| **10** | **Will you submit to an independent, blinded evaluation against human expert work product?** | Not disclosed. BigLaw Bench is self-designed and self-scored. | Participated in VLAIR (third-party) but tests task completion, not memorandum-grade output. | Yes. Independent blind evaluation on roadmap: law school professors design rubric, retired M&A partners score anonymized output. Architecture is built to survive this test. |

@@ -135,6 +135,7 @@ USER QUERY + DOCUMENTS
| Words per memorandum | 100,000+ | vs. 10-30 pages from traditional firms (varies by deal complexity) |
| Citations per memorandum | 523+ unique | Bluebook 22nd Edition format |
| Citation verification rate | 99%+ | Against live databases; uncertainties explicitly tagged for user review |
| Citation websearch verifier (G5) | 96.8% confirmed (Exa) / 96.1% (Anthropic), both PASS production gate | Independent blind A/B on 467-footnote production fixture, 2026-05-12 (PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)). Exa MCP tools (`fetch_document`, `exa_web_search`) are the production-default verifier path since 2026-04-18. |
| Time to delivery | 2 hours 47 minutes | vs. 6-8 weeks for equivalent manual diligence |
| Client configurability | 11 surfaces | Domain selection, database selection, agent roster, depth parameters, QA thresholds, certification floors, risk tolerance, deliverable format, remediation cycles, tool-level toggles, model routing |
| Subscription pricing | $400K+/month | Deal infrastructure, not per-seat SaaS |
@@ -517,7 +517,7 @@ These four objections come up in nearly every compliance-conscious buyer convers

**Q1: "How do you handle EU AI Act?"**

> Articles 12-15 are mapped row-by-row to shipping artifacts. Article 12 logging is `hook_audit_log` + `access_log`. Article 13 transparency is the audit-export endpoint at `/api/session/:sessionKey/audit-report`. Article 14 human oversight is the admin router (`/halt`, `/override`, `/legal-hold`) with everything written to `human_interventions`. Article 15 reproducibility is the byte-replay envelope on every code execution — `system_prompt_hash + python_code + git_sha + sdk_version + container_id + anthropic_request_id`. We don't claim "AI Act ready"; we ship the artifacts. Bring your auditor.
> Articles 12-15 are mapped row-by-row to shipping artifacts. Article 12 logging is `hook_audit_log` + `access_log`. Article 13 transparency is the audit-export endpoint at `/api/session/:sessionKey/audit-report`. Article 14 human oversight is the admin router (`/halt`, `/override`, `/legal-hold`) with everything written to `human_interventions`. Article 15 reproducibility is the byte-replay envelope on every code execution — `system_prompt_hash + python_code + git_sha + sdk_version + container_id + anthropic_request_id`. Citation reproducibility extends through the verification layer itself: every footnote is checked against live regulatory/government databases via Exa MCP tools (production-default since 2026-04-18; blind A/B-validated 2026-05-12 at 96.8% confirmation on 370 footnotes — PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119)). We don't claim "AI Act ready"; we ship the artifacts. Bring your auditor.

**Q2: "What about GDPR data deletion?"**

16 changes: 9 additions & 7 deletions super-legal-mcp-refactored/company-strategy/system-design.md
@@ -371,7 +371,7 @@ The P0 agent runs in a **dedicated agentQuery** before the main orchestrator, wi
| `withWrite` | Read, Grep, Glob, Write, Edit |
| `withWriteAndWeb` | Read, Grep, Glob, Write, Edit, WebFetch*, WebSearch* |

*When EXA_WEB_TOOLS=true: WebFetch → fetch_document, WebSearch → exa_web_search (Exa-powered MCP tools)
*Production config (EXA_WEB_TOOLS=true since 2026-04-18, PR [#76](https://github.com/Number531/Legal-API/pull/76)): WebFetch → fetch_document, WebSearch → exa_web_search (Exa-powered MCP tools). Validated 2026-05-12 (PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119)) — Exa arm 358/370 = 96.8% vs Anthropic arm 340/354 = 96.1% on 467-footnote citation-verifier fixture; both PASS production gate.

### 4.3 Agent Model Selection

@@ -495,7 +495,7 @@ The `SessionStart` hook performs similar recovery on session resume, checking fo

## 6. Tool & Domain Architecture

### 6.1 Tool Inventory (148 Tools across 30 Domains, +2 with EXA_WEB_TOOLS)
### 6.1 Tool Inventory (150 Tools across 30 Domains — Exa MCP tools production-default since 2026-04-18)

| Domain | Tool Count | Primary API | Example Tools |
|--------|-----------|-------------|---------------|
@@ -528,8 +528,8 @@ The `SessionStart` hook performs similar recovery on session resume, checking fo
| `analysis` | 1 | Exa comprehensive | comprehensive_legal_entity_analysis |
| `filing` | 1 | Internal | draft_legal_filing |
| `state-statutes` | 1 | Exa web search | search_state_statute |
| `direct-fetch` | 1 | Exa `/contents` | fetch_document (conditional: EXA_WEB_TOOLS) |
| `exa-search` | 1 | Exa search API | exa_web_search (conditional: EXA_WEB_TOOLS) |
| `direct-fetch` | 1 | Exa `/contents` | fetch_document (production-default; replaces WebFetch) |
| `exa-search` | 1 | Exa search API | exa_web_search (production-default; replaces WebSearch) |
| `code-execution` | 1 | Anthropic sandbox | run_python_analysis (conditional) |

### 6.2 Hybrid Client Pattern
@@ -560,7 +560,7 @@ Behind `SCOPED_MCP_SERVERS=false` (default OFF):

| Mode | MCP Servers | Tools Per Agent | Tool Name Pattern |
|------|-------------|----------------|-------------------|
| **Monolithic** (default) | 1 (`super-legal-tools`) | ~98 (all, +2 with EXA_WEB_TOOLS) | `mcp__super-legal-tools__search_sec_filings` |
| **Monolithic** (default) | 1 (`super-legal-tools`) | ~150 (includes fetch_document + exa_web_search as production-default tools) | `mcp__super-legal-tools__search_sec_filings` |
| **Scoped** | 25+ domain servers | 4-21 (per agent) | `mcp__sec__search_sec_filings` |
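
The two naming patterns in the table can be expressed as a single helper. This is a sketch under assumptions: the function name `qualified_tool_name` is hypothetical, and it assumes the scoped server name equals the domain key, which the table shows only for `sec`.

```python
MONOLITHIC_SERVER = "super-legal-tools"  # single server name from the table


def qualified_tool_name(tool: str, domain: str, scoped: bool) -> str:
    """Build the MCP-qualified tool name for either server mode.

    Monolithic mode routes every tool through one server; scoped mode is
    assumed to use the domain key as the server name.
    """
    server = domain if scoped else MONOLITHIC_SERVER
    return f"mcp__{server}__{tool}"


# Both rows of the table, reproduced from the same tool and domain:
print(qualified_tool_name("search_sec_filings", "sec", scoped=False))
print(qualified_tool_name("search_sec_filings", "sec", scoped=True))
```

Because the tool name changes with the mode, any allow-list or hook that matches on the full `mcp__...` string would need to account for both forms.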

**Subagent-to-Domain Mapping** (when scoped):
@@ -1880,7 +1880,9 @@ These invariants are why the v6.8.5 audit-export endpoint can claim byte-faithfu
| `CITATION_CHAT` | `true` | Session-scoped RAG Q&A with Anthropic Citations API (requires EMBEDDING_PERSISTENCE) |
| `KNOWLEDGE_GRAPH` | `true` | 10-phase KG extraction, provenance chains, force-graph visualization, graph Q&A (requires EMBEDDING_PERSISTENCE + HOOK_DB_PERSISTENCE) |
| `AUTH_ENABLED` | `true` | Cookie-based authentication with bcrypt password hashing |
| `EXA_WEB_TOOLS` | `true` | Exa-powered fetch_document + exa_web_search replacing WebFetch/WebSearch |
| `EXA_WEB_TOOLS` | `true` | Exa-powered fetch_document + exa_web_search replacing WebFetch/WebSearch. **Production-locked since 2026-04-18 (PR [#76](https://github.com/Number531/Legal-API/pull/76)); validated 2026-05-12 via PRs [#118](https://github.com/Number531/Legal-API/pull/118) + [#119](https://github.com/Number531/Legal-API/pull/119) — 96.8% Exa vs 96.1% Anthropic on 467-footnote fixture, both PASS gate.** |
| `EXA_ADDITIONAL_QUERIES` | `false` (production: `true`, all-treatment) | Orchestrator-authored `additionalQueries` forwarded to Exa /search across 20 high-traffic MCP tools (v7.1.0 → v7.6.2). All-treatment rollout in production `flags.env` since 2026-05-11. |
| `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` | `0.0` | A/B split fraction for staging quality-lift measurement (set to `0.5` for balanced split). |
| `PROMPT_ENHANCEMENT` | `true` | Intake research pre-phase for short queries (< 1000 chars) |
| `RAW_SOURCE_ARCHIVE` | `true` | Content-addressed raw source capture + SHA-256 hashing for audit traceability (v6.0.0) |
| `PROMPT_INJECTION_DETECTION` | `true` | Regex-based injection detection on tool outputs — OWASP LLM Top 10 (v6.0.0) |
@@ -2485,7 +2487,7 @@ When `DOCUMENT_PROCESSING=true`, two sequential `agentQuery()` calls run (P0 + m
| 1 | ~~**PostgreSQL Migration**~~ | **SHIPPED** (v4.0.0, Issue #30) | 5-tier DB enhancements | Cross-session analytics, ACID storage. Hook-to-DB bridge with 6 tables, 16+ indexes, frontend query router. |
| 2 | **JSON Structured Reports** | [#10](https://github.com/Number531/Legal-API/issues/10) | ~3,000 lines, 18 files | Zod schemas for all 42 subagent outputs. Enables frontend rendering of structured findings, machine-readable risk tables, and API consumption of research results. |
| 3 | **Document Processing (P0 Enable)** | [#8](https://github.com/Number531/Legal-API/issues/8) | Flag flip + integration testing | `DOCUMENT_PROCESSING=true` — P0 pre-wave subagent for client document upload and extraction. Code complete, empirical evidence in spec that prompt-only steering fails without enforcement gate. |
| 4 | ~~**G5 Citation WebSearch Verification**~~ | **SHIPPED** (v3.7.4, flag now `true` by default) | 60 tests passing | Independent websearch verification of every footnote before final synthesis. Dual-mode (existence haiku / full-content sonnet). Tiered hybrid strategy (WebFetch -> Exa MCP -> Anthropic WebSearch). W5-004 tag downgrade pipeline. |
| 4 | ~~**G5 Citation WebSearch Verification**~~ | **SHIPPED** (v3.7.4, flag now `true` by default; Exa-primary since 2026-04-18) | 60 tests passing + production-fidelity A/B validation (2026-05-12, PRs [#118](https://github.com/Number531/Legal-API/pull/118)/[#119](https://github.com/Number531/Legal-API/pull/119): 96.8% Exa vs 96.1% Anthropic, both PASS) | Independent websearch verification of every footnote before final synthesis. Dual-mode (existence haiku / full-content sonnet). Tiered hybrid strategy with **Exa MCP tools (`fetch_document`, `exa_web_search`) as the production-default verifier path** — Anthropic `WebSearch`/`WebFetch` retained as SDK-level fallback only. W5-004 tag downgrade pipeline. |
| 5 | ~~**Database Enhancements (5-Tier)**~~ | **SHIPPED** (v4.0.0, Issue #30) | 5 tiers implemented | Hook-to-DB bridge persisting sessions, agent audit, gate checks, tool calls, code execution, remediation tracking. `HOOK_DB_PERSISTENCE=true`. |
| 6 | **Files API Chart Extraction** | Merged to main (v4.1.0) | Feature-flagged | Charts extracted from code execution sandbox via `files-api-2025-04-14` beta. Persisted to `reports/{session}/charts/`. Two flags: `FILES_API_CHART_EXTRACTION`, `CHART_PERSISTENCE`. |
