diff --git a/super-legal-mcp-refactored/company-strategy/gtm-positioning-strategy.md b/super-legal-mcp-refactored/company-strategy/gtm-positioning-strategy.md index e64a45151..1a9a6ddfe 100644 --- a/super-legal-mcp-refactored/company-strategy/gtm-positioning-strategy.md +++ b/super-legal-mcp-refactored/company-strategy/gtm-positioning-strategy.md @@ -1,14 +1,14 @@ # Aperture: GTM Positioning Strategy ## Platform Positioning, Messaging Architecture & Competitive Differentiation -**Prepared: March 2026 | Classification: GTM Partner Briefing Document** +**Prepared: March 2026 | Updated: May 2026 (v6.8.5 — Wave 5 compliance + FMP equity) | Classification: GTM Partner Briefing Document** **For: GTM Advisory Firm Engagement** --- ## 1. What Aperture Is -Aperture is institutional-grade deal facilitation infrastructure. It autonomously produces complete, verified due diligence memoranda — 100,000+ words with 400+ verified citations across 14 regulatory domains — in under 3 hours. Legal due diligence is a core component, but Aperture's scope spans the full deal service stack: regulatory compliance analysis, financial modeling (45 models including DCF, Monte Carlo, Altman Z-Score, Benford's Law, macro stress testing, recession probability), contract review and risk scoring, risk quantification and heat mapping, remediation planning, and QA/certification — collapsing 5-7 external vendor relationships ($530K-3M+ per deal) into a single, integrated pipeline. +Aperture is institutional-grade deal facilitation infrastructure. It autonomously produces complete, verified due diligence memoranda — 100,000+ words with 400+ verified citations across 14 regulatory domains — in under 3 hours.
Legal due diligence is a core component, but Aperture's scope spans the full deal service stack: regulatory compliance analysis, financial modeling (56 models including DCF, Monte Carlo, Altman Z-Score, Benford's Law, macro stress testing, recession probability, 11 equity research models added in v6.8.5), equity research and live securities data (FMP integration — 36 tools spanning prices, multiples, analyst estimates, earnings transcripts, institutional holdings), contract review and risk scoring, risk quantification and heat mapping, remediation planning, and QA/certification — collapsing 5-7 external vendor relationships ($530K-3M+ per deal) into a single, integrated pipeline. It is **not** an AI assistant, copilot, chatbot, document review tool, or legal search engine. It is a verification system that reveals the full landscape of a transaction and produces the deliverable the investment committee reads. @@ -23,7 +23,7 @@ The product sits at the intersection of legal services and enterprise software, | Priority | Signal | Why It Leads | |----------|--------|-------------| | **1. Trust** | "I would stake my career on this output" | PE partners' reputations depend on analysis quality. Leading with trust eliminates the primary objection before it forms. | -| **2. Precision** | "This is engineered, not generated" | Distinguishes Aperture from chatbot-style AI. The architecture IS the product — 40 agents, 5 gates, 134 tools. | +| **2. Precision** | "This is engineered, not generated" | Distinguishes Aperture from chatbot-style AI. The architecture IS the product — 45 agents, 5 gates, 197 tools. | | **3. Capacity** | "Your team, multiplied" | A 20-30 person deal team at 160-200 person throughput. Not "faster diligence" but "expanded deal pipeline." | | **4. Speed** | "Without sacrificing trust, precision, or capacity" | Speed alone is commodity. Speed with institutional quality is unprecedented. Always mention last. 
| @@ -70,7 +70,7 @@ An aperture is the precisely engineered opening in an optical system that determ |-------|------------------|---------------|-----------| | **1. Controlled Precision** | Iris blades calibrate what passes through — not a hole, a mechanism | 5 validation gates control what qualifies as verified output | "We don't just generate. We engineer what passes through." | | **2. Depth of Field** | Everything from near to far in sharp focus simultaneously | 14 regulatory domains analyzed in parallel — securities, antitrust, IP, FDA, EPA, employment, tax, privacy, cyber, AI governance, government contracts, insurance, commercial, litigation | "Every domain, at full resolution, simultaneously. Nothing blurred." | -| **3. Resolution** | Precision lens resolves details invisible to lesser instruments | 134 tools connected to 50+ databases, 726KB prompt corpus, 6-wave remediation | "Our resolution exceeds what human teams can achieve at scale." | +| **3. Resolution** | Precision lens resolves details invisible to lesser instruments | 197 tools connected to 50+ databases, 726KB prompt corpus, 6-wave remediation | "Our resolution exceeds what human teams can achieve at scale." | | **4. Revelation** | An aperture reveals — it does not create light | Risks and liabilities exist whether or not anyone finds them | "We don't generate findings. We reveal what the transaction already contains." | | **5. Force Multiplier** | Wider aperture = more light = more scenes captured | 20-30 person deal team at 160-200 person throughput, 10 memoranda/day | "Your pipeline was constrained by how much your team could see at once." 
| @@ -99,7 +99,7 @@ USER QUERY + DOCUMENTS | Phase 1: Research Planning (orchestrator creates research plan) | - Phase 2: Parallel Research (18+ specialist agents across 50+ databases) + Phase 2: Parallel Research (20+ specialist agents across 50+ databases) | Phase 3: Research Review Gate (completeness verification) | @@ -243,6 +243,45 @@ Every claim in the output carries one of 8 verification tags: | **REJECT_LOOP** | <88%, remediation cycles <2 | Return to automated remediation | | **REJECT_ESCALATE** | <88%, cycles ≥2 | Human review required | +### 3.8 Compliance & Audit Posture (v6.8.5 — Wave 5) + +The architecture above is engineered. The compliance posture is regulator-ready. v6.8.5 closes the transparency gap that copilot-class tools cannot address by design. + +**EU AI Act mapping**: + +| Article | Requirement | Aperture Implementation | +|---|---|---| +| **Art. 12 — Logging** | High-risk AI systems must maintain automatic logs of operation | `hook_audit_log` records every tool invocation, agent dispatch, and code execution with bounded reason enums and Prometheus failure counters | +| **Art. 13 — Transparency** | Operators must be able to interpret system output | `GET /api/session/:sessionKey/audit-report` returns the complete audit trail per session — code executions with model identity, tokens, prompt hashes; bridge_metadata with git_sha + sdk_version + container_id; access log; human interventions; citation source links | +| **Art. 14 — Human Oversight** | Designated humans must be able to override and intervene | Admin endpoints (`/api/sessions/:key/halt`, `/override`, `/legal-hold`, `/tombstone`) write to `human_interventions` with user identity, reason, and timestamp | +| **Art. 
15 — Accuracy & Robustness** | Reproducibility from audit log | Every `run_python_analysis` execution is byte-replayable from `system_prompt_hash + python_code + git_sha + sdk_version + container_id + model_id + anthropic_request_id` | + +**GDPR mapping**: + +| Article | Requirement | Aperture Implementation | +|---|---|---| +| **Art. 17 — Right to Erasure** | Data subjects can request deletion | `redactSessionEventData()` overwrites JSONB content paths to `[REDACTED]` at offboarding-time; cascade DELETE via FK on session removal; `pii_mappings.erasePII()` permanently removes real-value mappings while preserving pseudonyms in reports | +| **Art. 20 — Data Portability** | Export structured data | Audit-export endpoint supports JSON and CSV.gz formats | +| **Art. 32 — Security** | Encryption at rest + in transit | Cloud SQL TLS + Google-managed CMEK; bcrypt password hashing at cost factor 12 | + +**SEC 17a-4 record-keeping**: 5-class retention manager with `LITIGATION_HOLD`, `REGULATORY_7Y`, `STANDARD_3Y`, `RESEARCH_1Y`, `TRANSIENT_90D`. Legal hold prevents all deletion regardless of class. WORM bucket with 8-year object lock for raw source archive. + +**The differentiation**: copilots cannot offer this because their architecture treats the LLM as a black box invoked through chat. Aperture treats the LLM as a component in an instrumented pipeline — every call is audited at the boundary, not at the chat surface. This is why the audit-export endpoint can claim byte-faithful reproducibility while the streaming pipeline stays under 200ms hook latency. + +**Buyer message**: "Bloomberg gave you market data you could trust. Aperture gives you AI output your regulator will trust." + +### 3.9 The Equity Research Layer (v6.8.5 — FMP Integration) + +Pre-v6.8.5, Aperture's `securities-researcher` agent had access to SEC filings (10-K/Q/8-K) but zero access to live market data — stock prices, multiples, analyst estimates, earnings call transcripts, institutional holdings. 
IB/PE/M&A memos cite all of these routinely. + +v6.8.5 closes the gap with a dedicated `equity-analyst` subagent backed by Financial Modeling Prep's `/stable` API: + +- **36 tools** — peer cohorts, multiple decomposition, premium/discount bridges, EPS surprise analysis, reverse DCF, earnings call sentiment, analyst rating distributions, institutional 13F holdings, batch quotes +- **11 code-execution models** (M46–M55, M58) — quantitative cohort identification, multiple regression, control-premium estimation, PE/strategic buyer pricing differentials +- **Architectural separation from financial-analyst** — financial-analyst reviews **financials** (DCF, LBO, capital structure, working capital). Equity-analyst reviews **securities data**. Two distinct workflows, two distinct prompts, cleaner outputs than a multi-purpose agent + +This pairing — comprehensive securities research alongside legal/regulatory due diligence in the same pipeline — does not exist in any competitor's offering. Bloomberg gives you market data; Harvey gives you legal AI; Aperture is the first to do both inside one verified, audited memorandum. + --- ## 4. Competitive Positioning @@ -303,7 +342,7 @@ The augmentation model costs 52-125% more per deliverable, produces 25-50x fewer ### 4.5 vs. "Build It Ourselves" -- 40-agent orchestration with state management, 50+ database integrations (each with its own API versioning, authentication, rate limiting), 5-gate validation architecture, 134 tools, 12-dimension QA, 6-wave remediation +- 45-agent orchestration with state management, 50+ database integrations (each with its own API versioning, authentication, rate limiting), 5-gate validation architecture, 197 tools, 12-dimension QA, 6-wave remediation - "Our validation infrastructure alone took 18 months of dedicated engineering. That's before a single research agent was built." ### 4.6 vs. 
Deal-Execution AI Ecosystem @@ -331,7 +370,7 @@ As Aperture processes more deals, it generates aggregate cross-domain regulatory - **Authoritative but not stiff** — think *The Economist* meets a top-tier M&A partner's cover letter - **Technical but not alienating** — buyers are sophisticated (understand CREAC, Bluebook, HSR thresholds) but not engineers -- **Understated rather than hyperbolic** — lead with architecture numbers: "40 agents. 5 validation gates. 14 domains. 134 tools." Output metrics from production runs are proof points, not promises. +- **Understated rather than hyperbolic** — lead with architecture numbers: "45 agents. 5 validation gates. 14 domains. 197 tools. 56 financial models." Output metrics from production runs are proof points, not promises. ### 5.2 Word Choice @@ -377,11 +416,11 @@ As Aperture processes more deals, it generates aggregate cross-domain regulatory | Audience | Tagline | |----------|---------| | PE / SWF | "Every deal. Full resolution." | -| AmLaw / BigLaw | "40 agents. 14 domains. Every claim tagged to source. Certified." | +| AmLaw / BigLaw | "45 agents. 14 domains. Every claim tagged to source. Certified. EU AI Act Art. 12-15 audit-ready." | | Ecosystem / Integration | "The verification layer for deal intelligence." | | Conference / Event | "What your diligence isn't showing you." | | IB / M&A Advisory | "A lean 20-person deal team at the capacity of 160-200 people." | -| Technical / API | "40 specialist agents. 134 tools. 50+ databases (legal, financial, government). One memorandum." | +| Technical / API | "45 specialist agents. 197 tools. 50+ databases (legal, financial, government, equity-research). One memorandum. Byte-replayable from audit log alone." | | Investor / Press | "Institutional-grade deal due diligence at machine speed." 
| --- @@ -397,7 +436,7 @@ As Aperture processes more deals, it generates aggregate cross-domain regulatory ### 2-Minute (Formal Introduction) "The fundamental problem with deal due diligence isn't speed — it's blindness. A deal team coordinates 5-7 external firms, each examining their slice. The cross-domain connections stay dark. Aperture solves blindness. -It's deal facilitation infrastructure — 40 specialized AI agents analyzing 14 regulatory domains simultaneously against 50+ live databases, with 45 financial models running sandboxed analysis. Legal, regulatory, financial, contract review, risk quantification, remediation planning — all in parallel, all cross-referenced, all verified. +It's deal facilitation infrastructure — 45 specialized AI agents analyzing 14 regulatory domains simultaneously against 50+ live databases, with 56 financial and equity-research models running sandboxed analysis. Legal, regulatory, financial, equity, contract review, risk quantification, remediation planning — all in parallel, all cross-referenced, all verified, all byte-replayable from audit log. The output is a CREAC-structured memorandum with 400+ footnotes, risk quantification tables, draft contract language, and an executive summary. Every citation is tagged: verified, unverified, or inferred. A 12-dimension QA system scores the output on a 100-point scale. Below 93, it doesn't ship — it goes through automated remediation and re-certification. 
diff --git a/super-legal-mcp-refactored/company-strategy/gtm-sales-playbook.md b/super-legal-mcp-refactored/company-strategy/gtm-sales-playbook.md index 9398dae69..01a3de8c2 100644 --- a/super-legal-mcp-refactored/company-strategy/gtm-sales-playbook.md +++ b/super-legal-mcp-refactored/company-strategy/gtm-sales-playbook.md @@ -1,7 +1,7 @@ # Aperture: GTM Sales & Execution Playbook ## Sales Motion, Demo Protocol, Pricing, Content Strategy & GTM Execution -**Prepared: March 2026 | Classification: GTM Partner Briefing Document** +**Prepared: March 2026 | Updated: May 2026 (v6.8.5 — Wave 5 compliance + FMP equity research) | Classification: GTM Partner Briefing Document** **For: GTM Advisory Firm Engagement** --- @@ -43,7 +43,7 @@ Prospect provides: Aperture provides: - NDA-protected access to run the deal through the platform -- Real-time observation of the pipeline (optional — some buyers want to watch the 40 agents work) +- Real-time observation of the pipeline (optional — some buyers want to watch the 45 agents work) **Step 4: Output Delivery** @@ -55,6 +55,7 @@ Complete memorandum package: - Citation verification manifest (what was checked, against which database, when) - QA certification score (12-dimension breakdown) - Converted DOCX/PDF in IB-grade styling (Georgia serif, navy headers, zebra-striped tables) +- **Audit-export bundle** (v6.8.5): regulator-ready JSON or CSV.gz from `GET /api/session/:sessionKey/audit-report` containing every code execution with model identity, system_prompt_hash, python_code, container_id, git_sha, sdk_version, anthropic_request_id; all admin actions; all access events; all citation→source links. Closing artifact for compliance-conscious buyers — turns "trust us" into "here's the byte-replay envelope." **Step 5: Side-by-Side Comparison (The Decisive Moment)** @@ -494,13 +495,43 @@ Before the comparison output program, before the contract discussion — the int Competitors pitch "EU AI Act ready" as a forward-looking commitment.
Aperture ships the artifacts today. +**Wave 3 (v6.2.0)**: - **Cloud Trace timeline per session** — every tool call, every raw-source capture, every DB write, as an OpenTelemetry span with agent attribution. Compliance auditor asks "how was evidence X produced?" → trace URL is the answer. - **`access_log` — EU AI Act Article 12**. Every session read, every report download, every raw-source fetch persists with requester identity + timestamp. Week-two procurement ask; Aperture satisfies with one query. - **`human_interventions` — EU AI Act Article 14**. Every MD halt, override, legal hold, retention change, or PII erasure is a row with operator + reason. Human-in-the-loop evidence, queryable. - **PII pseudonymization + automated erasure — GDPR Article 17**. One admin endpoint (`POST /api/admin/pii-erase`) processes a subject access request. European funds don't sign DPAs without this. - **WORM archive — 8-year Object Retention Lock**. Raw source bodies tiered to GCS with per-object retention lock set at `upload + 8 years`. Cannot be deleted before expiry — SEC Rule 17a-4 + MiFID II readiness. -In the institutional buyer's EU AI Act readiness checklist, the answer to each row is a table name, an endpoint, or a Cloud Trace URL — not an architecture diagram. +**Wave 5 (v6.8.5 — May 2026)**: +- **`code_executions` byte-replay envelope — EU AI Act Article 15**. Every Claude-generated Python execution stores `model_id` (exact `claude-sonnet-4-6-...` revision), `system_prompt_hash`, `python_code` (full source), `python_code_hash`, `container_id` (Anthropic sandbox ID), `git_sha` (deployed code revision), `sdk_version`, `anthropic_request_id`. Combined with `bridge_metadata` JSONB, a regulator can byte-faithfully reproduce any production code execution from the audit log alone. No competitor offers this. +- **Audit-export endpoint — EU AI Act Article 13** (`GET /api/session/:sessionKey/audit-report`). 
Aggregates `code_executions`, `bridge_metadata`, `human_interventions`, `access_log`, and `citation_source_links` into a single regulator-facing JSON or CSV.gz. Auth-gated, access-logged, no PII redaction at read-time (regulators require complete data). +- **`citation_source_links` — citation→raw-source bridge**. Every memo footnote is matched (exact URL, fuzzy URL, fuzzy title, embedding cosine) to the archived raw source with a confidence score. Answers "show me the source for footnote 47" in milliseconds. +- **`redactSessionEventData()` — GDPR Article 17 erasure boundary**. Wired at offboarding-time via the `client-offboarding` skill, NOT at admin read-time. Overwrites JSONB content paths to `[REDACTED]` while preserving row structure for SEC 17a-4 metadata retention. Cascade DELETE via FK on session removal. +- **Hook persistence observability — operational evidence**. `claude_hook_persistence_failures_total{hook, reason}` counter (10-value bounded enum) + `claude_hook_circuit_breaker_state` gauge + zod envelope validators on the highest-volume tools detect upstream API drift before it corrupts the audit log. Anthropic SDK upgrades cannot silently break our compliance posture. + +In the institutional buyer's EU AI Act readiness checklist, the answer to each row is a table name, an endpoint, a metric, or a Cloud Trace URL — not an architecture diagram. + +### 10.2 Common Regulatory Objections (Update Q2 2026) + +These four objections come up in nearly every compliance-conscious buyer conversation. Memorize the responses. + +**Q1: "How do you handle EU AI Act?"** + +> Articles 12-15 are mapped row-by-row to shipping artifacts. Article 12 logging is `hook_audit_log` + `access_log`. Article 13 transparency is the audit-export endpoint at `/api/session/:sessionKey/audit-report`. Article 14 human oversight is the admin router (`/halt`, `/override`, `/legal-hold`) with everything written to `human_interventions`. 
Article 15 reproducibility is the byte-replay envelope on every code execution — `system_prompt_hash + python_code + git_sha + sdk_version + container_id + anthropic_request_id`. We don't claim "AI Act ready"; we ship the artifacts. Bring your auditor. + +**Q2: "What about GDPR data deletion?"** + +> Two layers. At the admin read-time, regulators get full audit data — that's the access boundary. At offboarding-time, when a client deployment is decommissioned, the `client-offboarding` skill calls `redactSessionEventData()` which overwrites JSONB content paths to `[REDACTED]` before the database is exported. Plus cascade DELETE via FK on session removal, plus `pii_mappings.erasePII()` for pseudonym-mapped real values. Erasure is a boundary, not an event triggered by every regulator query. + +**Q3: "How do regulators verify what your AI did?"** + +> Every `run_python_analysis` call stores 13 reproducibility columns. A regulator with the audit-export endpoint output can reconstitute the exact Python that ran, the exact model that wrote it, the exact prompt context (via system_prompt_hash collision check against your prompt library), the exact deployment SHA at execution time, and the exact Anthropic container that ran it. Byte-faithful. No competitor offers this — copilots treat the LLM as a black box invoked through chat; we treat it as a component in an instrumented pipeline. + +**Q4: "What happens if your model hallucinates a citation?"** + +> Two safety nets. First, every memo footnote is matched to the raw source archive via `citation_source_links` with a confidence score (1.00 = URL exact match; <0.85 = fuzzy match flagged for QA review). Second, the citation_websearch_verifier subagent gate-checks every citation against live databases before the memo certifies. 
A hallucinated citation either (a) fails the source-link match at the confidence threshold, (b) fails verification at the QA gate, or (c) pulls the certification score below threshold, which returns the memo to automated remediation (REJECT_LOOP) or, after two remediation cycles, escalates it to human review (REJECT_ESCALATE). We don't claim citations are never wrong — we claim wrong ones don't reach delivered output. + +**Closing move on regulatory questions**: offer to share an anonymized audit-export from a recent production session. Three minutes of curl and a redacted JSON file does more than fifty slides. If they're seriously evaluating, they'll engage with the artifact. --- @@ -520,4 +551,4 @@ In the institutional buyer's EU AI Act readiness checklist, the answer to each r --- -*This document is derived from Aperture's brand strategy, system design, competitive intelligence, financial projections, and live deal output evidence. All metrics reflect v4.1.2 production benchmarks and market intelligence as of March 2026. Project Nexus proof points from March 7, 2026 live execution. Deal economics from Bain M&A Report 2025.* +*This document is derived from Aperture's brand strategy, system design, competitive intelligence, financial projections, and live deal output evidence. Production benchmarks reflect v6.8.5 (May 2026 — Wave 5 compliance + FMP equity research integration). Project Nexus proof points from March 7, 2026 live execution. Deal economics from Bain M&A Report 2025.* diff --git a/super-legal-mcp-refactored/company-strategy/system-design.md b/super-legal-mcp-refactored/company-strategy/system-design.md index 963d73f74..0b1a499e1 100644 --- a/super-legal-mcp-refactored/company-strategy/system-design.md +++ b/super-legal-mcp-refactored/company-strategy/system-design.md @@ -2,7 +2,7 @@ ## Architecture Overview of `claude-sdk-server.js` and Orchestration Pipeline -**Version**: v6.2.3 | **Date**: 2026-04-22 | **Orchestrator Model**: claude-sonnet-4-6 +**Version**: v6.8.5 | **Date**: 2026-05-06 | **Orchestrator Model**: claude-sonnet-4-6 --- @@ -13,7 +13,7 @@ 1.
[System Summary](#1-system-summary) 2. [Server Architecture](#2-server-architecture) 3. [Pipeline Phases](#3-pipeline-phases) -4. [Agent Catalog (42 Subagents)](#4-agent-catalog) +4. [Agent Catalog (45 Subagents)](#4-agent-catalog) 5. [Hook Lifecycle System](#5-hook-lifecycle-system) 6. [Tool & Domain Architecture](#6-tool--domain-architecture) 7. [Auto-Continuation Engine](#7-auto-continuation-engine) @@ -28,6 +28,7 @@ 13. [Citation Chat — Session-Scoped RAG Q&A](#13-citation-chat--session-scoped-rag-qa) 14. [Knowledge Graph — Extraction, Provenance & Visualization](#14-knowledge-graph--extraction-provenance--visualization) 14b. [Observability Wave 3 — Compliance & Governance Infrastructure](#14b-observability-wave-3--compliance--governance-infrastructure-v620) +14c. [v6.7–v6.8.5 — Reconciliation, Transcript Persistence & Wave 5 Compliance](#14c-v67v685--reconciliation-transcript-persistence--wave-5-compliance-machinery) **Part III — Platform Infrastructure** @@ -71,6 +72,14 @@ Super-Legal (Aperture) is an AI-powered legal research, memorandum generation, a 18. **GCS WORM tiering** with 8-year immutable retention and hot→Coldline lifecycle 19. **Provenance chain completion** with source_hash, edge provenance, and embedding similarity linking +**v6.7–v6.8.5 Additions** (transcript persistence, reproducibility, Wave 5 compliance): +20. **Auto-reconciliation loop** — KG node and embedding rebuilds for sessions that completed before persistence shipped (v6.7.0) +21. **Full-fidelity transcript persistence** — every SSE event captured to `transcript_events` for byte-faithful session reload (v6.8.0) +22. **Persistence observability & circuit breakers** — Prometheus failure counters, zod envelope validation, SIGTERM durability via `backgroundTasks` Set (v6.8.1–v6.8.2) +23. 
**Code-execution traceability** — 13 reproducibility columns (model_id, system_prompt_hash, python_code, container_id, git_sha) + OTel `code_execution.lifecycle` root span enabling byte-replayable LLM-generated code from audit log alone (v6.8.4) +24. **Wave 5 compliance machinery** — `citation_source_links` fuzzy matching, `redactSessionEventData()` GDPR Art. 17 erasure boundary, `GET /api/session/:sessionKey/audit-report` regulator-facing transparency export endpoint, zod tool-envelope drift canary (v6.8.5) +25. **FMP equity-analyst integration** — standalone subagent + 36-tool client trio + 11 valuation/M&A code-execution models, gated behind `FMP_ENABLED=false` (v6.8.5) + The server exposes Express.js endpoints streaming via SSE to a vanilla JS frontend with three interactive tabs (Transcript, Chat, Graph). The Graph tab provides three visualization modes: Force Graph (topology), Tree View (hierarchy), and Flow View (IC-grade progressive disclosure with deal snapshot, financial waterfall, scenario analysis, regulatory pathways, and provenance drill-down). --- @@ -130,7 +139,7 @@ The server lazily initializes three tiers of clients: |------|------|------| | **OpenTelemetry** | Auto-instrumentation SDK (Express, HTTP, pg spans) | bootstrap.js (before imports, when `OTEL_ENABLED`) | | **API Clients** | 38 hybrid clients (SEC, FDA, EPA, USPTO, Congress.gov, regulations.gov, NASA NTRS, FRED, FMP-equities, DirectFetch, etc.) 
| First request needing tool execution | -| **SDK Tools** | 149+ tool definitions (151+ with EXA_WEB_TOOLS) with real handlers | First `/api/research` call | +| **SDK Tools** | 197 tool definitions (gated: 161 base + 36 FMP equity tools when `FMP_ENABLED=true`) with real handlers | First `/api/research` call | | **Agent SDK MCP** | MCP server wrapping all tools for `agentQuery` | First `/api/stream` call | | **PostgreSQL** | Session/agent/tool persistence via `HOOK_DB_PERSISTENCE` | First hook event (when flag=true) | | **GCS Tiering** | WORM bucket connection + daily daemon schedule | Server startup (when `GCS_TIERING`) | @@ -295,7 +304,7 @@ The P0 agent runs in a **dedicated agentQuery** before the main orchestrator, wi ## 4. Agent Catalog -### 4.1 Summary Table (42 Subagents) +### 4.1 Summary Table (45 Subagents) | # | Agent | Phase | Model | Tools | Duration | |---|-------|-------|-------|-------|----------| @@ -1589,6 +1598,260 @@ DDL is also added to `USERS_DDL` in `src/db/postgres.js` (dual-path convention --- +## 14c. v6.7–v6.8.5 — Reconciliation, Transcript Persistence & Wave 5 Compliance Machinery + +This section covers the eight releases shipped between Wave 3 (v6.2.x) and the v6.8.5 audit-export endpoint. Each ships independently feature-flagged with zero impact on the core pipeline when disabled. + +### 14c.1 Auto-Reconciliation Loop (v6.7.0) + +**Flag**: `SESSION_RECONCILIATION=true` | **Files**: `src/observability/reconciliationDaemon.js`, `src/db/postgres.js` (kg_status columns) + +Wave 3 shipped KG persistence and embedding persistence as forward-only systems. Sessions completed before those flags landed retained their raw reports but had no KG nodes or embeddings — the KG tab and Citation Chat returned empty for older sessions. v6.7.0 closes this gap with a daemon that detects under-built sessions and rebuilds them in the background. 
+ +**Schema additions** (on `sessions` table — `ALTER TABLE ADD COLUMN IF NOT EXISTS` per dual-path convention): + +``` +sessions + |-- kg_status VARCHAR(20) -- 'pending' | 'building' | 'complete' | 'failed' + |-- kg_node_count INTEGER + |-- kg_edge_count INTEGER + |-- kg_built_at TIMESTAMPTZ + |-- kg_error TEXT + |-- embedding_status VARCHAR(20) + |-- embedding_count INTEGER + |-- embedding_built_at TIMESTAMPTZ + |-- embedding_error TEXT + |-- reconciliation_attempts INTEGER DEFAULT 0 +``` + +**Daemon flow**: +1. Every 5 minutes, scan `sessions` where `status='complete'` AND (`kg_status IS NULL` OR `kg_status='failed' AND reconciliation_attempts < 3`) +2. For each candidate, dispatch to `kgBuilder.buildFromReports(sessionId)` and `embeddingService.persistReportEmbeddings(sessionId)` via `setImmediate` (non-blocking, fire-and-forget) +3. On success: `kg_status='complete'`, set counts, set `kg_built_at` +4. On failure: `kg_status='failed'`, increment `reconciliation_attempts`, store error +5. After 3 failed attempts the session is parked (manual intervention required via admin endpoint) + +**Concurrency cap**: 3 simultaneous rebuilds — bounded so reconciliation cannot saturate the connection pool while live streaming is active. + +**Backfill semantics** (`BACKFILL_SESSION_STATUS_SEMANTICS_DDL`): on first boot after v6.7.0, sessions with non-null `final_score` and no terminal status are migrated to `status='complete'` so they become reconciliation candidates. The DDL is idempotent and wraps the UPDATE in an `information_schema.tables` IF EXISTS gate (added in v6.8.2 — fixes fresh-DB throw on new client provisioning). + +**Admin endpoint** (added in v6.7.0): `POST /api/admin/reconcile/:sessionKey` triggers an immediate rebuild outside the 5-minute cycle. The admin frontend exposes a "Rebuild KG" button on each session row. 
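The candidate scan, the retry bound, and the concurrency cap from 14c.1 can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the contents of `reconciliationDaemon.js`: `isCandidate`, `runWithLimit`, `reconcileOnce`, and the injected `rebuild` callback are hypothetical names, sessions are plain objects rather than database rows, and the embedding rebuild and `setImmediate` dispatch are left out for brevity.

```javascript
// Hypothetical in-memory sketch of one reconciliation pass (not the daemon's real code).
const MAX_ATTEMPTS = 3;   // a session is parked after 3 failed attempts
const MAX_CONCURRENT = 3; // concurrency cap described in the text

// Mirrors the scan predicate: complete sessions with no KG status yet,
// or failed ones that still have retry budget.
function isCandidate(s) {
  if (s.status !== 'complete') return false;
  if (s.kg_status == null) return true;
  return s.kg_status === 'failed' && (s.reconciliation_attempts || 0) < MAX_ATTEMPTS;
}

// Runs `worker` over `items` with at most `limit` calls in flight.
async function runWithLimit(items, limit, worker) {
  const queue = items.slice();
  async function lane() {
    while (queue.length > 0) {
      await worker(queue.shift());
    }
  }
  const lanes = Array.from({ length: Math.min(limit, queue.length) }, () => lane());
  await Promise.all(lanes);
}

// `rebuild(session)` is an injected stand-in for the KG build call.
// It should resolve to { nodes, edges } or throw.
async function reconcileOnce(sessions, rebuild, now = () => new Date()) {
  const candidates = sessions.filter(isCandidate);
  await runWithLimit(candidates, MAX_CONCURRENT, async (s) => {
    s.kg_status = 'building';
    try {
      const { nodes, edges } = await rebuild(s);
      s.kg_status = 'complete';
      s.kg_node_count = nodes;
      s.kg_edge_count = edges;
      s.kg_built_at = now();
      s.kg_error = null;
    } catch (err) {
      s.kg_status = 'failed';
      s.reconciliation_attempts = (s.reconciliation_attempts || 0) + 1;
      s.kg_error = String(err && err.message ? err.message : err);
    }
  });
  return candidates.length;
}
```

In this sketch a session whose `reconciliation_attempts` has reached 3 is simply never selected again, which is one way to model the "parked" state; the admin endpoint would be the path that bypasses the predicate.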
+ +### 14c.2 Deploy Hardening & Prompt Cleanup (v6.7.1–v6.7.3) + +These three hotfixes address operational issues observed during the v6.7.0 deploy and subsequent memo runs. + +**v6.7.1** — Static IP race recovery in `deploy.sh`: +- Pre-flight wait: poll `gcloud compute addresses list` until status is `RESERVED` (not `IN_USE`); up to 90s window +- Bumped retry attempts from 3 → 5 with 30s wait (total 150s) +- Captured stderr to a tempfile so failures surface in deploy log instead of being swallowed by `2>/dev/null` +- On final failure, prints manual recovery commands directly to stderr + +**v6.7.2** — Boot-time schema-init container restart in `deploy.sh` Step 8.5: +- After static IP assignment succeeds, SSH into instance and `docker restart` the container +- Forces `ensureHookSchema()` to retry with the now-whitelisted IP (Cloud SQL only whitelists `34.26.70.60`; new GCE instance briefly holds an ephemeral IP before MIG reassigns the static IP) +- Only fires when `ASSIGNED=true` (skipping when ephemeral IP is in use, since the same Cloud SQL whitelist failure would recur) + +**v6.7.3** — Orchestrator and source-level emoji suppression: +- Edits to `prompts/memorandum-synthesis/{intake-research,legal-standards,roles}.md` add explicit "no decorative pictographs" guidance +- Aligns memo aesthetics with executive-prose conventions for IB/PE/M&A clients +- Zero schema impact + +### 14c.3 Transcript Persistence (v6.8.0) + +**Flag**: `TRANSCRIPT_DB_PERSISTENCE=true` (default OFF until activated) | **File**: `src/server/streamContext.js`, `src/server/dbFrontendRouter.js` + +The center transcript pane in the React frontend renders content from 18 SSE event types — `agent_progress`, `hook_event`, thinking variants, `tool_call`, `delta`, `assistant_text`, `doc_convert`, `system_init`, etc. All flow through one chokepoint (`ctx.send()` in `streamContext.js`) but pre-v6.8.0 none were persisted. 
On session reload, users saw reconstructed agent cards + final reports but lost the orchestrator's reasoning narrative, inline tool blocks, thinking blocks, and system status messages. v6.8.0 captures every SSE event for byte-faithful replay on session reload. + +**Schema**: +``` +transcript_events + |-- id BIGSERIAL PRIMARY KEY + |-- session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE -- GDPR Art. 17 cascade + |-- session_key VARCHAR(75) NOT NULL + |-- sequence_number BIGINT NOT NULL + |-- event_type VARCHAR(64) NOT NULL + |-- event_data JSONB NOT NULL -- full payload, no filtering + |-- created_at TIMESTAMPTZ DEFAULT NOW() + +CREATE INDEX idx_transcript_session_seq + ON transcript_events(session_id, sequence_number ASC); -- single replay-path index +``` + +**Buffered batch insert pattern** (write-volume mitigation): typical 30-50 minute session produces 4,000–6,000 SSE events distributed across 43 event types, peaking at 8–12 events/sec during orchestrator streaming. Synchronous fire-and-forget writes would saturate the 10-connection pool. Solution: + +1. Each `ctx.send()` call appends `{sessionId, sessionKey, sequenceNumber, eventType, eventData}` to an in-memory `transcriptBuffer` attached to `ctx` +2. Flush triggers: buffer reaches 50 events OR 2 seconds elapse since last flush OR `ctx.end()` (final flush) +3. Each flush issues one multi-row `INSERT INTO transcript_events VALUES (...), (...), ...` +4. Collapses ~5,000 individual writes into ~100 batched inserts per session — pool load is trivial + +**FK timing — buffer-until-resolved**: early SSE events (`system_init`, `prompt_enhancement_status`) fire 1–5 ms into stream, but `sessions` row INSERT happens at first `SubagentStart` hook (~1–3 s in). To avoid losing the early 50–100 events: + +1. Always buffer events as soon as they fire, even when `ctx.dbSessionId` is null (preserves event ordering) +2. 
`_flushTranscriptBuffer()` checks `dbSessionId` at flush time: + - If still null → re-arm timer, retry on next tick (events stay in buffer, no loss) + - If resolved → back-fill null sessionIds with `this.dbSessionId` and INSERT +3. Bound: if `dbSessionId` never resolves (DB down at request start), failure counter stops retries after 3 attempts + +**Frontend replay** (`test/react-frontend/app.js`): `replayTranscript(sessionKey)` fetches `GET /api/db/sessions/:sessionKey/transcript`, sets `replayMode=true`, and dispatches every event through `handleStreamEvent()`. Performance optimization: consecutive `delta` events for the same bubble are batched into a single render to avoid 2,000 markdown re-renders. Replay time: <100ms for ~5,000 events. + +**Fallback**: pre-v6.8 sessions return `{status: 'no_transcript'}`. Frontend falls back to `reconstructPhaseProgress()` + `reconstructTimeline()` (the pre-v6.8 behavior) and shows a small banner. + +**Storage**: ~700KB-1MB per session × 10K sessions ≈ 7-10 GB; ~$1-2/month at GCP storage pricing. + +### 14c.4 Persistence Observability (v6.8.1) + +**Files**: `src/utils/sdkMetrics.js`, `src/utils/hookDBBridge.js`, `src/schemas/toolEnvelopes.js` + +Three independent additions hardening the v6.6.0 + v6.8.0 persistence path: + +**Prometheus failure counters**: +``` +claude_hook_persistence_failures_total{hook, reason} +claude_hook_circuit_breaker_state (gauge: 0=closed, 1=open, 2=half-open) +claude_code_execution_failures_total +claude_hook_invocations_total{hook, status} +``` + +**Zod envelope validator** on `run_python_analysis`: every code-execution tool invocation is validated against a strict schema before persisting. Drift detection — if Anthropic ever changes the tool input shape, the validator fires `claude_hook_persistence_failures_total{reason='envelope_mismatch'}` and circuit-breaks before corrupting the audit log. 
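The strict-envelope idea above can be illustrated without dependencies. The production validator is described as Zod-based; the sketch below hand-rolls the same "reject anything unexpected" check so it runs on plain Node.js. The field names (`code`, `timeout_s`) are assumptions, since the real `run_python_analysis` input shape is not given here, and the `failureCounts` map is only a stand-in for the Prometheus counter.

```javascript
// Hypothetical envelope: required and optional fields with per-field checks.
const REQUIRED = {
  code: (v) => typeof v === 'string' && v.length > 0,
};
const OPTIONAL = {
  timeout_s: (v) => Number.isFinite(v) && v > 0,
};

// Stand-in for claude_hook_persistence_failures_total{reason=...}.
const failureCounts = new Map();

// Returns { ok: true } or { ok: false, reason, problems }.
// "Strict" means unknown keys are rejected, which is what surfaces drift.
function validateEnvelope(input) {
  const problems = [];
  if (input === null || typeof input !== 'object' || Array.isArray(input)) {
    problems.push('input is not an object');
  } else {
    for (const [key, check] of Object.entries(REQUIRED)) {
      if (!Object.hasOwn(input, key)) problems.push(`missing ${key}`);
      else if (!check(input[key])) problems.push(`invalid ${key}`);
    }
    for (const key of Object.keys(input)) {
      if (Object.hasOwn(REQUIRED, key)) continue; // already checked
      if (!Object.hasOwn(OPTIONAL, key)) problems.push(`unexpected ${key}`);
      else if (!OPTIONAL[key](input[key])) problems.push(`invalid ${key}`);
    }
  }
  if (problems.length === 0) return { ok: true };
  const reason = 'envelope_mismatch';
  failureCounts.set(reason, (failureCounts.get(reason) ?? 0) + 1);
  return { ok: false, reason, problems };
}
```

A caller would skip persistence whenever `ok` is false, so a changed upstream tool shape shows up as a counter increment rather than as malformed audit rows.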
+ +**SIGTERM durability fix in `persistCodeExecution`**: pre-v6.8.1 the code-execution persistence was fire-and-forget. On SIGTERM mid-flight, in-flight writes were lost. v6.8.1 registers each persistence promise with the `backgroundTasks` Set (introduced in v6.6.0); graceful shutdown awaits all in-flight tasks before exit. + +### 14c.5 Schema Bootstrap & Durability Fixes (v6.8.2) + +**Schema**: `BACKFILL_RECONCILIATION_STATUS_DDL` now wraps the `UPDATE` in an `information_schema.tables` IF EXISTS gate. Pre-v6.8.2 a fresh-DB boot would throw because the DDL referenced `kg_status` before the column was created; the gate makes the DDL safe on both fresh provisioning and existing databases. + +**Prometheus alerts** (`prometheus/alerts.yml`): three rules added — `ClaudeHookPersistenceFailureRateHigh`, `ClaudeCircuitBreakerOpen`, `ClaudeCodeExecutionFailureRateHigh`. Wired to `severity: warning` with 5-minute windows. + +**Six fire-and-forget durability fixes** (each wraps an in-flight async write with `backgroundTasks.add()`): kg-build daemon, embedding daemon, transcript flush, citation persistence, audit log INSERT, archive-old-sessions sweep. Confirms graceful-shutdown semantics across every async write path. + +### 14c.6 Code-Execution Traceability (v6.8.4) + +**Files**: `src/db/postgres.js` (13 new columns + lineage junction), `src/tools/codeExecutionBridge.js`, `src/utils/buildVersion.js`, `Dockerfile` (`COMMIT_SHA` build arg) + +Every `run_python_analysis` execution is now byte-replayable from the audit log alone. Before v6.8.4, the bridge logged success/failure and chart count; the actual prompt, generated code, model identity, and runtime environment were ephemeral. EU AI Act Article 12 logging and Article 15 reproducibility require the complete picture. 
+ +**13 new columns on `code_executions`** (added via `ALTER TABLE ADD COLUMN IF NOT EXISTS` per dual-path convention): + +| Column | Purpose | +|---|---| +| `model_id` | exact `claude-sonnet-4-6-...` revision | +| `llm_name` | provider identity (e.g., `anthropic`) | +| `anthropic_request_id` | server-side correlation ID for replay against Anthropic logs | +| `input_tokens`, `output_tokens` | per-execution token counts | +| `cache_read_tokens`, `cache_creation_tokens` | prompt-caching metrics | +| `system_prompt_hash` | SHA-256 of the system prompt — detects prompt drift | +| `python_code` | full generated Python source (TEXT, no cap; bounded by 15-block multi-turn limit) | +| `python_code_hash` | SHA-256 for deduplication and quick equality checks | +| `container_id` | Anthropic code-execution sandbox identifier | +| `tool_use_id` | exact correlation to PostToolUse hook | +| `stop_reason` | `end_turn` \| `pause_turn` \| `refusal` \| `max_tokens` | +| `refusal_detected` | boolean — Anthropic-side refusal trapped in bridge | + +**`code_execution_inputs` lineage junction**: links each execution to the upstream subagent reports/embeddings/KG nodes whose data fed the code. Enables data-lineage queries — "which subagent's output drove this DCF result?" + +**`bridge_metadata` JSONB column on `hook_audit_log`**: captured at PostToolUse time, contains `{git_sha, sdk_version, container_id, system_prompt_hash}`. Combined with `python_code` from `code_executions`, gives a complete reproducibility envelope. + +**OTel root span** `code_execution.lifecycle` (in `codeExecutionBridge.js`): wraps the full multi-turn execution including pause-turn continuations. Span attributes include all 13 traceability fields plus turn count, pause count, chart count.
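The two hash columns in the table above can be derived with Node's built-in `crypto` module. This is a sketch under assumptions: `sha256Hex` and `buildTraceFields` are hypothetical helper names, and the stored encoding (bare hex here, versus a `sha256:` prefix) is a guess about the real bridge.

```javascript
const { createHash } = require('node:crypto');

// SHA-256 of a UTF-8 string, returned as 64 lowercase hex characters.
function sha256Hex(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical helper: builds the hash-related traceability fields
// for one code execution from the prompt and the generated source.
function buildTraceFields({ systemPrompt, pythonCode }) {
  return {
    system_prompt_hash: sha256Hex(systemPrompt), // changes when the prompt drifts
    python_code: pythonCode,                     // full source, stored uncapped
    python_code_hash: sha256Hex(pythonCode),     // cheap equality and dedup key
  };
}
```

Comparing `system_prompt_hash` across executions is enough to detect prompt drift without storing the prompt text in every row.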
+ +**`COMMIT_SHA` build arg** (Dockerfile + `src/utils/buildVersion.js`): +```dockerfile +ARG COMMIT_SHA=unknown +ENV COMMIT_SHA=${COMMIT_SHA} +``` +The `deploy` and `client-provisioner` skills now invoke `docker build --build-arg COMMIT_SHA=$(git rev-parse HEAD)` automatically. Without it, `bridge_metadata.git_sha='unknown'` (graceful but information-poor for regulator replay). + +**Replay envelope**: a regulator can replay any production code execution from `system_prompt_hash + python_code + git_sha + sdk_version + container_id + model_id + anthropic_request_id`. Byte-faithful reproduction. + +### 14c.7 Wave 5 Compliance Machinery (v6.8.5) + +This is the largest single release in the v6.8.x series. Five components ship together to close the regulator-facing transparency gap. + +**Citation source bridge** (`citation_source_links` table + `src/utils/citationParser.js`): +``` +citation_source_links + |-- id BIGSERIAL PRIMARY KEY + |-- report_id UUID REFERENCES reports(id) ON DELETE CASCADE + |-- citation_marker TEXT -- e.g., '[12]', 'Smith 2024' + |-- source_hash VARCHAR(64) -- SHA-256 of matched raw source + |-- match_method TEXT -- 'url_exact' | 'url_fuzzy' | 'title_fuzzy' | 'embedding_cosine' + |-- confidence_score NUMERIC(3,2) -- 0.00–1.00 + |-- matched_at TIMESTAMPTZ +``` +Closes the loop from memo footnote → archived raw source. Fuzzy URL/title matching with confidence scores makes "show me the source for footnote 47" answerable in milliseconds. + +**GDPR Article 17 erasure** (`redactSessionEventData()` in `src/utils/retentionManager.js`): +- UPDATE (not DELETE) — overwrites JSONB content paths to `[REDACTED]` +- Idempotent — safe to call twice +- Legal-hold gate is the caller's responsibility (the `tombstoneSession` admin endpoint already enforces this) +- **Wired at offboarding-time, not access-time** — regulators querying the audit-export endpoint require full data. 
Redaction fires when a client deployment is decommissioned via the `client-offboarding` skill (Step 6.5: redact before `gcloud sql export`) + +**Regulator-facing audit-export endpoint** (`GET /api/session/:sessionKey/audit-report`): + +Aggregates the complete audit trail for one session into a single JSON or CSV.gz response: +- `code_executions` with all 13 traceability fields + `python_code_length` +- `code_execution_inputs` data lineage counts +- `hook_audit_log` event sequence with `bridge_metadata` per code execution +- `human_interventions` (Wave 3 admin actions) +- `access_log` (Wave 3 evidence-read trail) +- `citation_source_links` (memo cell → source) + +Auth: `cookieAuthMiddleware` server-wide + `createAccessAuditMiddleware('session_data')` router-wide. Every audit-export read is itself logged. Format: JSON default, `?format=csv` for CSV.gz. Designed for EU AI Act Article 13 transparency demands. + +**Zod tool envelope schemas** (`src/schemas/toolEnvelopes.js`): five strict schemas for the highest-volume tools — `run_python_analysis`, `generate_chart`, `search_sec_filings`, `get_court_opinions`, `analyze_patent`. Each tool's input is validated before persistence; mismatches emit `claude_hook_persistence_failures_total{reason='envelope_drift'}`. Drift canary for Anthropic SDK upgrades. + +**Subagent CAPABILITY constants** (instrumented across 7 subagents): each subagent now exports a `CAPABILITY` constant declaring its tool surface, output schema, and `data_provenance` claim. Surfaces in `code_execution_inputs` for full lineage. 
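The erasure semantics described earlier in this section (overwrite rather than delete, and safe to call twice) can be sketched on a single `event_data` payload. This is an in-memory illustration only: the content paths are hypothetical, and the real `redactSessionEventData()` is described as issuing an SQL `UPDATE` against JSONB columns.

```javascript
const REDACTED = '[REDACTED]';

// Hypothetical content paths; the real list is not given in this document.
const CONTENT_PATHS = [['text'], ['payload', 'content']];

// Returns a copy of eventData with each existing content path overwritten.
// Keys and structure survive, so the event skeleton stays auditable.
function redactEventData(eventData, paths = CONTENT_PATHS) {
  const copy = JSON.parse(JSON.stringify(eventData)); // JSONB-compatible deep copy
  for (const path of paths) {
    let node = copy;
    // Walk down to the parent of the leaf, stopping if the path is absent.
    for (let i = 0; i < path.length - 1 && node && typeof node === 'object'; i++) {
      node = node[path[i]];
    }
    const leaf = path[path.length - 1];
    if (node && typeof node === 'object' && Object.hasOwn(node, leaf)) {
      node[leaf] = REDACTED;
    }
  }
  return copy;
}
```

Because a redacted value is simply overwritten with the same marker again, a second call produces an identical result, which is the idempotency property the bullet list calls out.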
+ +**Metric churn**: +- 4 new metrics (persistence failures, circuit breaker, code execution failures, hook invocations — see §14c.4) +- 1 deprecated counter: `claude_tool_invocations_total` → `claude_tool_invocations_v2_total{tool_name}` (bounded labels) +- 7-day dual-emission window before legacy counter is removed in v6.8.6 + +### 14c.8 FMP Equity-Analyst Integration (v6.8.5) + +**Flag**: `FMP_ENABLED=false` (default OFF) | **Files**: `src/api-clients/FMP{Client,HybridClient,WebSearchClient}.js`, `src/config/legalSubagents/agents/equity-analyst.js` + +Closes the platform's largest specialist gap: pre-v6.8.5 the existing `securities-researcher` had zero access to live stock prices, fundamentals, valuation multiples, analyst ratings, earnings call transcripts, or institutional holdings — all of which IB/PE/M&A memos routinely cite. + +**3-file FMP client trio**: +- `FMPClient` — 36 native methods against `financialmodelingprep.com/stable/*` +- `FMPWebSearchClient` — Exa-fallback mirrors for graceful degradation when API key absent or rate-limited +- `FMPHybridClient` — wrapper choosing native vs. fallback based on availability and cost + +Defensive engineering captured during Day 1–2 empirical probing (182 fixtures across 36 endpoints): 30s `AbortController`, 429 backoff, 200-with-error envelope detection, `_applySliceProfile` DRY helper, Tier 0/1/2 deterministic transcript trimming. + +**`equity-analyst` standalone subagent** with full peer parity: +- `## Your Expertise / ## Research Methodology / ## Output Format` prompt structure +- 14-keyword `MUST BE USED` line (peer cohorts, multiples, analyst estimates, transcripts, institutional holdings, etc.) 
+- Domain assignment: `['equities', 'sec', 'fred', 'code-execution', ...]` when `FMP_ENABLED=true`, falls back to `['sec', 'fred', 'code-execution', ...]` when off +- Catalog stage: `research_support` (matches `data-analyst` peer pattern) + +**11 code-execution models** (M46–M55, M58): +- Valuation cohort × 8: peer cohort identification, multiple decomposition regression, premium/discount bridge, quality-adjusted multiples, EPS surprise analysis, reverse DCF, earnings-call sentiment, etc. +- M&A Analysis × 3: deal-comparable selection, control-premium estimation, PE/strategic buyer pricing differential + +**Architectural rationale for standalone agent** (validated against `SCOPED_MCP_SERVERS=false` reality): +- `financial-analyst` reviews **financials** (statements, ratios, DCF, LBO, capital structure). Data: SEC filings, code-execution models M01–M45. +- `equity-analyst` reviews **securities data** (stock prices, multiples, ratings, transcripts, holdings). Data: FMP `/stable` API, code-execution models M46–M55, M58. +- `securities-researcher` continues to handle SEC filings (EDGAR, 10-K/Q/8-K) — does not pull live market data. + +With scoped MCP servers permanently off (blocked on Anthropic SDK upstream issue #14), the boundary between agents is enforced by the system prompt + the orchestrator's keyword-based delegation routing. Two separate prompts produce cleaner outputs than one fuzzy multi-purpose prompt; folding the agents would degrade routing fidelity. + +**Activation runbook**: `docs/pending-updates/equity-analyst-update.md` § 8.4.X documents 4 mandatory deployment-test verifications (V1: hook_audit_log for FMP tools, V2: hook_audit_log for M46–M58, V3: Cloud Trace SubagentStart with `stage=research_support`, V4: citation-websearch-verifier accepts FMP citations) executed inside the container before MIG promotion. 
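The native-versus-fallback choice made by the hybrid client in this section can be sketched as a thin wrapper. This is an assumption-laden illustration, not `FMPHybridClient` itself: `createHybridClient` and the `call` method are hypothetical, rate limiting is assumed to surface as an error with `status === 429`, and the cost-based routing mentioned above is left out.

```javascript
// Hypothetical wrapper: prefer the native client, degrade to the fallback
// when no API key is configured or the native call is rate-limited.
function createHybridClient({ native, fallback, apiKey }) {
  return {
    async call(method, ...args) {
      if (apiKey) {
        try {
          return { source: 'native', data: await native[method](...args) };
        } catch (err) {
          // Only rate limits degrade to the fallback; other errors propagate.
          if (err.status !== 429) throw err;
        }
      }
      return { source: 'fallback', data: await fallback[method](...args) };
    },
  };
}
```

Tagging each result with its `source` keeps the degradation visible to callers, which matters if fallback data should be cited differently from native API data.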
+ +### 14c.9 Architectural Invariants Preserved Through v6.8.5 + +Every release in this section adheres to four invariants tested empirically: + +1. **Dual-path schema discipline** — every new column lands as `ALTER TABLE ADD COLUMN IF NOT EXISTS` in BOTH a versioned migration file (`migrations/0NN_*.up.sql`) AND `ensure*Schema()` runtime DDL. Verified by §11 schema audit. +2. **Feature-flag gating** — every release ships behind a default-OFF flag; activation is a `flags.env` flip + redeploy. Per `feedback_user_value_paramount.md`, no release activates implicitly. +3. **Zero hot-path blocking** — every persistence write is fire-and-forget via `setImmediate()` or `backgroundTasks.add()`. The streaming chokepoint (`ctx.send()`) is augmented with buffered writes but never awaited. +4. **Graceful shutdown** — `backgroundTasks` Set is awaited on SIGTERM; in-flight writes complete before exit. Verified across SIGTERM tests in `code-execution-hook-e2e.test.js`. + +These invariants are why the v6.8.5 audit-export endpoint can claim byte-faithful reproducibility while the streaming pipeline stays under p99 200ms hook latency. + +--- + ## 15. Feature Flag Registry | Flag | Default | Purpose | diff --git a/super-legal-mcp-refactored/docs/api-reference.md b/super-legal-mcp-refactored/docs/api-reference.md new file mode 100644 index 000000000..c79f0f428 --- /dev/null +++ b/super-legal-mcp-refactored/docs/api-reference.md @@ -0,0 +1,465 @@ +# API Reference — Operator & Regulator Endpoints + +**Version**: v6.8.5 | **Date**: 2026-05-06 | **Scope**: operator/regulator-facing endpoints only (admin governance, audit/compliance, health, authentication, knowledge graph, search, document conversion). Frontend-internal endpoints (catalog, agent-progress polling, timeline reconstruction) are excluded. + +For frontend-internal endpoints, see `src/server/dbFrontendRouter.js` and `src/server/claude-sdk-server.js` directly. 
+ +--- + +## Authentication & Authorization + +The server uses a **two-layer auth model**: + +1. **`cookieAuthMiddleware`** — applied server-wide before `dbFrontendRouter` and `adminRouter` mount points. Validates the session cookie issued by `POST /api/auth/login`. When `AUTH_ENABLED=false` (local dev), all requests pass through with a synthesized anonymous user. +2. **`requireAdmin`** — applied per-route on every `adminRouter` endpoint. Rejects requests where `user.role !== 'admin'` with HTTP 403. + +Two access-audit middlewares wrap subsets of read paths: +- **`createAccessAuditMiddleware('session_data')`** — applied router-wide on `dbFrontendRouter`. Every `/api/db/*` and `/api/session/*` read writes a row to `access_log`. +- **`kgAccessAudit`** / **`transcriptAccessAudit`** — applied per-route on KG and transcript endpoints. Specializes the resource_type label. + +--- + +## Table of Contents + +1. [Health & Metrics](#1-health--metrics) +2. [Authentication](#2-authentication) +3. [Audit & Compliance Endpoints](#3-audit--compliance-endpoints) +4. [Admin Governance Endpoints](#4-admin-governance-endpoints) +5. [Knowledge Graph Endpoints](#5-knowledge-graph-endpoints) +6. [Search Endpoints](#6-search-endpoints) +7. [Reports & Raw Source Access](#7-reports--raw-source-access) +8. [Document Conversion](#8-document-conversion) +9. [Common Response Conventions](#9-common-response-conventions) + +--- + +## 1. Health & Metrics + +### `GET /health` + +**Auth**: none (public) +**Purpose**: liveness + feature flag dump + circuit breaker state + reconciliation status. + +**Response**: +```json +{ + "status": "ok", + "uptime_s": 1234, + "version": "v6.8.5", + "git_sha": "98a1a406", + "flags": { + "AUTH_ENABLED": true, + "HOOK_DB_PERSISTENCE": true, + "TRANSCRIPT_DB_PERSISTENCE": true, + "KNOWLEDGE_GRAPH": true, + "FMP_ENABLED": false, + "OTEL_ENABLED": true, + "...": "..." 
+ }, + "dependencies": { + "database": { "status": "ok", "latency_ms": 3 }, + "anthropic": { "status": "ok", "circuit_breaker": "closed" } + }, + "reconciliation": { + "kg_pending": 0, + "artifacts_pending": 0, + "last_scan_at": "2026-05-06T18:45:00Z" + } +} +``` + +**Status codes**: `200 OK` always (even when degraded — body indicates degradation). + +### `GET /metrics` + +**Auth**: none (network-layer ACL — Prometheus scrapes from same VPC) +**Purpose**: Prometheus exposition endpoint. +**Response**: `text/plain; version=0.0.4`. See `docs/metrics-catalog.md` for full metric inventory. + +--- + +## 2. Authentication + +### `POST /api/auth/login` + +**Auth**: none (this IS the auth endpoint) +**Body**: +```json +{ "email": "user@example.com", "password": "..." } +``` +**Response (200)**: sets `Set-Cookie: session=...; HttpOnly; Secure; SameSite=Strict`. +```json +{ "id": 42, "email": "user@example.com", "role": "admin" } +``` +**Errors**: +- `401` — invalid credentials +- `403` — user `status !== 'active'` (deactivated) + +**Notes**: passwords verified via `bcrypt.compare()` against stored `password_hash` (hashed with `bcrypt.hash(..., BCRYPT_ROUNDS=12)`). On success, `last_login = NOW()` is fire-and-forget. + +### `POST /api/auth/logout` + +**Auth**: none +**Response (200)**: clears session cookie. + +### `GET /api/auth/me` + +**Auth**: `cookieAuthMiddleware` +**Response**: current user identity. +```json +{ "id": 42, "email": "user@example.com", "role": "admin", "status": "active" } +``` + +### `POST /api/auth/change-password` + +**Auth**: `cookieAuthMiddleware` +**Body**: +```json +{ "current_password": "...", "new_password": "..." } +``` +**Response (200)**: `{ "ok": true }`. Rehashes with bcrypt at `BCRYPT_ROUNDS` cost factor. +**Errors**: `401` if `current_password` does not match. + +--- + +## 3. 
Audit & Compliance Endpoints + +### `GET /api/session/:sessionKey/audit-report` + +**Auth**: `cookieAuthMiddleware` + `createAccessAuditMiddleware('session_data')` +**Purpose**: EU AI Act Art. 13 transparency export. Aggregates the complete audit trail for one session into a single response. + +**Path params**: `sessionKey` — must match `SESSION_KEY_RE` regex. + +**Query params**: +- `format=json` (default) — returns JSON object +- `format=csv` — returns CSV.gz (gzipped CSV stream of code_executions only) + +**Response (200, JSON)**: +```json +{ + "session": { + "id": "uuid", + "session_key": "string", + "status": "complete", + "transaction_name": "...", + "created_at": "...", + "updated_at": "...", + "quality_tier": "...", + "final_score": 0.94 + }, + "code_executions": [ + { + "id": 123, + "agent_type": "financial-analyst", + "model_id": "claude-sonnet-4-6-...", + "anthropic_request_id": "...", + "input_tokens": 12450, + "output_tokens": 3200, + "cache_read_tokens": 50000, + "cache_creation_tokens": 0, + "system_prompt_hash": "sha256:...", + "python_code_length": 2843, + "tool_use_id": "...", + "stop_reason": "end_turn", + "refusal_detected": false, + "execution_time_ms": 4521, + "turn_count": 1, + "pause_count": 0, + "chart_count": 3, + "input_source_count": 4, + "created_at": "..." + } + ], + "bridge_metadata": [ + { "tool_use_id": "...", "bridge_metadata": { "git_sha": "...", "sdk_version": "...", "container_id": "...", "system_prompt_hash": "..." }, "created_at": "..." } + ], + "citations": [ + { "report_id": "...", "citation_marker": "[12]", "source_hash": "...", "match_method": "url_exact", "confidence_score": 1.00, "matched_at": "..." } + ], + "human_interventions": [], + "access_log": [ + { "user_id": 42, "endpoint": "/api/db/sessions/.../report/...", "method": "GET", "accessed_at": "..." 
} + ] +} +``` + +**Response (404)**: `{ "error": "Session not found" }` +**Response (400)**: `{ "error": "Invalid session key format" }` +**Response (503)**: `{ "error": "Database not configured" }` + +**Note on PII**: redaction fires at offboarding-time via `retentionManager.redactSessionEventData()` per GDPR Art. 17, NOT at admin read-time. Regulators require full audit data; redaction is the erasure boundary, not the access boundary. + +**Reference**: `docs/runbooks/v6.8.5-audit-export.md` for response interpretation and CSV format. + +### `GET /api/db/sessions/:sessionKey/transcript` + +**Auth**: `cookieAuthMiddleware` + `transcriptAccessAudit` +**Purpose**: v6.8.0 full-fidelity SSE event replay for session reload. + +**Response (200, when transcript exists)**: +```json +{ + "status": "available", + "events": [ + { "sequence_number": 0, "event_type": "system_init", "event_data": {...}, "created_at": "..." }, + { "sequence_number": 1, "event_type": "agent_progress", "event_data": {...}, "created_at": "..." } + ], + "total": 5634 +} +``` + +**Response (200, when no transcript)** — pre-v6.8.0 sessions: +```json +{ "status": "no_transcript", "message": "Session predates transcript persistence (v6.8.0)", "events": [], "total": 0 } +``` + +**Errors**: `400` (invalid session key), `503` (DB unavailable), `401` (auth required when `AUTH_ENABLED=true`). + +--- + +## 4. Admin Governance Endpoints + +All endpoints in this section require `requireAdmin` middleware. Reject with `403 Forbidden` for non-admin users. All admin actions are logged to `human_interventions` with user identity, action, reason, and timestamp. + +### User Lifecycle + +| Method | Path | Body | Purpose | Shipped | +|---|---|---|---|---| +| POST | `/api/admin/users` | `{email, password, role}` | Create user. Hashes password with bcrypt at `BCRYPT_ROUNDS`. | v6.2.0 | +| GET | `/api/admin/users` | — | List users (id, email, role, status, last_login, created_at). 
| v6.2.1 | +| POST | `/api/admin/users/:id/deactivate` | `{reason}` | Set `status='deactivated'`. Login rejected with 403. | v6.2.1 | +| POST | `/api/admin/users/:id/activate` | `{reason}` | Reactivate user (`status='active'`). | v6.2.1 | + +### Session Governance + +| Method | Path | Body | Purpose | Shipped | +|---|---|---|---|---| +| POST | `/api/sessions/:sessionKey/halt` | `{reason}` | Halt active session/agent. Persists to `human_interventions`. | v6.2.0 | +| POST | `/api/sessions/:sessionKey/override` | `{reason, instructions}` | Override agent decision; resume with new instructions. Persists to `human_interventions`. | v6.2.0 | +| POST | `/api/admin/sessions/:sessionId/legal-hold` | `{enabled: true, reason}` | Set/clear legal hold. Prevents all deletion/modification. | v6.2.0 | +| POST | `/api/admin/sessions/:sessionId/retention-class` | `{class: 'STANDARD_3Y'\|...}` | Assign retention class. Computes `retention_expires_at`. | v6.2.0 | +| POST | `/api/admin/sessions/:sessionId/tombstone` | `{reason}` | Tombstone session — redact content, preserve skeleton. Throws `RetentionViolationError` if `legal_hold=true`. | v6.2.0 | +| POST | `/api/admin/pii/erase/:sessionId` | — | Erase PII mappings for session (GDPR Art. 17). Real values permanently lost; pseudonyms remain in reports as opaque tokens. | v6.2.0 | + +### Reconciliation + +| Method | Path | Body | Purpose | Shipped | +|---|---|---|---|---| +| POST | `/api/admin/sessions/:sessionKey/rebuild-kg` | — | Trigger immediate KG rebuild outside the 5-minute reconciliation cycle. Sets `kg_status='building'`, dispatches via `setImmediate`. | v6.7.0 | +| POST | `/api/admin/sessions/:sessionKey/rebuild-artifacts` | — | Trigger immediate artifact rebuild (DOCX/PDF). | v6.7.0 | + +**Common response shape** (all admin endpoints): +```json +{ "ok": true, "intervention_id": 42 } +``` +or `{ "ok": false, "error": "..." }` on failure. + +--- + +## 5. 
Knowledge Graph Endpoints + +Unless a route's **Auth** line states otherwise, the endpoints below pass through `kgAccessAudit` middleware (logs to `access_log` with `resource_type='kg'`). + +### `GET /api/db/sessions/:sessionKey/kg/graph` + +**Purpose**: full KG topology for force-graph visualization. +**Response**: +```json +{ + "nodes": [{ "id": "...", "label": "...", "type": "entity|event|...", "properties": {...} }], + "edges": [{ "source": "...", "target": "...", "type": "...", "weight": 1.0 }], + "node_count": 487, + "edge_count": 1023 +} +``` + +### `GET /api/db/sessions/:sessionKey/kg/neighbors/:nodeId` + +**Purpose**: neighborhood around a specified node (1 hop by default). +**Query params**: `depth=1` (default; `depth=2` allowed for richer expansion). + +### `GET /api/db/sessions/:sessionKey/kg/evolution` + +**Purpose**: temporal sequence of node creations across the session. +**Auth**: `cookieAuthMiddleware` (no `kgAccessAudit` — read-only timeline view). + +### `GET /api/db/sessions/:sessionKey/kg/provenance/:nodeId` + +**Purpose**: full provenance chain for a single KG node — agent + tool + raw text + source_hash. +**Response**: +```json +{ + "node_id": "...", + "provenance_chain": [ + { "agent_type": "...", "tool_name": "...", "raw_text": "...", "source_hash": "sha256:...", "created_at": "..." } + ] +} +``` + +### `GET /api/db/sessions/:sessionKey/kg/raw-sources/:nodeId` + +**Purpose**: drill-down to archived raw source content for a KG node. Used by the "View Raw Sources" button in the Graph tab. +**Response**: array of raw source documents with content, capture timestamp, SHA-256 hash, agent attribution. + +### `POST /api/db/sessions/:sessionKey/kg/build` + +**Auth**: `cookieAuthMiddleware` (admin recommended; not `requireAdmin` for legacy reasons) +**Purpose**: synchronous KG build trigger. Most callers should prefer the async admin endpoint at §4. + +--- + +## 6. Search Endpoints + +### `GET /api/db/search` + +**Purpose**: full-text search across all session reports.
+**Query params**: +- `q=...` (required) — search string +- `limit=10` (default 10, max 100) +- `sessionId=...` (optional — restrict to one session) + +**Response**: +```json +{ + "results": [{ "report_id": "...", "session_id": "...", "snippet": "...", "rank": 0.92 }], + "total": 47 +} +``` + +### `GET /api/db/search-semantic` + +**Purpose**: pgvector cosine-similarity search across `report_embeddings` (Gemini `gemini-embedding-001`, 1536 dims). +**Query params**: +- `q=...` (required) — natural-language query (embedded server-side) +- `limit=10` (default 10, max 100) +- `threshold=0.3` (default; cosine similarity floor) +- `sessionId=...` (optional) + +**Response**: +```json +{ + "results": [{ "report_id": "...", "session_id": "...", "chunk": "...", "similarity": 0.87 }], + "total": 23 +} +``` + +### `GET /api/db/search-artifacts` + +**Purpose**: search across artifact metadata (DOCX, PDF outputs). +**Query params**: `q`, `format=docx|pdf|all`, `sessionId`, `limit`. + +--- + +## 7. Reports & Raw Source Access + +### `GET /api/db/sessions/:sessionKey/report/:reportKey` + +**Purpose**: fetch a specific generated report (memo, agent transcript, etc.). +**Response**: `text/markdown` or `application/json` depending on `Accept` header. + +### `GET /api/sessions/:sessionId/raw-sources/:hash` + +**Purpose**: fetch raw archived source by content-addressed SHA-256 hash. Wave 3 raw source archive. +**Response**: raw content bytes (`application/octet-stream`). + +### `GET /api/sessions/:sessionId/raw-sources/:hash/meta` + +**Purpose**: metadata for a raw source (capture timestamp, tool, agent, content_length, content-type). + +### `GET /api/sessions/:sessionId/raw-sources` + +**Purpose**: list all raw sources captured in a session. +**Query params**: `limit=50` (default), `offset=0`, `tool_name=...` (optional filter). + +### `GET /api/sessions/:sessionId/agents/:agentType/sources` + +**Purpose**: raw sources captured by a specific agent type. 
Useful for per-agent provenance audits. + +--- + +## 8. Document Conversion + +Endpoints require `DOCUMENT_CONVERSION=true`. + +### `POST /api/convert/:sessionId` + +**Auth**: `cookieAuthMiddleware` +**Body**: `{ "format": "docx" | "pdf", "report_key": "memo-final" }` +**Purpose**: trigger Pandoc/Typst conversion. Asynchronous — returns 202 with job ID. +**Response (202)**: +```json +{ "job_id": "uuid", "status": "queued", "estimated_duration_ms": 45000 } +``` + +### `GET /api/convert/status/:sessionId` + +**Purpose**: poll conversion status. +**Response**: +```json +{ "session_id": "...", "jobs": [{ "format": "docx", "status": "complete", "output_path": "..." }] } +``` + +### `GET /api/convert/download/:sessionId/*` + +**Purpose**: stream completed artifact (DOCX/PDF) for download. +**Response**: `application/octet-stream` with `Content-Disposition: attachment`. + +### `POST /api/convert/backfill/:sessionKey` + +**Auth**: `cookieAuthMiddleware` (admin recommended) +**Purpose**: re-run conversion for a session (e.g., after a Pandoc template fix). + +--- + +## 9. 
Common Response Conventions + +### Status codes + +| Code | Meaning | +|---|---| +| `200 OK` | Success | +| `202 Accepted` | Async job queued (document conversion, KG rebuild) | +| `400 Bad Request` | Malformed input (invalid `sessionKey` regex, missing required body field) | +| `401 Unauthorized` | Missing or invalid session cookie (when `AUTH_ENABLED=true`) | +| `403 Forbidden` | Non-admin attempting admin-scoped action; or deactivated user attempting login | +| `404 Not Found` | Session, report, KG node, or artifact not found | +| `409 Conflict` | Legal-hold active; tombstone refused | +| `500 Internal Server Error` | Unexpected server fault — query the `claude_errors_total` counter for taxonomy | +| `503 Service Unavailable` | Database not configured or pool exhausted | + +### Error envelope + +All error responses (4xx and 5xx) follow this shape: +```json +{ "error": "human-readable message", "code": "OPTIONAL_TAXONOMY_CODE" } +``` + +### `sessionKey` format + +`sessionKey` is a UUID-derived identifier with format `^[a-zA-Z0-9_-]{1,75}$` (validated via `SESSION_KEY_RE` regex). Distinct from the internal `session_id` UUID — `sessionKey` is the public-facing slug. + +### `access_log` audit trail + +Every read on `dbFrontendRouter` (all `/api/db/*` and `/api/session/*` endpoints) writes a row to `access_log` with: +- `user_id` (from `cookieAuthMiddleware`) +- `endpoint` (the request path) +- `method` (`GET` / `POST`) +- `ip_address`, `user_agent` +- `response_status` +- `accessed_at` (TIMESTAMPTZ) + +Operators can query the audit trail via `SELECT * FROM access_log WHERE session_id = ...` for SOC 2 / ISO 27001 evidence. + +### Rate limiting + +The server does not currently impose application-level rate limits on operator/regulator endpoints. Inbound rate limiting is enforced only at the network layer, by the Cloud Run / GCE load balancer. Per-domain rate limiters apply to outbound API calls only (see `src/utils/rateLimiter.js`).
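The `sessionKey` format rule and the error envelope above combine into a small guard. The regex and the `Invalid session key format` message are taken from this reference; the `checkSessionKey` function and its return shape are hypothetical, and a real Express middleware would write the response instead of returning it.

```javascript
// Pattern as documented for SESSION_KEY_RE: 1 to 75 URL-safe characters.
const SESSION_KEY_RE = /^[a-zA-Z0-9_-]{1,75}$/;

// Hypothetical guard: returns null when the key is well-formed,
// otherwise a 400 response description using the common error envelope.
function checkSessionKey(key) {
  if (typeof key === 'string' && SESSION_KEY_RE.test(key)) return null;
  return { status: 400, body: { error: 'Invalid session key format' } };
}
```

Rejecting malformed keys before any query runs keeps path parameters such as `../` or overlong strings away from the database layer.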
+ +--- + +**Reference docs**: +- v6.8.5 audit-export runbook: `docs/runbooks/v6.8.5-audit-export.md` +- v6.7.0 reconciliation runbook: `docs/runbooks/v6.7.0-session-reconciliation.md` +- Feature flag registry: `docs/feature-flags.md` +- Metrics catalog: `docs/metrics-catalog.md` +- System design v6.8.5: `company-strategy/system-design.md` §14b (admin router) + §14c (v6.7-v6.8.5) diff --git a/super-legal-mcp-refactored/docs/metrics-catalog.md b/super-legal-mcp-refactored/docs/metrics-catalog.md new file mode 100644 index 000000000..f25848c12 --- /dev/null +++ b/super-legal-mcp-refactored/docs/metrics-catalog.md @@ -0,0 +1,305 @@ +# Metrics Catalog + +**Version**: v6.8.5 | **Date**: 2026-05-06 | **Source**: `src/utils/sdkMetrics.js`, `src/config/alertingRules.js`, `prometheus/alerts.yml` + +This catalog enumerates every Prometheus metric, OTel span, and alert rule emitted by the Super Legal MCP server. Operators reference it when building Grafana dashboards, debugging production incidents, or extending the observability surface. + +--- + +## Table of Contents + +1. [Scrape Endpoint & Conventions](#1-scrape-endpoint--conventions) +2. [Request, Stream & Tool Latency Histograms](#2-request-stream--tool-latency-histograms) +3. [Tool Invocation Counters](#3-tool-invocation-counters) +4. [Structured Output & Circuit Breaker](#4-structured-output--circuit-breaker) +5. [Hook Persistence Metrics (v6.8.1)](#5-hook-persistence-metrics-v681) +6. [Code Execution Metrics (v6.8.5)](#6-code-execution-metrics-v685) +7. [Auto-Reconciliation Metrics (v6.7.0)](#7-auto-reconciliation-metrics-v670) +8. [Knowledge Graph & Embedding Metrics](#8-knowledge-graph--embedding-metrics) +9. [Subagent & API Client Metrics](#9-subagent--api-client-metrics) +10. [Document Conversion Metrics](#10-document-conversion-metrics) +11. [Token Usage Counters](#11-token-usage-counters) +12. [Wave 3 Observability Errors](#12-wave-3-observability-errors) +13. [OpenTelemetry Spans](#13-opentelemetry-spans) +14. 
[Alert Rules](#14-alert-rules) +15. [Deprecation & Migration Notes](#15-deprecation--migration-notes) + +--- + +## 1. Scrape Endpoint & Conventions + +**Endpoint**: `GET /metrics` (Prometheus exposition format, `text/plain; version=0.0.4`) + +**Authentication**: none (rely on network-layer ACLs: Prometheus scrapes from the same VPC, and the endpoint is not exposed publicly). + +**Default metrics**: `client.collectDefaultMetrics()` runs once per process via `initSdkMetrics()`. Includes Node process metrics (heap, CPU, GC, file descriptors). + +**Naming convention**: every custom metric is prefixed `claude_`. Histograms end in `_ms` for milliseconds. Counters end in `_total`. Gauges have no fixed suffix (e.g., `_state`, `_pending_sessions`). + +**Cardinality discipline**: every label that takes user-supplied or tool-derived values is bounded by an enum (the one exception is the deprecated `claude_tool_invocations_total` counter; see §3). See the `KNOWN_TOOL_NAMES` set in `sdkMetrics.js` (line ~321) and `classifyPersistenceFailure()` (line ~266) for the enforcement points. + +--- + +## 2. Request, Stream & Tool Latency Histograms + +| Metric | Type | Labels | Buckets (ms) | Purpose | +|---|---|---|---|---| +| `claude_request_duration_ms` | Histogram | `path`, `model`, `status` | `[50, 100, 250, 500, 1000, 2000, 5000, 10000, 20000]` | Express request duration end-to-end. `metricsMiddleware` records `path` from `req.path`, `model` from `res.locals.model`, `status` from `res.statusCode`. | +| `claude_stream_duration_ms` | Histogram | `path`, `model`, `status` | same as above | Per-SSE-stream lifetime. Records via `recordStreamDuration({path, model, status}, ms)` at end of `/api/stream`. | +| `claude_tool_duration_ms` | Histogram | `tool_name`, `client`, `status` | `[10, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 30000, 60000]` | Wave 1 widened label set. `client` distinguishes which external API actually served the request (e.g., `direct_fetch` vs `exa_fallback` for `fetch_document`). Cardinality cap: ~50 tools × 6 clients × 3 statuses ≈ 900 series. 
| + +**Recording functions**: `recordToolDuration(labels, ms)` (Wave 1+) or `recordToolDuration(toolName, status, ms)` (legacy). The `deriveClient(toolName, hybridMetadata)` helper computes the `client` label from response metadata. + +--- + +## 3. Tool Invocation Counters + +| Metric | Type | Labels | Status | Purpose | +|---|---|---|---|---| +| `claude_tool_invocations_total` | Counter | `tool`, `status` | **DEPRECATED** (removed in v6.8.6) | Pre-v6.8.5 invocation counter with unbounded `tool` label. Still emitted during 7-day dual-emission window. | +| `claude_tool_invocations_v2_total` | Counter | `tool_name`, `status` | Active (canonical) | v6.8.5 W5.6 replacement with bounded `tool_name` label via `KNOWN_TOOL_NAMES` enum. Unknown tools fall through to `tool_name='other_tool'`. | + +**Recording function**: `incrementToolInvocation(tool, status='ok')` writes to BOTH counters during the dual-emission window. Migrate Grafana queries from the legacy counter to v2 before v6.8.6 ships. + +--- + +## 4. Structured Output & Circuit Breaker + +| Metric | Type | Labels | Purpose | +|---|---|---|---| +| `claude_structured_output_attempts_total` | Counter | `tool` | Every JSON Schema validation attempt | +| `claude_structured_output_success_total` | Counter | `tool` | Success path | +| `claude_structured_output_failures_total` | Counter | `tool` | Failure path; alerted via `StructuredOutputValidationFailure` | +| `claude_circuit_breaker_trips_total` | Counter | `domain` | Per-domain circuit breaker state changes | +| `claude_errors_total` | Counter | `code`, `path` | Untyped SDK error counter | +| `claude_thinking_blocks_total` | Counter | `path` | Anthropic thinking-block emission count | + +--- + +## 5. Hook Persistence Metrics (v6.8.1) + +Surfaces failures that the `hookDBBridge` wrapper try/catch silently swallowed pre-v6.8.1. 
+ +| Metric | Type | Labels | Purpose | +|---|---|---|---| +| `claude_hook_persistence_failures_total` | Counter | `hook`, `reason` | Hook persistence failures by hook type and bounded reason enum. Cardinality: ~7 hooks × ~10 reasons = ≤70 series. | +| `claude_hook_circuit_breaker_state` | Gauge | `hook` | Hook persistence circuit breaker state: `0=closed`, `1=half-open`, `2=open`. | +| `claude_hook_invocations_total` | Counter | `hook` | Per-hook invocation counter (success path). v6.8.5 W5.8 — closes the "only failure metrics, no success metrics" gap. Cardinality: 9 hooks. | + +**`reason` enum** (from `classifyPersistenceFailure(err)`): +- `unique_violation` (PG SQLSTATE 23505) +- `fk_violation` (23503) +- `not_null_violation` (23502) +- `connection_refused` (Node `ECONNREFUSED`) +- `connection_timeout` (Node `ETIMEDOUT` or message includes `timeout`) +- `dns_failed` (`ENOTFOUND`) +- `pool_error` (message includes `pool`) +- `envelope_non_json` (zod safeParse: not valid JSON) +- `envelope_shape_drift` (zod safeParse: shape mismatch) +- `other_db` (any 2*-prefixed SQLSTATE) +- `unknown` (fallthrough) + +**Recording functions**: `recordPersistenceFailure(hook, reason)`, `setCircuitBreakerState(hook, state)`. + +--- + +## 6. Code Execution Metrics (v6.8.5) + +| Metric | Type | Labels | Purpose | +|---|---|---|---| +| `claude_code_execution_failures_total` | Counter | `reason` | W5.7 — code execution failures by bounded reason enum. Reasons: `refusal_detected`, `timeout`, `api_error`, `container_error`, `envelope_parse_error`. | + +**Note**: code execution lifecycle is also tracked via the OTel `code_execution.lifecycle` root span — see §13. + +--- + +## 7. 
Auto-Reconciliation Metrics (v6.7.0) + +| Metric | Type | Labels | Buckets | Purpose | +|---|---|---|---|---| +| `claude_reconciliation_scans_total` | Counter | `status` | — | Scan iteration outcomes: `ok`, `error`, `skipped`, `skipped_overlap` | +| `claude_reconciliation_rebuilds_total` | Counter | `type`, `status` | — | Per-rebuild attempt. `type`: `kg` \| `artifacts`. `status`: `ok`, `failed`, `skipped_state_changed`, `skipped_breaker`, `skipped_inflight_cap`, `failed_breaker_persistent` | +| `claude_reconciliation_scan_duration_ms` | Histogram | — | `[50, 100, 250, 500, 1000, 2500, 5000, 10000, 30000, 60000, 300000, 900000]` | Scan iteration duration | +| `claude_reconciliation_pending_sessions` | Gauge | `type` | — | Sessions awaiting reconciliation. Updated via `/health` cache (30s TTL). `type`: `kg` \| `artifacts` | + +--- + +## 8. Knowledge Graph & Embedding Metrics + +| Metric | Type | Labels | Buckets | Purpose | +|---|---|---|---|---| +| `claude_kg_build_total` | Counter | `status` | — | KG build attempts. Wave 4.5. | +| `claude_kg_build_duration_ms` | Histogram | `status` | `[1000, 5000, 10000, 30000, 60000, 120000, 300000]` | KG build latency. | +| `claude_embedding_duration_ms` | Histogram | `operation`, `status` | `[50, 100, 250, 500, 1000, 2500, 5000, 10000]` | Embedding operation latency (Gemini calls + pgvector writes). | +| `claude_gate_check_results_total` | Counter | `agent_type`, `status` | — | Gate check results per subagent. | + +--- + +## 9. Subagent & API Client Metrics + +| Metric | Type | Labels | Buckets | Purpose | +|---|---|---|---|---| +| `claude_subagent_duration_ms` | Histogram | `agent_type`, `status` | `[1000, 5000, 10000, 30000, 60000, 120000, 300000, 600000]` | Wave 4 Phase A. Per-subagent execution latency. | +| `claude_api_client_results_total` | Counter | `tool_name`, `fetch_source`, `outcome` | — | Wave 4 Phase A. Tracks native vs Exa-fallback usage by tool. | + +--- + +## 10. 
Document Conversion Metrics + +| Metric | Type | Labels | Buckets | Purpose | +|---|---|---|---|---| +| `claude_document_conversion_duration_ms` | Histogram | `format`, `status` | `[100, 500, 1000, 2500, 5000, 10000, 30000, 60000]` | Pandoc/Typst conversion latency. | +| `claude_document_conversion_errors_total` | Counter | `format`, `error_type` | — | Conversion errors. | + +--- + +## 11. Token Usage Counters + +| Metric | Type | Labels | Purpose | +|---|---|---|---| +| `claude_tokens_input_total` | Counter | `model` | Input tokens per model | +| `claude_tokens_output_total` | Counter | `model` | Output tokens per model | +| `claude_tokens_cached_total` | Counter | `model` | Cache read tokens (legacy) | +| `claude_cache_read_tokens_total` | Counter | `model` | Cache read input tokens | +| `claude_cache_creation_tokens_total` | Counter | `model` | Cache creation input tokens | + +**Recording function**: `recordTokens({model, input, output, cached, cacheRead, cacheCreation})`. + +--- + +## 12. Wave 3 Observability Errors + +| Metric | Type | Labels | Purpose | +|---|---|---|---| +| `claude_observability_errors_total` | Counter | `error_code`, `module` | Typed errors from `src/utils/errors.js` | +| `claude_backpressure_shed_total` | Counter | `module` | Work items shed due to backpressure | + +--- + +## 13. OpenTelemetry Spans + +OpenTelemetry auto-instrumentation captures Express, HTTP, and pg spans automatically when `OTEL_ENABLED=true`. Manual spans on the raw source pipeline and code execution lifecycle are listed below. 
+ +### 13.1 Wave 3 Manual Spans (v6.2.2) + +Seven manual spans wrap `RawSourceService.persist()`: + +| Span Name | Attributes | +|---|---| +| `raw_source.persist` | `session_id`, `tool_name`, `agent_type`, `content_length` | +| `raw_source.hash` | `hash_method='sha256'`, `content_length` | +| `raw_source.sanitize` | `sanitizer_version`, `redactions_count` | +| `raw_source.dedup` | `cache_hit`, `existing_source_id` | +| `raw_source.pool_write` | `pool_state`, `connection_acquire_ms` | +| `raw_source.manifest_append` | `manifest_path`, `bytes_written` | +| `raw_source.gcs_archive` | `bucket`, `object_path`, `worm_class` | + +### 13.2 Code Execution Lifecycle Span (v6.8.5 W5.1) + +| Span Name | Attributes | +|---|---| +| `code_execution.lifecycle` | `model_id`, `system_prompt_hash`, `python_code_hash`, `container_id`, `git_sha`, `sdk_version`, `turn_count`, `pause_count`, `chart_count`, `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_creation_tokens`, `stop_reason`, `refusal_detected` | + +Wraps the full multi-turn execution including pause-turn continuations. Sampler config: +- `OTEL_TRACES_SAMPLER=parentbased_traceidratio` (default) +- `OTEL_TRACES_SAMPLER_ARG=0.1` (10% sampling — v6.8.5 W5.1; tunable to bound Cloud Trace cost) + +--- + +## 14. Alert Rules + +Defined in `prometheus/alerts.yml` (operator copy-paste source) and `src/config/alertingRules.js` (documentation export). The alert rules below are the Prometheus YAML; the JS export mirrors them. + +### 14.1 Tool & Latency Alerts + +| Alert | Severity | Expression | For | Description | +|---|---|---|---|---| +| `ClaudeToolErrorRateHigh` | warning | `rate(claude_tool_invocations_v2_total{status="error"}[5m]) / rate(claude_tool_invocations_v2_total[5m]) > 0.05` | 5m | Tool error rate above 5%. Migrated to v2 counter in v6.8.5. 
| +| `ClaudeLatencyRegression` | warning | `histogram_quantile(0.95, sum(rate(claude_request_duration_ms_bucket[5m])) by (le)) > 10000` | 10m | P95 request latency above 10s | +| `StructuredOutputValidationFailure` | critical | `rate(claude_structured_output_failures_total[5m]) / rate(claude_structured_output_attempts_total[5m]) > 0.02` | 5m | Structured output validation failure rate above 2% | +| `CircuitBreakerTripping` | critical | `increase(claude_circuit_breaker_trips_total[15m]) > 3` | 1m | Circuit breaker tripped more than 3 times in 15 minutes | +| `RateLimitExhaustion` | warning | `sum(rate(claude_errors_total{code="RATE_LIMIT_ERROR"}[5m])) > 10` | 5m | Rate limit errors above 10 per second (`rate()` returns a per-second value) | + +### 14.2 Reconciliation Alerts (v6.7.0) + +| Alert | Severity | Expression | For | Description | +|---|---|---|---|---| +| `ReconciliationKgBacklog` | warning | `claude_reconciliation_pending_sessions{type="kg"} > 50` | 10m | KG reconciliation backlog above 50 sessions | +| `ReconciliationKgCritical` | critical | `claude_reconciliation_pending_sessions{type="kg"} > 100` | 5m | KG reconciliation backlog critical (>100 sessions); loop draining slower than ingest | +| `ReconciliationArtifactsBacklog` | warning | `claude_reconciliation_pending_sessions{type="artifacts"} > 50` | 10m | Artifacts reconciliation backlog above 50 sessions | +| `ReconciliationScanSlow` | warning | `histogram_quantile(0.95, sum(rate(claude_reconciliation_scan_duration_ms_bucket[1h])) by (le)) > 900000` | 15m | Reconciliation scan P95 above 15 min; likely 15-min Promise.race timeouts firing | +| `ReconciliationScanErrors` | warning | `rate(claude_reconciliation_scans_total{status="error"}[1h]) > 0.0003` | 30m | Reconciliation scan error rate above ~1/hour | + +### 14.3 Hook Persistence Alerts (v6.8.2) + +| Alert | Severity | Expression | For | Description | +|---|---|---|---|---| +| `HookPersistenceFailures` | warning | `sum by (hook, reason) 
(rate(claude_hook_persistence_failures_total{reason!="unknown"}[5m])) > 0` | 5m | Hook persistence failure with any classified reason (`unknown` is excluded by the expression). Per-hook and per-reason labels are exposed in the alert. | +| `HookCircuitBreakerOpen` | critical | `max by (hook) (claude_hook_circuit_breaker_state) >= 2` | 2m | Hook circuit breaker open: persistence is being skipped and rows are being lost. The 2m `for` window absorbs cold-start churn during rolling deploys. | +| `HookEnvelopeShapeDrift` | critical | `sum(rate(claude_hook_persistence_failures_total{reason="envelope_shape_drift"}[5m])) > 0` | 1m | Tool envelope shape drift. Likely cause: SDK upgrade or upstream API field rename. Update the schema (not the test mock). The `for` window is 1m because silent data loss starts immediately on drift. | + +--- + +## 15. Deprecation & Migration Notes + +### 15.1 `claude_tool_invocations_total` → `_v2_total` (v6.8.5 → v6.8.6) + +**Status**: 7-day dual-emission window. Both counters increment until v6.8.6 ships, then the legacy counter is removed. + +**Why**: pre-v6.8.5 the `tool` label accepted any string the caller passed, leading to unbounded label cardinality on Grafana. The v2 counter uses a bounded `tool_name` enum (`KNOWN_TOOL_NAMES` set in `sdkMetrics.js`); unknown tools fall through to `'other_tool'`. + +**Action for operators**: +1. Update Grafana panels and Prometheus alerts to query `claude_tool_invocations_v2_total{tool_name=...}` instead of `claude_tool_invocations_total{tool=...}`. +2. The `ClaudeToolErrorRateHigh` alert in `alertingRules.js` and `prometheus/alerts.yml` was updated in v6.8.5 W5.6 and is already on the v2 counter. +3. Verify `KNOWN_TOOL_NAMES` covers your top-50 tools by query volume; unknown tools collapse to `tool_name='other_tool'`. + +### 15.2 Future cardinality watch + +Three label sets are bounded today but worth monitoring as the platform grows: +- `claude_tool_duration_ms{tool_name, client, status}`: currently ≈900 series; new clients (DirectFetch v2, FMP-secondary, etc.) 
widen the matrix +- `claude_hook_persistence_failures_total{hook, reason}` — capped at ~70 series; if new hooks land, recount +- `claude_api_client_results_total{tool_name, fetch_source, outcome}` — bounded by `KNOWN_TOOL_NAMES` × ~3 sources × ~5 outcomes + +If any series count exceeds 5000 in production, audit label sources before the Prometheus TSDB pressure becomes operationally visible. + +--- + +## Appendix: Recording Function Quick Reference + +| Function | Records to | +|---|---| +| `recordStreamDuration({path, model, status}, ms)` | `claude_stream_duration_ms` | +| `recordToolDuration(labels, ms)` | `claude_tool_duration_ms` | +| `incrementToolInvocation(tool, status)` | `claude_tool_invocations_total` (legacy) + `_v2_total` (during dual-emission) | +| `recordStructuredOutputAttempt(tool)` | `claude_structured_output_attempts_total` | +| `recordStructuredOutputSuccess(tool)` | `claude_structured_output_success_total` | +| `recordStructuredOutputFailure(tool)` | `claude_structured_output_failures_total` | +| `recordCircuitBreakerTrip(domain)` | `claude_circuit_breaker_trips_total` | +| `recordError(code, path)` | `claude_errors_total` | +| `recordThinkingBlock(path)` | `claude_thinking_blocks_total` | +| `recordSubagentDuration(agent_type, status, ms)` | `claude_subagent_duration_ms` | +| `recordApiClientResult(tool, fetch_source, outcome)` | `claude_api_client_results_total` | +| `recordDocumentConversion(format, status, ms)` | `claude_document_conversion_duration_ms` | +| `recordDocumentConversionError(format, error_type)` | `claude_document_conversion_errors_total` | +| `recordTokens({model, input, output, cached, cacheRead, cacheCreation})` | 5 token counters | +| `recordObservabilityError(error_code, module)` | `claude_observability_errors_total` | +| `recordBackpressureShed(module)` | `claude_backpressure_shed_total` | +| `recordEmbeddingDuration(operation, status, ms)` | `claude_embedding_duration_ms` | +| `recordGateCheckResult(agent_type, status)` | 
`claude_gate_check_results_total` | +| `recordKgBuild(status)` | `claude_kg_build_total` | +| `recordKgBuildDuration(status, ms)` | `claude_kg_build_duration_ms` | +| `recordReconciliationScan(status)` | `claude_reconciliation_scans_total` | +| `recordReconciliationRebuild(type, status)` | `claude_reconciliation_rebuilds_total` | +| `recordReconciliationScanDuration(ms)` | `claude_reconciliation_scan_duration_ms` | +| `setReconciliationPending(type, count)` | `claude_reconciliation_pending_sessions` | +| `recordPersistenceFailure(hook, reason)` | `claude_hook_persistence_failures_total` | +| `setCircuitBreakerState(hook, state)` | `claude_hook_circuit_breaker_state` | +| `classifyPersistenceFailure(err)` | (helper — returns bounded `reason` enum) | + +--- + +**Reference docs**: +- Operator audit-export runbook: `docs/runbooks/v6.8.5-audit-export.md` +- Reconciliation runbook: `docs/runbooks/v6.7.0-session-reconciliation.md` +- Feature flag registry: `docs/feature-flags.md` §31a (OTel sampler) +- System design v6.8.5: `company-strategy/system-design.md` §14c
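
As an illustration of the bounded `reason` enum that `classifyPersistenceFailure(err)` returns (§5 and the last row of the appendix), the following is a minimal sketch rather than the code in `src/utils/sdkMetrics.js`. It assumes errors expose a string `code` (a PostgreSQL SQLSTATE or a Node.js system error code) and a `message`. The order of the checks and the `envelopeKind` marker used to spot envelope validation failures are hypothetical choices made for this example.

```javascript
// Illustrative sketch only; the real helper may detect or order cases differently.
function classifyPersistenceFailure(err) {
  const code = err && typeof err.code === 'string' ? err.code : '';
  const message =
    err && typeof err.message === 'string' ? err.message.toLowerCase() : '';

  // PostgreSQL constraint violations (SQLSTATE class 23).
  if (code === '23505') return 'unique_violation';
  if (code === '23503') return 'fk_violation';
  if (code === '23502') return 'not_null_violation';

  // Node.js network errors and pool problems.
  if (code === 'ECONNREFUSED') return 'connection_refused';
  if (code === 'ETIMEDOUT' || message.includes('timeout')) return 'connection_timeout';
  if (code === 'ENOTFOUND') return 'dns_failed';
  if (message.includes('pool')) return 'pool_error';

  // Envelope validation failures. `envelopeKind` is an assumed marker field
  // that a caller would set after a failed schema parse.
  if (err && err.envelopeKind === 'non_json') return 'envelope_non_json';
  if (err && err.envelopeKind === 'shape_drift') return 'envelope_shape_drift';

  // Any remaining five-character SQLSTATE beginning with "2".
  if (/^2[0-9A-Z]{4}$/.test(code)) return 'other_db';

  // Fallthrough keeps the label set closed.
  return 'unknown';
}
```

Because every branch returns a fixed string, the `reason` label on `claude_hook_persistence_failures_total` cannot grow beyond the enum, whatever error text the database driver or network stack produces.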