`super-legal-mcp-refactored/CHANGELOG.md`

All notable changes to the Super Legal MCP Server are documented in this file.

## [6.8.5.1] - 2026-05-06 — Deferred documentation patch (PR #93)

### Added
- `docs/metrics-catalog.md` (NEW, 305 lines) — comprehensive Prometheus + OTel reference enumerating 33 metrics across 12 categories, 13 alert rules, and 8 OpenTelemetry manual spans (7 Wave 3 raw_source spans + `code_execution.lifecycle` root span), with a recording-function quick reference for code authors. Source: `src/utils/sdkMetrics.js`, `src/config/alertingRules.js`, `prometheus/alerts.yml`.
- `docs/api-reference.md` (NEW, 465 lines) — operator/regulator-facing endpoint catalog covering 40+ routes across 8 categories: health/metrics, authentication, audit & compliance (audit-export + transcript), admin governance (user lifecycle + session governance + reconciliation), knowledge graph (graph/neighbors/evolution/provenance/raw-sources), search (full-text + semantic + artifacts), reports & raw sources, and document conversion. Documents the two-layer auth model (`cookieAuthMiddleware` + `requireAdmin`) and access-audit middleware coverage, and includes a Common Response Conventions appendix.

### Changed
- `company-strategy/system-design.md` — refreshed v6.2.3 → v6.8.5. New §14c "v6.7–v6.8.5 — Reconciliation, Transcript Persistence & Wave 5 Compliance Machinery" with 9 subsections covering every release between Wave 3 and the audit-export endpoint. Topology counts updated throughout: 42→45 subagents, 36→38 hybrid clients, 149+→197 tools.
- `company-strategy/gtm-positioning-strategy.md` — preserves April 2026 institutional voice. Topology counts updated. New §3.8 Compliance & Audit Posture (EU AI Act Art. 12-15 + GDPR Art. 17/20/32 + SEC 17a-4 mapping tables). New §3.9 The Equity Research Layer (FMP integration narrative).
- `company-strategy/gtm-sales-playbook.md` — Step 4 Output Delivery adds audit-export bundle as closing artifact. §10.1 split into Wave 3 + Wave 5 artifact sections. New §10.2 Common Regulatory Objections (4 scripted Q&A items for EU AI Act, GDPR, reproducibility, hallucinated citations).

### Out of scope (deferred to v6.8.6)
- `docs/database-enhancements/database-enhancements.md` schema refresh (still v1.1.0)
- `CLAUDE.md` creation
- Field-level encryption for `python_code` column
- P1 brand-doc count sweep (`website-spec.md`, `zee-super-legal-strategy-brand.md`, `aperture-ai-governance-positioning.md`)

## [6.8.5] - 2026-05-06 — Wave 5 compliance machinery + FMP equity-analyst (PR #92)

Two independent scopes shipped together as PR #92: regulator-facing transparency machinery (Phase 2A–2D + W5.1–W5.10) and the FMP equity-analyst standalone subagent integration. Documented separately below for clarity; both land at `main` HEAD `98a1a406`.

### Added — Wave 5 compliance machinery (Phase 2A–2D, W5.1–W5.10)

**Citation source bridge** (`citation_source_links` table + `src/utils/citationParser.js`) — closes the loop from memo footnote → archived raw source. Fuzzy URL/title matching with confidence scores; supports exact URL match, fuzzy URL, fuzzy title, and embedding cosine match modes. Answers "show me the source for footnote 47" in milliseconds.
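The tiered matching described above can be sketched as follows: exact URL first, then a normalized URL comparison, then title similarity, each returning a confidence score. This is a minimal illustration only. `normalizeUrl`, `titleSimilarity`, `matchCitation`, the mode labels, the 0.95 fuzzy-URL confidence, and the 0.6 title threshold are all hypothetical and are not taken from `src/utils/citationParser.js`; the embedding-cosine mode is omitted.

```javascript
// Hypothetical sketch of tiered citation-to-source matching; not the
// actual citationParser.js logic. Confidence values are illustrative.

// Normalize a URL so trivially different forms compare equal:
// drop scheme, "www.", query string, fragment and trailing slashes.
function normalizeUrl(raw) {
  try {
    const u = new URL(raw);
    const host = u.hostname.toLowerCase().replace(/^www\./, "");
    const path = u.pathname.replace(/\/+$/, "");
    return `${host}${path}`;
  } catch {
    return null; // not a parseable absolute URL
  }
}

// Jaccard similarity over lowercase word tokens, in [0, 1].
function titleSimilarity(a, b) {
  const tokens = (s) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) || []);
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared += 1;
  return shared / (ta.size + tb.size - shared);
}

// Return the best match for one citation, or null if nothing clears
// the (hypothetical) 0.6 title-similarity threshold.
function matchCitation(citation, sources) {
  for (const s of sources) {
    if (citation.url && citation.url === s.url) {
      return { sourceId: s.id, mode: "exact_url", confidence: 1.0 };
    }
  }
  const cu = citation.url ? normalizeUrl(citation.url) : null;
  if (cu) {
    for (const s of sources) {
      if (normalizeUrl(s.url) === cu) {
        return { sourceId: s.id, mode: "fuzzy_url", confidence: 0.95 };
      }
    }
  }
  let best = null;
  for (const s of sources) {
    const score = titleSimilarity(citation.title || "", s.title || "");
    if (score >= 0.6 && (!best || score > best.confidence)) {
      best = { sourceId: s.id, mode: "fuzzy_title", confidence: score };
    }
  }
  return best;
}
```

Ordering the tiers from strictest to loosest means a cheap, unambiguous match always wins over a similarity guess, and the returned confidence lets a reviewer decide which fuzzy links need a human check.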

**GDPR Article 17 erasure boundary** (`redactSessionEventData()` in `src/utils/retentionManager.js`) — UPDATE not DELETE; overwrites JSONB content paths to `[REDACTED]` while preserving row structure for SEC 17a-4 metadata retention. Idempotent. Wired at offboarding-time via `client-offboarding` skill (Step 6.5: redact before `gcloud sql export`), NOT at admin read-time. Regulators querying the audit-export endpoint require complete data; redaction is the erasure boundary, not the access boundary.
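The "UPDATE not DELETE" idea can be illustrated on an in-memory object: content leaves are overwritten with `[REDACTED]`, every key survives, and a second pass changes nothing. This is a sketch under stated assumptions. `redactPaths` and the example paths are hypothetical, and the real `redactSessionEventData()` operates on Postgres JSONB columns rather than JavaScript objects.

```javascript
// Hypothetical sketch of the redaction idea: overwrite selected content
// paths with "[REDACTED]" while every key (the row "shape") stays in place.
// The real redactSessionEventData() operates on Postgres JSONB via UPDATE.
const REDACTED = "[REDACTED]";

// paths: arrays of keys, e.g. [["tool_input", "prompt"], ["result", "text"]]
function redactPaths(eventData, paths) {
  // JSONB content is plain JSON, so a JSON round-trip is a safe deep copy
  // and the caller's object is never mutated.
  const copy = JSON.parse(JSON.stringify(eventData));
  for (const path of paths) {
    let node = copy;
    // Walk to the parent of the leaf, stopping if a non-object is reached.
    for (let i = 0; i < path.length - 1 && node !== null && typeof node === "object"; i += 1) {
      node = node[path[i]];
    }
    const leaf = path[path.length - 1];
    // Only overwrite leaves that already exist, so no keys are added or removed.
    if (node !== null && typeof node === "object" && leaf in node) {
      node[leaf] = REDACTED;
    }
  }
  return copy;
}
```

Because only existing leaves are touched and the replacement is a constant, re-running the redaction is a no-op, which is the idempotence property the entry relies on.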

**Regulator-facing audit-export endpoint** (`GET /api/session/:sessionKey/audit-report`) — aggregates the complete audit trail per session: `code_executions` with all 13 traceability fields + `python_code_length`; `code_execution_inputs` data lineage counts; `hook_audit_log` event sequence with `bridge_metadata` per code execution; `human_interventions` (Wave 3 admin actions); `access_log` (Wave 3 evidence-read trail); `citation_source_links` (memo cell → source). Auth: `cookieAuthMiddleware` server-wide + `createAccessAuditMiddleware('session_data')` router-wide. Format: JSON default, `?format=csv` for CSV.gz. Designed for EU AI Act Article 13 transparency demands.

**OTel `code_execution.lifecycle` root span** (Phase 2C / W5.1, `src/tools/codeExecutionBridge.js`) — wraps the full multi-turn execution including pause-turn continuations. Span attributes include all 13 traceability fields plus turn count, pause count, chart count, and `refusal_detected`.

**Sampler config** (W5.1) — `OTEL_TRACES_SAMPLER=parentbased_traceidratio` + `OTEL_TRACES_SAMPLER_ARG=0.1` (10% sampling default) to bound Cloud Trace cost. Tunable via Cloud Run revision env. Plumbed through `flags.env` → `deploy.sh` → container `--container-env`.
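With `parentbased_traceidratio`, a root span is sampled deterministically from its trace ID at the configured ratio, and child spans inherit the parent's decision, so a trace is kept or dropped as a whole. The function below is a simplified illustration of that rule. OpenTelemetry SDKs use their own trace-ID hashing, so `shouldSample` and its use of the last 8 hex digits are assumptions, not the SDK algorithm.

```javascript
// Simplified illustration of parent-based, trace-ID-ratio sampling.
// Real OpenTelemetry SDKs implement their own trace-ID hashing.
function shouldSample(traceId, ratio, parentSampled) {
  // Child spans follow the parent's decision so traces stay complete.
  if (parentSampled !== undefined) return parentSampled;
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // Root span: compare the last 8 hex digits of the 32-hex-digit trace ID
  // against ratio * 2^32. The same trace ID always gets the same answer.
  const low = parseInt(traceId.slice(-8), 16);
  return low < Math.floor(ratio * 0x100000000);
}

// The ratio would come from OTEL_TRACES_SAMPLER_ARG (0.1 by default here).
const ratio = Number(process.env.OTEL_TRACES_SAMPLER_ARG ?? "0.1");
```

Because the decision is a pure function of the trace ID, every service that sees the same trace reaches the same verdict, which keeps sampled traces complete while bounding Cloud Trace volume to roughly the configured fraction.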

**Zod tool envelope schemas** (Phase 2B, `src/schemas/toolEnvelopes.js`) — five strict schemas for the highest-volume tools (`run_python_analysis`, `generate_chart`, `search_sec_filings`, `get_court_opinions`, `analyze_patent`). Each tool's input is validated before persistence; mismatches emit `claude_hook_persistence_failures_total{reason='envelope_shape_drift'}`. Drift canary for Anthropic SDK upgrades.

**Subagent CAPABILITY constants** (Phase 2B) — instrumented across 7 subagents. Each declares its tool surface, output schema, and `data_provenance` claim. Surfaces in `code_execution_inputs` for full lineage queries.

**Operator runbook** (W5.10, `docs/runbooks/v6.8.5-audit-export.md`) — endpoint usage, response shape, PII handling, escalation paths, CSV format reference.

### Added — FMP equity-analyst integration

New `equity-analyst` subagent + 36-tool FMP client + 11 code-execution models, gated behind the `FMP_ENABLED=false` feature flag (production activation requires an FMP Enterprise contract + Data Display & Redistribution Agreement, on a parallel commercial track expected to take 4–8 weeks).

- Production activation (FMP_ENABLED=true) gated behind contract signing
- Cross-quarter transcript embedding via existing embeddingService.js pipeline

### Changed (Wave 5)
- Metric `claude_tool_invocations_total{tool}` deprecated in favor of `claude_tool_invocations_v2_total{tool_name}` with bounded `KNOWN_TOOL_NAMES` enum (prevents cardinality explosion). 7-day dual-emission window before removal in v6.8.6.
- `alertingRules.js` `ClaudeToolErrorRateHigh` migrated to v2 counter (W5.6).
- `dbFrontendRouter.js:1215` audit-export endpoint comment corrected (PII redaction fires at offboarding-time, not access-time — regulators require complete data).
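A bounded label enum plus a dual-emission window can be sketched as below. The set contents are the five tools named elsewhere in this changelog, not the real `KNOWN_TOOL_NAMES` list, and collapsing unknown names into an `"other"` label is an assumption about how the bound is enforced. `counters` stands in for whatever registry is in use (for example prom-client `Counter` objects, whose `inc()` accepts a labels object).

```javascript
// Hypothetical sketch of bounding a metric label to a known enum.
// The names below are the five tools listed in this changelog; the real
// KNOWN_TOOL_NAMES list is not reproduced here.
const KNOWN_TOOL_NAMES = new Set([
  "run_python_analysis",
  "generate_chart",
  "search_sec_filings",
  "get_court_opinions",
  "analyze_patent",
]);

// Any tool name outside the enum collapses into a single "other" series,
// so an unexpected name cannot create new time series.
function toolNameLabel(name) {
  return KNOWN_TOOL_NAMES.has(name) ? name : "other";
}

// Dual emission during the deprecation window: the legacy counter keeps its
// raw label, the v2 counter gets the bounded one.
function recordToolInvocation(counters, name) {
  counters.legacy.inc({ tool: name });
  counters.v2.inc({ tool_name: toolNameLabel(name) });
}
```

The v2 series count is then capped at the enum size plus one, regardless of how many distinct tool names appear at runtime.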

## [6.8.4] - 2026-05-06 — Code execution traceability for regulator audit

### Added — 13 reproducibility columns on `code_executions` (commits `40d4a01b`, `9227bd55`)

Every `run_python_analysis` execution is now byte-replayable from the audit log alone. Pre-v6.8.4 the bridge logged success/failure and chart count; the actual prompt, generated code, model identity, and runtime environment were ephemeral. EU AI Act Article 12 (logging) and Article 15 (reproducibility) require the complete picture.

**Schema additions** (via `ALTER TABLE ADD COLUMN IF NOT EXISTS` per dual-path convention; migration `013_code-executions-model-id.up.sql` + runtime DDL in `src/db/postgres.js`):
- `model_id` — exact `claude-sonnet-4-6-...` revision
- `llm_name` — provider identity (e.g., `anthropic`)
- `anthropic_request_id` — server-side correlation ID for replay against Anthropic logs
- `input_tokens`, `output_tokens` — per-execution token counts
- `cache_read_tokens`, `cache_creation_tokens` — prompt-caching metrics
- `system_prompt_hash` — SHA-256 of the system prompt (detects prompt drift)
- `python_code` — full generated Python source (TEXT, no cap; bounded by the 15-block multi-turn limit)
- `python_code_hash` — SHA-256 for deduplication
- `container_id` — Anthropic code-execution sandbox identifier
- `tool_use_id` — exact correlation to PostToolUse hook
- `stop_reason` — `end_turn` | `pause_turn` | `refusal` | `max_tokens`
- `refusal_detected` — boolean
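The two hash columns can be illustrated with Node's built-in `crypto` module. This sketch assumes the digest is taken over the UTF-8 text and stored as lowercase hex; the changelog names the algorithm but not the encoding, and `sha256Hex` and `countByCodeHash` are hypothetical helpers.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 hex digest, as assumed for system_prompt_hash / python_code_hash.
// Any edit to the text, even whitespace, changes the digest, which is what
// makes prompt drift detectable.
function sha256Hex(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Example of hash-based deduplication: count executions that share
// byte-identical generated code.
function countByCodeHash(executions) {
  const counts = new Map();
  for (const e of executions) {
    const h = sha256Hex(e.python_code);
    counts.set(h, (counts.get(h) || 0) + 1);
  }
  return counts;
}
```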

**Data lineage junction** — `code_execution_inputs` table links each execution to upstream subagent reports/embeddings/KG nodes. Enables data-lineage queries ("which subagent's output drove this DCF result?").

**`bridge_metadata` JSONB column on `hook_audit_log`** — captured at PostToolUse time with `{git_sha, sdk_version, container_id, system_prompt_hash}`. Combined with `python_code` from `code_executions`, gives a complete reproducibility envelope.

**`COMMIT_SHA` build arg** (`Dockerfile`, `src/utils/buildVersion.js`):
```dockerfile
ARG COMMIT_SHA=unknown
ENV COMMIT_SHA=${COMMIT_SHA}
```
The `deploy` and `client-provisioner` skills now invoke `docker build --build-arg COMMIT_SHA=$(git rev-parse HEAD)` automatically. Without it, `bridge_metadata.git_sha='unknown'` (graceful but information-poor for regulator replay).
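Putting the pieces together, a `bridge_metadata` value could be assembled as sketched below. `buildBridgeMetadata` is a hypothetical helper; where the SDK version and container ID come from is assumed to be the caller's responsibility, and the hash encoding (UTF-8 in, lowercase hex out) is an assumption.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical builder for the bridge_metadata JSONB value described above.
// Field names match the changelog; the function itself is illustrative.
function buildBridgeMetadata({ env = process.env, sdkVersion, containerId, systemPrompt }) {
  return {
    // Inside the image COMMIT_SHA is always set (the Dockerfile default is
    // "unknown"); this fallback covers processes started outside the image.
    git_sha: env.COMMIT_SHA || "unknown",
    sdk_version: sdkVersion ?? null,
    container_id: containerId ?? null,
    system_prompt_hash: createHash("sha256").update(systemPrompt, "utf8").digest("hex"),
  };
}
```

Passing `env` explicitly keeps the builder testable without touching `process.env`.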

### Added — `idx_audit_session_tool` index (`migrations/014_audit-log-model-id-index.up.sql`)

Composite index on `hook_audit_log(session_id, tool_name)` to support regulator query patterns from the audit-export endpoint (W5.10). Uses non-CONCURRENT `CREATE INDEX IF NOT EXISTS`, which holds a SHARE lock on `hook_audit_log` (blocking writes, not reads) while the index builds at boot. Watch the first restart on a busy production DB; if writes block for more than 5s, follow up with `CREATE INDEX CONCURRENTLY` (which cannot run inside a transaction block).

### Fixed — Post-audit polish (4 P1–P4 findings, commit `9227bd55`)

Closes audit findings from the v6.8.4 code review. Mostly documentation-grade fixes, plus one operational hardening: `bridge_metadata` crash-path coverage (now logged on bridge faults, not just on success). Also adds the exact-correlation `tool_use_id` column (previously inferred) and extends the CI workflow path filter to the new files (`Dockerfile`, `flags.env`).

## [6.8.2] - 2026-05-05 — Schema bootstrap split + alert rules + durability

### Fixed — `BACKFILL_RECONCILIATION_STATUS_DDL` IF EXISTS gate (commit `6eb60f71`)

Before v6.8.2, a fresh-DB boot would throw because `BACKFILL_RECONCILIATION_STATUS_DDL` referenced `kg_status` before the column was created on first provisioning. The fix wraps the UPDATE in an `information_schema.tables` IF EXISTS gate, which is idempotent on both fresh and existing databases. Closes a new-client provisioning hang identified by walking through a hypothetical `client-provisioner` skill cold start.

### Added — 3 Prometheus alert rules

- `HookPersistenceFailures` (warning, 5m) — fires on any non-`unknown` reason in `claude_hook_persistence_failures_total{reason!="unknown"}`. Per-hook + per-reason labels exposed in alert.
- `HookCircuitBreakerOpen` (critical, 2m) — fires when `claude_hook_circuit_breaker_state >= 2` (open). Persistence is being skipped; rows are being lost. 2m threshold absorbs cold-start churn during rolling deploys.
- `HookEnvelopeShapeDrift` (critical, 1m) — fires on `claude_hook_persistence_failures_total{reason="envelope_shape_drift"}`. Likely cause: SDK upgrade or upstream API field rename. Update the schema (not the test mock). 1m `for` duration because silent data loss starts immediately on drift.
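The gauge values that `HookCircuitBreakerOpen` keys on (0=closed, 1=half-open, 2=open) follow the classic three-state breaker. The class below is a minimal illustration of that state machine; its thresholds, timing, and API are hypothetical and are not the project's `CircuitBreaker`. The `onState` callback is where a call such as `setCircuitBreakerState(hook, state)` would plug in.

```javascript
// Minimal three-state circuit breaker, illustrating the gauge encoding used
// by claude_hook_circuit_breaker_state (0=closed, 1=half-open, 2=open).
// Thresholds and timing are hypothetical, not the project's CircuitBreaker.
const STATE = { CLOSED: 0, HALF_OPEN: 1, OPEN: 2 };

class CircuitBreaker {
  constructor({ failureThreshold = 5, resetAfterMs = 30000, now = Date.now, onState = () => {} } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
    this.onState = onState; // e.g. (s) => setCircuitBreakerState(hook, s)
    this.failures = 0;
    this.openedAt = 0;
    this.setState(STATE.CLOSED);
  }

  setState(s) {
    this.state = s;
    this.onState(s);
  }

  // Returns false while open: the caller skips persistence, so rows are
  // lost, which is why the open state warrants a critical alert.
  allowRequest() {
    if (this.state === STATE.OPEN && this.now() - this.openedAt >= this.resetAfterMs) {
      // Trial requests are allowed until the next success or failure decides.
      this.setState(STATE.HALF_OPEN);
    }
    return this.state !== STATE.OPEN;
  }

  recordSuccess() {
    this.failures = 0;
    if (this.state !== STATE.CLOSED) this.setState(STATE.CLOSED);
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === STATE.HALF_OPEN || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
      this.setState(STATE.OPEN);
    }
  }
}
```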

### Fixed — 6 fire-and-forget durability gaps via `backgroundTasks` Set (commit `55d2e03a`)

Six in-flight async writes were not registered with the `backgroundTasks` Set introduced in v6.6.0, meaning SIGTERM mid-flight could lose them. Fixed: kg-build daemon, embedding daemon, transcript flush, citation persistence, audit log INSERT, archive-old-sessions sweep. Confirms graceful-shutdown semantics across every async write path.
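The registration pattern can be sketched as a `Set` of in-flight promises that removes each entry when it settles, plus a drain step bounded by a deadline. `track` and `drainBackgroundTasks` are hypothetical names, and the 8000 ms value in the wiring comment is an assumption chosen to fit inside Cloud Run's 10-second SIGTERM grace period.

```javascript
// Hypothetical sketch of a backgroundTasks registry: every fire-and-forget
// write is tracked until it settles, and shutdown awaits whatever is left.
const backgroundTasks = new Set();

function track(promise) {
  const task = Promise.resolve(promise)
    .catch((err) => console.warn("background task failed:", err.message))
    .finally(() => backgroundTasks.delete(task));
  backgroundTasks.add(task);
  return task;
}

// Wait for in-flight writes, bounded by a deadline so shutdown cannot hang.
async function drainBackgroundTasks(timeoutMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });
  await Promise.race([Promise.allSettled([...backgroundTasks]), deadline]);
  clearTimeout(timer);
  return backgroundTasks.size; // tasks still pending after the drain
}

// Typical wiring (not executed here):
// process.on("SIGTERM", async () => { await drainBackgroundTasks(8000); process.exit(0); });
```

A write that is never passed to `track` is invisible to the drain, which is exactly the gap the six fixes above close.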

## [6.8.1] - 2026-05-05 — Hook persistence metrics + envelope validation + durability fix

### Added — Prometheus persistence observability (commit `6fd7832a`)

Surfaces failures that, before v6.8.1, the `hookDBBridge` wrapper's try/catch (line ~1505) swallowed with only a `console.warn`. Day 6.H's 30s test hang was the canary; in production the same class of regression could drop rows for hours with no metric, no alert, and no visibility.

- `claude_hook_persistence_failures_total{hook, reason}` — bounded enum: `unique_violation`, `fk_violation`, `not_null_violation`, `connection_refused`, `connection_timeout`, `dns_failed`, `pool_error`, `envelope_non_json`, `envelope_shape_drift`, `other_db`, `unknown`. Cardinality cap: ~7 hooks × 11 reasons ≈ 77 series.
- `claude_hook_circuit_breaker_state{hook}` gauge — 0=closed, 1=half-open, 2=open.
- `classifyPersistenceFailure(err)` helper — maps PG SQLSTATE (23505/23503/23502/...) + Node syscall codes (`ECONNREFUSED`/`ETIMEDOUT`/`ENOTFOUND`) + common message patterns to the bounded enum.
- `recordPersistenceFailure(hook, reason)` and `setCircuitBreakerState(hook, state)` exports.
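A classifier of the kind described above might look like the sketch below. The SQLSTATE codes (23505, 23503, 23502) and Node syscall codes are standard; the message patterns, the check order, and the rule that any other SQLSTATE maps to `other_db` are assumptions, not the project's `classifyPersistenceFailure`. The two `envelope_*` reasons are set by the envelope validator rather than derived from DB errors, so they do not appear here.

```javascript
// Sketch of mapping raw errors onto the bounded reason enum above.
// SQLSTATE and syscall codes are standard; message patterns and check
// order are assumptions, not the project's implementation.
const PG_SQLSTATE_REASONS = new Map([
  ["23505", "unique_violation"],
  ["23503", "fk_violation"],
  ["23502", "not_null_violation"],
]);

const SYSCALL_REASONS = new Map([
  ["ECONNREFUSED", "connection_refused"],
  ["ETIMEDOUT", "connection_timeout"],
  ["ENOTFOUND", "dns_failed"],
]);

function classifyPersistenceFailure(err) {
  if (!err) return "unknown";
  const code = typeof err.code === "string" ? err.code : undefined;
  if (code && PG_SQLSTATE_REASONS.has(code)) return PG_SQLSTATE_REASONS.get(code);
  if (code && SYSCALL_REASONS.has(code)) return SYSCALL_REASONS.get(code);
  // Any other 5-character SQLSTATE came from Postgres itself. Node syscall
  // codes start with "E" (e.g. EPIPE); no PostgreSQL SQLSTATE class does.
  if (code && /^(?!E)[0-9A-Z]{5}$/.test(code)) return "other_db";
  const msg = String(err.message || "").toLowerCase();
  if (msg.includes("timeout")) return "connection_timeout";
  if (msg.includes("pool")) return "pool_error";
  return "unknown";
}
```

Using `Map` lookups rather than plain-object indexing keeps a stray code such as `"constructor"` from matching a prototype property, so every output stays inside the enum.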

### Added — Zod envelope validator on `run_python_analysis`

Replaces the empty `catch { /* non-JSON or malformed */ }` at the envelope-lift with `safeParse` + structured warn + metric counter (`envelope_non_json` | `envelope_shape_drift`). `runPythonAnalysisEnvelopeSchema` mirrors all 14 fields the bridge sets across success/refusal/timeout paths; `.passthrough()` permits forward-compat fields. Fallback merges raw parsed object on schema drift (preserves Day 6.G hotfix intent: better partial data than no data). Stdout warns throttled once-per-30s-per-reason; metric counter unthrottled.

### Fixed — `persistCodeExecution` SIGTERM durability

Changed `persistCodeExecution` from `.catch()` fire-and-forget to awaited try/catch. Phase 1 audit (3 Explore agents) confirmed the asymmetry was accidental, introduced in commit `182f7083` with no justifying comment. Profile: ~50ms scalar INSERTs with no other I/O. Cloud Run SIGTERM does NOT flush `.catch()`-only writes (the `backgroundTasks` Set never registered them); awaiting matches every other `persist*()` in the dispatcher. CircuitBreaker still bounds latency under DB stress.

**Verification**: 5/5 e2e tests pass in <250ms. 28/28 domain MCP tests pass. Manual `/metrics` scrape confirms counter + gauge render with bounded labels.

### Added — Post-audit documentation polish (commit `66f4ad6b`)

Documentation-only follow-up addressing review findings on the metrics + envelope work above. Zero runtime impact.

## [6.7.3] - 2026-04-28

### Changed — Final source-level emoji suppression (executive-memo aesthetic, pt. 3)