From 93ec5cd043b284358489d8273fb19cf9994db32c Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Wed, 6 May 2026 23:41:39 -0400
Subject: [PATCH 1/6] =?UTF-8?q?docs(skill/infra-health):=20v7.0.1=20alignm?=
 =?UTF-8?q?ent=20=E2=80=94=209=20alerts,=205=20metrics,=203=20tables?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase A1 of v7.0.1 skill alignment.

prometheus-alerts.md (33 → 165 lines, full rewrite):
- Documents all 13 alert rules: 5 tool/latency, 3 hook persistence (HookPersistenceFailures/HookCircuitBreakerOpen/HookEnvelopeShapeDrift), 5 reconciliation
- Documents all v7.0.0 metrics: claude_hook_persistence_failures_total (10-value reason enum), claude_hook_circuit_breaker_state, claude_code_execution_failures_total, claude_hook_invocations_total, claude_tool_invocations_v2_total
- v7.0.0 table health probes: transcript_events FK integrity, citation_source_links confidence distribution, code_execution_inputs lineage row count, bridge_metadata.git_sha 'unknown' detection
- OTel sampler tuning section (1.0 verification → 0.1 steady-state)
- FMP_ENABLED health probe section
- Per-alert remediation table

postgresql.md:
- Pool max 10 → 15 documented
- 3 new tables added: code_execution_inputs (lineage junction), transcript_events (~700KB-1MB per session), citation_source_links
- code_executions row updated: 13 reproducibility columns
- hook_audit_log row updated: bridge_metadata JSONB + tool_use_id

SKILL.md:
- Description: 36 → 38 API clients (incl. FMP equity-research, gated)
- Tier 3 execution step extended with v7.0.0 metric checks + fetch_source distribution monitoring + OTel sampler env probe

No code changes. Documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/skills/infrastructure-health/SKILL.md | 4 +-
 .../references/postgresql.md | 13 +-
 .../references/prometheus-alerts.md | 173 +++++++++++++++---
 3 files changed, 162 insertions(+), 28 deletions(-)

diff --git a/.claude/skills/infrastructure-health/SKILL.md b/.claude/skills/infrastructure-health/SKILL.md
index 176ffea4c..eb45a07e7 100644
--- a/.claude/skills/infrastructure-health/SKILL.md
+++ b/.claude/skills/infrastructure-health/SKILL.md
@@ -2,7 +2,7 @@
 name: infrastructure-health
 description: >
   Tiered infrastructure health monitoring for Super Legal MCP platform. Monitors GCE instances,
-  PostgreSQL/pgvector, Anthropic API circuit breakers, 36 API clients, Gemini embedding
+  PostgreSQL/pgvector, Anthropic API circuit breakers, 38 API clients (incl. FMP equity-research, gated), Gemini embedding
   service, memory trends, EPO OAuth tokens, Prometheus alerts, session hygiene, API key expiration, Docker image drift, and dependency vulnerabilities. Triggers on: "infrastructure health", "health check", "infra status", "system health", "check infrastructure", "run health checks",
@@ -135,7 +135,7 @@ Read these subskill references:
 - [references/dependency-vulnerabilities.md](references/dependency-vulnerabilities.md) — npm audit

 ### Execution
-1. Fetch `/metrics` and check for circuit breaker trips, high error rates. Wave 4 metrics to verify: `claude_subagent_duration_ms`, `claude_api_client_results_total` (check for `outcome="zero_results"`), `claude_document_conversion_duration_ms`, `claude_document_conversion_errors_total`, `claude_embedding_duration_ms`, `claude_gate_check_results_total`, `claude_kg_build_total` (check for `status="error"` or `status="skipped_breaker"`), `claude_kg_build_duration_ms`
+1. Fetch `/metrics` and check for circuit breaker trips, high error rates. Wave 4 metrics to verify: `claude_subagent_duration_ms`, `claude_api_client_results_total` (check for `outcome="zero_results"` and `fetch_source` distribution — `exa_fallback` dominating for FMP tools indicates `FMP_API_KEY` issues), `claude_document_conversion_duration_ms`, `claude_document_conversion_errors_total`, `claude_embedding_duration_ms`, `claude_gate_check_results_total`, `claude_kg_build_total` (check for `status="error"` or `status="skipped_breaker"`), `claude_kg_build_duration_ms`. **v7.0.0 metrics to verify**: `claude_hook_persistence_failures_total` (any non-`unknown` reason = data loss vector), `claude_hook_circuit_breaker_state` (any value ≥2 = persistence skipping), `claude_code_execution_failures_total` by reason, `claude_hook_invocations_total` (success path counter — should grow during active sessions), `claude_tool_invocations_v2_total` (replaces deprecated v1; verify both still emitting during dual-emission window). **OTel sampler check**: container env `OTEL_TRACES_SAMPLER_ARG` — `1.0` indicates verification window, `0.1` is steady-state. See `references/prometheus-alerts.md` for full alert rule + remediation table.
 2. Run `scripts/pg-health.sh` for session hygiene and table sizes
 3. Calculate days until SAM_GOV_API_KEY expiry (set 2026-02-11, 90-day lifetime → ~2026-05-12)
 4. Run `scripts/docker-drift.sh` (requires gcloud auth — skip gracefully if unavailable)
diff --git a/.claude/skills/infrastructure-health/references/postgresql.md b/.claude/skills/infrastructure-health/references/postgresql.md
index 3d8623615..33a01ab50 100644
--- a/.claude/skills/infrastructure-health/references/postgresql.md
+++ b/.claude/skills/infrastructure-health/references/postgresql.md
@@ -1,7 +1,10 @@
 # PostgreSQL Health — Subskill Reference

+**Version**: v7.0.1 (2026-05-06)
+
 ## Connection
-- Pool max: `PG_POOL_MAX` env (default: 10)
+- Pool max: `PG_POOL_MAX` env (default: **15** — bumped from 10 in v7.0.0 for 33% burst margin during simultaneous live stream + 3-rebuild reconciliation + transcript flush)
+- `statement_timeout`: 120,000 ms (preserved — extending was found unnecessary and risky during v6.8.0 audit)
 - Connection string: `PG_CONNECTION_STRING` or `DATABASE_URL`
 - Extension: pgvector (required when `EMBEDDING_PERSISTENCE=true`)

@@ -10,11 +13,15 @@
 |-------|---------|----------------|
 | sessions | Session tracking | 1 row per pipeline run |
 | reports | Report versions | ~10-20 rows per session |
-| hook_audit_log | Agent activity audit | 100-500 rows per session |
+| hook_audit_log | Agent activity audit | 100-500 rows per session; v7.0.0 adds `bridge_metadata` JSONB + `tool_use_id` columns |
+| code_executions | Per-`run_python_analysis` execution audit | 1 row per code execution; **v7.0.0 adds 13 reproducibility columns** (model_id, llm_name, anthropic_request_id, input/output/cache tokens, system_prompt_hash, python_code, python_code_hash, container_id, tool_use_id, stop_reason, refusal_detected) |
+| code_execution_inputs | **v7.0.0** — data lineage junction linking each code execution to upstream subagent reports/embeddings/KG nodes | 1-5 rows per code execution |
+| transcript_events | **v7.0.0** — full-fidelity SSE event capture (`migrations/012_transcript-events.up.sql`); buffered batch insert | ~4,000-6,000 rows per 30-50 min session; **~700KB-1MB storage per session** |
+| citation_source_links | **v7.0.0** — citation→source bridge with fuzzy matching (URL exact / URL fuzzy / title fuzzy / embedding cosine) + confidence score | 1 row per memo footnote matched |
 | report_embeddings | pgvector embeddings | ~50-100 chunks per report |
 | agent_states | Agent lifecycle | ~40 rows per session |
 | source_writes | Wave 3 WAL — raw source persistence reconciliation | 1 row per raw source capture; hourly reconciler |
-| access_log | Wave 3 — EU AI Act Art. 12 read-side audit | 1 row per `/api/sessions/:id/*` read (fire-and-forget) |
+| access_log | Wave 3 — EU AI Act Art. 12 read-side audit | 1 row per `/api/sessions/:id/*` read (fire-and-forget); v7.0.0 audit-export reads also logged |
 | human_interventions | Wave 3 — EU AI Act Art. 14 operator governance audit | 0-5 rows per session (admin actions only) |
 | pii_mappings | Wave 3 — GDPR Art. 17 pseudonymization backing store | 0-N per session when PII detected |

diff --git a/.claude/skills/infrastructure-health/references/prometheus-alerts.md b/.claude/skills/infrastructure-health/references/prometheus-alerts.md
index a49a4c590..ed36f698d 100644
--- a/.claude/skills/infrastructure-health/references/prometheus-alerts.md
+++ b/.claude/skills/infrastructure-health/references/prometheus-alerts.md
@@ -1,33 +1,160 @@
 # Prometheus Alert Review — Subskill Reference

+**Version**: v7.0.1 (2026-05-06) | **Source**: `super-legal-mcp-refactored/prometheus/alerts.yml`, `src/utils/sdkMetrics.js`, `src/config/alertingRules.js`
+
 ## Metrics Endpoint
-`GET /metrics` on the server (same port 3001, or METRICS_PORT if configured)

-## Key Alerts (from prometheus/alerts.yml)
-| Alert | Condition | Duration | Severity |
-|-------|-----------|----------|----------|
-| ClaudeToolErrorRateHigh | Tool error rate >5% | 5m | warning |
-| ClaudeLatencyRegression | P95 latency >10s | 10m | warning |
-| StructuredOutputValidationFailure | Output failures >2% | 5m | critical |
-| CircuitBreakerTripping | >3 trips in 15m | 1m | critical |
-| RateLimitExhaustion | Rate limit errors >10/min | 5m | warning |
+`GET /metrics` on the server (port 3001 in production, or `METRICS_PORT` if configured). Prometheus exposition format (`text/plain; version=0.0.4`). Authentication: none (network-layer ACL — Prometheus scrapes from same VPC).
+
+For full metric inventory see `super-legal-mcp-refactored/docs/metrics-catalog.md` (33 metrics across 12 categories). This reference focuses on what to scrape during Tier 3 health checks.
+
+## Alert Rules (13 total)
+
+### Tool & latency alerts (5, pre-v7.0.0)
+
+| Alert | Condition | Duration | Severity | Remediation |
+|---|---|---|---|---|
+| `ClaudeToolErrorRateHigh` | `rate(claude_tool_invocations_v2_total{status="error"}[5m]) / rate(claude_tool_invocations_v2_total[5m]) > 0.05` | 5m | warning | Identify failing tool via `{tool_name}` label. Check native API health via api-client-sweep |
+| `ClaudeLatencyRegression` | `histogram_quantile(0.95, claude_request_duration_ms_bucket) > 10000` | 10m | warning | Check Anthropic API circuit breaker, network latency, recent SDK upgrade |
+| `StructuredOutputValidationFailure` | `rate(claude_structured_output_failures_total[5m]) / rate(claude_structured_output_attempts_total[5m]) > 0.02` | 5m | critical | Schema validation rejecting LLM output. Usually transient. If persistent, check tool schema drift |
+| `CircuitBreakerTripping` | `increase(claude_circuit_breaker_trips_total[15m]) > 3` | 1m | critical | Correlate `{domain}` label with API client sweep. Check upstream API status |
+| `RateLimitExhaustion` | `sum(rate(claude_errors_total{code="RATE_LIMIT_ERROR"}[5m])) > 10` | 5m | warning | Anthropic-side rate limit. Check session concurrency; consider rpm/tpm bump |
+
+**Note (v7.0.0/v7.0.1)**: `ClaudeToolErrorRateHigh` was migrated to `claude_tool_invocations_v2_total` per W5.6. Legacy `claude_tool_invocations_total` is in 7-day dual-emission window; will be removed in v7.0.x.
+
+### Hook persistence alerts (3, v7.0.0 — CRITICAL data loss vectors)
+
+| Alert | Condition | Duration | Severity | Remediation |
+|---|---|---|---|---|
+| `HookPersistenceFailures` | `sum by (hook, reason) (rate(claude_hook_persistence_failures_total{reason!="unknown"}[5m])) > 0` | 5m | warning | Check DB pool health, CircuitBreaker state, recent deploys. Per-hook + per-reason labels exposed |
+| `HookCircuitBreakerOpen` | `max by (hook) (claude_hook_circuit_breaker_state) >= 2` | 2m | critical | Persistence is being skipped — rows are being lost. Likely DB connectivity. 2m threshold absorbs cold-start churn during rolling deploys |
+| `HookEnvelopeShapeDrift` | `sum(rate(claude_hook_persistence_failures_total{reason="envelope_shape_drift"}[5m])) > 0` | 1m TTL | critical | SDK upgrade or upstream API field rename. Update the schema (not the test mock). Short TTL because silent data loss starts immediately on drift |
+
+### Reconciliation alerts (5, v6.7.0)
+
+| Alert | Condition | Duration | Severity | Remediation |
+|---|---|---|---|---|
+| `ReconciliationKgBacklog` | `claude_reconciliation_pending_sessions{type="kg"} > 50` | 10m | warning | Check `/health.reconciliation`, `kg_build_last_error` distribution, kgBreaker state |
+| `ReconciliationKgCritical` | `claude_reconciliation_pending_sessions{type="kg"} > 100` | 5m | critical | Loop draining slower than ingest rate. Investigate immediately — possible KG extractor regression or pool exhaustion |
+| `ReconciliationArtifactsBacklog` | `claude_reconciliation_pending_sessions{type="artifacts"} > 50` | 10m | warning | Check `artifacts_build_last_error`; investigate document conversion pipeline |
+| `ReconciliationScanSlow` | `histogram_quantile(0.95, sum(rate(claude_reconciliation_scan_duration_ms_bucket[1h])) by (le)) > 900000` | 15m | warning | P95 >15min — likely 15-min Promise.race timeouts firing. Check `kg_build_last_error` for `'kg_build_timeout_15min'` |
+| `ReconciliationScanErrors` | `rate(claude_reconciliation_scans_total{status="error"}[1h]) > 0.0003` | 30m | warning | Loop throwing — check Cloud Logging for `'[SessionReconciliation] Scan failed'` |

 ## Key Metrics to Scrape
+
+### Pre-v7.0.0 (still active)
+
+```
+claude_circuit_breaker_trips_total{domain}
+claude_tool_invocations_total{tool, status} # DEPRECATED — removed in v7.0.x
+claude_tokens_input_total{model}
+claude_tokens_output_total{model}
+claude_errors_total{code, path}
+claude_request_duration_ms_bucket
+```
+
+### v7.0.0 additions (5 new)
+
+```
+claude_tool_invocations_v2_total{tool_name, status} # bounded enum
+claude_hook_persistence_failures_total{hook, reason} # 10-value reason enum
+claude_hook_circuit_breaker_state{hook} # 0=closed, 1=half-open, 2=open
+claude_code_execution_failures_total{reason} # refusal_detected | timeout | api_error | container_error | envelope_parse_error
+claude_hook_invocations_total{hook} # success path counter
+```
+
+### Reconciliation (v6.7.0)
+
 ```
-claude_circuit_breaker_trips_total # Counter by domain
-claude_tool_invocations_total{status="error"} # Tool failure counts
-claude_tokens_input_total # Token consumption
-claude_tokens_output_total
-claude_errors_total{code="..."} # Error breakdown
+claude_reconciliation_scans_total{status}
+claude_reconciliation_rebuilds_total{type, status}
+claude_reconciliation_scan_duration_ms_bucket
+claude_reconciliation_pending_sessions{type}
 ```

 ## Check Method
-Fetch `/metrics` and parse Prometheus text format. Look for:
-1. Any `circuit_breaker_trips_total` value > 0 since last check
-2. Error rate: `tool_invocations{status=error}` / `tool_invocations{status=success}`
-3. Token consumption trends (cost monitoring)
-
-## Remediation
-- **Tool error rate high**: Identify which tool via `{tool=...}` label. Check if the corresponding native API is down.
-- **Structured output failures**: Schema validation rejecting LLM output. Usually transient. If persistent, check if tool schema changed.
-- **Circuit breaker trips**: Correlate `{domain=...}` label with API client sweep results.
+
+Fetch `/metrics` and parse Prometheus text format. For each Tier 3 sweep:
+
+1. **Circuit breaker trips** — any `claude_circuit_breaker_trips_total > 0` since last check
+2. **Tool error rate (v2)** — `claude_tool_invocations_v2_total{status="error"}` / total > 0.05
+3. **Hook persistence failures** — any `claude_hook_persistence_failures_total{reason!="unknown"}` increasing
+4. **Hook circuit breaker** — any `claude_hook_circuit_breaker_state >= 2` (open) per hook
+5. **Envelope shape drift** — any `claude_hook_persistence_failures_total{reason="envelope_shape_drift"}` > 0 (CRITICAL — fix immediately)
+6. **Reconciliation backlog** — `claude_reconciliation_pending_sessions{type="kg"}` > 50 sustained
+7. **Code execution failures** — `claude_code_execution_failures_total` by reason — `refusal_detected` is informational, others are operational
+8. **Token consumption trends** — input/output/cache token counters for cost monitoring
+
+## v7.0.0 Table Health Probes
+
+For Tier 3 deeper checks, probe the new tables via PostgreSQL:
+
+```sql
+-- transcript_events: row count per recent session, FK integrity
+SELECT s.session_key, COUNT(t.id) AS event_count
+FROM sessions s LEFT JOIN transcript_events t ON s.id = t.session_id
+WHERE s.created_at > NOW() - INTERVAL '24 hours'
+GROUP BY s.session_key, s.created_at -- created_at must be grouped to be usable in ORDER BY
+HAVING COUNT(t.id) = 0 -- sessions with no events = persistence broken
+ORDER BY s.created_at DESC LIMIT 10;
+
+-- citation_source_links: confidence score distribution
+SELECT match_method, COUNT(*) AS total,
+       SUM(CASE WHEN confidence_score < 0.85 THEN 1 ELSE 0 END) AS low_confidence
+FROM citation_source_links
+WHERE matched_at > NOW() - INTERVAL '24 hours'
+GROUP BY match_method;
+
+-- code_execution_inputs: lineage row count per execution
+SELECT ce.id, ce.agent_type, COUNT(cei.id) AS lineage_rows
+FROM code_executions ce LEFT JOIN code_execution_inputs cei ON ce.id = cei.execution_id
+WHERE ce.created_at > NOW() - INTERVAL '24 hours'
+GROUP BY ce.id, ce.agent_type
+HAVING COUNT(cei.id) = 0; -- executions with no lineage = CAPABILITY constants not wired
+
+-- bridge_metadata.git_sha = 'unknown' = COMMIT_SHA build arg missing
+SELECT event_data->'bridge_metadata'->>'git_sha' AS git_sha, COUNT(*)
+FROM hook_audit_log
+WHERE tool_name='run_python_analysis' AND created_at > NOW() - INTERVAL '24 hours'
+GROUP BY git_sha;
+```
+
+## OTel Sampler Tuning
+
+Container env: `OTEL_TRACES_SAMPLER=parentbased_traceidratio` + `OTEL_TRACES_SAMPLER_ARG=`.
+
+| Rate | Use case |
+|---|---|
+| `1.0` (100%) | First-light verification deploys, post-incident triage. Cloud Trace cost scales linearly. **Current value during v7.0.1 verification window.** |
+| `0.1` (10%) | Production steady-state default. Bounds Cloud Trace cost; statistical visibility into 1-in-10 sessions. |
+| `0.01` (1%) | Very high traffic deployments where 10% overwhelms Cloud Trace quota. |
+
+**Action**: when v7.0.1 verification completes (FMP first-light + § 8.4.X V1–V4 pass), reduce `OTEL_TRACES_SAMPLER_ARG` from `1.0` back to `0.1` for cost discipline. This is a `flags.env` flip + redeploy.
+
+## FMP_ENABLED Health Probe (v7.0.0)
+
+If `FMP_ENABLED=true` in container env:
+
+1. Verify `FMP_API_KEY` is set in container env (`docker exec env | grep FMP_API_KEY`)
+2. Probe rate limiter remaining: `claude_api_client_results_total{tool_name=~"mcp__equities__.*"}` should show `fetch_source="fmp_native"` outcomes
+3. If `fetch_source="exa_fallback"` dominates, FMP_API_KEY is invalid or rate-limited
+4. § 8.4.X V1–V4 verification protocol — see `super-legal-mcp-refactored/docs/pending-updates/equity-analyst-update.md`
+
+## Remediation Quick Reference
+
+- **Tool error rate high (v2)**: Identify tool via `{tool_name}` label. Check native API in api-client-sweep.
+- **Hook persistence failures**: Check DB pool, CircuitBreaker state, latest deploys. Reasons map to specific causes (e.g., `connection_timeout` → DB unreachable, `envelope_shape_drift` → SDK upgrade)
+- **HookCircuitBreakerOpen**: Persistence skipped. Check DB connectivity. If DB is healthy, check for poison-pill events causing repeated failures
+- **HookEnvelopeShapeDrift**: SDK or upstream API changed. Update zod schema in `src/schemas/toolEnvelopes.js`, NOT the test mock
+- **Structured output failures**: Schema validation rejecting LLM output. Usually transient. If persistent, check tool schema drift
+- **Circuit breaker trips**: Correlate `{domain}` label with api-client-sweep results
+- **Reconciliation backlog**: Loop detecting partial sessions but not draining. Check `kg_build_last_error`, `artifacts_build_last_error`, kgBreaker state
+- **Reconciliation scan slow**: 15-min Promise.race timeouts firing on rebuilds. Check for KG extractor regression
+- **`bridge_metadata.git_sha = 'unknown'`**: COMMIT_SHA build arg missed during last `docker build`. Verify `deploy.sh:54-62` passes `--build-arg COMMIT_SHA=$(git rev-parse HEAD)` and `Dockerfile` has matching `ARG COMMIT_SHA=unknown` + `ENV COMMIT_SHA=${COMMIT_SHA}`
+
+## Reference docs
+
+- Full metric inventory: `super-legal-mcp-refactored/docs/metrics-catalog.md`
+- Audit-export endpoint runbook: `super-legal-mcp-refactored/docs/runbooks/v6.8.5-audit-export.md`
+- Reconciliation runbook: `super-legal-mcp-refactored/docs/runbooks/v6.7.0-session-reconciliation.md`
+- Feature flag registry: `super-legal-mcp-refactored/docs/feature-flags.md` §31a/b (OTel sampler, COMMIT_SHA)

From e19120f22a958f7937ef53b087d1d2e24c985149 Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Wed, 6 May 2026 23:42:29 -0400
Subject: [PATCH 2/6] =?UTF-8?q?docs(skill/session-diagnostics):=20v7.0.1?=
 =?UTF-8?q?=20alignment=20=E2=80=94=206=20new=20failure=20patterns?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase A2 of v7.0.1 skill alignment.

Adds 6 new failure patterns covering v7.0.0 schema:
- Pattern 10: Transcript replay gap (transcript_events row count=0 for completed session, indicating flush failed or persistence broken)
- Pattern 11: Citation low-confidence (citation_source_links rows with score <0.85 for >20% of citations — possible hallucinated citations)
- Pattern 12: Code-execution traceability NULL (code_executions rows with NULL model_id/python_code/python_code_hash/anthropic_request_id — regulator audit gap, EU AI Act Art. 15 byte-replay broken)
- Pattern 13: bridge_metadata corruption / git_sha='unknown' (indicates COMMIT_SHA build arg missed during last docker build)
- Pattern 14: Reconciliation stall (kg_status='building' >10min or reconciliation_attempts>=3 — parked, manual intervention)
- Pattern 15: FMP equity-analyst routing failure (orchestrator dispatched equity-analyst but fetch_source='exa_fallback' dominates instead of 'fmp_native' indicating FMP_API_KEY invalid/rate-limited; OR M46-M55 and M58 model rows show success=false)

Includes ready-to-paste SQL queries for each pattern. Patterns 10-15 align with the § 8.4.X V1-V4 verification protocol from docs/pending-updates/equity-analyst-update.md.

No code changes. Documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/skills/session-diagnostics/SKILL.md | 60 +++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/.claude/skills/session-diagnostics/SKILL.md b/.claude/skills/session-diagnostics/SKILL.md
index aacce6356..91e0f8e7f 100644
--- a/.claude/skills/session-diagnostics/SKILL.md
+++ b/.claude/skills/session-diagnostics/SKILL.md
@@ -103,6 +103,66 @@ See `references/failure-patterns.md` for the full catalog. Summary:
 | 7 | Subagent crash (SubagentStart with no matching SubagentStop) | CRITICAL |
 | 8 | Empty session (0 reports rows) | INFO |
 | 9 | Hook audit gaps (audit_log row count < 0.5x reports count) | WARNING |
+| 10 | **Transcript replay gap** (v7.0.0): `transcript_events` row count = 0 for completed session OR session exists but flush failed (missing late events) | CRITICAL |
+| 11 | **Citation low-confidence** (v7.0.0): `citation_source_links.confidence_score < 0.85` for >20% of citations — fuzzy matches flagged for QA, possible hallucinated citations | WARNING |
+| 12 | **Code-execution traceability NULL** (v7.0.0): `code_executions` rows with NULL `model_id`, `anthropic_request_id`, `python_code`, or `python_code_hash` — regulator audit gap, EU AI Act Art. 15 byte-replay envelope broken | CRITICAL |
+| 13 | **bridge_metadata corruption / missing git_sha** (v7.0.0): `hook_audit_log` rows where `event_data->'bridge_metadata'` is malformed OR `bridge_metadata.git_sha = 'unknown'` indicating COMMIT_SHA build arg missed | WARNING |
+| 14 | **Reconciliation stall** (v6.7.0): sessions with `kg_status='building'` >10 minutes OR `reconciliation_attempts >= 3` (parked, manual intervention required) | CRITICAL |
+| 15 | **FMP equity-analyst routing failure** (v7.0.0): orchestrator dispatched equity-analyst (visible in `hook_audit_log` SubagentStart with `agent_type='equity-analyst'`) but `claude_api_client_results_total{tool_name=~"mcp__equities__.*"}` broken down by `fetch_source` shows `exa_fallback` instead of `fmp_native` (FMP_API_KEY invalid or rate-limited); OR M46–M55 and M58 model rows show `success=false` | WARNING |
+
+### v7.0.0 Diagnostic Queries (operator copy-paste)
+
+```sql
+-- Pattern 10: Transcript replay gap
+SELECT s.session_key, s.status, COUNT(t.id) AS event_count
+FROM sessions s
+LEFT JOIN transcript_events t ON s.id = t.session_id
+WHERE s.session_key = '' AND s.status='complete'
+GROUP BY s.session_key, s.status;
+-- Expected: event_count > 0 (typical 4,000-6,000 events per memo session)
+
+-- Pattern 11: Citation low-confidence distribution
+SELECT match_method, COUNT(*) AS total,
+       SUM(CASE WHEN confidence_score < 0.85 THEN 1 ELSE 0 END) AS low_conf
+FROM citation_source_links csl
+JOIN reports r ON csl.report_id = r.id
+JOIN sessions s ON r.session_id = s.id
+WHERE s.session_key = ''
+GROUP BY match_method;
+
+-- Pattern 12: Code-execution traceability NULL check
+SELECT COUNT(*) AS total,
+       SUM(CASE WHEN model_id IS NULL THEN 1 ELSE 0 END) AS missing_model_id,
+       SUM(CASE WHEN python_code IS NULL THEN 1 ELSE 0 END) AS missing_code,
+       SUM(CASE WHEN python_code_hash IS NULL THEN 1 ELSE 0 END) AS missing_hash,
+       SUM(CASE WHEN anthropic_request_id IS NULL THEN 1 ELSE 0 END) AS missing_request_id
+FROM code_executions ce
+JOIN sessions s ON ce.session_id = s.id
+WHERE s.session_key = '';
+
+-- Pattern 13: bridge_metadata.git_sha = 'unknown' indicates COMMIT_SHA build arg missed
+SELECT event_data->'bridge_metadata'->>'git_sha' AS git_sha, COUNT(*)
+FROM hook_audit_log hal
+JOIN sessions s ON hal.session_id = s.id
+WHERE s.session_key = '' AND tool_name='run_python_analysis'
+GROUP BY git_sha;
+
+-- Pattern 14: Reconciliation stall
+SELECT session_key, kg_status, embedding_status, kg_error, embedding_error,
+       reconciliation_attempts, kg_built_at, embedding_built_at, updated_at
+FROM sessions
+WHERE session_key = '';
+
+-- Pattern 15: FMP routing — V1 + V2 from § 8.4.X
+SELECT tool_name, COUNT(*) FROM hook_audit_log hal
+JOIN sessions s ON hal.session_id = s.id
+WHERE s.session_key = '' AND tool_name LIKE 'mcp__equities__%'
+GROUP BY tool_name;
+SELECT model_id, COUNT(*), SUM(CASE WHEN success THEN 0 ELSE 1 END) AS failed
+FROM code_executions ce JOIN sessions s ON ce.session_id = s.id
+WHERE s.session_key = '' AND model_id IN ('M46','M47','M48','M49','M50','M51','M52','M53','M54','M55','M58')
+GROUP BY model_id;
+```

 ## Pre-flight Checks


From 212c96eb21188b2bffc1788686d58ec11ae8cab2 Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Wed, 6 May 2026 23:43:54 -0400
Subject: [PATCH 3/6] =?UTF-8?q?docs(skills):=20v7.0.1=20alignment=20?=
 =?UTF-8?q?=E2=80=94=20deploy=20MIG=20variant=20+=20count=20refresh?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase B1 + Phase C of v7.0.1 skill alignment.

deploy/SKILL.md (B1):
- New "Variant: MIG instance replacement mid-retries" section under the Static IP assignment race troubleshooting block
- Documents the 2026-05-06 v7.0.1 deploy observation: script's Step 7 retry loop holds the original instance name across all 5 attempts while MIG silently replaces the instance behind it
- Adds detection command (gcloud compute instances list filter)
- Adds full manual recovery sequence on the new instance
- Documents the "future hardening" loop pattern that re-resolves instance name on each retry (filed as v7.0.x code follow-up)

api-integration/SKILL.md (C1):
- Header counts: 36 → 38 API clients, 33 → 34 base MCP domains, 149+ → 197 tool schemas
- Cross-reference to FMP-derived Phase 1.5 + 1.6 empirical-first methodology for high-precision integrations

code-execution-models/SKILL.md (C2):
- Frontmatter description: 45 → 56 models with v7.0.0 attribution
- Format guidelines section: 45 → 56 models

No code changes. Documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/skills/api-integration/SKILL.md | 11 +++---
 .claude/skills/code-execution-models/SKILL.md | 4 +--
 .claude/skills/deploy/SKILL.md | 36 +++++++++++++++++++
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/.claude/skills/api-integration/SKILL.md b/.claude/skills/api-integration/SKILL.md
index 006d40843..f08ea2d6c 100644
--- a/.claude/skills/api-integration/SKILL.md
+++ b/.claude/skills/api-integration/SKILL.md
@@ -7,12 +7,13 @@ description: Build and integrate a new API client into the Super Legal MCP platf

 ## Overview

-This skill integrates a new API data source into the Super Legal MCP platform following the canonical pattern used by all 36 existing clients. The process produces a fully operational hybrid client with native-first routing, Exa two-phase fallback (search + /contents enrichment), circuit breaker protection, caching, observability, and frontend catalog display.
+This skill integrates a new API data source into the Super Legal MCP platform following the canonical pattern used by all 38 existing clients. The process produces a fully operational hybrid client with native-first routing, Exa two-phase fallback (search + /contents enrichment), circuit breaker protection, caching, observability, and frontend catalog display.

-**Current platform state** (update these counts after each integration):
-- API clients: 36
-- Base MCP domains: 33 (+conditional: code-execution, direct-fetch, exa-search)
-- Tool schemas: 149+
+**Current platform state** (v7.0.1; update these counts after each integration):
+- API clients: 38 (+FMP equity-research gated by FMP_ENABLED, +DirectFetch)
+- Base MCP domains: 34 (+conditional: code-execution, direct-fetch, exa-search, equities)
+- Tool schemas: 197 (161 base + 36 FMP equity tools when FMP_ENABLED=true)
+- For high-precision integrations (live financial data, regulatory APIs), follow the FMP-derived empirical-first methodology — see Phase 1.5 (Empirical Capture & Probing) and Phase 1.6 (Endpoint Classification) below.
 - Production entry point: `Dockerfile:59` → `bootstrap.js` → `claude-sdk-server.js` → `clientRegistry.js`
 - `EnhancedLegalMcpServer.js` is legacy/local-dev only — do NOT wire new clients there

diff --git a/.claude/skills/code-execution-models/SKILL.md b/.claude/skills/code-execution-models/SKILL.md
index bd0365d6c..292913ccc 100644
--- a/.claude/skills/code-execution-models/SKILL.md
+++ b/.claude/skills/code-execution-models/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: code-execution-models
-description: Add new financial models to the code execution sandbox catalog. Use when the user asks to "add a model", "create a financial model", "add [model name] to code execution", "new analysis model", or wants to expand the PE/IB/M&A quantitative analysis toolkit. The sandbox runs Claude-generated Python (pandas, numpy, scipy, sklearn, matplotlib, seaborn) via the Anthropic code_execution_20260120 tool to produce structured JSON results, charts (PNG), and formatted tables. Currently 45 models across 13 categories. Also use when the user says "/code-execution-models".
+description: Add new financial models to the code execution sandbox catalog. Use when the user asks to "add a model", "create a financial model", "add [model name] to code execution", "new analysis model", or wants to expand the PE/IB/M&A quantitative analysis toolkit. The sandbox runs Claude-generated Python (pandas, numpy, scipy, sklearn, matplotlib, seaborn) via the Anthropic code_execution_20260120 tool to produce structured JSON results, charts (PNG), and formatted tables. Currently 56 models across 13 categories (M46–M55, M58 added in v7.0.0 for FMP equity research, gated by FMP_ENABLED). Also use when the user says "/code-execution-models".
 ---

 # Code Execution Models — Add Financial Analysis Model
@@ -159,7 +159,7 @@ Append to the `CODE_EXECUTION_MODELS` array after the last entry:
 }
 ```

-**Format guidelines** (match existing 45 models):
+**Format guidelines** (match existing 56 models):
 - `description`: 100-300 words, business-context-rich, mentions charts/tables produced
 - `methodology`: cites specific standards (ASC, IRC, academic papers) with thresholds
 - `outputFormat`: explicitly states chart types and table formats to generate
diff --git a/.claude/skills/deploy/SKILL.md b/.claude/skills/deploy/SKILL.md
index c8786096e..e787e4864 100644
--- a/.claude/skills/deploy/SKILL.md
+++ b/.claude/skills/deploy/SKILL.md
@@ -137,6 +137,42 @@ gcloud compute instances add-access-config $INSTANCE --zone=us-east1-c --access-
 gcloud compute ssh $INSTANCE --zone=us-east1-c --command='docker restart $(docker ps -q | head -1)'
 ```

+### Variant: MIG instance replacement mid-retries
+
+**Observed on**: 2026-05-06 v7.0.1 deploy.
+
+**Symptom**: Step 7 retries 5x with `Could not fetch resource: super-legal-staging-XXXX` even though the script logs `IP is RESERVED` and proceeds to `Attempt 1/5: Assigning ...`. The log line keeps showing the SAME instance name across all 5 attempts. Meanwhile, `gcloud compute instances list` reveals a DIFFERENT instance name is actually running.
+
+**Root cause**: The MIG terminated the instance the script was targeting (e.g., `super-legal-staging-0239`) and rolled forward to a new one (e.g., `super-legal-staging-bzx4`) DURING step 7's retry budget. The script captured the original instance name in step 6 and did not re-resolve it on each retry. Every `add-access-config` call hits a deleted resource.
+
+**Detection between retries**:
+```bash
+gcloud compute instances list --filter='name~super-legal-staging AND status=RUNNING' --format='value(name)'
+```
+If this returns a different instance name than what the script's log shows, the variant has triggered.
+
+**Manual recovery on the new instance**:
+```bash
+NEW_INSTANCE=$(gcloud compute instances list --filter='name~super-legal-staging AND status=RUNNING' --format='value(name)' | head -1)
+gcloud compute instances delete-access-config $NEW_INSTANCE --zone=us-east1-c --access-config-name=external-nat --quiet
+sleep 10
+gcloud compute instances add-access-config $NEW_INSTANCE --zone=us-east1-c --access-config-name=external-nat --address=34.26.70.60 --quiet
+sed -i '' '/compute\./d' ~/.ssh/google_compute_known_hosts
+gcloud compute ssh $NEW_INSTANCE --zone=us-east1-c --command='docker restart $(docker ps -q | head -1)'
+```
+
+Wait 60s, then verify via `curl http://34.26.70.60:3001/health`.
+
+**Future deploy.sh hardening** (not yet implemented): Step 7's retry loop should re-resolve the instance name on each attempt:
+```bash
+for attempt in 1 2 3 4 5; do
+  INSTANCE=$(gcloud compute instances list --filter='name~super-legal-staging AND status=RUNNING' --format='value(name)' | head -1)
+  gcloud compute instances add-access-config $INSTANCE --zone=us-east1-c --access-config-name=external-nat --address=34.26.70.60 --quiet 2>err && break
+  sleep 30
+done
+```
+Filed as v7.0.x follow-up code change.
+
 ### Docker push transient broken pipe
 
 **Observed on**: 2026-04-27 v6.7.0 deploy.
From f2cea39401996237545d28c6f2d086a798934bd0 Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Wed, 6 May 2026 23:45:17 -0400
Subject: [PATCH 4/6] =?UTF-8?q?docs(skills):=20v7.0.1=20alignment=20?=
 =?UTF-8?q?=E2=80=94=20offboarding=20+=20backup-restore=20v7.0.0=20tables?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase B2 + B3 of v7.0.1 skill alignment.

client-offboarding/SKILL.md (B2):
- Step 6.5 archive scope expanded from 2 tables (access_log, human_interventions) to 5 tables, adding the v7.0.0 compliance tables that must survive DB drop:
  * transcript_events (~700KB-1MB per session, byte-faithful session reload audit)
  * citation_source_links (citation→source bridge with confidence scores; required for hallucination audit reproducibility)
  * code_execution_inputs (data lineage junction; required for EU AI Act Art. 15 reproducibility chain)
- New paragraph on hook_audit_log.bridge_metadata JSONB preservation in archive (git_sha + sdk_version + container_id + system_prompt_hash = regulator-replay envelope, MUST be preserved)
- Legacy clients predating each table gracefully skip via '2>/dev/null || warn' (existing pattern preserved)

client-backup-restore/SKILL.md (B3):
- Backup Types table size estimates updated for v7.0.0+:
    full: 150-300 MB → 200-400 MB
    database-only: 30-80 MB → 30-100 MB (transcript_events growth)
- New paragraph "v7.0.0+ database-only scope" documenting:
  * 5 new compliance tables included in standard Cloud SQL export
  * transcript_events identified as largest growth vector
  * 13 new code_executions reproducibility columns
  * bridge_metadata JSONB column on hook_audit_log
- Restore verification queries: 4 new row count checks for v7.0.0+ deployments (transcript_events, code_executions, citation_source_links, bridge_metadata preservation)

No code changes. Documentation only. Underlying offboard-client.sh and provision-client.sh scripts already export ALL tables via 'gcloud sql export sql' (full DB dump) — v7.0.0 tables are automatically included; this PR only updates the documentation to reflect that scope explicitly for compliance audit clarity.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/skills/client-backup-restore/SKILL.md | 17 +++++++++++++++--
 .claude/skills/client-offboarding/SKILL.md    |  7 ++++++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/.claude/skills/client-backup-restore/SKILL.md b/.claude/skills/client-backup-restore/SKILL.md
index 71b0dc052..4a769b277 100644
--- a/.claude/skills/client-backup-restore/SKILL.md
+++ b/.claude/skills/client-backup-restore/SKILL.md
@@ -187,10 +187,23 @@ gcloud sql backups restore {backup_id} \
 
 | Type | What's included | Size estimate | Duration |
 |---|---|---|---|
-| `full` | Database + reports directory | ~150-300 MB | 3-5 min |
-| `database-only` | Cloud SQL export (all tables + data) | ~30-80 MB | 1-2 min |
+| `full` | Database + reports directory | ~200-400 MB (v7.0.0+: includes transcript_events ~700KB-1MB/session) | 3-6 min |
+| `database-only` | Cloud SQL export (all tables + data) | ~30-100 MB (larger on v7.0.0+ deployments due to transcript_events) | 1-3 min |
 | `reports-only` | Reports directory (sessions, raw sources) | ~100-250 MB | 2-4 min |
 
+**v7.0.0+ database-only scope** — Cloud SQL export captures all tables, including the new compliance/observability tables introduced in v7.0.0:
+- `transcript_events` — full SSE event history per session (~700KB-1MB per session × N sessions). **Largest growth vector** at 10K+ sessions.
+- `code_executions` (now with 13 reproducibility columns: model_id, anthropic_request_id, system_prompt_hash, python_code, python_code_hash, container_id, etc.) — required for byte-replay envelope per EU AI Act Art. 15
+- `code_execution_inputs` — data lineage junction (small table, 1-5 rows per execution)
+- `citation_source_links` — citation→source bridge with confidence scores (1 row per matched citation)
+- `hook_audit_log` — now includes `bridge_metadata` JSONB column with `git_sha + sdk_version + container_id + system_prompt_hash` (regulator-replay envelope)
+
+Restore verification (Phase 4) should confirm these row counts post-restore for v7.0.0+ deployments:
+- `SELECT COUNT(*) FROM transcript_events` matches pre-backup count
+- `SELECT COUNT(*) FROM code_executions WHERE model_id IS NOT NULL` matches pre-backup count (NULL model_id = pre-v6.8.4 row, allowed)
+- `SELECT COUNT(*) FROM citation_source_links` matches pre-backup count
+- `SELECT event_data->'bridge_metadata' IS NOT NULL FROM hook_audit_log WHERE tool_name='run_python_analysis'` — bridge_metadata preserved on restore
+
 ## Storage Locations
 
 All backups stored in the client's WORM bucket:
diff --git a/.claude/skills/client-offboarding/SKILL.md b/.claude/skills/client-offboarding/SKILL.md
index 49c76378d..8d14f5032 100644
--- a/.claude/skills/client-offboarding/SKILL.md
+++ b/.claude/skills/client-offboarding/SKILL.md
@@ -55,7 +55,12 @@ bash /Users/ej/Super-Legal/.claude/skills/client-offboarding/scripts/offboard-cl
 
 **Step 7**: Verify archives — checks that archive files exist in GCS and have non-zero size. Reports checksums.
 
-**Step 6.5**: Archive Wave 3 audit tables as dedicated CSV artifacts — `access_log` (EU AI Act Article 12 read-side evidence) and `human_interventions` (EU AI Act Article 14 operator governance evidence) exported via `psql COPY TO STDOUT` + gzip to `gs://super-legal-worm-{client_id}/archive/{table}-{date}.csv.gz`. Cloud SQL's native `gcloud sql export csv` doesn't support table-level `--query` filtering, so the script uses psql directly against the connection string resolved from Secret Manager. Runs AFTER archive verification (Step 7) and BEFORE any destructive deletion (Phase 3) — these tables must survive the DB drop as standalone legal records. Legacy clients predating Wave 3 (tables don't exist) gracefully skip via `2>/dev/null || warn`. v6.5.1+ instances also have `hook_audit_log WHERE event_type = 'KGBuild'` entries — include in archive for KG build audit trail. Requires v6.6.0+ for complete telemetry (background task tracking, pool survival).
+**Step 6.5**: Archive compliance audit tables as dedicated CSV artifacts. **Wave 3 tables**: `access_log` (EU AI Act Article 12 read-side evidence) and `human_interventions` (EU AI Act Article 14 operator governance evidence). **v7.0.0 tables** (added in this scope):
+- `transcript_events` — full SSE event history per session (~700KB-1MB per session × N sessions; may be largest archive file). Required for byte-faithful session-reload audit if regulator queries any session history.
+- `citation_source_links` — citation→raw-source bridge with confidence scores. Required for hallucination audit (any citation with `confidence_score < 0.85` flagged for QA review at session-time should be reproducible from this archive).
+- `code_execution_inputs` — data lineage junction linking code executions to upstream subagent reports/embeddings/KG nodes. Required for EU AI Act Art. 15 reproducibility chain ("which subagent's output drove this DCF result?").
+
+All exported via `psql COPY TO STDOUT` + gzip to `gs://super-legal-worm-{client_id}/archive/{table}-{date}.csv.gz`. Cloud SQL's native `gcloud sql export csv` doesn't support table-level `--query` filtering, so the script uses psql directly against the connection string resolved from Secret Manager. Runs AFTER archive verification (Step 7) and BEFORE any destructive deletion (Phase 3) — these tables must survive the DB drop as standalone legal records. Legacy clients predating each table gracefully skip via `2>/dev/null || warn`. v6.5.1+ instances also have `hook_audit_log WHERE event_type = 'KGBuild'` entries — include in archive for KG build audit trail. **v7.0.0+ instances** have `hook_audit_log` rows with `bridge_metadata` JSONB (`git_sha + sdk_version + container_id + system_prompt_hash`) — these are the regulator-replay envelope and MUST be preserved in the audit log archive. Requires v6.6.0+ for complete telemetry (background task tracking, pool survival).
 
 ### Phase 3: Resource Deletion (DESTRUCTIVE — requires --confirm)
 
From dd4203b546974c874919fd5a65b142490bec38ba Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Wed, 6 May 2026 23:50:58 -0400
Subject: [PATCH 5/6] =?UTF-8?q?fix(skills):=20correct=20python=5Fcode=5Fha?=
 =?UTF-8?q?sh=20=E2=86=92=20system=5Fprompt=5Fhash=20(column=20doesn't=20e?=
 =?UTF-8?q?xist)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Test-driven correctness fix found by skill validation harness: referenced 'python_code_hash' column does NOT exist in code_executions schema. The actual reproducibility hash column is 'system_prompt_hash' (SHA-256 of system prompt for drift detection). 'python_code' itself is stored verbatim, no separate hash.

Three skills had references to the non-existent column:
- infrastructure-health/references/postgresql.md (column list)
- session-diagnostics/SKILL.md (failure pattern + SQL query)
- client-backup-restore/SKILL.md (column list)

All three now reference the correct schema columns. Re-validated against src/db/postgres.js — 12 of 12 referenced columns confirmed present in the ALTER TABLE code_executions ADD COLUMN statements (model_id, llm_name, anthropic_request_id, anthropic_message_id, input/output/cache tokens, system_prompt_hash, python_code, container_id, tool_use_id, stop_reason, turn_count, pause_count) plus refusal_detected (BOOLEAN DEFAULT FALSE in CREATE TABLE block).
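
The re-validation described above could be approximated with a small shell check. This is only a sketch: `check_columns` is a hypothetical helper (the actual validation harness is not shown in this series), and it assumes each column name appears as a whole word somewhere in the schema source file.

```bash
# Sketch: report which expected column names are absent from a schema file.
# check_columns is an illustrative name, not part of the repository.
check_columns() {
  local schema_file="$1"; shift
  local missing=0 col
  for col in "$@"; do
    # -w matches whole words only, so python_code does not match python_code_hash
    if ! grep -qw -- "$col" "$schema_file"; then
      echo "MISSING: $col"
      missing=1
    fi
  done
  return "$missing"
}
```

Under those assumptions, `check_columns src/db/postgres.js model_id system_prompt_hash python_code` would print nothing and return 0 when every name is present, and print one `MISSING:` line per absent name otherwise.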

Note: CHANGELOG.md and system-design.md still reference 'python_code_hash' — those are out of scope for this skill alignment PR. Will be corrected in a follow-up doc patch (logged as v7.0.x deferred work).

No code changes. Documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/skills/client-backup-restore/SKILL.md                 | 2 +-
 .claude/skills/infrastructure-health/references/postgresql.md | 2 +-
 .claude/skills/session-diagnostics/SKILL.md                   | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/.claude/skills/client-backup-restore/SKILL.md b/.claude/skills/client-backup-restore/SKILL.md
index 4a769b277..3f4d1c12f 100644
--- a/.claude/skills/client-backup-restore/SKILL.md
+++ b/.claude/skills/client-backup-restore/SKILL.md
@@ -193,7 +193,7 @@ gcloud sql backups restore {backup_id} \
 
 **v7.0.0+ database-only scope** — Cloud SQL export captures all tables, including the new compliance/observability tables introduced in v7.0.0:
 - `transcript_events` — full SSE event history per session (~700KB-1MB per session × N sessions). **Largest growth vector** at 10K+ sessions.
-- `code_executions` (now with 13 reproducibility columns: model_id, anthropic_request_id, system_prompt_hash, python_code, python_code_hash, container_id, etc.) — required for byte-replay envelope per EU AI Act Art. 15
+- `code_executions` (now with 13+ reproducibility columns: model_id, llm_name, anthropic_request_id, anthropic_message_id, system_prompt_hash, python_code, container_id, tool_use_id, stop_reason, turn_count, pause_count, refusal_detected, etc.) — required for byte-replay envelope per EU AI Act Art. 15
 - `code_execution_inputs` — data lineage junction (small table, 1-5 rows per execution)
 - `citation_source_links` — citation→source bridge with confidence scores (1 row per matched citation)
 - `hook_audit_log` — now includes `bridge_metadata` JSONB column with `git_sha + sdk_version + container_id + system_prompt_hash` (regulator-replay envelope)
diff --git a/.claude/skills/infrastructure-health/references/postgresql.md b/.claude/skills/infrastructure-health/references/postgresql.md
index 33a01ab50..b069f853c 100644
--- a/.claude/skills/infrastructure-health/references/postgresql.md
+++ b/.claude/skills/infrastructure-health/references/postgresql.md
@@ -14,7 +14,7 @@
 | sessions | Session tracking | 1 row per pipeline run |
 | reports | Report versions | ~10-20 rows per session |
 | hook_audit_log | Agent activity audit | 100-500 rows per session; v7.0.0 adds `bridge_metadata` JSONB + `tool_use_id` columns |
-| code_executions | Per-`run_python_analysis` execution audit | 1 row per code execution; **v7.0.0 adds 13 reproducibility columns** (model_id, llm_name, anthropic_request_id, input/output/cache tokens, system_prompt_hash, python_code, python_code_hash, container_id, tool_use_id, stop_reason, refusal_detected) |
+| code_executions | Per-`run_python_analysis` execution audit | 1 row per code execution; **v7.0.0 adds reproducibility columns** (model_id, llm_name, anthropic_request_id, anthropic_message_id, input/output/cache tokens, system_prompt_hash, python_code, container_id, tool_use_id, stop_reason, turn_count, pause_count, refusal_detected) |
 | code_execution_inputs | **v7.0.0** — data lineage junction linking each code execution to upstream subagent reports/embeddings/KG nodes | 1-5 rows per code execution |
 | transcript_events | **v7.0.0** — full-fidelity SSE event capture (`migrations/012_transcript-events.up.sql`); buffered batch insert | ~4,000-6,000 rows per 30-50 min session; **~700KB-1MB storage per session** |
 | citation_source_links | **v7.0.0** — citation→source bridge with fuzzy matching (URL exact / URL fuzzy / title fuzzy / embedding cosine) + confidence score | 1 row per memo footnote matched |
diff --git a/.claude/skills/session-diagnostics/SKILL.md b/.claude/skills/session-diagnostics/SKILL.md
index 91e0f8e7f..cd8d79a05 100644
--- a/.claude/skills/session-diagnostics/SKILL.md
+++ b/.claude/skills/session-diagnostics/SKILL.md
@@ -105,7 +105,7 @@ See `references/failure-patterns.md` for the full catalog. Summary:
 | 9 | Hook audit gaps (audit_log row count < 0.5x reports count) | WARNING |
 | 10 | **Transcript replay gap** (v7.0.0): `transcript_events` row count = 0 for completed session OR sessions exists but flush failed (missing late events) | CRITICAL |
 | 11 | **Citation low-confidence** (v7.0.0): `citation_source_links.confidence_score < 0.85` for >20% of citations — fuzzy matches flagged for QA, possible hallucinated citations | WARNING |
-| 12 | **Code-execution traceability NULL** (v7.0.0): `code_executions` rows with NULL `model_id`, `anthropic_request_id`, `python_code`, or `python_code_hash` — regulator audit gap, EU AI Act Art. 15 byte-replay envelope broken | CRITICAL |
+| 12 | **Code-execution traceability NULL** (v7.0.0): `code_executions` rows with NULL `model_id`, `anthropic_request_id`, `python_code`, or `system_prompt_hash` — regulator audit gap, EU AI Act Art. 15 byte-replay envelope broken | CRITICAL |
 | 13 | **bridge_metadata corruption / missing git_sha** (v7.0.0): `hook_audit_log` rows where `event_data->'bridge_metadata'` is malformed OR `bridge_metadata.git_sha = 'unknown'` indicating COMMIT_SHA build arg missed | WARNING |
 | 14 | **Reconciliation stall** (v6.7.0): sessions with `kg_status='building'` >10 minutes OR `reconciliation_attempts >= 3` (parked, manual intervention required) | CRITICAL |
 | 15 | **FMP equity-analyst routing failure** (v7.0.0): orchestrator dispatched equity-analyst (visible in `hook_audit_log` SubagentStart with `agent_type='equity-analyst'`) but `claude_api_client_results_total{tool_name LIKE 'mcp__equities__%', fetch_source}` shows `exa_fallback` instead of `fmp_native` (FMP_API_KEY invalid or rate-limited); OR M46–M58 model rows show `success=false` | WARNING |
@@ -134,7 +134,7 @@ GROUP BY match_method;
 SELECT COUNT(*) AS total,
        SUM(CASE WHEN model_id IS NULL THEN 1 ELSE 0 END) AS missing_model_id,
        SUM(CASE WHEN python_code IS NULL THEN 1 ELSE 0 END) AS missing_code,
-       SUM(CASE WHEN python_code_hash IS NULL THEN 1 ELSE 0 END) AS missing_hash,
+       SUM(CASE WHEN system_prompt_hash IS NULL THEN 1 ELSE 0 END) AS missing_prompt_hash,
        SUM(CASE WHEN anthropic_request_id IS NULL THEN 1 ELSE 0 END) AS missing_request_id
 FROM code_executions ce
 JOIN sessions s ON ce.session_id = s.id
From 9a0323db0487cf12761680338bbfaaa46e34fc6b Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Thu, 7 May 2026 00:15:05 -0400
Subject: [PATCH 6/6] fix(skills): correct 8 column-name bugs found by independent audit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The independent audit of PR #96 found that 3 of the 6 new SQL queries in session-diagnostics + the citation_source_links probe in the infrastructure-health prometheus-alerts.md reference 8 columns that do not exist in the production schema. Operators copy-pasting these queries would hit hard 'ERROR: column does not exist' on Patterns 11 and 14.

Real schema (verified from src/db/postgres.js):

citation_source_links columns:
  id, report_id, citation_marker, source_hash, confidence, matched_via, citation_text, created_at
  - confidence_score → confidence
  - match_method → matched_via
  - matched_at → created_at

sessions reconciliation columns (added by SESSIONS_RECONCILIATION_DDL):
  kg_status, kg_build_attempts, kg_breaker_skipped_count, last_kg_build_attempt_at, kg_build_last_error, artifacts_status, artifacts_build_attempts, last_artifacts_build_attempt_at, artifacts_build_last_error
  - kg_error → kg_build_last_error
  - embedding_status → artifacts_status (no separate embedding pipeline)
  - embedding_error → artifacts_build_last_error
  - reconciliation_attempts → kg_build_attempts
  - kg_built_at → last_kg_build_attempt_at
  - embedding_built_at → DROP (does not exist; only kg + artifacts pipelines)
  - reconciliation_attempts >= 3 → kg_build_attempts >= 5 (real retry budget)

Files corrected:
- session-diagnostics/SKILL.md: Pattern 11 SQL + table row, Pattern 14 SQL + table row threshold. Added kg_breaker_skipped_count to Pattern 14 SELECT.
- infrastructure-health/references/prometheus-alerts.md: citation probe SQL (3 columns).
- client-offboarding/SKILL.md: descriptive 'confidence_score < 0.85' reference in Step 6.5 archive scope.

Re-validated: all 8 corrections now reference real schema columns. 4/4 skill files clean of bogus column refs.

This is the v7.0.1 deploy's third doc-correctness sweep — the 'measure twice, cut once' principle of dual-path schema discipline should extend to the operator-facing SQL docs as well.
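
A sweep for leftover references could be scripted roughly as below. This is a sketch under stated assumptions: `find_stale_columns` is a hypothetical helper, the default path assumes the skill docs live under `.claude/skills` as Markdown, and the pattern list simply mirrors the retired names listed in this message.

```bash
# Sketch: list Markdown lines that still mention a retired column name.
# find_stale_columns and its default directory are illustrative assumptions.
find_stale_columns() {
  local dir="${1:-.claude/skills}"
  local stale='confidence_score|match_method|matched_at|kg_error|embedding_status|embedding_error|reconciliation_attempts|kg_built_at|embedding_built_at'
  # -r recurse, -n show line numbers, -w whole words only, -E extended regex
  grep -rnwE --include='*.md' "$stale" "$dir"
}
```

Because `grep` exits non-zero when nothing matches, `find_stale_columns .claude/skills || echo "clean"` prints `clean` only when no retired name remains.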

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/skills/client-offboarding/SKILL.md    |  2 +-
 .../references/prometheus-alerts.md           | 10 +++++-----
 .claude/skills/session-diagnostics/SKILL.md   | 20 +++++++++++--------
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/.claude/skills/client-offboarding/SKILL.md b/.claude/skills/client-offboarding/SKILL.md
index 8d14f5032..c7abe3594 100644
--- a/.claude/skills/client-offboarding/SKILL.md
+++ b/.claude/skills/client-offboarding/SKILL.md
@@ -57,7 +57,7 @@ bash /Users/ej/Super-Legal/.claude/skills/client-offboarding/scripts/offboard-cl
 
 **Step 6.5**: Archive compliance audit tables as dedicated CSV artifacts. **Wave 3 tables**: `access_log` (EU AI Act Article 12 read-side evidence) and `human_interventions` (EU AI Act Article 14 operator governance evidence). **v7.0.0 tables** (added in this scope):
 - `transcript_events` — full SSE event history per session (~700KB-1MB per session × N sessions; may be largest archive file). Required for byte-faithful session-reload audit if regulator queries any session history.
-- `citation_source_links` — citation→raw-source bridge with confidence scores. Required for hallucination audit (any citation with `confidence_score < 0.85` flagged for QA review at session-time should be reproducible from this archive).
+- `citation_source_links` — citation→raw-source bridge with confidence scores. Required for hallucination audit (any citation with `confidence < 0.85` flagged for QA review at session-time should be reproducible from this archive).
 - `code_execution_inputs` — data lineage junction linking code executions to upstream subagent reports/embeddings/KG nodes. Required for EU AI Act Art. 15 reproducibility chain ("which subagent's output drove this DCF result?").
 
 All exported via `psql COPY TO STDOUT` + gzip to `gs://super-legal-worm-{client_id}/archive/{table}-{date}.csv.gz`. Cloud SQL's native `gcloud sql export csv` doesn't support table-level `--query` filtering, so the script uses psql directly against the connection string resolved from Secret Manager. Runs AFTER archive verification (Step 7) and BEFORE any destructive deletion (Phase 3) — these tables must survive the DB drop as standalone legal records. Legacy clients predating each table gracefully skip via `2>/dev/null || warn`. v6.5.1+ instances also have `hook_audit_log WHERE event_type = 'KGBuild'` entries — include in archive for KG build audit trail. **v7.0.0+ instances** have `hook_audit_log` rows with `bridge_metadata` JSONB (`git_sha + sdk_version + container_id + system_prompt_hash`) — these are the regulator-replay envelope and MUST be preserved in the audit log archive. Requires v6.6.0+ for complete telemetry (background task tracking, pool survival).
diff --git a/.claude/skills/infrastructure-health/references/prometheus-alerts.md b/.claude/skills/infrastructure-health/references/prometheus-alerts.md
index ed36f698d..0e734d157 100644
--- a/.claude/skills/infrastructure-health/references/prometheus-alerts.md
+++ b/.claude/skills/infrastructure-health/references/prometheus-alerts.md
@@ -98,12 +98,12 @@ GROUP BY s.session_key
 HAVING COUNT(t.id) = 0 -- sessions with no events = persistence broken
 ORDER BY s.created_at DESC LIMIT 10;
 
--- citation_source_links: confidence score distribution
-SELECT match_method, COUNT(*) AS total,
-       SUM(CASE WHEN confidence_score < 0.85 THEN 1 ELSE 0 END) AS low_confidence
+-- citation_source_links: confidence distribution
+SELECT matched_via, COUNT(*) AS total,
+       SUM(CASE WHEN confidence < 0.85 THEN 1 ELSE 0 END) AS low_confidence
 FROM citation_source_links
-WHERE matched_at > NOW() - INTERVAL '24 hours'
-GROUP BY match_method;
+WHERE created_at > NOW() - INTERVAL '24 hours'
+GROUP BY matched_via;
 
 -- code_execution_inputs: lineage row count per execution
 SELECT ce.id, ce.agent_type, COUNT(cei.id) AS lineage_rows
diff --git a/.claude/skills/session-diagnostics/SKILL.md b/.claude/skills/session-diagnostics/SKILL.md
index cd8d79a05..a49357e00 100644
--- a/.claude/skills/session-diagnostics/SKILL.md
+++ b/.claude/skills/session-diagnostics/SKILL.md
@@ -104,10 +104,10 @@ See `references/failure-patterns.md` for the full catalog. Summary:
 | 8 | Empty session (0 reports rows) | INFO |
 | 9 | Hook audit gaps (audit_log row count < 0.5x reports count) | WARNING |
 | 10 | **Transcript replay gap** (v7.0.0): `transcript_events` row count = 0 for completed session OR sessions exists but flush failed (missing late events) | CRITICAL |
-| 11 | **Citation low-confidence** (v7.0.0): `citation_source_links.confidence_score < 0.85` for >20% of citations — fuzzy matches flagged for QA, possible hallucinated citations | WARNING |
+| 11 | **Citation low-confidence** (v7.0.0): `citation_source_links.confidence < 0.85` for >20% of citations — fuzzy matches flagged for QA, possible hallucinated citations | WARNING |
 | 12 | **Code-execution traceability NULL** (v7.0.0): `code_executions` rows with NULL `model_id`, `anthropic_request_id`, `python_code`, or `system_prompt_hash` — regulator audit gap, EU AI Act Art. 15 byte-replay envelope broken | CRITICAL |
 | 13 | **bridge_metadata corruption / missing git_sha** (v7.0.0): `hook_audit_log` rows where `event_data->'bridge_metadata'` is malformed OR `bridge_metadata.git_sha = 'unknown'` indicating COMMIT_SHA build arg missed | WARNING |
-| 14 | **Reconciliation stall** (v6.7.0): sessions with `kg_status='building'` >10 minutes OR `reconciliation_attempts >= 3` (parked, manual intervention required) | CRITICAL |
+| 14 | **Reconciliation stall** (v6.7.0): sessions with `kg_status='building'` >10 minutes OR `kg_build_attempts >= 5` (kgBreaker retry budget exhausted, parked for manual intervention) | CRITICAL |
 | 15 | **FMP equity-analyst routing failure** (v7.0.0): orchestrator dispatched equity-analyst (visible in `hook_audit_log` SubagentStart with `agent_type='equity-analyst'`) but `claude_api_client_results_total{tool_name LIKE 'mcp__equities__%', fetch_source}` shows `exa_fallback` instead of `fmp_native` (FMP_API_KEY invalid or rate-limited); OR M46–M58 model rows show `success=false` | WARNING |
 
 ### v7.0.0 Diagnostic Queries (operator copy-paste)
@@ -122,13 +122,13 @@ GROUP BY s.session_key, s.status;
 -- Expected: event_count > 0 (typical 4,000-6,000 events per memo session)
 
 -- Pattern 11: Citation low-confidence distribution
-SELECT match_method, COUNT(*) AS total,
-       SUM(CASE WHEN confidence_score < 0.85 THEN 1 ELSE 0 END) AS low_conf
+SELECT matched_via, COUNT(*) AS total,
+       SUM(CASE WHEN confidence < 0.85 THEN 1 ELSE 0 END) AS low_conf
 FROM citation_source_links csl
 JOIN reports r ON csl.report_id = r.id
 JOIN sessions s ON r.session_id = s.id
 WHERE s.session_key = ''
-GROUP BY match_method;
+GROUP BY matched_via;
 
 -- Pattern 12: Code-execution traceability NULL check
 SELECT COUNT(*) AS total,
@@ -147,9 +147,13 @@ JOIN sessions s ON hal.session_id = s.id
 WHERE s.session_key = '' AND tool_name='run_python_analysis'
 GROUP BY git_sha;
 
--- Pattern 14: Reconciliation stall
-SELECT session_key, kg_status, embedding_status, kg_error, embedding_error,
-       reconciliation_attempts, kg_built_at, embedding_built_at, updated_at
+-- Pattern 14: Reconciliation stall (kg + artifacts pipelines)
+SELECT session_key, status,
+       kg_status, kg_build_attempts, kg_breaker_skipped_count,
+       last_kg_build_attempt_at, kg_build_last_error,
+       artifacts_status, artifacts_build_attempts,
+       last_artifacts_build_attempt_at, artifacts_build_last_error,
+       updated_at
 FROM sessions
 WHERE session_key = '';