8 changes: 4 additions & 4 deletions .claude/skills/client-backup-restore/SKILL.md
@@ -198,9 +198,9 @@ gcloud sql backups restore {backup_id} \
- `citation_source_links` — citation→source bridge with confidence scores (1 row per matched citation)
- `citation_verdicts` — per-footnote G5 verdicts (v6.8.6 T1, PR #122). 1 row per verified footnote; ~300-500 rows per memo session that ran citation-websearch-verifier. FK ON DELETE CASCADE on reports + sessions — backed up automatically via pg_dump; no manual handling needed.
- `hook_audit_log` — now includes `bridge_metadata` JSONB column with `git_sha + sdk_version + container_id + system_prompt_hash` (regulator-replay envelope)
- **v7.x XLSX renderer (migrations 015 + 016)**:
- `human_interventions.metadata` JSONB column (added in 015) — carries Art. 17 cascade-erasure audit payloads + Art. 14 manual-override context
- `xlsx_renders` — one row per workbook render attempt (template_id, render_status, audit_results JSONB, artifact_id, xlsx_safe_flip_count). **Decision record per Art. 12 retention — preserved on GDPR erasure** (see `docs/compliance/xlsx-art17-scope.md`).
- **v7.x XLSX renderer (migrations 016 + 017 + 018)**:
- `human_interventions.metadata` JSONB column (added in 016 — was 015 pre-merge with main) — carries Art. 17 cascade-erasure audit payloads + Art. 14 manual-override context
- `xlsx_renders` — one row per workbook render **request** (template_id, render_status, audit_results JSONB, artifact_id, xlsx_safe_flip_count + 4 generated columns from migration 018: `audit_status`, `sheet_count`, `warnings_count`, `node_audit_ran`). Lifecycle: `'pending' → 'running' → 'completed'|'failed'|'built'|'reconciled_failed'`. Async-202 endpoint (Issue #88) inserts at `'pending'`; auto-trigger path identical. **Decision record per Art. 12 retention — preserved on GDPR erasure** (see `docs/compliance/xlsx-art17-scope.md`).
- `xlsx_render_inputs` — junction table linking each render to its consumed `code_executions` rows

Restore verification (Phase 4) should confirm these row counts post-restore for v7.0.0+ deployments:
@@ -209,7 +209,7 @@ Restore verification (Phase 4) should confirm these row counts post-restore for
- `SELECT COUNT(*) FROM citation_source_links` matches pre-backup count
- `SELECT COUNT(*) FROM citation_verdicts` matches pre-backup count (zero rows is acceptable for sessions that ran before v6.8.6 OR that never invoked citation-websearch-verifier)
- `SELECT event_data->'bridge_metadata' IS NOT NULL FROM hook_audit_log WHERE tool_name='run_python_analysis'` — bridge_metadata preserved on restore
- `SELECT COUNT(*) FROM xlsx_renders` matches pre-backup count (when `XLSX_RENDERER=true`)
- `SELECT COUNT(*) FROM xlsx_renders` matches pre-backup count (when `XLSX_RENDERER=true`). Post-restore, any rows with `render_status IN ('pending','running')` will be picked up by reconciliation within `STUCK_BUILD_THRESHOLD_MIN` (60 min). This is expected: renders resume from snapshot state.
- `SELECT COUNT(*) FROM xlsx_render_inputs` matches pre-backup count
- `SELECT COUNT(*) FROM human_interventions WHERE metadata != '{}'::jsonb` — Art. 17 / Art. 14 audit-trail metadata preserved
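
The count comparisons above could be scripted roughly as follows. This is a minimal sketch, not part of the skill: `diffRowCounts`, `VERIFIED_TABLES`, and the input shape (plain objects keyed by table name, filled from `SELECT COUNT(*)` before backup and after restore) are assumptions made for illustration.

```javascript
// Hypothetical helper (not in the repository): compares per-table row counts
// captured before backup with counts captured after restore.
const VERIFIED_TABLES = [
  "citation_source_links",
  "citation_verdicts",
  "xlsx_renders",
  "xlsx_render_inputs",
];

function diffRowCounts(preBackup, postRestore, tables = VERIFIED_TABLES) {
  const mismatches = [];
  for (const table of tables) {
    const before = preBackup[table];
    const after = postRestore[table];
    // A table missing from either snapshot is reported too, so a skipped
    // query cannot pass silently.
    if (before === undefined || after === undefined || before !== after) {
      mismatches.push({ table, before, after });
    }
  }
  return mismatches; // an empty array means every count matched
}
```

When `XLSX_RENDERER=false`, a caller would presumably pass a shorter `tables` list that omits the two `xlsx_*` tables.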

6 changes: 4 additions & 2 deletions .claude/skills/infrastructure-health/SKILL.md
@@ -114,9 +114,11 @@ Timestamp: <ISO8601> | Target: <base_url>

### Reconciliation: <HEALTHY|WARNING|CRITICAL>
- KG: pending=<n> stuck=<n> | Artifacts: pending=<n> stuck=<n>
- XLSX renders: pending=<n> stuck=<n> (from `/health.reconciliation.pending_xlsx_renders` + `stuck_xlsx_renders` — only present when `XLSX_RENDERER=true`)
- XLSX renders backlog: pending=<n> stuck=<n> (from `/health.reconciliation.pending_xlsx_renders` + `stuck_xlsx_renders` — only present when `XLSX_RENDERER=true`)
- Last scan: <ISO8601>
- Threshold: pending_xlsx_renders > 10 sustained → WARNING (per `docs/pending-updates/excel-code-execution.md` §12.1); presence of `xlsx_renders_error` field → CRITICAL (schema_missing or query_failed bucket)
- **Normal in-flight signal** (post-Issue#88 async-202): renders move `'pending' → 'running' → 'completed'` over ~10 minutes typical. Brief `pending` spikes track manual-endpoint traffic, NOT stuck work. Threshold `pending > 10 sustained for >15 min` indicates real backlog.
- **Stuck render signal**: `stuck_xlsx_renders` (reconciliation_attempts ≥ 3) > 0 sustained → WARNING. Investigation query: `SELECT id, render_status, started_at, reconciliation_attempts, error_message FROM xlsx_renders WHERE render_status IN ('pending','running') AND started_at < NOW() - INTERVAL '15 minutes' ORDER BY started_at`.
- **Schema/query failure**: presence of `xlsx_renders_error` field → CRITICAL (schema_missing or query_failed bucket — see `docs/pending-updates/excel-code-execution.md` §12.1)

Overall: <n>/<total> healthy
```
2 changes: 1 addition & 1 deletion .claude/skills/post-deploy-verify/SKILL.md
@@ -60,7 +60,7 @@ Embeds the verification protocol from `super-legal-mcp-refactored/docs/pending-u
| **Container env audit** | `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG`, `FMP_ENABLED`, `COMMIT_SHA`, `BCRYPT_ROUNDS` all present in container env |
| **V5 (v7.6.1)**: Exa A3 telemetry + audit log | When `EXA_ADDITIONAL_QUERIES=true`: `/metrics` exposes `claude_exa_ab_latency_ms{outcome=...}` with ≥1 outcome value populated AND `hook_audit_log` has ≥1 row with `event_data ? 'exa_a3'` in last 1h after a session run. Otherwise: WARNING "no A3 traffic in window". Skip if flag off. |
| **V6 (v6.8.6 T1 + v6.8.7 T2)**: G5 citation-verifier observability | `/metrics` exposes all 4 `citation_verifier_*` series (HELP/TYPE lines registered). PASSED when 4/4 found regardless of value (gauge/counter values populate after first G5 run). WARNING if partial (stale image suspected) or zero (sdkMetrics export broken). Companion DB check via `queries/v6-citation-verdicts-presence.sql` — verifies `citation_verdicts` table shape + first-session population. Post-first-G5-run: query confirms ≥1 row per session. |
| **V7 (v7.x XLSX renderer)**: workbook deliverables + schema + metrics | When `XLSX_RENDERER=true`: (a) `xlsx_renders` table exists; (b) `SELECT COUNT(*) FROM xlsx_renders WHERE render_status='failed' AND started_at > NOW() - INTERVAL '1 day'` returns 0; (c) `/metrics` exposes `claude_xlsx_render_invocations_total` and `claude_xlsx_render_duration_seconds_bucket`; (d) `/health.reconciliation.pending_xlsx_renders` field is present (success path) OR `xlsx_renders_error` reports a bucketed code (table-missing OR query-failed during deploy-order window). Skip with WARNING if `XLSX_RENDERER=false`. |
| **V7 (v7.x XLSX renderer + Issue #88 async-202)**: workbook deliverables + schema + metrics + async-202 envelope | When `XLSX_RENDERER=true`: (a) `xlsx_renders` table exists with all 4 generated columns (`audit_status`, `sheet_count`, `warnings_count`, `node_audit_ran`); (b) `SELECT COUNT(*) FROM xlsx_renders WHERE render_status='failed' AND started_at > NOW() - INTERVAL '1 day'` returns 0 (terminal-state failures only — `'pending'`/`'running'` rows older than `STUCK_BUILD_THRESHOLD_MIN`=60min indicate reconciliation backlog, not deploy issues); (c) `/metrics` exposes `claude_xlsx_render_invocations_total` and `claude_xlsx_render_duration_seconds_bucket` AND `claude_xlsx_render_manual_calls_total{outcome="dispatched"}` is a registered series (proves async-202 envelope shipped — value may be 0 until first manual render); (d) `/health.reconciliation.pending_xlsx_renders` field is present (success path) OR `xlsx_renders_error` reports a bucketed code; (e) **smoke probe** (optional, requires a test session): `curl -X POST $URL/api/render-workbook/$SESSION` returns HTTP 202 with JSON keys `render_id` + `status` + `status_poll_url` + `sse_url`; calling `GET $URL/api/render-workbook/$render_id/status` returns `status ∈ {pending, running, completed, failed}`. Skip with WARNING if `XLSX_RENDERER=false`. |

## Tier 3 — Metrics + Reconciliation + Trace (~10 min)

24 changes: 24 additions & 0 deletions .claude/skills/session-diagnostics/SKILL.md
@@ -213,3 +213,27 @@ These are surfaced in the Remediation Suggestions section as suggestions — nev
- **Cloud Trace not integrated** — OTel spans exist (`reconciliation.scan → kg.extract_full → kg.phase1_*`) but the codebase doesn't yet label spans with `session_key`. Skipping Cloud Trace integration in v1 of this skill.
- **Single-session scope** — one invocation per session_key. Cross-session aggregation deferred to a follow-up skill.
- **GCP auth required** — fetches DB credentials from Secret Manager. If you don't have `gcloud auth` set up, set `PG_CONNECTION_STRING` in the environment to bypass.

## XLSX render lifecycle (when XLSX_RENDERER=true) — Issue #88 async-202

When a session reports "workbook never arrived" or operators need to triage a manual-render call:

```sql
SELECT id AS render_id, render_status, template_id, audit_status, sheet_count,
warnings_count, node_audit_ran, started_at, completed_at,
reconciliation_attempts, error_message
FROM xlsx_renders
WHERE session_id = (SELECT id FROM sessions WHERE session_key = :session_key)
ORDER BY started_at DESC;
```

State interpretation (post-Issue #88):

- **`'pending'`** — row pre-created by the async-202 POST handler; the renderer's setImmediate has not yet started doing work. If `≥ 15 min` old, check server logs for `xlsx_manual_dispatch_failed` (setImmediate dispatcher errored). Reconciliation will sweep at `STUCK_BUILD_THRESHOLD_MIN = 60` min.
- **`'running'`** — renderer has called `transitionRenderToRunning(id)` and is actively executing. If `≥ 30 min` old, check the code-execution bridge logs; a container may have hung.
- **`'completed'`** — terminal success. Fetch artifact via `report_artifacts.id = xlsx_renders.artifact_id`. Audit verdict in `audit_status` (generated column).
- **`'failed'`** — terminal failure. `error_message` carries the reason; check `audit_results->'phase_audits'` for multi-turn forensic detail.
- **`'built'`** — reconciliation safe-flip; file on disk but SSE event never fired (caller may have disconnected pre-completion). Treat as successful delivery.
- **`'reconciled_failed'`** — reconciliation exhausted attempts; manual investigation required. Check the original code-execution bridge call logs.

For caller-side polling of an in-flight render: `GET /api/render-workbook/:renderId/status` returns the same shape (404 if `renderId` doesn't exist; 503 if `XLSX_RENDERER=false`).
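
The state interpretation above can be folded into a small triage helper. The sketch below is illustrative only: `triageRender` is a hypothetical name, the 15-minute and 30-minute thresholds are taken from the bullets above, and it assumes `started_at` is set on every row returned by the query.

```javascript
// Hypothetical helper (not in the repository): maps one xlsx_renders row
// to the triage hint described in the state-interpretation list.
function triageRender(row, now = new Date()) {
  // Works for ISO strings and Date objects alike.
  const ageMin = (now.getTime() - new Date(row.started_at).getTime()) / 60000;
  switch (row.render_status) {
    case "pending":
      return ageMin >= 15
        ? "stale pending: check server logs for xlsx_manual_dispatch_failed"
        : "in flight: waiting for the renderer to start";
    case "running":
      return ageMin >= 30
        ? "stale running: check code-execution bridge logs (possible hung container)"
        : "in flight: rendering";
    case "completed":
      return "success: fetch the artifact via artifact_id";
    case "built":
      return "success: reconciliation safe-flip, file is on disk";
    case "failed":
      return "failed: read error_message and audit_results.phase_audits";
    case "reconciled_failed":
      return "failed: reconciliation exhausted, investigate manually";
    default:
      return "unknown render_status";
  }
}
```

Feeding it the rows returned by the query above, newest first, gives one hint per render attempt for the session.
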
29 changes: 29 additions & 0 deletions super-legal-mcp-refactored/CHANGELOG.md
@@ -4,6 +4,35 @@ All notable changes to the Super Legal MCP Server are documented in this file.

## [Unreleased]

### ⚠️ Changed (BREAKING) — `POST /api/render-workbook/:sessionId` is now async-202 (Issue #88, PR [#133](https://github.com/Number531/Legal-API/pull/133))

The manual XLSX render endpoint previously returned `HTTP 200` with the full sync envelope `{ success, xlsxPath, auditResults, artifactId, durationMs }`, holding the request open for up to `OVERALL_TIMEOUT_MS` (1200s). That caused client-side timeouts (browser ~5min, undici 30s, CI/CD 60–300s), proxy idle-timeouts (Cloud Run / nginx), and connections held against the concurrency cap.

**It now returns `HTTP 202 Accepted` immediately** with a fire-and-forget envelope:

```json
{ "render_id": "<uuid>", "status": "pending", "session_id": "<session_key>",
"template_id": "...", "created_at": "<ISO8601>",
"status_poll_url": "/api/render-workbook/<uuid>/status",
"sse_url": "/api/stream?sessionId=<session_key>" }
```

The render runs in the background. The caller either polls the new `GET /api/render-workbook/:renderId/status` for terminal state or subscribes to the existing session SSE channel `/api/stream?sessionId=…` for live progress (events of type `xlsx_render` with status ∈ `{pending, complete, failed}`).

**State machine cleanup**: `xlsx_renders.render_status` rows now INSERT at `'pending'` (previously inserted at `'running'`). The renderer transitions `'pending' → 'running'` via the new `transitionRenderToRunning(id)` helper at the start of `_renderForSessionInner` — idempotent, `WHERE … AND render_status='pending'`. Reconciliation predicates (`WHERE render_status IN ('pending','running')`) already accept both states; auto-trigger path (`agentStreamHandler.js` SessionEnd) inherits the new state machine for free.

**New endpoint** `GET /api/render-workbook/:renderId/status` — same `cookieAuthMiddleware` auth as POST. Returns the full `xlsx_renders` row plus all 4 generated columns (`audit_status`, `sheet_count`, `warnings_count`, `node_audit_ran`).

**New metric outcomes** on `claude_xlsx_render_manual_calls_total`: `'dispatched'` (202 returned), `'dispatch_failed'` (pre-202 failure — DB INSERT failed, session not found, etc.). Existing `'rate_limited'` retained; `'accepted'` retired (no longer emitted by the new code).

**Refinement deferred**: `Idempotency-Key` header support is NOT in v1 — zero precedent in the codebase, inconsistent with the "consume existing infrastructure" framing of the refactor. Duplicate POSTs within the rate-limit window WILL create duplicate `xlsx_renders` rows. Clients that retry on transport blips should track their own `render_id` and avoid retrying after a successful 202. (Issue #88 deferred-refinement-3.)

**Behavior-neutral against current production**: `XLSX_RENDERER=false` in `flags.env` is the prod default; the endpoint returns `503 xlsx_renderer_disabled` regardless of POST/GET shape. The PR ships before the flag flips on.

**Files**: `src/server/claude-sdk-server.js`, `src/utils/xlsxRenderer/persist.js`, `src/utils/xlsxRenderer/index.js`, `src/utils/sdkMetrics.js`, `test/sdk/xlsx-renderer-integration.test.js` (new T27/T28/T29 — 18 assertions), `docs/api-reference.md` (new "Document Generation — Workbook Rendering" section), 4 docs/pending-updates + 4 .claude/skills updated for observability alignment.

**Test suite**: 185 pass / 0 fail / 2 skip (was 167 pre-#88).

### Added — Sonnet-deep vs Haiku-deep A/B experiment (test-only, 2026-05-12, PR forthcoming)

Empirical investigation of whether Haiku 4.5 could replace Sonnet 4.6 for `CITATION_DEEP_VERIFICATION=true` mode at ~4.4× cost reduction (measured, not 12× as agent-file comment estimated). Both arms ran with `EXA_WEB_TOOLS=true` for production parity; only the verifier subagent's model varied.
96 changes: 96 additions & 0 deletions super-legal-mcp-refactored/docs/api-reference.md
@@ -484,6 +484,102 @@ The server does not currently impose application-level rate limits on operator/r

---

## Document Generation — Workbook Rendering

> **Flag-gated**: All endpoints below return `503 xlsx_renderer_disabled` when `XLSX_RENDERER=false` (the prod default).

### `POST /api/render-workbook/:sessionId`

Auth: `cookieAuthMiddleware`. **Async: returns `202 Accepted`** and dispatches the render fire-and-forget (Issue #88, v7.x). The caller subscribes to SSE for live progress OR polls the status endpoint for terminal-state confirmation. This is a **BREAKING** change from the pre-Issue #88 sync response shape (which returned the full `auditResults` envelope inline); clients must migrate.

**Path parameters**:
- `sessionId` — session key in `YYYY-MM-DD-NNNNNNN` format (NOT the UUID).

**Query parameters**:
- `template` (optional) — force a specific template (`session-models`, `full-deal-workbook`, …). Omit to auto-select.

**Per-user quota** (Phase 7 Issue #4): 10/hour, 50/day per `req.user.id`. Counted via the `claude_xlsx_render_manual_calls_total{outcome}` Prometheus counter with outcome ∈ `{rate_limited, dispatched, dispatch_failed}`. The pre-Issue #88 outcome `accepted` is retired and no longer emitted.

**Response 202**:
```json
{
"render_id": "01H8…-uuid",
"status": "pending",
"session_id": "2026-05-15-9600099",
"template_id": "session-models",
"created_at": "2026-05-15T18:30:00.000Z",
"status_poll_url": "/api/render-workbook/01H8…-uuid/status",
"sse_url": "/api/stream?sessionId=2026-05-15-9600099"
}
```

**Error responses**:

| Status | Code | Cause |
|---|---|---|
| 400 | `invalid_session_id` | Path param doesn't match `YYYY-MM-DD-NNNNNNN` |
| 400 | `template_not_found` | `?template=` doesn't match any registered template |
| 404 | `session_not_found` | No `sessions` row for the given session_key |
| 429 | `rate_limited` | Per-user quota exceeded |
| 500 | `dispatch_failed` | DB INSERT into `xlsx_renders` failed (pre-202) |
| 503 | `xlsx_renderer_disabled` | Feature flag off |
| 503 | `database_unavailable` | PG pool unreachable |

**Pre-202 audit writes**: Art. 14 `human_interventions` row + Art. 12 `access_log` row written BEFORE the 202 response, so traceability is honest even if the background render subsequently fails. Both writes are best-effort (log a warning on failure, don't block the 202).

**Idempotency (v1 — NONE)**: This endpoint has NO `Idempotency-Key` support in v1. Duplicate POSTs within the rate-limit window WILL create duplicate `xlsx_renders` rows. Clients that retry on transport blips should track their own `render_id` and avoid retrying after a successful 202. (Tracked as Issue #88 deferred-refinement-3.)

### `GET /api/render-workbook/:renderId/status`

Auth: `cookieAuthMiddleware`. Reads `xlsx_renders` by primary-key UUID.

**Path parameters**:
- `renderId` — UUID returned from the POST 202 response.

**Response 200**:
```json
{
"render_id": "01H8…-uuid",
"status": "pending|running|completed|failed|built|reconciled_failed",
"session_id": "2026-05-15-9600099",
"template_id": "session-models",
"started_at": "2026-05-15T18:30:00.000Z",
"completed_at": "2026-05-15T18:35:00.000Z",
"artifact_id": "…",
"audit_status": "PASS",
"sheet_count": 9,
"warnings_count": 0,
"node_audit_ran": true,
"error_message": null,
"reconciliation_attempts": 0
}
```

The 4 generated columns (`audit_status`, `sheet_count`, `warnings_count`, `node_audit_ran`) come from migration 018; they are STORED projections of the `audit_results` JSONB. Each is `null` until the renderer's audit populates the field it is derived from (for example, multi-turn renders fill `sheet_count` only when `merge_info` is populated).

**Error responses**:
- `400 invalid_render_id` — path param isn't a valid UUID.
- `404 render_not_found` — no row with that UUID.
- `503 xlsx_renderer_disabled` — flag off.
- `503 xlsx_renders_table_missing` — migration 017 not applied.

### Live progress via SSE (preferred over polling)

Subscribe to `/api/stream?sessionId={session_key}` (NOT renderId — the channel is session-keyed). The renderer publishes events of type `xlsx_render` with the following payloads:

- **status=pending**: `{type:'xlsx_render', status:'pending', template_id, started_at}` (emitted at the start of `_renderForSessionInner`)
- **status=complete**: `{type:'xlsx_render', status:'complete', template_id, audit_status, file, size}`
- **status=failed**: `{type:'xlsx_render', status:'failed', error}`

> **Status string asymmetry** (preserved from pre-async-202 codebase): SSE event uses `'complete'`/`'failed'`; the polling endpoint reports `'completed'`/`'failed'` (matching the `xlsx_renders.render_status` enum). Clients should accept both spellings.
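
One way a client might absorb the two spellings is to fold them into the `render_status` form before comparing. The helpers below are a sketch under that assumption; `normalizeRenderStatus`, `isTerminalStatus`, and `TERMINAL_STATUSES` are hypothetical names, and the terminal set is read off the status values listed for the polling endpoint.

```javascript
// Hypothetical client helpers; the names are illustrative, not a published API.
const TERMINAL_STATUSES = new Set([
  "completed",
  "failed",
  "built",
  "reconciled_failed",
]);

// SSE events say 'complete'; the polling endpoint says 'completed'.
// Fold both into the polling-endpoint spelling.
function normalizeRenderStatus(status) {
  return status === "complete" ? "completed" : status;
}

// True once a render can no longer change state, whichever channel reported it.
function isTerminalStatus(status) {
  return TERMINAL_STATUSES.has(normalizeRenderStatus(status));
}
```

With this in place, the same completion check works for an SSE payload and for a status-endpoint response.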

### Polling guidance

- **Preferred**: subscribe to SSE — push-based, no polling overhead.
- **If polling**: a 5-second interval is reasonable while the render is under 10 min old, and 30 seconds after that. Back off to 60 seconds once a `'pending'`/`'running'` render is more than 15 min old; reconciliation will sweep at `STUCK_BUILD_THRESHOLD_MIN`=60 min.
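
The polling cadence above might be implemented along these lines. This is a sketch, not the project's client: `pollDelayMs` and `pollRenderStatus` are hypothetical names, the thresholds are read as time elapsed since the client started polling (an assumption), the 60-minute give-up limit is chosen only to line up with `STUCK_BUILD_THRESHOLD_MIN`, and the default relies on the global `fetch` available in Node 18+.

```javascript
// Hypothetical polling helpers; names and the give-up limit are assumptions.
const DONE_STATUSES = new Set(["completed", "failed", "built", "reconciled_failed"]);

// Interval schedule from the guidance: 5 s under 10 min, 30 s up to 15 min, then 60 s.
function pollDelayMs(elapsedMs) {
  const minutes = elapsedMs / 60000;
  if (minutes < 10) return 5000;
  if (minutes <= 15) return 30000;
  return 60000;
}

// Polls the status endpoint until the render reaches a terminal status.
// `headers` is where the caller supplies its session cookie.
async function pollRenderStatus(
  baseUrl,
  renderId,
  { headers = {}, fetchImpl = fetch, maxMs = 60 * 60000 } = {}
) {
  const startedAt = Date.now();
  for (;;) {
    const res = await fetchImpl(
      `${baseUrl}/api/render-workbook/${renderId}/status`,
      { headers }
    );
    if (!res.ok) throw new Error(`status poll failed: HTTP ${res.status}`);
    const body = await res.json();
    if (DONE_STATUSES.has(body.status)) return body;

    const elapsed = Date.now() - startedAt;
    if (elapsed > maxMs) throw new Error("gave up waiting for a terminal render status");
    await new Promise((resolve) => setTimeout(resolve, pollDelayMs(elapsed)));
  }
}
```

Keeping the schedule in a pure function makes it easy to adjust the thresholds without touching the network loop.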

---

**Reference docs**:
- v6.8.5 audit-export runbook: `docs/runbooks/v6.8.5-audit-export.md`
- v6.7.0 reconciliation runbook: `docs/runbooks/v6.7.0-session-reconciliation.md`