10 changes: 10 additions & 0 deletions super-legal-mcp-refactored/CHANGELOG.md
@@ -4,6 +4,16 @@ All notable changes to the Super Legal MCP Server are documented in this file.

## [Unreleased]

### Fixed — KG risk layer empty for current-format sessions (#231, 2026-06-17)

The knowledge graph dropped the entire `risk` node layer on every current-format session. Two independent breaks had to be fixed before any risk node could exist, and the adversarial audit added a third, extraction-quality fix:

- **Persistence** — the risk-aggregator emits structured `review-outputs/risk-summary.json` (`risk-aggregator.js:25`), but the live PostToolUse hook (`hookDBBridge.js`) and the local backfill (`scripts/backfill-local-to-db.mjs`) only persisted `.md`, so it never reached the `reports` table and the KG `CRITICAL_REPORTS` gate (`risk-summary`) timed out. Fixed via a scoped `JSON_REPORT_FILENAMES` allowlist (exact basenames — `*-state.json` / `banker-*.json` / `entities.json` excluded) consulted by both persist paths; `persistReport` is content-agnostic so JSON content stores cleanly as `report_type=review, report_key=risk-summary`.
- **Parser schema drift** — Phase 7 (`kgPhases6to8.js`) keyed on the legacy `risk_categories` schema; the current producer emits `exposure_by_category` with **string** exposures (`"$433.75M"`) and **string** probability (`"8% fail"`). The JSON parser was extracted to a pure, exported `buildRiskBlocksFromJson()` (unit-tested) and extended to the current schema with the legacy numeric path preserved.
- **Extraction quality (adversarial-audit remediation)** — the node loop previously re-regexed the rendered block, leaking mitigation `$`-figures into `exposure_amounts` (the `$1,237,262,000` RRTF) and pulling a stray title `%` as the probability. The parser now emits structured `exposureAmounts`/`probability`; the node loop prefers them, with the Markdown fallback path unchanged.

Verified on the real `2026-06-16` (Fox/Roku) `risk-summary.json`: 11 risk blocks (was 0), zero mitigation-figure leaks, correct per-finding probabilities. Tests: `test/sdk/kg-phase7-risk-parser.test.js` (13 cases, legacy + current schema + the two audit regressions). Backfill also now scans `review-outputs/` for `*-state.json`. Plan + audit: `docs/pending-updates/kg-risk-layer-fix-231.md`. Recovery tooling: `scripts/restore-unpersisted-session.mjs`, `scripts/rebuild-kg-local.mjs`.

## [8.1.0] - 2026-06-08 — Forced banker intake phase + deterministic phase harness (live-validated on staging)

### Added — Forced banker intake phase + deterministic phase harness (2026-06-08)
@@ -0,0 +1,36 @@
# KG Risk-Layer Fix — Plan + Audit (issue #231)

## Problem
The KG drops the entire `risk` node layer for current-format sessions. Two independent breaks:
1. **Persistence** — `review-outputs/risk-summary.json` is never written to the `reports` table (live hook `hookDBBridge.js:1445` and backfill `walkMarkdown` persist `.md` only). KG `CRITICAL_REPORTS` gate (`hookDBBridge.js:1359`) then times out on `risk-summary`.
2. **Parser schema drift** — Phase 7 (`kgPhases6to8.js:380`) keys on `parsed.risk_categories || parsed.categories`; the current producer emits `exposure_by_category` with **string** exposures (`"$433.75M"`) and **string** probability (`"8% fail"`). Even when present (May-27 session), 0 risk nodes result.
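
To make the schema drift concrete, here is a minimal sketch of why the legacy lookup finds nothing on a current-format file. The lookup expression is the one quoted above; the function name `legacyCategoryLookup` and both sample objects are assumptions for illustration, and real `risk-summary.json` files carry more fields than shown.

```javascript
// Lookup as described for Phase 7 before the fix.
// `legacyCategoryLookup` is a hypothetical wrapper name.
function legacyCategoryLookup(parsed) {
  return parsed.risk_categories || parsed.categories;
}

// Assumed shapes, for illustration only.
const legacyShape = {
  risk_categories: [{ name: 'Regulatory', exposure: 1000000 }],
};
const currentShape = {
  exposure_by_category: [
    { category: 'Regulatory', weighted_exposure: '$433.75M', probability: '8% fail' },
  ],
};

// The legacy file resolves to its category array.
const legacyHit = legacyCategoryLookup(legacyShape);

// The current file has neither key, so the lookup yields undefined
// and the node loop has nothing to iterate over.
const currentHit = legacyCategoryLookup(currentShape);
```

Because neither legacy key exists in the current shape, the file can be present and well-formed and still produce zero risk nodes.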

Cardinal (May) worked because it emitted `risk-summary-narrative.md` (Markdown → `.md` persist path → Phase 7 Path B regex).

## Plan (additive, non-destructive)

### Fix #1 — Persistence (scoped allowlist, NOT all `.json`)
- `src/utils/hookDBBridge.js:1445` — persist `risk-summary.json` via a `JSON_REPORT_FILENAMES` allowlist (in `hookDBBridgeConfig.js`). **Do not broaden to all `.json`** (would pull in `*-state.json`, `banker-*.json`, `entities.json`).
- `scripts/backfill-local-to-db.mjs` — `walkMarkdown` additionally ingests files whose basename ∈ the same allowlist; also scan `review-outputs/` for `*-state.json` (pre-existing gap).
- `persistReport` is already content-agnostic; `extractReportKey` already strips `.json` → `review/risk-summary`. No change.
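
The scoped allowlist could look roughly like the sketch below. Only the constant name `JSON_REPORT_FILENAMES` and the file names come from the plan; the `Set` representation, the helper names `basenameOf` and `shouldPersistReport`, and the example paths are assumptions, not the actual contents of `hookDBBridgeConfig.js`.

```javascript
// Assumed representation: a Set of exact basenames.
const JSON_REPORT_FILENAMES = new Set(['risk-summary.json']);

// Hypothetical helper: last path segment, tolerant of / and \ separators.
function basenameOf(filePath) {
  return filePath.split(/[\\/]/).pop();
}

// Hypothetical predicate that both persist paths could consult.
function shouldPersistReport(filePath) {
  const base = basenameOf(filePath);
  // Existing behavior: Markdown reports are always persisted.
  if (base.endsWith('.md')) return true;
  // New behavior: JSON only on an exact basename match, never by extension.
  return JSON_REPORT_FILENAMES.has(base);
}
```

Matching on the exact basename rather than the `.json` extension is what keeps `*-state.json`, `banker-*.json`, and `entities.json` out of the `reports` table.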

### Fix #2 — Parser (modularized, unit-tested)
- Extract the JSON risk-block parsing into a **pure exported function** `buildRiskBlocksFromJson(content)` in `kgPhases6to8.js` (refactor commit: byte-equivalent for the legacy schema).
- Extend it to accept `exposure_by_category` + string exposure fields (`weighted_exposure`, `exposure_low/high`) and string `probability` (passed through; already contains `%`).
- **Ordering constraint (from node-creation loop @ line 442 `amounts.slice(0,5)`):** the synth block must list `Exposure:` BEFORE `Mitigation:` so real exposure `$` amounts lead the extracted array (mitigation prose contains `$1,237,262,000`).
- Phase 13 (`kgPhase13ProbabilisticValue.js`) intentionally untouched — it requires numeric `p10/p50/p90` and correctly skips string-only findings.
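
The current-schema path and the ordering constraint can be sketched as follows. This is not the real `buildRiskBlocksFromJson`. The function names, the assumption that `exposure_by_category` is an array of objects, the `category` and `mitigation` field names, the block layout, and the dollar regex are all hypothetical. Only `weighted_exposure`, `exposure_low`, `exposure_high`, `probability`, and the `slice(0, 5)` cap come from the plan.

```javascript
// Assumed dollar pattern for the Markdown-style fallback extraction.
const DOLLAR_RE = /\$[\d,]+(?:\.\d+)?[KMB]?/g;

// Hypothetical sketch of parsing the current schema into risk blocks.
function sketchRiskBlocksFromJson(content) {
  let parsed;
  try {
    parsed = JSON.parse(content);
  } catch {
    return []; // malformed JSON yields no blocks
  }
  const entries = Array.isArray(parsed?.exposure_by_category)
    ? parsed.exposure_by_category.filter((e) => e && typeof e === 'object')
    : [];
  return entries.map((entry) => {
    // Structured amounts come only from the exposure fields.
    const exposureAmounts = [
      entry.weighted_exposure,
      entry.exposure_low,
      entry.exposure_high,
    ].filter((v) => typeof v === 'string' && v.length > 0);

    // Exposure is written before Mitigation so exposure figures lead
    // any later regex pass over the rendered text.
    const lines = [`### ${entry.category ?? 'Uncategorized'}`];
    if (exposureAmounts.length) lines.push(`Exposure: ${exposureAmounts.join(' / ')}`);
    if (entry.probability) lines.push(`Probability: ${entry.probability}`);
    if (entry.mitigation) lines.push(`Mitigation: ${entry.mitigation}`);

    return {
      exposureAmounts,
      probability: entry.probability ?? null, // string passed through as-is
      text: lines.join('\n'),
    };
  });
}

// Mirrors the capped extraction in the node-creation loop.
function leadingAmounts(text) {
  return (text.match(DOLLAR_RE) ?? []).slice(0, 5);
}
```

With this layout, a regex pass over `text` still picks up a mitigation figure such as `$1,237,262,000`, but only after the exposure amounts. A consumer that reads `exposureAmounts` directly never sees the mitigation figure at all.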

### Tests (`node --test`, matching kg-phase13 style)
- `test/sdk/kg-phase7-risk-parser.test.js` — unit tests on `buildRiskBlocksFromJson` for: legacy `risk_categories` numeric, `categories` alias, current `exposure_by_category` + string exposures, exposure-before-mitigation ordering, malformed JSON → `[]`.

## Audit of the plan
| Dimension | Finding |
|---|---|
| **Blast radius** | Confined to observability/storage: `reports` table + KG `risk`/`closing_condition` nodes. Generative pipeline reads `risk-summary.json` from **disk** (`_promptConstants.js:3066`), unaffected. `documentConverter` already excludes `risk-summary.json` (no garbage DOCX). |
| **Best practices** | Allowlist (not blanket `.json`) avoids polluting `reports`. Parser extracted to a pure, testable function. Both persist paths fail-soft (`hookDBBridge.js:6`). |
| **Modularity** | One pure function (`buildRiskBlocksFromJson`) carries the JSON parsing logic; one config constant (`JSON_REPORT_FILENAMES`) is shared by the live hook and the backfill. No new tables, no schema migration. |
| **Seamless integration** | Legacy numeric schema preserved (refactor is byte-equivalent); new schema added as fallthrough. Existing 23 kg-phase13 tests must stay green. |
| **Anti-recurrence** | Producer↔consumer drift is the root cause → add the parser unit test built from the real `risk-summary.json` shape as the contract anchor. |

## Remediation for affected sessions
After the fix lands, for each affected session: ingest `risk-summary.json`, re-run the KG build (upsert), and reapply embeddings. Start with 2026-06-16, then 2026-06-08 and 2026-05-27.