29 changes: 29 additions & 0 deletions super-legal-mcp-refactored/CHANGELOG.md
@@ -4,6 +4,35 @@ All notable changes to the Super Legal MCP Server are documented in this file.

## [Unreleased]

### Changed — Avenue B Phase 1: `full-deal-workbook` sensitivity isolation (Issue #100, PR forthcoming)

The `full-deal-workbook` template's phase split has been rebalanced. Phase count is **unchanged at 5** — only the sheet routing within `phase4` and `phase5` changes:

| | Pre-Avenue-B (5 phases) | Post-Avenue-B Phase 1 (5 phases) |
|---|---|---|
| `phase4` | `['sensitivity', 'risk_register']` (220s est.) | **`['sensitivity']`** alone (180s est.) — matches `valuation-only.js` precedent |
| `phase5` | `['cover', 'exec_summary']` (130s est.) | **`['cover', 'exec_summary', 'risk_register']`** (160s est.) |
| All other phases (phase1, phase2, phase3) | unchanged | unchanged |
| Total sheets | 9 | 9 (unchanged) |
| Wall time | ~220s (gated by phase3 LBO) | ~220s (gated by phase3 LBO — **unchanged**) |
| `XLSX_PHASE_CONCURRENCY` budget | 5 | 5 (unchanged) |

**Why**: openpyxl's data-table API (used by `sensitivity` for 2D heatmaps with `formula1`/`formula2` cell-array constructs) is one of the hardest tasks for the model to get right. The pre-Avenue-B `phase4` combined it with `risk_register` (a narrative table), putting two unrelated schemas in one container. Avenue B Phase 1 isolates sensitivity into its own phase to give the model focused context, matching the precedent in `valuation-only.js` (which already runs `phase3: { sheets: ['sensitivity'], estimated_seconds: 180 }` alone).

**Failure-rate-focused win, not wall-time-focused.** Phase3 (LBO, 220s) remains the wall-time gate, so phase4 finishing earlier does not shorten the overall render. The expected benefit is a **lower failure rate** in the sensitivity sheet, to be validated empirically against ≥30 days of post-flag-flip Prometheus data comparing `claude_xlsx_render_phase_failures_total{phase="phase4",template_id="full-deal-workbook"}` to its pre-Avenue-B baseline.

**Operator note — Prometheus history semantic shift**: per-phase metrics for `phase4` and `phase5` continue emitting on the same time-series, but the sheet content behind those labels shifts at the deploy boundary. Operators reading historical phase4 P95 duration will see a step-down (~220s → ~180s), and historical phase5 P95 will show a step-up (~130s → ~160s). The total render wall time is unchanged.

**End-user note — workbook tab order shifts**: the rebalance changes the tab order of the produced `.xlsx`. Pre-Avenue-B: `assumptions, sources, dcf, comps, lbo, sensitivity, risk_register, cover, exec_summary` (risk_register at position 7). Post-Avenue-B Phase 1: `assumptions, sources, dcf, comps, lbo, sensitivity, cover, exec_summary, risk_register` (risk_register at position 9, the last tab). Clients whose bookmarks or cell references hardcode sheet *positions* should update them; this is rare, since most users reference sheets by name.
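
The position shift can be reproduced with a small sketch. It assumes tab order follows phase order and, within a phase, the declared sheet order. The `phase2` sheet list is inferred from the documented tab order rather than shown in this diff, and `tabOrder` / `tabPosition` are hypothetical helper names, not functions from the codebase.

```js
// Sketch only. Assumption: tabs are emitted phase by phase, in declared order.
// phase2's sheets are inferred from the documented tab order.
const prePhaseSplit = {
  phase1: ['assumptions', 'sources'],
  phase2: ['dcf', 'comps'],
  phase3: ['lbo'],
  phase4: ['sensitivity', 'risk_register'],
  phase5: ['cover', 'exec_summary'],
};
const postPhaseSplit = {
  ...prePhaseSplit,
  phase4: ['sensitivity'],
  phase5: ['cover', 'exec_summary', 'risk_register'],
};

// Flatten phase1..phase5 into a single list of tab names.
function tabOrder(phaseSplit) {
  return Object.keys(phaseSplit)
    .sort()
    .flatMap((phase) => phaseSplit[phase]);
}

// 1-based tab position of a sheet, or -1 if the sheet is absent.
function tabPosition(phaseSplit, sheetName) {
  const index = tabOrder(phaseSplit).indexOf(sheetName);
  return index === -1 ? -1 : index + 1;
}

console.log(tabPosition(prePhaseSplit, 'risk_register'));  // 7
console.log(tabPosition(postPhaseSplit, 'risk_register')); // 9
```

Consumers that look sheets up by name see no difference between the two shapes; only index-based lookups are affected.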

**L4 live-render evidence** (session `2026-05-15-9600101`, 2026-05-15): success=true, audit_status=PASS, 5/5 phases PASS, 9 sheets, 58 user-defined named ranges migrated, phase3 (LBO) audit shows 306 formulas / 0 errors / recalc PASS. Avenue B Phase 1 rebalance produces a correct end-to-end workbook.

**Files**: `src/config/xlsxTemplates/full-deal-workbook.js` (phaseSplit + comment cleanup), `test/sdk/xlsx-renderer-integration.test.js` (new T30 — 12 static-template assertions), `docs/pending-updates/excel-code-execution-phase9-plan.md` (§13 addendum), `docs/pending-updates/excel-code-execution-gate2-analysis.md` (append note). **No schema, no migration, no skill changes, no frontend changes, no API contract shift.**

**Avenue B Phase 2 (LBO sheet decomposition) deferred** — wall-time win requires splitting phase3 `['lbo']` (one sheet) across containers; architecturally larger; data-gated by ≥30 days of post-flag-flip metrics confirming phase3 is the dominant remaining bottleneck.

**Test suite**: 185 → 197 (185 baseline + 12 new from T30).

### ⚠️ Changed (BREAKING) — `POST /api/render-workbook/:sessionId` is now async-202 (Issue #88, PR [#133](https://github.com/Number531/Legal-API/pull/133))

The manual XLSX render endpoint previously returned `HTTP 200` with the full sync envelope `{ success, xlsxPath, auditResults, artifactId, durationMs }`, holding the request thread up to `OVERALL_TIMEOUT_MS` (1200s) — which caused client-side timeouts (browser ~5min, undici 30s, CI/CD 60–300s), proxy idle-timeouts (Cloud Run / nginx), and concurrency-cap connection hold.
@@ -85,3 +85,17 @@ the failure class is closed with stronger evidence than an intermittent live rep
Task #88 — ✅ **SHIPPED** on `feature/xlsx-renderer-88-async-202`. The endpoint returns 202 + render_id envelope, dispatches via setImmediate, transitions `xlsx_renders.render_status` `'pending' → 'running' → 'completed'|'failed'`. Caller subscribes to `/api/stream?sessionId=…` (existing SSE channel) or polls `GET /api/render-workbook/:renderId/status`. Full contract in `docs/api-reference.md` → "Document Generation — Workbook Rendering". Refinement #3 (Idempotency-Key) deferred — zero precedent in codebase.

`XLSX_RENDERER` stays `false` in production until a deploy decision.
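
For callers migrating from the sync envelope, a polling client might look like the sketch below. The status route and the `render_status` values come from the contract above. The helper name `waitForRender`, the injectable `fetchImpl` option, and the assumption that the status response exposes `render_status` as a top-level JSON field are illustrative only; `docs/api-reference.md` remains the authority on the response shape.

```js
// Hypothetical client helper (not part of the codebase).
// Assumption: GET .../status returns JSON with a top-level `render_status`.
async function waitForRender(baseUrl, renderId, options = {}) {
  const { fetchImpl = fetch, intervalMs = 2000, maxAttempts = 600 } = options;
  const url = `${baseUrl}/api/render-workbook/${encodeURIComponent(renderId)}/status`;

  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const res = await fetchImpl(url);
    if (!res.ok) {
      throw new Error(`status check failed: HTTP ${res.status}`);
    }
    const body = await res.json();
    // 'completed' and 'failed' are the two terminal states.
    if (body.render_status === 'completed' || body.render_status === 'failed') {
      return body;
    }
    // Still 'pending' or 'running': wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`render ${renderId} did not finish after ${maxAttempts} polls`);
}
```

Subscribing to the existing SSE channel avoids the polling loop entirely; the sketch only covers the status-endpoint path.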

---

## Post-Gate-2 follow-up — Avenue B Phase 1 (Issue #100, shipped 2026-05-15)

Avenue B Phase 1 ships sensitivity isolation in `full-deal-workbook` via sheet redistribution within the existing 5-phase shape. **No phase-count change** (still 5 phases); **same parallel fan-out** via Gate 2's `Promise.allSettled`; **same `XLSX_PHASE_CONCURRENCY=5` budget**.

Concretely: `phase4` was `['sensitivity', 'risk_register']` (220s); now `['sensitivity']` alone (180s) matching `valuation-only.js`'s precedent. `risk_register` moves to `phase5` alongside `cover + exec_summary` (the "summary/narrative wrap-up" phase pattern shared across all 4 multi-turn templates).

**Failure-rate-focused win, not wall-time-focused.** Phase3 (LBO, 220s) remains the wall-time gate. Avenue B Phase 1's structural prediction (smaller per-phase LLM context → fewer model errors) is to be validated empirically by ≥30 days of post-flag-flip `claude_xlsx_render_phase_failures_total{phase="phase4",template_id="full-deal-workbook"}` Prometheus data.
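
One way to frame that comparison is a pair of PromQL expressions over adjacent windows. The sketch below only builds query strings. It assumes the queries are evaluated once the post-change window has fully elapsed and that both windows actually contain renders. `failureIncreaseQuery` is a hypothetical helper name. Raw failure counts would still need normalizing by render volume, and the counter for that is not named in this document.

```js
// Sketch only: builds PromQL strings; it does not talk to Prometheus.
const metric = 'claude_xlsx_render_phase_failures_total';
const selector = '{phase="phase4",template_id="full-deal-workbook"}';

// Failures accumulated over `windowDays`, optionally shifted back by `offsetDays`.
function failureIncreaseQuery(windowDays, offsetDays = 0) {
  const offset = offsetDays > 0 ? ` offset ${offsetDays}d` : '';
  return `sum(increase(${metric}${selector}[${windowDays}d]${offset}))`;
}

const postChange = failureIncreaseQuery(30);     // most recent 30 days
const preChange = failureIncreaseQuery(30, 30);  // the 30 days before that

console.log(postChange);
console.log(preChange);
```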

Avenue B Phase 2 (LBO sheet decomposition — splitting `phase3 = ['lbo']` across containers) deferred — architecturally larger; data-gated.

See: `docs/pending-updates/excel-code-execution-phase9-plan.md` §13 for the full rebalance specification; `CHANGELOG.md` `[Unreleased]` for the deploy-time operator note (per-phase Prometheus history semantic shift at the deploy boundary).
@@ -558,3 +558,68 @@ These are deferred deliberately — Phase 9 addresses the template-budget gap ex
---

**End of Phase 9 plan.** Total scope: ~770 LOC + plan doc + ~$10 in live sandbox testing. Effort: 1.5 days. Single atomic commit per operator preference. Awaiting approval to implement.

---

## §13 — Avenue B Phase 1 (Issue #100, shipped 2026-05-15)

Sensitivity isolation in `full-deal-workbook` via sheet redistribution within the existing 5-phase shape. **No phase-count change** (still 5 phases); preserves `XLSX_PHASE_CONCURRENCY=5` budget.

### Pre-Avenue-B shape (Phase 9.6 original)

```js
phase4: { sheets: ['sensitivity', 'risk_register'], estimated_seconds: 220 }
phase5: { sheets: ['cover', 'exec_summary'], estimated_seconds: 130 }
```

### Post-Avenue-B Phase 1 shape

```js
phase4: { sheets: ['sensitivity'], estimated_seconds: 180 }
phase5: { sheets: ['cover', 'exec_summary', 'risk_register'], estimated_seconds: 160 }
```

### Rationale

`valuation-only.js:75-79` already runs `phase3: { sheets: ['sensitivity'], estimated_seconds: 180 }` alone — the architecturally-validated pattern for the openpyxl data-table API (`formula1`/`formula2` cell-array constructs for 2D sensitivity heatmaps). The pre-Avenue-B `phase4` combined this with `risk_register` (a narrative table), forcing the model to juggle two schemas in one container. Avenue B Phase 1 brings `full-deal-workbook` into alignment with the valuation-only precedent.

`risk_register` moves to `phase5` (joining `cover + exec_summary`) — the "summary/narrative wrap-up" phase pattern shared across all 4 multi-turn templates' final phases.
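
As a cross-check on the two shapes above, a short sketch can compute which sheets changed owner between them. `sheetOwners` and `movedSheets` are hypothetical helper names used for illustration; only `phase4` and `phase5` are included because those are the phases shown in this section.

```js
// Sketch only: the two shapes are copied from the pre/post blocks above.
const pre = {
  phase4: { sheets: ['sensitivity', 'risk_register'], estimated_seconds: 220 },
  phase5: { sheets: ['cover', 'exec_summary'], estimated_seconds: 130 },
};
const post = {
  phase4: { sheets: ['sensitivity'], estimated_seconds: 180 },
  phase5: { sheets: ['cover', 'exec_summary', 'risk_register'], estimated_seconds: 160 },
};

// Map each sheet name to the phase that renders it.
function sheetOwners(phaseSplit) {
  const owners = new Map();
  for (const [phase, { sheets }] of Object.entries(phaseSplit)) {
    for (const sheet of sheets) owners.set(sheet, phase);
  }
  return owners;
}

// List sheets whose owning phase differs between the two shapes.
function movedSheets(before, after) {
  const afterOwners = sheetOwners(after);
  const moves = [];
  for (const [sheet, from] of sheetOwners(before)) {
    const to = afterOwners.get(sheet);
    if (to !== from) moves.push({ sheet, from, to });
  }
  return moves;
}

console.log(movedSheets(pre, post));
// [ { sheet: 'risk_register', from: 'phase4', to: 'phase5' } ]
```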

### What Avenue B Phase 1 IS and IS NOT

| | |
|---|---|
| **IS** | Failure-rate-focused win (smaller per-phase LLM context → less for the model to track → fewer formula errors in the sensitivity sheet) |
| **IS NOT** | Wall-time-focused (phase3 LBO at 220s still gates; phase4's earlier completion is moot at the wall-time level) |
| **IS NOT** | An LBO sheet decomposition (Avenue B Phase 2 — deferred; data-gated by ≥30 days of post-flag-flip `claude_xlsx_render_phase_duration_seconds_bucket{phase="phase3"}` metrics) |

### Validation

- Unit suite: 185 → **197** (185 baseline + 12 new from `testT30_FullDealWorkbookSensitivityIsolation`)
- L1 pre-flight baseline (`main` HEAD `a1e5fd45`): 185/0/2 ✓
- L2 smoke (syntax + Zod template-config inspection): clean, prints expected 5-phase shape ✓
- L3 integration (full suite, post-edit): 197/0/2 ✓ — T17 dispatches all 5 phases dynamically, T25 named-range invariant preserved, T26 generated-columns invariant preserved, T27-T29 async-202 endpoint unaffected
- L4 live efficacy (real Anthropic sandbox): single render to confirm correctness; failure-rate-reduction efficacy claim is **structural** (validated by Avenue D's revert + valuation-only.js precedent), to be empirically confirmed post-flag-flip

### Operator note — Prometheus history semantic shift

Per-phase metrics for `phase4` and `phase5` continue emitting on the same time-series, but the sheet content behind those labels shifts at the deploy boundary:
- `phase4`: ~220s P95 (sensitivity+risk_register) → ~180s P95 (sensitivity only) — **step-down**
- `phase5`: ~130s P95 (cover+exec_summary) → ~160s P95 (+risk_register) — **step-up**
- Total wall time: unchanged (gated by phase3 LBO)

Documented in `CHANGELOG.md` `[Unreleased]`.
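
Because the phases run in parallel, wall time is the maximum phase duration rather than the sum, which is why the per-phase shifts leave the total unchanged. The sketch below uses `estimated_seconds` values as stand-ins for observed durations. The `phase1` and `phase2` estimates are not shown in this diff, so they are omitted on the stated premise that phase3 gates. `wallTime` and `gatingPhases` are hypothetical helper names.

```js
// Sketch only: estimated_seconds stand in for observed P95 durations.
// phase1/phase2 are omitted (their estimates are not shown here); the plan
// states that phase3 gates, so neither can exceed 220s.
const preSeconds = { phase3: 220, phase4: 220, phase5: 130 };
const postSeconds = { phase3: 220, phase4: 180, phase5: 160 };

// Parallel fan-out: wall time is the slowest phase.
function wallTime(seconds) {
  return Math.max(...Object.values(seconds));
}

// Phases whose duration equals the wall time.
function gatingPhases(seconds) {
  const max = wallTime(seconds);
  return Object.keys(seconds).filter((phase) => seconds[phase] === max);
}

console.log(wallTime(preSeconds), gatingPhases(preSeconds));   // 220 [ 'phase3', 'phase4' ]
console.log(wallTime(postSeconds), gatingPhases(postSeconds)); // 220 [ 'phase3' ]
```

On estimates alone, the pre-rebalance shape had phase3 and phase4 tied at the gate; after the rebalance phase3 gates alone.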

### Files changed

- `src/config/xlsxTemplates/full-deal-workbook.js` — phaseSplit sheet redistribution + comment cleanup (Phase 9.6 → Phase 9.6 + Avenue B Phase 1; "sequential" → "parallel" reflecting post-Gate-2 reality)
- `test/sdk/xlsx-renderer-integration.test.js` — new `testT30_FullDealWorkbookSensitivityIsolation` (12 assertions)
- `CHANGELOG.md` — `[Unreleased]` entry under "Changed" with operator notes
- `docs/pending-updates/excel-code-execution-gate2-analysis.md` — post-Gate-2 follow-up note
- `docs/pending-updates/excel-code-execution-phase9-plan.md` — this §13 addendum

**No** schema changes, **no** migration, **no** orchestrator/PHASE_ORDER changes, **no** skill changes, **no** frontend changes, **no** alert.yml changes.

### Avenue B Phase 2 (deferred)

LBO sheet decomposition — splitting `phase3 = ['lbo']` across containers to attack the 220s wall-time bottleneck. Architecturally larger (cross-phase data passing for capital structure → debt schedule → returns waterfall references). Data-gated: revisit after ≥30 days of post-flag-flip Prometheus history confirms phase3 is the dominant remaining bottleneck AND identifies a clean intra-sheet split point.
@@ -57,11 +57,22 @@ export const def = {
   citationDiscipline: XLSX_TEMPLATE_BASE.SOURCES_SHEET_SPEC,
   cellColoring: XLSX_TEMPLATE_BASE.CELL_COLORING,

-  // Phase 9.6: 9-sheet template split across 5 sequential 5-min containers.
-  // Largest template — phase 9.5 retest had phase2 (dcf + comps + lbo +
-  // sensitivity) timing out at 300s after Turn 2 retry attempt. Splitting
-  // into smaller phases gives each financial model its own container.
-  // 5 phases × 5 min = 25 min worst-case wall, ~$0.75 sandbox cost.
+  // Phase 9.6: 9-sheet template split across 5 PARALLEL containers
+  // (Gate 2 parallelized the fan-out, commit b8baddfe). Largest template —
+  // phase 9.5 retest had phase2 (dcf + comps + lbo + sensitivity) timing out
+  // at 300s after Turn 2 retry attempt. Splitting into smaller phases gives
+  // each financial model its own container.
+  // Wall time = max(phase_durations) ≈ 220s (gated by phase3 LBO);
+  // ~$0.75 sandbox cost (5 parallel containers × 5-min budget cap per Anthropic).
+  //
+  // Avenue B Phase 1 (Issue #100, 2026-05-15): sensitivity isolated into its
+  // own phase4 to match valuation-only.js's precedent (one of the trickiest
+  // model tasks — openpyxl data-table API with formula1/formula2 cell-array
+  // constructs — gets a focused context window). risk_register joins phase5
+  // alongside cover + exec_summary, aligning with the "summary/narrative
+  // wrap-up" pattern seen in all 4 multi-turn templates' final phases.
+  // Phase count UNCHANGED at 5; XLSX_PHASE_CONCURRENCY=5 budget preserved.
+  // Failure-rate-focused win, not wall-time-focused (phase3 LBO still gates).
   phaseSplit: {
     phase1: {
       sheets: ['assumptions', 'sources'],
@@ -79,14 +90,14 @@ export const def = {
       estimated_seconds: 220,
     },
     phase4: {
-      sheets: ['sensitivity', 'risk_register'],
-      label: '2D sensitivity heatmaps + risk register',
-      estimated_seconds: 220,
+      sheets: ['sensitivity'],
+      label: '2D sensitivity heatmaps (matches valuation-only.js precedent — sensitivity alone)',
+      estimated_seconds: 180,
     },
     phase5: {
-      sheets: ['cover', 'exec_summary'],
-      label: 'Cover + executive summary + comprehensive audit',
-      estimated_seconds: 130,
+      sheets: ['cover', 'exec_summary', 'risk_register'],
+      label: 'Cover + executive summary + risk register + comprehensive audit',
+      estimated_seconds: 160,
     },
   },
 };
@@ -1604,6 +1604,74 @@ async function testT29_RenderForSessionAdoptsPreCreatedRow() {
await seeded.cleanup();
}

async function testT30_FullDealWorkbookSensitivityIsolation() {
  console.log('\n─── T30: Issue #100 Avenue B Phase 1 — full-deal-workbook sensitivity isolation ───\n');
  // Static template validation — no DB, no live render. Asserts that the
  // Avenue B Phase 1 rebalance landed correctly:
  //   - 5 phases declared (unchanged from pre-rebalance)
  //   - phase4 contains sensitivity ALONE (matches valuation-only.js precedent)
  //   - phase5 contains cover + exec_summary + risk_register (summary phase)
  //   - all 9 sheets covered exactly once (no missing, no duplicates)
  //   - all estimated_seconds in [60, 300] range (sanity bounds)
  const { XLSX_TEMPLATES } = await import('../../src/config/xlsxTemplates/index.js');
  const t = XLSX_TEMPLATES['full-deal-workbook'];

  // 1. phaseSplit declared
  assert(t.phaseSplit !== undefined, `T30: full-deal-workbook has phaseSplit declared`);

  // 2. Exactly 5 phases (Avenue B Phase 1 preserves phase count to stay within
  //    XLSX_PHASE_CONCURRENCY=5 budget — see multiTurnOrchestrator.js:86)
  const phaseKeys = Object.keys(t.phaseSplit).filter((k) => k.startsWith('phase')).sort();
  assert(
    phaseKeys.length === 5,
    `T30 (#100): exactly 5 phases declared (got ${phaseKeys.length}: ${phaseKeys.join(',')})`,
  );

  // 3. Sequence is phase1..phase5 (no non-numeric stragglers like phase3b)
  assert(
    phaseKeys.join(',') === 'phase1,phase2,phase3,phase4,phase5',
    `T30 (#100): phase sequence is phase1..phase5 (got ${phaseKeys.join(',')})`,
  );

  // 4. phase4 = sensitivity ALONE (the core rebalance — matches valuation-only.js)
  assert(
    JSON.stringify(t.phaseSplit.phase4.sheets) === JSON.stringify(['sensitivity']),
    `T30 (#100): phase4 is sensitivity-only — matches valuation-only.js precedent (got ${JSON.stringify(t.phaseSplit.phase4.sheets)})`,
  );

  // 5. phase5 contains cover + exec_summary + risk_register (summary/wrap-up phase)
  const phase5Sheets = [...t.phaseSplit.phase5.sheets].sort();
  assert(
    JSON.stringify(phase5Sheets) === JSON.stringify(['cover', 'exec_summary', 'risk_register']),
    `T30 (#100): phase5 contains cover + exec_summary + risk_register (got ${JSON.stringify(phase5Sheets)})`,
  );

  // 6. All 9 sheets covered exactly once. Pre-rebalance sheet set must equal
  //    post-rebalance sheet set (no sheet added/removed in Avenue B Phase 1).
  const allAssigned = phaseKeys.flatMap((k) => t.phaseSplit[k].sheets);
  const uniqueAssigned = new Set(allAssigned);
  const expectedSheets = ['assumptions', 'sources', 'dcf', 'comps', 'lbo', 'sensitivity', 'risk_register', 'cover', 'exec_summary'];
  assert(
    allAssigned.length === 9 && uniqueAssigned.size === 9,
    `T30 (#100): all 9 sheets covered exactly once (got ${allAssigned.length} assignments, ${uniqueAssigned.size} unique)`,
  );
  const missing = expectedSheets.filter((s) => !uniqueAssigned.has(s));
  assert(
    missing.length === 0,
    `T30 (#100): no expected sheet missing (missing: ${JSON.stringify(missing)})`,
  );

  // 7. All estimated_seconds in [60, 300] range (sanity bounds — catches typos
  //    like an accidental 1800)
  for (const phaseKey of phaseKeys) {
    const sec = t.phaseSplit[phaseKey].estimated_seconds;
    assert(
      typeof sec === 'number' && sec >= 60 && sec <= 300,
      `T30 (#100): ${phaseKey}.estimated_seconds in [60, 300] (got ${sec})`,
    );
  }
}

async function testT24_CitationSeededSpec() {
console.log('\n─── T24: seedTestSession citation opts → matcher-ready spec (Issue 2) ───\n');
const { dbAvailable, seedTestSession, SAMPLE_CITATION_SEED, buildMinimalWorkbook } =
@@ -1690,6 +1758,7 @@ async function main() {
await testT27_AsyncEndpointPendingInsert();
await testT28_StatusEndpointQueryShape();
await testT29_RenderForSessionAdoptsPreCreatedRow();
await testT30_FullDealWorkbookSensitivityIsolation();

console.log('\n=== Summary ===');
console.log(`${PASS} Passed: ${passed}`);