Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions super-legal-mcp-refactored/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,21 @@ All notable changes to the Super Legal MCP Server are documented in this file.

## [Unreleased]

### Documented — Skills' parallel code-execution path: Avenue A v2 deliberately NOT applied (PR forthcoming)

Closes the third explicit defer from PR #135 (Avenue A v2). Explore-agent investigation traced `src/server/legacyStreamHandler.js:78` (which registers `code_execution_20250825` directly via Anthropic SDK tool registration when `SKILLS_ENABLED=true`) and reached a **DOCUMENT-NO-CHANGE** verdict.

**Why no code change**:
- Skills has NO envelope contract — output is free-form text + tool-result pass-through to frontend
- `legacyStreamHandler.js` has NO retry logic, NO envelope validation, NO downstream `parsed_output` consumer
- Adding `output_config` would CONFLICT with the existing `outputFormatSchema` logic (both set `format`)
- Skills is in deprecation track: `SKILLS_ENABLED=false` (default), `USE_AGENT_SDK=true` (default — `legacyStreamHandler` isn't even invoked in prod), xlsx skill content already absorbed into bridge (commit `12000487`)
- If forced, structured-output enforcement on Skills would re-introduce the L4 v1 b64-in-text truncation failure mode with NO secondary fallback channel (Skills has no stdout-equivalent like xlsx's bridge path)

**Full investigation + architectural posture documented** in `docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md` §12 — includes the bridge-vs-Skills output-contract comparison table for future reference.

**Bottom line**: bridge is the canonical structured-output surface; Skills is deprecated pass-through. Avenue A v2 applies only to the bridge.

### Added — Avenue A v2: Structured-output enforcement on code-execution bridge (PR forthcoming)

Eliminates the "missing envelope on turn 1 → corrective retry on turn 2" pattern observed in production renders (PR #134 L4 logs: 1-of-5 phases retried). Adds Anthropic's `output_config: { format: { type: 'json_schema', schema: {...} } }` enforcement on the model's text-block output, validated empirically against the `code_execution_20260120` tool + streaming + `pause_turn` continuations.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -641,3 +641,90 @@ sum by (envelope_source) (rate(claude_xlsx_render_turn1_envelope_success_total{s
```

Target after `STRUCTURED_OUTPUT_ENFORCEMENT=true` deployment: `structured_output="on"` rate ≥ `structured_output="off"` baseline rate (validates Avenue A v2 doesn't regress retry behavior).

---

## §12 — Skills' parallel code-execution path: why Avenue A v2 does NOT apply (2026-05-16 investigation)

This section closes the third explicit defer from PR #135 (Avenue A v2). An Explore-agent investigation traced Skills' code-execution registration path and reached a decisive **DOCUMENT-NO-CHANGE** verdict.

### The two parallel code-execution paths in the codebase

The codebase has **two independent surfaces** for code execution against Anthropic Messages API:

| Surface | File | Tool | Use case |
|---|---|---|---|
| **Bridge** (Avenue A v2's target) | `src/tools/codeExecutionBridge.js:30` | `code_execution_20260120` | Orchestrator-driven multi-turn renders (xlsx renderer phases, MCP `run_python_analysis`, Agent SDK subagents). Has envelope contract + retry logic + structured-output enforcement via `output_config`. |
| **Skills' native path** | `src/server/legacyStreamHandler.js:78` | `code_execution_20250825` | Pass-through streaming for Anthropic-managed Skills (pdf, xlsx, docx). NO envelope contract, NO retry logic, NO downstream parsing of structured output. |

### Why Skills doesn't have the failure mode Avenue A v2 fixed

Avenue A v2 targeted the bridge's "missing envelope on turn 1 → corrective retry on turn 2" pattern. That pattern exists in the bridge because:

1. The bridge has an **envelope contract** (`{b64_xlsx, audit_results, ...}` for xlsx; `{success, analysis, data, ...}` for general)
2. The bridge has **envelope validation** (`isCompleteXlsxB64()`, etc.)
3. The bridge has **retry logic** (`MAX_TURNS=3` loop with corrective prompts)

**Skills has NONE of these**:
- Skills output is **free-form text + tool-result pass-through** to the frontend streaming endpoint
- `legacyStreamHandler.js` does NOT validate envelope shape — text accumulates and streams to caller as-is
- No retry loop exists; if Skills' model output is malformed, the failure is terminal (caller sees raw text including any malformed JSON)

**Conclusion**: Skills doesn't exhibit the envelope-miss-retry pattern because **no envelope is expected** in the first place. There's nothing for Avenue A v2 to optimize.

### Why `output_config` enforcement would actively HARM Skills

Adding `output_config: { format: { type: 'json_schema', schema: ... } }` to the Skills' Messages API call (`legacyStreamHandler.js:108-115`) would:

1. **Conflict with the existing `outputFormatSchema` logic** at the same lines:
```js
output_config: {
effort: 'high',
...(outputFormatSchema ? { format: outputFormatSchema } : {}),
...(skills ? { format: someSkillsSchema } : {}) // ← DUPLICATE KEY: silently overwrites
}
```
JavaScript would use the last `format` key only — user-requested structured output and Skills enforcement cannot coexist.

2. **Produce `parsed_output` that no handler consumes**:
- The bridge reads `response.parsed_output` (codeExecutionBridge.js:`extractResults`)
- `legacyStreamHandler.js` does NOT read it — only streams `text` and `tool_use_result` blocks
- Structured output would be generated by the API + billed in tokens, but ignored

3. **Risk truncation cliff worse than xlsx's**:
- Skills outputs (PDF text extracts, document parses, image OCR) are often LARGE
- Skills has NO secondary fallback channel like stdout (the xlsx case)
- Forced text-channel JSON enforcement could hit `max_tokens` mid-output with no fallback path
- The Avenue A v2 L4 v1 failure mode (b64 in text → truncation → corrupt) would re-emerge for Skills with no architectural mitigation available

### Skills is in deprecation track

Four independent signals confirm Skills' deprecated status:

1. **Default flag state**: `SKILLS_ENABLED=false` in `flags.env:62` — production never activates the path
2. **Architectural pivot**: commit `12000487` ("Step 3 — absorb Anthropic xlsx Skill content into bridge prompt") deliberately moved xlsx-specific skill content INTO the bridge, replacing the Skills-API approach
3. **Code comment** (`src/config/featureFlags.js:14-17`): "Custom skills disabled: Non-SDK-compliant format with no-op code. Subagents handle all domain expertise. Managed skills (pdf, xlsx, docx) still work. Files preserved in /src/skills/customSkills/ for future SDK-compliant conversion."
4. **Route gating**: `featureFlags.USE_AGENT_SDK=true` (default, in `flags.env`) routes the main `/api/stream` endpoint to `handleAgentStream` instead of `handleLegacyStream` — even if `SKILLS_ENABLED` were true, the legacy handler wouldn't be invoked in production

### Recommendation

**DOCUMENT-NO-CHANGE**: do not apply Avenue A v2 to Skills' parallel path.

The right architectural posture is:
- **Bridge** (`code_execution_20260120`) is the canonical code-execution surface; `output_config` enforcement via Avenue A v2 lives there
- **Skills** (`code_execution_20250825`) is a deprecated pass-through path; no envelope enforcement needed because no envelope is expected
- **Future direction**: if custom skills become SDK-compliant, re-evaluate under Programmatic Tool Calling (PTC) using `code_execution_20260120` + `allowed_callers` — at which point Avenue A v2-style enforcement could be applied at that distinct surface

### Output contract comparison (canonical reference)

| Aspect | Bridge | Skills |
|---|---|---|
| Envelope contract | `ENVELOPE_SCHEMA_XLSX` / `ENVELOPE_SCHEMA_GENERAL` | None |
| Validation | `selectEnvelopeWithFallback`, `isCompleteXlsxB64` | None — pass-through |
| Retry logic | `MAX_TURNS=3` corrective loop | None (terminal failure) |
| Structured-output enforcement (Avenue A v2) | YES — `output_config` with json_schema | N/A — no contract to enforce |
| Tool type | `code_execution_20260120` | `code_execution_20250825` |
| Default state in prod | Active (`CODE_EXECUTION_BRIDGE=true`) | Inactive (`SKILLS_ENABLED=false` + `USE_AGENT_SDK=true`) |
| Reads `response.parsed_output` | YES (Avenue A v2 path) | NO |

**For structured, multi-turn analysis with envelope validation → use the bridge.** Skills are appropriate only for stateless, single-turn pass-through operations against Anthropic-managed domain expertise (pdf, xlsx, docx).