From 2ce981cd0cbfecd2030c3c63fba1240b9728f916 Mon Sep 17 00:00:00 2001 From: Number531 <120485065+Number531@users.noreply.github.com> Date: Sat, 16 May 2026 00:15:19 -0400 Subject: [PATCH] docs: Skills' parallel code-execution path posture (Avenue A v2 deferral closure) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the third explicit defer from PR #135 (Avenue A v2). An Explore-agent investigation traced Skills' code-execution registration path at src/server/legacyStreamHandler.js:78 and reached a decisive DOCUMENT-NO-CHANGE verdict: Avenue A v2 deliberately does NOT apply to Skills. ═══════════════════════════════════════════════════════════════════════ RATIONALE ═══════════════════════════════════════════════════════════════════════ The codebase has TWO independent code-execution surfaces against Anthropic Messages API: Bridge (src/tools/codeExecutionBridge.js:30) ─────────────────────────────────────────────────── Tool: code_execution_20260120 Pattern: Orchestrator-driven multi-turn with envelope contract Validate: isCompleteXlsxB64, selectEnvelopeWithFallback Retry: MAX_TURNS=3 corrective loop Avenue A v2: APPLIED (output_config + json_schema enforcement) Skills (native) (src/server/legacyStreamHandler.js:78) ─────────────────────────────────────────────────────── Tool: code_execution_20250825 Pattern: Single-turn streaming pass-through to frontend Validate: None — free-form text + tool_use_result blocks Retry: None (terminal failure on malformed output) Avenue A v2: NOT APPLICABLE Skills doesn't exhibit the envelope-miss failure mode Avenue A v2 fixed because Skills HAS NO ENVELOPE CONTRACT to enforce. Skills output is free-form text accumulated and streamed to caller; there's no schema, no validation, no retry. ═══════════════════════════════════════════════════════════════════════ WHY output_config ENFORCEMENT WOULD ACTIVELY HARM SKILLS ═══════════════════════════════════════════════════════════════════════ 1. CONFLICT with existing outputFormatSchema logic at lines 112-115: output_config: { effort: 'high', ...(outputFormatSchema ? {format}: {}) } Adding ...(skills ? {format: skillsSchema} : {}) creates duplicate `format` key — JS uses the LAST one, silently overwriting user-requested structured output enforcement. 2. NO downstream parsed_output consumer: - Bridge reads response.parsed_output (codeExecutionBridge.js:extractResults) - legacyStreamHandler streams text + tool_use_result only — never inspects parsed_output - Structured output would be generated + billed in tokens but ignored 3. TRUNCATION RISK worse than xlsx's: - Skills outputs (PDF extracts, document parses, OCR) are often LARGE - Skills has NO secondary fallback channel (xlsx had stdout to fall back on) - Forced text-channel JSON enforcement could hit max_tokens cliff with NO architectural mitigation — re-introduces the L4 v1 failure mode ═══════════════════════════════════════════════════════════════════════ SKILLS IS ARCHITECTURAL DEAD CODE IN PROD ═══════════════════════════════════════════════════════════════════════ Four independent signals: 1. SKILLS_ENABLED=false in flags.env:62 (production default) 2. USE_AGENT_SDK=true in flags.env (production default) — routes /api/stream to handleAgentStream, NOT handleLegacyStream. Skills path never invoked in production even hypothetically. 3. Architectural pivot commit 12000487: "Step 3 — absorb Anthropic xlsx Skill content into bridge prompt" — deliberately moved xlsx skill functionality INTO the bridge, replacing Skills-API approach 4. Code comment in featureFlags.js:14-17 explicitly states "Custom skills disabled: Non-SDK-compliant format with no-op code" ═══════════════════════════════════════════════════════════════════════ DELIVERABLE ═══════════════════════════════════════════════════════════════════════ This PR is documentation-only: docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md +§12 "Skills' parallel code-execution path: why Avenue A v2 does NOT apply (2026-05-16 investigation)" — includes bridge-vs-Skills output- contract comparison table for future reference CHANGELOG.md [Unreleased] entry under "Documented" — closes follow-up #3 with the architectural posture rationale ═══════════════════════════════════════════════════════════════════════ FUTURE DIRECTION ═══════════════════════════════════════════════════════════════════════ If custom skills become SDK-compliant in the future, re-evaluate under Programmatic Tool Calling (PTC) using code_execution_20260120 + allowed_callers. At that point, Avenue A v2-style enforcement could be applied at that distinct surface. Until then, Skills remains a deprecated pass-through; bridge remains the canonical structured-output surface. ═══════════════════════════════════════════════════════════════════════ STACKED ON PR #136 ═══════════════════════════════════════════════════════════════════════ This branch is stacked on feature/codeexec-avenue-a-v2-telemetry-docs (PR #136) so the §12 addition cleanly follows the §11 (empirical constraints) content added there. When PR #136 merges to main, this PR will need to be rebased onto main; the rebase is conflict-free since §12 is purely additive to the end of the doc file. VERIFICATION: Suite: 202/0/2 (PR #136's baseline maintained; no code change in this PR) ROLLBACK: trivial — `git revert ` removes the doc additions. No code, schema, or behavior implications. Plan: /Users/ej/.claude/plans/glittery-toasting-stardust.md Co-Authored-By: Claude Opus 4.7 (1M context) --- super-legal-mcp-refactored/CHANGELOG.md | 15 ++++ .../anthropic-sdk-best-practices-research.md | 87 +++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/super-legal-mcp-refactored/CHANGELOG.md b/super-legal-mcp-refactored/CHANGELOG.md index 1e53d4d74..12c063993 100644 --- a/super-legal-mcp-refactored/CHANGELOG.md +++ b/super-legal-mcp-refactored/CHANGELOG.md @@ -4,6 +4,21 @@ All notable changes to the Super Legal MCP Server are documented in this file. ## [Unreleased] +### Documented — Skills' parallel code-execution path: Avenue A v2 deliberately NOT applied (PR forthcoming) + +Closes the third explicit defer from PR #135 (Avenue A v2). Explore-agent investigation traced `src/server/legacyStreamHandler.js:78` (which registers `code_execution_20250825` directly via Anthropic SDK tool registration when `SKILLS_ENABLED=true`) and reached a **DOCUMENT-NO-CHANGE** verdict. + +**Why no code change**: +- Skills has NO envelope contract — output is free-form text + tool-result pass-through to frontend +- `legacyStreamHandler.js` has NO retry logic, NO envelope validation, NO downstream `parsed_output` consumer +- Adding `output_config` would CONFLICT with the existing `outputFormatSchema` logic (both set `format`) +- Skills is in deprecation track: `SKILLS_ENABLED=false` (default), `USE_AGENT_SDK=true` (default — `legacyStreamHandler` isn't even invoked in prod), xlsx skill content already absorbed into bridge (commit `12000487`) +- If forced, structured-output enforcement on Skills would re-introduce the L4 v1 b64-in-text truncation failure mode with NO secondary fallback channel (Skills has no stdout-equivalent like xlsx's bridge path) + +**Full investigation + architectural posture documented** in `docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md` §12 — includes the bridge-vs-Skills output-contract comparison table for future reference. + +**Bottom line**: bridge is the canonical structured-output surface; Skills is deprecated pass-through. Avenue A v2 applies only to the bridge. + ### Added — Avenue A v2: Structured-output enforcement on code-execution bridge (PR forthcoming) Eliminates the "missing envelope on turn 1 → corrective retry on turn 2" pattern observed in production renders (PR #134 L4 logs: 1-of-5 phases retried). Adds Anthropic's `output_config: { format: { type: 'json_schema', schema: {...} } }` enforcement on the model's text-block output, validated empirically against the `code_execution_20260120` tool + streaming + `pause_turn` continuations. diff --git a/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md b/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md index ccfad0b53..04afe2c57 100644 --- a/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md +++ b/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md @@ -641,3 +641,90 @@ sum by (envelope_source) (rate(claude_xlsx_render_turn1_envelope_success_total{s ``` Target after `STRUCTURED_OUTPUT_ENFORCEMENT=true` deployment: `structured_output="on"` rate ≥ `structured_output="off"` baseline rate (validates Avenue A v2 doesn't regress retry behavior). + +--- + +## §12 — Skills' parallel code-execution path: why Avenue A v2 does NOT apply (2026-05-16 investigation) + +This section closes the third explicit defer from PR #135 (Avenue A v2). An Explore-agent investigation traced Skills' code-execution registration path and reached a decisive **DOCUMENT-NO-CHANGE** verdict. + +### The two parallel code-execution paths in the codebase + +The codebase has **two independent surfaces** for code execution against Anthropic Messages API: + +| Surface | File | Tool | Use case | +|---|---|---|---| +| **Bridge** (Avenue A v2's target) | `src/tools/codeExecutionBridge.js:30` | `code_execution_20260120` | Orchestrator-driven multi-turn renders (xlsx renderer phases, MCP `run_python_analysis`, Agent SDK subagents). Has envelope contract + retry logic + structured-output enforcement via `output_config`. | +| **Skills' native path** | `src/server/legacyStreamHandler.js:78` | `code_execution_20250825` | Pass-through streaming for Anthropic-managed Skills (pdf, xlsx, docx). NO envelope contract, NO retry logic, NO downstream parsing of structured output. | + +### Why Skills doesn't have the failure mode Avenue A v2 fixed + +Avenue A v2 targeted the bridge's "missing envelope on turn 1 → corrective retry on turn 2" pattern. That pattern exists in the bridge because: + +1. The bridge has an **envelope contract** (`{b64_xlsx, audit_results, ...}` for xlsx; `{success, analysis, data, ...}` for general) +2. The bridge has **envelope validation** (`isCompleteXlsxB64()`, etc.) +3. The bridge has **retry logic** (`MAX_TURNS=3` loop with corrective prompts) + +**Skills has NONE of these**: +- Skills output is **free-form text + tool-result pass-through** to the frontend streaming endpoint +- `legacyStreamHandler.js` does NOT validate envelope shape — text accumulates and streams to caller as-is +- No retry loop exists; if Skills' model output is malformed, the failure is terminal (caller sees raw text including any malformed JSON) + +**Conclusion**: Skills doesn't exhibit the envelope-miss-retry pattern because **no envelope is expected** in the first place. There's nothing for Avenue A v2 to optimize. + +### Why `output_config` enforcement would actively HARM Skills + +Adding `output_config: { format: { type: 'json_schema', schema: ... } }` to the Skills' Messages API call (`legacyStreamHandler.js:108-115`) would: + +1. **Conflict with the existing `outputFormatSchema` logic** at the same lines: + ```js + output_config: { + effort: 'high', + ...(outputFormatSchema ? { format: outputFormatSchema } : {}), + ...(skills ? { format: someSkillsSchema } : {}) // ← DUPLICATE KEY: silently overwrites + } + ``` + JavaScript would use the last `format` key only — user-requested structured output and Skills enforcement cannot coexist. + +2. **Produce `parsed_output` that no handler consumes**: + - The bridge reads `response.parsed_output` (codeExecutionBridge.js:`extractResults`) + - `legacyStreamHandler.js` does NOT read it — only streams `text` and `tool_use_result` blocks + - Structured output would be generated by the API + billed in tokens, but ignored + +3. **Risk truncation cliff worse than xlsx's**: + - Skills outputs (PDF text extracts, document parses, image OCR) are often LARGE + - Skills has NO secondary fallback channel like stdout (the xlsx case) + - Forced text-channel JSON enforcement could hit `max_tokens` mid-output with no fallback path + - The Avenue A v2 L4 v1 failure mode (b64 in text → truncation → corrupt) would re-emerge for Skills with no architectural mitigation available + +### Skills is in deprecation track + +Four independent signals confirm Skills' deprecated status: + +1. **Default flag state**: `SKILLS_ENABLED=false` in `flags.env:62` — production never activates the path +2. **Architectural pivot**: commit `12000487` ("Step 3 — absorb Anthropic xlsx Skill content into bridge prompt") deliberately moved xlsx-specific skill content INTO the bridge, replacing the Skills-API approach +3. **Code comment** (`src/config/featureFlags.js:14-17`): "Custom skills disabled: Non-SDK-compliant format with no-op code. Subagents handle all domain expertise. Managed skills (pdf, xlsx, docx) still work. Files preserved in /src/skills/customSkills/ for future SDK-compliant conversion." +4. **Route gating**: `featureFlags.USE_AGENT_SDK=true` (default, in `flags.env`) routes the main `/api/stream` endpoint to `handleAgentStream` instead of `handleLegacyStream` — even if `SKILLS_ENABLED` were true, the legacy handler wouldn't be invoked in production + +### Recommendation + +**DOCUMENT-NO-CHANGE**: do not apply Avenue A v2 to Skills' parallel path. + +The right architectural posture is: +- **Bridge** (`code_execution_20260120`) is the canonical code-execution surface; `output_config` enforcement via Avenue A v2 lives there +- **Skills** (`code_execution_20250825`) is a deprecated pass-through path; no envelope enforcement needed because no envelope is expected +- **Future direction**: if custom skills become SDK-compliant, re-evaluate under Programmatic Tool Calling (PTC) using `code_execution_20260120` + `allowed_callers` — at which point Avenue A v2-style enforcement could be applied at that distinct surface + +### Output contract comparison (canonical reference) + +| Aspect | Bridge | Skills | +|---|---|---| +| Envelope contract | `ENVELOPE_SCHEMA_XLSX` / `ENVELOPE_SCHEMA_GENERAL` | None | +| Validation | `selectEnvelopeWithFallback`, `isCompleteXlsxB64` | None — pass-through | +| Retry logic | `MAX_TURNS=3` corrective loop | None (terminal failure) | +| Structured-output enforcement (Avenue A v2) | YES — `output_config` with json_schema | N/A — no contract to enforce | +| Tool type | `code_execution_20260120` | `code_execution_20250825` | +| Default state in prod | Active (`CODE_EXECUTION_BRIDGE=true`) | Inactive (`SKILLS_ENABLED=false` + `USE_AGENT_SDK=true`) | +| Reads `response.parsed_output` | YES (Avenue A v2 path) | NO | + +**For structured, multi-turn analysis with envelope validation → use the bridge.** Skills are appropriate only for stateless, single-turn pass-through operations against Anthropic-managed domain expertise (pdf, xlsx, docx).