From 2ce981cd0cbfecd2030c3c63fba1240b9728f916 Mon Sep 17 00:00:00 2001
From: Number531 <120485065+Number531@users.noreply.github.com>
Date: Sat, 16 May 2026 00:15:19 -0400
Subject: [PATCH] docs: Skills' parallel code-execution path posture (Avenue A
 v2 deferral closure)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the third explicit deferral from PR #135 (Avenue A v2). An Explore-agent
investigation traced Skills' code-execution registration path at
src/server/legacyStreamHandler.js:78 and reached a decisive DOCUMENT-NO-CHANGE
verdict: Avenue A v2 deliberately does NOT apply to Skills.

═══════════════════════════════════════════════════════════════════════
RATIONALE
═══════════════════════════════════════════════════════════════════════

The codebase has TWO independent code-execution surfaces against the Anthropic
Messages API:

  Bridge        (src/tools/codeExecutionBridge.js:30)
  ───────────────────────────────────────────────────
    Tool:      code_execution_20260120
    Pattern:   Orchestrator-driven multi-turn with envelope contract
    Validate:  isCompleteXlsxB64, selectEnvelopeWithFallback
    Retry:     MAX_TURNS=3 corrective loop
    Avenue A v2: APPLIED (output_config + json_schema enforcement)

  Skills (native)  (src/server/legacyStreamHandler.js:78)
  ───────────────────────────────────────────────────────
    Tool:      code_execution_20250825
    Pattern:   Single-turn streaming pass-through to frontend
    Validate:  None — free-form text + tool_use_result blocks
    Retry:     None (terminal failure on malformed output)
    Avenue A v2: NOT APPLICABLE

Skills doesn't exhibit the envelope-miss failure mode Avenue A v2 fixed because
Skills HAS NO ENVELOPE CONTRACT to enforce. Skills output is free-form text,
accumulated and streamed to the caller; there's no schema, no validation, and
no retry.
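
To make the contrast concrete, here is a minimal sketch of the two patterns.
It is illustrative only: parseEnvelope, runBridgeStyle, runSkillsStyle, and
the fake model are hypothetical stand-ins, not code from
codeExecutionBridge.js or legacyStreamHandler.js. Only MAX_TURNS=3 and the
`success` envelope field come from the descriptions in this change.

```javascript
const MAX_TURNS = 3; // value taken from the Bridge description above

// Hypothetical validator; the real bridge is described as using
// isCompleteXlsxB64 / selectEnvelopeWithFallback instead.
function parseEnvelope(text) {
  try {
    const parsed = JSON.parse(text);
    const isObject = parsed !== null && typeof parsed === 'object';
    return isObject && 'success' in parsed ? parsed : null;
  } catch {
    return null; // not JSON at all, so treat it as an envelope miss
  }
}

// Bridge-style: validate every turn, retry with a corrective prompt on a miss.
function runBridgeStyle(callModel, prompt) {
  let nextPrompt = prompt;
  for (let turn = 1; turn <= MAX_TURNS; turn += 1) {
    const envelope = parseEnvelope(callModel(nextPrompt));
    if (envelope) return { envelope, turns: turn };
    nextPrompt = `${prompt}\nReturn only the JSON envelope.`;
  }
  throw new Error(`no valid envelope after ${MAX_TURNS} turns`);
}

// Skills-style: whatever the model emits is forwarded unchanged.
function runSkillsStyle(callModel, prompt) {
  return callModel(prompt);
}

// Fake model that misses the envelope on its first call only.
let calls = 0;
const flakyModel = () => (calls++ === 0 ? 'Here is the result...' : '{"success":true}');

const bridgeResult = runBridgeStyle(flakyModel, 'analyze');
console.log(bridgeResult.turns); // 2

const skillsResult = runSkillsStyle(() => 'Here is the result...', 'analyze');
console.log(skillsResult); // Here is the result...
```

In the pass-through shape there is no validation step for a schema to
strengthen, which is the point of the NOT APPLICABLE verdict.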

═══════════════════════════════════════════════════════════════════════
WHY output_config ENFORCEMENT WOULD ACTIVELY HARM SKILLS
═══════════════════════════════════════════════════════════════════════

1. CONFLICT with the existing outputFormatSchema logic at
   legacyStreamHandler.js lines 112-115:
   output_config: { effort: 'high',
     ...(outputFormatSchema ? { format: outputFormatSchema } : {}) }
   Adding ...(skills ? { format: skillsSchema } : {}) spreads a second `format`
   key into the same object. JS keeps the LAST one, silently overwriting the
   user-requested structured-output enforcement.

2. NO downstream parsed_output consumer:
   - Bridge reads response.parsed_output (codeExecutionBridge.js:extractResults)
   - legacyStreamHandler streams text + tool_use_result only — never inspects
     parsed_output
   - Structured output would be generated + billed in tokens but ignored

3. TRUNCATION RISK worse than xlsx's:
   - Skills outputs (PDF extracts, document parses, OCR) are often LARGE
   - Skills has NO secondary fallback channel (xlsx had stdout to fall back on)
   - Forced text-channel JSON enforcement could hit max_tokens cliff with NO
     architectural mitigation — re-introduces the L4 v1 failure mode
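
The overwrite described in point 1 can be reproduced in isolation. This is a
sketch under assumptions: both schema objects and the `skills` flag below are
hypothetical placeholders, not values from legacyStreamHandler.js. Only the
spread pattern mirrors the existing output_config construction.

```javascript
// Hypothetical stand-ins; neither schema is taken from the real codebase.
const outputFormatSchema = { type: 'json_schema', schema: { title: 'user-requested' } };
const skillsSchema = { type: 'json_schema', schema: { title: 'skills-enforced' } };
const skills = true;

const outputConfig = {
  effort: 'high',
  ...(outputFormatSchema ? { format: outputFormatSchema } : {}),
  ...(skills ? { format: skillsSchema } : {}), // later spread overwrites `format`
};

console.log(outputConfig.format.schema.title); // skills-enforced
console.log(Object.keys(outputConfig).length); // 2 (effort, format)
```

No error or warning is raised when the second spread replaces the first, so
the user-requested schema would disappear without any signal in logs.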

═══════════════════════════════════════════════════════════════════════
SKILLS IS ARCHITECTURALLY DEAD CODE IN PROD
═══════════════════════════════════════════════════════════════════════

Four independent signals:

  1. SKILLS_ENABLED=false in flags.env:62 (production default)
  2. USE_AGENT_SDK=true in flags.env (production default) routes /api/stream
     to handleAgentStream, NOT handleLegacyStream, so the Skills path is never
     invoked in production even if SKILLS_ENABLED were true.
  3. Architectural pivot commit 12000487: "Step 3 — absorb Anthropic xlsx
     Skill content into bridge prompt" — deliberately moved xlsx skill
     functionality INTO the bridge, replacing Skills-API approach
  4. Code comment in featureFlags.js:14-17 explicitly states "Custom skills
     disabled: Non-SDK-compliant format with no-op code"
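
Signals 1 and 2 combine as a double gate. The sketch below assumes the routing
is a simple flag check; selectStreamHandler and skillsPathReachable are
hypothetical helpers written for illustration, while the flag and handler
names are the ones cited above.

```javascript
// Hypothetical model of the route gating; the real routing code is not shown
// in this change, so this selection function is an assumption.
function selectStreamHandler(flags) {
  return flags.USE_AGENT_SDK ? 'handleAgentStream' : 'handleLegacyStream';
}

// Skills' tool registration lives in the legacy handler, so the path is
// reachable only when the legacy route is taken AND Skills is enabled.
function skillsPathReachable(flags) {
  return selectStreamHandler(flags) === 'handleLegacyStream' && flags.SKILLS_ENABLED === true;
}

const productionDefaults = { SKILLS_ENABLED: false, USE_AGENT_SDK: true };

console.log(skillsPathReachable(productionDefaults)); // false
console.log(skillsPathReachable({ ...productionDefaults, SKILLS_ENABLED: true })); // false
console.log(skillsPathReachable({ SKILLS_ENABLED: true, USE_AGENT_SDK: false })); // true
```

Under this model, flipping SKILLS_ENABLED alone does not reach the Skills
registration; both defaults would have to change.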

═══════════════════════════════════════════════════════════════════════
DELIVERABLE
═══════════════════════════════════════════════════════════════════════

This PR is documentation-only:

  docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md
    +§12 "Skills' parallel code-execution path: why Avenue A v2 does NOT
    apply (2026-05-16 investigation)" — includes bridge-vs-Skills output-
    contract comparison table for future reference

  CHANGELOG.md
    [Unreleased] entry under "Documented" — closes follow-up #3 with the
    architectural posture rationale

═══════════════════════════════════════════════════════════════════════
FUTURE DIRECTION
═══════════════════════════════════════════════════════════════════════

If custom skills become SDK-compliant in the future, re-evaluate under
Programmatic Tool Calling (PTC) using code_execution_20260120 +
allowed_callers. At that point, Avenue A v2-style enforcement could be
applied at that distinct surface. Until then, Skills remains a deprecated
pass-through; bridge remains the canonical structured-output surface.

═══════════════════════════════════════════════════════════════════════
STACKED ON PR #136
═══════════════════════════════════════════════════════════════════════

This branch is stacked on feature/codeexec-avenue-a-v2-telemetry-docs (PR
#136) so the §12 addition cleanly follows the §11 (empirical constraints)
content added there. When PR #136 merges to main, this PR will need to be
rebased onto main; the rebase should be conflict-free since §12 is purely
additive to the end of the doc file.

VERIFICATION:
  Suite: 202/0/2 (PR #136's baseline maintained; no code change in this PR)

ROLLBACK: trivial — `git revert <merge-sha>` removes the doc additions.
No code, schema, or behavior implications.

Plan: /Users/ej/.claude/plans/glittery-toasting-stardust.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 super-legal-mcp-refactored/CHANGELOG.md       | 15 ++++
 .../anthropic-sdk-best-practices-research.md  | 87 +++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/super-legal-mcp-refactored/CHANGELOG.md b/super-legal-mcp-refactored/CHANGELOG.md
index 1e53d4d74..12c063993 100644
--- a/super-legal-mcp-refactored/CHANGELOG.md
+++ b/super-legal-mcp-refactored/CHANGELOG.md
@@ -4,6 +4,21 @@ All notable changes to the Super Legal MCP Server are documented in this file.
 
 ## [Unreleased]
 
+### Documented — Skills' parallel code-execution path: Avenue A v2 deliberately NOT applied (PR forthcoming)
+
+Closes the third explicit deferral from PR #135 (Avenue A v2). An Explore-agent investigation traced `src/server/legacyStreamHandler.js:78` (which registers `code_execution_20250825` directly via Anthropic SDK tool registration when `SKILLS_ENABLED=true`) and reached a **DOCUMENT-NO-CHANGE** verdict.
+
+**Why no code change**:
+- Skills has NO envelope contract — output is free-form text + tool-result pass-through to frontend
+- `legacyStreamHandler.js` has NO retry logic, NO envelope validation, NO downstream `parsed_output` consumer
+- Adding `output_config` would CONFLICT with the existing `outputFormatSchema` logic (both set `format`)
+- Skills is on a deprecation track: `SKILLS_ENABLED=false` (default), `USE_AGENT_SDK=true` (default; `legacyStreamHandler` isn't even invoked in prod), xlsx skill content already absorbed into bridge (commit `12000487`)
+- If forced, structured-output enforcement on Skills would re-introduce the L4 v1 b64-in-text truncation failure mode with NO secondary fallback channel (Skills has no stdout-equivalent like xlsx's bridge path)
+
+**Full investigation + architectural posture documented** in `docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md` §12 — includes the bridge-vs-Skills output-contract comparison table for future reference.
+
+**Bottom line**: bridge is the canonical structured-output surface; Skills is deprecated pass-through. Avenue A v2 applies only to the bridge.
+
 ### Added — Avenue A v2: Structured-output enforcement on code-execution bridge (PR forthcoming)
 
 Eliminates the "missing envelope on turn 1 → corrective retry on turn 2" pattern observed in production renders (PR #134 L4 logs: 1-of-5 phases retried). Adds Anthropic's `output_config: { format: { type: 'json_schema', schema: {...} } }` enforcement on the model's text-block output, validated empirically against the `code_execution_20260120` tool + streaming + `pause_turn` continuations.
diff --git a/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md b/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md
index ccfad0b53..04afe2c57 100644
--- a/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md
+++ b/super-legal-mcp-refactored/docs/code-execution-enhancements/anthropic-sdk-best-practices-research.md
@@ -641,3 +641,90 @@ sum by (envelope_source) (rate(claude_xlsx_render_turn1_envelope_success_total{s
 ```
 
 Target after `STRUCTURED_OUTPUT_ENFORCEMENT=true` deployment: `structured_output="on"` rate ≥ `structured_output="off"` baseline rate (validates Avenue A v2 doesn't regress retry behavior).
+
+---
+
+## §12 — Skills' parallel code-execution path: why Avenue A v2 does NOT apply (2026-05-16 investigation)
+
+This section closes the third explicit deferral from PR #135 (Avenue A v2). An Explore-agent investigation traced Skills' code-execution registration path and reached a decisive **DOCUMENT-NO-CHANGE** verdict.
+
+### The two parallel code-execution paths in the codebase
+
+The codebase has **two independent surfaces** for code execution against the Anthropic Messages API:
+
+| Surface | File | Tool | Use case |
+|---|---|---|---|
+| **Bridge** (Avenue A v2's target) | `src/tools/codeExecutionBridge.js:30` | `code_execution_20260120` | Orchestrator-driven multi-turn renders (xlsx renderer phases, MCP `run_python_analysis`, Agent SDK subagents). Has envelope contract + retry logic + structured-output enforcement via `output_config`. |
+| **Skills' native path** | `src/server/legacyStreamHandler.js:78` | `code_execution_20250825` | Pass-through streaming for Anthropic-managed Skills (pdf, xlsx, docx). NO envelope contract, NO retry logic, NO downstream parsing of structured output. |
+
+### Why Skills doesn't have the failure mode Avenue A v2 fixed
+
+Avenue A v2 targeted the bridge's "missing envelope on turn 1 → corrective retry on turn 2" pattern. That pattern exists in the bridge because:
+
+1. The bridge has an **envelope contract** (`{b64_xlsx, audit_results, ...}` for xlsx; `{success, analysis, data, ...}` for general)
+2. The bridge has **envelope validation** (`isCompleteXlsxB64()`, etc.)
+3. The bridge has **retry logic** (`MAX_TURNS=3` loop with corrective prompts)
+
+**Skills has NONE of these**:
+- Skills output is **free-form text + tool-result pass-through** to the frontend streaming endpoint
+- `legacyStreamHandler.js` does NOT validate envelope shape — text accumulates and streams to caller as-is
+- No retry loop exists; if Skills' model output is malformed, the failure is terminal (caller sees raw text including any malformed JSON)
+
+**Conclusion**: Skills doesn't exhibit the envelope-miss-retry pattern because **no envelope is expected** in the first place. There's nothing for Avenue A v2 to optimize.
+
+### Why `output_config` enforcement would actively HARM Skills
+
+Adding `output_config: { format: { type: 'json_schema', schema: ... } }` to the Skills path's Messages API call (`legacyStreamHandler.js:108-115`) would:
+
+1. **Conflict with the existing `outputFormatSchema` logic** at the same lines:
+   ```js
+   output_config: {
+     effort: 'high',
+     ...(outputFormatSchema ? { format: outputFormatSchema } : {}),
+     ...(skills ? { format: someSkillsSchema } : {})  // ← DUPLICATE KEY: silently overwrites
+   }
+   ```
+   JavaScript would use the last `format` key only — user-requested structured output and Skills enforcement cannot coexist.
+
+2. **Produce `parsed_output` that no handler consumes**:
+   - The bridge reads `response.parsed_output` (codeExecutionBridge.js:`extractResults`)
+   - `legacyStreamHandler.js` does NOT read it — only streams `text` and `tool_use_result` blocks
+   - Structured output would be generated by the API + billed in tokens, but ignored
+
+3. **Risk a truncation cliff worse than xlsx's**:
+   - Skills outputs (PDF text extracts, document parses, image OCR) are often LARGE
+   - Skills has NO secondary fallback channel like stdout (the xlsx case)
+   - Forced text-channel JSON enforcement could hit `max_tokens` mid-output with no fallback path
+   - The L4 v1 failure mode (b64 in text → truncation → corrupt output) would re-emerge for Skills with no architectural mitigation available
+
+### Skills is on a deprecation track
+
+Four independent signals confirm Skills' deprecated status:
+
+1. **Default flag state**: `SKILLS_ENABLED=false` in `flags.env:62` — production never activates the path
+2. **Architectural pivot**: commit `12000487` ("Step 3 — absorb Anthropic xlsx Skill content into bridge prompt") deliberately moved xlsx-specific skill content INTO the bridge, replacing the Skills-API approach
+3. **Code comment** (`src/config/featureFlags.js:14-17`): "Custom skills disabled: Non-SDK-compliant format with no-op code. Subagents handle all domain expertise. Managed skills (pdf, xlsx, docx) still work. Files preserved in /src/skills/customSkills/ for future SDK-compliant conversion."
+4. **Route gating**: `featureFlags.USE_AGENT_SDK=true` (default, in `flags.env`) routes the main `/api/stream` endpoint to `handleAgentStream` instead of `handleLegacyStream` — even if `SKILLS_ENABLED` were true, the legacy handler wouldn't be invoked in production
+
+### Recommendation
+
+**DOCUMENT-NO-CHANGE**: do not apply Avenue A v2 to Skills' parallel path.
+
+The right architectural posture is:
+- **Bridge** (`code_execution_20260120`) is the canonical code-execution surface; `output_config` enforcement via Avenue A v2 lives there
+- **Skills** (`code_execution_20250825`) is a deprecated pass-through path; no envelope enforcement needed because no envelope is expected
+- **Future direction**: if custom skills become SDK-compliant, re-evaluate under Programmatic Tool Calling (PTC) using `code_execution_20260120` + `allowed_callers` — at which point Avenue A v2-style enforcement could be applied at that distinct surface
+
+### Output contract comparison (canonical reference)
+
+| Aspect | Bridge | Skills |
+|---|---|---|
+| Envelope contract | `ENVELOPE_SCHEMA_XLSX` / `ENVELOPE_SCHEMA_GENERAL` | None |
+| Validation | `selectEnvelopeWithFallback`, `isCompleteXlsxB64` | None — pass-through |
+| Retry logic | `MAX_TURNS=3` corrective loop | None (terminal failure) |
+| Structured-output enforcement (Avenue A v2) | YES — `output_config` with json_schema | N/A — no contract to enforce |
+| Tool type | `code_execution_20260120` | `code_execution_20250825` |
+| Default state in prod | Active (`CODE_EXECUTION_BRIDGE=true`) | Inactive (`SKILLS_ENABLED=false` + `USE_AGENT_SDK=true`) |
+| Reads `response.parsed_output` | YES (Avenue A v2 path) | NO |
+
+**For structured, multi-turn analysis with envelope validation → use the bridge.** Skills are appropriate only for stateless, single-turn pass-through operations against Anthropic-managed domain expertise (pdf, xlsx, docx).