From 85cb369abeb510ec0cd1e38c5abbea10ecccf6fd Mon Sep 17 00:00:00 2001 From: Number531 <120485065+Number531@users.noreply.github.com> Date: Fri, 24 Apr 2026 14:34:18 -0400 Subject: [PATCH] docs(sdk): align stale 0.2.97 references with v6.3.0 bump MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Follow-up to #79 (SDK bump PR). The v6.3.0 CHANGELOG entry listed these two docs under "Updated — Documentation" but they weren't included in the main PR due to bundling concerns with unrelated pending changes. Landing them here so the CHANGELOG claim matches repo state. - docs/pending-updates/execution-gke-migration.md — single-line surgical update to the package.json version reference (line 1474). Unrelated pending content in the working tree was preserved and not bundled. - docs/citation-chat-router-sdk-alignment.md — previously untracked research doc (created 2026-03-17, last updated 2026-03-19). Committing as a new file with the version reference already bumped to 0.2.119. No functional changes. Docs-only. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../citation-chat-router-sdk-alignment.md | 541 ++++++++++++++++++ .../execution-gke-migration.md | 2 +- 2 files changed, 542 insertions(+), 1 deletion(-) create mode 100644 super-legal-mcp-refactored/docs/citation-chat-router-sdk-alignment.md diff --git a/super-legal-mcp-refactored/docs/citation-chat-router-sdk-alignment.md b/super-legal-mcp-refactored/docs/citation-chat-router-sdk-alignment.md new file mode 100644 index 000000000..02a017733 --- /dev/null +++ b/super-legal-mcp-refactored/docs/citation-chat-router-sdk-alignment.md @@ -0,0 +1,541 @@ +# Citation Chat Router — Claude Agent SDK Alignment Research + +**Created**: 2026-03-17 +**Updated**: 2026-03-19 (version alignment, endpoint path correction) +**Topic**: Agent SDK vs Messages API for interactive conversational chat +**Versions**: `@anthropic-ai/claude-agent-sdk` 0.2.119, `@anthropic-ai/sdk` ^0.86.1 +**Sources verified as of**: March 2026 + +--- + +## Table of Contents + +1. [Executive Summary](#1-executive-summary) +2. [Claude Agent SDK — Conversational Patterns](#2-claude-agent-sdk--conversational-patterns) +3. [Messages API Streaming — Current Recommended Path](#3-messages-api-streaming--current-recommended-path) +4. [Anthropic Cookbook Coverage](#4-anthropic-cookbook-coverage) +5. [Agent SDK vs Messages API — Official Distinction](#5-agent-sdk-vs-messages-api--official-distinction) +6. [Prompt Caching for Multi-Turn Citation Context](#6-prompt-caching-for-multi-turn-citation-context) +7. [Design Alignment Assessment for citationChatRouter.js](#7-design-alignment-assessment-for-citationchatrouterjs) +8. [Recommended Implementation Pattern](#8-recommended-implementation-pattern) +9. [References](#9-references) + +--- + +## 1. 
Executive Summary + +**The proposed citationChatRouter.js design — an Express router using the Messages API directly with streaming — is fully aligned with Anthropic's recommended patterns for interactive conversational use cases.** The Claude Agent SDK is explicitly not the right tool for this feature. The Messages API with `client.messages.stream()` (not `client.beta.messages.stream()`) is the current stable recommended path for multi-turn chat. Prompt caching with a 10K-token citation context block will yield approximately 90% cost reduction on cache hits and is straightforward to implement. + +Key findings: + +| Question | Answer | +|----------|--------| +| Should a chat router use Agent SDK? | No — Agent SDK is for autonomous tool-executing agents | +| Is `.stream()` still on beta path? | No — `client.messages.stream()` is the stable path for chat. Note: the existing `/api/stream` endpoint uses `anthropic.beta.messages.stream()` for `context_management` beta features, but the chat router does not need those. | +| Is there an official Agent SDK "chat mode"? | V2 session API (`unstable_v2_createSession`) exists but is marked unstable and has higher overhead than Messages API | +| Should citation context be prompt-cached? | Yes — inject as system prompt block with `cache_control: { type: "ephemeral" }` | +| Is Express SSE streaming compatible? | Yes — pipe `content_block_delta` text deltas directly to `res.write()` | + +--- + +## 2. Claude Agent SDK — Conversational Patterns + +**Source**: [TypeScript V2 interface preview](https://platform.claude.com/docs/en/agent-sdk/typescript-v2-preview), [Agent SDK overview](https://platform.claude.com/docs/en/agent-sdk/overview), [TypeScript SDK reference (V1)](https://platform.claude.com/docs/en/agent-sdk/typescript) +**Verified**: March 2026 + +### 2.1 What the Agent SDK Is Designed For + +The Agent SDK gives you Claude with **built-in autonomous tool execution**. 
The overview page opens with: + +> "Build AI agents that autonomously read files, run commands, search the web, edit code, and more." + +Every example in the overview is an autonomous task: "Find and fix the bug in auth.py", "Find all TODO comments", "Use the code-reviewer agent to review this codebase". The tool list (`Read`, `Write`, `Edit`, `Bash`, `Glob`, `Grep`, `WebSearch`, `WebFetch`, `AskUserQuestion`) makes this purpose explicit. + +### 2.2 The V1 `query()` Mode + +The primary V1 API is `query()`, an async generator over `SDKMessage` events. For multi-turn conversations in V1 you must either: + +- Use `options.resume` with a session ID to continue a previous session +- Pass `prompt` as an `AsyncIterable` and coordinate yields yourself + +The V1 multi-turn approach (feeding messages via `AsyncIterable`) is described as requiring "yield coordination" and the docs note it requires "restructuring" even for basic multi-turn use. This is not suited to a request/response HTTP router pattern where each turn is a separate HTTP request. + +### 2.3 The V2 Session API (Unstable Preview) + +The V2 API (`unstable_v2_createSession`, `unstable_v2_resumeSession`, `session.send()`, `session.stream()`) provides a cleaner multi-turn interface: + +```typescript +// V2 multi-turn (unstable preview) +import { unstable_v2_createSession } from "@anthropic-ai/claude-agent-sdk"; + +await using session = unstable_v2_createSession({ model: "claude-opus-4-6" }); + +await session.send("What is 5 + 3?"); +for await (const msg of session.stream()) { + if (msg.type === "assistant") { + const text = msg.message.content + .filter((block) => block.type === "text") + .map((block) => block.text) + .join(""); + console.log(text); + } +} +``` + +**Critical caveat**: The V2 page carries a prominent warning: + +> "The V2 interface is an **unstable preview**. APIs may change based on feedback before becoming stable." 
+ +**Additional concern**: Session forking, some advanced streaming input patterns, and other V1 features are unavailable in V2. + +**Why V2 sessions are not the right choice for a citation chat router**: + +1. Sessions are **process-backed** — they spawn a Claude Code subprocess. This has substantially higher per-session overhead than an HTTP call to the Messages API. +2. The Agent SDK is designed for single-session long-running autonomous work. Using it for short request/response chat turns per HTTP request is architecturally mismatched. +3. The context injection model (citation chunks injected per-turn) is cleanly handled by the Messages API `system` parameter but requires awkward pre-turn `send()` calls in the Session API. +4. V2 is marked unstable with no stability commitment date. + +### 2.4 Official Guidance: Agent SDK vs Client SDK + +The Agent SDK overview page includes an explicit comparison tab titled "Agent SDK vs Client SDK": + +> **Client SDK**: You send prompts and implement tool execution yourself. Gives you direct API access. +> **Agent SDK**: Gives you Claude with built-in tool execution. Claude handles tools autonomously. + +The docs show this contrast: + +```typescript +// Client SDK (Messages API): You control the loop +let response = await client.messages.create({ ...params }); +while (response.stop_reason === "tool_use") { + const result = yourToolExecutor(response.tool_use); + response = await client.messages.create({ tool_result: result, ...params }); +} + +// Agent SDK: Claude handles tools autonomously +for await (const message of query({ prompt: "Fix the bug in auth.py" })) { + console.log(message); +} +``` + +A citation chat router has **no tools to execute**. It takes user questions, injects document context, and returns text. The Messages API is the correct abstraction. + +--- + +## 3. 
Messages API Streaming — Current Recommended Path + +**Source**: [Streaming Messages docs](https://platform.claude.com/docs/en/docs/build-with-claude/streaming) +**Verified**: March 2026 + +### 3.1 `.stream()` Has Graduated from Beta + +`client.messages.stream()` is documented on the main (non-beta) streaming page with examples using `client.messages.stream()` (not `client.beta.messages.stream()`). The canonical TypeScript example: + +```typescript +import Anthropic from "@anthropic-ai/sdk"; + +const client = new Anthropic(); + +await client.messages + .stream({ + messages: [{ role: "user", content: "Hello" }], + model: "claude-opus-4-6", + max_tokens: 1024 + }) + .on("text", (text) => { + console.log(text); + }); +``` + +This is `@anthropic-ai/sdk` 0.86.1's stable API surface. The `client.beta.messages.stream()` path is not mentioned in current docs for standard text streaming. The beta path is now used only for beta-gated features (e.g., code execution with `code_execution_20250825` tool — as noted in the project memory for `codeExecutionBridge.js`). + +**Confirmed in project memory (2026-02-25)**: `codeExecutionBridge.js` was migrated away from `client.beta.messages.create` to `client.messages.create` (standard path) precisely because "the beta path silently reused containers" and had undocumented behavior. The principle generalizes: prefer the standard path. + +### 3.2 SSE Event Flow + +Each stream emits these events in order: + +1. `message_start` — contains a `Message` object with empty `content` +2. For each content block: `content_block_start` → one or more `content_block_delta` → `content_block_stop` +3. One or more `message_delta` — top-level changes, cumulative token counts +4. `message_stop` + +Plus optional `ping` events (keep-alive) and `error` events. 
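The event ordering described in section 3.2 can be illustrated with a small reducer. This is a minimal sketch, not SDK code: the `StreamEvent` type and the `accumulateText` helper are hypothetical names introduced here, and the event shapes are simplified to the fields the list above mentions.

```typescript
// Hypothetical, simplified event shapes; the real SDK types carry more fields.
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number }
  | {
      type: "content_block_delta";
      index: number;
      delta:
        | { type: "text_delta"; text: string }
        | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta" }
  | { type: "message_stop" }
  | { type: "ping" };

// Fold an ordered event sequence into per-block text, keyed by block index.
function accumulateText(events: StreamEvent[]): { blocks: string[]; finished: boolean } {
  const blocks: string[] = [];
  let finished = false;
  for (const event of events) {
    switch (event.type) {
      case "content_block_start":
        blocks[event.index] = "";
        break;
      case "content_block_delta":
        // Only text deltas contribute to the visible answer.
        if (event.delta.type === "text_delta") {
          blocks[event.index] = (blocks[event.index] ?? "") + event.delta.text;
        }
        break;
      case "message_stop":
        finished = true;
        break;
      default:
        // message_start, message_delta, content_block_stop, and ping carry no text.
        break;
    }
  }
  return { blocks, finished };
}

const demoEvents: StreamEvent[] = [
  { type: "message_start" },
  { type: "content_block_start", index: 0 },
  { type: "ping" },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "Hello" } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: " friend" } },
  { type: "content_block_stop", index: 0 },
  { type: "message_delta" },
  { type: "message_stop" },
];

const result = accumulateText(demoEvents);
console.log(result.blocks.join(""), result.finished);
```

In the router itself the SDK's `.on("text", ...)` helper does this accumulation already; the sketch is only meant to show why `ping` and `message_delta` events can be ignored when all the frontend needs is text.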
+ +**Text delta wire format**: +``` +event: content_block_delta +data: {"type": "content_block_delta","index": 0,"delta": {"type": "text_delta", "text": "ello frien"}} +``` + +**Tool use delta wire format** (when tools are present): +``` +event: content_block_delta +data: {"type": "content_block_delta","index": 1,"delta": {"type": "input_json_delta","partial_json": "{\"location\": \"San Fra"}} +``` + +### 3.3 TypeScript Stream Helpers + +The SDK's `MessageStream` object (returned by `.stream()`) exposes these methods: + +| Method | Description | +|--------|-------------| +| `.on("text", cb)` | Fires on each text delta with the text string | +| `.on("streamEvent", cb)` | Fires on each raw `MessageStreamEvent` | +| `.on("message", cb)` | Fires when a complete `Message` has been accumulated | +| `.on("finalMessage", cb)` | Fires once with complete accumulated `Message` | +| `.on("error", cb)` | Fires on stream errors | +| `.finalMessage()` | Promise resolving to complete `Message` | +| `.finalText()` | Promise resolving to complete text string | +| `.abort()` | Cancel the stream | + +For Express SSE delivery, use `.on("text", ...)` to pipe deltas and `.on("finalMessage", ...)` to send the `done` event: + +```typescript +// Express route handler pattern (fragment: `res`, `client`, `citations`, +// and `conversationHistory` come from the enclosing handler) +res.setHeader("Content-Type", "text/event-stream"); +res.setHeader("Cache-Control", "no-cache"); +res.setHeader("Connection", "keep-alive"); + +const stream = client.messages.stream({ + model: "claude-sonnet-4-6", + max_tokens: 2048, + system: buildSystemPromptWithCitations(citations), + messages: conversationHistory +}); + +// Forward each text delta as its own SSE frame +stream.on("text", (text) => { + res.write(`data: ${JSON.stringify({ type: "text_delta", text })}\n\n`); +}); + +// Signal completion, then close the SSE response +stream.on("finalMessage", () => { + res.write(`data: ${JSON.stringify({ type: "done" })}\n\n`); + res.end(); +}); + +// Surface mid-stream failures to the frontend before closing +stream.on("error", (err) => { + res.write(`data: ${JSON.stringify({ type: "error", message: err.message })}\n\n`); + res.end(); +}); +``` + +--- + +## 4. 
Anthropic Cookbook Coverage + +**Source**: [anthropics/anthropic-cookbook](https://github.com/anthropics/anthropic-cookbook) +**Verified**: March 2026 (35.2k stars, MIT licensed) + +### 4.1 Available RAG Examples + +| Example | Path | Relevance | +|---------|------|-----------| +| RAG with Pinecone | `third_party/Pinecone/rag_using_pinecone.ipynb` | Vector DB retrieval + Claude Q&A | +| Wikipedia search | `third_party/Wikipedia/wikipedia-search-cookbook.ipynb` | Real-time knowledge retrieval | +| Embeddings (Voyage AI) | `third_party/VoyageAI/how_to_create_embeddings.md` | Generating embeddings for RAG | +| PDF upload + summarization | `misc/pdf_upload_summarization.ipynb` | Document Q&A | +| Customer service agent | `tool_use/customer_service_agent.ipynb` | Conversational tool-use agent | + +### 4.2 Gap: No Node.js / Express Examples + +> "While the code examples are primarily written in Python, the concepts can be adapted to any language that supports interaction with the Claude API." + +All cookbook examples are Jupyter notebooks (95.9% notebooks, 4.1% Python scripts). There are **no** Express/Node.js server integration examples in the official cookbook. This is a documentation gap but does not indicate incompatibility — it simply means the adapter pattern must be implemented from the streaming SSE documentation directly. + +### 4.3 RAG Pattern from Cookbook + +The Pinecone cookbook establishes the canonical pattern: + +1. Embed user query → vector similarity search → retrieve top-k chunks +2. Concatenate chunks into context block +3. Inject context into Claude system prompt or user message +4. Stream Claude's response + +For a citation-based Q&A router, the retrieval step is replaced by explicit citation injection (citations already known from the report), making this simpler than general RAG. + +--- + +## 5. 
Agent SDK vs Messages API — Official Distinction + +**Source**: [Agent SDK overview — "Compare the Agent SDK to other Claude tools"](https://platform.claude.com/docs/en/agent-sdk/overview) +**Verified**: March 2026 + +### 5.1 The Official Decision Matrix + +From the "Agent SDK vs Client SDK" comparison tab: + +| Use when | Tool choice | +|----------|-------------| +| Your use case is text generation: chatbots, summarization, classification, content generation, Q&A | **Messages API (Client SDK)** | +| You want autonomous tool execution (file read/write, shell commands, web search) without implementing a tool loop | **Agent SDK** | +| CI/CD pipelines, production automation, custom applications | Agent SDK | +| Interactive development, one-off tasks | CLI | + +The terms "chatbots" and "Q&A" are explicitly listed as Messages API use cases. A citation Q&A router maps directly to this category. + +### 5.2 The Tool Loop Distinction + +The defining architectural difference: with the Messages API you **implement the tool loop yourself** if you need tools. With the Agent SDK, Claude handles tools autonomously. + +For a citation chat router: +- **No tools are needed** — citations are pre-fetched and injected as context +- The "tool loop" concern is irrelevant +- The Messages API is the minimal correct abstraction + +### 5.3 Sessions in Agent SDK vs Stateless HTTP + +The Agent SDK session model is designed for **long-running stateful sessions** where Claude accumulates file-system context across many turns of autonomous work. It stores session transcripts to disk (`listSessions()`, `getSessionMessages()` return filesystem-backed data). + +A chat router serving many concurrent users needs **stateless per-request handling** where conversation history is managed by the caller (client sends full history on each turn, or server stores it in Redis/memory). The Messages API is inherently stateless per-request, which is the correct model for an HTTP router. + +--- + +## 6. 
Prompt Caching for Multi-Turn Citation Context + +**Source**: [Prompt caching docs](https://platform.claude.com/docs/en/docs/build-with-claude/prompt-caching), [Automatic prompt caching announcement (Feb 2026)](https://medium.com/ai-software-engineer/anthropic-just-fixed-the-biggest-hidden-cost-in-ai-agents-using-automatic-prompt-caching-9d47c95903c5) +**Verified**: March 2026 + +### 6.1 Why This Matters for Citation Chat + +A citation context block of ~10K tokens, sent on every turn of a multi-turn conversation, accumulates significant cost. With Sonnet 4.6 at $3/MTok input (10K tokens = $0.03 at the base rate): + +- Without caching, 10 turns × 10K context tokens = 100K tokens = $0.30 per conversation +- With caching (cache reads billed at 10% of the base rate): one cache write at 1.25x ($0.0375 per 10K) + 9 cache reads at 0.1x ($0.003 each, $0.027 in total) = $0.0645 total, a **79% reduction**, assuming every later turn arrives within the cache TTL + +For Sonnet 4.6 specifically, the minimum cacheable prompt length is **2,048 tokens**, meaning a 10K-token citation block comfortably qualifies. + +### 6.2 Workspace Isolation Update (February 5, 2026) + +**Important operational note**: As of February 5, 2026, prompt caching uses workspace-level isolation (not organization-level), so cache hits only occur within the same API workspace. If cross-service cache sharing is desired, verify that the citation chat router's `ANTHROPIC_API_KEY` belongs to the same workspace as the other services. 
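The cost estimate in section 6.1 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `citationContextCost` and the constant names are hypothetical, the price and multipliers are the figures quoted in 6.1, and it assumes every turn after the first is a cache hit within the TTL while ignoring all non-citation tokens.

```typescript
// Figures quoted in section 6.1 (assumed inputs, not fetched from any API).
const INPUT_PRICE_PER_MTOK_USD = 3.0; // Sonnet 4.6 base input price
const CACHE_WRITE_MULTIPLIER = 1.25; // 5-minute TTL cache write
const CACHE_READ_MULTIPLIER = 0.1; // cache hit

interface CostEstimate {
  uncached: number; // USD, context re-sent at full price every turn
  cached: number; // USD, one cache write followed by cache reads
  reduction: number; // fraction saved, between 0 and 1
}

// Estimate the cost of re-sending the same citation block on every turn.
function citationContextCost(contextTokens: number, turns: number): CostEstimate {
  if (turns < 1) {
    throw new RangeError("turns must be at least 1");
  }
  const basePerTurn = (contextTokens / 1_000_000) * INPUT_PRICE_PER_MTOK_USD;
  const uncached = basePerTurn * turns;
  const cached =
    basePerTurn * CACHE_WRITE_MULTIPLIER + // first turn writes the cache
    basePerTurn * CACHE_READ_MULTIPLIER * (turns - 1); // later turns read it
  return { uncached, cached, reduction: 1 - cached / uncached };
}

const estimate = citationContextCost(10_000, 10);
console.log(
  `uncached=$${estimate.uncached.toFixed(4)}`,
  `cached=$${estimate.cached.toFixed(4)}`,
  `reduction=${(estimate.reduction * 100).toFixed(1)}%`
);
```

Because the single cache write is amortized over the reads, the saving grows with conversation length: a one-turn conversation costs more with caching (1.25x), and the break-even arrives on the second turn.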
+ +### 6.3 Three Caching Strategies for Citation Chat + +**Strategy A — Static citation block in system prompt (recommended)** + +When the same set of citations is used throughout a conversation, send them on every turn as the same cached system prompt block: + +```typescript +// Build system prompt with cached citation context +function buildSystemWithCitations( + baseInstructions: string, + citationText: string // ~10K tokens of citation content +): Array<{ type: "text"; text: string; cache_control?: object }> { + return [ + { + type: "text", + text: baseInstructions + // No breakpoint here: the breakpoint on the next block also covers this + // prefix, so keep the instructions identical across turns or the cache misses + }, + { + type: "text", + text: citationText, + cache_control: { type: "ephemeral" } // Cache the large citation block + } + ]; +} + +// Usage in Messages API call +const response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 2048, + system: buildSystemWithCitations(CHAT_SYSTEM_INSTRUCTIONS, citationBlock), + messages: conversationHistory +}); +``` + +Cache hit behavior: +- First call: `cache_creation_input_tokens` = ~10K, `cache_read_input_tokens` = 0 +- Subsequent calls (within 5 min): `cache_read_input_tokens` = ~10K, cost = 10% of base + +**Strategy B — Automatic caching for growing conversation history** + +Use top-level `cache_control` to progressively cache the conversation history as it grows: + +```typescript +const response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 2048, + cache_control: { type: "ephemeral" }, // Automatic: caches up to last cacheable block + system: citationSystemPrompt, + messages: conversationHistory +}); +``` + +Cache behavior advances automatically: +- Turn 1: `system + user[1]` written to the cache +- Turn 2: `system + user[1]` read from cache; `asst[1] + user[2]` written +- Turn 3: everything through `user[2]` read from cache; `asst[2] + user[3]` written + +**Strategy C — Combined (best for long conversations with static citations)** + +Cache the citation block 
explicitly, let automatic caching handle conversation history: + +```typescript +const response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 2048, + cache_control: { type: "ephemeral" }, // Auto-cache conversation turns + system: [ + { type: "text", text: CHAT_SYSTEM_INSTRUCTIONS }, + { + type: "text", + text: citationBlock, + cache_control: { type: "ephemeral" } // Explicit cache for citations + } + ], + messages: conversationHistory +}); +``` + +> Note: Maximum 4 cache breakpoints per request. Combined strategy uses 2, leaving headroom. + +### 6.4 Extended TTL for Citation Cache + +If a conversation spans more than 5 minutes (common for document review sessions), use the 1-hour TTL: + +```typescript +cache_control: { type: "ephemeral", ttl: "1h" } +// Cost: 2x base input price for cache write, still 0.1x for cache reads +// Break-even: if citation block is reused more than twice within an hour +``` + +### 6.5 Monitoring Cache Performance + +```typescript +// After each API call, log cache stats +const { usage } = response; +console.log({ + cache_created: usage.cache_creation_input_tokens, + cache_read: usage.cache_read_input_tokens, + uncached_input: usage.input_tokens, + total_effective_input: + (usage.cache_read_input_tokens ?? 0) + + (usage.cache_creation_input_tokens ?? 0) + + usage.input_tokens +}); +``` + +--- + +## 7. 
Design Alignment Assessment for citationChatRouter.js + +The proposed design (Express router, Messages API direct, streaming) maps cleanly to all official patterns: + +### 7.1 Aligned Design Decisions + +| Decision | Alignment | +|----------|-----------| +| Express router module (not Agent SDK) | Correct — chatbots and Q&A are explicitly Messages API use cases | +| `client.messages.stream()` (not beta path) | Correct — stable API, matches current documentation | +| SSE streaming to frontend | Correct — standard pattern, matches Anthropic's SSE event format | +| Stateless per-request (caller manages history) | Correct — matches Messages API's stateless design | +| Citation context injected as system prompt | Correct — matches the large-context caching cookbook pattern | +| No tool execution needed | Correct — citations are pre-injected, no tool loop required | + +### 7.2 Potential Gaps to Address + +| Gap | Recommendation | +|-----|----------------| +| No prompt caching on citation block | Add `cache_control: { type: "ephemeral" }` to citation system prompt block — immediate ~79% cost reduction on multi-turn conversations | +| Conversation history management | Define whether client sends full history each turn or server maintains session store. Full-history-from-client is simpler and stateless. | +| SSE keepalive for long responses | Anthropic sends `ping` events; ensure the Express SSE transport does not time out on long model turns | +| `message_stop` vs `done` event naming | Decide whether to forward Anthropic's `message_stop` event directly or wrap in a custom `done` event for frontend compatibility | +| Error handling in SSE stream | Anthropic may send `error` events mid-stream (e.g., `overloaded_error`). The `.on("error", ...)` handler must write an error SSE event before closing the connection. 
| + +### 7.3 What Does Not Apply + +| Agent SDK Feature | Applicability | +|-------------------|---------------| +| `query()` / V1 agent loop | Not needed — no autonomous tool execution | +| `unstable_v2_createSession()` | Not needed — adds subprocess overhead for no benefit in a chat router | +| Hooks (`PreToolUse`, `PostToolUse`, `SubagentStart`) | Not needed — citation chat has no tools or subagents | +| MCP servers | Not needed — citations are already retrieved before the chat turn | +| `settingSources`, `permissionMode` | Not applicable — Messages API, not Agent SDK | +| Session resume/fork | Not applicable — use stateless conversation history pattern | + +--- + +## 8. Recommended Implementation Pattern + +Based on the research, the minimal correct implementation for a citation chat router in this codebase is: + +```typescript +// citationChatRouter.js — recommended structure +// Assumes a JSON body parser such as express.json() is mounted upstream. +import Anthropic from "@anthropic-ai/sdk"; +import express from "express"; + +const router = express.Router(); +const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); + +const CITATION_CHAT_SYSTEM = `You are a legal research assistant... +Answer questions using only the provided citation context.`; + +router.post("/chat", async (req, res) => { + const { messages, citationContext } = req.body; + // citationContext: string of ~10K tokens from retrieved citations + + // Set SSE headers + res.setHeader("Content-Type", "text/event-stream"); + res.setHeader("Cache-Control", "no-cache"); + res.setHeader("Connection", "keep-alive"); + + try { + const stream = client.messages.stream({ + model: process.env.CHAT_MODEL ?? "claude-sonnet-4-6", + max_tokens: 2048, + system: [ + { type: "text", text: CITATION_CHAT_SYSTEM }, + { + type: "text", + text: citationContext, + cache_control: { type: "ephemeral" } // Cache the citation block + } + ], + messages // Full conversation history from client + }); + + stream.on("text", (text) => { + res.write(`data: ${JSON.stringify({ type: "text_delta", text })}\n\n`); + }); + + stream.on("finalMessage", (message) => { + res.write(`data: ${JSON.stringify({ type: "done", stop_reason: message.stop_reason })}\n\n`); + res.end(); + }); + + stream.on("error", (err) => { + if (res.writableEnded) return; // Response already finished; nothing left to report + res.write(`data: ${JSON.stringify({ type: "error", message: err.message })}\n\n`); + res.end(); + }); + + // Abort the upstream request if the client disconnects before the response ends + res.on("close", () => { + if (!res.writableEnded) stream.abort(); + }); + + } catch (err) { + // Only synchronous setup failures land here; stream errors go to the handler above + if (!res.headersSent) { + res.status(500).json({ error: err.message }); + } + } +}); + +export default router; +``` + +Key implementation notes: + +1. `res.on("close", ...)` with `stream.abort()` is essential for aborting the Anthropic stream when the client disconnects (prevents runaway token spend). The listener is attached to `res` rather than `req` because, since Node 16, the request's `close` event is emitted once the request has been completed, which can fire before the response has finished streaming. +2. The `cache_control` on `citationContext` block requires the context to exceed 2,048 tokens for Sonnet 4.6 to qualify for caching +3. `messages` array is passed in full from the client each turn — no server-side session storage required +4. The model defaults to `"claude-sonnet-4-6"` (the existing orchestrator model) for consistency; setting the `CHAT_MODEL` environment variable overrides it when the chat model needs to be configured independently + +--- + +## 9. 
References + +- [Agent SDK overview](https://platform.claude.com/docs/en/agent-sdk/overview) — official "Compare the Agent SDK to other Claude tools" section, March 2026 +- [TypeScript V2 interface preview](https://platform.claude.com/docs/en/agent-sdk/typescript-v2-preview) — unstable session API documentation, March 2026 +- [TypeScript SDK reference (V1)](https://platform.claude.com/docs/en/agent-sdk/typescript) — `query()` function, `Options` type, `SDKMessage` types, March 2026 +- [Streaming Messages](https://platform.claude.com/docs/en/docs/build-with-claude/streaming) — SSE event format, `content_block_delta` wire format, TypeScript stream helpers, March 2026 +- [Prompt caching](https://platform.claude.com/docs/en/docs/build-with-claude/prompt-caching) — multi-turn caching strategy, workspace isolation update (Feb 5 2026), TTL options, pricing table, March 2026 +- [anthropic-cookbook on GitHub](https://github.com/anthropics/anthropic-cookbook) — RAG examples, Pinecone integration, embedding patterns (Python only), March 2026 +- [Anthropic Just Fixed the Biggest Hidden Cost in AI Agents (Medium, Feb 2026)](https://medium.com/ai-software-engineer/anthropic-just-fixed-the-biggest-hidden-cost-in-ai-agents-using-automatic-prompt-caching-9d47c95903c5) — automatic prompt caching announcement and workspace isolation change +- [Prompt caching — Anthropic blog](https://www.anthropic.com/news/prompt-caching) — original announcement, up to 90% cost reduction, 85% latency reduction +- [Streaming Messages API reference (docs.anthropic.com)](https://docs.anthropic.com/en/api/messages-streaming) — raw SSE protocol specification diff --git a/super-legal-mcp-refactored/docs/pending-updates/execution-gke-migration.md b/super-legal-mcp-refactored/docs/pending-updates/execution-gke-migration.md index ffae5ed56..e6b8d6e9e 100644 --- a/super-legal-mcp-refactored/docs/pending-updates/execution-gke-migration.md +++ 
b/super-legal-mcp-refactored/docs/pending-updates/execution-gke-migration.md @@ -1471,7 +1471,7 @@ ARMO's March 31, 2026 analysis ([armosec.io/blog/sandboxing-ai-agents-gke-worklo - `src/utils/circuitBreaker.js` — reusable for the new router client - `src/api-clients/BaseHybridClient.js` L48–57 (metrics), L377–425 (`parallelStrategy` pattern to copy for shadow mode) - `Dockerfile` — `node:22-slim`, has `python3` in APT deps, non-root `app` user -- `package.json` — `@anthropic-ai/sdk` ^0.86.1, `@google/genai` ^1.45.0, `@google/generative-ai` ^0.21.0, `@google-cloud/secret-manager` ^6.1.1, `@anthropic-ai/claude-agent-sdk` 0.2.97 +- `package.json` — `@anthropic-ai/sdk` ^0.86.1, `@google/genai` ^1.45.0, `@google/generative-ai` ^0.21.0, `@google-cloud/secret-manager` ^6.1.1, `@anthropic-ai/claude-agent-sdk` 0.2.119 ### External docs - [GKE Agent Sandbox](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox) — bootstrap & operation