Summary
Replace the Agent SDK's built-in WebFetch tool with a custom MCP tool (fetch_document) that tries direct HTTP fetch first, then falls back to Exa /contents on 403/timeout/empty responses. Eliminates the entire class of SEC.gov 403 failures observed in trial runs.
Plan: docs/pending-updates/websearch-conversion.md (v1.2.0)
Problem
- SEC.gov and other government sites block direct fetch() from cloud IPs → 403
- SDK WebFetch retries the same URL, gets 403 again → content lost
- Exa /contents already crawls these URLs successfully (proven by 27 hybrid clients)
Architecture
fetch_document(url, prompt)
→ Phase 1: Direct HTTP fetch (free, fast)
→ Phase 2: Exa /contents fallback (on 403/timeout/empty)
→ _hybrid_metadata: { source, fallback_reason, confidence }
Key distinction: fetch_document uses Exa /contents (extract text from known URL), NOT Exa /search (find URLs by query). A "direct hit" means raw fetch() with zero Exa involvement.
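The two-phase flow above can be sketched roughly as follows. This is a minimal illustration, not the code in DirectFetchHybridClient.js: the function name fetchDocument, the injected directFetch and exaContents helpers, the timeoutMs option, and the reason strings are all assumptions made for the example. The confidence field is left out because the description does not say how it is computed, and statuses other than 403 are passed through unchanged here.

```javascript
// Hypothetical sketch of the direct-fetch → Exa /contents fallback.
// directFetch and exaContents are injected stand-ins (assumptions), so the
// flow can be exercised without network access or an Exa API key.
async function fetchDocument(url, { directFetch = fetch, exaContents, timeoutMs = 10_000 }) {
  let fallbackReason;
  try {
    // Phase 1: direct HTTP fetch, bounded by a timeout.
    const res = await directFetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (res.status === 403) {
      fallbackReason = "403";
    } else {
      const text = await res.text();
      if (text.trim() !== "") {
        // Direct hit: zero Exa involvement.
        return { text, _hybrid_metadata: { source: "direct", fallback_reason: null } };
      }
      fallbackReason = "empty";
    }
  } catch (err) {
    // Only timeouts trigger the fallback in this sketch; other errors propagate.
    if (err.name !== "TimeoutError" && err.name !== "AbortError") throw err;
    fallbackReason = "timeout";
  }
  // Phase 2: extract text for the known URL via Exa /contents (not /search).
  const text = await exaContents(url);
  return { text, _hybrid_metadata: { source: "exa_contents", fallback_reason: fallbackReason } };
}
```

Injecting both fetchers keeps the fallback decision testable in isolation: a stub that returns status 403 should yield source "exa_contents" with fallback_reason "403".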
Scope
- 10 files modified, 1 new file, 2 new test files
- Feature-flagged: HYBRID_WEBFETCH=true (default off, zero behavior change when off)
- Rollback: single env var flip, zero code revert
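As an illustration of the flag gating and single-variable rollback described above, a sketch might look like this. The helper names isHybridWebFetchEnabled and resolveTools are hypothetical, as is using the bare string "fetch_document" as the tool identifier. The actual contents of featureFlags.js and the registered MCP tool name may differ.

```javascript
// Hypothetical sketch of HYBRID_WEBFETCH gating; not the real featureFlags.js.
function isHybridWebFetchEnabled(env = process.env) {
  // Default off: anything other than the literal "true" leaves behavior unchanged.
  return String(env.HYBRID_WEBFETCH ?? "").toLowerCase() === "true";
}

// Swap the built-in WebFetch for the custom tool only when the flag is on.
// "fetch_document" is a placeholder for however the MCP tool is registered.
function resolveTools(baseTools, env = process.env) {
  if (!isHybridWebFetchEnabled(env)) return [...baseTools];
  return baseTools.map((tool) => (tool === "WebFetch" ? "fetch_document" : tool));
}
```

With the flag unset or set to false, resolveTools returns the original list, which is what makes rollback an env var flip rather than a code revert.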
Implementation Roadmap
Phase A: Wire the New Tool (blocking prerequisite)
- DirectFetchHybridClient.js (direct fetch → Exa /contents fallback)
- HYBRID_WEBFETCH feature flag (featureFlags.js)
- fetch_document tool definition (toolDefinitions.js)
- … (toolImplementations.js)
- direct-fetch domain mapping (domainMcpServers.js, 20 research subagents)
- … (claude-sdk-server.js)
Phase B: Replace WebFetch (requires Phase A complete)
- … STANDARD_TOOLS (legalSubagents.js)
- … STANDARD_TOOLS (_standardTools.js)
- … ORCHESTRATOR_ALLOWED_TOOLS
Phase C: Observability
- _hybrid_metadata extraction (sdkHooks.js)
Phase D: Testing & Frontend
- … (app.js: green/blue/red tags)
- … HYBRID_WEBFETCH=true and full validation
Phase E: Rollback Verification
- HYBRID_WEBFETCH=false restores all original behavior
Related
🤖 Generated with Claude Code