WebFetch → Hybrid fetch_document Conversion (v1.2.0) #38

Description

@Number531

Summary

Replace the Agent SDK's built-in WebFetch tool with a custom MCP tool (fetch_document) that tries a direct HTTP fetch first, then falls back to Exa /contents on a 403, a timeout, or an empty response. This is intended to eliminate the entire class of SEC.gov 403 failures observed in trial runs.

Plan: docs/pending-updates/websearch-conversion.md (v1.2.0)

Problem

  • SEC.gov and other government sites block direct fetch() from cloud IPs → 403
  • SDK WebFetch retries the same URL, gets 403 again → content lost
  • Exa /contents already crawls these URLs successfully (proven by 27 hybrid clients)

Architecture

fetch_document(url, prompt)
  → Phase 1: Direct HTTP fetch (free, fast)
  → Phase 2: Exa /contents fallback (on 403/timeout/empty)
  → _hybrid_metadata: { source, fallback_reason, confidence }

Key distinction: fetch_document uses Exa /contents (extract text from known URL), NOT Exa /search (find URLs by query). A "direct hit" means raw fetch() with zero Exa involvement.
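
The two-phase flow above could be sketched roughly as follows. This is a minimal illustration, not the planned DirectFetchHybridClient.js: the function names, the injected `direct` and `exaContents` callbacks, the `fallback_reason` strings, and the `confidence` values are all assumptions, and the `prompt` argument is left out for brevity. The Exa call is injected rather than written out, so no particular Exa request shape is implied.

```javascript
// Sketch only: names, reason strings, and confidence values are assumptions.

// Phase 1 helper: plain HTTP fetch with a timeout (global fetch, Node 18+).
async function directFetch(url, timeoutMs = 10000) {
  const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  return { status: res.status, text: await res.text() };
}

// Returns a reason string when the direct result should trigger the
// fallback (403, timeout, empty body), or null when it is usable as-is.
function fallbackReason(result) {
  if (result.error) return result.timedOut ? 'timeout' : 'network_error';
  if (result.status === 403) return 'http_403';
  if (!result.text || result.text.trim() === '') return 'empty';
  return null;
}

// direct(url)      -> Promise<{ status, text }>
// exaContents(url) -> Promise<string>  (hypothetical wrapper around Exa /contents)
async function fetchDocument(url, { direct = directFetch, exaContents }) {
  let result;
  try {
    result = await direct(url);
  } catch (err) {
    const timedOut = err.name === 'TimeoutError' || err.name === 'AbortError';
    result = { error: err, timedOut };
  }

  const reason = fallbackReason(result);
  if (reason === null) {
    // Direct hit: zero Exa involvement.
    return {
      text: result.text,
      _hybrid_metadata: { source: 'direct', fallback_reason: null, confidence: 'high' },
    };
  }

  // Phase 2: extract text for the same known URL via Exa /contents.
  const text = await exaContents(url);
  return {
    text,
    _hybrid_metadata: { source: 'exa_contents', fallback_reason: reason, confidence: 'medium' },
  };
}
```

Injecting both fetchers keeps the fallback decision testable without network access; a real client would also need to decide what to report when both phases fail.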

Scope

  • 10 files modified, 1 new file, 2 new test files
  • Feature-flagged: HYBRID_WEBFETCH=true (default off, zero behavior change when off)
  • Rollback: single env var flip, zero code revert
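
One way the default-off flag might be read is sketched below. The helper name is hypothetical and the real featureFlags.js may be organized differently; the only details taken from this issue are the variable name HYBRID_WEBFETCH and the default of off.

```javascript
// Hypothetical helper; only the env var name and the default come from the plan.
function isHybridWebFetchEnabled(env = process.env) {
  // Only the exact string "true" turns the flag on. An unset variable or
  // any other value leaves it off, so existing behavior is unchanged.
  return env.HYBRID_WEBFETCH === 'true';
}
```

Treating everything except `true` as off means a typo in the variable's value fails safe, which fits the single-flip rollback described above.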

Implementation Roadmap

Phase A: Wire the New Tool (blocking prerequisite)

  • A1. Create DirectFetchHybridClient.js (direct fetch → Exa /contents fallback)
  • A2. Add HYBRID_WEBFETCH feature flag (featureFlags.js)
  • A3. Add fetch_document tool definition (toolDefinitions.js)
  • A4. Register handler (toolImplementations.js)
  • A5. Add direct-fetch domain mapping (domainMcpServers.js — 20 research subagents)
  • A6. Instantiate client in server (claude-sdk-server.js)
  • A7. Smoke test — verify tool accessible (147 tools)
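
Steps A3 and A4 might look something like the sketch below. The tool name and its two parameters (url, prompt) come from this issue; the description text, the choice to mark both parameters as required, and the `registerTool` / `callTool` helpers are illustrative assumptions and do not reflect the actual contents of toolDefinitions.js or toolImplementations.js.

```javascript
// A3 (sketch): tool definition with a JSON Schema for its input.
const fetchDocumentTool = {
  name: 'fetch_document',
  description: 'Fetch a known URL directly, falling back to Exa /contents.',
  inputSchema: {
    type: 'object',
    properties: {
      url: { type: 'string', description: 'URL of the document to fetch' },
      prompt: { type: 'string', description: 'What to extract from the document' },
    },
    required: ['url', 'prompt'], // assumption: both arguments are mandatory
  },
};

// A4 (sketch): a tiny in-memory registry mapping tool names to handlers.
const toolHandlers = new Map();

function registerTool(definition, handler) {
  toolHandlers.set(definition.name, { definition, handler });
}

async function callTool(name, args) {
  const entry = toolHandlers.get(name);
  if (!entry) throw new Error(`Unknown tool: ${name}`);
  // Minimal validation: every required argument must be a non-empty string.
  for (const key of entry.definition.inputSchema.required) {
    if (typeof args[key] !== 'string' || args[key] === '') {
      throw new Error(`Missing required argument: ${key}`);
    }
  }
  return entry.handler(args);
}
```

A smoke test along the lines of A7 would then register the tool with a stub handler and confirm that `callTool('fetch_document', ...)` reaches it.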

Phase B: Replace WebFetch (requires Phase A complete)

  • B1. Update legacy STANDARD_TOOLS (legalSubagents.js)
  • B2. Update modular STANDARD_TOOLS (_standardTools.js)
  • B3. Update ORCHESTRATOR_ALLOWED_TOOLS
  • B4. Update MCP fallback instructions
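
The tool-list changes in B1 through B3 amount to swapping one entry when the flag is on. A minimal sketch, assuming the lists are plain arrays of tool names; the example list contents and the helper name are hypothetical, and the registered name of the MCP tool may carry a server prefix in practice.

```javascript
// Illustrative list only; the real STANDARD_TOOLS contents are not shown in this issue.
const EXAMPLE_STANDARD_TOOLS = ['WebSearch', 'WebFetch', 'Read'];

// Returns a new list with WebFetch replaced by fetch_document when the
// hybrid flag is enabled, and an unchanged copy otherwise.
function withHybridFetch(tools, hybridEnabled) {
  if (!hybridEnabled) return [...tools];
  return tools.map((tool) => (tool === 'WebFetch' ? 'fetch_document' : tool));
}
```

Returning a copy in both branches keeps the original list untouched, so the flag-off path behaves exactly as before.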

Phase C: Observability

  • C1. PostToolUse _hybrid_metadata extraction (sdkHooks.js)
  • C2. PostToolUseFailure dual-failure extraction
  • C3. Verify hookSSEBridge auto-forwards new fields
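
For C1, the extraction step could be a small pure function that a PostToolUse hook calls on the tool response. The sketch below assumes the response arrives either as an object or as a JSON string carrying a top-level `_hybrid_metadata` field; the function name and that response shape are assumptions, not the actual sdkHooks.js code.

```javascript
// Pulls { source, fallback_reason, confidence } out of a tool response,
// or returns null when the response carries no hybrid metadata.
function extractHybridMetadata(toolResponse) {
  let payload = toolResponse;
  if (typeof payload === 'string') {
    try {
      payload = JSON.parse(payload);
    } catch {
      return null; // not JSON, so nothing to extract
    }
  }
  if (payload === null || typeof payload !== 'object') return null;

  const meta = payload._hybrid_metadata;
  if (meta === null || typeof meta !== 'object') return null;

  // Missing fields are normalized to null so downstream consumers see a
  // stable shape.
  const { source = null, fallback_reason = null, confidence = null } = meta;
  return { source, fallback_reason, confidence };
}
```

Responses from other tools simply yield null, so the hook can run on every tool call without special-casing.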

Phase D: Testing & Frontend

  • D1. Unit tests (~29 assertions)
  • D1b. Hook observability tests
  • D2. Domain MCP server tests (28 → 29)
  • D3. Regression (existing suites unchanged)
  • D4. Live integration test (SEC.gov, Wikipedia)
  • D5. Frontend timeline rendering (app.js — green/blue/red tags)
  • D6. Enable HYBRID_WEBFETCH=true and full validation
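
The green/blue/red tags in D5 could be driven by a mapping like the one below. This issue does not say which color means what, so the assignment here (green for a direct hit, blue for an Exa /contents fallback, red for a failed fetch) is an assumption, as are the function name and the `source` values.

```javascript
// Assumed mapping: green = direct, blue = Exa fallback, red = failure.
// Returns null when an event has no hybrid metadata and should get no tag.
function timelineTagColor(meta, failed = false) {
  if (failed) return 'red';
  if (!meta || typeof meta.source !== 'string') return null;
  if (meta.source === 'direct') return 'green';
  if (meta.source === 'exa_contents') return 'blue';
  return null;
}
```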

Phase E: Rollback Verification

  • E1. Confirm HYBRID_WEBFETCH=false restores all original behavior

🤖 Generated with Claude Code

Metadata

Labels

  • enhancement: New feature or request
  • infrastructure: Backend/infrastructure changes
  • roadmap: Planned feature on the project roadmap
