fastCRW — Open Source Web Scraping API for AI Agents

The web scraper built for AI agents. Single binary. Zero config.

Works with: Claude Code · Cursor · Windsurf · Cline · Copilot · Continue.dev · Codex · Gemini CLI

Quick Start • AI Agents • Benchmarks • API Reference • Cloud • Discord

English | 中文

What's New

0.9.1 (2026-05-16)

Bug Fixes

release: sync crw-cli internal dep versions with workspace (26c528e)

Full changelog →

fastCRW — Open Source Web Scraping API for AI Agents

Power AI agents with clean web data. Single Rust binary, zero config, Firecrawl-compatible API. The open-source Firecrawl alternative you can self-host for free — or use our managed cloud.

Don't want to self-host? Sign up free → — managed cloud with global proxy network, web search, and dashboard. Same API, zero infra. 500 free credits, no credit card required.

Why CRW? — Firecrawl & Crawl4AI Alternative

Single binary, 6 MB RAM — no Redis, no Node.js, no containers. Firecrawl needs 5 containers and 4 GB+. Crawl4AI requires Python + Playwright
5.5x faster than Firecrawl — 833ms avg vs 4,600ms (see benchmarks). P50 at 446ms
73/100 search win rate — beats Firecrawl (25/100) and Tavily (2/100) in head-to-head benchmarks
Free self-hosting — $0/1K scrapes vs Firecrawl's $0.83–5.33. No infra, no cold starts (85ms). No API key required for local mode
Agent ready — add to any MCP client in one command. Embedded mode: no server needed
Firecrawl-compatible API — drop-in replacement. Same /v1/scrape, /v1/crawl, /v1/map endpoints. HTML to markdown, structured data extraction, website crawler — all built-in
Built for RAG pipelines — clean LLM-ready markdown output for vector databases and AI data ingestion
Open source — AGPL-3.0, developed transparently. Join our community

Metric	CRW (self-hosted)	fastcrw.com (cloud)	Firecrawl	Tavily	Crawl4AI
Coverage (1K URLs)	92.0%	92.0%	77.2%	—	—
Avg Scrape Latency	833ms	833ms	4,600ms	—	—
Avg Search Latency	880ms	880ms	954ms	2,000ms	—
Search Win Rate	73/100	73/100	25/100	2/100	—
Idle RAM	6.6 MB	0 (managed)	~500 MB+	— (cloud)	—
Cold start	85 ms	0 (always-on)	30–60 s	—	—
Self-hosting	Single binary	—	Multi-container	No	Python + Playwright
Cost / 1K scrapes	$0 (self-hosted)	From $13/mo	$0.83–5.33	—	$0
License	AGPL-3.0	Managed	AGPL-3.0	Proprietary	Apache-2.0

Web Scraping & Crawling Features

Core

Feature	Description
Scrape	Convert any URL to markdown, HTML, JSON, or links
Crawl	Async BFS website crawler with rate limiting
Map	Discover all URLs on a site instantly
Search	Web search + content scraping — bundled SearXNG sidecar, free Tavily alternative

More

Feature	Description
LLM Extraction	Send a JSON schema, get validated structured data back
LLM Summary	`formats: ["summary"]` on `/v1/scrape` — clean prose digest of any page. BYOK (Anthropic / OpenAI / Azure / DeepSeek / any OpenAI-compatible).
LLM Search Answer	`answer: true` / `summarizeResults: true` on `/v1/search` — synthesized answer with citations or per-result summaries
JS Rendering	Auto-detect SPAs, render via LightPanda or Chrome
CLI	Scrape any URL from your terminal — no server needed
MCP Server	Built-in stdio + HTTP transport for any AI agent

Use Cases: RAG pipelines · AI agent web access · content monitoring · data extraction · HTML to markdown conversion · web archiving

🚀 Quick Start

# Install:
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw sh

# Interactive setup wizard (recommended):
crw setup

# Scrape:
crw example.com

# Add to Claude Code (local):
claude mcp add crw -- npx crw-mcp
# Add to Claude Code (cloud — includes web search, 500 free credits at fastcrw.com):
claude mcp add -e CRW_API_URL=https://fastcrw.com/api -e CRW_API_KEY=your-key crw -- npx crw-mcp

Or: pip install crw (Python SDK) · npx crw-mcp (zero install) · brew install us/crw/crw (Homebrew) · All install options →

Scrape

Convert any URL to clean markdown, HTML, or structured JSON.

from crw import CrwClient

client = CrwClient(api_url="https://fastcrw.com/api", api_key="YOUR_API_KEY")  # local: CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])

Local mode: CrwClient() with no arguments runs a self-contained scraping engine — no server, no API key, no setup. The SDK automatically downloads the crw-mcp binary on first use.

CLI / cURL

CLI:

crw example.com
crw example.com --format html
crw example.com --js --css 'article'

Self-hosted (crw-server running on :3000):

curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Cloud:

curl -X POST https://fastcrw.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

With LLM summary (BYOK — bring your own provider key):

curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://en.wikipedia.org/wiki/Tokio_(software)",
    "formats": ["markdown", "summary"],
    "summaryPrompt": "Answer in two sentences.",
    "maxContentChars": 50000,
    "llmProvider": "openai-compatible",
    "llmModel": "deepseek-chat",
    "baseUrl": "https://api.deepseek.com/v1",
    "llmApiKey": "YOUR_PROVIDER_KEY"
  }'

Output:

# Example Domain

This domain is for use in illustrative examples in documents.
You may use this domain in literature without prior coordination.

Renderer selection & response metadata

CRW picks between three rendering backends per request:

http (1 credit) — plain HTTP fetch. Used for static pages.
lightpanda (1 credit) — lightweight JS renderer for most SPAs.
chrome (2 credits) — full Chromium for sites where LightPanda's hydration crashes (e.g. some Next.js App Router pages).

By default the engine auto-selects, learns per-host preferences after repeated failures, and falls over chrome → lightpanda → http transparently. Pass "renderer" to pin one of auto | http | lightpanda | chrome (Firecrawl's engine is also accepted as an alias).

Every successful response includes routing metadata so callers can audit and debug:

{
  "data": {
    "markdown": "...",
    "renderDecision": {
      "kind": "failover",                 // autoDefault | autoPromoted | userPinned | failover | breakerSkipped
      "chain": ["lightpanda", "chrome"],  // renderers actually attempted
      "reason": "nextJsClientError"       // why the chain advanced
    },
    "creditCost": 2,
    "warnings": [
      "lightpanda returned a failed render (nextjs_client_error)"
    ],
    "metadata": { "renderedWith": "chrome", /* … */ }
  }
}

When you hard-pin a renderer that fails (e.g. "renderer":"lightpanda" on a hydration-crashing page), success stays true for protocol compatibility — but data.warnings[] carries an actionable hint suggesting renderer="chrome" or auto mode. Clients should surface the warnings array.

Crawl

Scrape all pages of a website asynchronously.

from crw import CrwClient

client = CrwClient(api_url="https://fastcrw.com/api", api_key="YOUR_API_KEY")  # local: CrwClient()
pages = client.crawl("https://docs.example.com", max_depth=2, max_pages=50)
for page in pages:
    print(page["metadata"]["sourceURL"], page["markdown"][:80])

CLI / cURL

# Start crawl
curl -X POST http://localhost:3000/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.example.com", "maxDepth": 2, "maxPages": 50}'

# Check status (use job ID from above)
curl http://localhost:3000/v1/crawl/JOB_ID

Map

Discover all URLs on a site instantly.

from crw import CrwClient

client = CrwClient(api_url="https://fastcrw.com/api", api_key="YOUR_API_KEY")  # local: CrwClient()
urls = client.map("https://example.com")
print(urls)  # ["https://example.com", "https://example.com/about", ...]

cURL

curl -X POST http://localhost:3000/v1/map \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

URL Filtering (default-on)

/v1/map filters its output by default to keep content URLs and drop transactional action URLs (e.g. WooCommerce ?add-to-cart=…, ?add_to_wishlist=…&_wpnonce=…). Tracking parameters (utm_*, gclid, fbclid, …) are stripped from the URLs that remain. Counts come back in droppedActionCount and strippedTrackingCount.

Per-request fields (all optional):

Field	Effect
`ignoreQueryParameters: true`	Coarse Firecrawl-compatible: strip ALL params from kept URLs (action drop still runs).
`ignoreQueryParameters: false`	Disable both filters — raw URLs.
`stripTrackingParams: bool`	Toggle Tier B only.
`dropActionUrls: bool`	Toggle Tier A only.
`extraTrackingParams: [string]`	Additive (≤64 keys).
`extraActionParams: [string]`	Additive (≤64 keys).
`preserveParams: [string]`	Never strip / never drop these keys (≤64 keys).

Precedence: per-request value > TOML ([map.url_filter]) > compile-time default (both true). To restore pre-filter behaviour globally, set strip_tracking_params = false and drop_action_urls = false in config.default.toml.

Search

Search the web and get full page content from results. Self-hosted CRW ships with a SearXNG sidecar (started automatically by docker compose up), so search works out of the box — no Tavily/Serper/Brave API key needed, $0/month. Tavily-style endpoints with a 30-line migration adapter — see the Tavily compatibility matrix for the field-by-field diff.

from crw import CrwClient

# Self-hosted (default Docker compose stack)
client = CrwClient(api_url="http://localhost:3000")
results = client.search("open source web scraper 2026", limit=10)

# Or cloud
# client = CrwClient(api_url="https://fastcrw.com/api", api_key="YOUR_KEY")

cURL

# Self-hosted
curl -X POST http://localhost:3000/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "open source web scraper 2026", "limit": 10}'

# With grouped sources + scrape enrichment
curl -X POST http://localhost:3000/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "rust async runtime",
    "sources": ["web", "news"],
    "scrapeOptions": {"formats": ["markdown"]}
  }'

# With LLM answer synthesis over the top results (BYOK)
curl -X POST http://localhost:3000/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what is tokio rust",
    "limit": 3,
    "answer": true,
    "answerTopN": 3,
    "answerPrompt": "Respond in Turkish in two sentences.",
    "scrapeOptions": {"formats": ["markdown"]},
    "llmApiKey": "sk-...",
    "llmProvider": "openai",
    "llmModel": "gpt-4o-mini"
  }'

The sidecar uses the upstream searxng/searxng image with config mounted read-only from config/searxng/settings.yml. To point at an existing SearXNG instance instead, set CRW_SEARCH__SEARXNG_URL=http://your-host:8080 and remove the searxng service from the compose file. To disable search entirely, set [search].enabled = false in your config — the route will return a clear search_disabled error.

API Endpoints

Method	Endpoint	Description
`POST`	`/v1/scrape`	Scrape a single URL, optionally with LLM extraction
`POST`	`/v1/crawl`	Start async BFS crawl (returns job ID)
`GET`	`/v1/crawl/:id`	Check crawl status and retrieve results
`DELETE`	`/v1/crawl/:id`	Cancel a running crawl job
`POST`	`/v1/map`	Discover all URLs on a site
`POST`	`/v1/search`	Web search via SearXNG sidecar, with optional content scraping
`GET`	`/health`	Health check (no auth required)
`POST`	`/mcp`	Streamable HTTP MCP transport

Full API reference →

🤖 Connect to AI Agents

Add CRW to any AI agent or MCP client in seconds.

Skill

Install the CRW skill to all detected agents with one command:

npx crw-mcp init --all

Restart your agent after installing. Works with Claude Code, Cursor, Gemini CLI, Codex, OpenCode, and Windsurf.

MCP Server for AI Agents

Add CRW to any MCP-compatible client:

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}

Works with Claude Desktop, Cursor, Windsurf, Cline, Continue.dev, and any MCP client.

Config file locations: Claude Code — claude mcp add (no file edit). Claude Desktop — ~/Library/Application Support/Claude/claude_desktop_config.json. Cursor — .cursor/mcp.json. Windsurf — ~/.codeium/windsurf/mcp_config.json. All clients →

Cloud mode — point at fastcrw.com for global proxy network and dashboard:

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"],
      "env": {
        "CRW_API_URL": "https://fastcrw.com/api",
        "CRW_API_KEY": "your-api-key"
      }
    }
  }
}

MCP Tools

Mode	Tool	Description
Embedded + Cloud	`crw_scrape`	Scrape a URL → markdown, HTML, JSON, links
Embedded + Cloud	`crw_crawl`	Start async BFS crawl (returns job ID)
Embedded + Cloud	`crw_check_crawl_status`	Poll crawl job status (use with `crw_crawl`)
Embedded + Cloud	`crw_map`	Discover all URLs on a site
Server + Cloud	`crw_search`	Web search + optional content scraping (server uses bundled SearXNG sidecar)

Embedded mode (default): no server, no API key, no setup — the MCP binary runs a self-contained scraping engine (search not available — needs the SearXNG sidecar from docker compose up). Server mode (CRW_API_URL=http://localhost:3000): full API including crw_search. Cloud mode (CRW_API_URL=https://fastcrw.com/api): adds global proxy network and dashboard.

Full MCP docs →

Claude Code shortcut:

# Local (embedded — no server, no API key):
claude mcp add crw -- npx crw-mcp

# Cloud (+ web search):
claude mcp add -e CRW_API_URL=https://fastcrw.com/api -e CRW_API_KEY=your-key crw -- npx crw-mcp

Listed on the MCP Registry

Agent Onboarding

Are you an AI agent? Fetch this skill file to configure CRW for the user's coding agents. The SKILL.md contains tool descriptions, usage examples, authentication setup, and edge case handling:

curl -s https://fastcrw.com/agent-onboarding/SKILL.md

📊 Benchmark

Search — CRW vs Firecrawl vs Tavily (100 queries, concurrent)

Metric	CRW	Firecrawl	Tavily
Avg Latency	880ms	954ms	2,000ms
Median Latency	785ms	932ms	1,724ms
Win Rate	73/100	25/100	2/100

CRW is 2.3x faster than Tavily and won 73% of latency races. Full search benchmark →

Scrape — CRW vs Firecrawl (1,000 URLs, JS rendering enabled)

Tested on Firecrawl's scrape-content-dataset-v1:

Metric	CRW	Firecrawl v2.5
Coverage	92.0%	77.2%
Avg Latency	833ms	4,600ms
P50 Latency	446ms	—
Noise Rejection	88.4%	noise 6.8%
Idle RAM	6.6 MB	~500 MB+
Cost / 1K scrapes	$0 (self-hosted)	$0.83–5.33

Resource comparison

Metric	CRW	Firecrawl
Min RAM	~7 MB	4 GB
Recommended RAM	~64 MB (under load)	8–16 GB
Docker images	single ~8 MB binary	~2–3 GB total
Cold start	85 ms	30–60 seconds
Containers needed	1 (+optional sidecar)	5

Full benchmark details →

Run the benchmark yourself:

pip install datasets aiohttp
python bench/run_bench.py

📦 Install

MCP Server (`crw-mcp`) — recommended for AI agents

npx crw-mcp                           # zero install (npm)
pip install crw                        # Python SDK (auto-downloads binary)
brew install us/crw/crw-mcp            # Homebrew
cargo install crw-mcp                  # Cargo
docker run -i ghcr.io/us/crw crw-mcp  # Docker

CLI (`crw`) — scrape URLs from your terminal

brew install us/crw/crw

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw sh

# APT (Debian/Ubuntu):
curl -fsSL https://apt.fastcrw.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/crw.gpg
echo "deb [signed-by=/usr/share/keyrings/crw.gpg] https://apt.fastcrw.com stable main" | sudo tee /etc/apt/sources.list.d/crw.list
sudo apt update && sudo apt install crw

cargo install crw-cli

API Server (`crw-server`) — Firecrawl-compatible REST API

For serving multiple apps, other languages (Node.js, Go, Java), or as a shared microservice.

brew install us/crw/crw-server

# One-line install:
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw-server sh

# Docker:
docker run -p 3000:3000 ghcr.io/us/crw

Custom port:

CRW_SERVER__PORT=8080 crw-server                                       # env var
docker run -p 8080:8080 -e CRW_SERVER__PORT=8080 ghcr.io/us/crw       # Docker

Docker Compose ships with lightpanda enabled by default; chrome is opt-in to keep small VPS deploys lean (~500MB image + 1GB resident):

# baseline — http + lightpanda
docker compose up -d

# add chrome failover (recommended for production)
docker compose --profile heavy up -d

# stealth tier — browserless/chromium with anti-fingerprint plugin
# (+2.5pt bench success on bot-defended sites; SSPL-3.0, see warning below)
echo "BROWSERLESS_TOKEN=$(openssl rand -hex 24)" >> .env
docker compose -f docker-compose.yml -f docker-compose.stealth.yml \
  --profile stealth up -d

Without --profile heavy or --profile stealth, the engine still serves all endpoints — chrome-required URLs will exhaust their lightpanda failover and surface data.warnings[] instead of routing to chrome.

⚠️ Stealth profile licensing — compliance risk to review. --profile stealth pulls ghcr.io/browserless/chromium, which is SSPL-3.0. SSPL §13 obliges anyone who makes the functionality of the Program available to third parties as a service (commercial or otherwise) to release the Service Source Code — the full management/automation/hosting stack around it. CRW (AGPL-3) connects over a network socket only, so the opencore CRW source is most likely outside §13's reach — but the boundary is fact-specific and we are not lawyers. Get legal review before exposing this stack to third parties. The default --profile heavy (chromedp/headless-shell, Apache-2/BSD) carries none of this risk.

When do you need crw-server? Only if you want a REST API endpoint. The Python SDK (CrwClient()) and MCP binary (crw-mcp) both run a self-contained engine — no server required.

SDKs

Python

pip install crw

from crw import CrwClient

# Cloud (fastcrw.com — includes web search):
client = CrwClient(api_url="https://fastcrw.com/api", api_key="YOUR_API_KEY")
# Local (embedded, no server needed):
# client = CrwClient()

# Scrape
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])

# Crawl (blocks until complete)
pages = client.crawl("https://docs.example.com", max_depth=2, max_pages=50)

# Map
urls = client.map("https://example.com")

# Search (self-hosted via bundled SearXNG, or cloud)
results = client.search("AI news", limit=10, sources=["web", "news"])

Requires: Python 3.9+. Local mode auto-downloads the crw-mcp binary on first use — no manual setup.

Community SDKs

crewai-crw — CRW scraping tools for CrewAI agents
langchain-crw — CRW document loader for LangChain

Node.js: No official SDK yet — use the REST API directly or npx crw-mcp for MCP. SDK examples →

Integrations

Frameworks: CrewAI · LangChain · Agno · Dify

Platforms: n8n · Flowise

Missing your favorite tool? Open an issue → · All integrations →

LLM Structured Extraction

Send a JSON schema, get validated structured data back using LLM function calling. Full extraction docs →

curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "formats": ["json"],
    "jsonSchema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" }
      },
      "required": ["name", "price"]
    }
  }'

Configure the LLM provider:

[extraction.llm]
provider = "anthropic"        # "anthropic" or "openai"
api_key = "sk-..."            # or CRW_EXTRACTION__LLM__API_KEY env var
model = "claude-sonnet-4-20250514"

JS Rendering

CRW auto-detects SPAs and renders them via a headless browser. Full JS rendering docs →

crw setup          # interactive wizard (recommended)
# or
crw-server setup   # downloads LightPanda, creates config.local.toml

Renderer	Protocol	Best for
LightPanda	CDP over WebSocket	Low-resource environments (default); simple sites
Chrome (chromedp/headless-shell)	CDP over WebSocket	Modern React/Vite/Next SPAs; recommended for production
Chrome (browserless/chromium, opt-in `stealth` profile)	CDP over WebSocket	Bot-defended sites (Cloudflare Turnstile, DataDome) — SSPL-3.0, see compose notes
Playwright	CDP over WebSocket	Full browser compatibility

Renderer choice matters for SPAs. LightPanda is fast and cheap but its JS runtime does not fully cover every modern bundle format. For React / Vite / Next sites whose content appears only after hydration, configure Chrome (or Playwright) alongside LightPanda — CRW will fall back to Chrome automatically when LightPanda returns a loading placeholder. Leaving LightPanda as the only renderer may silently return "Loading..."-style shell content for these sites.

With Docker Compose, LightPanda runs as a sidecar automatically:

docker compose up

CLI

Scrape any URL from your terminal — no server, no config. Full CLI docs →

A short tour, from "URL → clean markdown" to "search the web, scrape the top hit, and let an LLM summarize it" — the exact loop an AI agent runs.

1. Any page → clean, LLM-ready markdown

crw en.wikipedia.org/wiki/Web_scraping

Boilerplate stripped, just the article as markdown on stdout.

2. Structured JSON for pipelines

crw en.wikipedia.org/wiki/Web_scraping --format json | jq '.metadata'

Title, description, status code, the renderer that was used — everything a pipeline needs.

3. Search the web

crw search "rust async runtime tokio"

Title + URL per result. Add --format json to pipe it somewhere.

4. Scrape + AI summary

Steps 4–6 use an LLM. Run crw setup once to store a provider key, or pass --llm-provider/--llm-key/--llm-model per call.

crw en.wikipedia.org/wiki/Web_scraping --summary --prompt "in 3 bullet points"

--summary prints the summary text only — no markdown wrapper, ready to pipe.

5. The agent loop: search → scrape → summarize

crw "$(crw search 'what is retrieval augmented generation' --format json | jq -r '.[0].url')" \
  --summary --prompt "Explain it to a backend engineer in 4 sentences."

One line: search the web, take the top result, fetch it, and hand the LLM a clean copy.

6. Structured extraction with a JSON Schema

crw news.ycombinator.com --extract '{
  "type": "object",
  "properties": {
    "stories": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": { "title": {"type": "string"}, "points": {"type": "integer"} }
      }
    }
  }
}'

Typed data straight off the page — no brittle selectors, ready to feed an LLM pipeline. Pass a file with --extract @schema.json.

More flags:

crw example.com --format html          # cleaned HTML
crw example.com --format links         # extract all links
crw example.com --js                   # JS rendering for SPAs
crw example.com --js --css 'article'   # JS render + CSS selector
crw example.com --stealth              # stealth mode (rotate UAs)
crw example.com -o page.md             # write to file

🏠 Self-Hosting

Once installed, start the server and optionally enable JS rendering:

crw-server                    # start REST API on :3000
crw-server setup              # optional: downloads LightPanda for JS rendering
docker compose up             # alternative: Docker with LightPanda sidecar

Setup Wizard (CLI)

The CLI includes an interactive setup wizard for easy configuration:

crw setup

The wizard guides you through:

Cloud vs Local mode selection
Browser engine setup (LightPanda or Chrome for JS rendering)
Search engine setup (SearXNG via Docker)
LLM provider configuration (BYOK for AI features)
Shell configuration (auto-adds to .zshrc/.bashrc)

Options:

crw setup --cloud — skip to cloud setup
crw setup --local — skip to local setup
crw setup --no-color — disable colored output (accessibility)
Press ESC to cancel gracefully at any prompt

See the self-hosting guide for production hardening, auth, reverse proxy, and resource tuning.

Open Source vs Cloud

	Self-hosted (free)	fastcrw.com Cloud
Core scraping	✅	✅
JS rendering	✅ (LightPanda/Chrome)	✅
Web search	✅ (bundled SearXNG sidecar)	✅ (managed)
Global proxy network	❌	✅
Dashboard	❌	✅
Commercial use without open-sourcing	Requires AGPL compliance	✅ Included
Cost	$0	From $13/mo

Sign up free → — 500 free credits, no credit card required.

Architecture

┌─────────────────────────────────────────────┐
│                 crw-server                  │
│         Axum HTTP API + Auth + MCP          │
├──────────┬──────────┬───────────────────────┤
│ crw-crawl│crw-extract│    crw-renderer      │
│ BFS crawl│ HTML→MD   │  HTTP + CDP(WS)      │
│ robots   │ LLM/JSON  │  LightPanda/Chrome   │
│ sitemap  │ clean/read│  auto-detect SPA     │
├──────────┴──────────┴───────────────────────┤
│                 crw-core                    │
│        Types, Config, Errors                │
└─────────────────────────────────────────────┘

Crate	Description
`crw-core`	Core types, config, and error handling
`crw-renderer`	HTTP + CDP browser rendering engine
`crw-extract`	HTML → markdown/plaintext extraction
`crw-crawl`	Async BFS crawler with robots.txt & sitemap
`crw-server`	Axum API server (Firecrawl-compatible)
`crw-mcp`	MCP stdio server (embedded + proxy mode)
`crw-cli`	Standalone CLI (`crw` binary, no server)

Full architecture docs →

Configuration

Layered TOML config with environment variable overrides:

config.default.toml — built-in defaults
config.local.toml — local overrides (or CRW_CONFIG=myconfig)
Environment variables — CRW_ prefix, __ separator (e.g. CRW_SERVER__PORT=8080)

[server]
host = "0.0.0.0"
port = 3000
rate_limit_rps = 10

[renderer]
mode = "auto"  # auto | lightpanda | playwright | chrome | none

[crawler]
max_concurrency = 10
requests_per_second = 10.0
respect_robots_txt = true

[auth]
# api_keys = ["fc-key-1234"]

See full configuration reference.

Security

SSRF protection — blocks loopback, private IPs, cloud metadata (169.254.x.x), IPv6 mapped addresses, and non-HTTP schemes (file://, data:)
Auth — optional Bearer token with constant-time comparison
robots.txt — RFC 9309 compliant with wildcard patterns
Rate limiting — token-bucket algorithm, returns 429 with error_code
Resource limits — max body 1 MB, max crawl depth 10, max pages 1000

Full security docs →

Resources

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Fork the repository
Install pre-commit hooks: make hooks
Create your feature branch (git checkout -b feat/my-feature)
Commit your changes (git commit -m 'feat: add my feature')
Push to the branch (git push origin feat/my-feature)
Open a Pull Request

The pre-commit hook runs the same checks as CI (cargo fmt, cargo clippy, cargo test). Run manually with make check.

Contributors

License

CRW is open-source under AGPL-3.0. For a managed version without AGPL obligations, see fastcrw.com.

Get Started

Self-host free: curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh — works in 30 seconds
Cloud: Sign up free → — 500 free credits, no credit card required
Questions? Join our Discord

It is the sole responsibility of end users to respect websites' policies when scraping. Users are advised to adhere to applicable privacy policies and terms of use. By default, CRW respects robots.txt directives.

↑ Back to Top ↑

Name		Name	Last commit message	Last commit date
Latest commit History 307 Commits
.githooks		.githooks
.github		.github
bench		bench
config/searxng		config/searxng
crates		crates
docs		docs
npm		npm
python		python
scripts		scripts
tests/fixtures		tests/fixtures
.DS_Store		.DS_Store
.dockerignore		.dockerignore
.gitignore		.gitignore
.release-please-manifest.json		.release-please-manifest.json
CHANGELOG.md		CHANGELOG.md
COMPATIBILITY-firecrawl.md		COMPATIBILITY-firecrawl.md
COMPATIBILITY.md		COMPATIBILITY.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
README.zh-CN.md		README.zh-CN.md
RELEASE.md		RELEASE.md
ROADMAP.md		ROADMAP.md
config.default.toml		config.default.toml
config.docker.toml		config.docker.toml
config.stealth.toml		config.stealth.toml
docker-compose.stealth.yml		docker-compose.stealth.yml
docker-compose.yml		docker-compose.yml
install.sh		install.sh
package.json		package.json
release-please-config.json		release-please-config.json
server.json		server.json
test_api_pipeline.py		test_api_pipeline.py
version.txt		version.txt

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

What's New

0.9.1 (2026-05-16)

Bug Fixes

fastCRW — Open Source Web Scraping API for AI Agents

Why CRW? — Firecrawl & Crawl4AI Alternative

Web Scraping & Crawling Features

🚀 Quick Start

Scrape

Renderer selection & response metadata

Crawl

Map

URL Filtering (default-on)

Search

API Endpoints

🤖 Connect to AI Agents

Skill

MCP Server for AI Agents

MCP Tools

Agent Onboarding

📊 Benchmark

Search — CRW vs Firecrawl vs Tavily (100 queries, concurrent)

Scrape — CRW vs Firecrawl (1,000 URLs, JS rendering enabled)

📦 Install

MCP Server (crw-mcp) — recommended for AI agents

CLI (crw) — scrape URLs from your terminal

API Server (crw-server) — Firecrawl-compatible REST API

SDKs

Python

Community SDKs

Integrations

LLM Structured Extraction

JS Rendering

CLI

🏠 Self-Hosting

Setup Wizard (CLI)

Open Source vs Cloud

Architecture

Configuration

Security

Resources

Contributing

Contributors

License

Get Started

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 39

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

MCP Server (`crw-mcp`) — recommended for AI agents

CLI (`crw`) — scrape URLs from your terminal

API Server (`crw-server`) — Firecrawl-compatible REST API

Packages