CCR-Rust is a local LLM router for Claude Code, Codex, OpenCode, and other OpenAI-compatible clients.
The easiest way to think about it:
- your client keeps talking to one local URL,
- CCR-Rust decides which provider/model should handle the request,
- and you keep the same workflow even when you need to switch away from Claude.
For many people, the main use case is simple: when your Claude plan runs out, keep using Claude Code with a GLM-5.1/MiniMax 2.7 instead of changing tools.
- Automatic failover — tiered provider cascade on 5xx/timeouts; 429s pass through to client
- Multi-protocol — Anthropic and OpenAI APIs behind one endpoint
- Cost routing — send traffic classes (default/think/background) to different models
- Observability — Prometheus metrics, live TUI dashboard, token/latency tracking
- MCP aggregation — optional tool server proxying
- Compression — response and tool-output compression for long-running client sessions
~15 MB binary, <50 ms P99 routing overhead, designed to stay out of your way.
If you are opening this repository to make changes and you do not normally work in Rust repos, read AGENTS.md first. It explains the expected workflow, validation steps, and local conventions.
cargo build --release
cargo install --path . --forcemkdir -p ~/.claude-code-router
cp config.example.json ~/.claude-code-router/config.jsonEdit the config to add your provider API keys. See Configuration guide for the full schema.
If your main goal is “keep working after Claude usage runs out”, start with the step-by-step guide:
ccr-rust start
ccr-rust status # verify it's running# Claude Code
export ANTHROPIC_BASE_URL=http://127.0.0.1:3456
claude
# Codex
export OPENAI_BASE_URL=http://127.0.0.1:3456/v1
codex
# OpenCode
export OPENAI_BASE_URL=http://127.0.0.1:3456/v1
opencodeAny OpenAI-compatible client works the same way — just set the base URL.
If Claude Code complains about a missing ANTHROPIC_API_KEY, keep that environment variable set locally as well. CCR-Rust still uses the upstream provider keys from its own config file.
If you have ever looked at CCR-Rust and thought, “what in the world is this thing doing?”, here is the short version:
- Your client sends one request to CCR-Rust.
- Claude Code talks to the Anthropic-style endpoint.
- Codex and most SDKs talk to the OpenAI-style endpoint.
- CCR-Rust picks a configured provider/model using your routing rules.
- If the upstream provider uses a different API format, CCR-Rust translates the request.
- It sends the request upstream, collects the response, and translates it back if needed.
- Your client still sees the format it expects.
That means you can keep using Claude Code as your interface even when the actual model behind it changes.
| Endpoint | Method | Purpose |
|---|---|---|
/v1/messages |
POST | Anthropic messages API |
/v1/chat/completions |
POST | OpenAI chat completions API |
/v1/responses |
POST | Stream batch responses |
/v1/models |
GET | List configured models |
/health |
GET | Health check |
/metrics |
GET | Prometheus metrics |
CCR-Rust reads ~/.claude-code-router/config.json. Supports ${ENV_VAR} substitution.
{
"Providers": [
{
"name": "deepseek",
"api_base_url": "https://api.deepseek.com",
"api_key": "${DEEPSEEK_API_KEY}",
"models": ["deepseek-chat", "deepseek-reasoner"]
},
{
"name": "openrouter",
"api_base_url": "https://openrouter.ai/api/v1",
"api_key": "${OPENROUTER_API_KEY}",
"models": [
"inclusionai/ling-2.6-flash:free",
"minimax/minimax-m2.5:free"
],
"transformer": {
"use": ["anthropic", "openrouter"]
}
}
],
"Router": {
"default": "deepseek,deepseek-chat",
"tiers": [
"deepseek,deepseek-chat",
"openrouter,inclusionai/ling-2.6-flash:free"
]
},
"PORT": 3456,
"HOST": "127.0.0.1"
}For full schema and provider setup, see Configuration guide. For common presets (Claude-only, multi-tier, cost-optimized), see Presets.
If you only remember one thing, remember this:
- Claude Code stays the interface.
- CCR-Rust becomes the local switchboard.
- Your actual model provider can change underneath without changing your day-to-day CLI flow.
That is why it is useful when Claude usage limits kick in: you keep your editor, prompts, and habits, and only swap the engine behind the curtain.
Rate limiting is handled with transparency for orchestrators and clients:
- 429 responses are passed through with a normalized error body (
type: "rate_limit_error",code: "rate_limited") and anx-ccr-tierheader identifying which provider was rate-limited. The rate limit is still tracked internally for future tier-skipping decisions. - 5xx/timeout errors cascade to the next tier automatically. The client only sees an error if all tiers are exhausted.
- Informational headers (
X-RateLimit-Remaining: 0on 200 responses) trigger proactive tier-skipping by default. Set"honor_ratelimit_headers": falseper provider for those (like Z.AI) that send these as informational warnings without actual enforcement.
This design works well with external orchestrators and retry-aware clients that need accurate rate-limit signal for intelligent routing. Standalone users should implement client-side retry logic for 429s when they want automatic recovery from rate limits.
# Prometheus metrics
curl http://localhost:3456/metrics
# Live TUI dashboard
ccr-rust dashboard
# Remote dashboard target via environment
CCR_DASHBOARD_HOST=10.0.0.5 CCR_DASHBOARD_PORT=3456 ccr-rust dashboardTracks: token counts (in/out), latencies (p50/p90/p99), provider success rates, circuit-breaker states, cost per tier.
See docs/index.md for the full documentation index:
- Setup: CLI reference · Configuration · Presets · Deployment
- Integrations: Claude Code fallback how-to · Codex · OpenAI SDK · Kimi · Gemini
- Operations: Observability · Debug capture · Streaming · Token optimization
- Troubleshooting: Common issues
AGPL-3.0-or-later. See LICENSE.
Network service clause: Modified versions of CCR-Rust offered as a network service must provide source code to users of that service.
Built for reliability. Made for scale. Join the discussions.