CCR-Rust

CCR-Rust is a local LLM router for Claude Code, Codex, OpenCode, and other OpenAI-compatible clients.

The easiest way to think about it:

your client keeps talking to one local URL,
CCR-Rust decides which provider/model should handle the request,
and you keep the same workflow even when you need to switch away from Claude.

For many people, the main use case is simple: when your Claude plan runs out, keep using Claude Code with a GLM-5.1/MiniMax 2.7 instead of changing tools.

Features

Automatic failover — tiered provider cascade on 5xx/timeouts; 429s pass through to client
Multi-protocol — Anthropic and OpenAI APIs behind one endpoint
Cost routing — send traffic classes (default/think/background) to different models
Observability — Prometheus metrics, live TUI dashboard, token/latency tracking
MCP aggregation — optional tool server proxying
Compression — response and tool-output compression for long-running client sessions

~15 MB binary, <50 ms P99 routing overhead, designed to stay out of your way.

Getting Started

If you are opening this repository to make changes and you do not normally work in Rust repos, read AGENTS.md first. It explains the expected workflow, validation steps, and local conventions.

1. Build and install

cargo build --release
cargo install --path . --force

2. Create a config

mkdir -p ~/.claude-code-router
cp config.example.json ~/.claude-code-router/config.json

Edit the config to add your provider API keys. See Configuration guide for the full schema.

If your main goal is “keep working after Claude usage runs out”, start with the step-by-step guide:

Claude Code fallback how-to

3. Start the router

ccr-rust start
ccr-rust status       # verify it's running

4. Point your client at CCR

# Claude Code
export ANTHROPIC_BASE_URL=http://127.0.0.1:3456
claude

# Codex
export OPENAI_BASE_URL=http://127.0.0.1:3456/v1
codex

# OpenCode
export OPENAI_BASE_URL=http://127.0.0.1:3456/v1
opencode

Any OpenAI-compatible client works the same way — just set the base URL.

If Claude Code complains about a missing ANTHROPIC_API_KEY, keep that environment variable set locally as well. CCR-Rust still uses the upstream provider keys from its own config file.

What it is doing

If you have ever looked at CCR-Rust and thought, “what in the world is this thing doing?”, here is the short version:

Your client sends one request to CCR-Rust.

Claude Code talks to the Anthropic-style endpoint.
Codex and most SDKs talk to the OpenAI-style endpoint.

CCR-Rust picks a configured provider/model using your routing rules.
If the upstream provider uses a different API format, CCR-Rust translates the request.
It sends the request upstream, collects the response, and translates it back if needed.
Your client still sees the format it expects.

That means you can keep using Claude Code as your interface even when the actual model behind it changes.

API Surface

Endpoint	Method	Purpose
`/v1/messages`	POST	Anthropic messages API
`/v1/chat/completions`	POST	OpenAI chat completions API
`/v1/responses`	POST	Stream batch responses
`/v1/models`	GET	List configured models
`/health`	GET	Health check
`/metrics`	GET	Prometheus metrics

Configuration

CCR-Rust reads ~/.claude-code-router/config.json. Supports ${ENV_VAR} substitution.

{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com",
      "api_key": "${DEEPSEEK_API_KEY}",
      "models": ["deepseek-chat", "deepseek-reasoner"]
    },
    {
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1",
      "api_key": "${OPENROUTER_API_KEY}",
      "models": [
        "inclusionai/ling-2.6-flash:free",
        "minimax/minimax-m2.5:free"
      ],
      "transformer": {
        "use": ["anthropic", "openrouter"]
      }
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat",
    "tiers": [
      "deepseek,deepseek-chat",
      "openrouter,inclusionai/ling-2.6-flash:free"
    ]
  },
  "PORT": 3456,
  "HOST": "127.0.0.1"
}

For full schema and provider setup, see Configuration guide. For common presets (Claude-only, multi-tier, cost-optimized), see Presets.

A simple mental model for Claude Code users

If you only remember one thing, remember this:

Claude Code stays the interface.
CCR-Rust becomes the local switchboard.
Your actual model provider can change underneath without changing your day-to-day CLI flow.

That is why it is useful when Claude usage limits kick in: you keep your editor, prompts, and habits, and only swap the engine behind the curtain.

Rate Limiting

Rate limiting is handled with transparency for orchestrators and clients:

429 responses are passed through with a normalized error body (type: "rate_limit_error", code: "rate_limited") and an x-ccr-tier header identifying which provider was rate-limited. The rate limit is still tracked internally for future tier-skipping decisions.
5xx/timeout errors cascade to the next tier automatically. The client only sees an error if all tiers are exhausted.
Informational headers (X-RateLimit-Remaining: 0 on 200 responses) trigger proactive tier-skipping by default. Set "honor_ratelimit_headers": false per provider for those (like Z.AI) that send these as informational warnings without actual enforcement.

This design works well with external orchestrators and retry-aware clients that need accurate rate-limit signal for intelligent routing. Standalone users should implement client-side retry logic for 429s when they want automatic recovery from rate limits.

Observability

# Prometheus metrics
curl http://localhost:3456/metrics

# Live TUI dashboard
ccr-rust dashboard

# Remote dashboard target via environment
CCR_DASHBOARD_HOST=10.0.0.5 CCR_DASHBOARD_PORT=3456 ccr-rust dashboard

Tracks: token counts (in/out), latencies (p50/p90/p99), provider success rates, circuit-breaker states, cost per tier.

Documentation

See docs/index.md for the full documentation index:

Setup: CLI reference · Configuration · Presets · Deployment
Integrations: Claude Code fallback how-to · Codex · OpenAI SDK · Kimi · Gemini
Operations: Observability · Debug capture · Streaming · Token optimization
Troubleshooting: Common issues

License

AGPL-3.0-or-later. See LICENSE.

Network service clause: Modified versions of CCR-Rust offered as a network service must provide source code to users of that service.

Built for reliability. Made for scale. Join the discussions.

Name		Name	Last commit message	Last commit date
Latest commit History 112 Commits
benches		benches
benchmarks		benchmarks
deploy		deploy
docs		docs
k8s		k8s
monitoring		monitoring
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
.sloc-guard.toml		.sloc-guard.toml
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
build.rs		build.rs
config.example.json		config.example.json
config.gemini.json		config.gemini.json
deny.toml		deny.toml
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CCR-Rust

Features

Getting Started

1. Build and install

2. Create a config

3. Start the router

4. Point your client at CCR

What it is doing

API Surface

Configuration

A simple mental model for Claude Code users

Rate Limiting

Observability

Documentation

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CCR-Rust

Features

Getting Started

1. Build and install

2. Create a config

3. Start the router

4. Point your client at CCR

What it is doing

API Surface

Configuration

A simple mental model for Claude Code users

Rate Limiting

Observability

Documentation

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages