# User Guide ๐ŸŒ **Languages:** ๐Ÿ‡บ๐Ÿ‡ธ [English](USER_GUIDE.md) | ๐Ÿ‡ง๐Ÿ‡ท [Portuguรชs (Brasil)](i18n/pt-BR/USER_GUIDE.md) | ๐Ÿ‡ช๐Ÿ‡ธ [Espaรฑol](i18n/es/USER_GUIDE.md) | ๐Ÿ‡ซ๐Ÿ‡ท [Franรงais](i18n/fr/USER_GUIDE.md) | ๐Ÿ‡ฎ๐Ÿ‡น [Italiano](i18n/it/USER_GUIDE.md) | ๐Ÿ‡ท๐Ÿ‡บ [ะ ัƒััะบะธะน](i18n/ru/USER_GUIDE.md) | ๐Ÿ‡จ๐Ÿ‡ณ [ไธญๆ–‡ (็ฎ€ไฝ“)](i18n/zh-CN/USER_GUIDE.md) | ๐Ÿ‡ฉ๐Ÿ‡ช [Deutsch](i18n/de/USER_GUIDE.md) | ๐Ÿ‡ฎ๐Ÿ‡ณ [เคนเคฟเคจเฅเคฆเฅ€](i18n/in/USER_GUIDE.md) | ๐Ÿ‡น๐Ÿ‡ญ [เน„เธ—เธข](i18n/th/USER_GUIDE.md) | ๐Ÿ‡บ๐Ÿ‡ฆ [ะฃะบั€ะฐั—ะฝััŒะบะฐ](i18n/uk-UA/USER_GUIDE.md) | ๐Ÿ‡ธ๐Ÿ‡ฆ [ุงู„ุนุฑุจูŠุฉ](i18n/ar/USER_GUIDE.md) | ๐Ÿ‡ฏ๐Ÿ‡ต [ๆ—ฅๆœฌ่ชž](i18n/ja/USER_GUIDE.md) | ๐Ÿ‡ป๐Ÿ‡ณ [Tiแบฟng Viแป‡t](i18n/vi/USER_GUIDE.md) | ๐Ÿ‡ง๐Ÿ‡ฌ [ะ‘ัŠะปะณะฐั€ัะบะธ](i18n/bg/USER_GUIDE.md) | ๐Ÿ‡ฉ๐Ÿ‡ฐ [Dansk](i18n/da/USER_GUIDE.md) | ๐Ÿ‡ซ๐Ÿ‡ฎ [Suomi](i18n/fi/USER_GUIDE.md) | ๐Ÿ‡ฎ๐Ÿ‡ฑ [ืขื‘ืจื™ืช](i18n/he/USER_GUIDE.md) | ๐Ÿ‡ญ๐Ÿ‡บ [Magyar](i18n/hu/USER_GUIDE.md) | ๐Ÿ‡ฎ๐Ÿ‡ฉ [Bahasa Indonesia](i18n/id/USER_GUIDE.md) | ๐Ÿ‡ฐ๐Ÿ‡ท [ํ•œ๊ตญ์–ด](i18n/ko/USER_GUIDE.md) | ๐Ÿ‡ฒ๐Ÿ‡พ [Bahasa Melayu](i18n/ms/USER_GUIDE.md) | ๐Ÿ‡ณ๐Ÿ‡ฑ [Nederlands](i18n/nl/USER_GUIDE.md) | ๐Ÿ‡ณ๐Ÿ‡ด [Norsk](i18n/no/USER_GUIDE.md) | ๐Ÿ‡ต๐Ÿ‡น [Portuguรชs (Portugal)](i18n/pt/USER_GUIDE.md) | ๐Ÿ‡ท๐Ÿ‡ด [Romรขnฤƒ](i18n/ro/USER_GUIDE.md) | ๐Ÿ‡ต๐Ÿ‡ฑ [Polski](i18n/pl/USER_GUIDE.md) | ๐Ÿ‡ธ๐Ÿ‡ฐ [Slovenฤina](i18n/sk/USER_GUIDE.md) | ๐Ÿ‡ธ๐Ÿ‡ช [Svenska](i18n/sv/USER_GUIDE.md) | ๐Ÿ‡ต๐Ÿ‡ญ [Filipino](i18n/phi/USER_GUIDE.md) Complete guide for configuring providers, creating combos, integrating CLI tools, and deploying OmniRoute. --- ## Table of Contents - [Pricing at a Glance](#-pricing-at-a-glance) - [Use Cases](#-use-cases) - [Provider Setup](#-provider-setup) - [CLI Integration](#-cli-integration) - [Deployment](#-deployment) - [Available Models](#-available-models) - [Advanced Features](#-advanced-features) --- ## ๐Ÿ’ฐ Pricing at a Glance | Tier | Provider | Cost | Quota Reset | Best For | | ------------------- | ----------------- | ----------- | ---------------- | -------------------- | | **๐Ÿ’ณ SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed | | | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users | | | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! | | | GitHub Copilot | $10-19/mo | Monthly | GitHub users | | **๐Ÿ”‘ API KEY** | DeepSeek | Pay per use | None | Cheap reasoning | | | Groq | Pay per use | None | Ultra-fast inference | | | xAI (Grok) | Pay per use | None | Grok 4 reasoning | | | Mistral | Pay per use | None | EU-hosted models | | | Perplexity | Pay per use | None | Search-augmented | | | Together AI | Pay per use | None | Open-source models | | | Fireworks AI | Pay per use | None | Fast FLUX images | | | Cerebras | Pay per use | None | Wafer-scale speed | | | Cohere | Pay per use | None | Command R+ RAG | | | NVIDIA NIM | Pay per use | None | Enterprise models | | **๐Ÿ’ฐ CHEAP** | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup | | | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option | | | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost | | **๐Ÿ†“ FREE** | iFlow | $0 | Unlimited | 8 models free | | | Qwen | $0 | Unlimited | 3 models free | | | Kiro | $0 | Unlimited | Claude free | **๐Ÿ’ก Pro Tip:** Start with Gemini CLI (180K free/month) + iFlow (unlimited free) combo = $0 cost! --- ## ๐ŸŽฏ Use Cases ### Case 1: "I have Claude Pro subscription" **Problem:** Quota expires unused, rate limits during heavy coding ``` Combo: "maximize-claude" 1. cc/claude-opus-4-6 (use subscription fully) 2. glm/glm-4.7 (cheap backup when quota out) 3. if/kimi-k2-thinking (free emergency fallback) Monthly cost: $20 (subscription) + ~$5 (backup) = $25 total vs. $20 + hitting limits = frustration ``` ### Case 2: "I want zero cost" **Problem:** Can't afford subscriptions, need reliable AI coding ``` Combo: "free-forever" 1. gc/gemini-3-flash (180K free/month) 2. if/kimi-k2-thinking (unlimited free) 3. qw/qwen3-coder-plus (unlimited free) Monthly cost: $0 Quality: Production-ready models ``` ### Case 3: "I need 24/7 coding, no interruptions" **Problem:** Deadlines, can't afford downtime ``` Combo: "always-on" 1. cc/claude-opus-4-6 (best quality) 2. cx/gpt-5.2-codex (second subscription) 3. glm/glm-4.7 (cheap, resets daily) 4. minimax/MiniMax-M2.1 (cheapest, 5h reset) 5. if/kimi-k2-thinking (free unlimited) Result: 5 layers of fallback = zero downtime Monthly cost: $20-200 (subscriptions) + $10-20 (backup) ``` ### Case 4: "I want FREE AI in OpenClaw" **Problem:** Need AI assistant in messaging apps, completely free ``` Combo: "openclaw-free" 1. if/glm-4.7 (unlimited free) 2. if/minimax-m2.1 (unlimited free) 3. if/kimi-k2-thinking (unlimited free) Monthly cost: $0 Access via: WhatsApp, Telegram, Slack, Discord, iMessage, Signal... ``` --- ## ๐Ÿ“– Provider Setup ### ๐Ÿ” Subscription Providers #### Claude Code (Pro/Max) ```bash Dashboard โ†’ Providers โ†’ Connect Claude Code โ†’ OAuth login โ†’ Auto token refresh โ†’ 5-hour + weekly quota tracking Models: cc/claude-opus-4-6 cc/claude-sonnet-4-5-20250929 cc/claude-haiku-4-5-20251001 ``` **Pro Tip:** Use Opus for complex tasks, Sonnet for speed. OmniRoute tracks quota per model! #### OpenAI Codex (Plus/Pro) ```bash Dashboard โ†’ Providers โ†’ Connect Codex โ†’ OAuth login (port 1455) โ†’ 5-hour + weekly reset Models: cx/gpt-5.2-codex cx/gpt-5.1-codex-max ``` #### Gemini CLI (FREE 180K/month!) ```bash Dashboard โ†’ Providers โ†’ Connect Gemini CLI โ†’ Google OAuth โ†’ 180K completions/month + 1K/day Models: gc/gemini-3-flash-preview gc/gemini-2.5-pro ``` **Best Value:** Huge free tier! Use this before paid tiers. #### GitHub Copilot ```bash Dashboard โ†’ Providers โ†’ Connect GitHub โ†’ OAuth via GitHub โ†’ Monthly reset (1st of month) Models: gh/gpt-5 gh/claude-4.5-sonnet gh/gemini-3-pro ``` ### ๐Ÿ’ฐ Cheap Providers #### GLM-4.7 (Daily reset, $0.6/1M) 1. Sign up: [Zhipu AI](https://open.bigmodel.cn/) 2. Get API key from Coding Plan 3. Dashboard โ†’ Add API Key: Provider: `glm`, API Key: `your-key` **Use:** `glm/glm-4.7` โ€” **Pro Tip:** Coding Plan offers 3ร— quota at 1/7 cost! Reset daily 10:00 AM. #### MiniMax M2.1 (5h reset, $0.20/1M) 1. Sign up: [MiniMax](https://www.minimax.io/) 2. Get API key โ†’ Dashboard โ†’ Add API Key **Use:** `minimax/MiniMax-M2.1` โ€” **Pro Tip:** Cheapest option for long context (1M tokens)! #### Kimi K2 ($9/month flat) 1. Subscribe: [Moonshot AI](https://platform.moonshot.ai/) 2. Get API key โ†’ Dashboard โ†’ Add API Key **Use:** `kimi/kimi-latest` โ€” **Pro Tip:** Fixed $9/month for 10M tokens = $0.90/1M effective cost! ### ๐Ÿ†“ FREE Providers #### iFlow (8 FREE models) ```bash Dashboard โ†’ Connect iFlow โ†’ OAuth login โ†’ Unlimited usage Models: if/kimi-k2-thinking, if/qwen3-coder-plus, if/glm-4.7, if/minimax-m2, if/deepseek-r1 ``` #### Qwen (3 FREE models) ```bash Dashboard โ†’ Connect Qwen โ†’ Device code auth โ†’ Unlimited usage Models: qw/qwen3-coder-plus, qw/qwen3-coder-flash ``` #### Kiro (Claude FREE) ```bash Dashboard โ†’ Connect Kiro โ†’ AWS Builder ID or Google/GitHub โ†’ Unlimited Models: kr/claude-sonnet-4.5, kr/claude-haiku-4.5 ``` --- ## ๐ŸŽจ Combos ### Example 1: Maximize Subscription โ†’ Cheap Backup ``` Dashboard โ†’ Combos โ†’ Create New Name: premium-coding Models: 1. cc/claude-opus-4-6 (Subscription primary) 2. glm/glm-4.7 (Cheap backup, $0.6/1M) 3. minimax/MiniMax-M2.1 (Cheapest fallback, $0.20/1M) Use in CLI: premium-coding ``` ### Example 2: Free-Only (Zero Cost) ``` Name: free-combo Models: 1. gc/gemini-3-flash-preview (180K free/month) 2. if/kimi-k2-thinking (unlimited) 3. qw/qwen3-coder-plus (unlimited) Cost: $0 forever! ``` --- ## ๐Ÿ”ง CLI Integration ### Cursor IDE ``` Settings โ†’ Models โ†’ Advanced: OpenAI API Base URL: http://localhost:20128/v1 OpenAI API Key: [from omniroute dashboard] Model: cc/claude-opus-4-6 ``` ### Claude Code Edit `~/.claude/config.json`: ```json { "anthropic_api_base": "http://localhost:20128/v1", "anthropic_api_key": "your-omniroute-api-key" } ``` ### Codex CLI ```bash export OPENAI_BASE_URL="http://localhost:20128" export OPENAI_API_KEY="your-omniroute-api-key" codex "your prompt" ``` ### OpenClaw Edit `~/.openclaw/openclaw.json`: ```json { "agents": { "defaults": { "model": { "primary": "omniroute/if/glm-4.7" } } }, "models": { "providers": { "omniroute": { "baseUrl": "http://localhost:20128/v1", "apiKey": "your-omniroute-api-key", "api": "openai-completions", "models": [{ "id": "if/glm-4.7", "name": "glm-4.7" }] } } } } ``` **Or use Dashboard:** CLI Tools โ†’ OpenClaw โ†’ Auto-config ### Cline / Continue / RooCode ``` Provider: OpenAI Compatible Base URL: http://localhost:20128/v1 API Key: [from dashboard] Model: cc/claude-opus-4-6 ``` --- ## ๐Ÿš€ Deployment ### Global npm install (Recommended) ```bash npm install -g omniroute # Create config directory mkdir -p ~/.omniroute # Create .env file (see .env.example) cp .env.example ~/.omniroute/.env # Start server omniroute # Or with custom port: omniroute --port 3000 ``` The CLI automatically loads `.env` from `~/.omniroute/.env` or `./.env`. ### VPS Deployment ```bash git clone https://github.com/diegosouzapw/OmniRoute.git cd OmniRoute && npm install && npm run build export JWT_SECRET="your-secure-secret-change-this" export INITIAL_PASSWORD="your-password" export DATA_DIR="/var/lib/omniroute" export PORT="20128" export HOSTNAME="0.0.0.0" export NODE_ENV="production" export NEXT_PUBLIC_BASE_URL="http://localhost:20128" export API_KEY_SECRET="endpoint-proxy-api-key-secret" npm run start # Or: pm2 start npm --name omniroute -- start ``` ### PM2 Deployment (Low Memory) For servers with limited RAM, use the memory limit option: ```bash # With 512MB limit (default) pm2 start npm --name omniroute -- start # Or with custom memory limit OMNIROUTE_MEMORY_MB=512 pm2 start npm --name omniroute -- start # Or using ecosystem.config.js pm2 start ecosystem.config.js ``` Create `ecosystem.config.js`: ```javascript module.exports = { apps: [ { name: "omniroute", script: "npm", args: "start", env: { NODE_ENV: "production", OMNIROUTE_MEMORY_MB: "512", JWT_SECRET: "your-secret", INITIAL_PASSWORD: "your-password", }, node_args: "--max-old-space-size=512", max_memory_restart: "300M", }, ], }; ``` ### Docker ```bash # Build image (default = runner-cli with codex/claude/droid preinstalled) docker build -t omniroute:cli . # Portable mode (recommended) docker run -d --name omniroute -p 20128:20128 --env-file ./.env -v omniroute-data:/app/data omniroute:cli ``` For host-integrated mode with CLI binaries, see the Docker section in the main docs. ### Environment Variables | Variable | Default | Description | | ------------------------- | ------------------------------------ | ------------------------------------------------------- | | `JWT_SECRET` | `omniroute-default-secret-change-me` | JWT signing secret (**change in production**) | | `INITIAL_PASSWORD` | `123456` | First login password | | `DATA_DIR` | `~/.omniroute` | Data directory (db, usage, logs) | | `PORT` | framework default | Service port (`20128` in examples) | | `HOSTNAME` | framework default | Bind host (Docker defaults to `0.0.0.0`) | | `NODE_ENV` | runtime default | Set `production` for deploy | | `BASE_URL` | `http://localhost:20128` | Server-side internal base URL | | `CLOUD_URL` | `https://omniroute.dev` | Cloud sync endpoint base URL | | `API_KEY_SECRET` | `endpoint-proxy-api-key-secret` | HMAC secret for generated API keys | | `REQUIRE_API_KEY` | `false` | Enforce Bearer API key on `/v1/*` | | `ENABLE_REQUEST_LOGS` | `false` | Enables request/response logs | | `AUTH_COOKIE_SECURE` | `false` | Force `Secure` auth cookie (behind HTTPS reverse proxy) | | `OMNIROUTE_MEMORY_MB` | `512` | Node.js heap limit in MB | | `PROMPT_CACHE_MAX_SIZE` | `50` | Max prompt cache entries | | `SEMANTIC_CACHE_MAX_SIZE` | `100` | Max semantic cache entries | For the full environment variable reference, see the [README](../README.md). --- ## ๐Ÿ“Š Available Models
View all available models **Claude Code (`cc/`)** โ€” Pro/Max: `cc/claude-opus-4-6`, `cc/claude-sonnet-4-5-20250929`, `cc/claude-haiku-4-5-20251001` **Codex (`cx/`)** โ€” Plus/Pro: `cx/gpt-5.2-codex`, `cx/gpt-5.1-codex-max` **Gemini CLI (`gc/`)** โ€” FREE: `gc/gemini-3-flash-preview`, `gc/gemini-2.5-pro` **GitHub Copilot (`gh/`)**: `gh/gpt-5`, `gh/claude-4.5-sonnet` **GLM (`glm/`)** โ€” $0.6/1M: `glm/glm-4.7` **MiniMax (`minimax/`)** โ€” $0.2/1M: `minimax/MiniMax-M2.1` **iFlow (`if/`)** โ€” FREE: `if/kimi-k2-thinking`, `if/qwen3-coder-plus`, `if/deepseek-r1` **Qwen (`qw/`)** โ€” FREE: `qw/qwen3-coder-plus`, `qw/qwen3-coder-flash` **Kiro (`kr/`)** โ€” FREE: `kr/claude-sonnet-4.5`, `kr/claude-haiku-4.5` **DeepSeek (`ds/`)**: `ds/deepseek-chat`, `ds/deepseek-reasoner` **Groq (`groq/`)**: `groq/llama-3.3-70b-versatile`, `groq/llama-4-maverick-17b-128e-instruct` **xAI (`xai/`)**: `xai/grok-4`, `xai/grok-4-0709-fast-reasoning`, `xai/grok-code-mini` **Mistral (`mistral/`)**: `mistral/mistral-large-2501`, `mistral/codestral-2501` **Perplexity (`pplx/`)**: `pplx/sonar-pro`, `pplx/sonar` **Together AI (`together/`)**: `together/meta-llama/Llama-3.3-70B-Instruct-Turbo` **Fireworks AI (`fireworks/`)**: `fireworks/accounts/fireworks/models/deepseek-v3p1` **Cerebras (`cerebras/`)**: `cerebras/llama-3.3-70b` **Cohere (`cohere/`)**: `cohere/command-r-plus-08-2024` **NVIDIA NIM (`nvidia/`)**: `nvidia/nvidia/llama-3.3-70b-instruct`
--- ## ๐Ÿงฉ Advanced Features ### Custom Models Add any model ID to any provider without waiting for an app update: ```bash # Via API curl -X POST http://localhost:20128/api/provider-models \ -H "Content-Type: application/json" \ -d '{"provider": "openai", "modelId": "gpt-4.5-preview", "modelName": "GPT-4.5 Preview"}' # List: curl http://localhost:20128/api/provider-models?provider=openai # Remove: curl -X DELETE "http://localhost:20128/api/provider-models?provider=openai&model=gpt-4.5-preview" ``` Or use Dashboard: **Providers โ†’ [Provider] โ†’ Custom Models**. ### Dedicated Provider Routes Route requests directly to a specific provider with model validation: ```bash POST http://localhost:20128/v1/providers/openai/chat/completions POST http://localhost:20128/v1/providers/openai/embeddings POST http://localhost:20128/v1/providers/fireworks/images/generations ``` The provider prefix is auto-added if missing. Mismatched models return `400`. ### Network Proxy Configuration ```bash # Set global proxy curl -X PUT http://localhost:20128/api/settings/proxy \ -d '{"global": {"type":"http","host":"proxy.example.com","port":"8080"}}' # Per-provider proxy curl -X PUT http://localhost:20128/api/settings/proxy \ -d '{"providers": {"openai": {"type":"socks5","host":"proxy.example.com","port":"1080"}}}' # Test proxy curl -X POST http://localhost:20128/api/settings/proxy/test \ -d '{"proxy":{"type":"socks5","host":"proxy.example.com","port":"1080"}}' ``` **Precedence:** Key-specific โ†’ Combo-specific โ†’ Provider-specific โ†’ Global โ†’ Environment. ### Model Catalog API ```bash curl http://localhost:20128/api/models/catalog ``` Returns models grouped by provider with types (`chat`, `embedding`, `image`). ### Cloud Sync - Sync providers, combos, and settings across devices - Automatic background sync with timeout + fail-fast - Prefer server-side `BASE_URL`/`CLOUD_URL` in production ### LLM Gateway Intelligence (Phase 9) - **Semantic Cache** โ€” Auto-caches non-streaming, temperature=0 responses (bypass with `X-OmniRoute-No-Cache: true`) - **Request Idempotency** โ€” Deduplicates requests within 5s via `Idempotency-Key` or `X-Request-Id` header - **Progress Tracking** โ€” Opt-in SSE `event: progress` events via `X-OmniRoute-Progress: true` header --- ### Translator Playground Access via **Dashboard โ†’ Translator**. Debug and visualize how OmniRoute translates API requests between providers. | Mode | Purpose | | ---------------- | -------------------------------------------------------------------------------------- | | **Playground** | Select source/target formats, paste a request, and see the translated output instantly | | **Chat Tester** | Send live chat messages through the proxy and inspect the full request/response cycle | | **Test Bench** | Run batch tests across multiple format combinations to verify translation correctness | | **Live Monitor** | Watch real-time translations as requests flow through the proxy | **Use cases:** - Debug why a specific client/provider combination fails - Verify that thinking tags, tool calls, and system prompts translate correctly - Compare format differences between OpenAI, Claude, Gemini, and Responses API formats --- ### Routing Strategies Configure via **Dashboard โ†’ Settings โ†’ Routing**. | Strategy | Description | | ------------------------------ | ------------------------------------------------------------------------------------------------ | | **Fill First** | Uses accounts in priority order โ€” primary account handles all requests until unavailable | | **Round Robin** | Cycles through all accounts with a configurable sticky limit (default: 3 calls per account) | | **P2C (Power of Two Choices)** | Picks 2 random accounts and routes to the healthier one โ€” balances load with awareness of health | | **Random** | Randomly selects an account for each request using Fisher-Yates shuffle | | **Least Used** | Routes to the account with the oldest `lastUsedAt` timestamp, distributing traffic evenly | | **Cost Optimized** | Routes to the account with the lowest priority value, optimizing for lowest-cost providers | #### Wildcard Model Aliases Create wildcard patterns to remap model names: ``` Pattern: claude-sonnet-* โ†’ Target: cc/claude-sonnet-4-5-20250929 Pattern: gpt-* โ†’ Target: gh/gpt-5.1-codex ``` Wildcards support `*` (any characters) and `?` (single character). #### Fallback Chains Define global fallback chains that apply across all requests: ``` Chain: production-fallback 1. cc/claude-opus-4-6 2. gh/gpt-5.1-codex 3. glm/glm-4.7 ``` --- ### Resilience & Circuit Breakers Configure via **Dashboard โ†’ Settings โ†’ Resilience**. OmniRoute implements provider-level resilience with four components: 1. **Provider Profiles** โ€” Per-provider configuration for: - Failure threshold (how many failures before opening) - Cooldown duration - Rate limit detection sensitivity - Exponential backoff parameters 2. **Editable Rate Limits** โ€” System-level defaults configurable in the dashboard: - **Requests Per Minute (RPM)** โ€” Maximum requests per minute per account - **Min Time Between Requests** โ€” Minimum gap in milliseconds between requests - **Max Concurrent Requests** โ€” Maximum simultaneous requests per account - Click **Edit** to modify, then **Save** or **Cancel**. Values persist via the resilience API. 3. **Circuit Breaker** โ€” Tracks failures per provider and automatically opens the circuit when a threshold is reached: - **CLOSED** (Healthy) โ€” Requests flow normally - **OPEN** โ€” Provider is temporarily blocked after repeated failures - **HALF_OPEN** โ€” Testing if provider has recovered 4. **Policies & Locked Identifiers** โ€” Shows circuit breaker status and locked identifiers with force-unlock capability. 5. **Rate Limit Auto-Detection** โ€” Monitors `429` and `Retry-After` headers to proactively avoid hitting provider rate limits. **Pro Tip:** Use **Reset All** button to clear all circuit breakers and cooldowns when a provider recovers from an outage. --- ### Database Export / Import Manage database backups in **Dashboard โ†’ Settings โ†’ System & Storage**. | Action | Description | | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------ | | **Export Database** | Downloads the current SQLite database as a `.sqlite` file | | **Export All (.tar.gz)** | Downloads a full backup archive including: database, settings, combos, provider connections (no credentials), API key metadata | | **Import Database** | Upload a `.sqlite` file to replace the current database. A pre-import backup is automatically created | ```bash # API: Export database curl -o backup.sqlite http://localhost:20128/api/db-backups/export # API: Export all (full archive) curl -o backup.tar.gz http://localhost:20128/api/db-backups/exportAll # API: Import database curl -X POST http://localhost:20128/api/db-backups/import \ -F "file=@backup.sqlite" ``` **Import Validation:** The imported file is validated for integrity (SQLite pragma check), required tables (`provider_connections`, `provider_nodes`, `combos`, `api_keys`), and size (max 100MB). **Use Cases:** - Migrate OmniRoute between machines - Create external backups for disaster recovery - Share configurations between team members (export all โ†’ share archive) --- ### Settings Dashboard The settings page is organized into 5 tabs for easy navigation: | Tab | Contents | | -------------- | ---------------------------------------------------------------------------------------------- | | **Security** | Login/Password settings, IP Access Control, API auth for `/models`, and Provider Blocking | | **Routing** | Global routing strategy (6 options), wildcard model aliases, fallback chains, combo defaults | | **Resilience** | Provider profiles, editable rate limits, circuit breaker status, policies & locked identifiers | | **AI** | Thinking budget configuration, global system prompt injection, prompt cache stats | | **Advanced** | Global proxy configuration (HTTP/SOCKS5) | --- ### Costs & Budget Management Access via **Dashboard โ†’ Costs**. | Tab | Purpose | | ----------- | ---------------------------------------------------------------------------------------- | | **Budget** | Set spending limits per API key with daily/weekly/monthly budgets and real-time tracking | | **Pricing** | View and edit model pricing entries โ€” cost per 1K input/output tokens per provider | ```bash # API: Set a budget curl -X POST http://localhost:20128/api/usage/budget \ -H "Content-Type: application/json" \ -d '{"keyId": "key-123", "limit": 50.00, "period": "monthly"}' # API: Get current budget status curl http://localhost:20128/api/usage/budget ``` **Cost Tracking:** Every request logs token usage and calculates cost using the pricing table. View breakdowns in **Dashboard โ†’ Usage** by provider, model, and API key. --- ### Audio Transcription OmniRoute supports audio transcription via the OpenAI-compatible endpoint: ```bash POST /v1/audio/transcriptions Authorization: Bearer your-api-key Content-Type: multipart/form-data # Example with curl curl -X POST http://localhost:20128/v1/audio/transcriptions \ -H "Authorization: Bearer your-api-key" \ -F "file=@audio.mp3" \ -F "model=deepgram/nova-3" ``` Available providers: **Deepgram** (`deepgram/`), **AssemblyAI** (`assemblyai/`). Supported audio formats: `mp3`, `wav`, `m4a`, `flac`, `ogg`, `webm`. --- ### Combo Balancing Strategies Configure per-combo balancing in **Dashboard โ†’ Combos โ†’ Create/Edit โ†’ Strategy**. | Strategy | Description | | ------------------ | ------------------------------------------------------------------------ | | **Round-Robin** | Rotates through models sequentially | | **Priority** | Always tries the first model; falls back only on error | | **Random** | Picks a random model from the combo for each request | | **Weighted** | Routes proportionally based on assigned weights per model | | **Least-Used** | Routes to the model with the fewest recent requests (uses combo metrics) | | **Cost-Optimized** | Routes to the cheapest available model (uses pricing table) | Global combo defaults can be set in **Dashboard โ†’ Settings โ†’ Routing โ†’ Combo Defaults**. --- ### Health Dashboard Access via **Dashboard โ†’ Health**. Real-time system health overview with 6 cards: | Card | What It Shows | | --------------------- | ----------------------------------------------------------- | | **System Status** | Uptime, version, memory usage, data directory | | **Provider Health** | Per-provider circuit breaker state (Closed/Open/Half-Open) | | **Rate Limits** | Active rate limit cooldowns per account with remaining time | | **Active Lockouts** | Providers temporarily blocked by the lockout policy | | **Signature Cache** | Deduplication cache stats (active keys, hit rate) | | **Latency Telemetry** | p50/p95/p99 latency aggregation per provider | **Pro Tip:** The Health page auto-refreshes every 10 seconds. Use the circuit breaker card to identify which providers are experiencing issues. --- ## ๐Ÿ–ฅ๏ธ Desktop Application (Electron) OmniRoute is available as a native desktop application for Windows, macOS, and Linux. ### Installation ```bash # From the electron directory: cd electron npm install # Development mode (connect to running Next.js dev server): npm run dev # Production mode (uses standalone build): npm start ``` ### Building Installers ```bash cd electron npm run build # Current platform npm run build:win # Windows (.exe NSIS) npm run build:mac # macOS (.dmg universal) npm run build:linux # Linux (.AppImage) ``` Output โ†’ `electron/dist-electron/` ### Key Features | Feature | Description | | --------------------------- | ---------------------------------------------------- | | **Server Readiness** | Polls server before showing window (no blank screen) | | **System Tray** | Minimize to tray, change port, quit from tray menu | | **Port Management** | Change server port from tray (auto-restarts server) | | **Content Security Policy** | Restrictive CSP via session headers | | **Single Instance** | Only one app instance can run at a time | | **Offline Mode** | Bundled Next.js server works without internet | ### Environment Variables | Variable | Default | Description | | --------------------- | ------- | -------------------------------- | | `OMNIROUTE_PORT` | `20128` | Server port | | `OMNIROUTE_MEMORY_MB` | `512` | Node.js heap limit (64โ€“16384 MB) | ๐Ÿ“– Full documentation: [`electron/README.md`](../electron/README.md)