Common problems and solutions for OmniRoute.
| Problem | Solution |
|---|---|
| First login not working | Check INITIAL_PASSWORD in .env (default: 123456) |
| Dashboard opens on wrong port | Set PORT=20128 and NEXT_PUBLIC_BASE_URL=http://localhost:20128 |
| No request logs under logs/ | Set ENABLE_REQUEST_LOGS=true |
| EACCES: permission denied | Set DATA_DIR=/path/to/writable/dir to override ~/.omniroute |
| Routing strategy not saving | Update to v1.4.11+ (Zod schema fix for settings persistence) |
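Taken together, the table's fixes can be expressed as a minimal `.env` sketch (values are illustrative; adjust them to your setup):

```shell
# Minimal .env sketch for the fixes above (illustrative values)
INITIAL_PASSWORD=123456                       # first-login password
PORT=20128                                    # dashboard port
NEXT_PUBLIC_BASE_URL=http://localhost:20128   # keep aligned with PORT
ENABLE_REQUEST_LOGS=true                      # writes request logs under logs/
DATA_DIR=/var/lib/omniroute                   # writable dir overriding ~/.omniroute
```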
Cause: Provider quota exhausted.
Fix:
- Check dashboard quota tracker
- Use a combo with fallback tiers
- Switch to cheaper/free tier
Cause: Subscription quota exhausted.
Fix:
- Add a fallback chain: `cc/claude-opus-4-6` → `glm/glm-4.7` → `if/kimi-k2-thinking`
- Use GLM/MiniMax as a cheap backup
OmniRoute auto-refreshes tokens. If issues persist:
- Dashboard → Provider → Reconnect
- Delete and re-add the provider connection
- Verify `BASE_URL` points to your running instance (e.g., `http://localhost:20128`)
- Verify `CLOUD_URL` points to your cloud endpoint (e.g., `https://omniroute.dev`)
- Keep `NEXT_PUBLIC_*` values aligned with server-side values
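For reference, an aligned set of URL values might look like this (illustrative `.env` fragment):

```shell
BASE_URL=http://localhost:20128
NEXT_PUBLIC_BASE_URL=http://localhost:20128   # keep aligned with BASE_URL
CLOUD_URL=https://omniroute.dev
```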
Symptom: `Unexpected token 'd'...` on the cloud endpoint for non-streaming calls.
Cause: Upstream returns an SSE payload while the client expects JSON.
Workaround: Use `stream=true` for direct cloud calls. The local runtime includes an SSE→JSON fallback.
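The error message is easy to recognize: an SSE body starts with `data: `, so a JSON parser fails on the first character, the letter `d`. A minimal shell sketch of the distinction (illustrative only, not OmniRoute code):

```shell
# An SSE payload begins with "data: "; a plain JSON body begins with "{".
# Parsing an SSE body as JSON fails at the first character, hence "Unexpected token 'd'".
body='data: {"choices":[{"delta":{"content":"hi"}}]}'
case "$body" in
  data:*) kind=sse ;;
  \{*)    kind=json ;;
  *)      kind=unknown ;;
esac
echo "$kind"
```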
- Create a fresh key from the local dashboard (`/api/keys`)
- Run cloud sync: Enable Cloud → Sync Now
- Old or non-synced keys can still return `401` on cloud
- Check runtime fields: `curl http://localhost:20128/api/cli-tools/runtime/codex | jq`
- For portable mode: use the image target `runner-cli` (bundled CLIs)
- For host mount mode: set `CLI_EXTRA_PATHS` and mount the host bin directory as read-only
- If `installed=true` and `runnable=false`: the binary was found but failed its healthcheck
```bash
curl -s http://localhost:20128/api/cli-tools/codex-settings | jq '{installed,runnable,commandPath,runtimeMode,reason}'
curl -s http://localhost:20128/api/cli-tools/claude-settings | jq '{installed,runnable,commandPath,runtimeMode,reason}'
curl -s http://localhost:20128/api/cli-tools/openclaw-settings | jq '{installed,runnable,commandPath,runtimeMode,reason}'
```

- Check usage stats in Dashboard → Usage
- Switch primary model to GLM/MiniMax
- Use free tier (Gemini CLI, iFlow) for non-critical tasks
- Set cost budgets per API key: Dashboard → API Keys → Budget
Set `ENABLE_REQUEST_LOGS=true` in your `.env` file. Logs appear under the `logs/` directory.
```bash
# Health dashboard
http://localhost:20128/dashboard/health

# API health check
curl http://localhost:20128/api/monitoring/health
```

- Main state: `${DATA_DIR}/db.json` (providers, combos, aliases, keys, settings)
- Usage: `${DATA_DIR}/usage.json`, `${DATA_DIR}/log.txt`, `${DATA_DIR}/call_logs/`
- Request logs: `<repo>/logs/...` (when `ENABLE_REQUEST_LOGS=true`)
When a provider's circuit breaker is OPEN, requests are blocked until the cooldown expires.
Fix:
- Go to Dashboard → Settings → Resilience
- Check the circuit breaker card for the affected provider
- Click Reset All to clear all breakers, or wait for the cooldown to expire
- Verify the provider is actually available before resetting
If a provider repeatedly enters OPEN state:
- Check Dashboard → Health → Provider Health for the failure pattern
- Go to Settings → Resilience → Provider Profiles and increase the failure threshold
- Check if the provider has changed API limits or requires re-authentication
- Review latency telemetry — high latency may cause timeout-based failures
- Ensure you're using the correct prefix: `deepgram/nova-3` or `assemblyai/best`
- Verify the provider is connected in Dashboard → Providers
- Check supported audio formats: `mp3`, `wav`, `m4a`, `flac`, `ogg`, `webm`
- Verify the file size is within provider limits (typically < 25MB)
- Check provider API key validity in the provider card
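As a sanity check, you can send a short clip through the gateway directly. The sketch below assumes OmniRoute exposes an OpenAI-compatible `/v1/audio/transcriptions` endpoint; the file name and API key variable are placeholders to substitute, and the path may differ in your version, so check the API reference:

```shell
# Hypothetical smoke test: transcribe a local clip via the gateway.
curl -s http://localhost:20128/v1/audio/transcriptions \
  -H "Authorization: Bearer $OMNIROUTE_API_KEY" \
  -F model=deepgram/nova-3 \
  -F file=@sample.mp3
```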
Use Dashboard → Translator to debug format translation issues:
| Mode | When to Use |
|---|---|
| Playground | Compare input/output formats side by side — paste a failing request to see how it translates |
| Chat Tester | Send live messages and inspect the full request/response payload including headers |
| Test Bench | Run batch tests across format combinations to find which translations are broken |
| Live Monitor | Watch real-time request flow to catch intermittent translation issues |
- Thinking tags not appearing — Check if the target provider supports thinking and the thinking budget setting
- Tool calls dropping — Some format translations may strip unsupported fields; verify in Playground mode
- System prompt missing — Claude and Gemini handle system prompts differently; check translation output
- SDK returns raw string instead of object — Fixed in v1.1.0: the response sanitizer now strips non-standard fields (`x_groq`, `usage_breakdown`, etc.) that cause OpenAI SDK Pydantic validation failures
- GLM/ERNIE rejects the `system` role — Fixed in v1.1.0: the role normalizer automatically merges system messages into user messages for incompatible models
- `developer` role not recognized — Fixed in v1.1.0: automatically converted to `system` for non-OpenAI providers
- `json_schema` not working with Gemini — Fixed in v1.1.0: `response_format` is now converted to Gemini's `responseMimeType` + `responseSchema`
- Auto rate-limit only applies to API key providers (not OAuth/subscription)
- Verify Settings → Resilience → Provider Profiles has auto-rate-limit enabled
- Check if the provider returns `429` status codes or `Retry-After` headers
Provider profiles support these settings:
- Base delay — Initial wait time after first failure (default: 1s)
- Max delay — Maximum wait time cap (default: 30s)
- Multiplier — How much to increase delay per consecutive failure (default: 2x)
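With the defaults above, the wait sequence is 1s, 2s, 4s, 8s, 16s, then capped at 30s. A quick sketch of the arithmetic:

```shell
# Backoff per the documented defaults: base 1s, multiplier 2x, cap 30s.
base=1; mult=2; max=30
delay=$base
sequence=""
for failure in 1 2 3 4 5 6; do
  sequence="$sequence $delay"
  delay=$((delay * mult))
  if [ "$delay" -gt "$max" ]; then delay=$max; fi
done
echo "waits:$sequence"
```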
When many concurrent requests hit a rate-limited provider, OmniRoute uses mutex + auto rate-limiting to serialize requests and prevent cascading failures. This is automatic for API key providers.
Some OmniRoute users place the gateway in front of RAG or agent stacks. In those setups a confusing pattern is common: OmniRoute looks healthy (providers up, routing profiles OK, no rate-limit alerts), yet the final answer is still wrong.
In practice these incidents usually come from the downstream RAG pipeline, not from the gateway itself.
If you want a shared vocabulary for those failures, you can use the WFGY ProblemMap, an external MIT-licensed, text-only resource that defines sixteen recurring RAG/LLM failure patterns. At a high level it covers:
- retrieval drift and broken context boundaries
- empty or stale indexes and vector stores
- embedding versus semantic mismatch
- prompt assembly and context window issues
- logic collapse and overconfident answers
- long chain and agent coordination failures
- multi agent memory and role drift
- deployment and bootstrap ordering problems
The idea is simple:
- When you investigate a bad response, capture:
  - the user task and request
  - the route or provider combo in OmniRoute
  - any RAG context used downstream (retrieved documents, tool calls, etc.)
- Map the incident to one or two WFGY ProblemMap numbers (`No.1`…`No.16`).
- Store the number in your own dashboard, runbook, or incident tracker next to the OmniRoute logs.
- Use the corresponding WFGY page to decide whether you need to change your RAG stack, retriever, or routing strategy.
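A captured incident can be as simple as one JSON file stored next to your OmniRoute logs. The field names below are illustrative, not an OmniRoute schema:

```shell
# Hypothetical incident record; adapt the fields to your own tracker.
cat > incident-example.json <<'EOF'
{
  "task": "summarize vendor contract",
  "combo": "cc/claude-opus-4-6 -> glm/glm-4.7",
  "rag_context": ["doc_17", "doc_42"],
  "wfgy_problem": "No.5"
}
EOF
```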
Full text and concrete recipes live here (MIT license, text only):
You can ignore this section if you do not run RAG or agent pipelines behind OmniRoute.
- GitHub Issues: github.com/diegosouzapw/OmniRoute/issues
- Architecture: See `docs/ARCHITECTURE.md` for internal details
- API Reference: See `docs/API_REFERENCE.md` for all endpoints
- Health Dashboard: Check Dashboard → Health for real-time system status
- Translator: Use Dashboard → Translator to debug format issues