Merged
22 changes: 19 additions & 3 deletions .env.example
@@ -3,12 +3,20 @@
# cp .env.example .env

# ========================
# REQUIRED
# REQUIRED: provider credential
# ========================
# Phantom defaults to Anthropic. Set ANTHROPIC_API_KEY for the default setup.
# To use a different provider (Z.AI, OpenRouter, Ollama, vLLM, LiteLLM, custom),
# configure the `provider:` block in phantom.yaml and set the matching env var
# below. See docs/providers.md for the full reference.

# Your Anthropic API key (starts with sk-ant-)
ANTHROPIC_API_KEY=

# Alternative provider keys (set one that matches your provider block in phantom.yaml):
# ZAI_API_KEY=
# OPENROUTER_API_KEY=
# LITELLM_KEY=

# ========================
# OPTIONAL: Slack
# ========================
@@ -36,12 +44,20 @@ ANTHROPIC_API_KEY=
# Agent role (default: swe). Options: swe, base
# PHANTOM_ROLE=swe

# Claude model for the agent brain.
# Model for the agent brain. Keep a Claude model ID here even when using a
# non-Anthropic provider: the bundled cli.js has hardcoded capability checks
# against Claude model names. Use `provider.model_mappings` in phantom.yaml
# to redirect the wire call to your actual model (e.g., glm-5.1).
# Options:
#   claude-sonnet-4-6 - Fast, capable, lower cost (default, recommended)
#   claude-opus-4-6   - Most capable, higher cost
# PHANTOM_MODEL=claude-sonnet-4-6

# Provider override via env var (alternative to editing phantom.yaml).
# Options: anthropic (default), zai, openrouter, vllm, ollama, litellm, custom
# PHANTOM_PROVIDER_TYPE=anthropic
# PHANTOM_PROVIDER_BASE_URL=

# Domain for public URL (e.g., ghostwright.dev)
# When set with PHANTOM_NAME, derives public URL as https://<name>.<domain>
# PHANTOM_DOMAIN=
6 changes: 3 additions & 3 deletions CLAUDE.md
@@ -1,13 +1,13 @@
# Phantom

Phantom is an autonomous AI co-worker that runs as a persistent Bun process on a VM. It wraps the Claude Agent SDK (Opus 4.6), maintains vector-backed memory across sessions, rewrites its own configuration through a validated self-evolution engine, communicates via Slack/Telegram/Email/Webhook, and exposes all capabilities as an MCP server. 27,000+ lines of TypeScript, 822 tests, v0.18.2. Apache 2.0, repo at ghostwright/phantom.
Phantom is an autonomous AI co-worker that runs as a persistent Bun process on a VM. It wraps the Claude Agent SDK as a subprocess (Anthropic by default, swappable via a `provider:` config block to Z.AI/GLM-5.1, OpenRouter, Ollama, vLLM, LiteLLM, or any Anthropic Messages API compatible endpoint). It maintains vector-backed memory across sessions, rewrites its own configuration through a validated self-evolution engine, communicates via Slack/Telegram/Email/Webhook, and exposes all capabilities as an MCP server. 27,000+ lines of TypeScript, 875 tests, v0.18.2. Apache 2.0, repo at ghostwright/phantom.

## Tech Stack

| Layer | Technology |
|-------|-----------|
| Runtime | Bun (TypeScript-native, built-in SQLite, no bundler) |
| Agent | Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) with Opus 4.6, 1M context |
| Agent | Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) subprocess. Provider is configurable via `src/config/providers.ts`: Anthropic (default), Z.AI, OpenRouter, Ollama, vLLM, LiteLLM, custom. |
| Memory | Qdrant (vector DB, Docker) + Ollama (nomic-embed-text, local embeddings) |
| State | SQLite via Bun (sessions, tasks, metrics, evolution versions, scheduled jobs) |
| Channels | Slack (Socket Mode, primary), Telegram (long polling), Email (IMAP/SMTP), Webhook (HMAC-SHA256), CLI |
@@ -41,7 +41,7 @@ If you find yourself writing a function that does something the agent can do bet

```bash
bun install # Install dependencies
bun test # Run 770 tests
bun test # Run 875 tests
bun run src/index.ts # Start the server
bun run src/cli/main.ts init --yes # Initialize config (reads env vars)
bun run src/cli/main.ts doctor # Check all subsystems
31 changes: 30 additions & 1 deletion README.md
@@ -7,7 +7,7 @@

<p align="center">
<a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License"></a>
<img src="https://img.shields.io/badge/tests-822%20passed-brightgreen.svg" alt="Tests">
<img src="https://img.shields.io/badge/tests-875%20passed-brightgreen.svg" alt="Tests">
<a href="https://hub.docker.com/r/ghostwright/phantom"><img src="https://img.shields.io/docker/pulls/ghostwright/phantom.svg" alt="Docker Pulls"></a>
<img src="https://img.shields.io/badge/version-0.18.2-orange.svg" alt="Version">
</p>
@@ -75,6 +75,34 @@ A Phantom discovered [Vigil](https://github.com/baudsmithstudios/vigil), a light

This is what happens when you give an AI its own computer.

## Bring Your Own Model

Phantom is not locked to any single AI backend. It ships with support for seven providers out of the box, configured through a single YAML block:

- **Anthropic** (default) - Claude Opus, Sonnet, Haiku
- **Z.AI** - GLM-5.1 and GLM-4.5-Air via [Z.AI's Anthropic-compatible API](https://docs.z.ai/guides/llm/glm-5). Roughly 15x cheaper than Claude Opus for comparable coding quality.
- **OpenRouter** - 100+ models through one key
- **Ollama** - Any GGUF model on your own GPU, zero API cost
- **vLLM** - Self-hosted inference with OpenAI-compatible endpoints
- **LiteLLM** - Local proxy bridging OpenAI, Gemini, and more
- **Custom** - Any Anthropic Messages API compatible endpoint

Switching providers takes a few lines of YAML:

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: zai
  api_key_env: ZAI_API_KEY
  model_mappings:
    sonnet: glm-5.1
```

Set `ZAI_API_KEY` in `.env`, restart, done. Both the main agent and every evolution judge flow through the chosen provider from that point on. The tools are the same, the memory is the same, the self-evolution pipeline is the same. Only the brain changes.

Anthropic stays the default. Existing deployments continue to work with no configuration changes. See [docs/providers.md](docs/providers.md) for the full reference.

## Quick Start

### Docker (recommended)
@@ -186,6 +214,7 @@ Because the agent that can only use pre-built tools hits a ceiling. Phantom buil
| Feature | Why it matters |
|---------|----------------|
| **Its own computer** | Your laptop stays yours. The agent installs software, runs 24/7, and builds infrastructure on its own machine. |
| **Bring your own model** | Anthropic, Z.AI (GLM-5.1), OpenRouter, Ollama, vLLM, LiteLLM, or any Anthropic Messages API compatible endpoint. Pick your backend in YAML, same agent everywhere. |
| **Self-evolution** | The agent rewrites its own config after every session, validated by LLM judges. Day 30 knows things Day 1 didn't. |
| **Persistent memory** | Three tiers of vector memory. Mention something on Monday, it uses it on Wednesday. No re-explaining. |
| **Dynamic tools** | Creates and registers its own MCP tools at runtime. Tools survive restarts and work across sessions. |
27 changes: 27 additions & 0 deletions config/phantom.yaml
@@ -16,3 +16,30 @@ timeout_minutes: 240
# url: https://data.ghostwright.dev/mcp
# token: "bearer-token-for-data"
# description: "Data Analyst Phantom"

# Provider selection. Defaults to Anthropic. Uncomment and customize to use
# a different backend. Both the main agent and every LLM judge flow through
# the chosen provider; authentication happens at the env var named below.
#
# provider:
#   type: anthropic   # anthropic | zai | openrouter | vllm | ollama | litellm | custom
#   # api_key_env: ANTHROPIC_API_KEY
#
# Example: GLM-5.1 via Z.AI's Anthropic-compatible API (15x cheaper than Opus)
# provider:
#   type: zai
#   api_key_env: ZAI_API_KEY
#   model_mappings:
#     opus: glm-5.1
#     sonnet: glm-5.1
#     haiku: glm-4.5-air
#
# Example: Local vLLM server hosting any OpenAI-compatible model
# provider:
#   type: vllm
#   base_url: http://localhost:8000
#
# Example: Local Ollama (free, runs on your GPU)
# provider:
#   type: ollama
#   base_url: http://localhost:11434
20 changes: 19 additions & 1 deletion docs/getting-started.md
@@ -74,7 +74,25 @@ Open `.env` in your editor and fill in these values:
ANTHROPIC_API_KEY=sk-ant-your-key-here
```

Your Anthropic API key. This is the only value you absolutely must set.
Your Anthropic API key. This is the only value you absolutely must set for the default setup.

**Using a different provider?** Phantom supports Z.AI (GLM-5.1, ~15x cheaper than Claude Opus), OpenRouter, Ollama, vLLM, LiteLLM, and custom endpoints. For example, to run Phantom on Z.AI:

```
ZAI_API_KEY=your-zai-key
```

Then add this to `phantom.yaml`:

```yaml
provider:
  type: zai
  api_key_env: ZAI_API_KEY
  model_mappings:
    sonnet: glm-5.1
```

See [docs/providers.md](providers.md) for the full provider reference.

### Slack (recommended)

188 changes: 188 additions & 0 deletions docs/providers.md
@@ -0,0 +1,188 @@
# Provider Configuration

Phantom routes every LLM query (the main agent and every evolution judge) through the Claude Agent SDK as a subprocess. By setting environment variables that the bundled `cli.js` already honors, you can point that subprocess at any Anthropic Messages API compatible endpoint without changing a line of code.

The `provider:` block in `phantom.yaml` is a small config surface that translates into those environment variables for you.

## Supported Providers

| Type | Base URL | API Key Env | Notes |
|------|----------|-------------|-------|
| `anthropic` (default) | `https://api.anthropic.com` | `ANTHROPIC_API_KEY` | Claude Opus, Sonnet, Haiku |
| `zai` | `https://api.z.ai/api/anthropic` | `ZAI_API_KEY` | GLM-5.1 and GLM-4.5-Air, roughly 15x cheaper than Opus |
| `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` | 100+ models through a single key |
| `vllm` | `http://localhost:8000` | none | Self-hosted OpenAI-compatible inference |
| `ollama` | `http://localhost:11434` | none | Local GGUF models, zero API cost |
| `litellm` | `http://localhost:4000` | `LITELLM_KEY` | Local proxy bridging OpenAI, Gemini, and others |
| `custom` | (you set it) | (you set it) | Any Anthropic Messages API compatible endpoint |

## Quick Reference

### Anthropic (default)

No configuration needed. Existing deployments continue to work unchanged.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
# No provider block = defaults to anthropic
```

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
```

### Z.AI / GLM-5.1

Z.AI provides an Anthropic Messages API compatible endpoint at `https://api.z.ai/api/anthropic`. Phantom ships with a `zai` preset that points there automatically. Get a key at [docs.z.ai](https://docs.z.ai/guides/llm/glm-5).

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: zai
  api_key_env: ZAI_API_KEY
  model_mappings:
    opus: glm-5.1
    sonnet: glm-5.1
    haiku: glm-4.5-air
```

```bash
# .env
ZAI_API_KEY=<your-zai-key>
```

Both the main agent and every evolution judge route through Z.AI. The `claude-sonnet-4-6` model name is translated to `glm-5.1` on the wire by the `model_mappings` block.

### Ollama (local, free)

Run any GGUF model on your own GPU. No API key needed.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: ollama
  model_mappings:
    opus: qwen3-coder:32b
    sonnet: qwen3-coder:32b
    haiku: qwen3-coder:14b
```

Ollama must be running at `http://localhost:11434` (the preset default). The model must support function calling to work with Phantom's agent loop.

### vLLM (self-hosted)

For organizations running their own inference clusters.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: vllm
  base_url: http://your-vllm-server:8000
  model_mappings:
    sonnet: your-model-name
  timeout_ms: 300000  # local models can be slow on first call
```

Start vLLM with a `--tool-call-parser` that matches your model; without it, tool use will not work.

### OpenRouter

Access 100+ models through a single OpenRouter key.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: openrouter
  api_key_env: OPENROUTER_API_KEY
  model_mappings:
    sonnet: anthropic/claude-sonnet-4.5
```

### LiteLLM (proxy)

Run a local LiteLLM proxy to bridge OpenAI, Gemini, and other formats.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: litellm
  api_key_env: LITELLM_KEY
  # base_url defaults to http://localhost:4000
```

### Custom endpoint

For any Anthropic Messages API compatible proxy (LM Studio, custom internal gateways, etc.).

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: custom
  base_url: https://your-proxy.internal/anthropic
  api_key_env: YOUR_CUSTOM_KEY_ENV
```

## Configuration Fields

| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `type` | enum | `anthropic` | One of the supported provider types |
| `base_url` | URL | preset default | Override the endpoint URL |
| `api_key_env` | string | preset default | Name of the env var holding the credential |
| `model_mappings.opus` | string | none | Concrete model ID for the opus tier |
| `model_mappings.sonnet` | string | none | Concrete model ID for the sonnet tier |
| `model_mappings.haiku` | string | none | Concrete model ID for the haiku tier |
| `disable_betas` | boolean | preset default | Sets `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`. Defaults to true for every non-anthropic preset. |
| `timeout_ms` | number | none | Sets `API_TIMEOUT_MS` for slow local inference |

## Environment Variable Overrides

For operators who prefer env variables over YAML edits:

| Variable | Effect |
|----------|--------|
| `PHANTOM_PROVIDER_TYPE` | Override `provider.type` (validated against the supported values) |
| `PHANTOM_PROVIDER_BASE_URL` | Override `provider.base_url` (validated as a URL) |
| `PHANTOM_MODEL` | Override `config.model` |

These are applied on top of the YAML-loaded config during startup.
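A minimal sketch of how that override step could work, assuming a `ProviderConfig` shape like the one documented above (the function and type names here are illustrative, not Phantom's actual source):

```typescript
// Hypothetical sketch of the startup env-override step; not Phantom's actual code.
type ProviderType =
  | "anthropic" | "zai" | "openrouter" | "vllm" | "ollama" | "litellm" | "custom";

const PROVIDER_TYPES: ProviderType[] = [
  "anthropic", "zai", "openrouter", "vllm", "ollama", "litellm", "custom",
];

interface ProviderConfig {
  type: ProviderType;
  base_url?: string;
}

function applyEnvOverrides(
  provider: ProviderConfig,
  env: Record<string, string | undefined>,
): ProviderConfig {
  const out = { ...provider };
  const type = env.PHANTOM_PROVIDER_TYPE;
  if (type !== undefined) {
    // Validate against the supported provider types before accepting.
    if (!PROVIDER_TYPES.includes(type as ProviderType)) {
      throw new Error(`Unsupported PHANTOM_PROVIDER_TYPE: ${type}`);
    }
    out.type = type as ProviderType;
  }
  const baseUrl = env.PHANTOM_PROVIDER_BASE_URL;
  if (baseUrl !== undefined) {
    new URL(baseUrl); // throws on a malformed URL, i.e. "validated as a URL"
    out.base_url = baseUrl;
  }
  return out;
}
```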

## How It Works

The Claude Agent SDK runs as a subprocess. The SDK's bundled `cli.js` reads `ANTHROPIC_BASE_URL` and the `ANTHROPIC_DEFAULT_*_MODEL` aliases at call time. When `ANTHROPIC_BASE_URL` points at a non-Anthropic host, all Messages API requests go there instead.

The `provider:` block is translated into those environment variables by `buildProviderEnv()` in [`src/config/providers.ts`](../src/config/providers.ts). The resulting map is merged into both the main agent query and the evolution judge query, so changing providers flips both tiers in lockstep.
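Based on the mappings this document describes, the translation could look roughly like this (an illustrative sketch, not the actual contents of `src/config/providers.ts`):

```typescript
// Illustrative sketch of the provider-block -> env-var translation.
// Field names follow the Configuration Fields table; preset URLs follow the
// Supported Providers table. Details of the real implementation may differ.
interface Provider {
  type: string;
  base_url?: string;
  model_mappings?: { opus?: string; sonnet?: string; haiku?: string };
  disable_betas?: boolean;
  timeout_ms?: number;
}

const PRESET_BASE_URLS: Record<string, string | undefined> = {
  anthropic: undefined, // SDK default: https://api.anthropic.com
  zai: "https://api.z.ai/api/anthropic",
  openrouter: "https://openrouter.ai/api/v1",
  vllm: "http://localhost:8000",
  ollama: "http://localhost:11434",
  litellm: "http://localhost:4000",
};

function buildProviderEnv(p: Provider): Record<string, string> {
  const env: Record<string, string> = {};
  const baseUrl = p.base_url ?? PRESET_BASE_URLS[p.type];
  if (baseUrl) env.ANTHROPIC_BASE_URL = baseUrl;
  const m = p.model_mappings ?? {};
  if (m.opus) env.ANTHROPIC_DEFAULT_OPUS_MODEL = m.opus;
  if (m.sonnet) env.ANTHROPIC_DEFAULT_SONNET_MODEL = m.sonnet;
  if (m.haiku) env.ANTHROPIC_DEFAULT_HAIKU_MODEL = m.haiku;
  // disable_betas defaults to true for every non-anthropic preset.
  if (p.disable_betas ?? p.type !== "anthropic") {
    env.CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS = "1";
  }
  if (p.timeout_ms) env.API_TIMEOUT_MS = String(p.timeout_ms);
  return env;
}
```

The resulting map is then merged into the subprocess environment for both the main agent query and the evolution judge query.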

## Why keep a Claude model name in `model:`?

The bundled `cli.js` has hardcoded model-name arrays for capability detection (thinking tokens, effort levels, compaction, etc.). Passing a literal `glm-5.1` as the model can break those checks. The recommended pattern is:

1. Set `model: claude-sonnet-4-6` (or Opus) in `phantom.yaml` so `cli.js` treats the call as a known Claude model
2. Set `model_mappings.sonnet: glm-5.1` in the provider block so the wire call goes to GLM-5.1

This is the same pattern Z.AI's own documentation recommends.
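For the Z.AI example earlier in this document, the subprocess environment this pattern produces would look roughly like the following (the exact variable set is an assumption derived from the mappings described above):

```bash
# Assumed subprocess environment for the zai preset with sonnet mapped to glm-5.1.
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
export ANTHROPIC_DEFAULT_SONNET_MODEL=glm-5.1
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
```

The `model:` value stays `claude-sonnet-4-6`, so `cli.js` runs its capability checks against a known Claude name while the wire call goes to `glm-5.1`.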

## Troubleshooting

**Phantom responds but the logs show Claude-shaped costs.**
The bundled `cli.js` calculates `total_cost_usd` from its local Claude pricing table based on the model name string. Cost reporting is not provider-aware, so the logged cost will look like Claude pricing even when the request went to Z.AI or another provider. The actual charge on your provider's bill will differ.

**Auto mode judges fall back to heuristic mode.**
`resolveJudgeMode` in auto mode enables LLM judges when any of these are true: (a) a non-anthropic provider is configured, (b) `provider.base_url` is set, (c) `ANTHROPIC_API_KEY` is present, or (d) `~/.claude/.credentials.json` exists. If none hold, judges run in heuristic mode. Set `judges.enabled: always` in `config/evolution.yaml` to force LLM judges on.

**Third-party proxy rejects a beta header.**
`disable_betas: true` is already the default for every non-anthropic preset, which sets `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`. If you still see beta header errors, set `disable_betas: true` explicitly on your provider block and check that nothing else in your config sets it back to `false`.

**Tool calls fail with small local models.**
Phantom's tool system assumes strong function-calling capability. Models like Qwen3-Coder and GLM-5.1 handle it well; smaller models often fail on complex multi-step tool chains. Test with a strong model first, then step down to smaller ones.

**Subprocess fails with a missing-credential error.**
Phantom does not validate credentials at load time. The subprocess only sees the provider env vars when a query runs. If `api_key_env` names a variable that is not set in the process environment, the subprocess will fail at call time with the provider's own error message.
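A small preflight check can catch the missing credential before startup instead of at call time. This is a hypothetical helper, not a script Phantom ships; pass it the variable name your provider block's `api_key_env` points at:

```bash
# Hypothetical preflight helper: verify the credential env var is set.
# $1 is the variable name from api_key_env (e.g. ZAI_API_KEY).
check_key() {
  if [ -z "$(printenv "$1")" ]; then
    echo "missing: $1 (the SDK subprocess will fail at call time)" >&2
    return 1
  fi
  echo "ok: $1"
}
```

Run it as `check_key ZAI_API_KEY` (or whatever name your config uses) before `bun run src/index.ts`.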
1 change: 1 addition & 0 deletions src/agent/__tests__/prompt-assembler.test.ts
@@ -7,6 +7,7 @@ const baseConfig: PhantomConfig = {
  port: 3100,
  role: "swe",
  model: "claude-opus-4-6",
  provider: { type: "anthropic" },
  effort: "max",
  max_budget_usd: 0,
  timeout_minutes: 240,