diff --git a/.vibe/development-plan-feat-set-default-model-flinker-llama-cpp-config.md b/.vibe/development-plan-feat-set-default-model-flinker-llama-cpp-config.md
new file mode 100644
index 000000000000..7e3273fb9747
--- /dev/null
+++ b/.vibe/development-plan-feat-set-default-model-flinker-llama-cpp-config.md
@@ -0,0 +1,122 @@
+# Development Plan: repo (feat/set-default-model-flinker-llama-cpp-config branch)
+
+_Generated on 2026-05-18 by Vibe Feature MCP_
+_Workflow: [minor](https://codemcp.github.io/workflows/workflows/minor)_
+
+## Goal
+
+Make the `flinker/qwen3.6-35b-a3b` model the default for sessions created via the opencode router in the Homelab, and configure llama.cpp with `top_k` and `temperature` settings in the model configuration.
+
+## Key Decisions
+
+### Decision 1: Configuration via ConfigMap (not router code changes)
+
+The default model and model-specific options are configured in the **dynamic ConfigMap** (`opencode-config-dir`) in the Homelab deployment at `deployment/homelab/src/index.ts`. The init container in session pods deep-merges this dynamic `opencode.json` with the baked config from the container image. Setting the default model and model options in the ConfigMap is the idiomatic approach; no router code changes needed.
+
+### Decision 2: Default model field goes in root `opencode.json`
+
+The `model` field (from `Config.Info` schema in `packages/opencode/src/config/config.ts`) is a root-level config property in the format `provider/model`. Adding `model: "flinker/qwen3.6-35b-a3b"` to the dynamic `opencode.json` in the ConfigMap will make it the default for all new sessions.
+
+### Decision 3: Model options (`top_k`, `temperature`) go in the flinker model definition
+
+The flinker provider's model config supports an `options` field (from `ConfigProvider.Info` schema in `packages/opencode/src/config/provider.ts`) which accepts arbitrary key-value pairs.
For llama.cpp running via the OpenAI-compatible API, `temperature` flows through directly (it is a standard OpenAI param), and `top_k` can be passed through provider options.
+
+However, the `OpenAICompatibleChatLanguageModel` implementation treats the two differently:
+
+- `temperature` and `top_p` are passed directly in the request body (standard OpenAI params)
+- `topK` is explicitly flagged as unsupported (a warning is pushed); it does NOT get sent in the request body
+- Any provider option keys NOT in `openaiCompatibleProviderOptions.shape` get spread into the request body via `...Object.entries(providerOptions?.[this.providerOptionsName] ?? {}).filter(...)` (lines 169-173)
+
+This means `top_k` can be passed through by setting it in `model.options`, which becomes `providerOptions`.
+
+### Decision 4: Temperature default from transform.ts for qwen
+
+The `ProviderTransform.temperature()` function in `packages/opencode/src/provider/transform.ts` already has a default for qwen models:
+
+```ts
+if (id.includes("qwen")) return 0.55
+```
+
+We should override this via the model's `options.temperature` field if a different value is desired. Per user request, we will configure temperature for the llama.cpp setup.
+
+### Decision 5: Exact values for llama.cpp options (per Qwen model card)
+
+Per the [Qwen/Qwen3.6-35B-A3B model card](https://huggingface.co/Qwen/Qwen3.6-35B-A3B), the recommended parameters for **precise coding tasks** are `temperature=0.6, top_p=0.95, top_k=20`. We apply these to `flinker/qwen3.6-35b-a3b` via `model.options`. The values are injected only for the model whose `m.id === "qwen3.6-35b-a3b"`.
+
+## Notes
+
+### Architecture
+
+1. **opencode router** (`packages/opencode-router`) creates session pods with `opencode serve`
+2. **Init container** clones repo and deep-merges dynamic config from ConfigMap (`/home/opencode/.opencode/opencode.json`) into `~/.config/opencode/opencode.json`
+3. 
**ConfigMap** is defined in `deployment/homelab/src/index.ts` with provider definitions (openrouter, openrouter-free, flinker)
+4. **Flinker models** are fetched dynamically from `http://flinker:8080/v1/models` at deploy time
+5. **Model config** supports an `options` field for provider-specific parameters
+
+### Relevant Files
+
+- `deployment/homelab/src/index.ts`: ConfigMap definition with dynamic `opencode.json`
+- `packages/opencode/src/config/config.ts`: Config schema (root `model` field)
+- `packages/opencode/src/config/provider.ts`: Provider schema (model `options` field)
+- `packages/opencode/src/provider/transform.ts`: Default temperature/topP/topK per model
+- `packages/opencode/src/session/llm.ts`: How model options flow into provider requests
+- `packages/opencode/src/provider/sdk/copilot/chat/openai-compatible-chat-language-model.ts`: OpenAI-compatible provider implementation
+
+### llama.cpp OpenAI-compatible API support
+
+- `temperature`: supported directly as a standard OpenAI param ✓
+- `top_p`: supported directly as a standard OpenAI param ✓
+- `top_k`: NOT a standard OpenAI param, but the llama.cpp server accepts it if passed in the request body. It can be passed through via provider options (non-standard keys get forwarded).
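The passthrough described in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `openai-compatible-chat-language-model.ts`: `buildRequestBody` and `KNOWN_OPTION_KEYS` are hypothetical names, and the key set is an assumed stand-in for whatever `openaiCompatibleProviderOptions.shape` actually declares.

```typescript
// Assumption: stand-in for the keys declared in openaiCompatibleProviderOptions.shape.
const KNOWN_OPTION_KEYS = new Set<string>(["user", "reasoningEffort"])

type ProviderOptions = Record<string, unknown>

// Hypothetical helper: standard OpenAI params go into the body directly,
// and any provider option key that is NOT a known key is forwarded as-is.
function buildRequestBody(
  model: string,
  standard: { temperature?: number; top_p?: number },
  providerOptions: ProviderOptions = {},
): Record<string, unknown> {
  const passthrough = Object.fromEntries(
    Object.entries(providerOptions).filter(([key]) => !KNOWN_OPTION_KEYS.has(key)),
  )
  return { model, ...standard, ...passthrough }
}

// `top_k` is not a known key, so it is forwarded; `user` is known, so it is filtered out here.
const body = buildRequestBody(
  "qwen3.6-35b-a3b",
  { temperature: 0.6, top_p: 0.95 },
  { top_k: 20, user: "example-user" },
)
console.log(JSON.stringify(body))
```

Under these assumptions, setting `top_k` in `model.options` is enough for it to reach the llama.cpp server, while the AI SDK's own `topK` setting would still be dropped with a warning.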
+
+## Explore
+
+### Tasks
+
+- [x] Analyze how session creation works in opencode router
+- [x] Find where model configuration is defined (ConfigMap in homelab deployment)
+- [x] Understand how default model is set (root `model` field in opencode.json)
+- [x] Understand how model options flow to llama.cpp (providerOptions → request body)
+- [x] Determine if `top_k` is supported by llama.cpp OpenAI-compatible API (yes, via provider options passthrough)
+- [x] Verify `temperature` is supported (yes, standard OpenAI param)
+
+### Completed
+
+- [x] Created development plan file
+
+## Implement
+
+### Tasks
+
+- [x] Update ConfigMap in `deployment/homelab/src/index.ts` to add `model` field for default model
+- [x] Update `parseFlinkerModel` to inject `options` with `top_k` and `temperature` for the qwen model
+- [x] Verify the generated JSON structure is valid opencode config
+
+### Completed
+
+- [x] Added root `model: "flinker/qwen3.6-35b-a3b"` to the dynamic `opencode.json` in the ConfigMap
+- [x] Added conditional `options: { top_k: 20, top_p: 0.95, temperature: 0.6 }` to `parseFlinkerModel` for `qwen3.6-35b-a3b` (per Qwen model card coding recommendations)
+- [x] Ran `tsc --noEmit` in `deployment/homelab`; no type errors introduced
+
+## Finalize
+
+### Tasks
+
+- [x] Verify deployment config compiles
+- [x] Code cleanup: remove debug output, review TODOs/FIXMEs, remove temp code
+- [x] Documentation review: check `.vibe/docs/requirements.md` and `.vibe/docs/design.md`
+- [x] Final validation: run tests, verify docs
+
+### Completed
+
+- [x] Verified `deployment/homelab/src/index.ts` compiles with `tsc --noEmit` (no new errors)
+- [x] **Code Cleanup**: Scanned `deployment/homelab/src/index.ts` for debug output, TODOs, FIXMEs, and temporary code. No issues found. The `DEBUG_HEADERS` env var is legitimate runtime configuration. No TODO/FIXME comments exist. No commented-out or experimental code related to this feature.
+- [x] **Documentation Review**: `requirements.md` and `design.md` are empty templates; no updates needed for this minor deployment config change.
+- [x] **Final Validation**: Existing tests in `models.test.ts` cover `models.ts` only and are unaffected by our changes. Syntactic review confirms valid TypeScript.
+- [x] **Commit & PR**: Committed all changes and created PR #69 → `main`
+- [x] **Parameter refinement**: Updated `top_k` from 40 → 20 and added `top_p=0.95` per Qwen model card coding recommendations. PR updated.
+
+PR: https://github.com/mrsimpson/opencode/pull/69
+
+---
+
+_This plan is maintained by the LLM. Tool responses provide guidance on which section to focus on and what tasks are next._
diff --git a/deployment/homelab/src/index.ts b/deployment/homelab/src/index.ts
index 398d5571c544..6e953c7fa4f8 100644
--- a/deployment/homelab/src/index.ts
+++ b/deployment/homelab/src/index.ts
@@ -210,6 +210,7 @@ function parseFlinkerModel(m: FlinkerModel): [string, object] | null {
       name: `${m.id} (local${m.status.value === "loaded" ? ", loaded" : ""})`,
       tool_call: true,
       ...(ctx ? { limit: { context: ctx, output: Math.min(ctx, 32768) } } : {}),
+      ...(m.id === "qwen3.6-35b-a3b" ? { options: { top_k: 20, top_p: 0.95, temperature: 0.6 } } : {}),
     },
   ]
 }
@@ -245,6 +246,7 @@ const configMap = new k8s.core.v1.ConfigMap(
       // Only contains the parts that need to be dynamic (model lists).
       "opencode.json": JSON.stringify(
         {
+          model: "flinker/qwen3.6-35b-a3b",
           provider: {
             openrouter: {
               models: paid,
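For review purposes, the combined effect of the two hunks above can be sketched in isolation. This is a hedged, self-contained approximation rather than the actual `parseFlinkerModel`: the `FlinkerModelLike` shape, `buildModelEntry`, and the `SAMPLING_OVERRIDES` lookup table are hypothetical names introduced for illustration, and the real function reads its context length and load status from fields not shown in this diff.

```typescript
// Hypothetical, simplified model shape; the real FlinkerModel type is not shown in the diff.
interface FlinkerModelLike {
  id: string
  loaded: boolean
  contextLength?: number
}

interface SamplingOptions {
  top_k: number
  top_p: number
  temperature: number
}

// Assumption: a lookup table instead of the inline `m.id === ...` check,
// so further per-model overrides would not need more conditionals.
const SAMPLING_OVERRIDES: Partial<Record<string, SamplingOptions>> = {
  "qwen3.6-35b-a3b": { top_k: 20, top_p: 0.95, temperature: 0.6 },
}

// Builds one [id, config] entry; `options` is only present for models with an override.
function buildModelEntry(m: FlinkerModelLike): [string, Record<string, unknown>] {
  const ctx = m.contextLength
  const options = SAMPLING_OVERRIDES[m.id]
  return [
    m.id,
    {
      name: `${m.id} (local${m.loaded ? ", loaded" : ""})`,
      tool_call: true,
      ...(ctx ? { limit: { context: ctx, output: Math.min(ctx, 32768) } } : {}),
      ...(options ? { options } : {}),
    },
  ]
}

// Illustrative input only; real models are fetched from the flinker endpoint at deploy time.
const sampleModels: FlinkerModelLike[] = [
  { id: "qwen3.6-35b-a3b", loaded: true, contextLength: 131072 },
  { id: "other-model", loaded: false },
]

const models = Object.fromEntries(sampleModels.map(buildModelEntry))

// Shape of the dynamic opencode.json fragment: root default model plus the flinker model list.
const config = {
  model: "flinker/qwen3.6-35b-a3b",
  provider: { flinker: { models } },
}

console.log(JSON.stringify(config, null, 2))
```

The inline ternary in the diff is the smaller change for a single model; a table like the one above is one possible follow-up if more models need their own sampling parameters.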