Description
When using a non-vision model, image attachments are rejected at the backend level before the LLM can process them. The model receives an error text instead of the image, so it can't delegate the image to a vision-capable MCP tool (like zai-mcp-server_analyze_image).
This is inconsistent: a vision model and a non-vision model are both capable of calling the same MCP tool to analyze an image — neither uses its own "vision" for the actual analysis. But only the vision model can receive the image reference in the first place.
Steps to Reproduce
- Select a non-vision model (e.g., one without image in its modalities.input)
- Upload an image via drag-and-drop or the attach button (this works — frontend gating doesn't exist)
- Submit the prompt
- The LLM receives:
ERROR: Cannot read "image.png" (this model does not support image input). Inform the user.
- The LLM cannot call zai-mcp-server_analyze_image (or any other vision tool) because the image reference is gone
Root Cause
In packages/opencode/src/provider/transform.ts, the unsupportedParts() function (lines 391–427) replaces image/file parts with an error text when the model lacks capabilities.input.image:
// line 414-416
const modality = mimeToModality(mime) // → "image"
if (!modality) return part
if (model.capabilities.input[modality]) return part // passes through
// line 418-421 — image stripped, replaced with error
return {
  type: "text" as const,
  text: `ERROR: Cannot read ${name} (this model does not support ${modality} input). Inform the user.`,
}
This is called at line 430 on every message before dispatch.
Meanwhile, the frontend (packages/app/src/components/prompt-input.tsx) has zero awareness of model capabilities — the attach button is always active, drag-and-drop always works, paste always works. This creates a misleading UX where you can upload but the backend silently strips it.
Expected Behavior
A non-vision model should still receive the image as context (even if only as a text reference to the file), so it can dispatch the image to a vision-capable MCP tool (e.g., zai-mcp-server_analyze_image). The model itself doesn't need to support image input — it just needs to know the image exists and have a reference to pass to a tool.
Suggested Fix
Instead of replacing unsupported image parts with a hard error, pass the file metadata as a text part (e.g., [Attached image: "image.png" (image/png)]) so the LLM can still reason about it and route it to tools that handle images.
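A minimal sketch of what that fallback could look like, in TypeScript. The types, the mimeToModality stand-in, and the function name describeUnsupportedPart are all hypothetical and simplified for illustration; the real definitions in transform.ts may differ. The only change from the current behavior is the text that replaces an unsupported part.

```typescript
// Hypothetical, simplified shapes. The real types in transform.ts may differ.
type Modality = "image" | "audio" | "video" | "pdf"

interface FilePart {
  type: "file"
  mime: string
  filename?: string
  url: string
}

interface TextPart {
  type: "text"
  text: string
}

type Part = FilePart | TextPart

interface Model {
  capabilities: { input: Partial<Record<Modality, boolean>> }
}

// Assumed stand-in for the existing mimeToModality helper.
function mimeToModality(mime: string): Modality | undefined {
  if (mime.startsWith("image/")) return "image"
  if (mime.startsWith("audio/")) return "audio"
  if (mime.startsWith("video/")) return "video"
  if (mime === "application/pdf") return "pdf"
  return undefined
}

// Hypothetical replacement for the error branch: keep supported parts as-is,
// and turn unsupported ones into a neutral metadata reference, not an error.
function describeUnsupportedPart(part: Part, model: Model): Part {
  if (part.type !== "file") return part
  const modality = mimeToModality(part.mime)
  if (!modality) return part
  if (model.capabilities.input[modality]) return part // passes through
  const name = part.filename ?? "unnamed"
  return {
    type: "text",
    text: `[Attached ${modality}: "${name}" (${part.mime})]`,
  }
}
```

For a tool to actually open the file, the text part would probably also need to carry a path or URL the tool can resolve; which reference is appropriate depends on how attachments are stored, so it is left out of this sketch.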
Environment
- OpenCode version: current dev branch
- Reproducible: always