Non-vision models blocked from passing images to vision-capable MCP tools #29216

@andrea-tomassi

Description

When using a non-vision model, image attachments are rejected at the backend level before the LLM can process them. The model receives an error text instead of the image, so it can't delegate the image to a vision-capable MCP tool (like zai-mcp-server_analyze_image).

This is inconsistent: a vision model and a non-vision model are both capable of calling the same MCP tool to analyze an image — neither uses its own "vision" for the actual analysis. But only the vision model can receive the image reference in the first place.

Steps to Reproduce

  1. Select a non-vision model (e.g., one without image in its modalities.input)
  2. Upload an image via drag-and-drop or the attach button (this works — frontend gating doesn't exist)
  3. Submit the prompt
  4. The LLM receives: ERROR: Cannot read "image.png" (this model does not support image input). Inform the user.
  5. The LLM cannot call zai-mcp-server_analyze_image (or any other vision tool) because the image reference is gone

Root Cause

In packages/opencode/src/provider/transform.ts, the unsupportedParts() function (lines 391–427) replaces image/file parts with an error text when the model lacks capabilities.input.image:

// line 414-416
const modality = mimeToModality(mime)         // → "image"
if (!modality) return part                     
if (model.capabilities.input[modality]) return part  // passes through

// line 418-421 — image stripped, replaced with error
return {
  type: "text" as const,
  text: `ERROR: Cannot read ${name} (this model does not support ${modality} input). Inform the user.`,
}

This is called at line 430 on every message before dispatch.

Meanwhile, the frontend (packages/app/src/components/prompt-input.tsx) has zero awareness of model capabilities: the attach button is always active, and drag-and-drop and paste always work. The result is a misleading UX: the upload appears to succeed, but the backend replaces the attachment with an error text before the model ever sees it.

Expected Behavior

A non-vision model should still receive the image as context (even if only as a text reference to the file), so it can dispatch the image to a vision-capable MCP tool (e.g., zai-mcp-server_analyze_image). The model itself doesn't need to support image input — it just needs to know the image exists and have a reference to pass to a tool.

Suggested Fix

Instead of replacing unsupported image parts with a hard error, pass the file metadata as a text part (e.g., [Attached image: "image.png" (image/png)]) so the LLM can still reason about it and route it to tools that handle images.
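
The suggested fix could look roughly like the sketch below. It is not the actual code from transform.ts: the Part and Model types are simplified stand-ins, the mimeToModality() body is an assumption (only its name appears in the excerpt above), and describeUnsupportedPart() is a hypothetical name for the per-part logic inside unsupportedParts().

```typescript
// Hypothetical, simplified types; the real ones in transform.ts may differ.
type Modality = "image" | "audio" | "video" | "pdf";

interface Model {
  capabilities: { input: Partial<Record<Modality, boolean>> };
}

type Part =
  | { type: "text"; text: string }
  | { type: "file"; mediaType: string; filename?: string };

// Assumed stand-in for the mimeToModality() helper referenced in the issue.
function mimeToModality(mime: string): Modality | undefined {
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  if (mime === "application/pdf") return "pdf";
  return undefined;
}

// Hypothetical replacement for the error branch: instead of an ERROR text,
// emit a neutral metadata reference the model can hand to a tool.
function describeUnsupportedPart(part: Part, model: Model): Part {
  if (part.type !== "file") return part;
  const modality = mimeToModality(part.mediaType);
  if (!modality) return part; // unknown type: leave untouched
  if (model.capabilities.input[modality]) return part; // supported: pass through

  const name = part.filename ?? "unnamed";
  return {
    type: "text",
    text: `[Attached ${modality}: "${name}" (${part.mediaType})]`,
  };
}

// Usage: a model without image input still learns that the attachment exists.
const textOnlyModel: Model = { capabilities: { input: { image: false } } };
const attachment: Part = { type: "file", mediaType: "image/png", filename: "image.png" };
const described = describeUnsupportedPart(attachment, textOnlyModel);
```

Whether a filename alone is enough for a tool such as zai-mcp-server_analyze_image to locate the file is not established here; the reference text may also need to carry a path or URL, depending on how attachments are stored.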

Environment

  • OpenCode version: current dev branch
  • Reproducible: always
