Proposal: message.parts.before plugin hook for multimodal preflight #24125

Description

@alohaninja

Problem

When a user pastes a large image (e.g., a 5MB Retina screenshot at 2816×1536), OpenCode returns:

Failed to send message (400): Malformed JSON in request body

The full base64-encoded image (~6.7MB) is embedded in the JSON POST body from the TUI to the internal server. Hono's JSON parser in packages/opencode/src/server/routes/instance/session.ts fails before any plugin hook fires.
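The ~6.7MB figure is consistent with base64's expansion of every 3 input bytes into 4 output characters. A quick sketch of the arithmetic, assuming "5MB" means 5,000,000 bytes and ignoring the short `data:image/png;base64,` prefix and JSON quoting (the function name is illustrative, not part of OpenCode):

```typescript
// Base64 encodes each group of 3 bytes as 4 characters; the final group is padded.
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

const rawBytes = 5_000_000; // assumption: "5MB" read as 5,000,000 bytes
const encodedChars = base64Length(rawBytes);
const encodedMB = (encodedChars / 1_000_000).toFixed(1);

console.log(`${encodedChars} base64 characters (~${encodedMB} MB)`);
```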

Beyond the crash, there's a broader problem: no coding assistant today preprocesses images to reduce token waste. A single Retina screenshot consumes ~1,568 Anthropic tokens, so four screenshots in a session burn ~6,300 tokens (4 × 1,568) on images alone before the model reads a word of the prompt.

Root cause

The image pipeline has no preprocessing or size limits at any stage:

Clipboard paste → pasteAttachment() → data:image/png;base64,<FULL_SIZE>
    ↓
POST /session/{id}/message (7MB+ JSON body)
    ↓  ← JSON.parse() fails here (400)
createUserMessage() → resolvePart()
    ↓
plugin.trigger("chat.message")  ← never reached
    ↓
AI SDK → LLM API

The existing chat.message hook fires after the JSON body is parsed — too late to prevent the crash.

Proposal: message.parts.before hook

Add a hook that fires in the TUI, after parts are collected but before the HTTP POST is constructed:

"message.parts.before"?: (
  input: { sessionID: string; agent?: string },
  output: { parts: PartInput[] },  // mutable
) => Promise<void>;

Where it fires — in prompt/index.tsx, inside submit(), before sdk.client.session.prompt():

const parts = [textPart, ...nonTextParts]
await pluginTrigger("message.parts.before", { sessionID, agent }, { parts })
sdk.client.session.prompt({ sessionID, parts })

This follows the same intercept-transform-passthrough pattern as tool.execute.before (used by RTK for command output compression).
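To make the intercept-transform-passthrough idea concrete, here is a minimal sketch of how a dispatcher for this hook could behave. It is not OpenCode's actual `pluginTrigger`; the types, `triggerPartsBefore`, and `shrinkImages` are hypothetical stand-ins, and the fail-open error handling is an assumption about what a preflight hook should do.

```typescript
// Hypothetical stand-ins for OpenCode's real types; only the fields used here.
type PartInput = { type: string; mime?: string; url?: string; text?: string };
type HookInput = { sessionID: string; agent?: string };
type HookOutput = { parts: PartInput[] };
type PartsBeforeHook = (input: HookInput, output: HookOutput) => Promise<void>;

// Run each registered hook in order against the same mutable output object.
// Assumption: a failing hook is logged and skipped, so the message is still sent.
async function triggerPartsBefore(
  hooks: PartsBeforeHook[],
  input: HookInput,
  output: HookOutput,
): Promise<HookOutput> {
  for (const hook of hooks) {
    try {
      await hook(input, output);
    } catch (err) {
      console.warn("message.parts.before hook failed; continuing", err);
    }
  }
  return output;
}

// Example hook: swaps image URLs for a placeholder standing in for an optimized image.
const shrinkImages: PartsBeforeHook = async (_input, output) => {
  for (const part of output.parts) {
    if (part.type === "file" && part.mime?.startsWith("image/")) {
      part.url = "data:image/png;base64,AAAA";
    }
  }
};
```

Because hooks mutate `output.parts` in place, the caller can hand the same array to the HTTP client afterwards; parts that no hook touches pass through unchanged.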

Why not AI SDK middleware?

The Vercel AI SDK has LanguageModelV3Middleware with transformParams which can modify the prompt array including images. However, this fires after the JSON body has already been parsed by the server — it doesn't fix the 400 crash. The message.parts.before hook is needed at the transport layer.

AI SDK middleware could complement this as a second line of defense, but the TUI-side hook is necessary to prevent the crash.

Landscape: no coding assistant does this today

| Tool | Crashes on large images? | Wastes tokens? | Plugin can preprocess images? |
| --- | --- | --- | --- |
| OpenCode | Yes (400 error) | Yes | No (hook fires too late) |
| Claude Code | No | Yes | No (UserPromptSubmit is text-only) |
| Codex | No | Yes | No (same limitation) |
| Cursor | No | Yes | No (no plugin system) |

OpenCode could be the first coding assistant with automatic multimodal preflight — a differentiator for the plugin system.

Proof of concept: SHIFT plugin

SHIFT is an open-source multimodal preflight layer (Rust, Apache-2.0) that inspects and optimizes images in AI API payloads. With the proposed hook, a SHIFT plugin for OpenCode becomes trivial:

import type { Plugin } from "@opencode-ai/plugin";

export const server: Plugin = async ({ $ }) => {
  // Register no hooks if the shift-ai CLI is not installed.
  try { await $`which shift-ai`.quiet() } catch { return {} }

  return {
    "message.parts.before": async (_input, output) => {
      for (const part of output.parts) {
        // Only inline (data: URL) image file parts are candidates.
        const filePart = part as any;
        if (filePart.type !== "file") continue;
        if (!filePart.mime?.startsWith("image/")) continue;

        const url: string = filePart.url ?? "";
        const match = url.match(/^data:(image\/[^;]+);base64,(.+)$/s);
        if (!match) continue;

        const [, mediaType, base64Data] = match;
        if (base64Data.length < 512_000) continue; // skip small images

        // Wrap the image in an Anthropic-style request body for shift-ai to rewrite.
        const payload = JSON.stringify({
          model: "claude-sonnet-4-20250514",
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              { type: "image", source: { type: "base64", media_type: mediaType, data: base64Data } },
              { type: "text", text: "describe" },
            ],
          }],
        });

        // Random suffix so two images handled in the same millisecond don't collide.
        const id = `${Date.now()}-${crypto.randomUUID()}`;
        const tmpIn = `/tmp/shift-in-${id}.json`;
        const tmpOut = `/tmp/shift-out-${id}.json`;

        try {
          await Bun.write(tmpIn, payload);

          const result = await $`shift-ai ${tmpIn} -p anthropic -m economy > ${tmpOut}`
            .quiet().nothrow();
          if (result.exitCode === 0) {
            const optimized = JSON.parse(await Bun.file(tmpOut).text());
            const source = optimized?.messages?.[0]?.content?.[0]?.source;
            const newData = source?.data;
            const newMime = source?.media_type ?? mediaType;
            // Only swap in the result if it is actually smaller.
            if (newData && newData.length < base64Data.length) {
              filePart.url = `data:${newMime};base64,${newData}`;
            }
          }
        } catch {
          // Fail open: keep the original image if optimization fails.
        } finally {
          // Always remove temp files, even if parsing the output threw.
          await $`rm -f ${tmpIn} ${tmpOut}`.quiet().nothrow();
        }
      }
    },
  };
};
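The data-URL parsing and size threshold in the plugin above can be exercised on their own, without OpenCode or the shift-ai CLI. A small sketch, with hypothetical helper names (`parseImageDataUrl`, `shouldOptimize`); the pattern uses `[\s\S]` in place of the `s` flag, which should match the same inputs:

```typescript
type ParsedDataUrl = { mediaType: string; base64Data: string };

// Split an inline image data URL into its media type and base64 payload.
// Returns null for non-image or non-base64 URLs.
function parseImageDataUrl(url: string): ParsedDataUrl | null {
  const match = url.match(/^data:(image\/[^;]+);base64,([\s\S]+)$/);
  if (!match) return null;
  return { mediaType: match[1], base64Data: match[2] };
}

// Same threshold as the plugin: payloads under 512,000 base64 characters are skipped.
const MIN_BASE64_CHARS = 512_000;

function shouldOptimize(url: string): boolean {
  const parsed = parseImageDataUrl(url);
  return parsed !== null && parsed.base64Data.length >= MIN_BASE64_CHARS;
}

console.log(parseImageDataUrl("data:image/png;base64,AAAA"));
console.log(shouldOptimize("data:image/png;base64,AAAA"));
```

Note that the threshold is measured in base64 characters, so it corresponds to roughly 384,000 bytes of raw image data.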

Tested results on a 2816×1536 Retina screenshot:

| Metric | Before | After | Savings |
| --- | --- | --- | --- |
| Payload size | 9.2 MB | 1.2 MB | 87% |
| Anthropic tokens | 1,568 | 773 | 51% |
| Image dimensions | 2816×1536 | 1023×558 | |
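The savings column can be reproduced from the before and after values. A short sketch of that arithmetic (the helper name is illustrative):

```typescript
// Percentage reduction from `before` to `after`, rounded to the nearest whole percent.
function savingsPercent(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

const payloadSavings = savingsPercent(9.2, 1.2);  // payload size in MB
const tokenSavings = savingsPercent(1568, 773);   // Anthropic tokens

console.log(`payload: ${payloadSavings}%, tokens: ${tokenSavings}%`);
// prints "payload: 87%, tokens: 51%"
```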

More about SHIFT: shift-ai.dev · GitHub · Guide
