Ship opt-in code mode for vMCP #4742

@jerm-dro

Description

User Story

As a platform engineer,
I want my agents to be able to execute scripts on tools without shell access
so that they can safely reduce context bloat and inference cycles.

Background

Agents today make sequential tool calls with model inference between each one. For multi-service workflows (e.g., incident triage across PagerDuty, Datadog, Slack, Jira, GitHub, Confluence), this means 10+ round-trips and significant token spend. The Starlark script middleware lets agents submit a single script that calls multiple tools server-side, with loops, conditionals, and parallel() fan-out, returning an aggregated result in one tool call.
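
For illustration, a hypothetical incident-triage script under this model might look like the following sketch (the tool names and the `result` convention are illustrative; `call_tool()` and `parallel()` are the builtins named in this issue):

```starlark
# Hypothetical script submitted in a single execute_tool_script call.
incident = call_tool("pagerduty_get_incident", {"id": "P123"})

def fetch_logs():
    return call_tool("datadog_search_logs", {"query": incident["service"]})

def fetch_alerts():
    return call_tool("datadog_list_alerts", {"service": incident["service"]})

# Fan out independent lookups concurrently instead of two more round-trips.
logs, alerts = parallel(fetch_logs, fetch_alerts)

if incident["severity"] == "critical":
    call_tool("slack_post_message", {"channel": "#incidents", "text": incident["title"]})

# One aggregated result is returned to the model in a single tool call.
result = {"incident": incident, "log_count": len(logs), "alerts": alerts}
```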

A working prototype exists in draft PR #4714 (branch jerm-dro/script-middleware-prototype). This story hardens that prototype into a shippable, opt-in feature by addressing all known limitations.

Scope

What this story delivers

  1. Config toggle -- Add an opt-in boolean flag to the VirtualMCPServer spec (or vMCP server config) that enables/disables the script middleware. Defaults to disabled. When disabled, execute_tool_script must not appear in tools/list responses.

  2. Inner tool call mechanism -- Replace httptest.NewRecorder usage in innerToolCaller.CallTool and fetchToolList with a proper internal dispatch mechanism. The current approach creates synthetic HTTP round-trips through httptest.NewRecorder; the replacement should invoke the middleware chain without constructing real HTTP request/response pairs (e.g., via a direct function call interface or an internal dispatcher).

  3. Step limit configuration -- Make the per-script Starlark step limit configurable rather than hardcoded at DefaultStepLimit (100,000). Expose this via the same config surface as the opt-in toggle. Provide a sensible default.

  4. Concurrency cap for parallel() -- Add a configurable maximum number of goroutines that parallel() can spawn concurrently. Currently parallelBuiltin launches one goroutine per callable with no bound. The cap should be configurable via the script config and have a sensible default (e.g., 10).

  5. Per-tool-call timeout -- Add a configurable timeout for individual tool calls made from within a script. If a tool call exceeds the timeout, it should be cancelled and return a clear error to the script. Expose via config with a sensible default.

  6. Result unwrapping -- Make the {"result": value} structured content unwrapping in parseToolResult robust across response formats. The current logic only handles the single-key {"result": ...} pattern from the mcp-go SDK. It should handle: (a) direct structured content without the wrapper, (b) multiple content items, (c) mixed text/structured responses.

  7. Error handling and messages -- Provide clear, actionable error messages when scripts hit step limits, tool call timeouts, or concurrency caps. Errors should include which limit was exceeded and the configured value.

  8. Optimizer compatibility -- Verify and ensure the script middleware works correctly when the vMCP optimizer is enabled. The optimizer may transform tool lists; execute_tool_script must remain visible and functional.

Out of scope

  • Observability (logging/metrics) -- covered by STORY-002
  • Adoption tracking metrics -- covered by STORY-003
  • SSE transport support
  • Full RFC THV-0060 session model (backends(), publish(), presets)
  • Starlark sandbox security beyond step limits
  • User-facing documentation (separate follow-up)

Acceptance Criteria

  • unit: When code mode is disabled (default), execute_tool_script does not appear in tools/list; when enabled, it does
  • unit: A Starlark script submitted via execute_tool_script can call multiple tools, use loops/conditionals, and return an aggregated result
  • unit: parallel() fans out tool calls concurrently and returns results in order
  • unit: When a script exceeds the configured step limit, execution stops and returns an error identifying the limit and configured value
  • unit: When parallel() exceeds the configured concurrency cap, excess callables are queued or rejected with a clear error
  • unit: When an inner tool call exceeds the configured timeout, that call returns a timeout error without hanging the script
  • unit: Result unwrapping handles: direct structured content, mcp-go SDK {"result": value} wrapper, multi-item responses, and plain text — unknown formats returned as-is
  • unit: When the optimizer is enabled, execute_tool_script remains in tools/list and inner tool calls resolve correctly through the optimized chain
  • acceptance: A VirtualMCPServer with code mode enabled accepts a Starlark script via execute_tool_script, executes tool calls through the proxy, and returns the aggregated result end-to-end
  • acceptance: Step limit, concurrency cap, and tool call timeout are configurable via VirtualMCPServer spec with sensible defaults

Technical Details

Key files (prototype, branch jerm-dro/script-middleware-prototype)

  • pkg/script/engine.go -- Starlark execution engine (Execute function, step limit, script wrapping)
  • pkg/script/bridge.go -- Tool bridge: MCP tools to Starlark callables, parallel() builtin, call_tool(), type conversion
  • pkg/script/middleware.go -- HTTP middleware: intercepts execute_tool_script calls, injects the virtual tool into tools/list, inner tool dispatch via innerToolCaller
  • pkg/vmcp/server/server.go -- Server config struct (ScriptMiddleware field), middleware chain wiring
  • cmd/vmcp/app/commands.go -- CLI wiring: creates script.NewMiddleware() and passes it to the server config

Architecture context

  • The script middleware sits above authz in the vMCP middleware chain (outer in wrapping order, runs after authz in execution order)
  • It intercepts execute_tool_script tool/call requests and executes the Starlark script
  • Inner tool calls from scripts flow through the rest of the middleware chain (authz, discovery, etc.)
  • parallel() executes callables concurrently using goroutines
  • The middleware injects execute_tool_script into tools/list responses with a dynamic description listing available tools

Config design guidance

The config toggle and tuning parameters (step limit, concurrency cap, tool call timeout) should be grouped under a codeMode or script section in the VirtualMCPServer spec. Example shape:

spec:
  config:
    codeMode:
      enabled: false          # opt-in toggle
      stepLimit: 100000       # max Starlark execution steps
      parallelMaxConcurrency: 10  # max goroutines for parallel()
      toolCallTimeout: 30s    # per-tool-call timeout

Known prototype limitations to address

  1. httptest.NewRecorder in innerToolCaller.CallTool and fetchToolList (middleware.go lines ~300-350, ~230-270) -- creates unnecessary HTTP serialization overhead and couples to net/http/httptest
  2. Hardcoded step limit in engine.go (DefaultStepLimit = 100_000) -- not configurable at runtime
  3. Unbounded parallel() goroutines in bridge.go (parallelBuiltin) -- no semaphore or concurrency cap
  4. No tool call timeout -- callToolAndConvert uses the parent context with no per-call deadline
  5. Fragile result unwrapping in bridge.go (parseToolResult) -- only handles {"result": value} single-key pattern
  6. Always-on -- script.NewMiddleware() is unconditionally passed to the server config in commands.go

Dependencies

  • go.starlark.net -- Starlark interpreter (already a dependency in the prototype)
  • Existing vMCP middleware chain and authz middleware
  • VirtualMCPServer CRD (for config flag in Kubernetes deployments)

Labels

  • code-mode -- vMCP Code Mode (Starlark script middleware)
  • enhancement -- New feature or request
  • go -- Pull requests that update go code
  • vmcp -- Virtual MCP Server related issues
