As a platform engineer, I want my agents to be able to execute scripts on tools without shell access so that they can safely reduce context bloat and inference cycles.
Background
Agents today make sequential tool calls with model inference between each one. For multi-service workflows (e.g., incident triage across PagerDuty, Datadog, Slack, Jira, GitHub, Confluence), this means 10+ round-trips and significant token spend. The Starlark script middleware lets agents submit a single script that calls multiple tools server-side, with loops, conditionals, and parallel() fan-out, returning an aggregated result in one tool call.
A working prototype exists in draft PR #4714 (branch jerm-dro/script-middleware-prototype). This story hardens that prototype into a shippable, opt-in feature by addressing all known limitations.
Scope
What this story delivers
Config toggle -- Add an opt-in boolean flag to the VirtualMCPServer spec (or vMCP server config) that enables/disables the script middleware. Defaults to disabled. When disabled, execute_tool_script must not appear in tools/list responses.
Inner tool call mechanism -- Replace httptest.NewRecorder usage in innerToolCaller.CallTool and fetchToolList with a proper internal dispatch mechanism. The current approach creates synthetic HTTP round-trips through httptest.NewRecorder; the replacement should invoke the middleware chain without constructing real HTTP request/response pairs (e.g., via a direct function call interface or an internal dispatcher).
Step limit configuration -- Make the per-script Starlark step limit configurable rather than hardcoded at DefaultStepLimit (100,000). Expose this via the same config surface as the opt-in toggle. Provide a sensible default.
Concurrency cap for parallel() -- Add a configurable maximum number of goroutines that parallel() can spawn concurrently. Currently parallelBuiltin launches one goroutine per callable with no bound. The cap should be configurable via the script config and have a sensible default (e.g., 10).
Per-tool-call timeout -- Add a configurable timeout for individual tool calls made from within a script. If a tool call exceeds the timeout, it should be cancelled and return a clear error to the script. Expose via config with a sensible default.
Result unwrapping -- Make the {"result": value} structured content unwrapping in parseToolResult robust across response formats. The current logic only handles the single-key {"result": ...} pattern from the mcp-go SDK. It should handle: (a) direct structured content without the wrapper, (b) multiple content items, (c) mixed text/structured responses.
Error handling and messages -- Provide clear, actionable error messages when scripts hit step limits, tool call timeouts, or concurrency caps. Errors should include which limit was exceeded and the configured value.
Optimizer compatibility -- Verify and ensure the script middleware works correctly when the vMCP optimizer is enabled. The optimizer may transform tool lists; execute_tool_script must remain visible and functional.
Out of scope
Observability (logging/metrics) -- covered by STORY-002
Adoption tracking metrics -- covered by STORY-003
SSE transport support
Full RFC THV-0060 session model (backends(), publish(), presets)
Starlark sandbox security beyond step limits
User-facing documentation (separate follow-up)
Acceptance Criteria
unit: When code mode is disabled (default), execute_tool_script does not appear in tools/list; when enabled, it does
unit: A Starlark script submitted via execute_tool_script can call multiple tools, use loops/conditionals, and return an aggregated result
unit: parallel() fans out tool calls concurrently and returns results in order
unit: When a script exceeds the configured step limit, execution stops and returns an error identifying the limit and configured value
unit: When parallel() exceeds the configured concurrency cap, excess callables are queued or rejected with a clear error
unit: When an inner tool call exceeds the configured timeout, that call returns a timeout error without hanging the script
unit: Result unwrapping handles: direct structured content, mcp-go SDK {"result": value} wrapper, multi-item responses, and plain text — unknown formats returned as-is
unit: When the optimizer is enabled, execute_tool_script remains in tools/list and inner tool calls resolve correctly through the optimized chain
acceptance: A VirtualMCPServer with code mode enabled accepts a Starlark script via execute_tool_script, executes tool calls through the proxy, and returns the aggregated result end-to-end
acceptance: Step limit, concurrency cap, and tool call timeout are configurable via VirtualMCPServer spec with sensible defaults
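The result-unwrapping criterion above can be illustrated with a small decision function. This is a sketch under assumptions: toolResult is a simplified, hypothetical stand-in for the real MCP result type, unwrap is not the prototype's parseToolResult, and preferring structured content over text in mixed responses is a design choice made here for illustration only.

```go
package main

import "fmt"

// toolResult is a simplified, hypothetical stand-in for an MCP tool result.
type toolResult struct {
	Structured map[string]any // structured content; nil when absent
	Texts      []string       // text content items, in response order
}

// unwrap picks the value a script would see for a tool result.
func unwrap(r toolResult) any {
	if r.Structured != nil {
		// SDK-style wrapper: exactly one key, named "result".
		if v, ok := r.Structured["result"]; ok && len(r.Structured) == 1 {
			return v
		}
		// Direct structured content is returned as-is.
		return r.Structured
	}
	switch len(r.Texts) {
	case 0:
		return nil
	case 1:
		return r.Texts[0] // plain text
	default:
		// Multiple content items become a list, preserving order.
		items := make([]any, len(r.Texts))
		for i, s := range r.Texts {
			items[i] = s
		}
		return items
	}
}

func main() {
	fmt.Println(unwrap(toolResult{Structured: map[string]any{"result": 42}}))
	fmt.Println(unwrap(toolResult{Structured: map[string]any{"id": "INC-1", "open": true}}))
	fmt.Println(unwrap(toolResult{Texts: []string{"first", "second"}}))
	fmt.Println(unwrap(toolResult{Texts: []string{"plain text"}}))
}
```

A map that has a "result" key alongside other keys is treated as direct structured content, so the wrapper is only stripped when it is unambiguous.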
Technical Details
Key files (prototype, branch jerm-dro/script-middleware-prototype)
pkg/script/engine.go -- Execute function, step limit, script wrapping
pkg/script/bridge.go -- Tool bridge: MCP tools to Starlark callables, parallel() builtin, call_tool(), type conversion
pkg/script/middleware.go -- HTTP middleware: intercepts execute_tool_script calls, injects virtual tool into tools/list, inner tool dispatch via innerToolCaller
pkg/vmcp/server/server.go -- Server config struct (ScriptMiddleware field), middleware chain wiring
cmd/vmcp/app/commands.go -- CLI wiring: creates script.NewMiddleware() and passes to server config
Architecture context
The script middleware sits above authz in the vMCP middleware chain (outer in wrapping order, runs after authz in execution order)
It intercepts tools/call requests for execute_tool_script and executes the Starlark script
Inner tool calls from scripts flow through the rest of the middleware chain (authz, discovery, etc.)
parallel() executes callables concurrently using goroutines
The middleware injects execute_tool_script into tools/list responses with a dynamic description listing available tools
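The injection step, combined with the opt-in toggle, might look roughly like the following. This is an illustrative sketch only: the tool type, the withScriptTool function, and the description wording are hypothetical and are not taken from the prototype's middleware.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tool is a minimal, hypothetical stand-in for a tools/list entry.
type tool struct {
	Name        string
	Description string
}

const scriptToolName = "execute_tool_script"

// withScriptTool returns the tool list a client should see. When code mode is
// disabled the backend list is returned unchanged; when enabled the virtual
// tool is appended with a description naming the available tools.
func withScriptTool(enabled bool, backend []tool) []tool {
	if !enabled {
		return backend
	}
	names := make([]string, 0, len(backend))
	for _, t := range backend {
		names = append(names, t.Name)
	}
	sort.Strings(names) // stable description regardless of backend order
	out := make([]tool, 0, len(backend)+1)
	out = append(out, backend...)
	out = append(out, tool{
		Name:        scriptToolName,
		Description: "Run a Starlark script that can call these tools: " + strings.Join(names, ", "),
	})
	return out
}

func main() {
	backend := []tool{{Name: "slack_post"}, {Name: "jira_search"}}
	for _, t := range withScriptTool(true, backend) {
		fmt.Printf("%s: %s\n", t.Name, t.Description)
	}
	fmt.Println(len(withScriptTool(false, backend)), "tools when disabled")
}
```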
Config design guidance
The config toggle and tuning parameters (step limit, concurrency cap, tool call timeout) should be grouped under a codeMode or script section in the VirtualMCPServer spec. Example shape:
spec:
  config:
    codeMode:
      enabled: false               # opt-in toggle
      stepLimit: 100000            # max Starlark execution steps
      parallelMaxConcurrency: 10   # max goroutines for parallel()
      toolCallTimeout: 30s         # per-tool-call timeout
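On the Go side, the example shape could map to a struct with defaulting and validation along these lines. This is a sketch under assumptions: the type name CodeModeConfig and the methods WithDefaults and Validate are hypothetical, treating a zero value as "unset" is a choice made for illustration, and the default values simply mirror the example above.

```go
package main

import (
	"fmt"
	"time"
)

// CodeModeConfig mirrors the example codeMode section; hypothetical type.
type CodeModeConfig struct {
	Enabled                bool          // opt-in toggle, disabled by default
	StepLimit              uint64        // max Starlark execution steps
	ParallelMaxConcurrency int           // max goroutines for parallel()
	ToolCallTimeout        time.Duration // per-tool-call timeout
}

// WithDefaults fills unset (zero) tuning fields with the example defaults.
// Enabled is left alone, so the feature stays off unless requested.
func (c CodeModeConfig) WithDefaults() CodeModeConfig {
	if c.StepLimit == 0 {
		c.StepLimit = 100_000
	}
	if c.ParallelMaxConcurrency == 0 {
		c.ParallelMaxConcurrency = 10
	}
	if c.ToolCallTimeout == 0 {
		c.ToolCallTimeout = 30 * time.Second
	}
	return c
}

// Validate rejects values that cannot work, naming the offending field.
// It is meant to be called after WithDefaults.
func (c CodeModeConfig) Validate() error {
	if c.StepLimit == 0 {
		return fmt.Errorf("stepLimit must be at least 1, got %d", c.StepLimit)
	}
	if c.ParallelMaxConcurrency < 1 {
		return fmt.Errorf("parallelMaxConcurrency must be at least 1, got %d", c.ParallelMaxConcurrency)
	}
	if c.ToolCallTimeout <= 0 {
		return fmt.Errorf("toolCallTimeout must be positive, got %s", c.ToolCallTimeout)
	}
	return nil
}

func main() {
	cfg := CodeModeConfig{Enabled: true, ParallelMaxConcurrency: 4}.WithDefaults()
	fmt.Printf("%+v valid=%v\n", cfg, cfg.Validate() == nil)
}
```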
Known prototype limitations to address
httptest.NewRecorder in innerToolCaller.CallTool and fetchToolList (middleware.go lines ~300-350, ~230-270) -- creates unnecessary HTTP serialization overhead and couples to net/http/httptest
Hardcoded step limit in engine.go (DefaultStepLimit = 100_000) -- not configurable at runtime
Unbounded parallel() goroutines in bridge.go (parallelBuiltin) -- no semaphore or concurrency cap
No tool call timeout -- callToolAndConvert uses the parent context with no per-call deadline
Fragile result unwrapping in bridge.go (parseToolResult) -- only handles {"result": value} single-key pattern
Always-on -- script.NewMiddleware() is unconditionally passed to the server config in commands.go
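For the missing per-call deadline, one possible shape is to derive a child context per tool call and translate an expired deadline into an error that names the limit and its configured value. The sketch below is an assumption-laden illustration: callFunc, callWithTimeout, slowTool, and runDemo are hypothetical names, not the prototype's callToolAndConvert, and it assumes the underlying tool call honours context cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callFunc is a hypothetical signature for dispatching one inner tool call.
type callFunc func(ctx context.Context, name string, args map[string]any) (any, error)

// callWithTimeout gives a single tool call its own deadline, derived from the
// script's parent context, and reports which limit was hit and its value.
func callWithTimeout(parent context.Context, timeout time.Duration, call callFunc,
	name string, args map[string]any) (any, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	res, err := call(ctx, name, args)
	// Report a timeout only when this call's own deadline fired, not when the
	// parent context was cancelled or expired first.
	if err != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) && parent.Err() == nil {
		return nil, fmt.Errorf("tool %q exceeded toolCallTimeout (%s): %w",
			name, timeout, context.DeadlineExceeded)
	}
	return res, err
}

// slowTool simulates a backend that answers after delay unless cancelled.
func slowTool(delay time.Duration) callFunc {
	return func(ctx context.Context, _ string, _ map[string]any) (any, error) {
		select {
		case <-time.After(delay):
			return "ok", nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}

// runDemo calls a simulated tool with the given timeout and delay, both in
// milliseconds. It exists only to exercise callWithTimeout.
func runDemo(timeoutMs, delayMs int) (any, error) {
	timeout := time.Duration(timeoutMs) * time.Millisecond
	delay := time.Duration(delayMs) * time.Millisecond
	return callWithTimeout(context.Background(), timeout, slowTool(delay), "slow_tool", nil)
}

func main() {
	fmt.Println(runDemo(2000, 1))  // completes within the timeout
	fmt.Println(runDemo(20, 2000)) // exceeds the timeout
}
```

Because the error wraps context.DeadlineExceeded, callers can still test for it with errors.Is while scripts see a message that includes the tool name and the configured timeout.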
Dependencies
go.starlark.net -- Starlark interpreter (already a dependency in the prototype)
Existing vMCP middleware chain and authz middleware
VirtualMCPServer CRD (for config flag in Kubernetes deployments)