Ship opt-in code mode for vMCP #4742

@jerm-dro

Description

User Story

As a platform engineer,
I want my agents to be able to execute scripts on tools without shell access
so that they can safely reduce context bloat and inference cycles.

Background

Agents today make sequential tool calls with model inference between each one. For multi-service workflows (e.g., incident triage across PagerDuty, Datadog, Slack, Jira, GitHub, Confluence), this means 10+ round-trips and significant token spend. The Starlark script middleware lets agents submit a single script that calls multiple tools server-side, with loops, conditionals, and parallel() fan-out, returning an aggregated result in one tool call.
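
For illustration, a hypothetical incident-triage script under this model might look like the following sketch (the tool names and the `result` convention are illustrative; `call_tool()` and `parallel()` are the builtins named in this issue):

```starlark
# Hypothetical script submitted in a single execute_tool_script call.
incident = call_tool("pagerduty_get_incident", {"id": "P123"})

def fetch_logs():
    return call_tool("datadog_search_logs", {"query": incident["service"]})

def fetch_alerts():
    return call_tool("datadog_list_alerts", {"service": incident["service"]})

# Fan out independent lookups concurrently instead of two more round-trips.
logs, alerts = parallel(fetch_logs, fetch_alerts)

if incident["severity"] == "critical":
    call_tool("slack_post_message", {"channel": "#incidents", "text": incident["title"]})

# One aggregated result is returned to the model in a single tool call.
result = {"incident": incident, "log_count": len(logs), "alerts": alerts}
```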

A working prototype exists in draft PR #4714 (branch jerm-dro/script-middleware-prototype). This story hardens that prototype into a shippable, opt-in feature by addressing all known limitations.

Scope

What this story delivers

  1. Config toggle -- Add an opt-in boolean flag to the VirtualMCPServer spec (or vMCP server config) that enables/disables the script middleware. Defaults to disabled. When disabled, execute_tool_script must not appear in tools/list responses.

  2. Inner tool call mechanism -- Replace httptest.NewRecorder usage in innerToolCaller.CallTool and fetchToolList with a proper internal dispatch mechanism. The current approach creates synthetic HTTP round-trips through httptest.NewRecorder; the replacement should invoke the middleware chain without constructing real HTTP request/response pairs (e.g., via a direct function call interface or an internal dispatcher).

  3. Step limit configuration -- Make the per-script Starlark step limit configurable rather than hardcoded at DefaultStepLimit (100,000). Expose this via the same config surface as the opt-in toggle. Provide a sensible default.

  4. Concurrency cap for parallel() -- Add a configurable maximum number of goroutines that parallel() can spawn concurrently. Currently parallelBuiltin launches one goroutine per callable with no bound. The cap should be configurable via the script config and have a sensible default (e.g., 10).

  5. Per-tool-call timeout -- Add a configurable timeout for individual tool calls made from within a script. If a tool call exceeds the timeout, it should be cancelled and return a clear error to the script. Expose via config with a sensible default.

  6. Result unwrapping -- Make the {"result": value} structured content unwrapping in parseToolResult robust across response formats. The current logic only handles the single-key {"result": ...} pattern from the mcp-go SDK. It should handle: (a) direct structured content without the wrapper, (b) multiple content items, (c) mixed text/structured responses.

  7. Error handling and messages -- Provide clear, actionable error messages when scripts hit step limits, tool call timeouts, or concurrency caps. Errors should include which limit was exceeded and the configured value.

  8. Optimizer compatibility -- Verify and ensure the script middleware works correctly when the vMCP optimizer is enabled. The optimizer may transform tool lists; execute_tool_script must remain visible and functional.

Out of scope

  • Observability (logging/metrics) -- covered by STORY-002
  • Adoption tracking metrics -- covered by STORY-003
  • SSE transport support
  • Full RFC THV-0060 session model (backends(), publish(), presets)
  • Starlark sandbox security beyond step limits
  • User-facing documentation (separate follow-up)

Acceptance Criteria

  • unit: When code mode is disabled (default), execute_tool_script does not appear in tools/list; when enabled, it does
  • unit: A Starlark script submitted via execute_tool_script can call multiple tools, use loops/conditionals, and return an aggregated result
  • unit: parallel() fans out tool calls concurrently and returns results in order
  • unit: When a script exceeds the configured step limit, execution stops and returns an error identifying the limit and configured value
  • unit: When parallel() exceeds the configured concurrency cap, excess callables are queued or rejected with a clear error
  • unit: When an inner tool call exceeds the configured timeout, that call returns a timeout error without hanging the script
  • unit: Result unwrapping handles: direct structured content, mcp-go SDK {"result": value} wrapper, multi-item responses, and plain text — unknown formats returned as-is
  • unit: When the optimizer is enabled, execute_tool_script remains in tools/list and inner tool calls resolve correctly through the optimized chain
  • acceptance: A VirtualMCPServer with code mode enabled accepts a Starlark script via execute_tool_script, executes tool calls through the proxy, and returns the aggregated result end-to-end
  • acceptance: Step limit, concurrency cap, and tool call timeout are configurable via VirtualMCPServer spec with sensible defaults

Technical Details

Key files (prototype, branch jerm-dro/script-middleware-prototype)

  • pkg/script/engine.go -- Starlark execution engine (Execute function, step limit, script wrapping)
  • pkg/script/bridge.go -- Tool bridge: MCP tools to Starlark callables, parallel() builtin, call_tool(), type conversion
  • pkg/script/middleware.go -- HTTP middleware: intercepts execute_tool_script calls, injects the virtual tool into tools/list, inner tool dispatch via innerToolCaller
  • pkg/vmcp/server/server.go -- Server config struct (ScriptMiddleware field), middleware chain wiring
  • cmd/vmcp/app/commands.go -- CLI wiring: creates script.NewMiddleware() and passes it to the server config

Architecture context

  • The script middleware sits above authz in the vMCP middleware chain (outer in wrapping order, runs after authz in execution order)
  • It intercepts execute_tool_script tool/call requests and executes the Starlark script
  • Inner tool calls from scripts flow through the rest of the middleware chain (authz, discovery, etc.)
  • parallel() executes callables concurrently using goroutines
  • The middleware injects execute_tool_script into tools/list responses with a dynamic description listing available tools

Config design guidance

The config toggle and tuning parameters (step limit, concurrency cap, tool call timeout) should be grouped under a codeMode or script section in the VirtualMCPServer spec. Example shape:

spec:
  config:
    codeMode:
      enabled: false          # opt-in toggle
      stepLimit: 100000       # max Starlark execution steps
      parallelMaxConcurrency: 10  # max goroutines for parallel()
      toolCallTimeout: 30s    # per-tool-call timeout

Known prototype limitations to address

  1. httptest.NewRecorder in innerToolCaller.CallTool and fetchToolList (middleware.go lines ~300-350, ~230-270) -- creates unnecessary HTTP serialization overhead and couples to net/http/httptest
  2. Hardcoded step limit in engine.go (DefaultStepLimit = 100_000) -- not configurable at runtime
  3. Unbounded parallel() goroutines in bridge.go (parallelBuiltin) -- no semaphore or concurrency cap
  4. No tool call timeout -- callToolAndConvert uses the parent context with no per-call deadline
  5. Fragile result unwrapping in bridge.go (parseToolResult) -- only handles {"result": value} single-key pattern
  6. Always-on -- script.NewMiddleware() is unconditionally passed to the server config in commands.go

Dependencies

  • go.starlark.net -- Starlark interpreter (already a dependency in the prototype)
  • Existing vMCP middleware chain and authz middleware
  • VirtualMCPServer CRD (for config flag in Kubernetes deployments)

Labels

  • code-mode -- vMCP Code Mode (Starlark script middleware)
  • enhancement -- New feature or request
  • go -- Pull requests that update go code
  • vmcp -- Virtual MCP Server related issues
