Prototype Starlark script middleware for vMCP by jerm-dro · Pull Request #4714 · stacklok/toolhive

jerm-dro · 2026-04-09T19:10:15Z

Note

This is a prototype / proof-of-concept. It validates the Starlark execution model described in RFC THV-0060 without committing to the full session initialization scope. Not intended for merge as-is — this is a starting point for discussion and iteration.

Summary

Agents today must make sequential tool calls with model inference between each one. For workflows that touch multiple services (e.g., incident triage across PagerDuty, Datadog, Slack, Jira, GitHub, Confluence), this means 10+ round-trips and significant token spend just to gather context.
This PR adds an execute_tool_script virtual tool to vMCP that accepts a Starlark script. The script can call any authorized MCP tool as a function, use loops and conditionals to cross-reference results, fan out calls with parallel(), and return a single aggregated result — all server-side in one tool call.
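
As a rough illustration of what a `parallel()`-style fan-out amounts to on the server side, the sketch below runs each callable in its own goroutine and returns the results in input order. It is not the code from `pkg/script/bridge.go`; the `call`, `result`, and `fanOut` names and the sample strings are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// call is a hypothetical stand-in for one bridged MCP tool invocation.
type call func() (string, error)

// result pairs a call's value with its error so one failure does not hide the others.
type result struct {
	Value string
	Err   error
}

// fanOut runs every call in its own goroutine and returns results in input order.
func fanOut(calls []call) []result {
	results := make([]result, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		wg.Add(1)
		go func(i int, c call) {
			defer wg.Done()
			v, err := c()
			// Each goroutine writes to its own slot, so no lock is needed.
			results[i] = result{Value: v, Err: err}
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	calls := []call{
		func() (string, error) { return "first backend: ok", nil },
		func() (string, error) { return "", fmt.Errorf("second backend: timeout") },
	}
	for i, r := range fanOut(calls) {
		fmt.Println(i, r.Value, r.Err)
	}
}
```

Collecting per-call errors instead of failing fast is a design choice made for this sketch; the prototype may behave differently.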

What's included

- `pkg/script/`: Starlark execution engine, MCP tool bridge with type conversion, `parallel()` builtin for concurrent fan-out, HTTP middleware for request interception and `tools/list` injection
- vMCP integration: wired above authz in the middleware chain so scripts only see/call authorized tools
- Unit + acceptance tests: 24 tests covering engine, bridge, result parsing, middleware, and a full-stack integration test with the motivating use case
- K8s e2e test: Ginkgo test deploying yardstick + VirtualMCPServer and executing scripts through the real proxy
- Demo environment: Kind cluster manifests with 8 enterprise dummy MCP servers (Slack, Jira, GitHub, PagerDuty, Datadog, Confluence, Google Drive, Linear) and /incident-triage skill for interactive demos

Type of change

New feature

Test plan

Unit tests (task test)
Linting (task lint-fix)
Manual testing (describe below)

Deployed to a local Kind cluster with 8 dummy MCP servers. Connected via thv run as a remote MCP server. Verified execute_tool_script appears in tools/list with dynamic description, executed scripts with loops over degraded services, parallel() fan-out, and string parsing. Compared /incident-triage (scripted) vs /incident-triage-lame (sequential) side-by-side.

Changes

| File | Change |
| --- | --- |
| `pkg/script/engine.go` | Starlark execution engine: wraps scripts for top-level `return`, step limits, print capture |
| `pkg/script/bridge.go` | Tool bridge: converts MCP tools to Starlark callables, type conversion, `parallel()` builtin, result parsing with SDK wrapper unwrapping |
| `pkg/script/middleware.go` | HTTP middleware: intercepts `execute_tool_script`, injects into `tools/list` with dynamic description, `innerToolCaller` for backend dispatch |
| `pkg/script/*_test.go` | 24 unit + acceptance tests |
| `pkg/vmcp/server/server.go` | `ScriptMiddleware` config field, applied above authz in `Handler()` |
| `cmd/vmcp/app/commands.go` | Wire script middleware into vMCP server config |
| `test/e2e/.../virtualmcp_script_test.go` | K8s acceptance test with yardstick backend |
| `demo/script-middleware/` | Kind cluster deploy script + 8 dummy MCP server manifests |
| `.claude/skills/incident-triage/` | Skill that steers agents toward `execute_tool_script` with `parallel()` |
| `.claude/skills/incident-triage-lame/` | Comparison skill: same task, sequential tool calls only |
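
Starlark only allows `return` inside a function, so supporting a top-level `return` means wrapping the script before execution. One plausible way to do that, shown here as a guess rather than what `engine.go` actually does, is to indent the script into a function body and bind the call's value to a global. The `wrapScript` function and the `__script__` and `__result__` names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapScript indents the user's script into a function body so that a
// top-level `return` becomes legal, then binds the function's return value
// to a global that the host can read after execution.
func wrapScript(src string) string {
	var b strings.Builder
	b.WriteString("def __script__():\n")
	b.WriteString("    pass\n") // keeps the body valid even for an empty script
	for _, line := range strings.Split(strings.TrimRight(src, "\n"), "\n") {
		if strings.TrimSpace(line) == "" {
			b.WriteString("\n")
			continue
		}
		b.WriteString("    " + line + "\n")
	}
	b.WriteString("__result__ = __script__()\n")
	return b.String()
}

func main() {
	fmt.Print(wrapScript("x = 1\nreturn x + 1\n"))
}
```

A line-based wrapper like this shifts reported line numbers by two and would alter the contents of multi-line string literals, so a real engine would need to account for both.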

Special notes for reviewers

This is a prototype. Known limitations and things to resolve before any production path:

- Always-on: no config toggle to enable/disable the script middleware
- `httptest.NewRecorder` used in production code for inner tool calls (works fine, but unconventional)
- No per-script step limit configuration (hardcoded 100K default)
- `parallel()` creates a goroutine per callable with no concurrency cap
- No timeout on individual tool calls within a script
- SSE transport not tested (JSON-only for now)
- The `{"result": value}` structured content unwrapping is specific to mcp-go SDK behavior
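
For readers unfamiliar with the last item, the sketch below shows the general shape of a `{"result": value}` unwrapping step: if the structured content is an object whose only key is `result`, hand the script the inner value; otherwise pass the payload through unchanged. The `unwrapResult` function is hypothetical, and the "exactly one key" condition is an assumption made for this example, not something the PR states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unwrapResult returns the inner value when the payload is a JSON object whose
// only key is "result"; any other shape is returned unchanged.
func unwrapResult(raw []byte) (any, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	if m, ok := v.(map[string]any); ok && len(m) == 1 {
		if inner, found := m["result"]; found {
			return inner, nil
		}
	}
	return v, nil
}

func main() {
	inputs := []string{`{"result": [1, 2]}`, `{"result": 1, "other": 2}`, `"plain"`}
	for _, in := range inputs {
		v, err := unwrapResult([]byte(in))
		fmt.Println(v, err)
	}
}
```

Because the check keys off a specific wrapper shape, a backend that legitimately returns an object with a single `result` field would also be unwrapped, which is one reason to treat this behavior as SDK-specific.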

Generated with Claude Code

Prototype a "tool script middleware" that lets agents write Starlark scripts to orchestrate multiple MCP tool calls in a single atomic operation. This validates the Starlark execution model from RFC THV-0060 without committing to the full session initialization scope.

Key components:
- Starlark execution engine with step limits and script wrapping for top-level return support
- Tool bridge converting MCP tools into callable Starlark functions with type conversion between Go/JSON and Starlark values
- parallel() builtin for concurrent fan-out of tool calls
- HTTP middleware intercepting execute_tool_script and injecting it into tools/list with dynamic descriptions
- Wired into vMCP server above authz so scripts only see authorized tools

Includes unit tests, in-process acceptance tests, a k8s e2e test, demo manifests for a Kind cluster with 8 enterprise dummy MCP servers, and /incident-triage skills for interactive demos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.

This review will be automatically dismissed once you add the justification section.

github-actions bot added the size/XL label (Extra large PR: 1000+ lines changed) on Apr 9, 2026

github-actions bot requested changes Apr 9, 2026


This was referenced Apr 10, 2026:

- vMCP Code Mode #4741 (Open)
- Ship opt-in code mode for vMCP #4742 (Open)
- Add observability for script execution #4743 (Open)
- Track code mode adoption and usage #4744 (Open)

jerm-dro wants to merge 1 commit into main from jerm-dro/script-middleware-prototype
