🤖 feat: expand MCP control-plane and aggregated MCP operations#48
- add MCP tools for control-plane pods, deployment, service, workspace/template get, and running-state updates
- add thread-safe CRUD support for aggregated in-memory workspace/template storage
- update MCP RBAC and docs, plus unit tests for tools and storage behavior
---
_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$0.45`_
@codex review Please review this MCP/server/storage expansion.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2815c01c91
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- reject workspace/template updates that omit metadata.resourceVersion
- preserve optimistic-lock semantics and return conflicts on mismatches
- add regression tests for missing resourceVersion update requests
---
_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$0.45`_
@codex review Addressed the resourceVersion update feedback and added regression tests. Please take another look.
Codex Review: Didn't find any major issues. Another round soon, please!
Summary
This PR expands the MCP HTTP server tool surface so clients can inspect real Kubernetes control-plane workloads and operate on aggregated workspace/template resources through stable MCP contracts.
Background
The MCP server already exposed basic read-only tools, but clients still lacked direct tools for control-plane pod/deployment/service visibility and detail/update operations for aggregated resources. The aggregated API storage was also Get/List-only, which blocked end-to-end operation flows.
Implementation
- list_control_plane_pods
- get_control_plane_deployment_status
- get_service_status
- get_workspace
- get_template
- set_workspace_running
- set_template_running
- […] workspace/template) with thread-safe CRUD behavior (Create, Update, Delete) plus namespace/name/resourceVersion invariants.
Validation
- make verify-vendor
- make test
- make build
- make lint
Risks
📋 Implementation Plan
Plan: Connect MCP tools to real Kubernetes operations (keep resource data hardcoded)
Context / Why
The repo already has an MCP HTTP server (--app=mcp-http) with a small set of tools. Most of those tools already call the real Kubernetes API server, but some parts of the data plane (notably the aggregated API storage for CoderWorkspace/CoderTemplate) are still hardcoded in-memory.
This plan expands the MCP tool surface so clients can start integrating against a stable API now, while we keep the underlying resources hardcoded/stubbed where needed. The intent is to:
Goals
- […] CoderControlPlane (pods, deployments, services, events, logs).
- […] get_workspace, get_template) so clients don’t have to infer from list output.
Non-goals
Evidence (repo facts)
- internal/app/mcpapp/tools.go: list_control_planes, get_control_plane_status, list_workspaces, list_templates, get_events, get_pod_logs, check_health.
- internal/app/mcpapp/server.go (ctrl.GetConfigOrDie(), controller-runtime client.New, kubernetes.NewForConfig).
- internal/app/apiserverapp/apiserverapp.go → storage.NewWorkspaceStorage() / storage.NewTemplateStorage().
- internal/aggregated/storage/workspace.go
- internal/aggregated/storage/template.go
- internal/controller/codercontrolplane_controller.go (controlPlaneLabels).
- deploy/rbac.yaml.
- docs/how-to/mcp-server.md.
Implementation details
1) Expand MCP tools to cover real operator-managed K8s resources
Files:
- internal/app/mcpapp/tools.go
- internal/app/mcpapp/<new_file>.go if tools.go gets too large
Add read-only tools that map directly to real Kubernetes operations.
1.1 list_control_plane_pods
- […] CoderControlPlane using its label set.
- k8sClient.List(ctx, &corev1.PodList{}, client.InNamespace(ns), client.MatchingLabels(labels))
- { namespace, name } (control plane name)
- { name, namespace, phase, nodeName, readyContainers/total, startTime }
1.2 get_control_plane_deployment_status
- […] apps/v1.Deployment created for a CoderControlPlane.
- k8sClient.Get(ctx, client.ObjectKey{Namespace: ns, Name: cpName}, &appsv1.Deployment{})
- { replicas, readyReplicas, updatedReplicas, availableReplicas, conditions }
1.3 (Optional) get_service_status
- […] CoderControlPlane.
- k8sClient.Get(ctx, key, &corev1.Service{})
Shape example (tool pattern):
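A minimal sketch of how such a tool could be shaped, assuming a validate-then-list-then-summarize flow. It is stdlib-only and not the PR's code: the `pod` struct stands in for the `corev1.Pod` fields a real tool would read, the in-memory filter stands in for `k8sClient.List` with `client.InNamespace` and `client.MatchingLabels`, and the label key returned by `hypotheticalControlPlaneLabels` is invented for illustration (the real label set comes from `controlPlaneLabels` in the controller). `startTime` is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pod is a hypothetical stand-in for the corev1.Pod fields the tool reads.
type pod struct {
	Name, Namespace, Phase, NodeName string
	Labels                           map[string]string
	ContainersReady                  []bool // one entry per container status
}

// podSummary mirrors the output shape sketched in the plan (minus startTime).
type podSummary struct {
	Name, Namespace, Phase, NodeName string
	ReadyContainers, TotalContainers int
}

// hypotheticalControlPlaneLabels is an assumed label set, for illustration only.
func hypotheticalControlPlaneLabels(name string) map[string]string {
	return map[string]string{"example.com/control-plane": name}
}

// validateKey is the guardrail applied before any API call: both the
// namespace and the control plane name must be non-empty.
func validateKey(namespace, name string) error {
	if strings.TrimSpace(namespace) == "" {
		return errors.New("namespace is required")
	}
	if strings.TrimSpace(name) == "" {
		return errors.New("name is required")
	}
	return nil
}

// matchesLabels reports whether every selector entry is present on the pod.
func matchesLabels(podLabels, selector map[string]string) bool {
	for k, v := range selector {
		if podLabels[k] != v {
			return false
		}
	}
	return true
}

// listControlPlanePods validates input, filters by namespace and labels
// (standing in for the real List call), and returns compact summaries.
func listControlPlanePods(all []pod, namespace, name string) ([]podSummary, error) {
	if err := validateKey(namespace, name); err != nil {
		return nil, err
	}
	selector := hypotheticalControlPlaneLabels(name)
	summaries := []podSummary{}
	for _, p := range all {
		if p.Namespace != namespace || !matchesLabels(p.Labels, selector) {
			continue
		}
		ready := 0
		for _, r := range p.ContainersReady {
			if r {
				ready++
			}
		}
		summaries = append(summaries, podSummary{
			Name: p.Name, Namespace: p.Namespace, Phase: p.Phase, NodeName: p.NodeName,
			ReadyContainers: ready, TotalContainers: len(p.ContainersReady),
		})
	}
	return summaries, nil
}

func main() {
	pods := []pod{
		{Name: "cp-0", Namespace: "coder", Phase: "Running", NodeName: "node-a",
			Labels: hypotheticalControlPlaneLabels("main"), ContainersReady: []bool{true, false}},
		{Name: "unrelated", Namespace: "coder", Phase: "Running"},
	}
	out, err := listControlPlanePods(pods, "coder", "main")
	fmt.Println(len(out), err)
}
```

Returning a small summary struct instead of the full pod object keeps the tool's output contract stable even if the underlying Kubernetes types gain fields.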
Defensive programming / guardrails (apply to all new tools):
- […] namespace, name).
- […] limit + continue like get_events).
2) Add “detail” tools for aggregated resources (still backed by hardcoded storage)
Files:
- internal/app/mcpapp/tools.go
Add:
- get_workspace → k8sClient.Get on aggregationv1alpha1.CoderWorkspace
- get_template → k8sClient.Get on aggregationv1alpha1.CoderTemplate
These are “real K8s operations” (they hit the API server); the returned objects may remain hardcoded until storage is replaced.
3) Extend aggregated API storage to support CRUD (enables real API flows with hardcoded data)
Why this matters: It lets us introduce MCP tools like set_workspace_running or create_workspace now, while still storing data in-memory.
Files:
- internal/aggregated/storage/workspace.go
- internal/aggregated/storage/template.go
Changes:
- […] sync.RWMutex around the internal maps to avoid data races.
- rest.Creater (Create)
- rest.Updater (Update; enables PATCH/PUT)
- rest.GracefulDeleter (Delete)
- rest.CollectionDeleter (DeleteCollection)
Minimal semantics (enough to unblock client integration):
- […] (namespace, name) uniqueness.
- […] spec.running).
4) Add MCP “operation” tools that exercise the new CRUD endpoints (optional but recommended)
Files:
- internal/app/mcpapp/tools.go
Add narrowly-scoped tools (safer than exposing generic update):
- set_workspace_running (inputs: namespace, name, running)
- set_template_running (inputs: namespace, name, running)
- create_workspace, delete_workspace, create_template, delete_template
Implementation can start as simple k8sClient.Patch / k8sClient.Update against the aggregated resources.
5) Update RBAC for new real-K8s inspection tools
Files:
- deploy/rbac.yaml
Update the coder-k8s-mcp ClusterRole rules to include any newly-read core resources. For the tools above, likely:
- apiGroups: ["apps"], resources: ["deployments"], verbs: ["get", "list", "watch"]
- apiGroups: [""], resources: ["services"], verbs: ["get", "list", "watch"] (only if adding Service tools)
Keep MCP permissions read-only unless/until we explicitly add write tools that truly need them.
6) Tests
Files:
- internal/app/mcpapp/tools_test.go (new)
- internal/aggregated/storage/workspace_test.go (new)
- internal/aggregated/storage/template_test.go (new)
Suggested coverage:
- […] fakeclient.
7) Documentation
Files:
- docs/how-to/mcp-server.md
Update “Available tools” to list the new tool names and short examples.
Validation (when implemented)
- make test
- make build
- make lint
- make manifests (only if required by this repo’s workflow for the touched files).
Future follow-ups (intentionally out of scope for this pass)
- watch semantics for aggregated resources (or remove watch RBAC if we choose not to support it).
Implementation approach (single agent vs. team)
Recommendation: If you want the fastest path with the lowest coordination overhead, have one agent/engineer implement the whole plan end-to-end (tools + aggregated storage + RBAC + tests + docs). The changes are tightly coupled (tool shape ↔ RBAC ↔ storage semantics), and a single implementer reduces churn from interface mismatches.
If you prefer parallelism, split work by stable interfaces and use small PRs that merge in order:
Option A: Single agent (lowest risk)
- […] get_workspace/get_template tools + tests.
Option B: Team of agents (faster, but needs coordination)
- internal/app/mcpapp/tools.go, plus tools_test.go.
- internal/aggregated/storage/{workspace,template}.go, plus focused storage tests.
Coordination contract (do this up front):
- […] make test, make build, make lint) on each PR to keep integration failures localized.
Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $0.45