🤖 feat: expand MCP control-plane and aggregated MCP operations#48

Merged. ThomasK33 merged 2 commits into main from mcp-server-s3bz on Feb 11, 2026.

Conversation

@ThomasK33
Member

Summary
This PR expands the MCP HTTP server tool surface so clients can inspect real Kubernetes control-plane workloads and operate on aggregated workspace/template resources through stable MCP contracts.

Background
The MCP server already exposed basic read-only tools, but clients still lacked direct tools for control-plane pod/deployment/service visibility and detail/update operations for aggregated resources. The aggregated API storage was also Get/List-only, which blocked end-to-end operation flows.

Implementation

  • Added MCP tools for:
    • list_control_plane_pods
    • get_control_plane_deployment_status
    • get_service_status
    • get_workspace
    • get_template
    • set_workspace_running
    • set_template_running
  • Kept existing tools and added defensive validation/bounds/assertion-style checks.
  • Extended in-memory aggregated storage (workspace/template) with thread-safe CRUD behavior (Create, Update, Delete) plus namespace/name/resourceVersion invariants.
  • Added shared storage helpers for namespace resolution and resourceVersion increments.
  • Updated MCP RBAC for deployment/service reads and workspace/template update/patch.
  • Updated MCP docs with the full tool list.
  • Added MCP tool tests and focused storage CRUD tests.
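
The shared storage helpers themselves are not shown in this description. A minimal sketch of what they could look like, assuming resourceVersion is kept as a decimal string counter and that a namespace on the object must agree with the request namespace (both are assumptions; `resolveNamespace` and `nextResourceVersion` are hypothetical names, not the PR's actual helpers):

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// resolveNamespace is a hypothetical helper: it reconciles the namespace taken
// from the request with the namespace set on the object being written.
func resolveNamespace(requestNS, objectNS string) (string, error) {
	switch {
	case requestNS == "" && objectNS == "":
		return "", errors.New("namespace is required")
	case objectNS == "":
		return requestNS, nil
	case requestNS == "" || requestNS == objectNS:
		return objectNS, nil
	default:
		return "", fmt.Errorf("namespace mismatch: request %q, object %q", requestNS, objectNS)
	}
}

// nextResourceVersion is a hypothetical helper: it treats resourceVersion as a
// decimal counter and returns the incremented value. An empty value starts at "1".
func nextResourceVersion(current string) (string, error) {
	if current == "" {
		return "1", nil
	}
	n, err := strconv.ParseUint(current, 10, 64)
	if err != nil {
		return "", fmt.Errorf("parse resourceVersion %q: %w", current, err)
	}
	return strconv.FormatUint(n+1, 10), nil
}

func main() {
	ns, _ := resolveNamespace("team-a", "")
	rv, _ := nextResourceVersion("41")
	fmt.Println(ns, rv)
}
```

Keeping both rules in one place means the workspace and template storage cannot drift apart on how they name or version objects.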

Validation

  • make verify-vendor
  • make test
  • make build
  • make lint

Risks

  • Medium/contained: Aggregated storage update semantics now enforce metadata/resourceVersion constraints and ignore status writes on normal update; this is intentional for current in-memory behavior but could affect clients relying on looser semantics.
  • Low: New MCP tools are additive and map to existing Kubernetes resources; RBAC scope increase is limited to required resources/verbs.

📋 Implementation Plan

Plan: Connect MCP tools to real Kubernetes operations (keep resource data hardcoded)

Context / Why

The repo already has an MCP HTTP server (--app=mcp-http) with a small set of tools. Most of those tools already call the real Kubernetes API server, but some parts of the data plane (notably the aggregated API storage for CoderWorkspace/CoderTemplate) are still hardcoded in-memory.

This plan expands the MCP tool surface so clients can start integrating against a stable API now, while we keep the underlying resources hardcoded/stubbed where needed. The intent is to:

  • Keep the tool names + JSON schemas stable (treat them as an API contract).
  • Wire tools to real Kubernetes API calls wherever possible (pods, deployments, services, CRDs).
  • Where the backing system is still stubbed (aggregated workspaces/templates), implement the API operations (Get/Create/Update/Delete) against the in-memory store so the end-to-end request flow exists.

Goals

  • Add MCP tools that inspect real Kubernetes resources involved in running a CoderControlPlane (pods, deployments, services, events, logs).
  • Add missing “detail” tools for the aggregated resources (get_workspace, get_template) so clients don’t have to infer from list output.
  • (Optional but recommended) Extend the aggregated API server storage to support CRUD so we can add MCP tools for workspace/template lifecycle operations while still using hardcoded seed data.
  • Keep output sizes bounded and avoid leaking sensitive data (especially Secrets).

Non-goals

  • Replace the aggregated storage with a real Coder backend (that can come later).
  • Implement broad “kubectl apply” or arbitrary YAML execution via MCP.
  • Provide a fully-correct watch implementation for in-memory aggregated storage (nice-to-have later).

Evidence (repo facts)

  • MCP server tool registration + implementations: internal/app/mcpapp/tools.go
    • Existing tools: list_control_planes, get_control_plane_status, list_workspaces, list_templates, get_events, get_pod_logs, check_health.
  • MCP server K8s client wiring (real clients): internal/app/mcpapp/server.go (ctrl.GetConfigOrDie(), controller-runtime client.New, kubernetes.NewForConfig).
  • Aggregated API server installs hardcoded in-memory storage: internal/app/apiserverapp/apiserverapp.go → storage.NewWorkspaceStorage() / storage.NewTemplateStorage().
  • Hardcoded aggregated storage (Get/List only):
    • internal/aggregated/storage/workspace.go
    • internal/aggregated/storage/template.go
  • Control-plane workload labels that we can use for real K8s lookups: internal/controller/codercontrolplane_controller.go (controlPlaneLabels).
  • Current MCP RBAC is read-only and does not include deployments/services: deploy/rbac.yaml.
  • MCP server usage docs: docs/how-to/mcp-server.md.

Implementation details

1) Expand MCP tools to cover real operator-managed K8s resources

Files:

  • internal/app/mcpapp/tools.go
  • (optional) internal/app/mcpapp/<new_file>.go if tools.go gets too large

Add read-only tools that map directly to real Kubernetes operations.

1.1 list_control_plane_pods

  • Purpose: Find pods for a given CoderControlPlane using its label set.
  • API: k8sClient.List(ctx, &corev1.PodList{}, client.InNamespace(ns), client.MatchingLabels(labels))
  • Input: { namespace, name } (control plane name)
  • Output: list of { name, namespace, phase, nodeName, readyContainers/total, startTime }
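
The output shape for this tool could be sketched as below. This is an illustration only: `containerStatus` is a local stand-in for the one `corev1.ContainerStatus` field the sketch needs, and the JSON field names on `podSummary` are assumptions, not the PR's schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerStatus is a local stand-in for the Ready field of
// corev1.ContainerStatus, so the sketch runs with the standard library only.
type containerStatus struct {
	Ready bool
}

// podSummary mirrors the output fields listed above. The JSON names are
// assumptions made for this sketch.
type podSummary struct {
	Name            string `json:"name"`
	Namespace       string `json:"namespace"`
	Phase           string `json:"phase"`
	NodeName        string `json:"nodeName"`
	ReadyContainers int    `json:"readyContainers"`
	TotalContainers int    `json:"totalContainers"`
	StartTime       string `json:"startTime,omitempty"`
}

// countReady returns how many containers report ready and how many exist.
func countReady(statuses []containerStatus) (ready, total int) {
	for _, s := range statuses {
		if s.Ready {
			ready++
		}
	}
	return ready, len(statuses)
}

func main() {
	ready, total := countReady([]containerStatus{{Ready: true}, {Ready: false}})
	summary := podSummary{
		Name: "coder-0", Namespace: "coder", Phase: "Running", NodeName: "node-a",
		ReadyContainers: ready, TotalContainers: total,
	}
	out, err := json.Marshal(summary)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Returning a small summary struct instead of the full Pod object keeps responses bounded and avoids exposing fields clients should not depend on.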

1.2 get_control_plane_deployment_status

  • Purpose: Inspect the apps/v1.Deployment created for a CoderControlPlane.
  • API: k8sClient.Get(ctx, client.ObjectKey{Namespace: ns, Name: cpName}, &appsv1.Deployment{})
  • Output: { replicas, readyReplicas, updatedReplicas, availableReplicas, conditions }

1.3 (Optional) get_service_status

  • Purpose: Inspect the Service created for a CoderControlPlane.
  • API: k8sClient.Get(ctx, key, &corev1.Service{})
  • Output: type/clusterIP/ports/annotations.

Security note: do not add Secret-reading tools (or, if absolutely required later, return only metadata and never .data).

Shape example (tool pattern):

mcp.AddTool(server, &mcp.Tool{
    Name:        "list_control_plane_pods",
    Description: "List pods for a CoderControlPlane.",
}, func(ctx context.Context, _ *mcp.CallToolRequest, input listControlPlanePodsInput) (*mcp.CallToolResult, listControlPlanePodsOutput, error) {
    // Validate required inputs before calling the API server.
    if input.Namespace == "" {
        return nil, listControlPlanePodsOutput{}, fmt.Errorf("namespace is required")
    }
    if input.Name == "" {
        return nil, listControlPlanePodsOutput{}, fmt.Errorf("name is required")
    }

    // Label set that selects the pods of a single CoderControlPlane.
    labels := map[string]string{
        "app.kubernetes.io/name":       "coder-control-plane",
        "app.kubernetes.io/instance":   input.Name,
        "app.kubernetes.io/managed-by": "coder-k8s",
    }

    pods := &corev1.PodList{}
    if err := k8sClient.List(ctx, pods, client.InNamespace(input.Namespace), client.MatchingLabels(labels)); err != nil {
        return nil, listControlPlanePodsOutput{}, fmt.Errorf("list control plane pods: %w", err)
    }

    ...
})

Defensive programming / guardrails (apply to all new tools):

  • Validate required fields (namespace, name).
  • Bound list sizes (e.g., cap returned items to N, or implement limit + continue like get_events).
  • Keep log/event outputs bounded (follow existing patterns).
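
The guardrails above could be factored into two small helpers. The sketch below is an assumption about shape, not the PR's code: `validateRequired`, `boundItems`, and the cap value `maxListItems` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxListItems is a hypothetical hard cap on items returned by any list tool.
const maxListItems = 200

// requiredField pairs an input field name with its value so that errors are
// reported in a stable order.
type requiredField struct {
	name, value string
}

// validateRequired returns an error for the first field that is empty.
func validateRequired(fields ...requiredField) error {
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			return fmt.Errorf("%s is required", f.name)
		}
	}
	return nil
}

// boundItems truncates items to the requested limit, never exceeding
// maxListItems, and reports whether anything was dropped.
func boundItems[T any](items []T, limit int) (bounded []T, truncated bool) {
	if limit <= 0 || limit > maxListItems {
		limit = maxListItems
	}
	if len(items) <= limit {
		return items, false
	}
	return items[:limit], true
}

func main() {
	err := validateRequired(
		requiredField{"namespace", "coder"},
		requiredField{"name", ""},
	)
	fmt.Println(err)

	pods, truncated := boundItems([]string{"a", "b", "c"}, 2)
	fmt.Println(pods, truncated)
}
```

Reporting `truncated` alongside the bounded slice lets a tool tell the client that more results exist without returning them all.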

2) Add “detail” tools for aggregated resources (still backed by hardcoded storage)

Files:

  • internal/app/mcpapp/tools.go

Add:

  • get_workspace → k8sClient.Get on aggregationv1alpha1.CoderWorkspace
  • get_template → k8sClient.Get on aggregationv1alpha1.CoderTemplate

These are “real K8s operations” (they hit the API server); the returned objects may remain hardcoded until storage is replaced.

3) Extend aggregated API storage to support CRUD (enables real API flows with hardcoded data)

Why this matters: It lets us introduce MCP tools like set_workspace_running or create_workspace now, while still storing data in-memory.

Files:

  • internal/aggregated/storage/workspace.go
  • internal/aggregated/storage/template.go

Changes:

  • Add a sync.RWMutex around the internal maps to avoid data races.
  • Implement additional apiserver storage interfaces:
    • rest.Creater (Create)
    • rest.Updater (Update; enables PATCH/PUT)
    • rest.GracefulDeleter (Delete)
    • (optional) rest.CollectionDeleter (DeleteCollection)
  • Keep the existing hardcoded seed objects, but allow the map to be modified by API calls.

Minimal semantics (enough to unblock client integration):

  • Enforce (namespace,name) uniqueness.
  • Disallow renaming / namespace changes on update.
  • Support updating spec fields (e.g., spec.running).
  • Either reject status writes or ignore them for now.

4) Add MCP “operation” tools that exercise the new CRUD endpoints (optional but recommended)

Files:

  • internal/app/mcpapp/tools.go

Add narrowly-scoped tools (safer than exposing generic update):

  • set_workspace_running (inputs: namespace, name, running)
  • set_template_running (inputs: namespace, name, running)
  • (optional) create_workspace, delete_workspace, create_template, delete_template

Implementation can start as simple k8sClient.Patch / k8sClient.Update against the aggregated resources.
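
If the Patch route is chosen, the request body for a running-state change could be a JSON merge patch that touches only spec.running. The sketch below builds that body with the standard library; `runningMergePatch` is a hypothetical helper, and whether the PR uses Patch or Update is not stated here. With controller-runtime, such bytes could presumably be wrapped with `client.RawPatch(types.MergePatchType, body)`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runningMergePatch returns a JSON merge patch that sets only spec.running,
// leaving every other field of the resource untouched.
func runningMergePatch(running bool) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"running": running,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := runningMergePatch(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

A patch limited to one field is what keeps the tool narrowly scoped: the client cannot use it to change anything other than the running state.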

5) Update RBAC for new real-K8s inspection tools

Files:

  • deploy/rbac.yaml

Update the coder-k8s-mcp ClusterRole rules to include any newly-read core resources. For the tools above, likely:

  • apiGroups: ["apps"], resources: ["deployments"], verbs: ["get", "list", "watch"]
  • apiGroups: [""], resources: ["services"], verbs: ["get", "list", "watch"] (only if adding Service tools)

Keep MCP permissions read-only unless/until we explicitly add write tools that truly need them.

6) Tests

Files:

  • internal/app/mcpapp/tools_test.go (new)
  • internal/aggregated/storage/workspace_test.go (new)
  • internal/aggregated/storage/template_test.go (new)

Suggested coverage:

  • MCP tools: validate required inputs, label selection logic, and that list/get calls return expected summaries using a controller-runtime fake client.
  • Aggregated storage: Create/Update/Delete behavior, immutability constraints, namespace scoping.

7) Documentation

Files:

  • docs/how-to/mcp-server.md

Update “Available tools” to list the new tool names and short examples.

Validation (when implemented)

  • make test
  • make build
  • make lint
  • If manifests/RBAC are changed and generated artifacts are involved: make manifests (only if required by this repo’s workflow for the touched files).

Future follow-ups (intentionally out of scope for this pass)

  • Replace aggregated storage with a real backend (Coder API) while keeping the MCP tool contract stable.
  • Implement watch semantics for aggregated resources (or remove watch RBAC if we choose not to support it).
  • Add higher-level “diagnose control plane” tool that bundles deployment/service/pods/events into one response.

Implementation approach (single agent vs. team)

Recommendation: If you want the fastest path with the lowest coordination overhead, have one agent/engineer implement the whole plan end-to-end (tools + aggregated storage + RBAC + tests + docs). The changes are tightly coupled (tool shape ↔ RBAC ↔ storage semantics), and a single implementer reduces churn from interface mismatches.

If you prefer parallelism, split work by stable interfaces and use small PRs that merge in order:

Option A: Single agent (lowest risk)

  1. Add new read-only MCP tools (pods/deployment/service) + unit tests.
  2. Add get_workspace / get_template tools + tests.
  3. Update RBAC + docs.
  4. (Optional) Add aggregated CRUD + MCP operation tools + tests.

Option B: Team of agents (faster, but needs coordination)

  • Agent 1 (MCP tools): Implement sections 1, 2, 4 in internal/app/mcpapp/tools.go, plus tools_test.go.
  • Agent 2 (Aggregated storage): Implement section 3 in internal/aggregated/storage/{workspace,template}.go, plus focused storage tests.
  • Agent 3 (Ops surface): Update RBAC + docs (sections 5, 7) and ensure deployment manifests still align.

Coordination contract (do this up front):

  • Agree on the exact MCP tool names + JSON I/O schema, and the minimal storage semantics (create/update/delete rules).
  • Merge order: aggregated storage (if needed) → MCP tools → RBAC/docs.
  • Run the repo validation suite (make test, make build, make lint) on each PR to keep integration failures localized.

Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $0.45

- add MCP tools for control-plane pods, deployment, service, workspace/template get, and running-state updates
- add thread-safe CRUD support for aggregated in-memory workspace/template storage
- update MCP RBAC and docs, plus unit tests for tools and storage behavior

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$0.45`_

@ThomasK33
Member Author

@codex review

Please review this MCP/server/storage expansion.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2815c01c91

Comment thread internal/aggregated/storage/workspace.go Outdated
Comment thread internal/aggregated/storage/template.go Outdated
- reject workspace/template updates that omit metadata.resourceVersion
- preserve optimistic-lock semantics and return conflicts on mismatches
- add regression tests for missing resourceVersion update requests
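
The optimistic-lock rule described in this commit can be sketched as a single check run before an update is applied. This is an illustration under assumptions: `checkResourceVersion` and the error values are hypothetical, and a real apiserver storage would more likely return `apierrors.NewBadRequest` and `apierrors.NewConflict` from k8s.io/apimachinery.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errResourceVersionRequired = errors.New("metadata.resourceVersion is required for update")
	errConflict                = errors.New("resourceVersion conflict")
)

// checkResourceVersion rejects updates that omit resourceVersion and reports a
// conflict when the caller's version does not match the stored one.
func checkResourceVersion(stored, incoming string) error {
	if incoming == "" {
		return errResourceVersionRequired
	}
	if incoming != stored {
		return fmt.Errorf("%w: stored %q, got %q", errConflict, stored, incoming)
	}
	return nil
}

func main() {
	fmt.Println(checkResourceVersion("7", ""))
	fmt.Println(errors.Is(checkResourceVersion("7", "6"), errConflict))
	fmt.Println(checkResourceVersion("7", "7"))
}
```

Rejecting an empty resourceVersion, rather than treating it as "overwrite unconditionally", is what prevents a stale client from silently clobbering a newer write.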

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$0.45`_

@ThomasK33
Member Author

@codex review

Addressed the resourceVersion update feedback and added regression tests. Please take another look.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Another round soon, please!

@ThomasK33 ThomasK33 added this pull request to the merge queue Feb 11, 2026
@ThomasK33
Member Author

Merged via the queue into main with commit db4eb3d Feb 11, 2026
11 checks passed
@ThomasK33 ThomasK33 deleted the mcp-server-s3bz branch February 11, 2026 12:53