🤖 feat: expand MCP control-plane and aggregated MCP operations by ThomasK33 · Pull Request #48 · coder/coder-k8s

ThomasK33 · 2026-02-11T12:16:51Z

Summary
This PR expands the MCP HTTP server tool surface so clients can inspect real Kubernetes control-plane workloads and operate on aggregated workspace/template resources through stable MCP contracts.

Background
The MCP server already exposed basic read-only tools, but clients still lacked direct tools for control-plane pod/deployment/service visibility and detail/update operations for aggregated resources. The aggregated API storage was also Get/List-only, which blocked end-to-end operation flows.

Implementation

Added MCP tools for:
- list_control_plane_pods
- get_control_plane_deployment_status
- get_service_status
- get_workspace
- get_template
- set_workspace_running
- set_template_running
Kept existing tools and added defensive validation/bounds/assertion-style checks.
Extended in-memory aggregated storage (workspace/template) with thread-safe CRUD behavior (Create, Update, Delete) plus namespace/name/resourceVersion invariants.
Added shared storage helpers for namespace resolution and resourceVersion increments.
Updated MCP RBAC for deployment/service reads and workspace/template update/patch.
Updated MCP docs with the full tool list.
Added MCP tool tests and focused storage CRUD tests.
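
The shared storage helpers mentioned above could look roughly like the sketch below. It is not taken from the repository: the names `resolveNamespace` and `nextResourceVersion`, the precedence rules, and the choice to treat resourceVersion as a decimal counter are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// resolveNamespace is a hypothetical helper. The request-scoped namespace
// wins, the object's own namespace is a fallback, and a disagreement
// between the two is rejected.
func resolveNamespace(requestNS, objectNS string) (string, error) {
	switch {
	case requestNS == "" && objectNS == "":
		return "", errors.New("namespace is required")
	case requestNS == "":
		return objectNS, nil
	case objectNS != "" && objectNS != requestNS:
		return "", fmt.Errorf("object namespace %q does not match request namespace %q", objectNS, requestNS)
	default:
		return requestNS, nil
	}
}

// nextResourceVersion is a hypothetical helper that treats resourceVersion
// as a decimal counter. An empty version is treated as 0 (an assumption).
func nextResourceVersion(current string) (string, error) {
	if current == "" {
		return "1", nil
	}
	n, err := strconv.ParseUint(current, 10, 64)
	if err != nil {
		return "", fmt.Errorf("parse resourceVersion %q: %w", current, err)
	}
	return strconv.FormatUint(n+1, 10), nil
}
```

Keeping both rules in one place means the workspace and template stores cannot drift apart on how namespaces are defaulted or how versions advance.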

Validation

make verify-vendor
make test
make build
make lint

Risks

Medium/contained: Aggregated storage update semantics now enforce metadata/resourceVersion constraints and ignore status writes on normal update; this is intentional for current in-memory behavior but could affect clients relying on looser semantics.
Low: New MCP tools are additive and map to existing Kubernetes resources; RBAC scope increase is limited to required resources/verbs.

📋 Implementation Plan

Plan: Connect MCP tools to real Kubernetes operations (keep resource data hardcoded)

Context / Why

The repo already has an MCP HTTP server (--app=mcp-http) with a small set of tools. Most of those tools already call the real Kubernetes API server, but some parts of the data plane (notably the aggregated API storage for CoderWorkspace/CoderTemplate) are still hardcoded in-memory.

This plan expands the MCP tool surface so clients can start integrating against a stable API now, while we keep the underlying resources hardcoded/stubbed where needed. The intent is to:

Keep the tool names + JSON schemas stable (treat them as an API contract).
Wire tools to real Kubernetes API calls wherever possible (pods, deployments, services, CRDs).
Where the backing system is still stubbed (aggregated workspaces/templates), implement the API operations (Get/Create/Update/Delete) against the in-memory store so the end-to-end request flow exists.

Goals

Add MCP tools that inspect real Kubernetes resources involved in running a CoderControlPlane (pods, deployments, services, events, logs).
Add missing “detail” tools for the aggregated resources (get_workspace, get_template) so clients don’t have to infer from list output.
(Optional but recommended) Extend the aggregated API server storage to support CRUD so we can add MCP tools for workspace/template lifecycle operations while still using hardcoded seed data.
Keep output sizes bounded and avoid leaking sensitive data (especially Secrets).

Non-goals

Replace the aggregated storage with a real Coder backend (that can come later).
Implement broad “kubectl apply” or arbitrary YAML execution via MCP.
Provide a fully-correct watch implementation for in-memory aggregated storage (nice-to-have later).

Evidence (repo facts)

MCP server tool registration + implementations: internal/app/mcpapp/tools.go
- Existing tools: list_control_planes, get_control_plane_status, list_workspaces, list_templates, get_events, get_pod_logs, check_health.
MCP server K8s client wiring (real clients): internal/app/mcpapp/server.go (ctrl.GetConfigOrDie(), controller-runtime client.New, kubernetes.NewForConfig).
Aggregated API server installs hardcoded in-memory storage: internal/app/apiserverapp/apiserverapp.go → storage.NewWorkspaceStorage() / storage.NewTemplateStorage().
Hardcoded aggregated storage (Get/List only):
- internal/aggregated/storage/workspace.go
- internal/aggregated/storage/template.go
Control-plane workload labels that we can use for real K8s lookups: internal/controller/codercontrolplane_controller.go (controlPlaneLabels).
Current MCP RBAC is read-only and does not include deployments/services: deploy/rbac.yaml.
MCP server usage docs: docs/how-to/mcp-server.md.

Implementation details

1) Expand MCP tools to cover real operator-managed K8s resources

Files:

internal/app/mcpapp/tools.go
(optional) internal/app/mcpapp/<new_file>.go if tools.go gets too large

Add read-only tools that map directly to real Kubernetes operations.

1.1 `list_control_plane_pods`

Purpose: Find pods for a given CoderControlPlane using its label set.
API: k8sClient.List(ctx, &corev1.PodList{}, client.InNamespace(ns), client.MatchingLabels(labels))
Input: { namespace, name } (control plane name)
Output: list of { name, namespace, phase, nodeName, readyContainers/total, startTime }
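
As a rough illustration of the output shape, the sketch below turns one pod into a summary row with a "ready/total" container count. The `pod` and `containerStatus` types are local stand-ins for the few `corev1.Pod` fields involved so the example runs without Kubernetes dependencies; the JSON field names and the `summarizePod` helper are assumptions, and `startTime` is left out for brevity.

```go
package main

import "fmt"

// containerStatus and pod are local stand-ins for the handful of corev1.Pod
// fields the summary reads. They are NOT the real Kubernetes types.
type containerStatus struct {
	Ready bool
}

type pod struct {
	Name              string
	Namespace         string
	Phase             string
	NodeName          string
	ContainerStatuses []containerStatus
}

// podSummary mirrors the output fields named in the plan.
// The JSON field names are an assumption.
type podSummary struct {
	Name            string `json:"name"`
	Namespace       string `json:"namespace"`
	Phase           string `json:"phase"`
	NodeName        string `json:"nodeName"`
	ReadyContainers string `json:"readyContainers"` // formatted as "ready/total"
}

// summarizePod counts ready containers and copies the identifying fields.
func summarizePod(p pod) podSummary {
	ready := 0
	for _, cs := range p.ContainerStatuses {
		if cs.Ready {
			ready++
		}
	}
	return podSummary{
		Name:            p.Name,
		Namespace:       p.Namespace,
		Phase:           p.Phase,
		NodeName:        p.NodeName,
		ReadyContainers: fmt.Sprintf("%d/%d", ready, len(p.ContainerStatuses)),
	}
}
```

Returning a small summary struct instead of the full pod object keeps responses compact and avoids exposing fields clients do not need.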

1.2 `get_control_plane_deployment_status`

Purpose: Inspect the apps/v1.Deployment created for a CoderControlPlane.
API: k8sClient.Get(ctx, client.ObjectKey{Namespace: ns, Name: cpName}, &appsv1.Deployment{})
Output: { replicas, readyReplicas, updatedReplicas, availableReplicas, conditions }

1.3 (Optional) `get_service_status`

Purpose: Inspect the Service created for a CoderControlPlane.
API: k8sClient.Get(ctx, key, &corev1.Service{})
Output: type/clusterIP/ports/annotations.

Security note: do not add Secret-reading tools (or, if absolutely required later, return only metadata and never .data).

Shape example (tool pattern):

mcp.AddTool(server, &mcp.Tool{
    Name:        "list_control_plane_pods",
    Description: "List pods for a CoderControlPlane.",
}, func(ctx context.Context, _ *mcp.CallToolRequest, input listControlPlanePodsInput) (*mcp.CallToolResult, listControlPlanePodsOutput, error) {
    // Reject incomplete input before touching the API server.
    if input.Namespace == "" {
        return nil, listControlPlanePodsOutput{}, fmt.Errorf("namespace is required")
    }
    if input.Name == "" {
        return nil, listControlPlanePodsOutput{}, fmt.Errorf("name is required")
    }

    // Label set that selects the pods of one CoderControlPlane.
    labels := map[string]string{
        "app.kubernetes.io/name":       "coder-control-plane",
        "app.kubernetes.io/instance":   input.Name,
        "app.kubernetes.io/managed-by": "coder-k8s",
    }

    pods := &corev1.PodList{}
    if err := k8sClient.List(ctx, pods, client.InNamespace(input.Namespace), client.MatchingLabels(labels)); err != nil {
        return nil, listControlPlanePodsOutput{}, fmt.Errorf("list control plane pods: %w", err)
    }

    // ... (building the bounded output from pods is elided in the plan)
})

Defensive programming / guardrails (apply to all new tools):

Validate required fields (namespace, name).
Bound list sizes (e.g., cap returned items to N, or implement limit + continue like get_events).
Keep log/event outputs bounded (follow existing patterns).
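
One way to bound list sizes is sketched below, assuming a caller-supplied `limit` input. The helper names and the default and maximum values are illustrative assumptions, not values from the repository.

```go
package main

const (
	defaultLimit = 50  // assumed default, not taken from the repository
	maxLimit     = 200 // assumed hard cap, not taken from the repository
)

// clampLimit normalizes a caller-supplied limit into the range [1, maxLimit].
// Zero or negative values fall back to defaultLimit.
func clampLimit(requested int) int {
	switch {
	case requested <= 0:
		return defaultLimit
	case requested > maxLimit:
		return maxLimit
	default:
		return requested
	}
}

// boundItems returns at most clampLimit(requested) items and reports
// whether anything was dropped, so the tool output can say it is partial.
func boundItems[T any](items []T, requested int) (bounded []T, truncated bool) {
	limit := clampLimit(requested)
	if len(items) <= limit {
		return items, false
	}
	return items[:limit], true
}
```

Surfacing a `truncated` flag in the tool output lets clients tell a complete list from a capped one without guessing from the item count.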

2) Add “detail” tools for aggregated resources (still backed by hardcoded storage)

Files:

internal/app/mcpapp/tools.go

Add:

get_workspace → k8sClient.Get on aggregationv1alpha1.CoderWorkspace
get_template → k8sClient.Get on aggregationv1alpha1.CoderTemplate

These are “real K8s operations” (they hit the API server); the returned objects may remain hardcoded until storage is replaced.

3) Extend aggregated API storage to support CRUD (enables real API flows with hardcoded data)

Why this matters: It lets us introduce MCP tools like set_workspace_running or create_workspace now, while still storing data in-memory.

Files:

internal/aggregated/storage/workspace.go
internal/aggregated/storage/template.go

Changes:

Add a sync.RWMutex around the internal maps to avoid data races.
Implement additional apiserver storage interfaces:
- rest.Creater (Create)
- rest.Updater (Update; enables PATCH/PUT)
- rest.GracefulDeleter (Delete)
- (optional) rest.CollectionDeleter (DeleteCollection)
Keep the existing hardcoded seed objects, but allow the map to be modified by API calls.

Minimal semantics (enough to unblock client integration):

Enforce (namespace,name) uniqueness.
Disallow renaming / namespace changes on update.
Support updating spec fields (e.g., spec.running).
Either reject status writes or ignore them for now.
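
The minimal semantics above can be sketched with a mutex-guarded map. This is a simplified model rather than the repository's storage: `workspace` is a stand-in for the aggregated CoderWorkspace type, the error values are hypothetical, and the real implementation would satisfy the apiserver `rest` interfaces instead of exposing plain methods.

```go
package main

import (
	"errors"
	"sync"
)

// Hypothetical sentinel errors for this sketch.
var (
	errAlreadyExists = errors.New("already exists")
	errNotFound      = errors.New("not found")
	errImmutable     = errors.New("namespace and name are immutable")
)

// workspace is a simplified stand-in for the aggregated CoderWorkspace type.
type workspace struct {
	Namespace string
	Name      string
	Running   bool   // stands in for spec.running
	Phase     string // stands in for status; ignored on update
}

type key struct{ namespace, name string }

type workspaceStore struct {
	mu    sync.RWMutex
	items map[key]workspace
}

// newWorkspaceStore keeps the hardcoded seed objects but leaves the map mutable.
func newWorkspaceStore(seed ...workspace) *workspaceStore {
	s := &workspaceStore{items: make(map[key]workspace)}
	for _, w := range seed {
		s.items[key{w.Namespace, w.Name}] = w
	}
	return s
}

func (s *workspaceStore) Get(namespace, name string) (workspace, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	w, ok := s.items[key{namespace, name}]
	if !ok {
		return workspace{}, errNotFound
	}
	return w, nil
}

// Create enforces (namespace, name) uniqueness.
func (s *workspaceStore) Create(w workspace) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	k := key{w.Namespace, w.Name}
	if _, ok := s.items[k]; ok {
		return errAlreadyExists
	}
	s.items[k] = w
	return nil
}

// Update addresses the object by the request's (namespace, name), rejects
// renames and namespace changes, copies spec, and ignores status.
func (s *workspaceStore) Update(namespace, name string, w workspace) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	k := key{namespace, name}
	existing, ok := s.items[k]
	if !ok {
		return errNotFound
	}
	if w.Namespace != namespace || w.Name != name {
		return errImmutable
	}
	existing.Running = w.Running // Phase stays as stored
	s.items[k] = existing
	return nil
}

func (s *workspaceStore) Delete(namespace, name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	k := key{namespace, name}
	if _, ok := s.items[k]; !ok {
		return errNotFound
	}
	delete(s.items, k)
	return nil
}
```

Storing values rather than pointers means callers of `Get` receive a copy and cannot mutate the map's contents outside the lock.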

4) Add MCP “operation” tools that exercise the new CRUD endpoints (optional but recommended)

Files:

internal/app/mcpapp/tools.go

Add narrowly-scoped tools (safer than exposing generic update):

set_workspace_running (inputs: namespace, name, running)
set_template_running (inputs: namespace, name, running)
(optional) create_workspace, delete_workspace, create_template, delete_template

Implementation can start as simple k8sClient.Patch / k8sClient.Update against the aggregated resources.
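
For the Patch variant, a narrowly scoped tool only needs to send the one field it owns. The sketch below builds a JSON merge patch body for `spec.running`; the `runningPatch` helper is hypothetical. With controller-runtime, a body like this could presumably be wrapped via `client.RawPatch(types.MergePatchType, data)` and passed to `k8sClient.Patch`, though the PR text does not say which form the merged code uses.

```go
package main

import "encoding/json"

// runningPatch is a hypothetical helper that builds a JSON merge patch
// touching only spec.running, leaving every other field untouched.
func runningPatch(running bool) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"running": running,
		},
	}
	return json.Marshal(patch)
}
```

Calling `runningPatch(true)` yields `{"spec":{"running":true}}`. Because the patch names a single field, the tool cannot accidentally overwrite unrelated spec fields the way a full Update of a stale object could.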

5) Update RBAC for new real-K8s inspection tools

Files:

deploy/rbac.yaml

Update the coder-k8s-mcp ClusterRole rules to cover every resource the new tools read. For the tools above, likely:

apiGroups: ["apps"], resources: ["deployments"], verbs: ["get", "list", "watch"]
apiGroups: [""], resources: ["services"], verbs: ["get", "list", "watch"] (only if adding Service tools)

Keep MCP permissions read-only unless/until we explicitly add write tools that truly need them.

6) Tests

Files:

internal/app/mcpapp/tools_test.go (new)
internal/aggregated/storage/workspace_test.go (new)
internal/aggregated/storage/template_test.go (new)

Suggested coverage:

MCP tools: validate required inputs, label selection logic, and that list/get calls return expected summaries using a controller-runtime fake client.
Aggregated storage: Create/Update/Delete behavior, immutability constraints, namespace scoping.

7) Documentation

Files:

docs/how-to/mcp-server.md

Update “Available tools” to list the new tool names and short examples.

Validation (when implemented)

make test
make build
make lint
If manifests/RBAC are changed and generated artifacts are involved: make manifests (only if required by this repo’s workflow for the touched files).

Future follow-ups (intentionally out of scope for this pass)

Replace aggregated storage with a real backend (Coder API) while keeping the MCP tool contract stable.
Implement watch semantics for aggregated resources (or remove watch RBAC if we choose not to support it).
Add higher-level “diagnose control plane” tool that bundles deployment/service/pods/events into one response.

Implementation approach (single agent vs. team)

Recommendation: If you want the fastest path with the lowest coordination overhead, have one agent/engineer implement the whole plan end-to-end (tools + aggregated storage + RBAC + tests + docs). The changes are tightly coupled (tool shape ↔ RBAC ↔ storage semantics), and a single implementer reduces churn from interface mismatches.

If you prefer parallelism, split work by stable interfaces and use small PRs that merge in order:

Option A: Single agent (lowest risk)

Add new read-only MCP tools (pods/deployment/service) + unit tests.
Add get_workspace / get_template tools + tests.
Update RBAC + docs.
(Optional) Add aggregated CRUD + MCP operation tools + tests.

Option B: Team of agents (faster, but needs coordination)

Agent 1 (MCP tools): Implement sections 1, 2, 4 in internal/app/mcpapp/tools.go, plus tools_test.go.
Agent 2 (Aggregated storage): Implement section 3 in internal/aggregated/storage/{workspace,template}.go, plus focused storage tests.
Agent 3 (Ops surface): Update RBAC + docs (sections 5, 7) and ensure deployment manifests still align.

Coordination contract (do this up front):

Agree on the exact MCP tool names + JSON I/O schema, and the minimal storage semantics (create/update/delete rules).
Merge order: aggregated storage (if needed) → MCP tools → RBAC/docs.
Run the repo validation suite (make test, make build, make lint) on each PR to keep integration failures localized.

Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh

- add MCP tools for control-plane pods, deployment, service, workspace/template get, and running-state updates
- add thread-safe CRUD support for aggregated in-memory workspace/template storage
- update MCP RBAC and docs, plus unit tests for tools and storage behavior

---
_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$0.45`_

ThomasK33 · 2026-02-11T12:17:11Z

@codex review

Please review this MCP/server/storage expansion.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2815c01c91

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- reject workspace/template updates that omit metadata.resourceVersion
- preserve optimistic-lock semantics and return conflicts on mismatches
- add regression tests for missing resourceVersion update requests

---
_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$0.45`_
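
The update precondition described in this commit can be sketched as a small check run before any write. The function and error names are hypothetical; real apiserver storage would more likely return typed API errors (for example from `k8s.io/apimachinery/pkg/api/errors`) than plain sentinels.

```go
package main

import "errors"

// Hypothetical sentinel errors for this sketch.
var (
	errMissingResourceVersion = errors.New("metadata.resourceVersion is required for update")
	errConflict               = errors.New("resourceVersion does not match stored object")
)

// checkResourceVersion enforces optimistic locking: the caller must send the
// version it last read, and it must equal the version currently stored.
func checkResourceVersion(stored, incoming string) error {
	if incoming == "" {
		return errMissingResourceVersion
	}
	if incoming != stored {
		return errConflict
	}
	return nil
}
```

Rejecting an empty version, rather than treating it as "skip the check", keeps a client with a stale or partial object from silently overwriting a newer write.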

ThomasK33 · 2026-02-11T12:25:50Z

@codex review

Addressed the resourceVersion update feedback and added regression tests. Please take another look.

chatgpt-codex-connector · 2026-02-11T12:32:40Z

Codex Review: Didn't find any major issues. Another round soon, please!

ThomasK33 · 2026-02-11T12:52:39Z

https://mux.md/bG6Er#K54vyblzULQt0Q

chatgpt-codex-connector Bot reviewed Feb 11, 2026

Comment thread internal/aggregated/storage/workspace.go Outdated

Comment thread internal/aggregated/storage/template.go Outdated

ThomasK33 added this pull request to the merge queue Feb 11, 2026

Merged via the queue into main with commit db4eb3d Feb 11, 2026
11 checks passed

ThomasK33 deleted the mcp-server-s3bz branch February 11, 2026 12:53

ThomasK33 mentioned this pull request Feb 11, 2026

🤖 feat: wire aggregated API server to codersdk backend #50 (Merged)

ThomasK33 merged 2 commits into main from mcp-server-s3bz
