
🤖 feat: unify controller, aggregated API server, and MCP into --app=all default mode #53

Merged
ThomasK33 merged 12 commits into main from startup-gy16 on Feb 12, 2026
@ThomasK33
Member

Summary

Adds a new --app=all mode (now the default) that runs the controller, aggregated API server, and MCP HTTP server in a single process with a shared controller-runtime cache/client. This eliminates the need for three separate Deployments and enables single-pod deployment.

Background

Previously, coder-k8s required deploying three separate pods:

  • --app=controller — the operator managing CoderControlPlane, WorkspaceProxy, and CoderProvisioner CRDs
  • --app=aggregated-apiserver — the aggregated API server for CoderWorkspace/CoderTemplate (required manual --coder-url, --coder-session-token, --coder-namespace flags)
  • --app=mcp-http — the MCP server for AI tool integration

This created operational overhead and prevented the MCP server from sharing the controller's cache-backed informers, leading to redundant API reads.

Implementation

Architecture: Shared cache composition root

Created internal/app/allapp/allapp.go as a single composition root that:

  1. Builds one controller-runtime manager with a shared scheme (core + coder.com/v1alpha1 + aggregation.coder.com/v1alpha1)
  2. Wires all three components through the same manager:
    • Controller reconcilers — registered via controllerapp.SetupControllers(mgr), leader-election gated
    • Aggregated API server — runs as a non-leader runnable (all replicas), uses dynamic control-plane discovery
    • MCP HTTP server — runs as a non-leader runnable, shares the manager's cache-backed client.Client

Dynamic control-plane discovery

The aggregated API server no longer requires manual --coder-url/--coder-session-token flags in all mode. Instead, a new ControlPlaneClientProvider (internal/aggregated/coder/controlplane_provider.go) dynamically discovers Coder instances:

  1. Lists CoderControlPlane objects in the request namespace
  2. Filters eligible instances (operator access enabled + ready, secret ref present, URL present, name doesn't contain '.')
  3. Fetches the operator token from the referenced Secret (using the manager's uncached API reader to avoid caching secrets in informers)
  4. Builds a codersdk client for API calls

Refactored components for reuse

  • controllerapp: Extracted NewManager(), SetupControllers(), SetupProbes() for reuse by allapp
  • mcpapp: Added RunHTTPWithClients() to accept injected clients instead of constructing its own
  • sharedscheme: New package providing a unified scheme used by all components
  • apiserverapp: Added Options.ClientProvider override so allapp can inject the dynamic provider

Unified deployment manifests

Replaced three separate Deployments with a single deploy/deployment.yaml:

  • One Deployment, one ServiceAccount, one ClusterRole (union of all permissions)
  • No --app arg needed (defaults to all)
  • Exposes all three ports: 8081 (health), 6443 (aggregated API), 8090 (MCP)
  • Services updated to target the unified pod selector

Backward compatibility

Existing explicit --app=controller, --app=aggregated-apiserver, and --app=mcp-http modes remain fully functional for users who prefer split deployments.

Validation

  • make verify-vendor
  • make build
  • make test ✅ (all existing + new tests)
  • make lint

Risks

  • Memory: Running all three components in one process increases per-pod memory. The shared cache helps (avoids duplicate informers between controller and MCP), but the aggregated API server's codersdk calls are still per-request.
  • Blast radius: A crash in any component takes down all three. Existing split deployment mode remains available as a mitigation path.
  • Multi-instance: The dynamic provider currently returns BadRequest when multiple eligible CoderControlPlane instances exist in the same namespace. Multi-instance support (with control-plane-prefixed naming) is planned as a follow-up.

📋 Implementation Plan

Plan: unify controller + aggregated API server + MCP into one --app=all (shared cache)

Context / Why

Today coder-k8s runs as one of three separate app modes selected by --app:

  • controller → controller-runtime operator for coder.com/v1alpha1 (CoderControlPlane, etc.)
  • aggregated-apiserver → aggregated API server for aggregation.coder.com/v1alpha1 (CoderWorkspace, CoderTemplate)
  • mcp-http → MCP server (HTTP transport) that interacts with both sets of APIs

Goal:

  1. Add a new app mode --app=all and make it the default.
  2. In all mode, run all three components in a single process and support deploying them as a single pod.
  3. Ensure the controller and MCP server share a single controller-runtime cache/client, so we don’t run separate informer stacks or issue redundant API reads.

Non-goals (for this change):

  • Re-architect the codersdk-backed aggregated API storage beyond adding dynamic control-plane routing.
  • Redesign authn/z of the aggregated API server (it currently uses anonymous + always-allow).

Evidence (what we verified)

From repo inspection:

  • app_dispatch.go currently requires --app and supports: controller, aggregated-apiserver, mcp-http.
  • internal/app/controllerapp/controllerapp.go creates a controller-runtime Manager and uses the manager’s cache-backed client.
  • api/v1alpha1/codercontrolplane_types.go + internal/controller/codercontrolplane_controller.go implement operator access:
    • spec.operatorAccess.disabled gates whether the operator/admin token is managed
    • status.operatorTokenSecretRef points at the Secret/key containing the admin token
    • status.operatorAccessReady indicates token bootstrap success
    • status.url is the in-cluster coderd base URL used for service discovery
  • internal/app/apiserverapp/apiserverapp.go serves coderworkspaces/codertemplates via codersdk-backed storage (internal/aggregated/storage/*).
    • Storage resolves a codersdk client through internal/aggregated/coder.ClientProvider.
    • Current deployment mode uses coder.NewStaticClientProvider configured via --coder-url, --coder-session-token, and --coder-namespace.
  • internal/app/mcpapp/server.go builds its own non-cached controller-runtime client (client.New) + client-go clientset; mcpapp.NewServer already supports injecting a client.Client + clientset.
  • internal/app/mcpapp/tools.go uses:
    • controller-runtime client for CRUD/listing of CoderControlPlane, CoderWorkspace, CoderTemplate, Pods, Services, Deployments
    • client-go clientset for Events and Pod logs
  • controller-runtime cache supports starting new informers on-demand by default (ReaderFailOnMissingInformer defaults to false).
  • controller-runtime Manager.Add(...) behavior: any runnable that doesn’t implement LeaderElectionRunnable is treated as leader-election gated. To run MCP/APIServer on all replicas, they must implement NeedLeaderElection() bool { return false }.
  • deploy/ currently has three Deployments and separate ServiceAccounts/RBAC; a single pod requires one ServiceAccount with the union of permissions.
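
The Manager.Add(...) rule noted above can be illustrated with a small stdlib-only model. This is a sketch, not controller-runtime code: Runnable and LeaderElectionRunnable are reduced stand-ins for the interfaces of the same names, and needsLeaderElection, plainRunnable, and everyReplicaRunnable are hypothetical names used only here.

```go
package main

import (
	"context"
	"fmt"
)

// Runnable and LeaderElectionRunnable are simplified stand-ins for the
// controller-runtime interfaces (assumption: only the relevant methods).
type Runnable interface {
	Start(ctx context.Context) error
}

type LeaderElectionRunnable interface {
	NeedLeaderElection() bool
}

// needsLeaderElection models the rule: a runnable is leader-election gated
// unless it implements LeaderElectionRunnable and returns false.
func needsLeaderElection(r Runnable) bool {
	if le, ok := r.(LeaderElectionRunnable); ok {
		return le.NeedLeaderElection()
	}
	return true
}

// plainRunnable does not implement LeaderElectionRunnable.
type plainRunnable struct{}

func (plainRunnable) Start(context.Context) error { return nil }

// everyReplicaRunnable opts out, so it would run on all replicas.
type everyReplicaRunnable struct{ plainRunnable }

func (everyReplicaRunnable) NeedLeaderElection() bool { return false }

func main() {
	fmt.Println(needsLeaderElection(plainRunnable{}))        // true
	fmt.Println(needsLeaderElection(everyReplicaRunnable{})) // false
}
```

This is why the MCP and aggregated API server runnables need an explicit opt-out: without it they would only start on the elected leader.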

Key files consulted:

  • app_dispatch.go, main_test.go
  • internal/app/controllerapp/controllerapp.go
  • internal/app/apiserverapp/apiserverapp.go
  • internal/app/mcpapp/http.go, internal/app/mcpapp/server.go, internal/app/mcpapp/tools.go
  • deploy/*.yaml (controller/apiserver/mcp)
  • vendor/sigs.k8s.io/controller-runtime/pkg/cache/*, vendor/.../pkg/manager/*

Implementation details

1) Introduce an all-mode “composition root” package

Create a new package:

  • internal/app/allapp/allapp.go

It will be the single place where we wire together:

  • controller-runtime manager (shared cache)
  • aggregated API server runnable
  • MCP HTTP server runnable

High-level shape:

package allapp

func Run(ctx context.Context) error {
    if ctx == nil {
        return fmt.Errorf("assertion failed: context must not be nil")
    }

    // 1) Build shared scheme (core + coder + aggregation)
    // 2) Build one rest.Config
    // 3) Build one controller-runtime manager
    // 4) Setup reconcilers on the manager
    // 5) Add MCP + aggregated apiserver as non-leader runnables
    // 6) Start manager (single blocking call)
}

Why a dedicated package?

  • Keeps main/dispatch simple.
  • Guarantees all mode really shares a single cache and config.
  • Makes it easier to test cache sharing via dependency injection.

Why not just run the existing 3 Run() functions concurrently?

Calling controllerapp.Run(ctx) + mcpapp.RunHTTP(ctx) concurrently would not share controller-runtime cache/client instances, because controllerapp.Run constructs a Manager internally and mcpapp.RunHTTP constructs its own client/clientset.

To share cache, we need a composition root that constructs the manager once and passes its cache-backed client into MCP.
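
The difference between the two approaches can be shown with a stdlib-only model. Everything here is hypothetical (Client, Manager, Component, composeAll, composeSeparately); it only illustrates that a composition root hands out one client, while independent entry points each build their own.

```go
package main

import "fmt"

// Client is a hypothetical stand-in for a cache-backed client.Client.
type Client struct{ name string }

// Manager is a hypothetical stand-in for the controller-runtime manager;
// it owns exactly one client.
type Manager struct{ client *Client }

func NewManager() *Manager            { return &Manager{client: &Client{name: "shared"}} }
func (m *Manager) GetClient() *Client { return m.client }

// Component records which client it was wired with.
type Component struct {
	Name   string
	Client *Client
}

// composeAll builds the manager once and passes the same client to every
// component, as a composition root would.
func composeAll() (*Manager, []Component) {
	mgr := NewManager()
	return mgr, []Component{
		{Name: "controller", Client: mgr.GetClient()},
		{Name: "mcp-http", Client: mgr.GetClient()},
	}
}

// composeSeparately models running standalone entry points side by side:
// each one constructs its own manager and client.
func composeSeparately() []Component {
	return []Component{
		{Name: "controller", Client: NewManager().GetClient()},
		{Name: "mcp-http", Client: NewManager().GetClient()},
	}
}

func main() {
	_, shared := composeAll()
	fmt.Println(shared[0].Client == shared[1].Client) // true
	split := composeSeparately()
	fmt.Println(split[0].Client == split[1].Client) // false
}
```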


2) Create a shared scheme builder used by controller + MCP

We need the manager’s scheme to include:

  • clientgoscheme.AddToScheme
  • coderv1alpha1.AddToScheme
  • aggregationv1alpha1.AddToScheme (MCP reads/patches these types)

Options (pick one; recommended is A):

A) Create internal/app/sharedscheme/sharedscheme.go:

package sharedscheme

func New() *runtime.Scheme {
    scheme := runtime.NewScheme()
    utilruntime.Must(clientgoscheme.AddToScheme(scheme))
    utilruntime.Must(coderv1alpha1.AddToScheme(scheme))
    utilruntime.Must(aggregationv1alpha1.AddToScheme(scheme))
    return scheme
}

Then:

  • controllerapp.NewScheme() becomes a thin wrapper calling sharedscheme.New() (or remove/replace).
  • mcpapp.newScheme() becomes a thin wrapper calling sharedscheme.New().
  • allapp uses sharedscheme.New().

B) Expand controllerapp.NewScheme() to also register aggregationv1alpha1 and delete mcpapp.newScheme().

(A avoids circular dependencies and keeps the scheme definition single-sourced.)


3) Refactor controller mode to allow reuse of the manager (minimal surgical extraction)

Currently controllerapp.Run both builds the manager and starts it.

We want allapp to build the manager once and start it once, but reuse controller setup.

Edits in internal/app/controllerapp/controllerapp.go:

  1. Extract manager construction:
func NewManager(cfg *rest.Config, scheme *runtime.Scheme) (manager.Manager, error) {
    // asserts
    // ctrl.NewManager(cfg, ctrl.Options{ ... })
}
  2. Extract controller wiring:
func SetupControllers(mgr manager.Manager) error {
    // create reconcilers and call SetupWithManager
}
  3. Extract health checks:
func SetupProbes(mgr manager.Manager) error {
    // AddHealthzCheck + AddReadyzCheck
}
  4. Keep existing Run(ctx) behavior by composing the above:
func Run(ctx context.Context) error {
    cfg := ctrl.GetConfigOrDie()
    scheme := sharedscheme.New()
    mgr, err := NewManager(cfg, scheme)
    ...
    if err := SetupControllers(mgr); err != nil { ... }
    if err := SetupProbes(mgr); err != nil { ... }
    return mgr.Start(ctx)
}

Also: detectLeaderElectionNamespace() is currently unexported; keep it in controllerapp and use it from NewManager.


4) Make MCP HTTP server runnable accept injected (shared) clients

We want MCP to use the manager’s cache-backed client:

  • k8sClient := mgr.GetClient()

and a clientset built from the same config:

  • clientset, err := kubernetes.NewForConfig(mgr.GetConfig())

Edits in internal/app/mcpapp/http.go:

  • Add a new function (preferred):
func RunHTTPWithClients(ctx context.Context, k8sClient client.Client, clientset kubernetes.Interface) error {
    // assert ctx/k8sClient/clientset
    server := NewServer(k8sClient, clientset)
    // start HTTP server (existing logic)
}
  • Keep RunHTTP(ctx) for standalone mode by delegating:
func RunHTTP(ctx context.Context) error {
    k8sClient, clientset, err := newClients()
    ...
    return RunHTTPWithClients(ctx, k8sClient, clientset)
}

This allows allapp to reuse the same cache-backed client without duplicating MCP’s HTTP lifecycle code.


5) Aggregated API server in all mode: dynamic discovery + operator-token auth

Current state (post-rebase):

  • Aggregated API storage is already codersdk-backed (internal/aggregated/storage/*).
  • Storage resolves a Coder API client through internal/aggregated/coder.ClientProvider (today: a namespace-pinned static provider built from --coder-url, --coder-session-token, --coder-namespace in apiserverapp.buildClientProvider).

Goal for all mode:

  • No manual --coder-* wiring.
  • For each request, discover the relevant CoderControlPlane(s), fetch the operator admin token from the Secret created by the controller, and use it for codersdk calls.
  • If spec.operatorAccess.disabled: true, that control plane must be treated as invisible: do not read its Secret, do not call its coderd, and do not return its resources.

5.1) Implement a control-plane-backed ClientProvider

Add internal/aggregated/coder/controlplane_provider.go (name flexible) that uses Kubernetes to build codersdk clients on demand.

Recommended interface evolution (minimal but supports LIST across instances):

type ClientProvider interface {
    // Existing behavior: used by GET/CREATE/UPDATE/DELETE once the control plane is known.
    ClientForNamespace(ctx context.Context, namespace string) (*codersdk.Client, error)

    // New: used by LIST to support multiple control planes per namespace and cluster-wide listing.
    ClientsForNamespace(ctx context.Context, namespace string) (map[types.NamespacedName]*codersdk.Client, error)
}

Provider construction inputs (from allapp):

  • client.Reader for CoderControlPlane discovery (cached mgr.GetClient() is fine)
  • client.Reader for Secret reads (prefer mgr.GetAPIReader() to avoid caching secret tokens in informers)
  • request timeout (reuse --coder-request-timeout, default 30s)

Eligibility filter (MUST run before any Secret read or coderd call):

  • cp.Spec.OperatorAccess.Disabled == true ⇒ skip
  • cp.Status.OperatorAccessReady != true ⇒ skip
  • cp.Status.OperatorTokenSecretRef == nil ⇒ skip
  • strings.TrimSpace(cp.Status.URL) == "" ⇒ skip
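
A minimal sketch of the eligibility filter above, using hypothetical trimmed-down types. ControlPlane, SecretKeyRef, skipReason, and eligible are illustrative names, not the real API types. It also folds in the '.'-in-name rule from the Summary. The point is that every check uses fields already in hand, so it can run before any Secret read or coderd call.

```go
package main

import (
	"fmt"
	"strings"
)

// SecretKeyRef and ControlPlane are hypothetical mirrors of the fields
// named in the filter; the real types live in api/v1alpha1.
type SecretKeyRef struct{ Name, Key string }

type ControlPlane struct {
	Name                   string
	OperatorAccessDisabled bool          // spec.operatorAccess.disabled
	OperatorAccessReady    bool          // status.operatorAccessReady
	OperatorTokenSecretRef *SecretKeyRef // status.operatorTokenSecretRef
	URL                    string        // status.url
}

// skipReason returns "" when the control plane is eligible, otherwise a
// short reason suitable for a log line.
func skipReason(cp ControlPlane) string {
	switch {
	case cp.OperatorAccessDisabled:
		return "operator access disabled"
	case !cp.OperatorAccessReady:
		return "operator access not ready"
	case cp.OperatorTokenSecretRef == nil:
		return "operator token secret ref missing"
	case strings.TrimSpace(cp.URL) == "":
		return "URL missing"
	case strings.Contains(cp.Name, "."):
		return "name contains '.'"
	}
	return ""
}

// eligible keeps only the control planes that pass every check.
func eligible(cps []ControlPlane) []ControlPlane {
	var out []ControlPlane
	for _, cp := range cps {
		if skipReason(cp) == "" {
			out = append(out, cp)
		}
	}
	return out
}

func main() {
	ref := &SecretKeyRef{Name: "operator-token"}
	cps := []ControlPlane{
		{Name: "prod", OperatorAccessReady: true, OperatorTokenSecretRef: ref, URL: "http://coder.prod.svc"},
		{Name: "off", OperatorAccessDisabled: true, OperatorAccessReady: true, OperatorTokenSecretRef: ref, URL: "http://coder.off.svc"},
	}
	for _, cp := range cps {
		fmt.Printf("%s: %q\n", cp.Name, skipReason(cp))
	}
}
```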

Token fetch:

  • Secret namespace: cp.Namespace
  • Secret name: cp.Status.OperatorTokenSecretRef.Name
  • Secret key: cp.Status.OperatorTokenSecretRef.Key (default to coderv1alpha1.DefaultTokenSecretKey when empty)

SDK construction:

  • Parse cp.Status.URL as a url.URL
  • coder.NewSDKClient(coder.Config{CoderURL: parsed, SessionToken: token, RequestTimeout: timeout})

Error handling rules:

  • Never log/return token contents.
  • If a namespace has zero eligible control planes, LIST returns an empty list; GET returns NotFound.
  • If a namespace has multiple eligible control planes and the request doesn’t identify one, return BadRequest with guidance (see §5.2).
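
The selection rules above for single-object requests can be sketched as follows. selectControlPlane, ErrNotFound, and ErrBadRequest are hypothetical names; a real implementation would presumably return Kubernetes API status errors. Treating a requested but ineligible control plane as NotFound is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for NotFound and BadRequest.
var (
	ErrNotFound   = errors.New("not found: no matching eligible control plane")
	ErrBadRequest = errors.New("bad request: multiple eligible control planes, use the control-plane-prefixed name")
)

// selectControlPlane picks the control plane for a GET-style request.
// eligible holds the names of eligible control planes in the namespace;
// requested is the control-plane segment from the object name, or "".
func selectControlPlane(eligible []string, requested string) (string, error) {
	if len(eligible) == 0 {
		return "", ErrNotFound
	}
	if requested == "" {
		if len(eligible) == 1 {
			return eligible[0], nil
		}
		return "", ErrBadRequest
	}
	for _, name := range eligible {
		if name == requested {
			return name, nil
		}
	}
	// Assumption: a named but ineligible control plane is invisible.
	return "", ErrNotFound
}

func main() {
	fmt.Println(selectControlPlane([]string{"prod"}, ""))
	fmt.Println(selectControlPlane([]string{"prod", "staging"}, ""))
	fmt.Println(selectControlPlane([]string{"prod", "staging"}, "staging"))
}
```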

5.2) Disambiguate multiple control planes per namespace (instance selection)

To support “multiple deployments in the same namespace” without collisions, encode the CoderControlPlane name into aggregated object names.

Recommended naming scheme:

  • Workspace: <control-plane>.<org>.<user>.<workspace>
  • Template: <control-plane>.<org>.<template>

Implementation:

  • Update internal/aggregated/coder/names.go:
    • Add Parse/Build helpers that include a control-plane segment.
    • Keep backward compatibility: accept old names (no control-plane segment) only when exactly one eligible control plane exists in that namespace; otherwise return BadRequest asking for the prefixed form.
  • Update api/aggregation/v1alpha1/types.go comments to match the new naming scheme.
  • Add labels for filtering/debugging:
    • aggregation.coder.com/control-plane-name=<cp.Name>
    • aggregation.coder.com/control-plane-namespace=<cp.Namespace>

Note: control plane names containing '.'

The aggregated naming scheme uses '.' as a separator. Kubernetes object names may contain '.', so for the initial implementation we will treat any CoderControlPlane whose name contains '.' as ineligible for aggregation (skip it) and surface a clear warning/error explaining why it’s being skipped.
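
A rough sketch of what Build/Parse helpers for the workspace naming scheme might look like, and why a '.' in the control-plane name is a problem. WorkspaceName, BuildWorkspaceName, and ParseWorkspaceName are hypothetical names, and the legacy form is assumed to be <org>.<user>.<workspace>.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkspaceName holds the segments of an aggregated workspace name.
// ControlPlane is empty for legacy (unprefixed) names.
type WorkspaceName struct {
	ControlPlane, Org, User, Workspace string
}

// BuildWorkspaceName renders <control-plane>.<org>.<user>.<workspace>.
func BuildWorkspaceName(n WorkspaceName) string {
	return strings.Join([]string{n.ControlPlane, n.Org, n.User, n.Workspace}, ".")
}

// ParseWorkspaceName accepts the prefixed form (4 segments) and the assumed
// legacy form (3 segments). For legacy names the caller must still resolve
// the control plane, which is only valid when exactly one is eligible.
func ParseWorkspaceName(name string) (WorkspaceName, error) {
	parts := strings.Split(name, ".")
	for _, p := range parts {
		if p == "" {
			return WorkspaceName{}, fmt.Errorf("invalid workspace name %q: empty segment", name)
		}
	}
	switch len(parts) {
	case 4:
		return WorkspaceName{ControlPlane: parts[0], Org: parts[1], User: parts[2], Workspace: parts[3]}, nil
	case 3:
		return WorkspaceName{Org: parts[0], User: parts[1], Workspace: parts[2]}, nil
	default:
		return WorkspaceName{}, fmt.Errorf("invalid workspace name %q: want 3 or 4 segments, got %d", name, len(parts))
	}
}

func main() {
	name := BuildWorkspaceName(WorkspaceName{ControlPlane: "prod", Org: "acme", User: "alice", Workspace: "dev"})
	fmt.Println(name) // prod.acme.alice.dev

	// A control plane named "prod.eu" yields five segments, which no longer
	// parses unambiguously.
	_, err := ParseWorkspaceName(BuildWorkspaceName(WorkspaceName{ControlPlane: "prod.eu", Org: "acme", User: "alice", Workspace: "dev"}))
	fmt.Println(err != nil) // true
}
```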

5.3) Update converters + storage to use the dynamic provider

Update converters:

  • internal/aggregated/convert/workspace.go and template.go:
    • accept controlPlaneName string
    • build names with the new prefix
    • set the control-plane labels

Update storages:

  • internal/aggregated/storage/workspace.go and template.go:
    • LIST:
      • if request namespace is set: call provider.ClientsForNamespace(ctx, namespace) and merge results across all eligible control planes
      • if request namespace is empty (kubectl -A): call provider.ClientsForNamespace(ctx, "") and merge across all namespaces
    • GET/CREATE/UPDATE/DELETE:
      • parse the control-plane segment from name
      • resolve the appropriate codersdk client (either via a dedicated ClientForControlPlane helper, or by storing the chosen control-plane in ctx before calling ClientForNamespace)

Critically: when operator access is disabled, objects from that control plane must not appear in LIST results and must behave like NotFound/BadRequest for other verbs.

5.4) Wire dynamic provider only in all mode

Update internal/app/apiserverapp/apiserverapp.go:

  • Extend Options with a provider override, e.g. ClientProvider coder.ClientProvider.
  • In RunWithOptions, if override is non-nil, use it; otherwise keep the existing static-provider behavior for --app=aggregated-apiserver.

In internal/app/allapp/allapp.go:

  • Construct the control-plane-backed provider using mgr.GetClient() + mgr.GetAPIReader().
  • Pass it to apiserverapp.RunWithOptions.

6) Run aggregated apiserver + MCP as non-leader manager runnables

In internal/app/allapp/allapp.go:

  1. Build one manager:
  • scheme := sharedscheme.New()
  • cfg := ctrl.GetConfigOrDie()
  • mgr, err := controllerapp.NewManager(cfg, scheme)
  • controllerapp.SetupControllers(mgr)
  • controllerapp.SetupProbes(mgr)
  2. Add the aggregated API server as a runnable.

Because we want it to run on every pod replica (not gated by leader election), wrap it in a type implementing LeaderElectionRunnable:

type nonLeaderRunnable struct {
    run func(context.Context) error
}

func (r nonLeaderRunnable) Start(ctx context.Context) error { return r.run(ctx) }
func (r nonLeaderRunnable) NeedLeaderElection() bool        { return false }

Then (wait for cache sync and run with the dynamic control-plane provider):

// Propagate registration errors instead of discarding them.
if err := mgr.Add(nonLeaderRunnable{run: func(ctx context.Context) error {
    syncCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()
    if ok := mgr.GetCache().WaitForCacheSync(syncCtx); !ok {
        return fmt.Errorf("cache did not sync before timeout")
    }

    provider, err := coder.NewControlPlaneClientProvider(
        mgr.GetClient(),    // cached discovery for CoderControlPlane
        mgr.GetAPIReader(), // uncached Secret reads (avoid caching tokens)
        30*time.Second,
    )
    if err != nil {
        return err
    }

    return apiserverapp.RunWithOptions(ctx, apiserverapp.Options{
        ClientProvider:      provider,
        CoderRequestTimeout: 30 * time.Second,
    })
}}); err != nil {
    return err
}
  3. Add MCP runnable that waits for cache sync before serving:
if err := mgr.Add(nonLeaderRunnable{run: func(ctx context.Context) error {
    // Defensive: don’t serve MCP until cache is ready.
    syncCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()
    if ok := mgr.GetCache().WaitForCacheSync(syncCtx); !ok {
        return fmt.Errorf("cache did not sync before timeout")
    }

    clientset, err := kubernetes.NewForConfig(mgr.GetConfig())
    ...
    return mcpapp.RunHTTPWithClients(ctx, mgr.GetClient(), clientset)
}}); err != nil {
    return err
}
  4. Finally start the manager:
return mgr.Start(ctx)

This yields a single top-level blocking call and ensures the MCP server uses the exact same cache and config as the operator.

Note on cache size / informers

Using mgr.GetClient() in MCP means reads/lists will go through the shared cache and may start informers for object types MCP touches (Pods, Services, Deployments, aggregated API types, etc.). This is desirable for “share caches” but may increase memory usage vs direct REST reads.

If this becomes an issue, a follow-up optimization is:

  • set Cache.ReaderFailOnMissingInformer = true
  • use mgr.GetAPIReader() for types we explicitly do not want to cache (e.g., Pods)

That refinement can be postponed until profiling shows it’s needed.


7) Update app dispatch: add --app=all and make it default

Edits in app_dispatch.go:

  • Add internal/app/allapp import.
  • Update supportedAppModes and help text to include: all, controller, aggregated-apiserver, mcp-http.
  • Make --app default "all".
  • Add case "all": return allapp.Run(setupSignalHandler()).
  • Remove the case "": ... required branch (--app is no longer required).
  • Keep the existing aggregated-apiserver flags (--coder-url, --coder-session-token, --coder-namespace, --coder-request-timeout) and their validation logic for --app=aggregated-apiserver.
    • In all mode these should not be required; they can be ignored (or later repurposed as global codersdk tuning knobs).
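
The dispatch change can be sketched with the standard flag package. This is not the actual app_dispatch.go: resolveAppMode is a hypothetical helper, and only supportedAppModes and the mode names come from the plan above.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// supportedAppModes lists the modes named in the plan.
var supportedAppModes = []string{"all", "controller", "aggregated-apiserver", "mcp-http"}

// resolveAppMode parses args and returns the selected mode. --app defaults
// to "all", so an empty argument list is valid.
func resolveAppMode(args []string) (string, error) {
	fs := flag.NewFlagSet("coder-k8s", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of this sketch
	app := fs.String("app", "all", "application mode: all, controller, aggregated-apiserver, mcp-http")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	for _, mode := range supportedAppModes {
		if *app == mode {
			return mode, nil
		}
	}
	return "", fmt.Errorf("unsupported --app value %q", *app)
}

func main() {
	mode, err := resolveAppMode(nil)
	fmt.Println(mode, err) // all <nil>
}
```

With the default in place, an explicit empty value (--app=) falls through to the unsupported-mode error rather than needing its own branch.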

8) Deploy manifests: single Deployment/Pod, preserve Services

Add a new unified Deployment manifest and migrate existing Services/selectors.

  1. Replace the three example Deployments with a single unified Deployment manifest.

Recommended approach:

  • Create deploy/deployment.yaml (or rename deploy/controller-deployment.yaml to deploy/deployment.yaml).
  • Remove deploy/apiserver-deployment.yaml and deploy/mcp-deployment.yaml from the “happy path” examples (either delete them or move them under deploy/legacy/), to avoid users accidentally deploying 3 pods that each run all components.

Unified Deployment details:

  • single Deployment name (e.g., coder-k8s)
  • one container ghcr.io/coder/coder-k8s:latest
  • drop the container args: entirely (no --app), relying on the new default --app=all
  • expose ports: 8081 (controller probes), 6443 (aggregated apiserver), 8090 (MCP)
  2. Update deploy/apiserver-service.yaml selector to match the combined deployment label.

  3. Update deploy/mcp-service.yaml selector to match the combined deployment label.

  4. Update deploy/rbac.yaml:

  • Replace 3 ServiceAccounts with 1 (e.g., coder-k8s).
  • Bind:
    • controller ClusterRole rules
    • MCP ClusterRole rules
    • apiserver auth delegation bindings (system:auth-delegator + extension-apiserver-authentication-reader)

This yields a single pod with the union of permissions.

  5. Probes:
  • Initially, point readiness/liveness at the controller probe port :8081.
  • Extend controllerapp.SetupProbes (in all mode only, or generally) with additional readiness checks that confirm:
    • aggregated apiserver is serving (TCP dial localhost:6443 or HTTP GET https://localhost:6443/readyz if available)
    • MCP server is serving (TCP dial localhost:8090 or expose an atomic ready flag)

(Implementation choice depends on whether we want to introduce a new composite endpoint vs extending the controller’s existing /readyz.)

  6. Update any other “example deployment” YAMLs that run coder-k8s to rely on the default all mode.
  • config/e2e/deployment.yaml: remove args: ["--app=controller"] so the container starts in default all mode.
    • (Optional) Add container ports 6443 and 8090 so the manifest matches the unified runtime.
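
For the TCP-dial readiness option mentioned under Probes, a check could look roughly like this. tcpReadyCheck and probeLocalListener are hypothetical names. The returned function has the shape func(*http.Request) error, which is the signature controller-runtime's healthz.Checker uses, so it should be usable with AddReadyzCheck; that wiring is not shown here.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// tcpReadyCheck returns a readiness check that succeeds when a TCP
// connection to addr can be opened within timeout.
func tcpReadyCheck(addr string, timeout time.Duration) func(*http.Request) error {
	return func(_ *http.Request) error {
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			return fmt.Errorf("dial %s: %w", addr, err)
		}
		return conn.Close()
	}
}

// probeLocalListener exercises the check against a throwaway loopback
// listener: once while it is listening, once after it is closed.
func probeLocalListener() (whileOpen, afterClose error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err, err
	}
	check := tcpReadyCheck(ln.Addr().String(), time.Second)
	whileOpen = check(nil)
	_ = ln.Close()
	afterClose = check(nil)
	return whileOpen, afterClose
}

func main() {
	open, closed := probeLocalListener()
	fmt.Println("while listening:", open)
	fmt.Println("after close:", closed)
}
```

A dial only proves the port is bound, not that the server can answer requests, which is one argument for the HTTP /readyz or atomic-flag alternatives.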

9) Tests

Update and add tests to prevent regressions.

  1. main_test.go
  • Replace TestRunRejectsEmptyMode with TestRunDefaultsToAllMode.
  • Add TestRunDispatchesAllMode.
  • Keep TestRunRejectsUnknownMode.
  2. New tests for cache sharing (recommended)

Create internal/app/allapp/allapp_test.go with dependency injection to avoid requiring a real Kubernetes cluster:

  • In allapp, define package-level vars for constructors used inside Run, e.g.:
    • newManager (wraps controllerapp.NewManager)
    • newClientset (wraps kubernetes.NewForConfig)
    • runAggregatedAPIServer (wraps apiserverapp.RunWithOptions)
    • runMCPHTTPWithClients (wraps mcpapp.RunHTTPWithClients)

In tests, stub these to:

  • return a fake manager with:
    • deterministic GetClient() pointer
    • fake GetCache().WaitForCacheSync returning true
  • capture the k8sClient passed into runMCPHTTPWithClients
  • assert pointer equality with mgr.GetClient()

This directly enforces the “shared cache client” requirement.

  3. New tests for dynamic discovery + operator-access skipping (recommended)
  • internal/aggregated/coder/*_test.go:
    • provider skips spec.operatorAccess.disabled=true without reading Secrets or calling coderd
    • provider skips status.operatorAccessReady=false
    • provider errors clearly when multiple eligible control planes exist but the request omits the control-plane name segment
  • internal/aggregated/storage/*_test.go:
    • LIST merges results across multiple eligible control planes in the same namespace and prefixes names correctly
    • LIST returns empty when all control planes are disabled/not-ready

10) Validation (when implementing)

Run:

  • make test
  • make build
  • make lint
  • make verify-vendor

If deploy manifests are considered part of release artifacts, also validate:

  • go run github.com/rhysd/actionlint/cmd/actionlint@v1.7.10 (only if workflow edits happen)

Rollout / migration notes

  • Existing deployments that explicitly set --app=controller / --app=aggregated-apiserver / --app=mcp-http remain valid.
  • The new unified deployment can omit --app and will run all components.
  • Because a single pod can only have one ServiceAccount, unified deployment requires new combined RBAC.

Suggested execution strategy (single agent vs. multiple agents)

Recommendation: implement with a single agent/engineer if possible.

Rationale:

  • The change is cross-cutting (dispatch + controller refactor + MCP injection + new allapp wiring + RBAC/manifests) and benefits from tight iteration with end-to-end validation (make test/build/lint).
  • Multiple agents working in parallel will likely collide on the same core files (e.g. controllerapp/controllerapp.go, app_dispatch.go, deployment YAMLs), creating merge churn.

If you do want to parallelize, the safest split is along file ownership boundaries, with one “integration agent” responsible for final wiring + validation:

  1. Agent A (runtime wiring / shared cache)

    • internal/app/allapp/*
    • internal/app/sharedscheme/*
    • internal/app/controllerapp/* refactor (NewManager, SetupControllers, SetupProbes)
  2. Agent B (MCP refactor)

    • internal/app/mcpapp/http.go (RunHTTPWithClients), keep standalone mode working
  3. Agent C (aggregated API server dynamic provider + multi-instance support)

    • internal/app/apiserverapp/* (add Options.ClientProvider override; keep static --coder-* mode working)
    • internal/aggregated/coder/* (control-plane-backed provider, naming helpers)
    • internal/aggregated/storage/* + internal/aggregated/convert/* (control-plane-prefixed names; LIST merge across instances; enforce disabled skip)
    • api/aggregation/v1alpha1/types.go + generated docs if naming comments change
  4. Agent D (dispatch + unit tests + docs)

    • app_dispatch.go, main_test.go
    • Update docs/README snippets that mention --app=... (if desired in this change)
  5. Agent E (manifests/RBAC)

    • deploy/*.yaml, config/e2e/deployment.yaml
    • Create unified Deployment manifest + update Services + unify ServiceAccount/RBAC
  6. Integration agent (final pass)

    • Rebases/merges work from A–E
    • Ensures the unified deployment examples are coherent and don’t accidentally create 3 pods each running all
    • Runs final validation and fixes any test/lint fallout

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $1.51

@ThomasK33
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60b597c00e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread deploy/rbac.yaml Outdated
Comment thread internal/app/allapp/allapp.go Outdated
@ThomasK33
Member Author

@codex review

Pushed a fix for the flaky TestDeleteProvisionerKey_Success CI failure — isolated codersdk.Client HTTP transports to prevent http.DefaultTransport connection pool sharing across parallel tests.
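
For illustration, one way to give each client its own connection pool is to clone the default transport. This is a sketch with a hypothetical helper (newIsolatedHTTPClient); how the fix is actually wired into codersdk.Client is not shown in this thread.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newIsolatedHTTPClient returns an http.Client whose transport is a private
// clone of http.DefaultTransport, so its connection pool is not shared with
// other clients. It assumes DefaultTransport is still an *http.Transport.
func newIsolatedHTTPClient(timeout time.Duration) *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	return &http.Client{Transport: transport, Timeout: timeout}
}

func main() {
	a := newIsolatedHTTPClient(30 * time.Second)
	b := newIsolatedHTTPClient(30 * time.Second)
	fmt.Println(a.Transport != b.Transport)             // true
	fmt.Println(a.Transport != http.DefaultTransport)   // true
}
```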


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 160a0e26fb

Comment thread deploy/rbac.yaml Outdated
@ThomasK33
Member Author

@codex review

Addressed both review comments:

  1. P1 (events RBAC): Added get/list/watch verbs for events in the unified ClusterRole so MCP's get_events tool works correctly.

  2. P2 (all-namespaces LIST): Added NamespaceResolver interface and implemented it on both StaticClientProvider and ControlPlaneClientProvider. Updated namespaceForListConversion to use the interface instead of type-asserting *StaticClientProvider. Added tests for all paths.

Also fixed a flaky TestDeleteProvisionerKey_Success test (isolated HTTP transports per codersdk client).

@ThomasK33
Member Author

@codex review

The events RBAC comment has already been addressed in commit 25feb31a — the unified ClusterRole now grants get/list/watch/create/patch on events. Please review the latest HEAD.

@ThomasK33
Member Author

@codex review

Fixed E2E smoke test — restored --app=controller in config/e2e/deployment.yaml since the E2E test validates controller mode only and doesn't provision the infrastructure needed for all mode (APIService registration, TLS).


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 110b428af7

Comment thread deploy/deployment.yaml
@ThomasK33
Member Author

@codex review

Updated all documentation references to use the unified deploy/deployment.yaml manifest. No remaining references to the deleted per-component deployment files. make docs-check passes.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f5c16d313

Comment thread app_dispatch.go Outdated
@ThomasK33
Member Author

@codex review

Plumbed --coder-request-timeout through all mode — allapp.Run now accepts the timeout parameter, defaults to 30s when zero, and passes it to both the dynamic provider and apiserver options. Added corresponding test assertions.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

@ThomasK33 ThomasK33 added this pull request to the merge queue Feb 12, 2026
Merged via the queue into main with commit d63b3db Feb 12, 2026
11 checks passed
@ThomasK33 ThomasK33 deleted the startup-gy16 branch February 12, 2026 07:16
1 participant