🤖 fix: make generated RBAC the single source of truth #60

Merged
ThomasK33 merged 1 commit into main from fix-rbac-single-source on Feb 12, 2026

@ThomasK33
Member

Summary

Consolidate RBAC manifests so config/rbac/ is the single source of truth for in-cluster RBAC, remove duplicated RBAC files, and update tooling/docs to apply RBAC from config/rbac/.

Background

deploy/rbac.yaml had drifted from generated RBAC and was missing permissions needed by controllers/MCP flows. This caused real-cluster informer sync failures due to forbidden list/watch operations.

Implementation

  • Added MCP +kubebuilder:rbac markers in internal/app/mcpapp/server.go for events, namespaces, pods, pod logs, services, deployments, and aggregated API resources.
  • Regenerated manifests so config/rbac/role.yaml includes the MCP-required permissions.
  • Moved apply-time RBAC glue into config/rbac/:
    • serviceaccount.yaml
    • clusterrolebinding.yaml
    • auth-delegator-binding.yaml
    • authentication-reader-binding.yaml
  • Removed duplicated RBAC manifests:
    • deploy/rbac.yaml
    • config/e2e/serviceaccount.yaml
    • config/e2e/clusterrole-binding.yaml
  • Updated automation:
    • hack/kind-dev.sh now applies namespace + config/rbac/ (no e2e RBAC duplicates)
    • CI e2e-kind workflow now applies namespace/CRDs/RBAC explicitly and deploys only config/e2e/deployment.yaml
  • Updated docs/examples to use kubectl apply -f config/rbac/.

Validation

  • make manifests
  • make verify-vendor
  • make test
  • make build
  • make lint
  • make docs-check

Risks

Low-to-moderate. This touches deployment RBAC wiring and removes legacy manifest paths. The mitigations are generated RBAC from source markers, CI/developer flow updates, and full local validation.


📋 Implementation Plan

Plan: Make generated RBAC the single source of truth (remove deploy/rbac.yaml)

Context / Why

Today we have two RBAC definitions:

  • config/rbac/role.yaml (generated by controller-gen from +kubebuilder:rbac markers; ClusterRole manager-role).
  • deploy/rbac.yaml (hand-curated “one file to apply”; ClusterRole coder-k8s).

These have drifted: deploy/rbac.yaml is missing permissions that the controllers (notably CoderProvisioner) require (list/watch on secrets/serviceaccounts/roles/rolebindings). In a real cluster this causes controller-runtime informers to fail to sync and the operator to exit/restart.

Goal: Eliminate drift by making the generated RBAC under config/rbac/ the only source of RBAC rules, and update docs/examples to apply config/rbac/ instead of deploy/rbac.yaml.

Evidence

  • Generated RBAC is produced by hack/update-manifests.sh (controller-gen rbac:roleName=manager-role … output:rbac:artifacts:config=config/rbac).
  • Drift + runtime failure reproduced in kind:
    • Operator logs show … is forbidden: User "system:serviceaccount:coder-system:coder-k8s" cannot list … for secrets/serviceaccounts/roles/rolebindings.
    • kubectl auth can-i --as=system:serviceaccount:coder-system:coder-k8s list secrets → no.
  • deploy/rbac.yaml currently grants secrets: [get] only, while controllers and MCP tools use list/watch/update APIs.
  • Docs/examples reference deploy/rbac.yaml in:
    • docs/how-to/deploy-controller.md
    • docs/how-to/deploy-aggregated-apiserver.md
    • docs/how-to/mcp-server.md
    • examples/cloudnativepg/README.md
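
The kubectl auth can-i check from the evidence above can be repeated for every resource named in the forbidden errors. The following is a minimal sketch, not part of the PR: the function name check_informer_rbac is hypothetical, and checking with --all-namespaces assumes the informers list and watch cluster-wide.

```shell
#!/usr/bin/env bash
# Sketch: list the list/watch permissions the operator ServiceAccount lacks.
# The ServiceAccount and resource names come from the PR text; the function
# name check_informer_rbac is hypothetical.
SA="system:serviceaccount:coder-system:coder-k8s"

check_informer_rbac() {
  local missing=0
  local resource verb
  for resource in secrets serviceaccounts roles rolebindings; do
    for verb in list watch; do
      # `kubectl auth can-i` exits non-zero when the answer is "no".
      if ! kubectl auth can-i "$verb" "$resource" --as="$SA" --all-namespaces >/dev/null 2>&1; then
        echo "missing: $verb $resource"
        missing=1
      fi
    done
  done
  # Non-zero return means at least one permission is missing.
  return "$missing"
}
```

Calling check_informer_rbac against a cluster prints one line per missing permission and returns non-zero if any check fails, so it can gate a smoke test.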

Implementation plan

1) Ensure generated ClusterRole contains all required permissions (controller + MCP)

  1. Add missing +kubebuilder:rbac markers for MCP to internal/app/mcpapp/server.go (or http.go) so controller-gen includes them in config/rbac/role.yaml.

    MCP tools currently call:

    • clientset.CoreV1().Events(ns).List(...) → needs events: list.
    • clientset.CoreV1().Pods(ns).GetLogs(...).Stream(...) → needs pods/log: get.
    • clientset.CoreV1().Namespaces().List(...) → needs namespaces: list.
    • controller-runtime client operations on:
      • pods (list)
      • deployments (get)
      • services (get)
      • coderworkspaces, codertemplates (get/list/update)

    Suggested marker set (exact verbs can be trimmed later, but start with the union needed for current code paths):

    // +kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch
    // +kubebuilder:rbac:groups="",resources=pods/log,verbs=get
    // +kubebuilder:rbac:groups="",resources=events,verbs=get;list;watch
    // +kubebuilder:rbac:groups="",resources=namespaces,verbs=get;list;watch
    // +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch
    // +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch
    // +kubebuilder:rbac:groups=aggregation.coder.com,resources=codertemplates;coderworkspaces,verbs=get;list;watch;update;patch

  2. Run make manifests to regenerate config/rbac/role.yaml (and ensure it now contains the MCP permissions).

  3. (Optional but recommended) Add a CI/Makefile guardrail:

    • Add make verify-manifests that runs bash ./hack/update-manifests.sh and fails if git diff --exit-code is non-empty.
    • Call it from .github/workflows/ci.yaml so generated RBAC/CRDs can’t drift from code.
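
The guardrail in step 3 could be sketched as a small shell function that a verify-manifests Makefile target calls. This is an illustration under assumptions: the function name verify_manifests is hypothetical, and the script path is the one named in the plan.

```shell
#!/usr/bin/env bash
# Sketch: fail when regenerating manifests changes tracked files.
# verify_manifests is a hypothetical name; the generator path is from the plan.
verify_manifests() {
  # Regenerate RBAC/CRD manifests with the repo's generator script.
  bash ./hack/update-manifests.sh || return 1
  # --exit-code makes `git diff` return 1 when tracked files differ.
  if ! git diff --exit-code; then
    echo "generated manifests are stale; run 'make manifests' and commit the result" >&2
    return 1
  fi
}
```

Note that git diff only reports changes to tracked files, so a newly generated, untracked manifest would not trip this check; a stricter variant might also inspect git status --porcelain.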

Why RBAC markers in MCP code (not deploy YAML)?

We want the permission list to be generated from actual code usage. The permissions MCP needs were previously encoded only in deploy/rbac.yaml, which is how we ended up with drift. Adding markers makes config/rbac/role.yaml the authoritative rule set.

2) Move “apply-time” RBAC objects into config/rbac/

controller-gen only outputs the ClusterRole rules. We still need a ServiceAccount + bindings for in-cluster runs.

  1. Add the following non-generated manifests under config/rbac/ (these are glue; the rules remain generated):

    • serviceaccount.yaml (SA coder-k8s in coder-system)
    • clusterrolebinding.yaml binding SA → ClusterRole manager-role
    • auth-delegator-binding.yaml (ClusterRoleBinding SA → system:auth-delegator) for aggregated apiserver delegation
    • authentication-reader-binding.yaml (RoleBinding in kube-system to extension-apiserver-authentication-reader)

    These resources should match what deploy/rbac.yaml provided without duplicating the ClusterRole rules.

  2. Remove/stop using duplicated RBAC manifests elsewhere:

    • config/e2e/serviceaccount.yaml
    • config/e2e/clusterrole-binding.yaml

    …and update:

    • hack/kind-dev.sh up, so that it stops applying those files (config/rbac/ will now include them).
    • CI “Deploy controller” step to only apply config/e2e/deployment.yaml (or apply the directory after removing the RBAC files).
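
To make the glue from step 1 concrete, here is a sketch of what the ClusterRoleBinding could contain, wrapped in a shell function so it can be piped to kubectl. The ServiceAccount, namespace, and ClusterRole names come from the plan; the binding's metadata.name and the function name are hypothetical, and the actual config/rbac/clusterrolebinding.yaml may differ.

```shell
#!/usr/bin/env bash
# Sketch: print a ClusterRoleBinding that binds the operator ServiceAccount
# to the generated ClusterRole. metadata.name is a hypothetical placeholder.
render_manager_binding() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: coder-k8s-manager  # hypothetical name, not taken from the PR
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: manager-role       # generated by controller-gen
subjects:
  - kind: ServiceAccount
    name: coder-k8s
    namespace: coder-system
EOF
}
```

For a quick check without touching a cluster, the output can be validated client-side with render_manager_binding | kubectl apply --dry-run=client -f -. The binding carries no rules of its own, which is what keeps the generated role.yaml the only place where permissions are defined.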

3) Retire deploy/rbac.yaml

  1. Delete deploy/rbac.yaml (or replace it with a clear deprecation stub if removal is too disruptive).
  2. Ensure no remaining references via repo search.
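
The repo search in step 2 could be scripted roughly as follows. This is a sketch; the function name check_no_stale_rbac_refs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail if any tracked file still mentions the retired manifest.
# check_no_stale_rbac_refs is a hypothetical name.
check_no_stale_rbac_refs() {
  # `git grep` exits 0 when it finds a match and 1 when it finds none.
  # -F treats the pattern as a fixed string, -n prints line numbers.
  if git grep -n -F 'deploy/rbac.yaml'; then
    echo "stale references to deploy/rbac.yaml found" >&2
    return 1
  fi
}
```

If this function were committed to the repository, its own source would match the pattern, so a real check would need to exclude the file that contains it.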

4) Update docs + examples to use config/rbac/

Update commands everywhere RBAC is applied:

  • docs/how-to/deploy-controller.md
    • Replace kubectl apply -f deploy/rbac.yaml → kubectl apply -f config/rbac/
  • docs/how-to/deploy-aggregated-apiserver.md
    • Same replacement
  • docs/how-to/mcp-server.md
    • Same replacement
  • examples/cloudnativepg/README.md
    • Replace deploy RBAC step accordingly

Also update any narrative text that claims deploy/rbac.yaml is the unified ServiceAccount RBAC.

5) Validation (local)

  1. Regenerate artifacts:
    • make manifests
    • git diff --exit-code (no uncommitted changes after regen)
  2. Kind smoke test using the new RBAC flow:
    • ./hack/kind-dev.sh up
    • kubectl apply -f deploy/deployment.yaml (or the e2e deployment)
    • Confirm operator starts and no longer logs RBAC “forbidden” / informer sync failures.
  3. MCP RBAC sanity:
    • Deploy with --app=mcp-http and confirm tools that touch events/pod logs/namespaces work.
  4. Aggregated apiserver sanity:
    • Deploy APIService + apiserver service and confirm kubectl get coderworkspaces.aggregation.coder.com -A succeeds.
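
The "no longer logs forbidden" confirmation in the kind smoke test can be automated with a log scan. A minimal sketch, assuming the operator runs in the coder-system namespace (inferred from the ServiceAccount name); the function name is hypothetical and the caller supplies the workload reference, since the deployment name is not given here.

```shell
#!/usr/bin/env bash
# Sketch: fail when the operator logs contain RBAC denials.
# check_no_forbidden is a hypothetical name; the coder-system namespace is
# an assumption based on the ServiceAccount named in the plan.
check_no_forbidden() {
  local target="$1"  # workload reference, e.g. deployment/<name>, supplied by the caller
  # grep exits 0 when a matching line exists, i.e. when a denial was logged.
  if kubectl -n coder-system logs "$target" | grep -F 'is forbidden'; then
    return 1
  fi
}
```

The function prints any offending log lines and returns non-zero, so it can follow ./hack/kind-dev.sh up and the deployment apply in a scripted smoke test. It only sees logs available at call time, so it should run after the controllers have had a chance to sync.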

Deliverables

  • deploy/rbac.yaml removed (or deprecated) and not referenced.
  • config/rbac/ is the one-stop directory to apply RBAC for in-cluster deployments.
  • MCP-required permissions are generated via +kubebuilder:rbac markers.
  • Docs/examples updated to reference config/rbac/.
  • (Optional) CI check prevents future generated-manifest drift.

Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $0.93

- add MCP kubebuilder RBAC markers and regenerate manager-role permissions
- move ServiceAccount/bindings into config/rbac and remove duplicated e2e/deploy RBAC files
- update kind/CI/doc flows to apply config/rbac instead of deploy/rbac.yaml

@ThomasK33
Member Author

@codex review

Please review this RBAC single-source consolidation PR.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Another round soon, please!

ThomasK33 added this pull request to the merge queue on Feb 12, 2026
Merged via the queue into main with commit 6adb4be on Feb 12, 2026
11 checks passed
ThomasK33 deleted the fix-rbac-single-source branch on February 12, 2026 at 11:09