🤖 feat: bootstrap coderd operator access from CoderControlPlane #49
Implement default operator-access bootstrap for CoderControlPlane by provisioning a system user and API token directly in coderd Postgres, persisting the token in a managed Secret, and surfacing status fields for consumers. Includes controller wiring, reconcile logic and tests, Postgres provisioner + unit tests, API/CRD/docs updates, and vendored postgres driver changes. --- _Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$3.91`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=3.91 -->
|
@codex review Please review this PR for the operator-access bootstrap implementation. |
Update the OperatorTokenSecretRef field comment to a single line so generated API reference markdown no longer emits an inline <br />, fixing docs-quality markdownlint failures.
|
@codex review Addressed docs-quality failure by removing inline HTML output in generated API reference docs. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 286437dedd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Address Codex feedback by removing the early return that treated Secret presence as readiness. The reconciler now always invokes the provisioner with any existing Secret token, and the Postgres provisioner validates that token against `api_keys` (hash match + unexpired) before reusing it. If the token is missing, invalid, revoked, or expired, it rotates a replacement and updates the Secret. Also updated controller and provisioner tests to cover passing existing tokens through reconciliation.
|
@codex review Addressed your P1 by validating existing operator tokens in Postgres before reuse and rotating only when invalid/expired/revoked. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 46c473f26d
Address additional Codex feedback by:
- revoking the managed operator token in Postgres and deleting the managed Secret when `spec.operatorAccess.disabled=true`, and
- treating malformed managed Secrets (missing/empty `token` key) as recoverable so reconciliation can re-provision a valid token.

Also expanded controller/provisioner tests for revoke and malformed-secret recovery behavior.
|
@codex review Addressed the latest feedback by revoking managed credentials when operator access is disabled and by treating malformed managed Secrets as recoverable for reprovisioning. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1fcb2d11b
Address follow-up Codex feedback by retaining a pending operator token status ref when disable-time cleanup fails. This ensures revocation is retried on subsequent reconciles even if the managed Secret has already been deleted, preventing orphaned active coderd tokens. Added controller coverage to verify requeue + retry behavior across repeated revoke failures.
|
@codex review Addressed the retry concern by keeping a pending token ref in status when disable cleanup fails, so revocation is retried on later reconciles even after secret deletion. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 530598a1ff
Address Codex feedback by restricting disable-time Secret deletion to Secrets owned by the current `CoderControlPlane`. The cleanup path now verifies controller ownership before deleting, while still revoking managed DB tokens and retaining retry behavior on transient revoke failures. Added coverage for unmanaged secret preservation and updated managed cleanup tests to use owner references.
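The ownership guard can be sketched as follows. This is a simplified illustration that uses local stand-in types (`ownerReference`, `object`) instead of the real Kubernetes metadata types; actual controller code would more likely call `metav1.IsControlledBy` from `k8s.io/apimachinery`, and the names below are assumptions.

```go
package main

import "fmt"

// ownerReference and object are simplified stand-ins for Kubernetes metadata.
type ownerReference struct {
	UID        string
	Controller bool
}

type object struct {
	Name            string
	UID             string
	OwnerReferences []ownerReference
}

// isControlledBy reports whether owner is the controller owner of obj.
func isControlledBy(obj, owner object) bool {
	for _, ref := range obj.OwnerReferences {
		if ref.Controller && ref.UID == owner.UID {
			return true
		}
	}
	return false
}

func main() {
	controlPlane := object{Name: "cp", UID: "uid-cp"}
	managed := object{
		Name:            "cp-operator-token",
		OwnerReferences: []ownerReference{{UID: "uid-cp", Controller: true}},
	}
	unmanaged := object{Name: "user-secret"}

	// Only Secrets controlled by this control plane are eligible for deletion.
	for _, secret := range []object{managed, unmanaged} {
		fmt.Printf("%s: delete on disable = %v\n", secret.Name, isControlledBy(secret, controlPlane))
	}
}
```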
|
@codex review Addressed this by guarding disable-time secret deletion behind controller ownership checks so unmanaged secrets are preserved. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 110cdfa7c6
Address Codex feedback by making disable-time cleanup attempt DB token revocation whenever Postgres configuration is available, even when no managed secret/status reference is present. This prevents stale `coder-k8s-operator` tokens from surviving cases where prior status refs were cleared or secret names changed. Added controller coverage for disable-time revocation without managed secret/status state.
|
@codex review Updated disable-time cleanup to attempt revocation whenever Postgres config is available, even without managed secret/status ref. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f98847b45
Address Codex feedback by deriving a unique coderd token name per `CoderControlPlane` (namespace/name-hashed suffix) instead of sharing a single global token name. This prevents cross-resource interference when multiple control planes target the same Postgres database. Also added controller coverage validating distinct token names across two control planes and updated revoke assertions accordingly.
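One way such a per-resource name could be derived is sketched below. The hash function (`sha256`), the 8-character suffix length, and the `operatorTokenName` helper are assumptions for illustration; the commit only states that the suffix is hashed from namespace/name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tokenNamePrefix is the base token name used throughout the plan.
const tokenNamePrefix = "coder-k8s-operator"

// operatorTokenName derives a stable token name from the control plane's
// namespace and name, so two control planes sharing one database get
// distinct api_keys rows. The suffix length is an assumed choice.
func operatorTokenName(namespace, name string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	return tokenNamePrefix + "-" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	fmt.Println(operatorTokenName("team-a", "coder"))
	fmt.Println(operatorTokenName("team-b", "coder"))
}
```

Because the name is a pure function of namespace and name, revocation can recompute it without relying on Secret or status state.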
|
@codex review Scoped operator token names per control plane to avoid cross-resource token interference on shared Postgres. |
|
Codex Review: Didn't find any major issues. Swish!
Summary
This PR bootstraps durable coderd API access from `CoderControlPlane` by provisioning a controller-owned system user and token directly in coderd Postgres, then storing the minted token in a managed Kubernetes Secret.
Background
We need in-cluster components to authenticate to the coderd instance created by `CoderControlPlane` without manual setup. The implementation preserves first-user setup UX by creating a `coder-k8s-operator` system user (`is_system=true`) and managing token issuance in the backing database.
Implementation
- Added `spec.operatorAccess` fields (`disabled`, `generatedTokenSecretName`) and status fields (`operatorTokenSecretRef`, `operatorAccessReady`) to the `CoderControlPlane` API.
- Added `internal/coderbootstrap/PostgresOperatorAccessProvisioner` to:
  - upsert `coder-k8s-operator` as a system owner user,
  - mint a token with `coder:all` scope and return plaintext token.
- Wired the provisioner into `CoderControlPlaneReconciler` and controller app startup.
- Added reconcile logic to resolve `CODER_PG_CONNECTION_URL` from `spec.extraEnv` (literal or `secretKeyRef`), create/update managed token Secret, and update status readiness/ref fields.
- Vendored the Postgres driver (`github.com/lib/pq`).
Validation
- `make verify-vendor`
- `make test`
- `make build`
- `make lint`
Risks
📋 Implementation Plan
Plan: Bootstrap coderd API access from `CoderControlPlane`
Context / Why
We want the `CoderControlPlane` reconciler to establish durable API access to the coderd instance it deploys, so other in-cluster components (e.g. the aggregated API server in a later PR) can authenticate to coderd and perform CRUD.
Key requirements:
- Provision a `coder-k8s-operator` Coder user by default.
- Token scope is `coder:all` for now.
Reality check / constraint (validated against upstream Coder)
Coder does not expose a supported, unauthenticated REST “admin backdoor” that would let the operator create users/tokens after the first user exists.
However, upstream Coder’s first-user check explicitly excludes system users (`is_system=true`). That means we can safely provision a `coder-k8s-operator` system user without disabling the first-user setup UI.
So the default bootstrap approach becomes:
- Do not call `CreateFirstUser` automatically (preserve first-user UX).
- Once the Postgres URL is known (`CODER_PG_CONNECTION_URL`) and the DB schema is ready, connect to Postgres and upsert:
  - a `coder-k8s-operator` system user with site role `owner`
  - org membership (role `organization-admin`)
  - an API token with scope `coder:all`
- Store the token in a Secret owned by the `CoderControlPlane`.
Evidence (repo + upstream Coder)
- `internal/controller/codercontrolplane_controller.go`: currently reconciles only Deployment/Service and overwrites status each reconcile (we’ll need to preserve operator-access fields).
- `internal/controller/workspaceproxy_controller.go`: shows the established pattern of generating/storing a token in a Secret owned by the CR.
- `CODER_PG_CONNECTION_URL` is the canonical env var for coderd’s Postgres connection URL (see `examples/cloudnativepg/codercontrolplane.yaml` and the bundled Coder docs under `.mux/skills/coder-docs/`).
- Upstream Coder (`coder/coder`):
  - first-user detection excludes `is_system=true` users (so system users do not block the setup wizard)
  - API tokens live in `api_keys` and are formatted `{id}-{secret}` with `hashed_secret = sha256(secret)`; long-lived tokens use `login_type='token'`, include `scopes={'coder:all'}`, and require a non-empty `allow_list` (default `*:*`)
  - org membership is stored in `organization_members`, and the upstream `create-admin-user` flow adds admins to all orgs
  - dev scripts (`scripts/develop.sh`) bootstrap the human first user via `coder login --first-user-*` (API path); system users (e.g. `prebuilds` in `coderd/database/migrations/000308_system_user.up.sql`) are inserted directly in Postgres with `is_system=true`, matching this plan’s approach for `coder-k8s-operator`.
Proposed design
1) API/CRD additions (`coder.com/v1alpha1`)
Add an operator-access block to the `CoderControlPlane` spec, enabled by default.
Bootstrap input: the controller needs the coderd Postgres URL so it can provision the operator user/token directly in the DB.
- `spec.extraEnv` entry `CODER_PG_CONNECTION_URL` (either a literal `value` or `valueFrom.secretKeyRef`).
Add status fields so later components can discover the token location.
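A rough sketch of what these spec and status additions might look like as Go types is shown below. The field names come from the PR summary (`disabled`, `generatedTokenSecretName`, `operatorTokenSecretRef`, `operatorAccessReady`) and the default Secret name from the docs note in this plan; the struct layout, JSON tags, and the `tokenSecretName` helper are assumptions, not the generated API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OperatorAccessSpec is an assumed shape for spec.operatorAccess.
// The zero value means operator access is enabled (the default).
type OperatorAccessSpec struct {
	Disabled                 bool   `json:"disabled,omitempty"`
	GeneratedTokenSecretName string `json:"generatedTokenSecretName,omitempty"`
}

// SecretKeySelector points at one key inside a Secret.
type SecretKeySelector struct {
	Name string `json:"name"`
	Key  string `json:"key,omitempty"`
}

// CoderControlPlaneStatus shows only the operator-access status fields.
type CoderControlPlaneStatus struct {
	OperatorTokenSecretRef *SecretKeySelector `json:"operatorTokenSecretRef,omitempty"`
	OperatorAccessReady    bool               `json:"operatorAccessReady,omitempty"`
}

// tokenSecretName applies the "<controlplane-name>-operator-token" default
// when no explicit Secret name is configured.
func tokenSecretName(controlPlaneName string, spec OperatorAccessSpec) string {
	if spec.GeneratedTokenSecretName != "" {
		return spec.GeneratedTokenSecretName
	}
	return controlPlaneName + "-operator-token"
}

func main() {
	var spec OperatorAccessSpec // defaults apply: enabled, generated name
	status := CoderControlPlaneStatus{
		OperatorTokenSecretRef: &SecretKeySelector{Name: tokenSecretName("example", spec), Key: "token"},
		OperatorAccessReady:    true,
	}
	out, err := json.MarshalIndent(status, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```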
Notes:
- Mirrors the existing `WorkspaceProxy` pattern (`GeneratedTokenSecretName`, `SecretKeySelector`).
- `OperatorAccessReady` could later move to full `metav1.Condition` usage, but start minimal.
2) Operator access provisioning via Postgres (no bootstrap token)
Instead of trying to bootstrap via the coderd REST API, we provision the operator user/token directly in Postgres.
Implement a small provisioner that:
- Resolves `CODER_PG_CONNECTION_URL` in `spec.extraEnv` (supports literal `value` or `valueFrom.secretKeyRef`).
- Upserts the system user `coder-k8s-operator` (`users.is_system=true`, `users.login_type='none'`, `users.rbac_roles={'owner'}`).
- Ensures org membership (`organization_members.roles={'organization-admin'}`).
- Upserts a token row in `api_keys` with:
  - token format `{id}-{secret}`
  - `hashed_secret = sha256(secret)`
  - `login_type='token'`
  - `scopes={'coder:all'}`
  - a non-empty `allow_list` (default `*:*`)
Because the user is created with `is_system=true`, it does not affect the first-user setup wizard.
Implementation notes:
- Use `github.com/coder/coder/v2/coderd/apikey` (Generate/HashSecret, random string lengths/charset) to avoid drifting from Coder’s constraints.
- Key upserts on a stable `username` (user) and a stable token identifier (e.g. `token_name='coder-k8s-operator'`).
Suggested interface shape (for wiring + tests):
(Scopes are hardcoded to `coder:all` for now; we’ll scope down later.)
DB upsert sketch (for the provisioner)
- Upsert user (keyed on `users.username='coder-k8s-operator'`):
  - `is_system=true`, `login_type='none'`, `status='active'`, `rbac_roles={'owner'}`, `hashed_password='none'`
- Org membership: `INSERT ... SELECT id FROM organizations ...` into `organization_members` with role `organization-admin` (upsert on `(organization_id,user_id)`)
- Token row in `api_keys`:
  - generate `{id}-{secret}` (id len 10, secret len 22, `[0-9A-Za-z]`), store `hashed_secret=sha256(secret)`
  - `login_type='token'`, `token_name='coder-k8s-operator'`, `scopes={'coder:all'}`, `allow_list={'*:*'}`, `expires_at=now()+lifetime`, `lifetime_seconds=...`
3) Controller changes (`CoderControlPlaneReconciler`)
Update `internal/controller/codercontrolplane_controller.go` to:
- Add RBAC for secrets (needed to read `CODER_PG_CONNECTION_URL` when it’s a Secret ref, and to write the generated operator token Secret):
  - `+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch;create;update;patch;delete`
- Add an injectable provisioner (so controller tests don’t need a real Postgres instance):
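- Compute base status (`ObservedGeneration`, `ReadyReplicas`, `URL`, `Phase`)
- Call `reconcileOperatorAccess(...)` (DB-based; it can succeed even while the Deployment is Pending)
- Update status once, preserving operator-access fields (instead of overwriting them as `reconcileStatus` currently does).
`reconcileOperatorAccess` behavior (DB-based)
Pseudo-flow: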
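The flow can be approximated by the following sketch, written as a pure function so it runs without a cluster or database. It is assembled from the behavior described in this PR (requeue when the Postgres URL is missing, pass any existing token to the provisioner, revoke on disable); the `provisioner`, `input`, and `outcome` types and the 30-second `retryDelay` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const retryDelay = 30 * time.Second // assumed requeue interval

// provisioner is a hypothetical stand-in for the Postgres-backed provisioner.
type provisioner interface {
	Ensure(postgresURL, existingToken string) (string, error)
	Revoke(postgresURL string) error
}

// input captures what the reconciler has resolved before this step.
type input struct {
	Disabled      bool
	PostgresURL   string // resolved CODER_PG_CONNECTION_URL, empty when absent
	ExistingToken string // token currently in the managed Secret, if any
}

// outcome describes what the reconciler should persist afterwards.
type outcome struct {
	Ready        bool
	Token        string // value for the managed Secret; empty means no Secret
	RequeueAfter time.Duration
}

func reconcileOperatorAccess(p provisioner, in input) (outcome, error) {
	if in.Disabled {
		// Revoke whenever a Postgres URL is available, then drop the Secret.
		if in.PostgresURL != "" {
			if err := p.Revoke(in.PostgresURL); err != nil {
				return outcome{RequeueAfter: retryDelay}, err
			}
		}
		return outcome{}, nil
	}
	if in.PostgresURL == "" {
		// Nothing to provision against yet: report not ready and retry later.
		return outcome{RequeueAfter: retryDelay}, nil
	}
	// Always call the provisioner; it decides whether the token is reusable.
	token, err := p.Ensure(in.PostgresURL, in.ExistingToken)
	if err != nil {
		return outcome{RequeueAfter: retryDelay}, err
	}
	return outcome{Ready: true, Token: token}, nil
}

// stubProvisioner is a test double that reuses any existing token.
type stubProvisioner struct {
	token   string
	revoked int
}

func (s *stubProvisioner) Ensure(postgresURL, existingToken string) (string, error) {
	if postgresURL == "" {
		return "", errors.New("postgres URL is required")
	}
	if existingToken != "" {
		return existingToken, nil
	}
	return s.token, nil
}

func (s *stubProvisioner) Revoke(string) error {
	s.revoked++
	return nil
}

func main() {
	stub := &stubProvisioner{token: "token-value"}
	out, err := reconcileOperatorAccess(stub, input{PostgresURL: "postgres://example"})
	fmt.Println(out.Ready, out.Token, err)
}
```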
Notes:
- Create the operator as a system user (`is_system=true`) with `rbac_roles={'owner'}` and `login_type='none'`.
- Ensure org membership for all orgs (`organization_members`) with role `organization-admin`.
- Token row in `api_keys` should be created with:
  - `login_type='token'`
  - `scopes={'coder:all'}`
  - `allow_list={'*:*'}` (schema constraint requires non-empty)
- Use stable defaults (unless we later add config fields):
  - username: `coder-k8s-operator`
  - email: `coder-k8s-operator@coder-k8s.invalid`
  - token name: `coder-k8s-operator`
4) Wiring (controller app)
Update `internal/app/controllerapp/controllerapp.go` to construct and pass a real Postgres-backed provisioner into the control plane reconciler (mirrors how `WorkspaceProxyReconciler` gets a bootstrap client today).
The provisioner should encapsulate:
- Postgres connection handling (`database/sql`)
- token generation as `{id}-{secret}` using upstream helpers (`coderd/apikey`)
- upserts into `users`, `organization_members`, and `api_keys`
5) Tests
Unit tests (envtest controller tests)
Update/add tests under `internal/controller/` (use a fake provisioner so we don’t need a real Postgres instance):
- `TestReconcile_DefaultOperatorAccess_MissingPostgresURL`
  - Create a `CoderControlPlane` with no `operatorAccess` block (defaults apply) and without `CODER_PG_CONNECTION_URL`.
  - Expect `OperatorAccessReady=false`; reconcile returns `RequeueAfter`.
- `TestReconcile_OperatorAccess_Disabled`
  - Set `spec.operatorAccess.disabled: true`.
- `TestReconcile_OperatorAccess_ResolvesPostgresURLAndCreatesTokenSecret`
  - Fake `EnsureOperatorToken` returns `token-value`.
  - Provide `CODER_PG_CONNECTION_URL` in `spec.extraEnv` (cover both literal and `secretKeyRef` cases).
Implementation detail: add a small `fakeOperatorAccessProvisioner` (similar to `fakeBootstrapClient` in workspace proxy tests) so tests don’t need live coderd/Postgres.
6) Generated artifacts + docs
Because the CRD API types change:
- `make codegen` (deepcopy).
- `make manifests` (CRDs + RBAC).
- `make docs-reference` (updates `docs/reference/api/codercontrolplane.md`).
Consider adding a small snippet to an existing “deploy controller” doc or sample YAML documenting the output token Secret:
- `CODER_PG_CONNECTION_URL` is provided via `spec.extraEnv` (value or `secretKeyRef`).
- `<controlplane-name>-operator-token` (key `token`) containing the `coder-k8s-operator` API token.
Validation (local)
- `make test`
- `make build`
- `make lint`
- `make verify-vendor` (will include new vendored packages due to Postgres/token helpers)
Appendix: why we don’t auto-create the first user
Coder’s first-user experience relies on no users existing. Automatically creating `coder-k8s-operator` as the first user would flip the UI into “login” mode immediately and block the initial setup flow.
The chosen approach provisions `coder-k8s-operator` as a system user (`is_system=true`) and creates a `coder:all` token directly in Postgres. System users are excluded from the first-user check, so the setup wizard remains available.
Implementation strategy (agents)
Recommendation: have a single agent implement this end-to-end. This change spans API types, controller logic, DB/token provisioning logic, tests, and generated artifacts (`codegen`, `manifests`, `docs-reference`). A single implementer avoids merge conflicts and “regen drift”.
If you do want to split it across multiple agents, a reasonable breakdown is:
- Update `api/v1alpha1/codercontrolplane_types.go`, then run/update `make codegen`, `make manifests`, `make docs-reference`.
- Implement `internal/coderbootstrap/operator_access_postgres.go` (DB upserts + token generation via upstream helpers), with focused unit tests for SQL/idempotency.
- Update `internal/controller/codercontrolplane_controller.go`, resolve `CODER_PG_CONNECTION_URL` from `spec.extraEnv`, manage the output Secret, and preserve status fields.
- Extend `internal/controller/codercontrolplane_controller_test.go` using a fake provisioner.
Coordination note: only one agent should be responsible for the final regeneration/validation pass to keep the branch consistent.
Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$3.91`