🤖 feat: add CoderProvisioner CRD and controller for external provisioner daemons #43
@codex review |
2160e53 to 641c8f8
|
@codex review |
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2c135b1e7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Addressed both P1 issues:
Added test coverage for the control-plane-deleted-first scenario. |
💡 Codex Review
Reviewed commit: 01cd967b11
|
@codex review Addressed the remaining P1 and P2 issues:
|
💡 Codex Review
Reviewed commit: 811a212c52
|
@codex review Addressed both remaining issues:
The deletion cleanup is now fully best-effort: any prerequisite failure (control plane, URL, credentials) is logged and the finalizer is removed regardless. |
💡 Codex Review
Reviewed commit: f7dc3d9675
|
@codex review Addressed both issues:
|
💡 Codex Review
Reviewed commit: 6c4afaf89e
|
@codex review Added |
💡 Codex Review
Reviewed commit: c0ed9d158d
|
@codex review Fixed both issues:
|
💡 Codex Review
Reviewed commit: 8cffa8ac73
8cffa8a to 635841d
|
@codex review |
…ecret loss
Addresses Codex review feedback:
- Deletion finalizer now gracefully handles missing CoderControlPlane or bootstrap credentials (common during namespace teardown). Logs a warning and proceeds to finalizer removal instead of blocking.
- When the provisioner key secret is deleted but the key already exists in coderd, the controller now rotates the key (delete + recreate) to obtain fresh plaintext material for secret recovery.
- Added test for best-effort deletion when control plane is deleted first.
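The rotate-on-secret-loss behavior in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `fakeCoderd` and `ensureKey` are hypothetical names, and the in-memory map stands in for coderd's provisioner key API.

```go
package main

import "fmt"

// fakeCoderd is a hypothetical in-memory stand-in for coderd's provisioner key API.
type fakeCoderd struct {
	keys    map[string]bool // key names that currently exist
	created int             // counter used to mint distinct plaintext values
}

// create registers a key and returns its plaintext, which is only available at creation time.
func (f *fakeCoderd) create(name string) string {
	f.created++
	f.keys[name] = true
	return fmt.Sprintf("plaintext-%d", f.created)
}

// remove deletes a key by name.
func (f *fakeCoderd) remove(name string) { delete(f.keys, name) }

// ensureKey returns fresh plaintext when the key is newly created or rotated,
// and "" when the key already exists and the caller still holds its material.
func ensureKey(f *fakeCoderd, name string, keyMaterialRequired bool) string {
	if !f.keys[name] {
		return f.create(name)
	}
	if keyMaterialRequired {
		// The Secret was lost: the old plaintext cannot be read back, so rotate.
		f.remove(name)
		return f.create(name)
	}
	return ""
}

func main() {
	coderd := &fakeCoderd{keys: map[string]bool{}}
	fmt.Println(ensureKey(coderd, "prov", false))       // first reconcile creates the key
	fmt.Println(ensureKey(coderd, "prov", false) == "") // steady state: no new material
	fmt.Println(ensureKey(coderd, "prov", true))        // Secret deleted: rotate
}
```

Rotation invalidates the old key, so any daemon still holding it would need the refreshed Secret before it can reconnect.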
…ble control plane
Addresses additional Codex review feedback:
- Finalizer cleanup now uses the key name from status (reflecting what was actually created in coderd) rather than the current spec value, preventing orphaned keys when spec.key.name is edited after creation.
- All fetchControlPlane errors during deletion are treated as non-blocking (not just NotFound), handling the case where the control plane exists but has an empty status.url.
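A rough sketch of the best-effort cleanup described in this commit is shown below. All names (`cleanupDeps`, `keyNameForCleanup`, `bestEffortCleanup`, `errControlPlaneUnavailable`) are hypothetical, the callbacks stand in for the controller's real helpers, and the fallback to the spec name when status is empty is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// errControlPlaneUnavailable is a hypothetical sentinel for any control plane lookup failure.
var errControlPlaneUnavailable = errors.New("control plane unavailable")

// cleanupDeps holds hypothetical callbacks standing in for the controller's helpers.
type cleanupDeps struct {
	controlPlaneURL func() (string, error)
	deleteKey       func(url, keyName string) error
}

// keyNameForCleanup prefers the name recorded in status (what was created in coderd).
// Falling back to the spec value when status is empty is an assumption of this sketch.
func keyNameForCleanup(statusName, specName string) string {
	if statusName != "" {
		return statusName
	}
	return specName
}

// bestEffortCleanup attempts key deletion and returns warnings instead of errors,
// so the caller can always proceed to remove the finalizer.
func bestEffortCleanup(d cleanupDeps, statusName, specName string) []string {
	url, err := d.controlPlaneURL()
	if err != nil {
		return []string{fmt.Sprintf("skipping key deletion: %v", err)}
	}
	if err := d.deleteKey(url, keyNameForCleanup(statusName, specName)); err != nil {
		return []string{fmt.Sprintf("key deletion failed: %v", err)}
	}
	return nil
}

func main() {
	deps := cleanupDeps{
		controlPlaneURL: func() (string, error) { return "", errControlPlaneUnavailable },
		deleteKey:       func(url, keyName string) error { return nil },
	}
	for _, w := range bestEffortCleanup(deps, "old-name", "new-name") {
		fmt.Println("warning:", w)
	}
	fmt.Println("finalizer removed")
}
```

Returning warnings rather than an error keeps a missing control plane from blocking deletion, at the cost of possibly leaving a key behind in coderd.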
635841d to 117a6a7
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. |
Summary
Adds a new `CoderProvisioner` CRD and controller that deploys external provisioner daemons for a Coder control plane. The controller:
- … `CoderControlPlane` CR to discover the coderd URL
- … `provisionerd start` with the correct environment
Background
Coder supports running provisioner daemons outside of `coderd` (the CLI subcommand `provisionerd start`). This lets operators scale workspace build/apply concurrency independently of the main Coder control-plane. This feature bridges the `coder-k8s` operator with external provisioner management.
Implementation
New files
- `api/v1alpha1/coderprovisioner_types.go` — CRD spec/status types with bootstrap, key management, and pod template customization fields
- `internal/coderbootstrap/provisionerkeys.go` — `EnsureProvisionerKey` and `DeleteProvisionerKey` methods using vendored `codersdk`
- `internal/coderbootstrap/provisionerkeys_test.go` — httptest-based unit tests for provisioner key operations
- `internal/controller/coderprovisioner_controller.go` — full reconciler with finalizer-based deletion, key rotation support, and child resource management
- `internal/controller/coderprovisioner_controller_test.go` — comprehensive envtest-based controller tests (6 test cases)
- `config/samples/coder_v1alpha1_coderprovisioner.yaml` — sample CR manifest
Modified files
- `internal/coderbootstrap/client.go` — extended `Client` interface with provisioner key methods
- `internal/app/controllerapp/controllerapp.go` — wired new controller into manager
- `internal/controller/workspaceproxy_controller_test.go` — updated fake bootstrap client for interface compliance
Generated artifacts
- `api/v1alpha1/zz_generated.deepcopy.go` — deepcopy methods for new types
- `config/crd/bases/coder.com_coderprovisioners.yaml` — CRD manifest
- `config/rbac/role.yaml` — RBAC rules for new resources
- `docs/reference/api/coderprovisioner.md` — auto-generated API reference
- `mkdocs.yml` — nav entry for new API doc
Validation
- `make build` ✅
- `make test` ✅
- `make test-integration` ✅
- `make lint` ✅
- `make verify-vendor` ✅
- `make docs-reference-check` ✅
Risks
- `EnsureProvisionerKey` implementation does not yet support tag-change rotation (existing keys with mismatched tags are not automatically deleted/recreated). This is acceptable for the first iteration since tag changes would require manual key rotation or CR deletion/recreation.
- `SetupWithManager` only watches Deployment/Secret/ServiceAccount (not Role/RoleBinding). Changes to Role/RoleBinding won't trigger reconciliation unless the CR itself is modified.
📋 Implementation Plan
Plan: Managed external provisioner daemons (CoderProvisioner)
Context / Why
Coder supports running provisioner daemons outside of `coderd` (the CLI subcommand `provisionerd start`). This lets us scale workspace build/apply concurrency independently of the main Coder control-plane.
Goal for `coder-k8s`: add a new CRD + controller reconciler that can:
- … `coderd`.
- … `coderd` instance.
Non-goals (first iteration): autoscaling (HPA), multi-namespace workspace RBAC fan-out, observing per-daemon connection health via coderd APIs.
Evidence (what was inspected)
coder/coder (external provisioners)
- `enterprise/cli/provisionerdaemonstart.go`: `coder provisionerd start` flags/env for `CODER_URL`, `CODER_PROVISIONER_DAEMON_KEY`, org, tags.
- `enterprise/coderd/provisionerdaemons.go`: coderd WebSocket endpoint `/api/v2/organizations/{org}/provisionerdaemons/serve`.
- `coderd/httpmw/provisionerdaemon.go`: auth headers (`Coder-Provisioner-Daemon-Key`, PSK, session token).
- `provisionerd/proto/provisionerd.proto`: DRPC service (`AcquireJobWithCancel`, `UpdateJob`, `CompleteJob`, …).
- `helm/provisioner/values.yaml`, `helm/provisioner/templates/_coder.tpl`: `args: [provisionerd, start]`, secretKeyRef for `CODER_PROVISIONER_DAEMON_KEY`.
- `helm/libcoder/templates/_helpers.tpl`: RBAC rules (pods + PVCs, optionally deployments).
coder-k8s (current patterns)
- `api/v1alpha1/codercontrolplane_types.go`: `CoderControlPlaneStatus.URL` (in-cluster coderd URL), status pattern (`Phase`, `Conditions`).
- `api/v1alpha1/types_shared.go`: `SecretKeySelector` helper type.
- `internal/controller/workspaceproxy_controller.go`: secret generation pattern (`ensureTokenSecret`), CreateOrUpdate, owner refs.
- `internal/coderbootstrap/client.go`: coderd API abstraction (currently `EnsureWorkspaceProxy`).
- `vendor/github.com/coder/coder/v2/codersdk/provisionerdaemons.go` contains provisioner key methods: `CreateProvisionerKey`, `ListProvisionerKeys`, `GetProvisionerKey`, `DeleteProvisionerKey`.
Implementation details
1) API: add `CoderProvisioner` CRD (v1alpha1)
Create a new types file: `api/v1alpha1/coderprovisioner_types.go`
Key design:
- … `CoderControlPlane` by name (same namespace) and derive `CODER_URL` from its status.
Proposed spec/status (shape; exact names can be tuned to match repo conventions):
Notes:
- … `SecretKeySelector` type from `api/v1alpha1/types_shared.go`.
- … `coder.com/provisioner-key-cleanup`).
2) coderd API integration: extend `internal/coderbootstrap`
Add provisioner-key operations so controllers don’t embed raw codersdk logic:
Files:
- `internal/coderbootstrap/client.go` (extend interface)
- `internal/coderbootstrap/provisionerkeys.go` (new)
Add a method that supports rotation when key material is required:
Implementation approach (using vendored `codersdk`):
- `client := codersdk.New(req.CoderURL); client.SetSessionToken(req.SessionToken)`.
- `ListProvisionerKeys(orgID)`; find by `KeyName`.
- … `CreateProvisionerKey(orgID, {Name: KeyName, Tags: req.Tags})` and return plaintext key.
- … `KeyMaterialRequired==true` and we don’t want to rely on unknown existing secret state:
  - … `DeleteProvisionerKey(orgID, KeyName)` (or delete+recreate if create fails due to conflict)
- … `Key==""`.
Defensive programming:
- … `GetProvisionerKey(ctx, keyMaterial)` and ensure the returned `Name`/`Tags`/`OrgID` match expectations; if not, rotate.
3) Controller: `CoderProvisionerReconciler`
New file: `internal/controller/coderprovisioner_controller.go`
RBAC markers to add (in this new controller file):
Reconcile flow (high level):
flowchart TD
  A[Fetch CoderProvisioner] --> B{DeletionTimestamp?}
  B -->|yes| Z[Finalizer: delete coderd provisioner key; remove finalizer]
  B -->|no| C[Fetch referenced CoderControlPlane]
  C --> D[Get coderd URL from controlPlane.status.url]
  D --> E[Read bootstrap session token secret]
  E --> F[Ensure/Rotate provisioner key via coderbootstrap]
  F --> G[Upsert K8s Secret with key material]
  G --> H[Upsert ServiceAccount]
  H --> I[Upsert Role/RoleBinding if workspacePermissions enabled]
  I --> J[Upsert Deployment: args provisionerd start; env CODER_URL + CODER_PROVISIONER_DAEMON_KEY]
  J --> K[Update Status: phase, readyReplicas, key metadata]
Concrete resource behavior:
Secret
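As a small illustration of the Secret defaults listed in this section, the defaulting could look roughly like this. The function name `keySecretRef` and its flat string parameters are hypothetical; the real controller presumably reads these values from the CR spec.

```go
package main

import "fmt"

// keySecretRef resolves the Secret name and data key, applying the plan's defaults
// when the (hypothetically named) inputs are left empty.
func keySecretRef(crName, secretName, secretKey string) (string, string) {
	if secretName == "" {
		secretName = crName + "-provisioner-key"
	}
	if secretKey == "" {
		secretKey = "key"
	}
	return secretName, secretKey
}

func main() {
	name, key := keySecretRef("build-pool", "", "")
	fmt.Println(name, key)
}
```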
- `spec.key.secretName` default `${crName}-provisioner-key`.
- `spec.key.secretKey` default `"key"`.
- … `CoderProvisioner`.
Deployment
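The container environment and the `checksum/secret` annotation described in this section might be assembled along these lines. This is a dependency-free sketch: `envVar` is a simplified stand-in for `corev1.EnvVar`, and `provisionerEnv` and `secretChecksum` are hypothetical helper names; the hashing scheme is an assumption, not the PR's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// envVar is a simplified stand-in for corev1.EnvVar, kept local so the sketch has no dependencies.
type envVar struct {
	Name      string
	Value     string // literal value, if any
	SecretRef string // "secretName/key" when sourced from a Secret
	FieldPath string // downward API field path, if any
}

// provisionerEnv builds the container environment described in this section.
func provisionerEnv(coderURL, org, secretName, secretKey string) []envVar {
	env := []envVar{{Name: "CODER_URL", Value: coderURL}}
	if org != "" {
		env = append(env, envVar{Name: "CODER_ORGANIZATION", Value: org})
	}
	return append(env,
		envVar{Name: "CODER_PROVISIONER_DAEMON_KEY", SecretRef: secretName + "/" + secretKey},
		envVar{Name: "CODER_PROVISIONER_DAEMON_NAME", FieldPath: "metadata.name"},
	)
}

// secretChecksum hashes Secret data in sorted key order so the annotation value is stable.
func secretChecksum(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator between a key and its value
		h.Write(data[k])
		h.Write([]byte{0}) // separator between entries
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	for _, e := range provisionerEnv("http://coder.default.svc", "", "pool-provisioner-key", "key") {
		fmt.Printf("%+v\n", e)
	}
	fmt.Println("checksum/secret:", secretChecksum(map[string][]byte{"key": []byte("plaintext")}))
}
```

Because the checksum changes whenever the key material changes, writing it into the pod template annotations causes the Deployment to roll after a rotation.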
- `spec.image` default `controlPlane.spec.image`.
- `provisionerd start` plus `spec.extraArgs`.
- `CODER_URL=controlPlane.status.url`
- `CODER_ORGANIZATION=spec.organizationName` (if set)
- `CODER_PROVISIONER_DAEMON_KEY` from SecretKeyRef
- `CODER_PROVISIONER_DAEMON_NAME` from downward API (`metadata.name`) for observability
- `terminationGracePeriodSeconds` default 600.
- … `checksum/secret` computed from key Secret data to force rollout on rotations.
ServiceAccount + Role/RoleBinding
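The Role rules listed in this section could be expressed roughly as below. `policyRule` is a simplified stand-in for `rbacv1.PolicyRule`, `workspaceRules` is a hypothetical helper, and both the exact verb list behind "CRUD+watch" and the `includeDeployments` toggle are assumptions of this sketch.

```go
package main

import "fmt"

// policyRule is a simplified stand-in for rbacv1.PolicyRule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// crudWatchVerbs is this sketch's reading of "CRUD+watch"; the exact verb list is an assumption.
var crudWatchVerbs = []string{"create", "get", "list", "watch", "update", "patch", "delete"}

// workspaceRules returns Role rules for the workspace resources a provisioner manages.
// Making deployments optional is an assumption of this sketch.
func workspaceRules(includeDeployments bool) []policyRule {
	rules := []policyRule{{
		APIGroups: []string{""}, // core API group
		Resources: []string{"pods", "persistentvolumeclaims"},
		Verbs:     crudWatchVerbs,
	}}
	if includeDeployments {
		rules = append(rules, policyRule{
			APIGroups: []string{"apps"},
			Resources: []string{"deployments"},
			Verbs:     crudWatchVerbs,
		})
	}
	return rules
}

func main() {
	for _, r := range workspaceRules(true) {
		fmt.Printf("%v %v %v\n", r.APIGroups, r.Resources, r.Verbs)
	}
}
```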
- `pods`, `persistentvolumeclaims` (CRUD+watch)
- `apps/deployments` (CRUD+watch)
Finalizer behavior:
- … `coderbootstrap.DeleteProvisionerKey(...)` (by name) then remove finalizer.
4) Wire into manager/app
- … `init()` in the types file, but verify `api/v1alpha1/groupversion_info.go` registration).
- … `internal/app/controllerapp/controllerapp.go` to:
  - … `coderbootstrap.SDKClient` (already done for WorkspaceProxy)
  - … `CoderProvisionerReconciler` with that client.
5) Tests
Fast/default tests (no real coderd):
Controller envtest coverage similar to `workspaceproxy_controller_test.go`:
- `internal/controller/coderprovisioner_controller_test.go`
- … `CoderControlPlane` with `status.url` populated.
- … `CoderProvisioner` CR.
- … `coderbootstrap.Client` into the reconciler to simulate:
  - … `{Key: "plaintext", KeyID: ...}`
Unit tests for `coderbootstrap.EnsureProvisionerKey`:
- … `httptest.Server` that implements the minimal coderd endpoints used:
  - `POST/GET/DELETE /api/v2/organizations/{org_id}/provisionerkeys`
  - `GET /api/v2/provisionerkeys/{key}` for validating key material
Optional end-to-end (real enterprise coderd + real provisionerd join):
- … `//go:build integration`) that:
  - … `github.com/coder/coder/v2/enterprise/coderd/coderdenttest` with `AllFeatures: true` (enables `external_provisioner_daemons`).
  - … `dbtestutil.NewDB()` (dockertest-managed Postgres) by default.
  - … `enterprise/` test helpers + deps).
  - … `CoderProvisionerReconciler` against that coderd (real `coderbootstrap.SDKClient`, not fake).
  - … `provisionerd.Server` using `codersdk.Client.ServeProvisionerDaemon` with `ProvisionerKey` set to the secret value.
  - … `client.OrganizationProvisionerDaemons` / `client.ProvisionerDaemons`).
6) Generated artifacts and samples
- `make codegen` (deepcopy updates).
- `make manifests` (CRD + RBAC role updates).
- `make docs-reference` (updates `docs/reference/api/*` for the new CRD types).
- … `CoderProvisioner` and what `tags` mean for job routing
- … `config/samples/coder_v1alpha1_coderprovisioner.yaml` demonstrating:
  - `bootstrap.credentialsSecretRef`
  - `controlPlaneRef`
  - `tags` and `replicas`
Validation checklist (when implementing)
- `make test`
- `make build`
- `make lint`
- `make verify-vendor` (only if deps changed)
- `make codegen`
- `make manifests`
- `make docs-check`
Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$1.78`