🤖 feat: reconcile CoderControlPlane licenses from Secret #66

Merged
ThomasK33 merged 6 commits into main from control-plane-qcym on Feb 12, 2026

@ThomasK33
Member

Summary

Add automatic Coder Enterprise license management to CoderControlPlane.

Background

Previously, the operator had no API surface for license configuration and no reconcile logic to upload licenses after control plane bootstrap.
This change allows operators to point at a Secret and have the controller apply licenses once the control plane and operator access are ready, including rotation behavior.

Implementation

  • Added spec.licenseSecretRef to CoderControlPlane.
  • Added status fields licenseLastApplied and licenseLastAppliedHash.
  • Added default shared constant DefaultLicenseSecretKey = "license".
  • Added LicenseApplied condition type and condition updates in reconcile paths.
  • Implemented controller-side license reconciliation with:
    • readiness/operator-access preconditions,
    • Secret read + trim + SHA-256 hash idempotency,
    • upload via SDK-backed LicenseUploader,
    • 404 (NotSupported) and auth/error handling.
  • Added field index + Secret watch for non-owned licenseSecretRef Secrets.
  • Wired production uploader in controller app setup.
  • Added controller tests for:
    • no-ref behavior,
    • pending-before-ready,
    • first apply + idempotency,
    • Secret rotation,
    • 404/not-supported behavior.
  • Regenerated deepcopy, CRD, and API reference docs; updated sample manifest.

Validation

  • make verify-vendor
  • make test
  • make build
  • make lint
  • make codegen
  • make manifests
  • make docs-reference

Risks

  • Moderate: touches reconciliation/status/watches in CoderControlPlane.
  • Mitigated by focused unit/envtest coverage for apply/idempotency/rotation/error handling and by preserving existing operator access behavior.

📋 Implementation Plan

Plan: CoderControlPlane license Secret reference + automatic license application

Context / Why

We want the coder-k8s operator to manage Coder Enterprise licensing automatically:

  • Add spec.licenseSecretRef to coder.com/v1alpha1.CoderControlPlane to reference a Secret key containing a Coder license JWT.
  • Once the control plane is actually up (pods ready) and the operator has bootstrap API access, the controller should call the Coder API to upload/apply that license.
  • If the Secret value changes, the controller should apply the new license (license rotation).
  • Replace the previous boolean LicenseApplied idea with an optional timestamp: status.licenseLastApplied.

This aligns with existing patterns in the repo (SecretKeySelector usage, operator-managed API token creation, reconcile-with-requeue behavior), and avoids requiring users to manually run coder licenses add after every deployment/rotation.

Evidence (what we verified)

  • CoderControlPlane type today has no license field; it already uses SecretKeySelector in status for the operator token secret:
    • api/v1alpha1/codercontrolplane_types.go
    • api/v1alpha1/types_shared.go
  • Control plane readiness is currently determined by deployment.Status.ReadyReplicas > 0 and stored in status.phase (Pending or Ready). The controller also computes an in-cluster URL:
    • internal/controller/codercontrolplane_controller.go (desiredStatus sets URL = http://<svc>.<ns>.svc.cluster.local:<port>)
  • The controller already owns/watches Secrets (for the operator token secret) and has a readSecretValue() helper with strong assertions.
  • The operator already vendors and uses github.com/coder/coder/v2/codersdk via internal/coderbootstrap.
  • Coder license upload is an enterprise-only API endpoint:
    • POST /api/v2/licenses with JSON body { "license": "<jwt>" }
    • Auth header accepted by Coder: Coder-Session-Token: <token> (Bearer token also supported)
    • Success status: 201 Created
    • Uploading the same license twice is not idempotent: DB has a unique constraint on licenses.jwt, and the server returns 500 on duplicate insert.
    • In OSS builds, the /licenses routes are not registered → 404 Not Found.

Design decisions

API surface

Add:

  • spec.licenseSecretRef (optional) — reference to Secret name + key.
  • status.licenseLastApplied (optional metav1.Time) — when the operator last successfully uploaded the currently-observed license.

Additionally (needed for correctness / idempotency):

  • status.licenseLastAppliedHash (optional string) — SHA-256 hex of the trimmed license JWT that was last successfully applied.

Rationale: Coder rejects duplicate uploads (500). Without persisting a stable identity (hash), the controller can’t safely be re-entrant and would spam POSTs on every reconcile.
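The hash identity described above can be sketched as follows. This is a minimal illustration, assuming a helper named licenseHash (a hypothetical name; the plan only specifies "SHA-256 hex of the trimmed license JWT") and placeholder strings in place of real license JWTs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// licenseHash returns the SHA-256 hex digest of the whitespace-trimmed license.
// The function name is hypothetical; the plan only specifies the hashing scheme.
func licenseHash(licenseJWT string) string {
	sum := sha256.Sum256([]byte(strings.TrimSpace(licenseJWT)))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder values, not real license JWTs.
	fromFile := licenseHash("header.payload.signature\n")
	fromLiteral := licenseHash("header.payload.signature")

	fmt.Println("same identity:", fromFile == fromLiteral)
	fmt.Println("digest length:", len(fromFile))
}
```

Trimming before hashing means a trailing newline in the Secret value (common when the Secret is created from a file) does not register as a rotation.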

Preconditions for applying a license

Only attempt license upload when:

  1. spec.licenseSecretRef != nil.
  2. status.phase == Ready (deployment has ≥1 ready replica).
  3. status.operatorAccessReady == true AND status.operatorTokenSecretRef != nil.

If operator access is disabled or not yet ready, we should not attempt license application (no credentials).
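A rough sketch of this precondition gate, assuming a trimmed-down stand-in struct (controlPlaneView, hypothetical) in place of the real spec and status types, and plain strings for the phase values:

```go
package main

import "fmt"

// controlPlaneView is a hypothetical stand-in for the few spec/status fields
// that the preconditions inspect. It is not the real API type.
type controlPlaneView struct {
	HasLicenseSecretRef    bool   // spec.licenseSecretRef != nil
	Phase                  string // status.phase
	OperatorAccessReady    bool   // status.operatorAccessReady
	HasOperatorTokenSecret bool   // status.operatorTokenSecretRef != nil
}

// licensePreconditionsMet reports whether a license upload may be attempted,
// plus a short explanation when it may not.
func licensePreconditionsMet(v controlPlaneView) (bool, string) {
	switch {
	case !v.HasLicenseSecretRef:
		return false, "no licenseSecretRef configured"
	case v.Phase != "Ready":
		return false, "control plane is not Ready"
	case !v.OperatorAccessReady || !v.HasOperatorTokenSecret:
		return false, "operator access is not ready"
	default:
		return true, ""
	}
}

func main() {
	pending := controlPlaneView{HasLicenseSecretRef: true, Phase: "Pending"}
	ok, why := licensePreconditionsMet(pending)
	fmt.Println(ok, why)

	ready := controlPlaneView{
		HasLicenseSecretRef:    true,
		Phase:                  "Ready",
		OperatorAccessReady:    true,
		HasOperatorTokenSecret: true,
	}
	ok, why = licensePreconditionsMet(ready)
	fmt.Println(ok, why)
}
```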

Rotation semantics

  • On each reconcile (and on referenced Secret changes), read the Secret value, compute hash, compare to status.licenseLastAppliedHash.
  • If hashes differ, call the license upload endpoint.
  • On success, set status.licenseLastApplied = now and update status.licenseLastAppliedHash.

Watching the referenced Secret

The controller currently only watches Secrets it owns (Owns(&corev1.Secret{})). User-provided license Secrets are not owned, so we must add an explicit watch for Secrets referenced by spec.licenseSecretRef.

Use a field index + watch mapping so we only enqueue CoderControlPlane reconciles for Secrets actually referenced by control planes.

Enterprise-only behavior

If the license API returns 404:

  • Treat as “not supported” (likely OSS image).
  • Set a Condition (see below) to False with reason NotSupported.
  • Do not requeue aggressively (avoid infinite loops).

Status Conditions (recommended)

CoderControlPlaneStatus already has conditions []metav1.Condition but it’s not currently populated. Introduce a single condition type for license:

  • type: LicenseApplied
  • status: True|False|Unknown
  • reasons: Applied, Pending, SecretMissing, Forbidden, NotSupported, Error

Keep messages stable to avoid noisy status updates.

Alternatives considered (kept short)
  • Always POST on every reconcile: rejected because duplicate uploads return 500 due to unique constraint.
  • Store the hash in an annotation instead of status: workable, but status is the more idiomatic place for “observed applied license identity”; also avoids having to Update both metadata + status.
  • Decode JWT and compare UUID claim vs GET /licenses response: adds JWT parsing/validation complexity; storing a SHA-256 hash is simpler and avoids relying on claim structure.

Implementation details (concrete edits)

1) API / CRD changes

Files:

  • api/v1alpha1/codercontrolplane_types.go
  • api/v1alpha1/types_shared.go
  • generated: api/v1alpha1/zz_generated.deepcopy.go
  • generated: config/crd/bases/coder.com_codercontrolplanes.yaml
  • generated docs: docs/reference/api/codercontrolplane.md
  • sample: config/samples/coder_v1alpha1_codercontrolplane.yaml

a) Add spec field

// CoderControlPlaneSpec defines the desired state of a CoderControlPlane.
type CoderControlPlaneSpec struct {
    ...

    // LicenseSecretRef references a Secret key containing a Coder Enterprise
    // license JWT. When set, the controller uploads the license to the Coder
    // API after the control plane is ready, and uploads a new license if the
    // referenced Secret value changes.
    // +optional
    LicenseSecretRef *SecretKeySelector `json:"licenseSecretRef,omitempty"`
}

b) Add status fields

// CoderControlPlaneStatus defines the observed state of a CoderControlPlane.
type CoderControlPlaneStatus struct {
    ...

    // LicenseLastApplied is the timestamp of the most recent successful
    // license upload performed by the operator. Nil means no license has been
    // applied by the operator.
    // +optional
    LicenseLastApplied *metav1.Time `json:"licenseLastApplied,omitempty"`

    // LicenseLastAppliedHash is the SHA-256 hex hash of the trimmed license JWT
    // that LicenseLastApplied refers to. This prevents duplicate uploads.
    // +optional
    LicenseLastAppliedHash string `json:"licenseLastAppliedHash,omitempty"`
}

c) Add a default key constant
In api/v1alpha1/types_shared.go (or a new shared constants file):

const DefaultLicenseSecretKey = "license"

Controller will treat empty licenseSecretRef.key as DefaultLicenseSecretKey.
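As a small illustration of the defaulting rule, assuming a hypothetical helper named resolveLicenseSecretKey (the plan does not name the function that performs this step):

```go
package main

import "fmt"

// DefaultLicenseSecretKey mirrors the constant named in the plan.
const DefaultLicenseSecretKey = "license"

// resolveLicenseSecretKey is a hypothetical helper that falls back to the
// default key when licenseSecretRef.key is left empty.
func resolveLicenseSecretKey(key string) string {
	if key == "" {
		return DefaultLicenseSecretKey
	}
	return key
}

func main() {
	fmt.Println(resolveLicenseSecretKey(""))
	fmt.Println(resolveLicenseSecretKey("enterprise.jwt"))
}
```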

d) Regenerate generated artifacts

  • make codegen
  • make manifests
  • make docs-reference

2) Controller changes

File: internal/controller/codercontrolplane_controller.go

a) Reconciler fields (for testability)
Add an interface that can be faked in tests:

type LicenseUploader interface {
    AddLicense(ctx context.Context, coderURL, sessionToken, licenseJWT string) error
}

// Production implementation uses codersdk.
type sdkLicenseUploader struct{}

Add to reconciler:

type CoderControlPlaneReconciler struct {
    client.Client
    Scheme *runtime.Scheme

    OperatorAccessProvisioner coderbootstrap.OperatorAccessProvisioner
    LicenseUploader           LicenseUploader // optional; if nil, controller skips license reconciliation
}

Wire it in internal/app/controllerapp/controllerapp.go:

reconciler := &controller.CoderControlPlaneReconciler{
    Client: client,
    Scheme: managerScheme,
    OperatorAccessProvisioner: coderbootstrap.NewPostgresOperatorAccessProvisioner(),
    LicenseUploader:           controller.NewSDKLicenseUploader(),
}

b) Reconcile flow changes
After reconcileOperatorAccess and before reconcileStatus, call reconcileLicense:

operatorResult, err := r.reconcileOperatorAccess(...)
...
licenseResult, err := r.reconcileLicense(ctx, coderControlPlane, &nextStatus)
...
if err := r.reconcileStatus(...); err != nil { ... }

return mergeResults(operatorResult, licenseResult), nil

Where mergeResults chooses a non-zero requeue request deterministically (e.g., prefer the shorter RequeueAfter if both set).

c) Implement reconcileLicense
Shape:

func (r *CoderControlPlaneReconciler) reconcileLicense(
    ctx context.Context,
    cp *coderv1alpha1.CoderControlPlane,
    nextStatus *coderv1alpha1.CoderControlPlaneStatus,
) (ctrl.Result, error)

Logic:

  1. Defensive nil checks + validate inputs.
  2. If cp.Spec.LicenseSecretRef == nil: clear/leave license condition as Unknown; return.
  3. If nextStatus.Phase != Ready: set condition False (Pending); return.
  4. If !nextStatus.OperatorAccessReady || nextStatus.OperatorTokenSecretRef == nil: set condition False (Pending); return.
  5. Read operator token via readSecretValue(namespace, nextStatus.OperatorTokenSecretRef.Name, key).
  6. Read license JWT via readSecretValue(namespace, cp.Spec.LicenseSecretRef.Name, resolvedKey).
    • licenseJWT = strings.TrimSpace(licenseJWT); error if empty.
  7. Compute hash := sha256hex(licenseJWT).
  8. If hash == nextStatus.LicenseLastAppliedHash and nextStatus.LicenseLastApplied != nil: consider applied; set condition True; return.
  9. Call uploader:
    • err := r.LicenseUploader.AddLicense(ctx, nextStatus.URL, token, licenseJWT)
    • If 404: condition False reason NotSupported; return without aggressive requeue.
    • If 401/403: condition False reason Forbidden; requeue after operatorAccessRetryInterval.
    • Other errors: condition False reason Error; requeue after operatorAccessRetryInterval.
  10. On success: set
    • now := metav1.Now(); nextStatus.LicenseLastApplied = &now
    • nextStatus.LicenseLastAppliedHash = hash
    • condition True reason Applied

d) Implement SDK uploader
Use the same HTTP client setup pattern as internal/coderbootstrap/SDKClient (dedicated transport clone, timeout).
Pseudo-shape:

func (u *sdkLicenseUploader) AddLicense(ctx context.Context, coderURL, sessionToken, licenseJWT string) error {
    parsed, err := url.Parse(coderURL)
    ...
    c := codersdk.New(parsed)
    c.SetSessionToken(sessionToken)
    c.HTTPClient = &http.Client{Timeout: 30 * time.Second, Transport: http.DefaultTransport.(*http.Transport).Clone()}
    _, err = c.AddLicense(ctx, codersdk.AddLicenseRequest{License: licenseJWT})
    return err
}

e) Watch referenced license Secrets
In SetupWithManager:

  1. Add field indexer:
    • key: .spec.licenseSecretRef.name
  2. Add a watch on corev1.Secret events that maps to CoderControlPlane requests via that index.

Sketch:

const licenseSecretNameIndex = ".spec.licenseSecretRef.name"

if err := mgr.GetFieldIndexer().IndexField(ctx, &coderv1alpha1.CoderControlPlane{}, licenseSecretNameIndex, func(obj client.Object) []string {
    cp := obj.(*coderv1alpha1.CoderControlPlane)
    if cp.Spec.LicenseSecretRef == nil { return nil }
    name := strings.TrimSpace(cp.Spec.LicenseSecretRef.Name)
    if name == "" { return nil }
    return []string{name}
}); err != nil { ... }

builder.Watches(
    &corev1.Secret{},
    handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
        secret := obj.(*corev1.Secret)
        var list coderv1alpha1.CoderControlPlaneList
        if err := r.List(ctx, &list, client.InNamespace(secret.Namespace), client.MatchingFields{licenseSecretNameIndex: secret.Name}); err != nil {
            return nil
        }
        ...
    }),
)

3) Tests

File: internal/controller/codercontrolplane_controller_test.go

Add a fakeLicenseUploader similar to fakeOperatorAccessProvisioner.

Test cases (table-driven preferred):

  1. No ref → no action: LicenseSecretRef=nil results in no uploader calls.
  2. Pending phase → no action: with ref but deployment.Status.ReadyReplicas=0, uploader not called.
  3. Ready + operator access ready → applies once:
    • create cp with ExtraEnv containing CODER_PG_CONNECTION_URL value
    • fake operator access provisioner returns token
    • create license Secret with key license and some value
    • manually set deployment status readyReplicas to 1
    • reconcile; assert uploader called once
    • fetch cp; assert status.licenseLastApplied != nil and status.licenseLastAppliedHash != ""
  4. Idempotent on re-reconcile: second reconcile with same Secret value does not call uploader again.
  5. Rotation: update Secret data to a new value; reconcile; uploader called again; hash updates.
  6. OSS / 404 handling: uploader returns a *codersdk.Error with status 404; controller sets condition reason NotSupported and does not tight-loop.

4) Documentation + samples

  • Update config/samples/coder_v1alpha1_codercontrolplane.yaml to include an example:
    spec:
      licenseSecretRef:
        name: coder-license
        key: license
  • Run make docs-reference so docs/reference/api/codercontrolplane.md includes the new fields.

Validation / completion checklist

Run (in this repo root):

  • make codegen
  • make manifests
  • make docs-reference
  • make test
  • make build
  • make lint

Definition of done:

  • New CRD fields are present and documented.
  • Operator uploads the license exactly once per distinct Secret value after control plane readiness.
  • Secret rotation triggers a new upload (no duplicates / no 500 loops).
  • 404 (OSS) is handled gracefully.
  • Unit/integration tests cover success + rotation + idempotency.

Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $1.21

@ThomasK33
Member Author

@codex review

Please review this change set for CoderControlPlane license Secret reconciliation,
status tracking, and Secret-watch driven rotation behavior.

@ThomasK33
Member Author

@codex review

Rebased on latest main, resolved conflicts, and force-pushed.
Please re-review the updated diff.

Add CoderControlPlane license Secret support and operator-managed license
upload reconciliation with readiness/operator-access preconditions,
status tracking (`licenseLastApplied`, `licenseLastAppliedHash`),
idempotency hashing, Secret rotation handling, and license status conditions.

Also wire a codersdk-backed uploader, add referenced Secret field indexing
and watches, update controller app wiring, add focused controller tests, and
regenerate CRD/deepcopy/docs/sample artifacts. Includes docs reference
regeneration to align with CI output formatting.

@ThomasK33
Member Author

@codex review

Applied CI alignment updates and force-pushed latest commit history.
Please review the updated PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f550f4acd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/controller/codercontrolplane_controller.go
Retry CoderControlPlane status updates with `retry.RetryOnConflict` and a
fresh read of the latest object before update. This ensures transient
resourceVersion conflicts do not drop status updates after a successful license
upload, which could otherwise trigger duplicate non-idempotent license posts.

@ThomasK33
Member Author

@codex review

Addressed the status-update conflict concern by adding RetryOnConflict around
reconcileStatus with a fresh read before status update.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5ba757697

Comment thread internal/controller/codercontrolplane_controller.go Outdated
Comment thread internal/controller/codercontrolplane_controller.go
Address Codex review findings by:
- verifying coderd still reports at least one license before idempotent
  hash-based skip, so the controller re-uploads when backend license state
  is reset while Secret content is unchanged,
- reading through `APIReader` during `RetryOnConflict` status updates so
  conflict retries are based on uncached latest resource versions.

Also adds coverage for re-upload behavior when backend licenses are absent.

@ThomasK33
Member Author

@codex review

Addressed both open findings:

  • backend reset detection now re-uploads when no licenses are present,
  • status conflict retries now read latest state through APIReader.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fa58d9935

Comment thread internal/controller/codercontrolplane_controller.go Outdated
When retrying CoderControlPlane status updates, skip writes if the latest
resource generation no longer matches `nextStatus.ObservedGeneration`.
This prevents older reconciles from overwriting fresher status computed by
newer reconciles while still retaining conflict retry behavior.

@ThomasK33
Member Author

@codex review

Added a stale-generation guard in status conflict retries to avoid older
reconciles overwriting newer status.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9184ed38af

Comment thread internal/controller/codercontrolplane_controller.go
Use the reconcile-start status snapshot as a delta base and only apply fields
that this reconcile actually changed when retrying status writes on conflicts.
This prevents same-generation stale reconciles from clearing fresher license
status updates written by newer reconciles.
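A minimal sketch of the delta idea, assuming a two-field stand-in status struct and a hypothetical applyDelta function. The actual change presumably covers more fields; this only shows why diffing against the reconcile-start snapshot keeps a stale reconcile from clearing values it never touched.

```go
package main

import "fmt"

// status is a hypothetical two-field stand-in for CoderControlPlaneStatus.
type status struct {
	Phase                  string
	LicenseLastAppliedHash string
}

// applyDelta copies onto latest only the fields that this reconcile changed,
// that is, fields where next differs from base (the reconcile-start snapshot).
func applyDelta(base, next, latest status) status {
	merged := latest
	if next.Phase != base.Phase {
		merged.Phase = next.Phase
	}
	if next.LicenseLastAppliedHash != base.LicenseLastAppliedHash {
		merged.LicenseLastAppliedHash = next.LicenseLastAppliedHash
	}
	return merged
}

func main() {
	// Snapshot taken when the stale reconcile started.
	base := status{Phase: "Pending"}
	// The stale reconcile only changed the phase.
	next := status{Phase: "Ready"}
	// Meanwhile a newer reconcile recorded an applied license.
	latest := status{Phase: "Pending", LicenseLastAppliedHash: "abc123"}

	fmt.Printf("%+v\n", applyDelta(base, next, latest))
}
```

Writing next wholesale in this scenario would reset LicenseLastAppliedHash to empty and trigger another upload attempt.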

@ThomasK33
Member Author

@codex review

Updated status conflict retries to apply only reconcile-intended status deltas
(from the reconcile-start snapshot), preventing same-generation stale
reconciles from clearing fresher status updates.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b09da13fce

Comment thread internal/controller/codercontrolplane_controller.go
Treat duplicate-license upload responses as converged success and persist
`licenseLastApplied*`/Applied condition, so rollback rotations (A→B→A)
do not loop on non-idempotent backend errors.

Also adds a regression test for rollback duplicate-upload convergence.

@ThomasK33
Member Author

@codex review

Handled rollback duplicate uploads (A→B→A) as converged success by detecting
already-uploaded license errors and persisting Applied state, plus added a
regression test.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@ThomasK33 ThomasK33 added this pull request to the merge queue Feb 12, 2026
Merged via the queue into main with commit 450fb4e Feb 12, 2026
11 checks passed
@ThomasK33 ThomasK33 deleted the control-plane-qcym branch February 12, 2026 15:04