🤖 fix: back off provisioner reconciliation on coderd 429s #65

Merged: ThomasK33 merged 2 commits into main from operator-spzg on Feb 12, 2026

Conversation

@ThomasK33
Member

Summary

This PR hardens CoderProvisioner reconciliation against coderd API rate limiting by adding explicit 429 backoff behavior, plus optional rate-limit bypass with fallback in bootstrap client calls. It also includes the existing flake.nix dev-tooling update present in this workspace.

Background

Provisioner reconciliation was entering tight error loops after coderd returned HTTP 429 (Too Many Requests), which amplified API pressure and prevented stable reconciliation. The operator needed controller-level pacing and safer bootstrap API behavior when bypass headers are unavailable.

Implementation

  • Added explicit per-resource jittered exponential backoff for coderd 429s in CoderProvisionerReconciler:
    • base 2s, cap 2m, floor 1s, jitter ratio 0.2
    • converts 429 failures into RequeueAfter instead of immediate error retries
    • sets ProvisionerKeyReady=False with Reason=RateLimited
    • resets per-resource backoff after non-rate-limited outcomes
  • Added bootstrap SDK helpers:
    • withOptionalRateLimitBypass(...)
    • bypassRateLimitRoundTripper to inject X-Coder-Bypass-Ratelimit: true when requested
    • automatic retry without bypass when server rejects bypass (428 Precondition Required)
    • exported IsRateLimitError(err) helper for controller logic
  • Applied bypass/fallback flow to:
    • workspace proxy create/patch operations
    • provisioner key create/query/delete paths
  • Added tests:
    • controller backoff + RateLimited condition coverage
    • workspace proxy bypass fallback coverage
    • provisioner key ensure/delete bypass fallback coverage
  • Included existing workspace change in flake.nix (yazi added to dev shell packages)
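
As a rough illustration of the backoff bullets above, here is a minimal, stdlib-only sketch of a per-resource jittered exponential backoff. The four parameters (base 2s, cap 2m, floor 1s, jitter ratio 0.2) come from this description; how they are combined, the clamping order, and the names `backoffDelay` and `backoffTracker` are assumptions made for illustration and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Parameters from the PR description; the way they combine is an assumption.
const (
	baseDelay   = 2 * time.Second
	maxDelay    = 2 * time.Minute
	minDelay    = 1 * time.Second
	jitterRatio = 0.2
)

// backoffDelay returns the requeue delay after `attempt` earlier consecutive
// 429s (0 for the first). unit must be in [0, 1) and picks a point in the
// jitter window; callers would normally pass rand.Float64().
func backoffDelay(attempt int, unit float64) time.Duration {
	d := baseDelay
	// Double per attempt, stopping at the cap to avoid overflow.
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	// Scale by a factor in [1-jitterRatio, 1+jitterRatio).
	factor := 1 + jitterRatio*(2*unit-1)
	d = time.Duration(float64(d) * factor)
	// Clamp the jittered value to [minDelay, maxDelay].
	if d < minDelay {
		d = minDelay
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// backoffTracker (hypothetical) counts consecutive 429s per resource key,
// for example "namespace/name".
type backoffTracker struct {
	mu       sync.Mutex
	attempts map[string]int
}

func newBackoffTracker() *backoffTracker {
	return &backoffTracker{attempts: map[string]int{}}
}

// next returns the delay for key and records one more rate-limited attempt.
func (t *backoffTracker) next(key string, unit float64) time.Duration {
	t.mu.Lock()
	defer t.mu.Unlock()
	d := backoffDelay(t.attempts[key], unit)
	t.attempts[key]++
	return d
}

// reset clears the state for key after a non-rate-limited outcome.
func (t *backoffTracker) reset(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.attempts, key)
}

func main() {
	tr := newBackoffTracker()
	for i := 0; i < 8; i++ {
		fmt.Println(tr.next("default/example", rand.Float64()))
	}
}
```

In a controller-runtime reconciler, the returned duration would typically be placed in the result's `RequeueAfter` field with a nil error, so the work queue does not apply its own immediate error retry on top.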

Validation

  • make test
  • make build
  • make lint
  • make verify-vendor

Risks

  • Low-to-medium: reconcile timing changes for 429 paths only.
  • Main risk is slower convergence during sustained coderd throttling; this is intentional to prevent hot-loop amplification and API overload.
  • Scope is limited to bootstrap calls and provisioner reconciliation retry behavior.

Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $1.46

- add explicit jittered exponential requeue for CoderProvisioner rate-limit responses
- add optional X-Coder-Bypass-Ratelimit handling with automatic fallback when rejected
- extend controller and bootstrap tests for backoff + bypass fallback behavior
- include existing flake.nix tooling update present in workspace
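
The bypass-with-fallback flow can be sketched with the standard library alone. This is a simplified illustration and not the code from this commit: the header name comes from the PR description, while `bypassRejectedStatus`, `getWithOptionalBypass`, and `newTestServer` are hypothetical names, and the rejection status is an assumption about the server's behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const bypassHeader = "X-Coder-Bypass-Ratelimit"

// bypassRejectedStatus is an assumption about how the server signals that it
// refuses the bypass header; adjust it to match the real server.
const bypassRejectedStatus = http.StatusPreconditionRequired

// bypassRoundTripper adds the bypass header to every outgoing request.
type bypassRoundTripper struct {
	next http.RoundTripper
}

func (b bypassRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not mutate the caller's request, so clone it first.
	clone := req.Clone(req.Context())
	clone.Header.Set(bypassHeader, "true")
	return b.next.RoundTrip(clone)
}

// getWithOptionalBypass issues a GET with the bypass header and, if the
// server rejects the header, repeats the request once without it.
func getWithOptionalBypass(url string) (status int, usedBypass bool, err error) {
	bypass := &http.Client{Transport: bypassRoundTripper{next: http.DefaultTransport}}
	resp, err := bypass.Get(url)
	if err != nil {
		return 0, false, err
	}
	resp.Body.Close()
	if resp.StatusCode != bypassRejectedStatus {
		return resp.StatusCode, true, nil
	}
	// Fallback: plain client, no bypass header.
	resp, err = http.Get(url)
	if err != nil {
		return 0, false, err
	}
	resp.Body.Close()
	return resp.StatusCode, false, nil
}

// newTestServer returns a stand-in server that optionally rejects requests
// carrying the bypass header.
func newTestServer(rejectBypass bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if rejectBypass && r.Header.Get(bypassHeader) != "" {
			w.WriteHeader(bypassRejectedStatus)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newTestServer(true)
	defer srv.Close()
	status, usedBypass, err := getWithOptionalBypass(srv.URL)
	fmt.Println(status, usedBypass, err)
}
```

The sketch only covers GET. For requests with a body, the fallback attempt needs a body that can be replayed, which is one reason to retry at the SDK-call level rather than inside the transport.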

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ffc4f20e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/controller/coderprovisioner_controller.go Outdated
- avoid unconditional status writes in 429 defer path
- only persist condition changes when status actually changed
- keep a stable RateLimited condition message so self-updates do not trigger rapid reconciles
- update rate-limit test assertion accordingly
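
To make the reasoning behind these bullets concrete: if every rate-limited reconcile writes status, and each write triggers another reconcile, the backoff is defeated. Below is a minimal stdlib-only sketch of the "write only on change" pattern with a stable message. The `condition` type, `setCondition`, `statusWriter`, and the message text are hypothetical stand-ins, not the types or strings used in this commit.

```go
package main

import "fmt"

// condition is a simplified stand-in for a Kubernetes status condition.
type condition struct {
	Type, Status, Reason, Message string
}

// setCondition inserts or updates c by Type and reports whether the slice
// actually changed.
func setCondition(conds *[]condition, c condition) bool {
	for i := range *conds {
		if (*conds)[i].Type != c.Type {
			continue
		}
		if (*conds)[i] == c {
			return false // identical condition already present
		}
		(*conds)[i] = c
		return true
	}
	*conds = append(*conds, c)
	return true
}

// statusWriter counts writes; in a real controller each write would be an
// API call that can itself trigger another reconcile.
type statusWriter struct{ writes int }

func (w *statusWriter) update(_ []condition) { w.writes++ }

// markRateLimited records the RateLimited condition and persists it only if
// it differs from what is already stored. The message is deliberately
// constant (no delay or timestamp), so repeated 429s produce no diff.
func markRateLimited(conds *[]condition, w *statusWriter) {
	c := condition{
		Type:    "ProvisionerKeyReady",
		Status:  "False",
		Reason:  "RateLimited",
		Message: "coderd returned HTTP 429; reconciliation is backing off",
	}
	if setCondition(conds, c) {
		w.update(*conds)
	}
}

func main() {
	var conds []condition
	w := &statusWriter{}
	markRateLimited(&conds, w) // first 429: condition is new, one write
	markRateLimited(&conds, w) // second 429: nothing changed, no write
	fmt.Println("status writes:", w.writes)
}
```

In a project built on `k8s.io/apimachinery`, recent versions of `meta.SetStatusCondition` report whether the condition changed, which can serve the same purpose as the hand-rolled `setCondition` here.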

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!

@ThomasK33 added this pull request to the merge queue Feb 12, 2026
Merged via the queue into main with commit d68b694 Feb 12, 2026
8 checks passed
@ThomasK33 deleted the operator-spzg branch February 12, 2026 14:11