design: add doc for cluster autoscaling and background reconfiguration#36691
aljoscha wants to merge 15 commits into
…guration Proposes a cluster controller that runs alongside the Coordinator as a reconciler over durable cluster config. Reshapes graceful reconfiguration to run in the background by making the user's target the durable cluster config and removing session-bound intent. Introduces HYDRATION_SIZE for burst replicas during hydration, with the existing ON REFRESH scheduling lifted into the same strategy framework. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cess hydration Removes the carve-out that excluded HYDRATION_SIZE on storage-only clusters, adds a callout for the storage-side hydration signal, and reframes the hydration consumption pattern around in-process compute-controller state rather than the builtin view, with guidance to avoid the existing graceful-reconfig 1-second polling cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In practice they're the same mechanism for our purposes; the doc shouldn't draw a distinction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The doc as a whole describes the v1, and the individual steps don't make sense as standalone shippable units. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing changes Drop the peek-routing and hydration-signal items (handled in the design body), and reframe the remaining three items as user-observable behavior changes that are corollaries of the design. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the stiff "Corollary" framing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sh combo No appetite to invest further in SCHEDULE syntax; the combination should stay rejected indefinitely rather than be left open as a possible follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ss of RF Burst is transient and N burst replicas to mirror an N-replica steady set multiplies cost for diminishing benefit (tear-down only requires one steady replica to have caught up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e examples Speculative future strategies (queue-depth, scale-to-zero, time-of-day) are not on the roadmap and shouldn't shape the v1 interface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the foreground (session-bound) path intact during dyncfg-gated rollout, remove it once the background model is fully enabled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…achinery Preserving the foreground (session-bound) experience during rollout doesn't require keeping the pending flag or the three-stage state machine. The foreground UX is implemented as a thin session-side wait over the background mechanism; deprecating it later is removing the shim, not unwinding code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bullet leads and numbered-list labels stay bold (they act as mini-headers); inline emphasis on terms or phrases within prose becomes italics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 4ff6832. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two user-facing capabilities motivate this work:
1. **Background graceful cluster reconfiguration.** Today, `ALTER CLUSTER ... SET (SIZE = ...)` with the graceful (zero-downtime) strategy requires the SQL session to remain open for the duration of the reconfiguration, because the session holds the wait-for-hydration stage. Long-running reconfigurations are fragile: any process or session interruption (a network blip, a client timeout, a SQL tool closing, an `environmentd` restart) aborts the reconfiguration. The user experience we want is: the statement returns immediately, and the reconfiguration continues in the background, surviving restarts and disconnects.
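
The desired behavior can be sketched as a reconciler over a durable target. This is a minimal illustration, not the proposed implementation: `Catalog`, `Replica`, `alter_cluster_size`, and `reconcile_once` are hypothetical names, an in-memory object stands in for the durable catalog, and waiting for all target-size replicas before retiring the old ones is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    size: str
    hydrated: bool = False


@dataclass
class Catalog:
    """Stand-in for the durable cluster config (hypothetical; in-memory here)."""
    target_size: str
    replicas: list = field(default_factory=list)


def alter_cluster_size(catalog: Catalog, new_size: str) -> None:
    """Record the user's target and return immediately; no session state is kept."""
    catalog.target_size = new_size


def reconcile_once(catalog: Catalog) -> str:
    """One controller pass: move the replica set one step toward the target."""
    at_target = [r for r in catalog.replicas if r.size == catalog.target_size]
    stale = [r for r in catalog.replicas if r.size != catalog.target_size]
    if not at_target:
        # No replica of the target size yet: bring one up and let it hydrate.
        catalog.replicas.append(Replica(catalog.target_size))
        return "created"
    if stale and all(r.hydrated for r in at_target):
        # Assumption: retire old replicas only once all target-size replicas hydrated.
        catalog.replicas = at_target
        return "retired"
    return "noop"
```

Because the target lives in the catalog rather than in a session, a restarted controller can call `reconcile_once` again and pick up where the previous process left off.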
## Out of Scope

- `HYDRATION_SIZE` combined with `SCHEDULE = ('on-refresh', ...)`. Initial version rejects; see [Open Questions](#open-questions).
- More than one concurrent burst replica per cluster. Initial version supports exactly one; revisit if needed.

- A new autoscaling strategy can be added without restructuring the framework.
- Operators can disable the burst behavior across an environment via a break-glass flag without disabling other autoscaling.

## Out of Scope
I agree with the calls here
1. **Stuck-reconfiguration recovery policy.** When a reconfiguration's pending replicas have not hydrated within the system timeout, do we (a) park the reconfiguration indefinitely with a clear signal in the introspection view for an operator to act, (b) auto-cancel and revert to the prior steady state, or (c) make the policy a dyncfg with one of (a)/(b) as the default? Same question for stuck burst replicas.
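
To make the three options concrete, here is a small sketch of what option (c) could look like: a policy value selects between behaviors (a) and (b) once the timeout has elapsed. It does not resolve the open question. `StuckPolicy`, `on_hydration_check`, and the returned action strings are hypothetical names chosen for illustration.

```python
from enum import Enum


class StuckPolicy(Enum):
    PARK = "park"                # option (a): wait for an operator
    AUTO_CANCEL = "auto_cancel"  # option (b): revert to the prior steady state


def on_hydration_check(policy: StuckPolicy, elapsed_s: float, timeout_s: float) -> str:
    """Decide what the controller does with replicas that are still hydrating."""
    if elapsed_s < timeout_s:
        return "wait"
    if policy is StuckPolicy.PARK:
        # Keep the pending replicas and surface the stuck state for an operator.
        return "park_and_flag"
    # Drop the pending replicas and restore the previous target.
    return "revert_to_prior"
```

Under option (c), the `policy` argument would be read from a dyncfg, so the default could be changed without a code change.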
2. **`HYDRATION_SIZE` + `SCHEDULE = ('on-refresh', ...)` combination.** v1 rejects this combination. Semantically the combination is interesting (every refresh window, burst comes up first to accelerate hydration, steady catches up, then the schedule turns the cluster off), but our strong recommendation is to keep this rejected indefinitely: there is currently no appetite to invest further in the `SCHEDULE` syntax, and supporting the combination would expand its surface area.
3. **Multiple burst replicas (one per steady replica vs. one total).** v1 supports exactly one burst replica per cluster regardless of replication factor. Our strong recommendation is to keep it that way: the burst replica is by design transient, and provisioning N burst replicas to mirror an N-replica steady set would multiply cost for diminishing benefit, since burst tear-down only requires one steady replica to have caught up. Revisit only if real-world usage proves the single-burst model insufficient.
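
The single-burst rule and its tear-down condition can be expressed as a small decision function. This is a sketch under stated assumptions: `plan_burst` and its parameters are hypothetical, and treating a cluster with no steady replicas as not needing a burst is an assumption rather than something the design states.

```python
from typing import Optional


def plan_burst(hydration_size: Optional[str], steady_hydrated: list, burst_exists: bool) -> str:
    """Return the action for the (at most one) burst replica of a cluster.

    steady_hydrated holds one bool per steady replica, regardless of
    replication factor; a single hydrated steady replica ends the burst.
    """
    wanted = (
        hydration_size is not None
        and bool(steady_hydrated)      # assumption: no steady replicas, no burst
        and not any(steady_hydrated)   # tear down once one steady replica caught up
    )
    if wanted and not burst_exists:
        return "create"
    if not wanted and burst_exists:
        return "drop"
    return "noop"
```

Since the function only ever answers for one burst replica, the cost of bursting stays constant as the replication factor grows.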
7. **Foreground/synchronous graceful reconfiguration retention.** Our strong recommendation is to deprecate the current foreground (session-bound) mechanism in favor of the background model. During rollout, the foreground experience is preserved as a thin session-side wait shim over the background mechanism (see [SQL surface](#sql-surface)), *not* by retaining the existing parallel state machine. This means the `pending: bool` flag on replicas and the associated three-stage machinery can be removed up front; deprecating the foreground experience later is simply deleting the wait shim. The one behavioral difference vs. today is that session disconnect during the wait no longer aborts the reconfiguration (arguably a feature; the durable target stays set and the controller continues).
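
A rough sketch of how thin such a shim could be: the foreground path performs the same durable write as the background path and then blocks on a completion signal. `ReconfigurationHandle`, `foreground_alter`, and `set_target` are hypothetical names, and using a `threading.Event` as the signal is an assumption made for illustration.

```python
import threading


class ReconfigurationHandle:
    """Hypothetical handle that the background controller signals once the cluster settles."""

    def __init__(self) -> None:
        self._settled = threading.Event()

    def mark_settled(self) -> None:
        self._settled.set()

    def wait(self, timeout_s: float) -> bool:
        # Returns True if the reconfiguration settled before the timeout.
        return self._settled.wait(timeout_s)


def foreground_alter(set_target, handle: ReconfigurationHandle, new_size: str, timeout_s: float) -> bool:
    """Foreground experience: same durable write as background, plus a session-side wait."""
    set_target(new_size)           # the durable target is set before any waiting starts
    return handle.wait(timeout_s)  # giving up here does not undo the target
```

A timeout or a dropped session ends only the wait. The target has already been written, so the controller keeps working toward it.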
8. **Hydration burst during graceful reconfiguration.** Should burst kick in while a graceful reconfig is in flight (target size differs from current replicas)? Leaning toward no: the new-size replicas are themselves transient hydration capacity, and stacking burst on top risks confusing billing and behavior. Burst resumes once the reconfig settles.
Nope. If a user types ALTER CLUSTER SET SIZE 200cc, that shouldn't trigger a burst. It should trigger a 200cc replica. Once the 200cc replica is hydrated, retire the original replica.
So the nope is a confirmation of my "leaning towards no", yes? 😅
- **Burst and reconfiguration-transient replicas appear in billing and metering identically to ordinary replicas.** A user with `HYDRATION_SIZE` set sees additional billing during hydration windows; a user issuing a background `ALTER CLUSTER` sees additional billing during the overlap between the old and new replica sets.
- **Background `ALTER CLUSTER` returns immediately** after writing the new target to the catalog. The actual replica transition happens asynchronously and is observable via the new introspection view. This matches the existing pattern for other async DDL (e.g., `CREATE INDEX` returns once the catalog entry exists; hydration happens afterwards).
- **`SHOW CLUSTERS` reports the new (target) size immediately on ALTER**, not the old size. Mid-reconfiguration the durable cluster configuration already reflects the user's intent, so `SHOW CLUSTERS` does too. This is a change from today's behavior, where the old size is reported until the graceful reconfiguration finalizes.
Hmm. What I would really like is for SHOW CLUSTERS to tell me whether a reconfiguration is in flight or not, tell me what the current size is, and what the target size is
Yeah, I can see why you would like that. I think we can do something good there!
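
One possible shape for what is being asked for here, sketched as a pure function over the durable target and the sizes of the live replicas. Nothing in this thread settles the design: `cluster_status` and the field names `current_sizes`, `target_size`, and `reconfiguring` are hypothetical.

```python
def cluster_status(target_size: str, replica_sizes: list) -> dict:
    """Summarize a cluster as current size(s), target size, and whether a reconfiguration is in flight."""
    return {
        # Distinct sizes of the replicas that exist right now.
        "current_sizes": sorted(set(replica_sizes)),
        # The durable target the user asked for.
        "target_size": target_size,
        # In flight while any live replica still differs from the target.
        "reconfiguring": any(size != target_size for size in replica_sizes),
    }
```

For example, `cluster_status("200cc", ["100cc", "200cc"])` reports both sizes as current and `reconfiguring` as true, which covers the overlap window while the new replica hydrates.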
The following behaviors fall out of the design rather than being its headline outcomes. They are user-observable and worth flagging in release notes and user-facing documentation.
- **Burst and reconfiguration-transient replicas appear in billing and metering identically to ordinary replicas.** A user with `HYDRATION_SIZE` set sees additional billing during hydration windows; a user issuing a background `ALTER CLUSTER` sees additional billing during the overlap between the old and new replica sets.
Hmm this makes sense. But is this new behavior? I assumed that this was how it always worked!
Nah, that is how it always worked, but I put it in there because the bursting is new. For graceful reconfig it was always like this.
Gotcha gotcha. Ok, no concerns!
Rendered
Resolves SQL-315