design: add doc for cluster autoscaling and background reconfiguration by aljoscha · Pull Request #36691 · MaterializeInc/materialize

aljoscha · 2026-05-22T18:07:30Z

Resolves SQL-315

…guration Proposes a cluster controller that runs alongside the Coordinator as a reconciler over durable cluster config. Reshapes graceful reconfiguration to run in the background by making the user's target the durable cluster config and removing session-bound intent. Introduces HYDRATION_SIZE for burst replicas during hydration, with the existing ON REFRESH scheduling lifted into the same strategy framework. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cess hydration Removes the carve-out that excluded HYDRATION_SIZE on storage-only clusters, adds a callout for the storage-side hydration signal, and reframes the hydration consumption pattern around in-process compute-controller state rather than the builtin view, with guidance to avoid the existing graceful-reconfig 1-second polling cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

In practice they're the same mechanism for our purposes; the doc shouldn't draw a distinction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The doc as a whole describes the v1, and the individual steps don't make sense as standalone shippable units. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ing changes Drop the peek-routing and hydration-signal items (handled in the design body), and reframe the remaining three items as user-observable behavior changes that are corollaries of the design. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drops the stiff "Corollary" framing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sh combo No appetite to invest further in SCHEDULE syntax; the combination should stay rejected indefinitely rather than be left open as a possible follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ss of RF Burst is transient and N burst replicas to mirror an N-replica steady set multiplies cost for diminishing benefit (tear-down only requires one steady replica to have caught up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e examples Speculative future strategies (queue-depth, scale-to-zero, time-of-day) are not on the roadmap and shouldn't shape the v1 interface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Keep the foreground (session-bound) path intact during dyncfg-gated rollout, remove it once the background model is fully enabled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…achinery Preserving the foreground (session-bound) experience during rollout doesn't require keeping the pending flag or the three-stage state machine. The foreground UX is implemented as a thin session-side wait over the background mechanism; deprecating it later is removing the shim, not unwinding code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bullet leads and numbered-list labels stay bold (they act as mini-headers); inline emphasis on terms or phrases within prose becomes italics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This reverts commit 4ff6832. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maheshwarip · 2026-05-22T18:13:27Z

+
+Two user-facing capabilities motivate this work:
+
+1. **Background graceful cluster reconfiguration.** Today, `ALTER CLUSTER ... SET (SIZE = ...)` with the graceful (zero-downtime) strategy requires the SQL session to remain open for the duration of the reconfiguration — the session holds the wait-for-hydration stage. Long-running reconfigurations are fragile: any process or session interruption — a network blip, a client timeout, an SQL tool closing, an `environmentd` restart — aborts the reconfiguration. The user experience we want is: the statement returns immediately, and the reconfiguration continues in the background, surviving restarts and disconnects.


maheshwarip · 2026-05-22T18:13:52Z

+
+## Out of Scope
+
+- `HYDRATION_SIZE` combined with `SCHEDULE = ('on-refresh', ...)`. Initial version rejects; see [Open Questions](#open-questions).


maheshwarip · 2026-05-22T18:13:57Z

+## Out of Scope
+
+- `HYDRATION_SIZE` combined with `SCHEDULE = ('on-refresh', ...)`. Initial version rejects; see [Open Questions](#open-questions).
+- More than one concurrent burst replica per cluster. Initial version supports exactly one; revisit if needed.


also agreed

maheshwarip · 2026-05-22T18:14:21Z

+- A new autoscaling strategy can be added without restructuring the framework.
+- Operators can disable the burst behavior across an environment via a break-glass flag without disabling other autoscaling.
+
+## Out of Scope


I agree with the calls here

maheshwarip · 2026-05-22T18:17:49Z

+
+1. **Stuck-reconfiguration recovery policy.** When a reconfiguration's pending replicas have not hydrated within the system timeout, do we (a) park the reconfiguration indefinitely with a clear signal in the introspection view for an operator to act, (b) auto-cancel and revert to the prior steady state, or (c) make the policy a dyncfg with one of (a)/(b) as the default? Same question for stuck burst replicas.
+
+2. **`HYDRATION_SIZE` + `SCHEDULE = ('on-refresh', ...)` combination.** v1 rejects this combination. Semantically the combination is interesting (every refresh window, burst comes up first to accelerate hydration, steady catches up, then the schedule turns the cluster off), but our strong recommendation is to keep this rejected indefinitely: there is currently no appetite to invest further in the `SCHEDULE` syntax, and supporting the combination would expand its surface area.


maheshwarip · 2026-05-22T18:17:59Z

+
+2. **`HYDRATION_SIZE` + `SCHEDULE = ('on-refresh', ...)` combination.** v1 rejects this combination. Semantically the combination is interesting (every refresh window, burst comes up first to accelerate hydration, steady catches up, then the schedule turns the cluster off), but our strong recommendation is to keep this rejected indefinitely: there is currently no appetite to invest further in the `SCHEDULE` syntax, and supporting the combination would expand its surface area.
+
+3. **Multiple burst replicas (one per steady replica vs. one total).** v1 supports exactly one burst replica per cluster regardless of replication factor. Our strong recommendation is to keep it that way: the burst replica is by design transient, and provisioning N burst replicas to mirror an N-replica steady set would multiply cost for diminishing benefit — burst tear-down only requires one steady replica to have caught up. Revisit only if real-world usage proves the single-burst model insufficient.
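For illustration, the single-burst rule in item 3 can be sketched as a small pure function. This is a minimal sketch, not the controller's code: the function name `burst_replicas_wanted` and its parameters are hypothetical, and it assumes the only inputs are whether `HYDRATION_SIZE` is set and the hydration state of each steady replica.

```python
from typing import Optional, Sequence


def burst_replicas_wanted(
    hydration_size: Optional[str],
    steady_hydrated: Sequence[bool],
) -> int:
    """Return how many burst replicas should exist right now (0 or 1).

    Hypothetical helper: the name and signature are assumptions for
    illustration only.
    """
    # No HYDRATION_SIZE configured: bursting is never used.
    if hydration_size is None:
        return 0
    # Tear-down only needs ONE steady replica to have caught up,
    # regardless of how many steady replicas the cluster has.
    if any(steady_hydrated):
        return 0
    # Otherwise exactly one burst replica, independent of replication factor.
    return 1
```

Note that the result never exceeds one, so a cluster with replication factor 3 still gets a single burst replica.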


maheshwarip · 2026-05-22T18:19:16Z

+
+7. **Foreground/synchronous graceful reconfiguration retention.** Our strong recommendation is to deprecate the current foreground (session-bound) mechanism in favor of the background model. During rollout, the foreground experience is preserved as a thin session-side wait shim over the background mechanism (see [SQL surface](#sql-surface)) — *not* by retaining the existing parallel state machine. This means the `pending: bool` flag on replicas and the associated three-stage machinery can be removed up front; deprecating the foreground experience later is simply deleting the wait shim. The one behavioral difference vs. today is that session disconnect during the wait no longer aborts the reconfiguration (arguably a feature; the durable target stays set and the controller continues).
+
+8. **Hydration burst during graceful reconfiguration.** Should burst kick in while a graceful reconfig is in flight (target size differs from current replicas)? Leaning toward no: the new-size replicas are themselves transient hydration capacity, and stacking burst on top risks confusing billing and behavior. Burst resumes once the reconfig settles.


Nope. If a user types `ALTER CLUSTER SET SIZE 200cc`, that shouldn't trigger a burst. It should trigger a 200cc replica. Once the 200cc replica is hydrated, retire the original replica.

So the nope is a confirmation of my "leaning towards no", yes? 😅
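The behavior agreed on in this thread can be sketched as one reconcile pass over the durable target. This is a rough sketch under stated assumptions: `Replica`, `reconcile`, `burst_allowed`, and the action strings are all hypothetical names, and only steady replicas are modeled (a burst replica would be tracked separately).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    name: str
    size: str
    hydrated: bool


def reconcile(target_size: str, steady: List[Replica]) -> List[str]:
    """Return the actions one reconcile pass would take (hypothetical strings)."""
    at_target = [r for r in steady if r.size == target_size]
    stale = [r for r in steady if r.size != target_size]
    # Step 1: the user's target is durable; bring up a replica at that size.
    if not at_target:
        return [f"create replica of size {target_size}"]
    # Step 2: once a target-size replica is hydrated, retire the old ones.
    if stale and any(r.hydrated for r in at_target):
        return [f"drop {r.name}" for r in stale]
    # Otherwise: still waiting for hydration, or already settled.
    return []


def burst_allowed(target_size: str, steady: List[Replica]) -> bool:
    """Burst stays off while any steady replica differs from the target size."""
    return all(r.size == target_size for r in steady)
```

Because each pass derives its actions only from the durable target and the observed replicas, re-running it after a restart or a session disconnect picks up where it left off.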

maheshwarip · 2026-05-22T18:20:43Z

+
+- **Burst and reconfiguration-transient replicas appear in billing and metering identically to ordinary replicas.** A user with `HYDRATION_SIZE` set sees additional billing during hydration windows; a user issuing a background `ALTER CLUSTER` sees additional billing during the overlap between the old and new replica sets.
+- **Background `ALTER CLUSTER` returns immediately** after writing the new target to the catalog. The actual replica transition happens asynchronously and is observable via the new introspection view. This matches the existing pattern for other async DDL (e.g., `CREATE INDEX` returns once the catalog entry exists; hydration happens afterwards).
+- **`SHOW CLUSTERS` reports the new (target) size immediately on ALTER**, not the old size. Mid-reconfiguration the durable cluster configuration already reflects the user's intent, so `SHOW CLUSTERS` does too. This is a change from today's behavior, where the old size is reported until the graceful reconfiguration finalizes.


Hmm. What I would really like is for `SHOW CLUSTERS` to tell me whether a reconfiguration is in flight, what the current size is, and what the target size is.

Yeah, I can see why you would like that. I think we can do something good there!
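One possible shape for that information, purely as a sketch: the field names `current_size`, `target_size`, and `reconfiguring` are hypothetical and do not describe the actual `SHOW CLUSTERS` output or the new introspection view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStatus:
    """Hypothetical per-cluster status row; all field names are assumptions."""

    name: str
    current_size: str  # size of the replicas currently serving
    target_size: str   # size recorded in the durable cluster config

    @property
    def reconfiguring(self) -> bool:
        # A reconfiguration is in flight while the two sizes disagree.
        return self.current_size != self.target_size


def format_row(status: ClusterStatus) -> str:
    """Render one status row as pipe-separated text."""
    flag = "yes" if status.reconfiguring else "no"
    return f"{status.name} | {status.current_size} | {status.target_size} | {flag}"
```

With this shape, a cluster mid-reconfiguration would show both sizes side by side instead of only the target.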

maheshwarip · 2026-05-22T18:20:58Z

+
+The following behaviors fall out of the design rather than being its headline outcomes. They are user-observable and worth flagging in release notes and user-facing documentation.
+
+- **Burst and reconfiguration-transient replicas appear in billing and metering identically to ordinary replicas.** A user with `HYDRATION_SIZE` set sees additional billing during hydration windows; a user issuing a background `ALTER CLUSTER` sees additional billing during the overlap between the old and new replica sets.


Hmm this makes sense. But is this new behavior? I assumed that this was how it always worked!

Maybe I was wrong!

Nah, that is how it always worked, but I put it in there because the bursting is new. For graceful reconfig it was always like this.

Gotcha gotcha. Ok, no concerns!

aljoscha and others added 15 commits May 22, 2026 12:46

design: consolidate LaunchDarkly references into dyncfg

e0a5c67

design: drop MVP section from cluster autoscaling doc

f0d83b0

design: rename section to "Notable user-facing changes"

33c9895

design: recommend deprecating foreground reconfiguration after rollout

3ff98e9

design: convert inline bold emphasis to italics

4ff6832

Revert "design: convert inline bold emphasis to italics"

011c8a1

small fixups

fc387c1

fixups

1e2fd58

maheshwarip reviewed May 22, 2026

Conversation

aljoscha commented May 22, 2026

2 participants
