Introduce cost-based tasks autoscaler for streaming ingestion by Fly-Style · Pull Request #18819 · apache/druid

Fly-Style · 2025-12-05T22:42:07Z

Cost-Based Autoscaler for Seekable Stream Supervisors

Overview

Implements a cost-based autoscaling algorithm for seekable stream supervisor tasks that optimizes task count by balancing lag reduction against resource efficiency.

Note: this patch doesn't support scaling down during task rollover. For now, it scales down the same way it scales up.

Introduces WeightedCostFunction for cost-based autoscaling decisions. The function computes a cost score (in seconds) for each candidate task count, balancing lag recovery time against idle resource waste.
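A per-candidate cost score only matters once something picks among the candidates. Below is a minimal sketch of that selection step. It assumes the autoscaler evaluates every task count in a hypothetical [minTasks, maxTasks] range and keeps the cheapest one. The class and method names are illustrative and are not the PR's API. The cost function is passed in as a plain IntToDoubleFunction, so the sketch says nothing about how WeightedCostFunction is actually wired.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch (not the PR's code): evaluate a cost for every
 * candidate task count in [minTasks, maxTasks] and return the cheapest.
 */
final class CandidateSelectionSketch {

  static int chooseTaskCount(int minTasks, int maxTasks, IntToDoubleFunction cost) {
    if (minTasks < 1 || maxTasks < minTasks) {
      throw new IllegalArgumentException("need 1 <= minTasks <= maxTasks");
    }
    int best = minTasks;
    double bestCost = cost.applyAsDouble(minTasks);
    for (int n = minTasks + 1; n <= maxTasks; n++) {
      double c = cost.applyAsDouble(n);
      // Strict '<' keeps the smaller task count when two candidates tie.
      if (c < bestCost) {
        best = n;
        bestCost = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Toy cost with its minimum at 5 tasks; stands in for a real cost function.
    int chosen = chooseTaskCount(1, 10, n -> (n - 5) * (n - 5) + 1.0);
    System.out.println(chosen); // prints 5
  }
}
```

Breaking ties toward fewer tasks is one conservative choice. The PR may resolve ties differently.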

Key Design Decisions

Cost Formula

totalCost = lagWeight × lagRecoveryTime + idleWeight × idlenessCost

lagRecoveryTime = aggregateLag / (taskCount × avgProcessingRate) — time to clear backlog
idlenessCost = taskCount × taskDuration × predictedIdleRatio — wasted compute time
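The three formulas above translate directly into code. Below is a minimal sketch that assumes aggregateLag is in records, avgProcessingRate is a per-task average in records/second, and taskDuration is in seconds. The predicted idle ratio is taken as an input here. The weights in the example are arbitrary illustrative values, not defaults from the PR.

```java
/**
 * Hypothetical sketch of the cost formula from the PR description:
 * totalCost = lagWeight * lagRecoveryTime + idleWeight * idlenessCost.
 * Assumed units: records, records/second per task, seconds.
 */
final class CostFormulaSketch {

  static double lagRecoveryTime(long aggregateLag, int taskCount, double avgProcessingRate) {
    // Seconds for taskCount tasks, each at avgProcessingRate records/s, to clear the backlog.
    return aggregateLag / (taskCount * avgProcessingRate);
  }

  static double idlenessCost(int taskCount, double taskDurationSeconds, double predictedIdleRatio) {
    // Task-seconds expected to be spent idle over one task duration.
    return taskCount * taskDurationSeconds * predictedIdleRatio;
  }

  static double totalCost(double lagWeight, double idleWeight,
                          long aggregateLag, int taskCount, double avgProcessingRate,
                          double taskDurationSeconds, double predictedIdleRatio) {
    return lagWeight * lagRecoveryTime(aggregateLag, taskCount, avgProcessingRate)
        + idleWeight * idlenessCost(taskCount, taskDurationSeconds, predictedIdleRatio);
  }

  public static void main(String[] args) {
    // 120,000 records of lag, 4 tasks at 1,000 records/s each -> 30 s recovery.
    // 4 tasks * 600 s * 0.25 idle -> 600 idle task-seconds, weighted by 0.5 -> 300.
    double cost = totalCost(1.0, 0.5, 120_000L, 4, 1_000.0, 600.0, 0.25);
    System.out.println(cost); // prints 330.0
  }
}
```

Adding tasks shrinks the lag term roughly in proportion to 1/taskCount but grows the idleness term. The weights decide where that trade-off settles.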

Idle Prediction Model

Uses capacity-based linear scaling:

predictedIdle = 1 - (1 - currentIdle) / (proposedTasks / currentTasks)

More tasks → more idle per task; fewer tasks → busier tasks.
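For example, 4 tasks that are 50% idle would be predicted at 75% idle with 8 tasks and 0% idle with 2 tasks. Below is a minimal sketch of the prediction. Clamping the result to [0, 1] is an assumption not stated in the description. Without it, the formula goes negative when the proposed count is too small to absorb the current load.

```java
/**
 * Hypothetical sketch of the capacity-based idle prediction:
 * predictedIdle = 1 - (1 - currentIdle) / (proposedTasks / currentTasks).
 */
final class IdlePredictionSketch {

  static double predictIdleRatio(double currentIdle, int currentTasks, int proposedTasks) {
    if (currentTasks <= 0 || proposedTasks <= 0) {
      throw new IllegalArgumentException("task counts must be positive");
    }
    // Cast before dividing: integer division would turn 3 / 4 into 0.
    double capacityRatio = (double) proposedTasks / currentTasks;
    // The current busy fraction is spread over proportionally more (or fewer) tasks.
    double predicted = 1.0 - (1.0 - currentIdle) / capacityRatio;
    // Assumption: clamp to [0, 1]; a negative value would mean the tasks are overloaded.
    return Math.max(0.0, Math.min(1.0, predicted));
  }

  public static void main(String[] args) {
    // 4 tasks that are currently 50% idle.
    System.out.println(predictIdleRatio(0.5, 4, 8)); // 1 - 0.5 / 2.0 = 0.75
    System.out.println(predictIdleRatio(0.5, 4, 2)); // 1 - 0.5 / 0.5 = 0.0
    System.out.println(predictIdleRatio(0.5, 4, 1)); // 1 - 0.5 / 0.25 = -1.0, clamped to 0.0
  }
}
```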

Ideal Idle Range

Defines optimal utilization as an idle ratio within [0.2, 0.6]:

Below 0.2: overloaded → scale up
Within range: optimal → no action
Above 0.6: underutilized → scale down
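The three bands can be read as a simple classification of the observed idle ratio. Below is a minimal sketch that treats both bounds as inclusive, matching "within [0.2, 0.6]". It shows the rule on its own. The description does not say whether the PR applies this range as a standalone check or folds it into the cost function, so the enum and method names are hypothetical.

```java
/** Hypothetical sketch of the ideal idle range [0.2, 0.6] described in the PR. */
final class IdleRangeSketch {

  enum Action { SCALE_UP, NO_ACTION, SCALE_DOWN }

  static final double MIN_IDLE = 0.2;
  static final double MAX_IDLE = 0.6;

  static Action classify(double idleRatio) {
    if (idleRatio < MIN_IDLE) {
      return Action.SCALE_UP;   // overloaded: tasks are rarely idle
    }
    if (idleRatio > MAX_IDLE) {
      return Action.SCALE_DOWN; // underutilized: tasks are mostly idle
    }
    return Action.NO_ACTION;    // inside [0.2, 0.6]: optimal
  }

  public static void main(String[] args) {
    for (double idle : new double[] {0.05, 0.2, 0.4, 0.6, 0.9}) {
      System.out.println(idle + " -> " + classify(idle));
    }
  }
}
```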

Conservative Cold Start Behavior

When processing rate is unavailable (cold start, new tasks):

Current task count: cost = 0.01 (allowed)
Any scaling: cost = +∞ (prohibited)

This prevents scaling decisions based on incomplete data.
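Below is a minimal sketch of the cold-start rule. It overrides the regular cost whenever the processing rate is missing, so any minimum-cost search can only keep the current task count. Modelling "unavailable" as NaN or a non-positive rate is an assumption. The toy cost function and all names are hypothetical.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch of the cold-start rule: without a processing rate,
 * keeping the current task count costs 0.01 and any change costs +infinity.
 */
final class ColdStartSketch {

  static final double CURRENT_COUNT_COST = 0.01;

  static boolean rateUnavailable(double avgProcessingRate) {
    // Assumption: a missing rate is represented as NaN or a non-positive value.
    return Double.isNaN(avgProcessingRate) || avgProcessingRate <= 0.0;
  }

  static double cost(int currentTasks, int proposedTasks, double avgProcessingRate,
                     IntToDoubleFunction regularCost) {
    if (rateUnavailable(avgProcessingRate)) {
      // Allow staying put (tiny cost); prohibit any scaling (infinite cost).
      return proposedTasks == currentTasks ? CURRENT_COUNT_COST : Double.POSITIVE_INFINITY;
    }
    return regularCost.applyAsDouble(proposedTasks);
  }

  public static void main(String[] args) {
    IntToDoubleFunction regular = n -> 100.0 / n + n; // toy cost, cheapest at 10 tasks
    int current = 4;
    for (double rate : new double[] {Double.NaN, 1_000.0}) {
      int best = current;
      double bestCost = cost(current, current, rate, regular);
      for (int n = 1; n <= 20; n++) {
        double c = cost(current, n, rate, regular);
        if (c < bestCost) {
          best = n;
          bestCost = c;
        }
      }
      // With a NaN rate the search stays at 4 tasks; with a known rate it moves to 10.
      System.out.println("rate=" + rate + " -> " + best + " tasks");
    }
  }
}
```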

Additionally, this PR reads the poll-idle-ratio-avg metric from the /rowStats task endpoint.

This PR has:

kfaraz merged 23 commits into apache:master from Fly-Style:new-autoscaler
