Introduce cost-based tasks autoscaler for streaming ingestion#18819
Merged
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cost-Based Autoscaler for Seekable Stream Supervisors
Overview
Implements a cost-based autoscaling algorithm for seekable stream supervisor tasks that optimizes task count by balancing lag reduction against resource efficiency.
Note: this patch doesn't support autoscaling (down) during task rollover. Temporarily, it scales down in the same manner as scales up.
Introduces
WeightedCostFunctionfor cost-based autoscaling decisions. The function computes a cost score (in seconds) for each candidate task count, balancing lag recovery time against idle resource waste.Key Design Decisions
Cost Formula
aggregateLag / (taskCount × avgProcessingRate)— time to clear backlogtaskCount × taskDuration × predictedIdleRatio— wasted compute timeIdle Prediction Model
Uses capacity-based linear scaling:
More tasks → more idle per task; fewer tasks → busier tasks.
Ideal Idle Range
Defines optimal utilization as idle ratio within [0.2, 0.6]:
Conservative Cold Start Behavior
When processing rate is unavailable (cold start, new tasks):
This prevents scaling decisions based on incomplete data.
Additionally, we add reading
poll-idle ratio-avgfrom/rowStatstask endpoint.This PR has: