[Swarming] Add backpressure mechanism to SwarmingService #5284
letitz
left a comment
One substantive comment about the queue size limit being per-OS, otherwise a bunch of small comments.
letitz
left a comment
LGTM % a couple remaining comments to avoid blocking on me further.
Co-authored-by: Titouan Rigoudy <titouan.rigoudy@gmail.com>
    return True

    count = response.count
    max_pending_tasks = get_max_size_for_queue(
Note for reviewer:
Since the Swarming queue is not a Pub/Sub queue, I extracted some of the logic from the data class into a public function so it can be reused here.
The alternative is to create a PubSubTaskQueue instance for the Swarming queue (I'm not against this!), but if I do so I would also like to rename the PubSubTaskQueue module, since it would be confusing to have an object instance that is not really a Pub/Sub queue.
That makes sense, and it's good enough for me to land this PR. I think ideally you would have something like:
    # task_queue.py
    class TaskQueueLimiter:
        default_limit: int
        feature_flag: FeatureFlags

        def max_pending_tasks(self) -> int:
            ...

    class PubSubTaskQueue:
        name: str
        limiter: TaskQueueLimiter

Since the module name here is still awkward. I don't think you should actually do it though, this is good enough and we have other things to do :)
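For illustration only, the shape sketched in this comment could be filled in roughly as follows. This is a minimal sketch, not code from the PR: `read_override` is a hypothetical stand-in for a feature-flag lookup, and the rule that a flag value overrides `default_limit` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TaskQueueLimiter:
    """Resolves the pending-task limit for one queue."""

    default_limit: int
    # Hypothetical stand-in for a feature-flag lookup: returns the
    # configured override, or None when no override is set.
    read_override: Callable[[], Optional[int]] = lambda: None

    def max_pending_tasks(self) -> int:
        # Assumption: an override, when present, replaces the default.
        override = self.read_override()
        return self.default_limit if override is None else override


@dataclass(frozen=True)
class PubSubTaskQueue:
    """A named queue paired with the limiter that bounds its backlog."""

    name: str
    limiter: TaskQueueLimiter
```

With this split, a queue that is not backed by Pub/Sub could reuse `TaskQueueLimiter` on its own, without needing a `PubSubTaskQueue` instance.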
letitz
left a comment
Thanks, we're nearly there!
letitz
left a comment
LGTM % tiny fixes to SwarmingApi.count_tasks().
Overview
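This PR implements a backpressure mechanism in `SwarmingService` to prevent overloading the Swarming pool with too many tasks.
Changes
- `MAX_PENDING_TASKS = 25` limits the number of pending tasks.
- `create_utask_main_jobs` checks the queue size before pushing each task, using Swarming's `CountTasks` RPC.
- If the `CountTasks` RPC call fails, we assume the queue is full and stop scheduling further tasks in the batch.
Tests performed
In this image we can see that the utask main scheduler picks tasks from both the `utask_main` queue and the new `utask_main-swarming` queue:
![image](https://private-user-images.githubusercontent.com/138602537/583806600-6cc6bad4-6a0a-42c8-90bc-1fe78ab30a07.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzgxNTA1ODQsIm5iZiI6MTc3ODE1MDI4NCwicGF0aCI6Ii8xMzg2MDI1MzcvNTgzODA2NjAwLTZjYzZiYWQ0LTZhMGEtNDJjOC05MGJjLTFmZTc4YWIzMGEwNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTA3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUwN1QxMDM4MDRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jOGZlMDY1NmU3NTE2Mjg5MTNhMDc1MGJkODE0YzllNzk3ZjU0MjI3NmViYzc3ZDJkMWRhNzBlOTBmODk4YTU1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9aW1hZ2UlMkZwbmcifQ.Ow2XP5xYhAs1mYGAyNNytDyrUhrk9m5sNfMH5iyavAc)
Here are more logs showing that non-Swarming tasks are still being scheduled: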

Here are some logs of the scheduler pushing tasks into swarming:
