Cluster.scale is not robust to multiple calls #2257

Description

@guillaumeeb

As experienced in dask/dask-jobqueue#112 and a related PR dask/dask-jobqueue#97, Cluster.scale behavior is unstable if called multiple times in a row.

I suspect part of this problem is due to how asynchronism is used here:

If we want scale to run asynchronously, I propose adding a _scale() method here (a coroutine?) to be called asynchronously from scale(). In _scale(), we would read the state and apply the modifications in the same step:

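def _scale(self, n):
    with log_errors():
        # Read the current worker count and act on it in the same step.
        if n >= len(self.scheduler.workers):
            # At or below the target: bring the total up to n.
            self.scale_up(n)
        else:
            # Above the target: pick the surplus workers and retire them.
            to_close = self.scheduler.workers_to_close(
                n=len(self.scheduler.workers) - n)
            logger.debug("Closing workers: %s", to_close)
            self.scheduler.retire_workers(workers=to_close)
            self.scale_down(to_close)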
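
To illustrate why reading the state and modifying it in the same step helps, here is a minimal, self-contained sketch using plain asyncio. It does not use the distributed API: `ToyCluster`, its `workers` list and `demo()` are hypothetical names made up for this example, and the only assumption is a single-threaded event loop. Because `_scale()` never awaits between reading the worker count and changing it, repeated calls cannot act on a stale count.

```python
import asyncio


class ToyCluster:
    """Hypothetical stand-in for a cluster; not the distributed API."""

    def __init__(self):
        self.workers = []   # names of the workers currently running
        self._next_id = 0   # counter used to name new workers

    async def _scale(self, n):
        # No await between reading the worker count and changing it, so on a
        # single-threaded event loop no other _scale() call can interleave.
        current = len(self.workers)
        if n >= current:
            # Start only the missing workers.
            for _ in range(n - current):
                self.workers.append(f"worker-{self._next_id}")
                self._next_id += 1
        else:
            # Drop the surplus workers from the end of the list.
            del self.workers[n:]


async def demo():
    cluster = ToyCluster()
    # Three identical scale requests issued back to back.
    await asyncio.gather(
        cluster._scale(3), cluster._scale(3), cluster._scale(3)
    )
    after_up = list(cluster.workers)
    await cluster._scale(1)
    return after_up, list(cluster.workers)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy version, three identical requests end with exactly three workers, and a later request for one worker leaves exactly one. A real cluster also has workers that are still starting up, which this sketch does not model.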

@jhamman @mrocklin any opinion, advice?
