Adaptive stops when workers preempted

**What happened**:
I'm using a custom dask provider to run a distributed dask cluster with on-prem compute. These compute nodes can be preempted. I've switched on autoscaling with min workers 0 and max workers 500. Whenever I persist DF (200GB+) onto the cluster, and when workers are preempted, this is observed in the logs:

```
distributed.deploy.adaptive_core - INFO - Adaptive stop
```
Looking at [pull 4422](https://github.com/dask/distributed/pull/4422), I noticed this was made more verbose. I made that change locally, and was seeing this in the log:

```
distributed.deploy.adaptive_core - ERROR - Adaptive stopping due to error in <closed TCP>: Stream is closed: while trying to call remote method 'retire_workers'
```

When this happens, the cluster is unable to scale down to the minimum number of workers even after the persisted DF is deleted.

**What you expected to happen**:
Adaptive to be resilient to preemption.

**Minimal Complete Verifiable Example**:
You can repro this by simply generating 200+GB worth of random timeseries, and persisting this to a dask cluster. Kill a few workers - to simulate preemption - once the DF is about 90% persisted (from the Dask scheduler's dashboard status tab).

**Anything else we need to know?**:
Are there good reasons for calling stop on adaptive when this happens? I've tried removing the call to `.stop()` locally, and the cluster seem to run fine afterwards.

I am also aware of [issue 4421](4421). The solution was for the workers to poll a metaurl from public cloud providers for an early warning signal on preemption before shutting down gracefull. However, that is not an option in my case. 

**Environment**:
- Dask version: 2.30.0
- Distributed version: 2.30.0
- Python version: 3.7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Adaptive stops when workers preempted #4591

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Uh oh!

Adaptive stops when workers preempted #4591

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions