Adaptive.needs_cpu does not depend on number of tasks remaining #2329

## Issue description

We're using distributed (with `KubeCluster`) and `client.map` to schedule a large number of long-running tasks (right now we're running a Fortran-based hydrological model).

We noticed that once the number of remaining tasks falls below the number of workers, the cluster does not scale down until _all_ tasks have completed.

I isolated the problem to `Adaptive.needs_cpu()`. [The current method](https://github.com/dask/distributed/blob/master/distributed/deploy/adaptive.py#L119) does not check whether there are any pending tasks on the scheduler:

```python
    def needs_cpu(self):
        """
        Check if the cluster is CPU constrained (too many tasks per core)
        Notes
        -----
        Returns ``True`` if the occupancy per core is some factor larger
        than ``startup_cost``.
        """
        total_occupancy = self.scheduler.total_occupancy
        total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])

        if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
            logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                        total_occupancy, total_cores)
            return True
        else:
            return False
```

This results in `adapt.recommendations()` returning the error message `Trying to scale up and down simultaneously` whenever there are fewer pending tasks than there are workers, as long as the average task time suggests that more cores are needed (independent of the number of pending tasks).
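To illustrate the conflict, here is a toy model of the decision, not the actual `distributed` code. `FakeWorker`, `idle_workers`, and the simplified `recommendations` function are hypothetical stand-ins, and I'm assuming for the sketch that any worker with nothing assigned is a candidate for closing:

```python
from dataclasses import dataclass, field


@dataclass
class FakeWorker:
    """Hypothetical stand-in for the scheduler's per-worker state."""
    ncores: int = 1
    occupancy: float = 0.0                        # seconds of assigned work
    processing: set = field(default_factory=set)  # keys currently assigned


def needs_cpu(workers, startup_cost=1.0):
    # Same test as the current method: occupancy per core vs. startup cost.
    total_occupancy = sum(w.occupancy for w in workers)
    total_cores = sum(w.ncores for w in workers)
    return total_occupancy / (total_cores + 1e-9) > startup_cost * 2


def idle_workers(workers):
    # Assumption for this sketch: a worker with nothing assigned is closable.
    return [w for w in workers if not w.processing]


def recommendations(workers, startup_cost=1.0):
    scale_up = needs_cpu(workers, startup_cost)
    to_close = idle_workers(workers)
    if scale_up and to_close:
        return {"status": "error",
                "msg": "Trying to scale up and down simultaneously"}
    if scale_up:
        return {"status": "up"}
    if to_close:
        return {"status": "down", "n": len(to_close)}
    return {"status": "same"}


# Ten single-core workers, three long tasks left, seven workers idle.
workers = [FakeWorker(occupancy=30.0, processing={f"task-{i}"}) for i in range(3)]
workers += [FakeWorker() for _ in range(7)]
result = recommendations(workers)
print(result)
```

In this model the three remaining tasks carry enough occupancy to trip the CPU check, while the seven idle workers are flagged for removal at the same time, so neither action is taken.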

## Proposed solution

I implemented a quick fix, by finding the total number of pending tasks and only recommending a "scale up" if the number of tasks exceeds the number of existing workers, in addition to the current criteria:

```python
    def needs_cpu(self):
        """
        Check if the cluster is CPU constrained (too many tasks per core)
        Notes
        -----
        Returns ``True`` if the occupancy per core is some factor larger
        than ``startup_cost`` and more tasks are processing than there
        are workers.
        """
        total_occupancy = self.scheduler.total_occupancy
        total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])

        if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
            logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                        total_occupancy, total_cores)

            tasks_processing = sum((len(w.processing) for w in self.scheduler.workers.values()))
            num_workers = len(self.scheduler.workers)

            if tasks_processing > num_workers:
                logger.info("pending tasks exceed number of workers [%d tasks / %d workers]",
                            tasks_processing, num_workers)
                return True

        return False
```
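The extra condition can be checked without a running cluster. The sketch below is only an approximation: `make_workers` and `needs_cpu_gated` are hypothetical helpers, and the workers are plain `SimpleNamespace` objects rather than real scheduler state.

```python
from types import SimpleNamespace


def make_workers(n_workers, n_tasks, seconds_per_task=30.0):
    """Build hypothetical worker stand-ins with tasks spread round-robin."""
    workers = [SimpleNamespace(ncores=1, processing=set())
               for _ in range(n_workers)]
    for i in range(n_tasks):
        workers[i % n_workers].processing.add(f"task-{i}")
    for w in workers:
        w.occupancy = seconds_per_task * len(w.processing)
    return workers


def needs_cpu_gated(workers, startup_cost=1.0):
    total_occupancy = sum(w.occupancy for w in workers)
    total_cores = sum(w.ncores for w in workers)
    if total_occupancy / (total_cores + 1e-9) <= startup_cost * 2:
        return False
    # Extra condition from the proposal: more assigned tasks than workers.
    tasks_processing = sum(len(w.processing) for w in workers)
    return tasks_processing > len(workers)


tail = needs_cpu_gated(make_workers(n_workers=10, n_tasks=3))   # end of a run
busy = needs_cpu_gated(make_workers(n_workers=10, n_tasks=25))  # backlog
print(tail, busy)
```

With three tasks left on ten workers the occupancy test alone would still pass, but the task-count condition blocks the scale-up; with 25 tasks both conditions hold.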

### Pros
* Exhibits the desired behavior (we're using this fix now by subclassing `KubeCluster`)

### Cons
* May be a limited use case
* Increases the overhead of `needs_cpu`. In limited tests with between 800 and 100,000 tasks, the current implementation usually took about 30-40 µs, and the proposed implementation roughly doubled that. There may be faster ways to do this, and I imagine the added cost could be a critical problem with this implementation, so help estimating the number of remaining tasks more quickly would be appreciated!
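
For anyone who wants to compare the two passes without a cluster, a rough micro-benchmark along these lines may help. The worker objects are hypothetical stand-ins (100 workers with 1,000 assigned keys each), so absolute numbers will not match a real scheduler:

```python
import timeit
from types import SimpleNamespace

# Hypothetical stand-ins: 100 workers, 1,000 assigned task keys each.
workers = {
    f"worker-{i}": SimpleNamespace(ncores=4, processing=set(range(1000)))
    for i in range(100)
}


def count_cores():
    # The per-worker pass the current method already makes.
    return sum(ws.ncores for ws in workers.values())


def count_cores_and_tasks():
    # The same pass plus the extra one added by the proposal.
    cores = sum(ws.ncores for ws in workers.values())
    tasks = sum(len(ws.processing) for ws in workers.values())
    return cores, tasks


n = 2000
for fn in (count_cores, count_cores_and_tasks):
    per_call = timeit.timeit(fn, number=n) / n
    print(f"{fn.__name__}: {per_call * 1e6:.1f} us per call")
```

Since `len()` on a built-in container is constant time, the extra pass in this sketch scales with the number of workers rather than the number of tasks, which would be consistent with the overhead roughly doubling.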

## Testable example

Requires some interactivity, but reliably reproduces the problem:

```python
In [1]: import dask.distributed as dd

In [2]: cluster = dd.LocalCluster()

In [3]: adaptive = cluster.adapt(minimum=0, maximum=10)

In [5]: adaptive
Out[5]: <distributed.deploy.adaptive.Adaptive at 0x1153b3668>

In [6]: def wait_a_while(i):
   ...:     import time
   ...:     import random
   ...:     s = (random.random()) ** 6 * 60
   ...:     time.sleep(s)
   ...:
   ...:     return s

In [8]: client = dd.Client(cluster)

In [9]: f = client.map(wait_a_while, range(10))

In [10]: # wait for most futures to finish

In [17]: f
Out[17]:
[<Future: status: finished, type: float, key: wait_a_while-fdc644303e9be2c85edd9201261409af>,
 <Future: status: finished, type: float, key: wait_a_while-97098da3920c7582be062b54ee78efe1>,
 <Future: status: finished, type: float, key: wait_a_while-630e0e1fb8a0f8ede1140368de97ffce>,
 <Future: status: pending, key: wait_a_while-09f09368b6e9555668ab3f82efad91dd>,
 <Future: status: finished, type: float, key: wait_a_while-65d1c81d072269ab477d806d017302e2>,
 <Future: status: finished, type: float, key: wait_a_while-ca96a3b8db585962fc8638066458a815>,
 <Future: status: finished, type: float, key: wait_a_while-0a13c1a4f503a08e1edaf79dba3c94c5>,
 <Future: status: finished, type: float, key: wait_a_while-549f788086c75f350390b4a6131ae6cb>,
 <Future: status: pending, key: wait_a_while-17133623fc213adcb83f3b45e53839c9>,
 <Future: status: pending, key: wait_a_while-41e284f91b0a2bb1c3a33394e51c97fc>]

In [18]: cluster._adaptive.recommendations()
Out[18]: {'status': 'error', 'msg': 'Trying to scale up and down simultaneously'}
```
