Flaky test_adapt_then_manual: race condition in SpecCluster #7079

Description

@crusaderky

test_adapt_then_manual is mildly flaky. It looks like a race condition in the tested code.
https://github.com/dask/distributed/actions/runs/3143480989/jobs/5108282616

There are two separate tracebacks in the failed test.

2022-09-28 13:18:40,941 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x00000284829BC0A0>>, <Task finished name='Task-38921' coro=<SpecCluster._correct_state_internal() done, defined at d:\a\distributed\distributed\distributed\deploy\spec.py:330> exception=KeyError(2)>)
Traceback (most recent call last):
  File "C:\Miniconda3\envs\dask-distributed\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback
    ret = callback()
  File "C:\Miniconda3\envs\dask-distributed\lib\site-packages\tornado\ioloop.py", line 765, in _discard_future_result
    future.result()
  File "d:\a\distributed\distributed\distributed\deploy\spec.py", line 351, in _correct_state_internal
    d = self.worker_spec[name]
KeyError: 2

distributed\deploy\spec.py:437: AssertionError

    async def _close(self):
        [...]
            for w in self._created:
>               assert w.status in {
                    Status.closing,
                    Status.closed,
                    Status.failed,
                }, w.status
E               AssertionError: Status.init

Labels: flaky test (Intermittent failures on CI.)
