
Scheduler crashes in SSHCluster in 2023.3.2 but not in 2023.3.1 #7724

Description

@jabbera

Describe the issue: SSHCluster fails to start in 2023.3.2 because the scheduler process exits early with exit code 1:

INFO:distributed.deploy.ssh:2023-03-29 18:21:07,199 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2023-03-29 18:21:07,204 - distributed.deploy.ssh - INFO - 2023-03-29 18:21:07,204 - distributed.scheduler - INFO - State start
INFO:distributed.deploy.ssh:2023-03-29 18:21:07,204 - distributed.scheduler - INFO - State start
2023-03-29 18:21:07,207 - distributed.deploy.ssh - INFO - 2023-03-29 18:21:07,206 - distributed.scheduler - DEBUG - Clear task state
INFO:distributed.deploy.ssh:2023-03-29 18:21:07,206 - distributed.scheduler - DEBUG - Clear task state
2023-03-29 18:21:07,207 - distributed.deploy.ssh - INFO - 2023-03-29 18:21:07,207 - distributed.scheduler - INFO -   Scheduler at:   tcp://10.15.40.68:36143
INFO:distributed.deploy.ssh:2023-03-29 18:21:07,207 - distributed.scheduler - INFO -   Scheduler at:   tcp://10.15.40.68:36143
INFO:asyncssh:[conn=0, chan=1] Received exit status 1
INFO:asyncssh:[conn=0, chan=1] Received channel close
INFO:asyncssh:[conn=0, chan=1] Channel closed
INFO:asyncssh:[conn=0, chan=1] Sending KILL signal

When rolling back to 2023.3.1, the scheduler starts successfully:

2023-03-29 18:23:33,874 - distributed.deploy.ssh - INFO - 2023-03-29 18:23:33,873 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
INFO:distributed.deploy.ssh:2023-03-29 18:23:33,873 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2023-03-29 18:23:33,878 - distributed.deploy.ssh - INFO - 2023-03-29 18:23:33,878 - distributed.scheduler - INFO - State start
INFO:distributed.deploy.ssh:2023-03-29 18:23:33,878 - distributed.scheduler - INFO - State start
2023-03-29 18:23:33,882 - distributed.deploy.ssh - INFO - 2023-03-29 18:23:33,881 - distributed.scheduler - DEBUG - Clear task state
INFO:distributed.deploy.ssh:2023-03-29 18:23:33,881 - distributed.scheduler - DEBUG - Clear task state
2023-03-29 18:23:33,883 - distributed.deploy.ssh - INFO - 2023-03-29 18:23:33,882 - distributed.scheduler - INFO -   Scheduler at:   tcp://10.15.40.68:40305
INFO:distributed.deploy.ssh:2023-03-29 18:23:33,882 - distributed.scheduler - INFO -   Scheduler at:   tcp://10.15.40.68:40305
INFO:asyncssh:Opening SSH connection to localhost, port 22
INFO:asyncssh:[conn=1] Connected to SSH server at localhost, port 22

Minimal Complete Verifiable Example:

import logging
logging.basicConfig(level=logging.DEBUG)

from distributed import SSHCluster
cluster = SSHCluster(["localhost", "localhost"])

Anything else we need to know?: Full repro here:

git clone https://github.com/jabbera/distributed-bug.git
cd distributed-bug
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements-bug.txt
python demo.py

Environment:

  • Dask version: 2023.3.2
  • Python version: 3.10.5
  • Operating System: Ubuntu 20.04.5 LTS
  • Install method (conda, pip, source): pip
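
Since the behavior differs between two adjacent releases, it may help to confirm which versions are actually installed in the environment that runs the scheduler. The following is a minimal sketch, not part of the original report: the helper name collect_versions and the default package list are assumptions chosen for illustration, and it only uses the standard library.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("dask", "distributed", "asyncssh")):
    """Return the Python version plus the installed version of each package.

    The package list is an illustrative assumption; adjust it as needed.
    """
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
```

Because SSHCluster launches the scheduler over SSH, the interpreter on the target host is not necessarily the local one, so running this on each host (here, localhost) is probably the safer check.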
