Starting workers only partially works for me on a SLURMCluster: SLURM shows the jobs as running, but client.scheduler_info() reports no workers. As a result, I cannot stop or manage the workers, or run task graphs, through the Dask API.
In [1]: from dask_jobqueue import SLURMCluster
In [2]: cluster = SLURMCluster(job_cpu=2,memory='50MB',cores=1,interface='enp0s8')
In [3]: cluster
Out [3]: SLURMCluster(cores=0, memory=0 B, workers=0/0, jobs=0/0)
In [4]: from dask.distributed import Client
In [6]: client = Client(cluster)
In [7]: client
Out [7]: <Client: scheduler='tcp://10.0.1.8:35619' processes=0 cores=0>
In [7]: workers = cluster.start_workers(2)
In [8]: workers
Out[9]: # NO OUTPUT OF WORKERS instance
In [9]: client.scheduler_info()
Out[9]:
{'type': 'Scheduler',
'id': 'Scheduler-b04cb227-caec-44fe-9e0d-9180908c1b6e',
'address': 'tcp://10.0.1.8:35619',
'services': {'bokeh': 8787},
'workers': {}}
Versions: dask.version is 1.1.2 and the tornado version is 5.1.1.
From SLURM's perspective, the dask workers are running correctly on the HPC cluster:
[admin@master ~]$ qstat
Job id Name Username Time Use S Queue
------------------- ---------------- --------------- -------- - ---------------
166 dask-worker root 00:00:00 R VM-CPU-Node
167 dask-worker root 00:00:00 R VM-CPU-Node
[admin@master ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
167 VM-CPU-No dask-wor root R 0:01 1 worker3
166 VM-CPU-No dask-wor root R 0:01 1 worker2
I'm not sure whether I'm causing this problem or something is going wrong with Dask.
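To check whether the workers are just slow to register or never connect at all, one option is to poll the scheduler for a while after submitting the jobs. The helper below is only a sketch: the name wait_for_registered_workers and its parameters are hypothetical, not part of Dask. It assumes it is given a callable such as client.scheduler_info that returns a dict with a 'workers' key, as in the output above.

```python
import time


def wait_for_registered_workers(get_info, expected, timeout=60.0, interval=1.0):
    """Poll scheduler info until `expected` workers have registered.

    `get_info` is any zero-argument callable returning a dict with a
    'workers' key (for example client.scheduler_info). This is a
    hypothetical helper, not a Dask API.
    """
    deadline = time.monotonic() + timeout
    while True:
        workers = get_info().get("workers", {})
        if len(workers) >= expected:
            return workers
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "only %d of %d workers registered with the scheduler"
                % (len(workers), expected)
            )
        time.sleep(interval)


# Possible usage in the session above (requires the live cluster and client):
# cluster.start_workers(2)
# workers = wait_for_registered_workers(client.scheduler_info, expected=2)
# print(cluster.job_script())  # inspect the job script that was submitted to SLURM
```

If this times out while squeue still shows the jobs in state R, the worker processes are running but are not reaching the scheduler at tcp://10.0.1.8:35619, and the SLURM job logs on the worker nodes would be the next place to look.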