I have a very similar issue with #246. After I create the cluster and scale it the nodes are running but no CPUs are available.
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster()
DEBUG:Using selector: EpollSelector
DEBUG:Using selector: EpollSelector
DEBUG:Starting worker: 1
DEBUG:writing job script:
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -p compute
#SBATCH -A pr008033
#SBATCH -n 1
#SBATCH --cpus-per-task=20
#SBATCH --mem=45G
#SBATCH -t 00:30:00
#SBATCH -o myjob.%j.out
#SBATCH -e myjob.%j.err
JOB_ID=${SLURM_JOB_ID%;*}
/users/pr008/...
tcp://195.251.23.79:33491 --nthreads 20 --memory-limit 48.00GB --name 1 --nanny --death-timeout 60 --interface ib0
DEBUG:Executing the following command to command line
sbatch /tmp/tmpoals04ok.sh
DEBUG:Starting job: 804606
DEBUG:Starting worker: 0
DEBUG:writing job script:
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -p compute
#SBATCH -A pr008033
#SBATCH -n 1
#SBATCH --cpus-per-task=20
#SBATCH --mem=45G
#SBATCH -t 00:30:00
#SBATCH -o myjob.%j.out
#SBATCH -e myjob.%j.err
JOB_ID=${SLURM_JOB_ID%;*}
/users/pr008/...
tcp://195.251.23.79:33491 --nthreads 20 --memory-limit 48.00GB --name 0 --nanny --death-timeout 60 --interface ib0
DEBUG:Executing the following command to command line
sbatch /tmp/tmpsuq3ysbm.sh
DEBUG:Starting job: 804607
from dask.distributed import Client
client = Client(cluster)
client
{'type': 'Scheduler',
'id': 'Scheduler-5198da07-3700-42aa-a068-036146073ef6',
'address': 'tcp://195.251.23.79:33491',
'services': {'dashboard': 8787},
'workers': {}}
When I run squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
804606 compute dask-wor ... R 6:04 1 node287
804607 compute dask-wor ... R 6:04 1 node003
The HPC has infiniband and each node has 20 CPUs. If you need any other info please tell me.
Thank you
I have a very similar issue with #246. After I create the cluster and scale it the nodes are running but no CPUs are available.
When I run squeue
The HPC has infiniband and each node has 20 CPUs. If you need any other info please tell me.
Thank you