Interface not loaded from config file #358

Description

@griverat

Hello!
I've been using dask-jobqueue for quite some time without touching the default configuration file located in ~/.config/dask. Recently, I decided to edit the config file jobqueue.yaml to avoid copy/pasting the same Jupyter cell across notebooks and just call

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()

with this configuration

jobqueue:
  slurm:
    name: dask-worker
    cores: 12
    memory: 60GB 
    processes: 1
    interface: 'ib0' 
    local-directory: /home/grivera/scratch
    queue: mpi_short2
    walltime: '01:00:00'
    log-directory: /home/grivera/slurm_logs

instead of

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="mpi_short2",
    cores=12,
    memory="60GB",
    processes=1,
    interface="ib0",
    dashboard_address=":6767",
)

When using the config file, my workers don't seem to connect at all and are eventually killed by a timeout. To debug this, I printed cluster.job_script() for both cases and found that the IP address in the worker command differs, even though I'm using the same values in both cases.

output of job_script when using kwargs
#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -e /home/grivera/slurm_logs/dask-worker-%J.err
#SBATCH -o /home/grivera/slurm_logs/dask-worker-%J.out
#SBATCH -p mpi_short2
#SBATCH -n 1
#SBATCH --cpus-per-task=12
#SBATCH --mem=56G
#SBATCH -t 01:00:00

JOB_ID=${SLURM_JOB_ID%;*}

/home/grivera/miniconda3/envs/Work/bin/python -m distributed.cli.dask_worker tcp://192.168.0.15:33289 --nthreads 12 --memory-limit 60.00GB --name name --nanny --death-timeout 60 --local-directory /home/grivera/scratch --interface ib0

output of job_script when using the yaml file
#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -e /home/grivera/slurm_logs/dask-worker-%J.err
#SBATCH -o /home/grivera/slurm_logs/dask-worker-%J.out
#SBATCH -p mpi_short2
#SBATCH -n 1
#SBATCH --cpus-per-task=12
#SBATCH --mem=56G
#SBATCH -t 01:00:00

JOB_ID=${SLURM_JOB_ID%;*}

/home/grivera/miniconda3/envs/Work/bin/python -m distributed.cli.dask_worker tcp://127.0.0.1:43852 --nthreads 12 --memory-limit 60.00GB --name name --nanny --death-timeout 60 --local-directory /home/grivera/scratch --interface ib0

If I omit the interface kwarg, the result is the same in both cases (workers can't connect and the IP is 127.0.0.1), so I think this parameter is not being loaded from the config file.
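
For anyone who wants to check this without submitting jobs, here is a minimal sketch that pulls the address out of a job script string and reports whether it is a loopback address. The helper names (`scheduler_host`, `is_loopback`) and the regular expression are my own, not part of dask-jobqueue, and the two command lines are abbreviated from the job scripts above.

```python
import ipaddress
import re

# Matches the tcp:// address passed to dask_worker, e.g. tcp://192.168.0.15:33289.
# Assumption: the address is a dotted IPv4 address, as in both job scripts above.
SCHEDULER_RE = re.compile(r"tcp://(?P<host>[0-9.]+):(?P<port>\d+)")


def scheduler_host(job_script: str) -> str:
    """Return the host part of the tcp:// address found in a job script."""
    match = SCHEDULER_RE.search(job_script)
    if match is None:
        raise ValueError("no tcp:// address found in job script")
    return match.group("host")


def is_loopback(job_script: str) -> bool:
    """True when the address is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(scheduler_host(job_script)).is_loopback


# Worker command lines abbreviated from the two job scripts above.
kwargs_script = (
    "python -m distributed.cli.dask_worker tcp://192.168.0.15:33289 "
    "--nthreads 12 --interface ib0"
)
yaml_script = (
    "python -m distributed.cli.dask_worker tcp://127.0.0.1:43852 "
    "--nthreads 12 --interface ib0"
)

print(is_loopback(kwargs_script))  # False
print(is_loopback(yaml_script))    # True
```

With a live cluster, the same check should work by passing the string returned by cluster.job_script() instead of the abbreviated command lines.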
