
Modification to run on Summit and support login / batch / compute architecture#467

Closed
dustinvanstee wants to merge 1 commit into dask:master from dustinvanstee:master


@dustinvanstee


I am a user on Summit, which uses LSF for job submission. Summit has a login / batch / compute node architecture: any job submitted via LSF must run its command through the jsrun wrapper, as shown in the sample job script below. Otherwise the job stays on the batch node and never reaches a compute node.

```bash
#!/usr/bin/env bash

#BSUB -J dask-worker
#BSUB -P xxx201
#BSUB -W 00:30
#BSUB -nnodes 1

jsrun -n1 -a1 -g0 -c1 /ccs/home/vanstee/.conda/envs/powerai-ornl/bin/python -m distributed.cli.dask_worker tcp://10.41.0.33:40063 --nthreads 8 --memory-limit 4.00GB --name dummy-name --nanny --death-timeout 60 --interface ib0 --protocol tcp://
```

I modified core.py to achieve this, and would like to see whether some version of this idea could make it into the dask_jobqueue library. Thanks!
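A minimal sketch of the idea, assuming the job script is assembled from a list of `#BSUB` directives followed by a single worker command line: the change amounts to placing a launcher prefix in front of that command. `render_job_script` and its parameters are hypothetical names for illustration, not dask-jobqueue's API or the actual core.py change.

```python
def render_job_script(directives, worker_command, prefix=""):
    """Assemble an LSF job script (hypothetical helper, for illustration only).

    directives: BSUB options without the '#BSUB ' marker, e.g. "-W 00:30".
    worker_command: the command line that starts the dask worker.
    prefix: optional launcher (e.g. jsrun) placed in front of the command.
    """
    lines = ["#!/usr/bin/env bash", ""]
    lines += [f"#BSUB {d}" for d in directives]
    lines.append("")
    # Prepending the launcher is what moves the worker from the batch node
    # (where the script itself runs) onto a compute node.
    command = f"{prefix} {worker_command}" if prefix else worker_command
    lines.append(command)
    return "\n".join(lines) + "\n"


script = render_job_script(
    ["-J dask-worker", "-P xxx201", "-W 00:30", "-nnodes 1"],
    "python -m distributed.cli.dask_worker tcp://10.41.0.33:40063 --nthreads 8",
    prefix="jsrun -n1 -a1 -g0 -c1",
)
print(script)
```

With an empty `prefix` the command is written unchanged, so clusters that do not need a launcher are unaffected.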

@dustinvanstee

Author

Here is a sample of how I create an LSFCluster using this idea:

```python
from dask_jobqueue import LSFCluster

cluster = LSFCluster(
    scheduler_options={"dashboard_address": ":3762"},
    cores=8,
    processes=1,
    memory="4 GB",
    project="xxx201",
    walltime="00:30",
    job_extra=["-nnodes 1"],  # <--- new!
    header_skip=["-R", "-n ", "-M"],  # <--- new!
    interface="ib0",
    use_stdin=False,
    dask_worker_prefix="jsrun -n1 -a1 -g0 -c1",
)
```
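The `header_skip` list relies on substring matching: dask-jobqueue documents it as removing header lines that contain any of the given strings. Assuming that filter also sees the lines added through `job_extra`, this is why `"-n "` carries a trailing space: a bare `"-n"` would also drop `-nnodes 1`. A small sketch of that filter; the helper name and the header lines below are illustrative, not the exact output of LSFCluster.

```python
def skip_header_lines(header_lines, header_skip):
    """Drop every header line containing any of the header_skip substrings."""
    return [line for line in header_lines if not any(s in line for s in header_skip)]


# Illustrative header lines (assumed, not copied from LSFCluster's output).
header = [
    "#BSUB -J dask-worker",
    "#BSUB -P xxx201",
    "#BSUB -n 8",
    '#BSUB -R "span[hosts=1]"',
    "#BSUB -M 4000",
    "#BSUB -W 00:30",
    "#BSUB -nnodes 1",  # the directive added via job_extra
]

kept = skip_header_lines(header, ["-R", "-n ", "-M"])
for line in kept:
    print(line)
```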

@lesteve

lesteve commented Oct 1, 2020

Member

Thanks a lot for your PR!

This is certainly a bit of a hack, but I think what you want can currently be achieved by doing:

```python
import sys
from dask_jobqueue import LSFCluster

dask_worker_prefix = "jsrun -n1 -a1 -g0 -c1"
cluster = LSFCluster(
    ...,  # your other LSFCluster arguments
    python=f"{dask_worker_prefix} {sys.executable}")
```
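The workaround assumes the `python` value is written verbatim at the start of the worker command line, which matches the sample job script in the PR description. Under that assumption, folding the prefix into `python` produces the same line the proposed `dask_worker_prefix` option would. A small sketch (the worker arguments are abbreviated from the sample script):

```python
import shlex
import sys

prefix = "jsrun -n1 -a1 -g0 -c1"
worker_args = "-m distributed.cli.dask_worker tcp://10.41.0.33:40063 --nthreads 8"

# What the PR's dask_worker_prefix option would produce: prefix + python + args.
with_prefix_option = f"{prefix} {sys.executable} {worker_args}"

# What the workaround produces: the prefix travels inside the `python` value.
python = f"{prefix} {sys.executable}"
with_python_override = f"{python} {worker_args}"

print(with_python_override)
print(with_prefix_option == with_python_override)
```

The shell then sees `jsrun` as the program to run and everything after its options as the command it launches on the compute node.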

For a cleaner long-term solution, using Jinja templates (and allowing users to tweak the Jinja template) seems a good way forward. Unfortunately, I don't think I will find time to look at #370 any time soon...
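To illustrate the template-based direction, here is a minimal sketch that uses the standard library's `string.Template` as a dependency-free stand-in for Jinja. The template text, the placeholder names, and the Python path are hypothetical; they are not taken from dask-jobqueue or from #370.

```python
from string import Template

# Hypothetical user-editable job script template; $-placeholders are filled in
# when the job is generated, so the launcher can go wherever the site needs it.
JOB_TEMPLATE = Template(
    "#!/usr/bin/env bash\n"
    "\n"
    "$directives\n"
    "\n"
    "$launcher $python -m distributed.cli.dask_worker $scheduler $worker_options\n"
)

script = JOB_TEMPLATE.substitute(
    directives="\n".join(f"#BSUB {d}" for d in ["-J dask-worker", "-W 00:30", "-nnodes 1"]),
    launcher="jsrun -n1 -a1 -g0 -c1",
    python="/usr/bin/python3",  # hypothetical interpreter path
    scheduler="tcp://10.41.0.33:40063",
    worker_options="--nthreads 8 --nanny",
)
print(script)
```

With a template like this, a site such as Summit would only edit the command line of the template instead of needing a dedicated option like `dask_worker_prefix`.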

@dustinvanstee

Author

@lesteve Thanks for your feedback. This does exactly what I need, much appreciated! Closing the PR.
