Add configuration file#48
Conversation
|
I'm generally a fan of this approach. Without reading all of the dask PR, do I understand correctly that these config options could go in the |
|
They could. They could also go into a special |
|
This looks good! Adding a way to set standard configuration without breaking current use. How do you deploy the default jobqueue.yml file if it is not present? |
|
Happy to answer. However, if people have the time to read the proposed
configuration.rst file in https://github.com/dask/dask/pull/3432/files and
then ask questions that still remain that would be very helpful in
identifying problems with the proposed documentation
…On Mon, Apr 30, 2018 at 5:07 PM, Guillaume EB ***@***.***> wrote:
This looks good! Adding a way to set standard configuration without
breaking current use.
It is maybe too open on the file names, but I guess it is simpler this
way, in charge of admin or users not to mess with same parameter in
different files.
How do you deploy the default jobqueue.yml file if it is not present?
How do you deal with incomplete yml file? config parameters are overloaded
with the default one?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#48 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AASszPh-hflPNjqz3POIJJsTJdfoEMnrks5tt30rgaJpZM4Tsyl4>
.
|
|
Hmm, I probably missed something, but I still don't see where the dask_jobqueue/jobqueue.yaml is used or copied by the dask.config submodule. |
Right, sorry, I missed the config.py file. This should be more clear now. There is also now documentation for downstream libraries in https://github.com/mrocklin/dask/blob/c40d27a6cb606bbee6200a2bf6eba1869816398d/docs/source/configuration.rst So, the dask config PR will probably go in sometime tomorrow. If we merge this PR as-is it will only work with the dask master version. I think that we have a few options:
I'm somewhat inclined towards option 3, and averse to option 2, happy to go either way though. |
|
OK, it looks like dask-jobqueue was released quite recently. Almost no new features have been added since @jhamman released two weeks ago. I'm going to go ahead and merge this once upstream libraries are ready and we pass tests here if there are no objections. |
de0ffed to
8a09d49
Compare
guillaumeeb
left a comment
There was a problem hiding this comment.
I'm fine with option 3 or 1. Do you plan to add something about dask master branch dependency in the documentation?
Would this be a general rule for dask related modules: master depend on master?
We try to provide some leeway when possible, but it's common to have master-dependence when we're about to make 0.X.0 version changes. |
|
SGE tests are failing in a way that I don't currently understand. Trying to reproduce it locally but running into #53 |
|
Issues resolved. This now works for me. Critical feedback on this is welcome. |
|
Live testing is also quite welcome. I haven't actually used this on a live cluster. It would be good to get usability feedback from anyone who uses dask-jobqueue semi-regularly. |
|
Just curious (I don't know very much about cluster admin), what would the recommended approach for cluster admins? Would it be to drop the same yaml file in |
|
Yes, I think that you would want to drop the subset of the configuration that made sense for all users into # /etc/dask
jobqueue:
name: dask-worker
threads: 8
processes: 4
memory: 192GB # should we change this to be for the full job?
interface: ib0
local-directory: /local/scratch/$USER # should we support environment variables?
queue: standard
sge:
resource-spec: ...Then they can leave the other parts for users, or users can overwrite elements if they wish (user directories are given more precedence). The # ~/.config/dask/jobqueue.yaml
jobqueue:
queue: priority
project: PB382938 |
jhamman
left a comment
There was a problem hiding this comment.
This all looks good to me. I have not tried this out yet and wont be able to for a few days so if others are pleased with how it works, I'm fine to see it merged.
|
I'm happy to wait until people have a chance to try things out.
…On Tue, May 8, 2018 at 6:55 PM, Joe Hamman ***@***.***> wrote:
***@***.**** approved this pull request.
This all looks good to me. I have not tried this out yet and wont be able
to for a few days so if others are pleased with how it works, I'm fine to
see it merged.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#48 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AASszEjGIaOiyXN9W3yaMSMLD-8CVA4Aks5twiJVgaJpZM4Tsyl4>
.
|
|
I gave this a try today. Things were not as smooth as I had hoped (probably user error). Here's a screen grab: The details:
What worked:
What didn't work:
|
|
Thanks for trying things out @jhamman . If you're around and have any interest in trying things out on a screenshare I'd be interested. If not then can I ask you for the value of |
|
Sure, happy to walk you through this. Let's connect in the next couple days. I'm traveling today. |
|
I'm around most of today and tomorrow. Ping me on gchat when convenient? |
|
@jhamman I've reproduced your error, identified the bug, and fixed it in dask/dask#3500 Thank you for testing and reporting. |
|
@mrocklin, with dask/dask#3500, this seems to be working quite well. Thanks for working out the remaining issues without me :) |
|
As always, thanks for reporting them. This was a pretty important fix
upstream.
…On Tue, May 15, 2018 at 12:46 PM, Joe Hamman ***@***.***> wrote:
@mrocklin <https://github.com/mrocklin>, with dask/dask#3500
<dask/dask#3500>, this seems to be working quite
well. Thanks for working out the remaining issues without me :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AASszA3XgPB2bBqt24V_awyUZ524Xd0fks5tywZmgaJpZM4Tsyl4>
.
|
|
@kmpaul and @davide - What's the best way to get CISL to put some generic dask configs in jobqueue:
name: dask-worker
threads: 4
processes: 1
memory: 3GB
interface: ib0
death-timeout: 60
local-directory: null
# extra: ""
# env-extra: []
queue: share
walltime: '01:00:00'
pbs:
resource-spec: select=1:ncpus=4
# job-extra: []This would just require users to just specify a project number to get started... from dask_jobqueue import PBSCluster
from distributed import Client
cluster = PBSCluster(project='ABC')
client = Client(cluster)
cluster.scale_up(4) |
|
@mrocklin - once dask/dask#3500 is merged, I'd like to merge this as well. You just have a few merge conflicts to sort out before then. |
|
Trying this, but I must be doing something wrong. Here's what I've done in my conda env: What I get: |
|
Ah, thank you for trying things and reporting back @guillaumeeb . I may have addressed this in 533c871 . Can you try again? |
|
Thanks for the fix! So I am able to start a cluster and see the config files: In [2]: !ls /home/eh/eynardbg/.config/dask
distributed.yaml jobqueue.yaml
In [6]: cluster = PBSCluster()
In [8]: client = Client(cluster)
In [10]: client
Out[10]: <Client: scheduler='tcp://10.120.40.61:44310' processes=0 cores=0>
In [11]: cluster.start_workers(4)
Out[11]: [2, 3, 4, 5]
In [12]: client
Out[12]: <Client: scheduler='tcp://10.120.40.61:44310' processes=16 cores=32>Then I tried to change the configuration, and set: $ cat /home/eh/eynardbg/.config/dask/jobqueue.yaml
jobqueue:
# name: dask-worker
# threads: 2
processes: 12
memory: 10GB
# interface: null
# death-timeout: 60
# local-directory: null
# extra: ""
# env-extra: []
# queue: null
project: DaskjobqueueTest
walltime: '01:00:00'
pbs:
resource-spec: select=1:ncpus=24:mem=120GB
# job-extra: []I did not refresh the config at first, so obviously it did not work, but then I found the correct method into dask.config thanks to the dask documentation in your PR. Maybe there should be a refresh method on every config object? Anyway I called it, and tried to launch another cluster: In [35]: dask.config.config
Out[35]:
(...)
'jobqueue': {'death-timeout': 60,
'env-extra': [],
'extra': '',
'interface': None,
'local-directory': None,
'memory': '8GB',
'name': 'dask-worker',
'pbs': {'job-extra': [], 'resource-spec': None},
'processes': 4,
'project': None,
'queue': None,
'sge': {'resource-spec': None},
'slurm': {'job-cpu': None, 'job-extra': {}, 'job-mem': None},
'threads': 2,
'walltime': '00:30:00'},
(...)
In [38]: dask.config.refresh()
In [39]: dask.config.config.get('jobqueue')
Out[39]:
{'death-timeout': 60,
'env-extra': [],
'extra': '',
'interface': None,
'local-directory': None,
'memory': '10GB',
'name': 'dask-worker',
'pbs': {'job-extra': [], 'resource-spec': 'select=1:ncpus=24:mem=120GB'},
'processes': 12,
'project': 'DaskjobqueueTest',
'queue': None,
'sge': {'resource-spec': None},
'slurm': {'job-cpu': None, 'job-extra': {}, 'job-mem': None},
'threads': 2,
'walltime': '01:00:00'}
In [41]: cluster = PBSCluster()
In [42]: client = Client(cluster)
In [43]: client
Out[43]: <Client: scheduler='tcp://10.120.40.61:43734' processes=0 cores=0>
In [44]: cluster.start_workers(2)
Out[44]: [2, 3]
In [45]: client
Out[45]: <Client: scheduler='tcp://10.120.40.61:43734' processes=8 cores=16>
In [46]: dask.config.get('jobqueue.processes')
Out[46]: 12
In [47]: print(cluster.job_script())
...:
#!/bin/bash
#PBS -N dask-worker
#PBS -l select=1:ncpus=8:mem=30GB
#PBS -l walltime=00:30:00
/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/bin/dask-worker tcp://10.120.40.61:43734 --nthreads 2 --nprocs 4 --memory-limit 8GB --name dask-worker-4 --death-timeout 60The new config don't seem to be taken into account by the newly created PBSCluster. Restarting the ipython process is OK though, so this may be expected: In [2]: from distributed import Client
In [3]: from dask_jobqueue import PBSCluster
In [4]: cluster = PBSCluster()
In [5]: client = Client(cluster)
In [6]: client
Out[6]: <Client: scheduler='tcp://10.120.40.61:57283' processes=0 cores=0>
In [7]: print(cluster.job_script())
#!/bin/bash
#PBS -N dask-worker
#PBS -A DaskjobqueueTest
#PBS -l select=1:ncpus=24:mem=120GB
#PBS -l walltime=01:00:00
/work/ADM/hpc/eynardbg/Outils/miniconda3/envs/dask_dev/bin/dask-worker tcp://10.120.40.61:57283 --nthreads 2 --nprocs 12 --memory-limit 10GB --name dask-worker-2 --death-timeout 60One last thing, it is a bit weird to find the config file all commented by default, but I understand it is required for other mecanisms to work. |
|
Thanks for going through things @guillaumeeb ! Right, so currently we check the config at import time in code like the following: def __init__(self,
name=dask.config.get('jobqueue.name'),
threads=dask.config.get('jobqueue.threads'),
processes=dask.config.get('jobqueue.processes'),
memory=dask.config.get('jobqueue.memory'),
interface=dask.config.get('jobqueue.interface'),
death_timeout=dask.config.get('jobqueue.death-timeout'),
local_directory=dask.config.get('jobqueue.local-directory'),
extra=dask.config.get('jobqueue.extra'),
env_extra=dask.config.get('jobqueue.env-extra'),
**kwargs
):So if you update the config then our defaults won't update. Instead, we could do something like the following: def __init__(self,
name=None,
threads=None,
processes=None,
...
):
if name is None:
name = dask.config.get('jobqueue.name')
if threads is None:
threads = dask.config.get('jobqueue.threads')
...You would still have to explicitly call I think that the active question here is how much we care about refreshing the configuration within a single session, vs the code cleanliness of keeping config items in the function definition. I think that this choice depends on how frequently people change their configuration. |
I agree, and I tend to prefer the code cleanliness! Configuration should not change a lot, and in this case one should use directly keyword args. |
Yeah, not very useful in this particular case. Different subprojects may make different choices here. It's not yet clear what common practice will be. |
|
Maybe you should add something in the dask documentation for the configuration refresh part. Telling this is not always effective, and the safer way is to restart the python process? |
|
Just to say, I much prefer the current behavior where the defaults are determined at import. I like this because it then gives function signatures (tab expansion in jupyter notebooks for instance) that reflect the config defaults. These are a useful template for new (and old) users. |
|
OK, it sounds like we resolve this with @guillaumeeb 's suggestion to add to the documentation and docstring. |
|
Are there other things to address here? Should we start depending on git-master in tests and merge this? Should we wait for more people to test it? |
|
Hm, I don't quite believe you will have other testers 🙂 ... Maybe @lesteve if he's around tomorrow? I'm fine with merging this. |
|
I'm also fine with merging now if @lesteve doesn't have any specific comments. |
This builds on dask/dask#3432 to centralize configuration within Dask. This adds a new jobqueue.yaml config file that allows users to specify configuration in a file and have that be used across all of their use of the dask-jobqueue project. This is also something that might allow IT to help standardize configuration options for a given cluster.
|
Quick question that just occurred to me, once merged, is this going to pin us to dask >= 0.18? Or does |
|
It will pin us to dask >= 0.18
…On Wed, May 16, 2018 at 5:33 PM, Joe Hamman ***@***.***> wrote:
Quick question that just occurred to me, once merged, is this going to pin
us to dask >= 0.18? Or does dask.config.get() already exist to some
extent?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AASszPPL74DrfaY3aS5CMRziha703vkNks5tzJs1gaJpZM4Tsyl4>
.
|
|
Merging this in four hours if there are no further comments. |
|
Just one comment: from
from dask_jobqueue import FooCluster
cluster = FooCluster(...)
# edit the config file
cluster = FooCluster(...) # new settings should be taken into accountI agree that the signature with |
|
I agree that this would be nice behavior to have. I'm not sure how best to accomplish it. Suggestions here would be welcome. |

In dask/dask#3432 we're considering centralizing configuration in Dask projects. This PR is a proof of concept for what that would mean within a project like dask-jobqueue. This could use feedback both from dask/dask-jobqueue maintainers on what would be most comfortable for this user community, and also from dask/dask maintainers on how this constrains central development.
This adds a new jobqueue.yaml config file that allows users to specify
configuration in a file and have that be used across all of their use of the
dask-jobqueue project. This is also something that might allow IT to help
standardize configuration options for a given cluster.