-
-
Notifications
You must be signed in to change notification settings - Fork 150
Add scheduler_options to pass scheduler-specific parameters #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
56fe96c
bb9de95
21b2487
7acaae8
e30a0fa
2d20ad5
82c84f7
deb4712
8bfaee9
6c6d01f
0a9d7c0
b203326
d215810
e5eeab6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,17 @@ | ||
| #!/bin/bash | ||
|
|
||
| docker-compose up --build -d | ||
|
|
||
| while [ `./register_cluster.sh 2>&1 | grep "sacctmgr: error" | wc -l` -ne 0 ] | ||
| do | ||
| echo "Waiting for SLURM cluster to become ready"; | ||
| sleep 2 | ||
| done | ||
| echo "SLURM properly configured" | ||
|
|
||
| # On some clusters the login node does not have the same interface as the | ||
| # compute nodes. The next three lines allow to test this edge case by adding | ||
| # separate interfaces on the worker and on the scheduler nodes. | ||
| docker exec slurmctld ip addr add 10.1.1.20/24 dev eth0 label eth0:scheduler | ||
| docker exec c1 ip addr add 10.1.1.21/24 dev eth0 label eth0:worker | ||
| docker exec c2 ip addr add 10.1.1.22/24 dev eth0 label eth0:worker |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,7 +15,7 @@ | |
| from distributed.deploy.spec import ProcessInterface, SpecCluster | ||
| from distributed.deploy.local import nprocesses_nthreads | ||
| from distributed.scheduler import Scheduler | ||
| from distributed.utils import format_bytes, parse_bytes, tmpfile, get_ip_interface | ||
| from distributed.utils import format_bytes, parse_bytes, tmpfile | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
@@ -30,7 +30,11 @@ | |
| By default, ``process ~= sqrt(cores)`` so that the number of processes | ||
| and the number of threads per process is roughly the same. | ||
| interface : str | ||
| Network interface like 'eth0' or 'ib0'. | ||
| Network interface like 'eth0' or 'ib0'. This will be used both for the | ||
| Dask scheduler and the Dask workers interface. If you need a different | ||
| interface for the Dask scheduler you can pass it through | ||
| the ``scheduler_options`` argument: | ||
| ``interface=your_worker_interface, scheduler_options={'interface': your_scheduler_interface}``. | ||
| nanny : bool | ||
| Whether or not to start a nanny process | ||
| local_directory : str | ||
|
|
@@ -69,8 +73,12 @@ | |
| Whether or not to run this cluster object with the async/await syntax | ||
| security : Security | ||
| A dask.distributed security object if you're using TLS/SSL | ||
| dashboard_address : str or int | ||
| An address like ":8787" on which to host the Scheduler's dashboard | ||
| scheduler_options : dict | ||
| Used to pass additional arguments to Dask Scheduler. For example use | ||
| ``scheduler_options={'dasboard_address': ':12435'}`` to specify which | ||
| port the web dashboard should use or ``scheduler_options={'host': 'your-host'}`` | ||
| to specify the host the Dask scheduler should run on. See | ||
| :class:`distributed.Scheduler` for more details. | ||
| """.strip() | ||
|
|
||
|
|
||
|
|
@@ -202,9 +210,6 @@ def __init__( | |
|
|
||
| if interface: | ||
| extra = extra + ["--interface", interface] | ||
| kwargs.setdefault("host", get_ip_interface(interface)) | ||
| else: | ||
| kwargs.setdefault("host", "") | ||
|
|
||
| # Keep information on process, cores, and memory, for use in subclasses | ||
| self.worker_memory = parse_bytes(memory) if memory is not None else None | ||
|
|
@@ -420,13 +425,15 @@ def __init__( | |
| silence_logs="error", | ||
| name=None, | ||
| asynchronous=False, | ||
| # Scheduler keywords | ||
| interface=None, | ||
| # Scheduler-only keywords | ||
| dashboard_address=None, | ||
| host=None, | ||
| scheduler_options=None, | ||
| # Options for both scheduler and workers | ||
| interface=None, | ||
| protocol="tcp://", | ||
| dashboard_address=":8787", | ||
| config_name=None, | ||
| # Job keywords | ||
| config_name=None, | ||
| **kwargs | ||
| ): | ||
| self.status = "created" | ||
|
|
@@ -447,22 +454,47 @@ def __init__( | |
| ) | ||
| ) | ||
|
|
||
| if dashboard_address is not None: | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not allowing to pass Also if we keep If you think that's too harsh, I can make it a warning until the 0.8 release!
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What feel's weird here is to keep the I'm not really familliar with Python best practices in this case.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can see two solutions:
I tend to think 1. is easier because you don't need the combining logic between Also my feeling is that people ignore warnings until it breaks so that 2. is not that helpful for users in the end.
It completely depends on the project. For example, in scikit-learn we have a strong backward-compatibility policy: we would have warnings for two major releases before we can break user code. I feel in |
||
| raise ValueError( | ||
| "Please pass 'dashboard_address' through 'scheduler_options': use\n" | ||
| 'cluster = {0}(..., scheduler_options={{"dashboard_address": ":12345"}}) rather than\n' | ||
| 'cluster = {0}(..., dashboard_address="12435")'.format( | ||
| self.__class__.__name__ | ||
| ) | ||
| ) | ||
|
|
||
| if host is not None: | ||
| raise ValueError( | ||
| "Please pass 'host' through 'scheduler_options': use\n" | ||
| 'cluster = {0}(..., scheduler_options={{"host": "your-host"}}) rather than\n' | ||
| 'cluster = {0}(..., host="your-host")'.format(self.__class__.__name__) | ||
| ) | ||
|
|
||
| default_config_name = self.job_cls.default_config_name() | ||
| if config_name is None: | ||
| config_name = default_config_name | ||
|
|
||
| if interface is None: | ||
| interface = dask.config.get("jobqueue.%s.interface" % config_name) | ||
| if scheduler_options is None: | ||
| scheduler_options = {} | ||
|
|
||
| default_scheduler_options = { | ||
| "protocol": protocol, | ||
| "dashboard_address": ":8787", | ||
| "security": security, | ||
| } | ||
| # scheduler_options overrides parameters common to both workers and scheduler | ||
| scheduler_options = dict(default_scheduler_options, **scheduler_options) | ||
|
|
||
| # Use the same network interface as the workers if scheduler ip has not | ||
| # been set through scheduler_options via 'host' or 'interface' | ||
| if "host" not in scheduler_options and "interface" not in scheduler_options: | ||
| scheduler_options["interface"] = interface | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should'nt we just put interface in the Or is this because of a problem in distributed if host and interface do not match?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I should probably test this edge case indeed. My line of reasoning is like this (as I tried to explain in the comment but the wording can probably be improved ...): |
||
|
|
||
| scheduler = { | ||
| "cls": Scheduler, # Use local scheduler for now | ||
| "options": { | ||
| "protocol": protocol, | ||
| "interface": interface, | ||
| "host": host, | ||
| "dashboard_address": dashboard_address, | ||
| "security": security, | ||
| }, | ||
| "options": scheduler_options, | ||
| } | ||
|
|
||
| kwargs["config_name"] = config_name | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After having looked a bit more in details, I am reasonably sure that
hostin kwargs is not used at all, so this code can be safely removed. In other words this is another manifestation of #386.This code was introduced in #135. At that time the
kwargswere passed toLocalCluster. Nowadays the kwargs are passed to thejob_cls. This does not make any sense to passhost=somethingto the worker arguments wheresomethingis computed on the node the scheduler runs (similar comment as #382 (comment)) ...