Add a nested folder for storing jobqueue log file, use separate files… by guillaumeeb · Pull Request #145 · dask/dask-jobqueue

guillaumeeb · 2018-08-30T09:09:31Z

… for each job with LSF and SLURM

Closes #141

lesteve

A few comments: I am not a huge fan of the cwd approach, but I don't really have a better suggestion.

You could imagine having a "degraded" behaviour for PBSCluster (essentially you cannot specify a log_directory).

Slightly unrelated question, is PBSCluster supposed to work on a Torque cluster? It seems like Torque supports variable expansion according to http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm.

lesteve · 2018-09-01T05:33:26Z

+        if log_directory is not None:
+            self.log_directory = log_directory
+            if not os.path.exists(self.log_directory):
+                os.mkdir(self.log_directory)


Slightly better to use os.makedirs (i.e. the same as mkdir -p)
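A minimal sketch of that suggestion, assuming Python 3 (the `exist_ok` argument is not available on Python 2). The helper name `ensure_log_directory` and the temporary paths are hypothetical and only serve the illustration:

```python
import os
import tempfile


def ensure_log_directory(log_directory):
    """Create log_directory and any missing parents (like `mkdir -p`)."""
    # exist_ok=True makes the call a no-op when the directory already exists,
    # so the separate os.path.exists() check is no longer needed.
    os.makedirs(log_directory, exist_ok=True)
    return log_directory


# A nested path whose parent "logs" does not exist yet: os.mkdir would fail here.
base = tempfile.mkdtemp()
nested = os.path.join(base, "logs", "jobqueue")
ensure_log_directory(nested)
ensure_log_directory(nested)  # calling it again does not raise
```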

lesteve · 2018-09-01T06:01:15Z

    def _submit_job(self, script_filename):
-        return self._call(shlex.split(self.submit_command) + [script_filename])
+        return self._call(shlex.split(self.submit_command) + [script_filename],
+                          cwd=self.log_directory)


Hmmm, this may have unintended side-effects, although I am not 100% sure ... for example if I specify some relative path (e.g. for data files) in my notebook they may not be found in the dask-worker process.
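To make the concern concrete, here is a small sketch of what `cwd=` does to a child process. It only demonstrates the working directory of the directly spawned command; whether a batch scheduler then starts the job in that same directory depends on the scheduler and is an assumption here. The helper `child_cwd` is hypothetical:

```python
import os
import subprocess
import sys
import tempfile


def child_cwd(cwd=None):
    """Return the working directory seen by a child Python process."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=cwd,
    )
    return out.decode().strip()


log_directory = tempfile.mkdtemp()

inherited = child_cwd()                    # child inherits the parent's cwd
overridden = child_cwd(cwd=log_directory)  # child starts in log_directory
```

Any relative path resolved by the second child (for example a data file referenced from a notebook) would be looked up under `log_directory` rather than under the directory the notebook runs in.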

Is it not possible to pass a folder to -e and -o (SGE allows this but maybe this is very specific to SGE ...)?

If I understand #141 (comment) correctly, there is no way to have variable expansion in a PBS directive. I googled a bit and found http://community.pbspro.org/t/pbs-jobid-variable/176, which does not seem to leave too much hope ... I have seen "solutions" that use qalter after the submission (e.g. this); maybe that's an option, but it seems quite complicated ...

Just curious, what do people do with PBSPro? Do they not set the stdout/stderr file at all, or do they remember to change it to a different name each time?

Yes you are right, this may not be the best approach...

You've understood the problem with PBS: using -e or -o is possible but quite limited. The default output file name is $JOB_NAME.o$JOB_ID, so usually I just leave it and don't set anything.

Another solution is to just specify a folder. In this case PBS will fill it with $JOB_ID.OUT files.

> Another solution is to just specify a folder. In this case PBS will fill it with $JOB_ID.OUT files.

If specifying a folder is possible, I think I would just do that for PBS (and SGE as well).

lesteve · 2018-09-01T06:09:05Z

            cluster._job_id_from_submit_output(return_string)
+
+
+def test_log_directory():


Nice pytest trick: you can use the tmpdir fixture. This gives you a unique folder on each run, but pytest keeps it after the execution, allowing you to look at it in case something goes wrong. pytest keeps the last N (3 by default) tmpdir folders IIRC.

    def test_log_directory(tmpdir):
        with PBSCluster(..., log_directory=tmpdir.strpath) as cluster:
            ...

lesteve · 2018-09-01T06:09:11Z

            header_lines.append('#SBATCH -J %s' % self.name)
-            header_lines.append('#SBATCH -e %s.err' % self.name)
-            header_lines.append('#SBATCH -o %s.out' % self.name)
+            header_lines.append('#SBATCH -e %s-%%j.err' % self.name)


Just curious, I have wondered before, why not keep the default name? Is the default name not something very similar?

It was my first solution too, but I found that Slurm uses slurm-$JOB_ID.out as default. This is not too bad, but it misses the job name... So I'm hesitating here.
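For readers puzzled by the `%%j` in the diff: with Python's `%` string formatting, `%%` produces a literal `%`, so the generated directive contains Slurm's `%j` filename pattern, which Slurm replaces with the job ID. A small sketch, where the job name `dask-worker` is only an illustrative value:

```python
name = "dask-worker"  # illustrative job name, not taken from the PR

# "%s" is filled in by Python; "%%j" becomes the literal "%j" that Slurm
# later expands to the job ID when it creates the file.
err_directive = "#SBATCH -e %s-%%j.err" % name
out_directive = "#SBATCH -o %s-%%j.out" % name

print(err_directive)
print(out_directive)
```

This keeps the job name in the file name while still giving each job its own file, which the default `slurm-$JOB_ID.out` naming does not.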

guillaumeeb · 2018-09-01T08:59:58Z

So you're right, maybe cwd is not the best choice, and I've missed some degraded cases too. I will try to fix that.

For Torque vs PBS, they were the same a long time ago, but they have diverged and are no longer fully compatible. It may be necessary to implement a TorqueCluster at some point.

guillaumeeb · 2018-09-05T14:29:35Z

So I used the -e and -o solution for specifying a folder. My only concern is that it adds some complexity (3 lines) in every JobQueueCluster implementation.

Should we instead leave it to the user to set this through job_extra (maybe with some documentation), and remove -e and -o from every implementation?
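The "3 lines" mentioned above might look roughly like the following sketch for a PBS-style header. This is an illustration under stated assumptions, not the code merged in this PR: the function name `pbs_log_header` is hypothetical, and the trailing slash is assumed to be how the path is marked as a folder.

```python
def pbs_log_header(log_directory=None):
    """Build the optional stdout/stderr directives for a PBS job script."""
    header_lines = []
    # The three extra lines: only emit -e/-o when a folder was requested,
    # otherwise keep the scheduler's default output file names.
    if log_directory is not None:
        header_lines.append("#PBS -e %s/" % log_directory)
        header_lines.append("#PBS -o %s/" % log_directory)
    return header_lines


print("\n".join(pbs_log_header("logs")))
```

Leaving `log_directory` unset yields no directives at all, which corresponds to the "degraded" default behaviour discussed earlier in the review.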

guillaumeeb · 2018-10-16T08:03:47Z

I'm gonna merge this one soon.

lesteve · 2018-10-23T02:39:34Z

I just saw this now, very nice to see this merged!

guillaumeeb · 2018-10-23T07:59:44Z

Thanks, I had doubts about this one, but it's really nice not to have hundreds of output files inside my notebook folder!

Add a nested folder for storing jobqueue log file, use separate files…

966d8e1

… for each job with LSF and SLURM

lesteve reviewed Sep 1, 2018

guillaumeeb mentioned this pull request Sep 2, 2018

Change stdout and stderr default setting in LSF and Slurm cluster implementation #141

Closed

Use -e and -o for log folder instead of Popen(cwd)

5335ea3

guillaumeeb mentioned this pull request Oct 15, 2018

0.4.1 release #174

Closed

small fix on remaining lsf cwd arg

3710d36

Trigger CI

a276008

guillaumeeb merged commit b510bb1 into dask:master Oct 16, 2018
