Template job scripts with Jinja#370
Conversation
5330e1b to
fcb2d58
Compare
|
I don't have the time to check this right now, but I just wanted to say thanks for working on this!! |
|
@guillaumeeb no problem at all. I understand that PRs of this size require a lot of effort on the part of maintainers! It looks like the failing SGE job is due to problem with installing some packages from apt. Could someone restart that job to see if that fixes things? I don't think the failure is related to anything in this PR. |
|
Thanks a lot, as for @guillaumeeb I may not have a lot of time to look at this in detail before some time. What I would like in an ideal world:
|
|
Hmmm I have restarted the build and the SGE error is similar. Probably something to look into ... |
|
OK restarting the SGE job a few more times fixed the issue ... Note this intermittent issue has been seen on master from time to time, e.g. https://travis-ci.org/dask/dask-jobqueue/jobs/617519102. |
|
Thanks @lesteve! I know this is a big PR and will take some time to review. I currently have access to a PBS cluster and will test this out "in the wild." It'd be great if anyone with access to systems running Condor, LSF, or OAR could give this a go as well. I'm sure there are some hangups related to job script formatting that need to be worked out. |
|
Just wanted to check back in and see if anyone has had time to test this out or look it over 🙂 Happy to incorporate any feedback! If accepted, is the plan still to include this in the 0.8 release and is there a rough timeline for that release? |
|
Hey @wtbarnes, we're not forgetting you and are again really happy you took a stab at this. I guess the first planning describe by @lesteve above got a bit delayed, we're probably too busy currently to make the small progress we wanted to achieve. It is thus difficult to give you a timeline as we're already late... We definitely plan to take a closer look at this one for the 0.8 release. In the mean time and if you feel like it, you could take a look at the few issues identified by @lesteve for a 0.7.1 release and try to fix them? I think there are easy enough to fix (but may be wrong). |
guillaumeeb
left a comment
There was a problem hiding this comment.
A first review with a couple of questions:
- Shouldn't we add jinja2 in the requirement file?
- If a user wants to customize the jinja templates for any reason (which is the final goal here, if a cluster job queuing system has some specific quirk, we want a elegant solution to bypass it), how is that done? Maybe this is already feasible by dropping a new template somewhere, or we need to add a way to override the default templates. In any case this should be documented somewhere in the docs. Can you work on this?
Thanks for the quick response @guillaumeeb. I'm not really in any hurry here, just wanted to check in. I know that release timelines are always a bit optimistic 😄, especially when all the work is done by volunteers!
Sure, I can try to look these over. These would be just those issues/PRs milestoned for 0.7.1? |
Yes! I meant to do this.
Jinja allows you to specify multiple possible template loaders. In my current implementation, the templates are only being loading from the package, This can be easily modified to accept custom templates that override the default choices, loader = ChoiceLoader([
DictLoader(custom_templates),
PackageLoader("dask-jobqueue", "templates")
])
env = Environment(loader=loader)where The bigger question is what is the best entry point for the user to specify these templates? The Jinja environment lives on the My personal opinion is that this could probably be its own PR as it is complicated enough and really tackles a separate (though certainly related) issue. However, if people would rather see these changes all merged in at once, I'm happy to give that a go as well. |
Okay, that sounds fair! |
|
@wtbarnes sorry I haven't had time to look at this one indeed ... This is the kind of PR that is quite hard to review because the diff is quite sizeable. One thing that would make me a bit a lot more relax about this PR would be if you could generate some reference submission scripts (with If you manage to get something going along these lines, feel free to post a snippet here! |
|
Also just for completeness, my current dask-jobqueue priority is to try to make some progress towards 0.7.1 in the next few weeks! |
|
I think we should take another look at this one. There's been some little change on the master branch code. Would you be willing to update your PR @wtbarnes? |
|
Sorry, I've not had the time to come back to this in a while! I see there are quite a few conflicts now with several of the different scheduler files. I'm happy to rebase this against the master branch and try to fix those if that is the preferred workflow, |
|
@wtbarnes yes please! :) |
|
Ok I've rebased this against master and added Jinja2 to the requirements file, but now it seems all of the tests are failing so I've a bit more work to do to get this into good shape. |
|
@wtbarnes sorry I introduced a conflict by merging a PR. Here is my current thinking:
Sorry this is taking such a long time, but I am afraid this PR is not trivial to review at all and I don't have that much time for the Dask-Jobqueue maintenance these days ... Something that would make this PR easier to review as I mentioned a few months ago (already? 😓)
|
4d181fe to
26a0e70
Compare
|
@wtbarnes if you're stiull around, this would be the good time to finish this one! |
|
I've just done a rebase against I'm not sure what the merge policy is on this repo, but this should probably be squash-merged EDIT: It looks like there are quite a few test failures here, likely because of things that have not been included in the new jinja templates. @sdtaylor I wonder if your proposed changes will take care of those? Re: the seemingly extraneous |
|
for the |
|
I made some small adjustments here to drop JOB_ID and fix the interface double entry https://github.com/sdtaylor/dask-jobqueue/tree/jinja-templates. I don't have write permissions to this repo so if someone can merge them. With those changes the current comparison looks pretty good. The only minor thing is various newlines where entries in the jinja template are skipped: |
|
I've removed the |
|
Thanks everyone for reviving this one!! I've got two questions:
|
I believe so though I would defer to @sdtaylor or @lesteve as well
I think this old comment is still relevant: #370 (comment). It is very easy to modify the Jinja |
|
Yes, I feel it's ready for review. |
guillaumeeb
left a comment
There was a problem hiding this comment.
This looks good, but I have some comments on potentially simplifying things, or relying more on templates.
| return env | ||
|
|
||
|
|
||
| def env_lines_to_dict(env_lines): |
There was a problem hiding this comment.
Couldn't this be done directly in the template file?
| return result | ||
|
|
||
|
|
||
| def set_ncpus(ncpus, worker_cores, logger): |
There was a problem hiding this comment.
Same for LSF, couldn't we code this in the template file?
There was a problem hiding this comment.
It's possible though the templating logic can easily get messy and it is often easier and cleaner to offload more complicated logic to filters (which is exactly what is done here). Additionally, I don't think you can easily log things from within a Jinja template.
There was a problem hiding this comment.
Yeah, I was thinking of removing the log, which don't seem really useful to me.
But actually, in this case, we could have kept the ncpus and mem settings in the __init__ method, don't we?
There was a problem hiding this comment.
Definitely agree regarding ncpus and mem. They're so simple they should just be in the __init__.
The format_memory function should probably remain as a filter because of the complexity in lsf_format_bytes_ceil and lsf_detect_units
| return "%dB" % n | ||
|
|
||
|
|
||
| def pbs_format_resource_spec(resource_spec, worker_cores, worker_memory): |
There was a problem hiding this comment.
Same for PBS, in the template file?
There was a problem hiding this comment.
See above comment on the LSF templating/filtering
There was a problem hiding this comment.
Just giving it a try here. I think in the template it would look something like:
{% if resource_spec is not none -%}
#PBS -l {{ resource_spec }}
{% else -%}
#PBS -l select=1:ncpus={{ worker_cores }}:mem={{ worker_memory | pbs_format_bytes_ceil }}
{% endif -%}I like that because it's clear with a single look at the template file what is done. Thoughts?
There was a problem hiding this comment.
Agreed! That's much more simple. And then pbs_format_bytes_ceil would just need to be added as a filter.
I wonder if we should filter on worker_cores that ensures that it is an integer?
#PBS -l select=1:ncpus={{ worker_cores | round | int }}:mem={{ worker_memory | pbs_format_bytes_ceil }}
|
Unfortunately I don't have much time these days to invest in If anyone would like to revive this, please feel free to reopen. |
|
Thanks a lot for this work which definitely looked useful and sorry I never managed to find the time to review it fully 😢 |
This PR addresses #133 and is a first pass at using Jinja2 to template job submission scripts. See also #338. In general, the approach I've used here is to store the templates as separate files in
templates/such that there is a separate template for the worker command, the top level job script, and then a separate file for the job header for each scheduler. Rather than appending strings to a list, all of the formatting logic is now contained in the templates using the Jinja syntax.Thus far, I've modified the top level job script and the worker command to use Jinja2. The following job scheduler headers have been retemplated with Jinja:
Closes #133, closes #321.