Skip to content

dump temporary security contexts to file in specified shared temporary directory for use in worker submit scripts#524

Merged
guillaumeeb merged 19 commits into
dask:mainfrom
riedel:in_memory_security_contexts_520
Dec 4, 2021
Merged

dump temporary security contexts to file in specified shared temporary directory for use in worker submit scripts#524
guillaumeeb merged 19 commits into
dask:mainfrom
riedel:in_memory_security_contexts_520

Conversation

@riedel

@riedel riedel commented Oct 20, 2021

Copy link
Copy Markdown
Member

Fixes #520
Closes #534

@riedel riedel changed the title dump temporary security contexts to file dump temporary security contexts to file for use in worker submit scripts Oct 20, 2021
@riedel

riedel commented Oct 20, 2021

Copy link
Copy Markdown
Member Author

@guillaumeeb: due to your concerns, I extracted this one from #519 to look at it separately. I guess the biggest question is how to best create temporary files.

One maybe better way may be to create a whole directory together with the job script, which also needs to live somewhere.

One could also try to somehow inject this in the scheduler scripts. E.g. if a bash wrapper is used one could use process substitution . But this seems to be a bit complicated to me.

I stuck with this one as it is rather minimal.

@riedel riedel added bug Something isn't working needs triage This can be kept if the triager is unsure which next steps to take all job schedulers labels Oct 23, 2021
@guillaumeeb

Copy link
Copy Markdown
Member

@guillaumeeb: due to your concerns, I extracted this one from #519 to look at it separately. I guess the biggest question is how to best create temporary files.

That's a good idea! And I agree with the main concern.

I'll add a review.

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the work.

@jacobtomlinson we could really use your eyes on this one. Does this file generation looks good to you, and security compliant?

in #496 you said:

I think only supporting file certificates in dask-jobqueue would be reasonable. In places like dask-cloudprovider we are limited to passing the certs in the Dask config because we can't guarantee there will be a shared filesystem. But for HPC I think most cases there will be one.

Did you mean by generating a temporary file? And do you think that in-memory context would be feasible for dask-jobqueue by serialization?

there is also some more information in #520.

Comment thread dask_jobqueue/core.py Outdated
@riedel

riedel commented Oct 30, 2021

Copy link
Copy Markdown
Member Author

Forget about the deprecation: accidentally looked at the wrong file. Thought tmpfile was finally deprecated as I remembered that there was a discussion in dask#7800 that it was not needed anymore due to the new python tmpfile, which I misunderstood

So I guess it makes sense then also to use the built-in tmpfile in this PR. I cannot remember why I didn't do that.

@riedel riedel marked this pull request as draft October 31, 2021 06:41
@riedel riedel removed the needs triage This can be kept if the triager is unsure which next steps to take label Oct 31, 2021
@riedel riedel added the blocked Awaiting the output of some other work label Oct 31, 2021
@riedel

riedel commented Oct 31, 2021

Copy link
Copy Markdown
Member Author

I think #532 should be completed/merged first. Without any testing this is far to shaky...

@guillaumeeb

Copy link
Copy Markdown
Member

So instead of #532, I should look into #533 if I understand?

@riedel

riedel commented Nov 27, 2021

Copy link
Copy Markdown
Member Author

Moved test across all clusters here: #537 (will fail for current implementation)

@riedel riedel force-pushed the in_memory_security_contexts_520 branch from 8cd6a59 to c7174bb Compare November 27, 2021 22:11
@riedel

riedel commented Nov 28, 2021

Copy link
Copy Markdown
Member Author

Now, it basically works, the only problem is identifying a writable shared directory for testing. Local still works, htcondor and slurm xfail due to the missing shared filesystem, sge works, however, pbs seems has the permissions on the test directory set as non-writable: what would a better place to write the temporary files?

Second question: Does it make sense to default to the home dir as a shared directory? I guess most clusters work like this, however e.g. the CI does not...

@riedel

riedel commented Nov 28, 2021

Copy link
Copy Markdown
Member Author

BTW: the last remaining error shows that the permissions on the temporary certificate files are set in a secure manner to 0600 .

…i user

* added context handler to Job Cluster in core.py
* make source dir owned by pbsuser
* dynamically select port to fix sporadic port already in use errors
@riedel riedel force-pushed the in_memory_security_contexts_520 branch from d7ade92 to 86977af Compare November 28, 2021 17:48
@riedel

riedel commented Nov 28, 2021

Copy link
Copy Markdown
Member Author

I thought I was done: now I get an error in the unmodified
FAILED dask_jobqueue/tests/test_jobqueue_core.py::test_security[LocalCluster]
and
"distributed.scheduler - ERROR - Workers don't have promised key: ['tls://10.1.0.206:44087'], lambda-a54fa6a78ad624099ae9b275efae5945"
this is not good! I was careful not to modify this code...

and I cannot reproduce the error locally. It seems however, the error is related to previously running tests that did not clean up correctly (https://github.com/riedel/dask-jobqueue/runs/4346772070)

@riedel riedel changed the title dump temporary security contexts to file for use in worker submit scripts dump temporary security contexts to file in specified shared directory for use in worker submit scripts Nov 28, 2021
@riedel riedel force-pushed the in_memory_security_contexts_520 branch 2 times, most recently from 730cf43 to 1fdbe1d Compare November 29, 2021 11:58
@riedel riedel force-pushed the in_memory_security_contexts_520 branch from 1fdbe1d to bf0c957 Compare November 29, 2021 12:23

@riedel riedel left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few explainations.

Comment thread ci/pbs.sh
Comment thread dask_jobqueue/core.py Outdated
Comment thread dask_jobqueue/core.py
Comment thread dask_jobqueue/core.py
Comment thread dask_jobqueue/jobqueue.yaml Outdated
@riedel riedel requested a review from guillaumeeb November 29, 2021 12:36
Comment thread dask_jobqueue/core.py Outdated

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to find an agreement on the configuration name and the default (even that is looks every one agrees for defaulting the config option to None/null).

Comment thread dask_jobqueue/core.py Outdated
Comment thread dask_jobqueue/core.py Outdated
Comment thread dask_jobqueue/core.py Outdated
Comment thread dask_jobqueue/core.py Outdated
Comment thread dask_jobqueue/jobqueue.yaml Outdated
Comment thread dask_jobqueue/tests/test_jobqueue_core.py Outdated
because it xfails anyways and the problem with the ci is unrelated
@riedel riedel changed the title dump temporary security contexts to file in specified shared directory for use in worker submit scripts dump temporary security contexts to file in specified shared temporary directory for use in worker submit scripts Dec 3, 2021
tests_required is deprecated but kept as in distributed
Comment thread dask_jobqueue/core.py Outdated

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last suggestion, and then I think we can merge!!

Co-authored-by: Guillaume Eynard-Bontemps <g.eynard.bontemps@gmail.com>
Comment thread dask_jobqueue/core.py Outdated
@riedel riedel force-pushed the in_memory_security_contexts_520 branch from d70544e to 18da416 Compare December 4, 2021 09:57
@riedel

riedel commented Dec 4, 2021

Copy link
Copy Markdown
Member Author

Since #527 (which is the final thing I want to get merged) passes I am happy now.

@guillaumeeb guillaumeeb merged commit 6ce5a6c into dask:main Dec 4, 2021
@guillaumeeb

Copy link
Copy Markdown
Member

Thanks a lot for all the work here @riedel, I know this was long and with a lot of waiting periods.

@riedel riedel deleted the in_memory_security_contexts_520 branch December 4, 2021 14:06
@riedel

riedel commented Dec 4, 2021

Copy link
Copy Markdown
Member Author

Thank you alot, @guillaumeeb! I am just happy that this part is in now and encrypting traffic now easily works. Also @andersy005 for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

all job schedulers bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add pytest xfails for known issue in CI In memory security contexts are not supported

3 participants