Refactor PackageInstall into a generic InstallPlugin #8343
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
27 files ±0, 27 suites ±0, 13h 6m 37s ⏱️ +48m 39s
For more details on these failures, see this check.
Results for commit ebb4988. ± Comparison against base commit eeec6ce.
This pull request removes 10 and adds 8 tests. Note that renamed tests count towards both.
Title changed from "NannyPlugin when restarting workers for PackageInstall plugin" to "PackageInstall into a generic InstallPlugin"
Title changed from "PackageInstall into a generic InstallPlugin" to "PackageInstall into a generic SetupPlugin"
Title changed from "PackageInstall into a generic SetupPlugin" to "PackageInstall into a generic InstallPlugin"
        loop=worker.loop,
    )
):
    if not await self._is_installed(worker):
I have dropped all of these elaborate checks in favor of requiring install_fn() to be idempotent. This makes the plugin a lot simpler, and the dropped logic would not have caught issues such as flaky installs anyway.
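To make the idempotency requirement concrete, here is a minimal sketch of an install function that is safe to call repeatedly. The factory name `make_pip_install_fn`, its parameters, and the boolean return value are hypothetical and chosen for illustration only; the actual signature expected by the plugin is not shown in this thread.

```python
import importlib.util
import subprocess
import sys


def make_pip_install_fn(package: str, module_name: str):
    """Return a zero-argument install function that is safe to call repeatedly.

    Hypothetical helper: `package` is the pip requirement, `module_name` is the
    top-level module that the package is assumed to provide.
    """

    def install_fn() -> bool:
        # Skip the install when the module is already importable; this is
        # what makes repeated calls (e.g. after a worker restart) harmless.
        if importlib.util.find_spec(module_name) is not None:
            return False  # nothing to do
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        return True  # an install was performed

    return install_fn
```

Note that a check like this only covers the "already installed" case. As mentioned above, it would not detect a flaky or partially failed install.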
    PipInstall
    """

    idempotent = True
    Environ,
    InstallPlugin,
    NannyPlugin,
    PackageInstall,
We may want to consider a deprecation cycle here, but PackageInstall was never truly public since _PackageInstaller was never public.
Update: I've refactored this so that
await Semaphore(
    max_leases=1,
    name=socket.gethostname(),
    register=True,
    scheduler_rpc=worker.scheduler,
    loop=worker.loop,
)
Unrelated to the change in this PR, but reviewing this again, I wonder whether a lock file would be much simpler.
We typically use locket for this, e.g.
distributed/distributed/diskutils.py
Line 161 in e98dcb1
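For illustration, a host-local lock file could look roughly like the sketch below. It uses the standard library's `fcntl.flock` (Unix only) as a stand-in and is not locket's API; the helper name `host_lock` and the lock file location are assumptions.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Optional


@contextmanager
def host_lock(name: str, directory: Optional[str] = None):
    """Hold an exclusive, host-local lock for the duration of the block.

    Hypothetical helper; the lock file lives in the temp directory by default.
    """
    path = os.path.join(directory or tempfile.gettempdir(), f"{name}.lock")
    # The file is only a rendezvous point; its contents are never read.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield path
    finally:
        # Closing the descriptor releases the lock held through it.
        os.close(fd)
```

Because the lock lives on the local filesystem, every process on the same host contends for it without a round trip to the scheduler, which is the simplification suggested here.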
Good point, I hadn't thought of this. This would also solve the issue that using a Semaphore from the scheduler isn't straightforward (if not impossible).
I'd leave this to a follow-up PR though.
fjetter left a comment
high level, LGTM
One functional question about the semaphore, but this is unrelated and could be broken out into a follow-up.
Blocked by #8342

Refactors PackageInstall into a generic InstallPlugin that takes ~~an Installer instance~~ a Callable as input; also introduces a NannyPlugin that is used if workers should be restarted.

This could potentially be refactored into a generic RunPlugin that executes a callable on the cluster. For this, we should probably make the locking mechanism configurable. Right now, we use a host-based lock; I can imagine that a cluster-wide one and a lockless mode would also be useful.

pre-commit run --all-files
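As a rough sketch of the RunPlugin idea, the class below takes a callable and a configurable lock mode. Everything here is hypothetical: the class name `RunPluginSketch`, the `lock_mode` parameter, and its values are assumptions, and a `threading.Lock` stands in for a real host-wide lock. A cluster-wide mode is left out because it would need scheduler coordination.

```python
import threading
from contextlib import nullcontext
from typing import Callable

# Stand-in for a real host-wide lock; a threading.Lock only covers one process.
_HOST_LOCK = threading.Lock()


class RunPluginSketch:
    """Hypothetical shape only; not the plugin added in this PR."""

    def __init__(self, fn: Callable[[], None], lock_mode: str = "host"):
        if lock_mode not in ("host", "none"):
            raise ValueError(f"unsupported lock_mode: {lock_mode!r}")
        self.fn = fn
        self.lock_mode = lock_mode

    def _lock(self):
        # "none" is the lockless mode; "host" serializes callers sharing the lock.
        return _HOST_LOCK if self.lock_mode == "host" else nullcontext()

    def setup(self, worker=None) -> None:
        # Mirrors the setup(worker) hook of a worker plugin; `worker` is unused here.
        with self._lock():
            self.fn()
```

Keeping the lock choice behind a single `_lock()` method means further modes could be added later without touching `setup()`.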