Skip to content

Refactor PackageInstall into a generic InstallPlugin#8343

Merged
hendrikmakait merged 9 commits into
dask:mainfrom
hendrikmakait:packageinstall-restarts-from-nannyplugin
Nov 17, 2023
Merged

Refactor PackageInstall into a generic InstallPlugin#8343
hendrikmakait merged 9 commits into
dask:mainfrom
hendrikmakait:packageinstall-restarts-from-nannyplugin

Conversation

@hendrikmakait

@hendrikmakait hendrikmakait commented Nov 14, 2023

Copy link
Copy Markdown
Member

Blocked by #8342

Refactors PackageInstall into a generic InstallPlugin that takes an Installer instance a Callable as input; also introduces a NannyPlugin that is used if workers should be restarted.

This could potentially be refactored into a generic RunPlugin that executes a callable on the cluster. For this, we should probably make the locking mechanism configurable. Right now, we use a host-based lock, I can imagine that a cluster-wide one and lockless mode would also be useful.

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions

github-actions Bot commented Nov 14, 2023

Copy link
Copy Markdown
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       27 files  ±  0         27 suites  ±0   13h 6m 37s ⏱️ + 48m 39s
  3 932 tests  -   2    3 806 ✔️  - 10     117 💤 ±0    9 +  8 
49 462 runs   - 26  47 037 ✔️  - 37  2 413 💤 +1  12 +10 

For more details on these failures, see this check.

Results for commit ebb4988. ± Comparison against base commit eeec6ce.

This pull request removes 10 and adds 8 tests. Note that renamed tests count towards both.
distributed.diagnostics.tests.test_packageinstall ‑ test_conda_install
distributed.diagnostics.tests.test_packageinstall ‑ test_conda_install_fails_on_returncode
distributed.diagnostics.tests.test_packageinstall ‑ test_conda_install_fails_when_conda_not_found
distributed.diagnostics.tests.test_packageinstall ‑ test_conda_install_fails_when_conda_raises
distributed.diagnostics.tests.test_packageinstall ‑ test_package_install_failing_does_not_restart_on_nanny
distributed.diagnostics.tests.test_packageinstall ‑ test_package_install_installs_once_when_reregistered
distributed.diagnostics.tests.test_packageinstall ‑ test_package_install_installs_once_with_multiple_workers
distributed.diagnostics.tests.test_packageinstall ‑ test_package_install_restarts_on_nanny
distributed.diagnostics.tests.test_packageinstall ‑ test_pip_install
distributed.diagnostics.tests.test_packageinstall ‑ test_pip_install_fails
distributed.diagnostics.tests.test_install_plugin ‑ test_conda_install
distributed.diagnostics.tests.test_install_plugin ‑ test_conda_install_fails_on_returncode
distributed.diagnostics.tests.test_install_plugin ‑ test_conda_install_fails_when_conda_not_found
distributed.diagnostics.tests.test_install_plugin ‑ test_conda_install_fails_when_conda_raises
distributed.diagnostics.tests.test_install_plugin ‑ test_package_install_failing_does_not_restart_on_nanny
distributed.diagnostics.tests.test_install_plugin ‑ test_package_install_restarts_on_nanny
distributed.diagnostics.tests.test_install_plugin ‑ test_pip_install
distributed.diagnostics.tests.test_install_plugin ‑ test_pip_install_fails

♻️ This comment has been updated with latest results.

@hendrikmakait hendrikmakait changed the title Use NannyPlugin when restarting workers for PackageInstall plugin Refactor PackageInstall into a generic InstallPlugin Nov 15, 2023
@hendrikmakait hendrikmakait changed the title Refactor PackageInstall into a generic InstallPlugin Refactor PackageInstall into a generic SetupPlugin Nov 15, 2023
@hendrikmakait hendrikmakait changed the title Refactor PackageInstall into a generic SetupPlugin Refactor PackageInstall into a generic InstallPlugin Nov 15, 2023
loop=worker.loop,
)
):
if not await self._is_installed(worker):

@hendrikmakait hendrikmakait Nov 15, 2023

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have dropped all of these elaborate checks in favor of the requirement for install_fn() to be idempotent. This makes the plugin a lot simpler and this logic would still not have caught some issues like flaky installs, etc.

PipInstall
"""

idempotent = True

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread distributed/__init__.py
Environ,
InstallPlugin,
NannyPlugin,
PackageInstall,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to consider a deprecation cycle here, but PackageInstall was never truly public since _PackageInstaller was never public.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine

@hendrikmakait

Copy link
Copy Markdown
Member Author

Update: I've refactored this so that InstallPlugin takes a Callable as input instead of the customer Installer.

Comment on lines +510 to +516
await Semaphore(
max_leases=1,
name=socket.gethostname(),
register=True,
scheduler_rpc=worker.scheduler,
loop=worker.loop,
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to the change in this PR but reviewing this again I wonder if a lock file wouldn't be much simpler.
We're typically using locket for this, e.g.

return locket.lock_file(self._global_lock_path, **kwargs)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I hadn't thought of this. This would also solve the issue that using a Semaphore from the scheduler isn't straightforward (if not even impossible).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave this to a follow-up PR though.

@fjetter fjetter left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high level, LGTM

One functional question aobut the semaphore but this is unrelated and could be broken out in a follow up

@hendrikmakait hendrikmakait marked this pull request as ready for review November 15, 2023 15:05
Comment thread distributed/__init__.py
Environ,
InstallPlugin,
NannyPlugin,
PackageInstall,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine

@hendrikmakait hendrikmakait merged commit 412a0a9 into dask:main Nov 17, 2023
@hendrikmakait hendrikmakait deleted the packageinstall-restarts-from-nannyplugin branch November 17, 2023 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants