Larger than memory images: performant and scalable distributed implementation for workstations and clusters #1062
Merged
carsen-stringer merged 65 commits into MouseLand:main from Feb 7, 2025
Conversation
…tributed update my local working branch with package changes, hopefully fixes logging
Merge branch 'main' of https://github.com/MouseLand/cellpose into distributed
…into distributed
Contributor
Author
…version of pytorch; should be conditional and submitted in separate PR
Support custom model
Recently confirmed this works with custom pre-trained models. Also works with multi-channel inputs, through creative use of the preprocessing_steps argument.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##             main    #1062    +/-  ##
=======================================
- Coverage   53.17%   52.69%   -0.49%
=======================================
  Files          18       18
  Lines        4139     4302    +163
=======================================
+ Hits         2201     2267     +66
- Misses       1938     2035     +97

☔ View full report in Codecov by Sentry.
This PR solves #1061 by adding distributed_segmentation.py, a self-contained module that provides the ability to segment larger-than-memory images on a workstation or cluster. Images are partitioned into overlapping blocks that are each processed separately, in parallel or in series (e.g. if you only have a single gpu). Per-block results are seamlessly stitched into a single segmentation of the entire larger-than-memory image. Windows, Linux, and macOS workstations, as well as LSF clusters, are supported automatically.
Other cluster managers, such as SLURM or SGE, require implementing your own dask cluster class, which is a good opportunity to submit a PR and be added to the author list. I am happy to advise anyone doing this.
The preferred input is a Zarr or N5 array; however, folders full of tiff images are also supported. Single large tiff files can be converted to Zarr with the module itself. Your workstation or cluster can be arbitrarily partitioned into workers with arbitrary resources, e.g. "10 workers, 2 cpu cores each, 1 gpu each" or, if you have a workstation with a single gpu, "1 worker with 8 cpu cores and 1 gpu." Computation never exceeds the given worker specification, so you can process huge datasets without occupying your entire machine.
The module is compatible with any Cellpose model. Small crops can be tested before committing to a big-data segmentation by directly calling the function that runs on each individual block. A foreground mask can be provided, ensuring no time is wasted on voxels that do not contain sample. An arbitrary list of preprocessing steps can be distributed along with Cellpose itself: if you need to smooth, sharpen, or otherwise transform your data before segmenting, you don't need to do it in advance and save a processed version of your large data; you can just distribute those preprocessing functions along with the segmentation.
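For illustration, a per-block preprocessing step might look like the following. The (function, kwargs) pair format for preprocessing_steps is an assumption based on the description above; check the module's docstrings for the exact contract.

```python
import numpy as np

# Hypothetical preprocessing function: zero out dim voxels so the
# segmentation spends no effort on background.  Each step is assumed
# to receive a block as a numpy array and return an array of the
# same shape.
def clip_background(image, threshold=100):
    out = image.copy()
    out[out < threshold] = 0
    return out

# Assumed format: a list of (function, kwargs) pairs applied in order
# to every block before Cellpose runs on it.
preprocessing_steps = [(clip_background, {'threshold': 50})]
```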
Installation from scratch in a fresh conda environment tested successfully by @snoreis on a machine with the following specs:
OS: Windows 11 Pro
CPU: 16-core Threadripper PRO 3955WX
GPU: NVIDIA RTX A5000
Also tested in my own environments.
Workstation:
OS: Rocky Linux 9.3
CPU: 8-core Intel Sky Lake
GPU: 1x NVIDIA Tesla L4 15GB
Cluster:
OS: Rocky Linux 9.3
CPU: 100 cores Intel Sky Lake
GPU: 100x NVIDIA Tesla L4 15GB
List of functions provided, all with verbose docstrings covering all inputs and outputs:

- distributed_eval: run Cellpose on a big image on any machine
- process_block: the function that is run on each block from a big dataset; can be called on its own for testing
- numpy_array_to_zarr: create a zarr array, the preferred input to distributed_eval
- wrap_folder_of_tiffs: represent a folder of tiff files as a zarr array without duplicating data

New dependencies are correctly set and install successfully from source:

pip install -e .[distributed]

Examples
Run distributed Cellpose on half the resources of a workstation with 16 cpus, 1 gpu, and 128GB system memory:
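A sketch of what that call might look like. The argument names (input_zarr, blocksize, write_path, and the cluster_kwargs fields) are assumptions based on the function list above; consult the distributed_eval docstring for the exact interface. The heavy call is guarded so the snippet is safe to import.

```python
# Half of a 16-cpu / 1-gpu / 128GB workstation: one worker with 8 cpu
# cores, the gpu, and 64GB of memory.  All kwarg names here are
# assumptions; consult the distributed_eval docstring.
model_kwargs = {'gpu': True, 'model_type': 'cyto3'}
eval_kwargs = {'diameter': 30, 'channels': [0, 0], 'do_3D': True}
cluster_kwargs = {
    'n_workers': 1,          # one gpu, so one worker
    'ncpus': 8,              # half the cores
    'memory_limit': '64GB',  # half the system memory
    'threads_per_worker': 1,
}

if __name__ == '__main__':
    import zarr
    from cellpose.contrib.distributed_segmentation import distributed_eval
    segments, boxes = distributed_eval(
        input_zarr=zarr.open('/path/to/input.zarr'),  # hypothetical path
        blocksize=(256, 256, 256),
        write_path='/path/to/segments.zarr',          # hypothetical path
        model_kwargs=model_kwargs,
        eval_kwargs=eval_kwargs,
        cluster_kwargs=cluster_kwargs,
    )
```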
Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. Janelia cluster)
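A sketch of the cluster version. Only cluster_kwargs changes relative to the workstation sketch; the field names (ncpus, min_workers, max_workers) are assumptions, so consult the docstrings for the actual LSF cluster interface.

```python
# Same model and eval settings as the workstation example; only the
# cluster spec changes.  Field names are assumptions; consult the
# docstrings for the exact LSF cluster interface.
model_kwargs = {'gpu': True, 'model_type': 'cyto3'}
eval_kwargs = {'diameter': 30, 'channels': [0, 0], 'do_3D': True}
cluster_kwargs = {
    'ncpus': 2,         # cpu cores per worker
    'min_workers': 8,   # autoscale between 8 and 128 gpu workers
    'max_workers': 128,
}

if __name__ == '__main__':
    import zarr
    from cellpose.contrib.distributed_segmentation import distributed_eval
    segments, boxes = distributed_eval(
        input_zarr=zarr.open('/path/to/input.zarr'),  # hypothetical path
        blocksize=(256, 256, 256),
        write_path='/path/to/segments.zarr',          # hypothetical path
        model_kwargs=model_kwargs,
        eval_kwargs=eval_kwargs,
        cluster_kwargs=cluster_kwargs,
    )
```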
(Note this example is identical to the previous one, with only a few small changes to the cluster_kwargs; i.e. it is easy to go back and forth between workstations and clusters.)

Testing a single block before running a distributed computation:
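A sketch of calling the per-block function directly on one small crop to sanity-check settings before a full run. The real parameter list of process_block is in its docstring; the argument names below are assumptions.

```python
# A small, representative crop of the full volume.
crop = (slice(0, 256), slice(0, 256), slice(0, 256))
model_kwargs = {'gpu': True, 'model_type': 'cyto3'}
eval_kwargs = {'diameter': 30, 'channels': [0, 0], 'do_3D': True}

if __name__ == '__main__':
    import zarr
    from cellpose.contrib.distributed_segmentation import process_block
    # Hypothetical call: run the per-block worker function on one crop
    # before committing to the big-data segmentation.  Argument names
    # are assumptions; see the process_block docstring.
    result = process_block(
        crop=crop,
        input_zarr=zarr.open('/path/to/input.zarr'),  # hypothetical path
        model_kwargs=model_kwargs,
        eval_kwargs=eval_kwargs,
    )
```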
Wrap a folder of tiff images/tiles into a single Zarr array:
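A minimal sketch of wrapping the tiff folder; the filename pattern and the exact arguments of wrap_folder_of_tiffs (pattern, tile layout, etc.) are assumptions documented in its docstring.

```python
# Hypothetical glob pattern matching the tiff tiles.
tiff_pattern = '/path/to/tiff/folder/*.tif'

if __name__ == '__main__':
    from cellpose.contrib.distributed_segmentation import wrap_folder_of_tiffs
    # Represent the folder as a single zarr array without copying data,
    # then hand it to distributed_eval as the input array.
    zarr_array = wrap_folder_of_tiffs(tiff_pattern)
```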
Converting a large single tiff image to Zarr:
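A sketch of the conversion, assuming tifffile for reading and the numpy_array_to_zarr helper named above; its argument names are assumptions, so see its docstring. Reading the tiff still requires enough RAM for this one step, after which the chunked Zarr copy can be processed block by block.

```python
# Chunk shape for the output zarr; a few hundred voxels per axis is a
# sensible starting point.
chunks = (256, 256, 256)

if __name__ == '__main__':
    import tifffile
    from cellpose.contrib.distributed_segmentation import numpy_array_to_zarr
    # Hypothetical paths; argument names are assumptions, see the
    # numpy_array_to_zarr docstring.
    image = tifffile.imread('/path/to/big_image.tif')
    zarr_array = numpy_array_to_zarr(
        '/path/to/big_image.zarr', image, chunks=chunks,
    )
```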