
Larger than memory images: performant and scalable distributed implementation for workstations and clusters#1062

Merged
carsen-stringer merged 65 commits into MouseLand:main from GFleishman:main
Feb 7, 2025

Conversation

@GFleishman
Contributor

This PR solves #1061 by adding distributed_segmentation.py, a self-contained module that provides the ability to segment larger-than-memory images on a workstation or cluster. Images are partitioned into overlapping blocks that are each processed separately, in parallel or in series (e.g. if you only have a single gpu). Per-block results are seamlessly stitched into a single segmentation of the entire larger-than-memory image.

Windows, Linux, and MacOS workstations, as well as LSF clusters, are automatically supported.
Other cluster managers, such as SLURM or SGE, require implementing your own dask cluster class, which is a good opportunity to submit a PR and be added to the author list. I am happy to advise anyone doing this.

The preferred input is a Zarr or N5 array, however folders full of tiff images are also supported. Single large tiff files can be converted to Zarr with the module itself. Your workstation or cluster can be arbitrarily partitioned into workers with arbitrary resources, e.g. "10 workers, 2 cpu cores each, 1 gpu each" or if you have a workstation with a single gpu, "1 worker with 8 cpu cores and 1 gpu." Computation never exceeds the given worker specification - so you can process huge datasets without occupying your entire machine.

Compatible with any Cellpose model. Small crops can be tested before committing to a big-data segmentation by calling the function that runs on each individual block directly. A foreground mask can be provided, ensuring no time is wasted on voxels that do not contain sample. An arbitrary list of preprocessing steps can be distributed along with Cellpose itself, so if you need to smooth, sharpen, or otherwise transform the data before segmenting, you don't need to do it in advance and save a processed version of your large dataset - you can just distribute those preprocessing functions along with the segmentation.
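As a hedged sketch of how a preprocessing step might look (the exact `(function, kwargs)` tuple format is an assumption here; check the `distributed_eval` docstring for the authoritative signature), each step is a plain function that receives a block and returns a transformed block:

```python
import numpy as np

def normalize(image, lower=1, upper=99):
    # percentile-normalize each block to [0, 1] before Cellpose sees it
    lo, hi = np.percentile(image, [lower, upper])
    return np.clip((image - lo) / max(hi - lo, 1e-8), 0.0, 1.0)

# assumed format: a list of (function, kwargs) pairs passed to distributed_eval
preprocessing_steps = [(normalize, {'lower': 1, 'upper': 99})]
```

Because the steps run per block on the workers, the original array on disk is never modified.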

Installation from scratch in a fresh conda environment tested successfully by @snoreis on a machine with the following specs:
OS: Windows 11 Pro
CPU: 16-core Threadripper PRO 3955WX
GPU: NVIDIA RTX A5000

Of course also tested in my own environments.
Workstation:
OS: Rocky Linux 9.3
CPU: 8-core Intel Sky Lake
GPU: 1x NVIDIA Tesla L4 15GB

Cluster:
OS: Rocky Linux 9.3
CPU: 100 cores Intel Sky Lake
GPU: 100x NVIDIA Tesla L4 15GB

List of functions provided (all have verbose docstrings covering inputs and outputs):
distributed_eval : run cellpose on a big image on any machine
process_block : the function that is run on each block from a big dataset, can be called on its own for testing
numpy_array_to_zarr : create a zarr array, preferred input to distributed_eval
wrap_folder_of_tiffs : represent folder of tiff files as zarr array without duplicating data

New dependencies are correctly declared and install successfully from source with: pip install -e .[distributed]

Examples
Run distributed Cellpose on half the resources of a workstation with 16 cpus, 1 gpu, and 128GB system memory:

from cellpose.contrib.distributed_segmentation import distributed_eval

# parameterize cellpose however you like
model_kwargs = {'gpu':True, 'model_type':'cyto3'}
eval_kwargs = {'diameter':30,
               'z_axis':0,
               'channels':[0,0],
               'do_3D':True,
}

# define myLocalCluster parameters
cluster_kwargs = {
    'n_workers':1,    # we only have 1 gpu, so no need for more workers
    'ncpus':8,
    'memory_limit':'64GB',
    'threads_per_worker':1,
}

# run segmentation
# segments: zarr array containing labels
# boxes: list of bounding boxes around all labels
segments, boxes = distributed_eval(
    input_zarr=large_zarr_array,
    blocksize=(256, 256, 256),
    write_path='/where/zarr/array/containing/results/will/be/written.zarr',
    model_kwargs=model_kwargs,
    eval_kwargs=eval_kwargs,
    cluster_kwargs=cluster_kwargs,
)

Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. Janelia cluster)
(Note this example is identical to the previous one, with only a few small changes to the cluster_kwargs; i.e. it is easy to go back and forth between workstations and clusters.)

from cellpose.contrib.distributed_segmentation import distributed_eval

# parameterize cellpose however you like
model_kwargs = {'gpu':True, 'model_type':'cyto3'}
eval_kwargs = {'diameter':30,
               'z_axis':0,
               'channels':[0,0],
               'do_3D':True,
}

# define LSF cluster parameters
cluster_kwargs = {
    'ncpus':2,           # cpus per worker
    'min_workers':8,     # cluster auto allocates and releases workers based on number of blocks left to process
    'max_workers':128,
    'queue':'gpu_l4',
    'job_extra_directives':['-gpu "num=1"'],
}

# run segmentation
# segments: zarr array containing labels
# boxes: list of bounding boxes around all labels
segments, boxes = distributed_eval(
    input_zarr=large_zarr_array,
    blocksize=(256, 256, 256),
    write_path='/where/zarr/array/containing/results/will/be/written.zarr',
    model_kwargs=model_kwargs,
    eval_kwargs=eval_kwargs,
    cluster_kwargs=cluster_kwargs,
)

Testing a single block before running a distributed computation:

from cellpose.contrib.distributed_segmentation import process_block

# define a crop as the distributed function would
starts = (128, 128, 128)
blocksize = (256, 256, 256)
overlap = 60
crop = tuple(slice(s-overlap, s+b+overlap) for s, b in zip(starts, blocksize))

# call the segmentation
segments, boxes, box_ids = process_block(
    block_index=(0, 0, 0),  # when test_mode=True this is just a dummy value
    crop=crop,
    input_zarr=my_zarr_array,
    model_kwargs=model_kwargs,
    eval_kwargs=eval_kwargs,
    blocksize=blocksize,
    overlap=overlap,
    output_zarr=None,
    test_mode=True,
)

Wrap a folder of tiff images/tiles into a single Zarr array:

# Note tiff filenames must indicate the position of each file in the overall tile grid
from cellpose.contrib.distributed_segmentation import wrap_folder_of_tiffs
reconstructed_virtual_zarr_array = wrap_folder_of_tiffs(
    filename_pattern='/path/to/folder/of/*.tiff',
    block_index_pattern=r'_(Z)(\d+)(Y)(\d+)(X)(\d+)',
)
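To illustrate what that `block_index_pattern` does (a hedged sketch; the filename `tile_Z0Y1X2.tiff` is a hypothetical example, and the actual parsing inside `wrap_folder_of_tiffs` may differ in detail), the regex recovers each tile's (z, y, x) grid position from its filename:

```python
import re

pattern = r'_(Z)(\d+)(Y)(\d+)(X)(\d+)'
filename = 'tile_Z0Y1X2.tiff'            # hypothetical tile filename
groups = re.search(pattern, filename).groups()  # ('Z','0','Y','1','X','2')
index = tuple(int(g) for g in groups[1::2])     # keep only the numeric groups
print(index)  # (0, 1, 2)
```

Filenames that do not encode their grid position this way will need a different `block_index_pattern`.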

Converting a large single tiff image to Zarr:

# Note full image will be loaded in system memory
import tifffile
from cellpose.contrib.distributed_segmentation import numpy_array_to_zarr
data_numpy = tifffile.imread('/path/to/image.tiff')
data_zarr = numpy_array_to_zarr('/path/to/output.zarr', data_numpy, chunks=(256, 256, 256))
del data_numpy

@GFleishman
Contributor Author

@krokicki

@GFleishman
Contributor Author

Recently confirmed this works with custom pre-trained models. Also works with multi-channel inputs, through creative use of the preprocessing_steps argument.
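One way the "creative use of the preprocessing_steps argument" for multi-channel inputs might look (a hedged sketch, not the author's confirmed approach; the `(function, kwargs)` step format is an assumption): collapse the channel axis before Cellpose runs on each block, so the segmentation sees a single-channel volume.

```python
import numpy as np

def max_project_channels(image, channel_axis=0):
    # reduce a multi-channel block to one channel via a max projection
    return image.max(axis=channel_axis)

# assumed step format: (function, kwargs) pairs handed to distributed_eval
preprocessing_steps = [(max_project_channels, {'channel_axis': 0})]
```

A weighted channel combination or a single-channel selection would follow the same pattern.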

@codecov

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 52.69%. Comparing base (a70f773) to head (c913196).
Report is 101 commits behind head on main.

Files with missing lines Patch % Lines
cellpose/dynamics.py 0.00% 1 Missing ⚠️
cellpose/io.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1062      +/-   ##
==========================================
- Coverage   53.17%   52.69%   -0.49%     
==========================================
  Files          18       18              
  Lines        4139     4302     +163     
==========================================
+ Hits         2201     2267      +66     
- Misses       1938     2035      +97     


@carsen-stringer carsen-stringer merged commit 942d541 into MouseLand:main Feb 7, 2025
12 checks passed
