contrib: Alternative distributed_segmentation, automated merge/split tools, ellipsoid shape filter #804
GFleishman wants to merge 47 commits into MouseLand:main
…tributed update my local working branch with package changes, hopefully fixes logging
Merge branch 'main' of https://github.com/MouseLand/cellpose into distributed
|
Maybe the only issue to discuss is that I added my own dependencies to the setuptools files, such as dask and my small package for building clusters, ClusterWrap. I've left these in for now, but I'm happy to remove them if that's preferable for merging. I don't know the right way to handle a contrib that has a unique dependency. I guess they should be optional dependencies, but I don't know how to set that up. If you prefer them formatted that way, I can learn how to do it. |
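One common way to handle a contrib-only dependency is a setuptools "extra" plus a guarded import, so the core install stays light. The sketch below is only an illustration: the extra name `distributed`, the helper `require_optional`, and the exact package list are assumptions, not something the cellpose maintainers have agreed to.

```python
import importlib

# Hypothetical extras table. In setup.py it would be passed as
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE).
# The extra name "distributed" and the package list are assumptions.
EXTRAS_REQUIRE = {
    "distributed": ["dask", "distributed", "ClusterWrap"],
}


def require_optional(module_name, extra):
    """Import an optional module, or fail with an install hint.

    `require_optional` is a hypothetical helper, not part of cellpose.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this contrib module; "
            f"install it with: pip install 'cellpose[{extra}]'"
        ) from err
```

A contrib module would then call something like `dask = require_optional("dask", "distributed")` at import time, and users who want the distributed tools would install the package with the extra enabled.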
|
Hi Greg, thanks for the PR. However, we'll be moving all contrib files into a separate repo and we will let you know where that is so you can pull your code into it directly. |
|
Hi Marius - sounds good, I'll wait until the contrib repo is set up. Is it too much to ask for you to notify me here when that is complete, or is there another way I can find out when the repo is ready without bothering you? |
|
We'll definitely notify you, thanks Greg. |
…into distributed
|
@carsen-stringer @marius10p
This code is thoroughly tested on both a cluster and a workstation. Switching between the two only requires changing a few parameters. I have a Jupyter notebook with several use-case examples that I'm happy to share if you want to evaluate it yourselves. There are a few desirable things that I intend to add but have not yet included:
|
|
Hi Greg, I am working on running cellpose on some large (TB-scale) lightsheet data and came across your dask implementation here. I'm working with an SGE cluster and have tried modifying the script to handle this, but I'm wondering if you'd be willing to share your Jupyter notebook / use case example code? I'm running into some issues that I believe are related to the dask client/scheduler setup, but I want to rule out any other part of my implementation causing the issue. Thanks! |
…ogs. (1) io.logger_setup modified to accept alternative log file to stdout stream (2) distributed_eval creates datetime stamped log directory (3) individual workers create their own log files tagged with their name/index
…by default; no additional coding needed to leverage workstations with gpus
…tifffile - tifffile.imread(..., aszarr=True, ...) returns non-serializable array with single tiff input
…ust releases gpus and hard codes 1 cpu per worker - stitching is cheap, this will always fit
…anelia LSF cluster cases
…cases in best way available given limitations of tiff files
|
Superseded by #1062 |
All contributions are in their own modules in the contrib folder - no cellpose files were modified.
There is an existing distributed_segmentation in contrib, well written by the knowledgeable @chrisroat. However, for our own work we have favored an alternative implementation that relies less on dask.array, is more permissive in overlap sizes, and provides tools for the user to set up a cluster object on which to run the distributed computation. This implementation is thoroughly tested in our own environment and already integrated into several workflows. We would now like to make it available to external users (within our institute and some abroad as well), and we'd like to do that by wrapping the primary cellpose repository instead of my fork. So now I'm back to see about getting these tools merged.
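To illustrate what block-wise processing with a free choice of overlap can look like, here is a minimal sketch that tiles an array shape into blocks and pads each block by a fixed number of voxels. The function name `overlapping_blocks` and its parameters are hypothetical and are not taken from this PR's code.

```python
import itertools


def overlapping_blocks(shape, blocksize, overlap):
    """Return one tuple of slices per block, each padded by `overlap` voxels.

    Hypothetical helper for illustration only. Blocks tile the array on a
    regular grid; padding is clipped at the array boundaries.
    """
    per_axis = []
    for size, block in zip(shape, blocksize):
        axis_slices = []
        for start in range(0, size, block):
            lo = max(0, start - overlap)
            hi = min(size, start + block + overlap)
            axis_slices.append(slice(lo, hi))
        per_axis.append(axis_slices)
    # Cartesian product over axes gives every block in the grid
    return list(itertools.product(*per_axis))
```

Each tuple can be used directly to index a NumPy or zarr array, so every worker reads only its own padded block; the overlap regions are what a later stitching step would use to reconcile labels across block boundaries.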
Some additional things that have been helpful for us are automated merge and split functions. Our samples typically have ~200K cells so we cannot QC them by hand. We use size and shape to determine which segments have underperformed and then merge or split them as necessary. These tools are not very sophisticated but they do help more than they hurt.
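As a rough illustration of size-based screening, the sketch below counts voxels per label and flags segments that fall outside a size range. The function name and the thresholds are hypothetical; the actual merge and split tools in this PR may use different criteria.

```python
import numpy as np


def flag_segments_by_size(labels, min_voxels, max_voxels):
    """Return (too_small, too_large) lists of label ids.

    Hypothetical helper. `labels` is a non-negative integer label image
    where 0 is background. Small segments are merge candidates and large
    segments are split candidates.
    """
    counts = np.bincount(labels.ravel())
    ids = np.nonzero(counts)[0]
    ids = ids[ids != 0]  # drop background
    too_small = [int(i) for i in ids if counts[i] < min_voxels]
    too_large = [int(i) for i in ids if counts[i] > max_voxels]
    return too_small, too_large
```

With roughly 200K segments, a single `np.bincount` pass like this is cheap compared with inspecting segments one by one, which is why a simple size filter is a reasonable first screen before any shape-based check.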
Finally, we are typically segmenting nuclei and like to have some measure of how well the cellpose segments match an elliptical shape - so some tools for fitting ellipsoids to a large number of cellpose segments are also included.
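One simple way to score how ellipsoid-like a segment is, without an iterative fit, is to compare its voxel count with the volume of the ellipsoid that has the same second moments. This is a sketch of that moment-matching idea only; it is an assumption for illustration and not necessarily the fitting method used in this PR, and the function name is hypothetical.

```python
import numpy as np


def ellipsoid_fill_ratio(mask):
    """Voxel count divided by the volume of the moment-matched ellipsoid.

    Hypothetical helper. `mask` is a 3D boolean array for one segment.
    A uniform solid ellipsoid with semi-axes (a, b, c) has coordinate
    variances (a^2/5, b^2/5, c^2/5), so the matched ellipsoid volume is
    4/3 * pi * sqrt(5^3 * det(cov)). Values near 1 suggest an
    ellipsoid-like segment; clearly lower values suggest another shape.
    """
    coords = np.argwhere(mask).astype(float)  # shape (n_voxels, 3)
    if len(coords) < 4:
        return 0.0  # too few voxels for a 3D covariance
    cov = np.cov(coords, rowvar=False)
    det = np.linalg.det(cov)
    if det <= 0:
        return 0.0  # flat or degenerate segment
    ellipsoid_volume = (4.0 / 3.0) * np.pi * np.sqrt(5.0**3 * det)
    return len(coords) / ellipsoid_volume
```

Because only a covariance matrix per segment is needed, this kind of score scales to very large numbers of segments. Voxel discretization can push the ratio slightly above 1 for small, compact segments, so any threshold would need tuning on real data.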