Optimize fill_holes_and_remove_small_masks#1116
Merged
carsen-stringer merged 2 commits into MouseLand:main on Apr 7, 2025
…ll, a more optimized function for filling in 2D and 3D contours
Contributor
Author
I ran some tests using a single-channel z-stack of 10x2048x2048, with ~1700 nuclei/slice. There is a definite increase in speed (4-5x), and it should scale a lot better for large datasets with thousands of cells per slice.
[Timing screenshots: "Old method", "New method", "Only replacing ..."]
Member
Thanks, I implemented the counts computation with fastremap as well; this is about 2x faster than np.bincount on 2000x2000 images on my computer. It would then also make sense to do the loop in order of mask size in the future, although that slightly changes the output (changing the regression tests), so I will leave it as a to-do.
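The "loop in order of mask size" idea can be sketched as follows. This is only an illustration, not the code merged in the PR: NumPy stands in for fastremap to keep the example dependency-free, the toy `labels` array is made up, and smallest-first ordering is an assumption, since the comment does not say which direction is intended.

```python
import numpy as np

# Toy label image (hypothetical data): 0 is background, 1..3 are masks.
labels = np.array([[0, 1, 1, 1],
                   [2, 2, 0, 1],
                   [3, 0, 0, 0]])

# Pixel count per label present in the image.
ids, counts = np.unique(labels, return_counts=True)

# Drop the background label before ordering.
keep = ids != 0
ids, counts = ids[keep], counts[keep]

# Visit masks from smallest to largest (assumed direction).
order = np.argsort(counts, kind="stable")
ids_by_size = ids[order]
counts_by_size = counts[order]
```

A per-mask loop would then iterate over `ids_by_size` instead of over label IDs in numeric order, which is why the output (and the regression tests) could change.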
Identifying and replacing bottlenecks in the fill_holes_and_remove_small_masks function

For my own projects, I was depending on fill_holes_and_remove_small_masks, which would often slow down the processing throughput of large datasets. I managed to narrow down the bottleneck to scipy.ndimage.morphology.binary_fill_holes. When this is replaced by a more optimized algorithm like fill_voids.fill (documentation), we can get massive speedups, especially since it can easily be rewritten into a multithreaded calculation in this way.

Another change proposed in this PR is the separation of small mask filtering and the fill_holes operation. This approach results in rapid filtering of small masks by counting the labels in the flattened label image (so it supports 2D & 3D) using np.bincount. Labels whose counts are below min_size are set to 0 through the np.isin filter. The advantage of splitting the for-loop into two components is that we can now omit calculating the sum of every individual mask, which saves more time in the end.
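A minimal sketch of the small-mask filtering step described above, assuming a non-negative integer label image with 0 as background. The function name `remove_small_masks`, the default `min_size`, and the toy array are illustrative and not taken from the PR's diff; the hole-filling step is left out.

```python
import numpy as np

def remove_small_masks(masks, min_size=15):
    """Zero out labels covering fewer than min_size pixels (2D or 3D)."""
    # Count pixels per label on the flattened image;
    # counts[i] is the number of pixels carrying label i.
    counts = np.bincount(masks.ravel())
    # Labels below the size threshold, excluding background (label 0).
    small = np.flatnonzero(counts < min_size)
    small = small[small != 0]
    # Set every pixel belonging to a too-small label to 0 in a copy.
    out = masks.copy()
    out[np.isin(masks, small)] = 0
    return out

# Toy example (hypothetical data): label 1 has 9 pixels, label 2 has 4.
masks = np.zeros((6, 6), dtype=np.int32)
masks[0:3, 0:3] = 1
masks[4:6, 4:6] = 2
filtered = remove_small_masks(masks, min_size=5)
```

Because the counts come from one pass over the flattened array, no per-mask sum is needed, and the same function works unchanged on a 3D stack.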