Optimize fill_holes_and_remove_small_masks #1116

Merged
carsen-stringer merged 2 commits into MouseLand:main from Tomvl117:fill_holes_replacement
Apr 7, 2025

@Tomvl117
Contributor

Identifying and replacing bottlenecks in the fill_holes_and_remove_small_masks function

For my own projects I depend on fill_holes_and_remove_small_masks, which often limits throughput when processing large datasets. I narrowed the bottleneck down to scipy.ndimage.morphology.binary_fill_holes. Replacing it with a more optimized implementation such as fill_voids.fill (documentation) gives a large speedup, especially since the computation can then easily be rewritten to run multithreaded.
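As a rough illustration of the swap described above, here is a minimal sketch of a per-mask hole-filling loop for a 2D label image. The function name `fill_holes` and the try/except fallback are assumptions made for this example and are not taken from the PR diff; `fill_voids.fill` is only used if the `fill_voids` package happens to be installed, otherwise the sketch falls back to SciPy's `binary_fill_holes`.

```python
import numpy as np
from scipy.ndimage import find_objects

try:
    # Optional dependency named in the PR description (assumed installed).
    import fill_voids
    fill = fill_voids.fill
except ImportError:
    # Fallback so the sketch still runs without fill_voids.
    from scipy.ndimage import binary_fill_holes as fill


def fill_holes(masks):
    """Fill holes inside each labelled mask of a 2D label image (sketch)."""
    out = masks.copy()
    # find_objects returns one bounding-box slice per label 1..max (None if absent).
    for i, slc in enumerate(find_objects(masks)):
        if slc is None:
            continue
        msk = masks[slc] == (i + 1)          # binary mask of this label only
        filled = np.asarray(fill(msk)).astype(bool)
        out[slc][filled] = i + 1             # write the filled mask back in place
    return out
```

Because each mask is cropped to its bounding box and processed independently, the loop body is the part that could be handed to a thread pool.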

This PR also separates small-mask filtering from the fill_holes operation. Small masks are filtered quickly by counting the pixels of each label in the flattened label image with np.bincount, which works for both 2D and 3D. Labels whose counts fall below min_size are set to 0 using an np.isin filter. Splitting the for-loop into two parts means the sum no longer has to be computed for every individual mask, which saves further time.
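The filtering step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the PR: the function name `remove_small_masks` and the default `min_size` value are assumptions chosen for the example.

```python
import numpy as np


def remove_small_masks(masks, min_size=15):
    """Set labels with fewer than min_size pixels to 0 (2D or 3D label image)."""
    # counts[k] is the number of pixels carrying label k (index 0 is background).
    counts = np.bincount(masks.ravel())
    small = np.nonzero(counts < min_size)[0]
    small = small[small > 0]                 # leave the background label alone
    out = masks.copy()
    out[np.isin(out, small)] = 0             # one vectorized pass over the image
    return out


# Usage: label 1 covers 9 pixels, label 2 covers a single pixel.
masks = np.zeros((6, 6), dtype=np.int32)
masks[0:3, 0:3] = 1
masks[5, 5] = 2
filtered = remove_small_masks(masks, min_size=4)
```

Since the counts come from the flattened array, the same function works unchanged on a 3D stack.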

@Tomvl117
Contributor Author

I ran some tests on a single-channel z-stack of 10x2048x2048 with ~1700 nuclei per slice. There is a definite increase in speed (4-5x), and it should scale much better for large datasets with thousands of cells per slice.

Old method:

Time to fill holes and remove small masks (dynamics.py): 0.2030031681060791
Time to fill holes and remove small masks (dynamics.py): 0.24500226974487305
Time to fill holes and remove small masks (dynamics.py): 0.22499322891235352
Time to fill holes and remove small masks (dynamics.py): 0.27199268341064453
Time to fill holes and remove small masks (dynamics.py): 0.37999701499938965
Time to fill holes and remove small masks (dynamics.py): 0.25299882888793945
Time to fill holes and remove small masks (dynamics.py): 0.25899791717529297
Time to fill holes and remove small masks (dynamics.py): 0.2679927349090576
Time to fill holes and remove small masks (dynamics.py): 0.2779998779296875
Time to fill holes and remove small masks (dynamics.py): 0.3139975070953369
Time to fill holes and remove small masks (models.py): 2.598935842514038

New method:

Time to fill holes and remove small masks (dynamics.py): 0.054001808166503906
Time to fill holes and remove small masks (dynamics.py): 0.054007768630981445
Time to fill holes and remove small masks (dynamics.py): 0.05601000785827637
Time to fill holes and remove small masks (dynamics.py): 0.05900120735168457
Time to fill holes and remove small masks (dynamics.py): 0.18400335311889648
Time to fill holes and remove small masks (dynamics.py): 0.06300115585327148
Time to fill holes and remove small masks (dynamics.py): 0.06300187110900879
Time to fill holes and remove small masks (dynamics.py): 0.06600785255432129
Time to fill holes and remove small masks (dynamics.py): 0.06800031661987305
Time to fill holes and remove small masks (dynamics.py): 0.07099556922912598
Time to fill holes and remove small masks (models.py): 1.0970180034637451

Only replacing scipy.ndimage.morphology.binary_fill_holes with fill_voids.fill:

Time to fill holes and remove small masks (dynamics.py): 0.06300568580627441
Time to fill holes and remove small masks (dynamics.py): 0.0650029182434082
Time to fill holes and remove small masks (dynamics.py): 0.08400654792785645
Time to fill holes and remove small masks (dynamics.py): 0.06999945640563965
Time to fill holes and remove small masks (dynamics.py): 0.22599387168884277
Time to fill holes and remove small masks (dynamics.py): 0.08199930191040039
Time to fill holes and remove small masks (dynamics.py): 0.10399603843688965
Time to fill holes and remove small masks (dynamics.py): 0.10900402069091797
Time to fill holes and remove small masks (dynamics.py): 0.08099865913391113
Time to fill holes and remove small masks (dynamics.py): 0.0859975814819336
Time to fill holes and remove small masks (models.py): 0.6749813556671143
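For readers who want to check the speedup against the logs above, here is a small sketch that compares the per-slice dynamics.py timings of the old and new methods. The values are copied from the logs and rounded to three decimals; the variable names are just for this example.

```python
from statistics import mean, median

# Per-slice dynamics.py timings from the logs above, rounded to 3 decimals.
old = [0.203, 0.245, 0.225, 0.272, 0.380, 0.253, 0.259, 0.268, 0.278, 0.314]
new = [0.054, 0.054, 0.056, 0.059, 0.184, 0.063, 0.063, 0.066, 0.068, 0.071]

mean_speedup = mean(old) / mean(new)
median_speedup = median(old) / median(new)

print(f"mean speedup:   {mean_speedup:.2f}x")
print(f"median speedup: {median_speedup:.2f}x")
```

On these rounded values the median ratio lands slightly above 4x; the mean ratio is somewhat lower.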

@carsen-stringer carsen-stringer merged commit 5a89b44 into MouseLand:main Apr 7, 2025
@carsen-stringer
Member

Thanks! I implemented the counts computation with fastremap as well; this is about 2x faster than np.bincount on 2000x2000 images on my computer. It also makes sense to run the loop in order of mask size in the future, although that slightly changes the output (changing the regression tests), so I will leave it as a to-do.
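A hedged sketch of what counting labels with fastremap and ordering them by size could look like. The helper names `label_counts` and `labels_by_size` are hypothetical, the smallest-first sort direction is an assumption (the comment does not say which order is intended), and the code falls back to `np.unique` when fastremap is not installed.

```python
import numpy as np

try:
    import fastremap  # optional dependency (assumed installed)

    def label_counts(masks):
        # fastremap.unique mirrors np.unique's return_counts option.
        return fastremap.unique(masks, return_counts=True)
except ImportError:
    def label_counts(masks):
        return np.unique(masks, return_counts=True)


def labels_by_size(masks):
    """Return non-background labels and pixel counts, smallest mask first."""
    labels, counts = label_counts(masks)
    keep = labels > 0                        # drop the background label 0
    labels, counts = labels[keep], counts[keep]
    order = np.argsort(counts, kind="stable")  # assumption: ascending size
    return labels[order], counts[order]
```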

2 participants