fix: drop degenerate/MultiPolygon geometry in polygon workers#145
Conversation
Negative polygon padding (`buffer(<0)`) and `simplify` can turn a cell outline into an empty geometry (small objects shrink to nothing) or split it into a MultiPolygon (objects pinched in two). Both broke uploads: - an empty geometry became `coordinates: []`, which the server rejects with HTTP 400 `data.coordinates must contain at least 1 items` — and since annotations upload as one batch, a single bad polygon failed the *entire* batch (e.g. all 442 in the reported Cellpose-SAM run); - a MultiPolygon has no `.exterior` attribute, so the `.exterior.coords` path raised `AttributeError` and crashed the run. Add a shared `geometry_to_polygon_coords()` helper in annotation_utilities that normalizes any shapely geometry: empty/zero-area pieces are dropped and each MultiPolygon piece becomes its own coordinate list. Wire it into both upload chokepoints (worker_client.create_polygon_annotations and annotation_tools.polygons_to_annotations) so one bad polygon can never fail a whole batch, and into the source loops of the cellpose-family workers (cellposesam, cellpose, condensatenet) and stardist. Covered workers: cellposesam, cellpose, condensatenet (source loops); stardist (direct upload loop); sam2_refine/propagate/video (via the polygons_to_annotations chokepoint). Tests: 9 new in annotation_utilities, 8 new in worker_client (red->green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 399df155fb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| annotation['tags'] = tags | ||
|
|
||
| annotations.append(annotation) | ||
| for ring in geometry_to_polygon_coords(polygon): |
There was a problem hiding this comment.
Preserve one-to-one SAM2 annotation mapping
When a SAM2 mask is eroded/simplified into a MultiPolygon (or dropped as degenerate), this loop now emits 0 or multiple annotations for one input mask. sam2_propagate.assign_parent_ids() still requires the child and parent annotation counts to match, and sam2_video.generate_connections() groups all pieces under the same obj_id, so the split/drop cases this helper is meant to handle can still either raise before upload or create same-frame connections; keep one output per tracked object/frame here or update those callers to handle changed cardinality.
Useful? React with 👍 / 👎.
| if exterior is None or geometry.area <= 0: # not a polygon, or degenerate sliver | ||
| return [] |
There was a problem hiding this comment.
Keep filtering invalid polygon geometries
The CondensateNet smoothing path previously appended a polygon only when smoothed_polygon.is_valid, but this shared helper accepts any non-empty positive-area polygon. When mask_to_polygons/stitch_polygons produces a self-intersecting outline and smoothing does not repair it, the worker now uploads invalid rings that used to be dropped, which can corrupt downstream geometry-based measurements; include geometry.is_valid in this check or restore the call-site guard.
Useful? React with 👍 / 👎.
…olygons Two issues from Codex review of PR #145: P1 (SAM2 cardinality): polygons_to_annotations now emitted 0 (degenerate) or N (MultiPolygon) annotations per input mask, but sam2_propagate.assign_parent_ids() requires child/parent counts to match and sam2_video groups pieces by obj_id per frame — so a split mask raised before upload or created spurious same-frame connections. Add a `keep_largest_only` mode to geometry_to_polygon_coords that collapses a multi-part geometry to its single largest valid piece, and use it in polygons_to_annotations so each input mask maps to at most one annotation. (The fully-degenerate-to-empty case still yields 0; that was already an empty-coords 400 before this branch and fully supporting it needs SAM2 caller changes — flagged as follow-up.) P2 (invalid geometries): the shared helper accepted any non-empty positive-area polygon, dropping the `is_valid` filter that CondensateNet's smoothing path (and the SAM2/SAM source functions) applied. A self-intersecting outline that survives simplify could be uploaded and corrupt downstream geometry measurements. Add `is_valid` to the helper's filter so invalid geometries are dropped everywhere (restores CondensateNet's guard as defense-in-depth). Tests: +3 in annotation_utilities (invalid-drop, keep_largest collapse, polygons_to_annotations invalid/collapse), +1 in worker_client (invalid-drop). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h
|
Thanks @codex — both findings were real and are fixed in P1 — SAM2 1:1 mapping ( Branch-wide audit of the 1:1-cardinality assumption across all three
Residual (flagged, not fixed here): a mask eroded to fully empty/invalid still yields 0 annotations — there's no valid geometry to emit. That was already a failure pre-branch (empty-coords 400), and tolerating it cleanly needs changes to the SAM2 callers' strict count-matching; out of scope for this PR. P2 — invalid geometries. Correct — the shared helper had dropped CondensateNet's Tests: +3 in |
|
To use Codex here, create an environment for this repo. |
…on collapse P1 (sam2_propagate parent/child count parity): polygons_to_annotations can return fewer annotations than input masks when a propagated mask erodes to empty/invalid geometry (or has no contour), but the parent list was matched by index afterward and assign_parent_ids() asserted exact count parity — so a single degenerate mask raised ValueError and failed the run (earlier than the old server 400). Convert each mask individually and carry its parent annotation through, assigning parentId inline; a dropped mask now simply skips its parent instead of shifting the alignment. Removed the now-unused assign_parent_ids(). P2 (geometry_to_polygon_coords keep_largest_only): the collapse only inspected immediate sub-geometries with .exterior, so GeometryCollection([MultiPolygon]) returned [] even though it contains valid polygons. Recurse to flatten all leaf rings first, then pick the single largest by area — matching the documented GeometryCollection normalization. Tests: +1 in annotation_utilities (nested GeometryCollection collapse). sam2 workers have no local test harness (heavy GPU deps); the propagate change is verified by inspection + py_compile and should be validated on a real run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h
Problem
A Cellpose-SAM run with Padding = -1 failed on annotation upload:
Negative padding (
buffer(<0)) andsimplifycan degrade a cell outline two ways:coordinates: [], which the server rejects. Because annotations upload as a single batch, one bad polygon fails the entire batch (all 442 in the reported run)..exteriorattribute →AttributeErrorcrashes the run. (This is also the general "malformed/multi-polygon causes trouble on a run" failure.)Fix
New shared helper
geometry_to_polygon_coords()inannotation_utilitiesnormalizes any shapely geometry: empty / zero-area pieces are dropped, and each piece of a MultiPolygon becomes its own coordinate list. It's wired into:worker_client.create_polygon_annotations(cellpose-family batch uploads)annotation_utilities.polygons_to_annotations(the SAM2 family)cellposesam,cellpose,condensatenet(run_model), andstardist(direct upload loop, preserving its non-swapped(x,y)convention).Coverage
run_modelpolygons_to_annotationschokepointNot touched (not vulnerable):
sam_automatic_mask_generator(no padding; already guardsis_valid/is_empty) andannulus_generator_M1(positive-only annulus dilation). Flagged for later:blob_random_forest_classifiershares the latent risk in a property/rasterization path (no annotation upload, so no 400).Tests
annotation_utilities: 9 new (helper +polygons_to_annotationsempty/MultiPolygon/degenerate, plus x/y-swap regression). 12/12 pass.worker_client: 8 new (helper +create_polygon_annotationsskips empty/unconstructable/all-degenerate). 11/11 pass.Notes
REGISTRY.mddescription drift into the diff). A separate PR will remove those hooks.scripts/push_workers_to_ecr.py).🤖 Generated with Claude Code
https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h