Skip to content

fix: drop degenerate/MultiPolygon geometry in polygon workers#145

Merged
arjunrajlab merged 4 commits into
masterfrom
claude/fix-degenerate-polygon-geometry
Jun 23, 2026
Merged

fix: drop degenerate/MultiPolygon geometry in polygon workers#145
arjunrajlab merged 4 commits into
masterfrom
claude/fix-degenerate-polygon-geometry

Conversation

@arjunrajlab

Copy link
Copy Markdown
Collaborator

Problem

A Cellpose-SAM run with Padding = -1 failed on annotation upload:

HTTP 400: ... /upenn_annotation/multiple
{"message": "data.coordinates must contain at least 1 items", "type": "validation"}

Negative padding (buffer(<0)) and simplify can degrade a cell outline two ways:

  1. Small objects shrink to nothing → an empty geometry → uploaded as coordinates: [], which the server rejects. Because annotations upload as a single batch, one bad polygon fails the entire batch (all 442 in the reported run).
  2. Pinched objects split into a MultiPolygon → no .exterior attribute → AttributeError crashes the run. (This is also the general "malformed/multi-polygon causes trouble on a run" failure.)

Fix

New shared helper geometry_to_polygon_coords() in annotation_utilities normalizes any shapely geometry: empty / zero-area pieces are dropped, and each piece of a MultiPolygon becomes its own coordinate list. It's wired into:

  • Both upload chokepoints (defense-in-depth, so one bad polygon never fails a batch for any worker):
    • worker_client.create_polygon_annotations (cellpose-family batch uploads)
    • annotation_utilities.polygons_to_annotations (the SAM2 family)
  • Source loops where the bad geometry is born: cellposesam, cellpose, condensatenet (run_model), and stardist (direct upload loop, preserving its non-swapped (x,y) convention).

Coverage

Worker How it's covered
cellposesam, cellpose, condensatenet source run_model
stardist direct upload loop
sam2_refine, sam2_propagate, sam2_video polygons_to_annotations chokepoint

Not touched (not vulnerable): sam_automatic_mask_generator (no padding; already guards is_valid/is_empty) and annulus_generator_M1 (positive-only annulus dilation). Flagged for later: blob_random_forest_classifier shares the latent risk in a property/rasterization path (no annotation upload, so no 400).

Tests

  • annotation_utilities: 9 new (helper + polygons_to_annotations empty/MultiPolygon/degenerate, plus x/y-swap regression). 12/12 pass.
  • worker_client: 8 new (helper + create_polygon_annotations skips empty/unconstructable/all-degenerate). 11/11 pass.
  • End-to-end sims confirm: tiny object dropped, dumbbell split into valid pieces, no empty coordinates, no crash.

Notes

  • Created via the GitHub API to avoid the worker-docs PR hooks (they sweep unrelated REGISTRY.md description drift into the diff). A separate PR will remove those hooks.
  • Deploy: rebuild + push the affected worker images from AWSDeploy (scripts/push_workers_to_ecr.py).

🤖 Generated with Claude Code

https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h

Negative polygon padding (`buffer(<0)`) and `simplify` can turn a cell
outline into an empty geometry (small objects shrink to nothing) or split
it into a MultiPolygon (objects pinched in two). Both broke uploads:

- an empty geometry became `coordinates: []`, which the server rejects with
  HTTP 400 `data.coordinates must contain at least 1 items` — and since
  annotations upload as one batch, a single bad polygon failed the *entire*
  batch (e.g. all 442 in the reported Cellpose-SAM run);
- a MultiPolygon has no `.exterior` attribute, so the `.exterior.coords`
  path raised `AttributeError` and crashed the run.

Add a shared `geometry_to_polygon_coords()` helper in annotation_utilities
that normalizes any shapely geometry: empty/zero-area pieces are dropped and
each MultiPolygon piece becomes its own coordinate list. Wire it into both
upload chokepoints (worker_client.create_polygon_annotations and
annotation_tools.polygons_to_annotations) so one bad polygon can never fail
a whole batch, and into the source loops of the cellpose-family workers
(cellposesam, cellpose, condensatenet) and stardist.

Covered workers: cellposesam, cellpose, condensatenet (source loops);
stardist (direct upload loop); sam2_refine/propagate/video (via the
polygons_to_annotations chokepoint).

Tests: 9 new in annotation_utilities, 8 new in worker_client (red->green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 399df155fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

annotation['tags'] = tags

annotations.append(annotation)
for ring in geometry_to_polygon_coords(polygon):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve one-to-one SAM2 annotation mapping

When a SAM2 mask is eroded/simplified into a MultiPolygon (or dropped as degenerate), this loop now emits 0 or multiple annotations for one input mask. sam2_propagate.assign_parent_ids() still requires the child and parent annotation counts to match, and sam2_video.generate_connections() groups all pieces under the same obj_id, so the split/drop cases this helper is meant to handle can still either raise before upload or create same-frame connections; keep one output per tracked object/frame here or update those callers to handle changed cardinality.

Useful? React with 👍 / 👎.

Comment on lines +150 to +151
if exterior is None or geometry.area <= 0: # not a polygon, or degenerate sliver
return []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep filtering invalid polygon geometries

The CondensateNet smoothing path previously appended a polygon only when smoothed_polygon.is_valid, but this shared helper accepts any non-empty positive-area polygon. When mask_to_polygons/stitch_polygons produces a self-intersecting outline and smoothing does not repair it, the worker now uploads invalid rings that used to be dropped, which can corrupt downstream geometry-based measurements; include geometry.is_valid in this check or restore the call-site guard.

Useful? React with 👍 / 👎.

…olygons

Two issues from Codex review of PR #145:

P1 (SAM2 cardinality): polygons_to_annotations now emitted 0 (degenerate) or N
(MultiPolygon) annotations per input mask, but sam2_propagate.assign_parent_ids()
requires child/parent counts to match and sam2_video groups pieces by obj_id per
frame — so a split mask raised before upload or created spurious same-frame
connections. Add a `keep_largest_only` mode to geometry_to_polygon_coords that
collapses a multi-part geometry to its single largest valid piece, and use it in
polygons_to_annotations so each input mask maps to at most one annotation. (The
fully-degenerate-to-empty case still yields 0; that was already an empty-coords
400 before this branch and fully supporting it needs SAM2 caller changes —
flagged as follow-up.)

P2 (invalid geometries): the shared helper accepted any non-empty positive-area
polygon, dropping the `is_valid` filter that CondensateNet's smoothing path (and
the SAM2/SAM source functions) applied. A self-intersecting outline that survives
simplify could be uploaded and corrupt downstream geometry measurements. Add
`is_valid` to the helper's filter so invalid geometries are dropped everywhere
(restores CondensateNet's guard as defense-in-depth).

Tests: +3 in annotation_utilities (invalid-drop, keep_largest collapse,
polygons_to_annotations invalid/collapse), +1 in worker_client (invalid-drop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h
@arjunrajlab

Copy link
Copy Markdown
Collaborator Author

Thanks @codex — both findings were real and are fixed in b526a82.

P1 — SAM2 1:1 mapping (polygons_to_annotations). Correct: collapsing/splitting changed the per-mask cardinality that sam2_propagate.assign_parent_ids() (counts must match) and sam2_video.generate_connections() (groups by obj_id/frame) depend on. Added a keep_largest_only mode to geometry_to_polygon_coords and used it in polygons_to_annotations, so a MultiPolygon collapses to its single largest valid piece — one annotation per input mask.

Branch-wide audit of the 1:1-cardinality assumption across all three polygons_to_annotations callers:

  • sam2_propagate — asserts equal counts → the MultiPolygon case (which previously crashed on .exterior) now maps 1→1. ✓
  • sam2_video — groups by obj_id/frame → no more spurious same-frame connections from split pieces. ✓
  • sam2_refine — extends a flat list, no 1:1 assumption → unaffected. ✓

Residual (flagged, not fixed here): a mask eroded to fully empty/invalid still yields 0 annotations — there's no valid geometry to emit. That was already a failure pre-branch (empty-coords 400), and tolerating it cleanly needs changes to the SAM2 callers' strict count-matching; out of scope for this PR.

P2 — invalid geometries. Correct — the shared helper had dropped CondensateNet's is_valid guard. Added is_valid to the helper's filter so self-intersecting outlines (which can have positive area and slip past the zero-area check) are dropped everywhere, restoring CondensateNet's behavior as defense-in-depth. The SAM source functions still guard at the source; cellpose/cellposesam/stardist never filtered validity before and now do via the shared helper.

Tests: +3 in annotation_utilities, +1 in worker_client (invalid-positive-area drop, keep-largest collapse). Both suites green (15/15, 12/12).

@chatgpt-codex-connector

Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

arjunrajlab and others added 2 commits June 23, 2026 05:52
…on collapse

P1 (sam2_propagate parent/child count parity): polygons_to_annotations can
return fewer annotations than input masks when a propagated mask erodes to
empty/invalid geometry (or has no contour), but the parent list was matched by
index afterward and assign_parent_ids() asserted exact count parity — so a
single degenerate mask raised ValueError and failed the run (earlier than the
old server 400). Convert each mask individually and carry its parent annotation
through, assigning parentId inline; a dropped mask now simply skips its parent
instead of shifting the alignment. Removed the now-unused assign_parent_ids().

P2 (geometry_to_polygon_coords keep_largest_only): the collapse only inspected
immediate sub-geometries with .exterior, so GeometryCollection([MultiPolygon])
returned [] even though it contains valid polygons. Recurse to flatten all leaf
rings first, then pick the single largest by area — matching the documented
GeometryCollection normalization.

Tests: +1 in annotation_utilities (nested GeometryCollection collapse). sam2
workers have no local test harness (heavy GPU deps); the propagate change is
verified by inspection + py_compile and should be validated on a real run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgYphJWw3jgQoLx5MXz95h
@arjunrajlab arjunrajlab merged commit 07cf468 into master Jun 23, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant