Commit d6c445a
feat: editable vector export pipeline (Stage 8) + prompt v2 + heart/MRI fix
Adds an editable-vector export stage to the pipeline, broadens SAM3 prompt
coverage for scientific/medical figures, and fixes a classification bug
that was rendering medical-image detections as blank white outlines.
## What's new
### Stage 8 — Vector export (new modules)
Per-image output under `output/{image}/vectors/`:
  elements/       individual editable SVGs for every detected element
  rasters/        cropped transparent-background PNGs for image elements
  combined/       single combined.svg (layered) and combined.pdf
  manifest.json   element index with bbox, score, layer, paths
New modules:
  modules/svg_generator.py      hybrid renderer: geometric primitives for
                                known shapes, Chaikin-smoothed polygons for
                                complex contours, base64-embedded crops for
                                raster elements, editable <text> for OCR
  modules/pdf_combiner.py       svglib/cairosvg PDF backend
  modules/section_detector.py   panel detection via SAM3 backgrounds + HoughLinesP
  modules/vector_exporter.py    Stage 8 orchestrator (BaseProcessor subclass)
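
For readers unfamiliar with Chaikin smoothing, here is a minimal sketch of the corner-cutting step on a closed polygon. It is not the code in svg_generator.py; the function name `chaikin_smooth` and the default iteration count are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin_smooth(points: List[Point], iterations: int = 2) -> List[Point]:
    """Smooth a closed polygon with Chaikin corner cutting.

    Each pass replaces every edge (P, Q) with two points located 1/4 and
    3/4 of the way along the edge, so the vertex count doubles per pass.
    Hypothetical helper: name and defaults are illustrative only.
    """
    pts = list(points)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # not a polygon, nothing to smooth
        smoothed: List[Point] = []
        n = len(pts)
        for i in range(n):
            x0, y0 = pts[i]
            x1, y1 = pts[(i + 1) % n]  # wrap around to close the contour
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        pts = smoothed
    return pts


if __name__ == "__main__":
    square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(chaikin_smooth(square, iterations=1))
```

One pass over a square yields an octagon; further passes converge toward a smooth closed curve, which is why the technique suits complex contours better than sharp-cornered primitives.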
CLI:
  --vector-level=granular|section|component|all   (default: granular)
  --no-vectors                                     skip Stage 8
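
A sketch of how these two flags could be declared with argparse. The flag names, choices, and default come from the list above; the `build_parser` function and the help strings are assumptions, and the real entry point may be wired differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the two Stage 8 flags."""
    parser = argparse.ArgumentParser(description="Stage 8 vector export flags (sketch)")
    parser.add_argument(
        "--vector-level",
        choices=["granular", "section", "component", "all"],
        default="granular",
        help="granularity of the exported vectors",
    )
    parser.add_argument(
        "--no-vectors",
        action="store_true",
        help="skip Stage 8 entirely",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--vector-level=section"])
    print(args.vector_level, args.no_vectors)
```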
### Prompt v2 — broader coverage for scientific/medical figures
Total prompts: 19 -> 78
  prompts/image.py        5 -> 29   (CT/MRI/ultrasound, 3D heart/anatomy,
                                    person/crowd icons, computer monitors,
                                    checkerboard/grid patterns, image stacks)
  prompts/shape.py        7 -> 17   (trapezoid, parallelogram, 3D cube,
                                    isometric box, cylinder, color swatch,
                                    small colored square, stack of rectangles)
  prompts/arrow.py        3 -> 17   (thick/block/curved/looping/bidirectional/
                                    dashed/dotted/L-shaped/skip variants)
  prompts/background.py   4 -> 15   (sub-figure panel, dashed border rectangle,
                                    legend box/panel, title bar, header strip)
Config tuning to match (config/config.yaml — gitignored):
  shape.min_area:          200  -> 80    (catches 14x14 legend swatches)
  shape.score_threshold:   0.5  -> 0.45
  arrow.score_threshold:   0.45 -> 0.4
  image.score_threshold:   0.5  -> 0.45
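
The min_area change is easy to check by hand: a 14x14 swatch covers 196 px, just under the old cutoff of 200. The sketch below assumes the filter compares bounding-box area with `>=`; the real pipeline may measure mask area or use a strict comparison, and `passes_min_area` is a hypothetical name.

```python
def passes_min_area(width: int, height: int, min_area: int) -> bool:
    """Hypothetical filter: keep a detection whose bbox area meets the cutoff."""
    return width * height >= min_area


if __name__ == "__main__":
    # A 14x14 legend swatch has an area of 196 px.
    print(passes_min_area(14, 14, min_area=200))  # old setting
    print(passes_min_area(14, 14, min_area=80))   # new setting
```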
### Bug fix — heart/MRI rendered as blank white polygon outlines
Type classification was scattered across three files, each using
case-sensitive string comparisons. IMAGE_PROMPT contains mixed-case names
like "3D heart model" and "MRI image", but every comparison lowercased only
the element type (`elem.type.lower() in CasedSet`) and left the set in its
original casing. "mri image" never matched "MRI image", so those
scientific-image prompts silently fell through and were rendered as white
polygon outlines.
Across 18 figures, this dropped 40 medical detections (36 MRI + 4 heart)
to outline-only. After the fix all 40 are properly extracted as RGBA
crops and embedded as base64 <image> in their SVGs.
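
A minimal reproduction of the mismatch, assuming a three-entry subset of IMAGE_PROMPT. The two mixed-case names are the ones cited above; "photo" and both function names are hypothetical, added only to show that all-lowercase prompts were unaffected.

```python
# Illustrative subset; the real IMAGE_PROMPT list is much longer.
IMAGE_PROMPT = ["photo", "3D heart model", "MRI image"]


def is_raster_buggy(elem_type: str) -> bool:
    """Lowercases the element type but compares against the cased prompts."""
    return elem_type.lower() in set(IMAGE_PROMPT)


def is_raster_fixed(elem_type: str) -> bool:
    """Lowercases both sides before the membership test."""
    return elem_type.lower() in {p.lower() for p in IMAGE_PROMPT}


if __name__ == "__main__":
    for name in IMAGE_PROMPT:
        print(name, is_raster_buggy(name), is_raster_fixed(name))
```

Only prompts containing an uppercase letter fail the buggy check, which matches the symptom: ordinary image types were cropped correctly while heart and MRI detections fell through.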
The fix makes the prompt files the single source of truth:
  modules/svg_generator.py            RASTER_TYPES, GEOMETRIC_SHAPES, ARROW_TYPES
                                      now derived from prompt files via
                                      `_expand_forms()` helper (covers both
                                      space-form and underscore-form normalization)
  modules/icon_picture_processor.py   lowercases IMAGE_PROMPT before comparison
  modules/data_types.py               get_layer_level() imports prompt lists;
                                      specific prompts land in correct layer
                                      (IMAGE/BASIC_SHAPE/ARROW/BACKGROUND)
                                      instead of OTHER
Adding a new prompt now auto-registers for routing, layer assignment, and
raster cropping — no parallel lists to keep in sync.
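
One way such a helper could work, sketched under assumptions: the body of `_expand_forms()` is not shown in this message, so `expand_forms_sketch` below is a hypothetical stand-in that lowercases each prompt and emits both its space-separated and underscore-separated spelling.

```python
from typing import FrozenSet, Iterable


def expand_forms_sketch(prompts: Iterable[str]) -> FrozenSet[str]:
    """Hypothetical stand-in for _expand_forms(); the real helper may differ.

    Returns every prompt lowercased, in both space form ("mri image")
    and underscore form ("mri_image").
    """
    forms = set()
    for prompt in prompts:
        base = prompt.strip().lower()
        forms.add(base.replace("_", " "))
        forms.add(base.replace(" ", "_"))
    return frozenset(forms)


if __name__ == "__main__":
    # Deriving a type set from a prompt list, instead of hand-maintaining it.
    raster_types = expand_forms_sketch(["3D heart model", "MRI image"])
    print(sorted(raster_types))
```

Because the set is derived rather than hand-written, a prompt added to the list becomes a recognized type without touching any other file.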
## Run results on the 18-figure test set
  1,071   individual element SVGs
    425   raster PNGs (was 385 before fix; +40 = the heart/MRI recoveries)
     18   combined SVGs (one per figure)
     18   combined PDFs (one per figure, Affinity-ready)
## Known limitations & future work
Even with broader prompts and the new hierarchical layer assignment, the
pipeline still struggles with **multi-panel / schematic figures**.
Detection happens per element, so global semantics are not modeled: which
arrow connects which box across panel boundaries, or which legend swatch
labels which plot.
Two directions worth exploring:
1. Two-pass extraction with explicit panel splitting. First pass: detect
   sub-figure panels and split the source image into per-panel crops.
   Second pass: run the full pipeline on each crop independently. This
   should help the model focus on local structure and avoid cross-panel
   prompt confusion. SAM3 backgrounds + HoughLinesP already give us panel
   candidates (see section_detector.py); the missing piece is the
   recursive split-and-rerun loop.
2. Smart margin padding around cropped rasters. Tight bboxes sometimes
   clip strokes or leave faint background ghosts. A per-type margin
   heuristic (icon vs. photo vs. schematic illustration) would clean this
   up, but the logic is hard to pin down: loose enough to capture the
   full visual element, tight enough to avoid neighbor bleed.
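
As a starting point for the margin idea in item 2, the sketch below pads a bbox by a per-type percentage of its size and clamps the result to the image bounds. Nothing here exists in the pipeline yet: the `MARGIN_PERCENT` values, the type keys, and `pad_bbox` are all made up for illustration, and it does nothing about neighbor bleed.

```python
from typing import Dict, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels

# Made-up margins, as a percentage of bbox width/height per element kind.
MARGIN_PERCENT: Dict[str, int] = {"icon": 5, "photo": 2, "illustration": 8}


def pad_bbox(bbox: BBox, kind: str, image_size: Tuple[int, int]) -> BBox:
    """Grow a bbox by a per-kind margin, clamped to the image bounds."""
    x0, y0, x1, y1 = bbox
    img_w, img_h = image_size
    pct = MARGIN_PERCENT.get(kind, 0)  # unknown kinds get no padding
    margin_x = (x1 - x0) * pct // 100
    margin_y = (y1 - y0) * pct // 100
    return (
        max(0, x0 - margin_x),
        max(0, y0 - margin_y),
        min(img_w, x1 + margin_x),
        min(img_h, y1 + margin_y),
    )


if __name__ == "__main__":
    print(pad_bbox((0, 10, 100, 110), "icon", image_size=(102, 200)))
```

A real heuristic would also need to stop the padded box at the nearest neighboring detection, which is the part the text above describes as hard to pin down.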
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 85eeb94 commit d6c445a
14 files changed
Lines changed: 2014 additions & 25 deletions