
Commit d6c445a

Sdamirsa and claude committed
feat: editable vector export pipeline (Stage 8) + prompt v2 + heart/MRI fix
Adds an editable-vector export stage to the pipeline, broadens SAM3 prompt coverage for scientific/medical figures, and fixes a classification bug that was rendering medical-image detections as blank white outlines.

## What's new

### Stage 8: Vector export (new modules)

Per-image output under `output/{image}/vectors/`:

    elements/       individual editable SVGs for every detected element
    rasters/        cropped transparent-background PNGs for image elements
    combined/       single combined.svg (layered) and combined.pdf
    manifest.json   element index with bbox, score, layer, paths

New modules:

    modules/svg_generator.py      hybrid renderer: geometric primitives for known
                                  shapes, Chaikin-smoothed polygons for complex
                                  contours, base64-embedded crops for raster
                                  elements, editable <text> for OCR
    modules/pdf_combiner.py       svglib/cairosvg PDF backend
    modules/section_detector.py   panel detection via SAM3 backgrounds + HoughLinesP
    modules/vector_exporter.py    Stage 8 orchestrator (BaseProcessor subclass)

CLI:

    --vector-level=granular|section|component|all   (default: granular)
    --no-vectors                                    skip Stage 8

### Prompt v2: broader coverage for scientific/medical figures

Total prompts: 19 -> 78

    prompts/image.py        5 -> 29   (CT/MRI/ultrasound, 3D heart/anatomy,
                                      person/crowd icons, computer monitors,
                                      checkerboard/grid patterns, image stacks)
    prompts/shape.py        7 -> 17   (trapezoid, parallelogram, 3D cube, isometric
                                      box, cylinder, color swatch, small colored
                                      square, stack of rectangles)
    prompts/arrow.py        3 -> 17   (thick/block/curved/looping/bidirectional/
                                      dashed/dotted/L-shaped/skip variants)
    prompts/background.py   4 -> 15   (sub-figure panel, dashed border rectangle,
                                      legend box/panel, title bar, header strip)

Config tuning to match (config/config.yaml, gitignored):

    shape.min_area:          200 -> 80     (catches 14x14 legend swatches)
    shape.score_threshold:   0.5 -> 0.45
    arrow.score_threshold:   0.45 -> 0.4
    image.score_threshold:   0.5 -> 0.45

### Bug fix: heart/MRI rendered as blank white polygon outlines

Type classification was scattered across three files using case-sensitive string comparisons. IMAGE_PROMPT contains mixed-case names like "3D heart model" and "MRI image", but every comparison did `elem.type.lower() in CasedSet`, so those specific scientific-image prompts silently fell through and got rendered as white polygon outlines.

Across 18 figures, this dropped 40 medical detections (36 MRI + 4 heart) to outline-only. After the fix all 40 are properly extracted as RGBA crops and embedded as base64 <image> in their SVGs.

Fix made the prompt files the single source of truth:

    modules/svg_generator.py            RASTER_TYPES, GEOMETRIC_SHAPES, ARROW_TYPES
                                        now derived from prompt files via
                                        `_expand_forms()` helper (covers both
                                        space-form and underscore-form normalization)
    modules/icon_picture_processor.py   lowercased IMAGE_PROMPT before comparison
    modules/data_types.py               get_layer_level() imports prompt lists;
                                        specific prompts land in correct layer
                                        (IMAGE/BASIC_SHAPE/ARROW/BACKGROUND)
                                        instead of OTHER

Adding a new prompt now auto-registers for routing, layer assignment, and raster cropping, with no parallel lists to keep in sync.

## Run results on the 18-figure test set

    1,071   individual element SVGs
    425     raster PNGs (was 385 before fix; +40 = the heart/MRI recoveries)
    18      combined SVGs (one per figure)
    18      combined PDFs (one per figure, Affinity-ready)

## Known limitations & future work

Even with broader prompts and the new hierarchical layer assignment, the pipeline still under-understands **multi-panel / schematic figures**. Detection happens per element; the global semantics (which arrow connects which box across panel boundaries, which legend swatch labels which plot) is not modeled. Two directions worth exploring:

1. Two-pass extraction with explicit panel splitting. First pass: detect sub-figure panels and split the source image into per-panel crops. Second pass: run the full pipeline on each crop independently. This should help the model focus on local structure and avoid cross-panel prompt confusion. SAM3 backgrounds + HoughLinesP already give us panel candidates (see section_detector.py); the missing piece is the recursive split-and-rerun loop.

2. Smart margin padding around cropped rasters. Tight bboxes sometimes clip strokes or leave faint background ghosts. A per-type margin heuristic (icon vs. photo vs. schematic illustration) would clean this up, but the logic is hard to pin down: loose enough to capture the full visual element, tight enough to avoid neighbor bleed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
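
The commit message says complex contours are rendered as Chaikin-smoothed polygons, but `modules/svg_generator.py` is not shown in this excerpt. The following is a minimal sketch of Chaikin corner cutting on a closed polygon, written only to illustrate the technique; the function name `chaikin_smooth` and its `iterations` parameter are assumptions, not names taken from the commit.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def chaikin_smooth(points: Sequence[Point], iterations: int = 2) -> List[Point]:
    """Hypothetical helper: Chaikin corner cutting for a closed polygon.

    Each edge (p, q) is replaced by two points at 1/4 and 3/4 along the
    edge, so every pass doubles the vertex count and rounds the corners.
    """
    pts = [(float(x), float(y)) for x, y in points]
    for _ in range(iterations):
        if len(pts) < 3:
            break  # nothing to smooth on a degenerate polygon
        new_pts: List[Point] = []
        n = len(pts)
        for i in range(n):
            x0, y0 = pts[i]
            x1, y1 = pts[(i + 1) % n]  # wrap around: the polygon is closed
            new_pts.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            new_pts.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        pts = new_pts
    return pts


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
smoothed_once = chaikin_smooth(square, iterations=1)
```

One pass over the square cuts each corner into two vertices, giving an octagon. More passes approach a smooth curve at the cost of longer SVG path data, which is presumably why a renderer would keep the iteration count small.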
1 parent 85eeb94 commit d6c445a

14 files changed

Lines changed: 2014 additions & 25 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -50,3 +50,7 @@ sam3_src/
 # Local processing & debug
 arrow_processing/
 debug_output/
+
+# Local planning notes & AI tool session data
+.amir-zone/
+.claude/

main.py

Lines changed: 49 additions & 4 deletions
@@ -12,6 +12,8 @@
     python main.py -i input/test.png -o output/custom/
     python main.py -i input/test.png --refine
     python main.py -i input/test.png --no-text
+    python main.py -i input/test.png --vector-level=all
+    python main.py -i input/test.png --no-vectors
 """

 import os
@@ -39,7 +41,10 @@
     XMLMerger,
     MetricEvaluator,
     RefinementProcessor,
-
+
+    # Stage 8: Vector export
+    VectorExporter,
+
     # Text (modules/text/)
     TextRestorer,

@@ -89,6 +94,7 @@ def __init__(self, config: dict = None):
         self._xml_merger = None
         self._metric_evaluator = None
         self._refinement_processor = None
+        self._vector_exporter = None

     @property
     def text_restorer(self):
@@ -138,13 +144,21 @@ def refinement_processor(self) -> RefinementProcessor:
         if self._refinement_processor is None:
             self._refinement_processor = RefinementProcessor()
         return self._refinement_processor
+
+    @property
+    def vector_exporter(self) -> VectorExporter:
+        if self._vector_exporter is None:
+            self._vector_exporter = VectorExporter()
+        return self._vector_exporter

     def process_image(self,
                       image_path: str,
                       output_dir: str = None,
                       with_refinement: bool = False,
                       with_text: bool = True,
-                      groups: List[PromptGroup] = None) -> Optional[str]:
+                      groups: List[PromptGroup] = None,
+                      vector_level: str = "granular",
+                      no_vectors: bool = False) -> Optional[str]:
         """Run pipeline on one image. Returns output XML path or None."""
         print(f"\n{'='*60}")
         print(f"Processing: {image_path}")
@@ -264,8 +278,28 @@ def process_image(self,

             output_path = merge_result.metadata.get('output_path')
             print(f" Output: {output_path}")
+
+            # ============ Stage 8: Vector Export ============
+            if not no_vectors:
+                print(f"\n[8] Vector export (level={vector_level})...")
+                context.intermediate_results['vector_level'] = vector_level
+                try:
+                    vec_result = self.vector_exporter.process(context)
+                    if vec_result.success:
+                        vec_count = vec_result.metadata.get('exported_count', 0)
+                        vec_dir = vec_result.metadata.get('vector_dir', '')
+                        print(f" Exported {vec_count} elements -> {vec_dir}")
+                    else:
+                        print(f" Vector export failed: {vec_result.error_message}")
+                except Exception as e:
+                    print(f" Vector export failed: {e}")
+                    import traceback
+                    traceback.print_exc()
+            else:
+                print("\n[8] Vector export (skipped)")
+
             print(f"\n{'='*60}\nDone.\n{'='*60}")
-
+
             return output_path

         except Exception as e:
@@ -332,6 +366,8 @@ def main():
   python main.py
   python main.py -i test.png --refine
   python main.py -i test.png --groups image arrow
+  python main.py -i test.png --vector-level=all
+  python main.py -i test.png --no-vectors
         """
     )

@@ -348,6 +384,13 @@ def main():
                         help="Prompt groups to process (default: all)")
     parser.add_argument("--show-prompts", action="store_true",
                         help="Show prompt config")
+
+    # Stage 8: Vector export options
+    parser.add_argument("--vector-level", type=str, default="granular",
+                        choices=['granular', 'section', 'component', 'all'],
+                        help="Vector export granularity (default: granular)")
+    parser.add_argument("--no-vectors", action="store_true",
+                        help="Skip vector export (Stage 8)")

     args = parser.parse_args()

@@ -417,7 +460,9 @@ def main():
                 output_dir=output_dir,
                 with_refinement=args.refine,
                 with_text=not args.no_text,
-                groups=groups
+                groups=groups,
+                vector_level=args.vector_level,
+                no_vectors=args.no_vectors,
             )
             if result:
                 success_count += 1
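
To see how the two new flags map onto the `vector_level` and `no_vectors` keyword arguments of `process_image`, here is a small standalone sketch. It copies only the two `add_argument` calls from the hunk above; the helper names `build_stage8_parser` and `stage8_kwargs` are hypothetical and do not exist in `main.py`.

```python
import argparse
from typing import Dict, List, Union


def build_stage8_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a parser holding only the Stage 8 flags."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--vector-level", type=str, default="granular",
                        choices=["granular", "section", "component", "all"],
                        help="Vector export granularity (default: granular)")
    parser.add_argument("--no-vectors", action="store_true",
                        help="Skip vector export (Stage 8)")
    return parser


def stage8_kwargs(argv: List[str]) -> Dict[str, Union[str, bool]]:
    """Translate CLI arguments into the keyword arguments process_image takes."""
    args = build_stage8_parser().parse_args(argv)
    # argparse turns "--vector-level" into the attribute "vector_level".
    return {"vector_level": args.vector_level, "no_vectors": args.no_vectors}


default_kwargs = stage8_kwargs([])
all_levels = stage8_kwargs(["--vector-level=all"])
skipped = stage8_kwargs(["--no-vectors"])
```

Because `choices` is set, an unsupported level such as `--vector-level=panel` makes argparse exit with a usage error before any image is processed.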

modules/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -21,6 +21,12 @@
 from .metric_evaluator import MetricEvaluator
 from .refinement_processor import RefinementProcessor

+# Stage 8: Vector export
+from .vector_exporter import VectorExporter
+from .svg_generator import SVGGenerator
+from .pdf_combiner import PDFCombiner
+from .section_detector import SectionDetector
+
 # Text (modules/text/); optional if ocr/coord_processor missing
 try:
     from .text.restorer import TextRestorer
@@ -53,4 +59,9 @@
     'BasicShapeProcessor',
     'MetricEvaluator',
     'RefinementProcessor',
+    # Stage 8: Vector export
+    'VectorExporter',
+    'SVGGenerator',
+    'PDFCombiner',
+    'SectionDetector',
 ]

modules/data_types.py

Lines changed: 60 additions & 16 deletions
@@ -255,37 +255,81 @@ def from_yaml(cls, yaml_path: str) -> 'ProcessingConfig':


 # ======================== Helper functions ========================
+
+def _expand_forms(prompts):
+    """Return set containing both lowercase-with-spaces and lowercase-with-underscores forms."""
+    out = set()
+    for p in prompts:
+        low = p.lower()
+        out.add(low)
+        out.add(low.replace(" ", "_"))
+    return out
+
+
+# Lazy-built prompt-derived type sets. Built on first call of get_layer_level
+# to avoid import-time cycles (prompts has no deps, but be safe).
+_TYPE_SETS_CACHE = {}
+
+
+def _get_type_sets():
+    """Build and cache prompt-derived type sets."""
+    if _TYPE_SETS_CACHE:
+        return _TYPE_SETS_CACHE
+    try:
+        from prompts.image import IMAGE_PROMPT
+        from prompts.shape import SHAPE_PROMPT
+        from prompts.arrow import ARROW_PROMPT
+        from prompts.background import BACKGROUND_PROMPT
+    except ImportError:
+        # Fallback: empty sets; legacy hardcoded lists below still apply.
+        IMAGE_PROMPT = SHAPE_PROMPT = ARROW_PROMPT = BACKGROUND_PROMPT = []
+
+    _TYPE_SETS_CACHE["image"] = _expand_forms(IMAGE_PROMPT)
+    _TYPE_SETS_CACHE["shape"] = _expand_forms(SHAPE_PROMPT)
+    _TYPE_SETS_CACHE["arrow"] = _expand_forms(ARROW_PROMPT)
+    _TYPE_SETS_CACHE["background"] = _expand_forms(BACKGROUND_PROMPT)
+    return _TYPE_SETS_CACHE
+
+
 def get_layer_level(element_type: str) -> int:
     """
     Get the default layer level for an element type
-
-    Used by all submodules to keep layer assignment consistent
+
+    Used by all submodules to keep layer assignment consistent.
+
+    v2 fix: derive image/shape/arrow/background sets from prompt files so
+    specific prompts like "3D heart model" or "MRI image" (which were
+    silently falling through to LayerLevel.OTHER and breaking stacking)
+    now get the correct IMAGE layer.
     """
     element_type = element_type.lower()
-
-    # Background/container types (bottom layer)
-    if element_type in {'section_panel', 'title_bar'}:
+    sets = _get_type_sets()
+
+    # Background/container types (bottom layer): legacy names + prompt-derived
+    if element_type in {'section_panel', 'title_bar'} or element_type in sets["background"]:
         return LayerLevel.BACKGROUND.value
-
-    # Arrows/connectors
-    if element_type in {'arrow', 'line', 'connector'}:
+
+    # Arrows/connectors: legacy names + prompt-derived
+    if element_type in {'arrow', 'line', 'connector'} or element_type in sets["arrow"]:
         return LayerLevel.ARROW.value
-
+
     # Text
     if element_type == 'text':
         return LayerLevel.TEXT.value
-
-    # Image types
-    if element_type in {'icon', 'picture', 'image', 'logo', 'chart', 'function_graph'}:
+
+    # Image types: legacy names + prompt-derived (this is the fix path for the heart bug)
+    if element_type in {
+        'icon', 'picture', 'image', 'logo', 'chart', 'function_graph'
+    } or element_type in sets["image"]:
         return LayerLevel.IMAGE.value
-
-    # Basic shapes
+
+    # Basic shapes: legacy names + prompt-derived
     if element_type in {
         'rectangle', 'rounded_rectangle', 'rounded rectangle',
         'diamond', 'ellipse', 'circle', 'cylinder', 'cloud',
         'hexagon', 'triangle', 'parallelogram', 'actor'
-    }:
+    } or element_type in sets["shape"]:
         return LayerLevel.BASIC_SHAPE.value
-
+
     # Other
     return LayerLevel.OTHER.value

modules/icon_picture_processor.py

Lines changed: 9 additions & 2 deletions
@@ -315,8 +315,15 @@ def process(self, context: ProcessingContext) -> ProcessingResult:
         )

     def _get_elements_to_process(self, elements: List[ElementInfo]) -> List[ElementInfo]:
-        """Filter elements to process (icons, arrows, etc.; arrows treated as icon crop)."""
-        all_types = set(IMAGE_PROMPT) | {"arrow", "line", "connector"}
+        """Filter elements to process (icons, arrows, etc.; arrows treated as icon crop).
+
+        NOTE: IMAGE_PROMPT contains mixed-case strings (e.g. "3D heart model",
+        "MRI image", "CT scan image"). Comparing `.lower()` against the raw
+        set caused those detections to be silently skipped; no base64 was
+        generated and downstream SVG rendering fell back to a plain polygon
+        outline. Always normalize both sides.
+        """
+        all_types = {t.lower() for t in IMAGE_PROMPT} | {"arrow", "line", "connector"}
         return [
             e for e in elements
             if e.element_type.lower() in all_types and e.base64 is None
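
The one-line change above is easy to reproduce outside the pipeline. This sketch runs the old and new filters side by side on a few fake detections; `Element` is a hypothetical stand-in for `ElementInfo`, and `SAMPLE_IMAGE_PROMPT` is an assumed subset of the real `IMAGE_PROMPT` list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for ElementInfo with only the fields the filter reads."""
    element_type: str
    base64: Optional[str] = None


# Assumed subset; the commit confirms the mixed-case names exist in IMAGE_PROMPT.
SAMPLE_IMAGE_PROMPT = ["icon", "MRI image", "3D heart model"]
EXTRA_TYPES = {"arrow", "line", "connector"}


def select_before_fix(elements: List[Element]) -> List[Element]:
    # Old behaviour: lowercased element type compared against the raw, cased set.
    all_types = set(SAMPLE_IMAGE_PROMPT) | EXTRA_TYPES
    return [e for e in elements
            if e.element_type.lower() in all_types and e.base64 is None]


def select_after_fix(elements: List[Element]) -> List[Element]:
    # New behaviour: both sides are lowercased before the membership test.
    all_types = {t.lower() for t in SAMPLE_IMAGE_PROMPT} | EXTRA_TYPES
    return [e for e in elements
            if e.element_type.lower() in all_types and e.base64 is None]


detections = [
    Element("MRI image"),
    Element("icon"),
    Element("3D heart model"),
    Element("arrow", base64="already-cropped"),
]
kept_before = [e.element_type for e in select_before_fix(detections)]
kept_after = [e.element_type for e in select_after_fix(detections)]
```

With the old filter only `"icon"` survives, because `"mri image"` and `"3d heart model"` never match their mixed-case originals. With the fix all three image detections are selected, and the arrow is skipped in both cases since it already carries base64 data.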
