add segmentation level paragraph #222

Open
bertsky wants to merge 8 commits into OCR-D:master from bertsky:textequiv-level-para

bertsky commented Jan 14, 2026

Tesseract can return paragraphs, so we should offer this, too. OCR-D workflows do not usually have a paragraph level, so offer two ways of representing it:

  • flat → paragraphs as normal regions
  • recursive → paragraphs inside block-level regions

In both cases, the ReadingOrder will reflect the recursive structure via ordered subgroups.
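To illustrate the difference between the two modes, here is a minimal sketch in plain Python. It does not use the OCR-D or Tesseract APIs: `Region`, `OrderGroup` and `layout` are hypothetical names made up for this example, and the assumption that a flat-mode subgroup carries no region reference of its own is a guess, not something stated in this PR.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Region:
    """Hypothetical stand-in for a PAGE text region."""
    id: str
    children: list = field(default_factory=list)


@dataclass
class OrderGroup:
    """Hypothetical stand-in for an ordered (sub)group in the ReadingOrder."""
    region_ref: Optional[str]  # None if the group has no region of its own (assumption)
    members: list = field(default_factory=list)  # region ids or nested OrderGroups


def layout(blocks: dict, mode: str):
    """Map {block id: [paragraph ids in order]} to (top-level regions, reading order)."""
    regions = []
    order = OrderGroup(region_ref=None)
    for block_id, para_ids in blocks.items():
        if mode == "none":
            # existing behaviour: one region per block, no paragraph level
            regions.append(Region(block_id))
            order.members.append(block_id)
        elif mode == "flat":
            # paragraphs *as* regions; the block survives only as a subgroup
            regions.extend(Region(p) for p in para_ids)
            order.members.append(OrderGroup(None, list(para_ids)))
        elif mode == "recursive":
            # paragraphs *inside* the block region; the subgroup points at the block
            regions.append(Region(block_id, [Region(p) for p in para_ids]))
            order.members.append(OrderGroup(block_id, list(para_ids)))
        else:
            raise ValueError(f"unknown paragraphs mode: {mode!r}")
    return regions, order
```

Calling `layout({"b1": ["p1", "p2"]}, "flat")` yields two top-level regions, while `"recursive"` yields one block region with two children. In both cases the reading order contains one subgroup per block listing its paragraphs.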

Robert Sachunsky added 8 commits January 14, 2026 16:50
- add segmentation-related parameter `paragraphs`:
  - default to `none` (for existing behaviour, i.e.
    no paragraph level),
  - add `flat` (for paragraphs *as* regions)
  - add `recursive` (for paragraphs *inside* block regions)
- make new `flat` and `recursive` paragraphs accessible
  on `cell` level via `segmentation_level` and `textequiv_level`
- raise `ValueError` during `setup()` for all nonsensical combinations
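The commit message does not list which combinations count as nonsensical, so the following is only a sketch of what such a check during `setup()` might look like. The helper name `check_paragraph_params` is hypothetical, and the single rule it enforces (paragraphs requested while segmentation is switched off) is an illustrative assumption, not the rule set implemented in this PR.

```python
PARAGRAPH_MODES = ("none", "flat", "recursive")


def check_paragraph_params(paragraphs: str, segmentation_level: str) -> None:
    """Raise ValueError for an invalid or contradictory parameter combination."""
    if paragraphs not in PARAGRAPH_MODES:
        raise ValueError(
            f"paragraphs must be one of {PARAGRAPH_MODES}, got {paragraphs!r}"
        )
    if paragraphs == "none":
        # existing behaviour, nothing further to check
        return
    # Illustrative rule (an assumption, not taken from the PR):
    # paragraphs cannot be produced if no segmentation is requested at all.
    if segmentation_level == "none":
        raise ValueError(
            f"paragraphs={paragraphs!r} requires segmentation, "
            "but segmentation_level is 'none'"
        )
```

Failing early in `setup()` rather than per page means a misconfigured workflow stops before any page is processed.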
@bertsky bertsky requested a review from kba January 14, 2026 16:15