fix(ocr): Solve the issue of missing some lines and spans due to adhesion during OCR #91

Merged
myhloli merged 2 commits into opendatalab:xiaomeng_dev from myhloli:fix_ocr_lost-line_span-adjoin
Aug 16, 2024

Conversation

@myhloli (Collaborator) commented Aug 16, 2024

  • fix(ocr): decrease detection threshold and increase padding for better text extraction

Decrease the detection box threshold from 0.6 to 0.3 so that more text areas are identified,
and increase the padding around each detected area from 25 to 50 pixels. Together, these changes
make text extraction from documents more complete.

  • fix(self_modify): merge detection boxes for optimized text region detection

Merge adjacent and overlapping detection boxes to optimize text region detection in
the document. Post-processing now consolidates text boxes into larger text lines,
taking their vertical and horizontal alignment into account. This reduces
fragmentation and improves the readability of detected text blocks.
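
The box merging described above can be illustrated with a minimal sketch. This is not the code from this PR: the function names, the `(x0, y0, x1, y1)` box layout, and the `min_overlap_ratio` and `max_gap` parameters are all assumptions chosen for illustration. Two boxes are treated as belonging to the same line when their vertical overlap covers most of the shorter box, and they are merged when they also overlap or sit within a small horizontal gap.

```python
from typing import List, Tuple

# Axis-aligned box as (x0, y0, x1, y1); this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def on_same_line(a: Box, b: Box, min_overlap_ratio: float = 0.5) -> bool:
    """Return True when the vertical overlap covers most of the shorter box."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return shorter > 0 and overlap / shorter >= min_overlap_ratio


def merge_line_boxes(boxes: List[Box], max_gap: float = 10.0) -> List[Box]:
    """Greedily merge boxes that share a line and overlap or nearly touch.

    max_gap is a hypothetical tolerance, in pixels, for the horizontal
    distance between two boxes that should still be joined.
    """
    merged: List[Box] = []
    # Process boxes left to right so each new box can only extend a
    # previously merged box towards the right.
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for i, m in enumerate(merged):
            gap = box[0] - m[2]  # negative when the boxes overlap horizontally
            if on_same_line(m, box) and gap <= max_gap:
                # Replace the merged box with the union of both boxes.
                merged[i] = (
                    min(m[0], box[0]),
                    min(m[1], box[1]),
                    max(m[2], box[2]),
                    max(m[3], box[3]),
                )
                break
        else:
            merged.append(box)
    return merged


if __name__ == "__main__":
    detections = [(0, 0, 50, 20), (55, 2, 100, 22), (0, 40, 80, 60), (200, 0, 250, 20)]
    for line_box in merge_line_boxes(detections):
        print(line_box)
```

The sketch makes a single greedy pass, so it does not re-check whether two already merged boxes should be joined with each other; a production implementation would likely need to handle that case as well as rotated or quadrilateral detection boxes.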

@myhloli merged commit 5d8ec92 into opendatalab:xiaomeng_dev on Aug 16, 2024