An API to just perform recognition and get confidence

By default, `tesseract` does both detection and recognition.
Is it possible to have an API such as `recognize()` that would perform only recognition and return the output text with its confidence?
Or at least simulate it?

The `pytesseract.image_to_string()` call only gives the recognized text.

For `image_recognize()`, we could do something like this for output_type `dict`:
```
import pytesseract

# `lang` replaces the original `self.lang_str`, which is undefined outside a class
def recognize(img, lang='eng'):
    data = pytesseract.image_to_data(img, lang=lang, output_type='dict')
    texts = []
    avg_confidence = 0
    total_bboxes = 0
    # assert len(data['text']) == 1 # Should contain only 1 bbox
    for i in range(len(data['text'])):
        text, conf = data['text'][i].strip(), float(data['conf'][i]) / 100.0
        # Skip entries with no recognized text (Tesseract reports conf -1 for these)
        if conf < 0 or not text:
            continue
        total_bboxes += 1
        avg_confidence += conf
        texts.append(text)

    if not total_bboxes:
        return {}
    return {
        'text': ' '.join(texts),
        'confidence': avg_confidence / total_bboxes
    }
```

Can you please take this as a feature request?
It would be helpful for anyone who uses their own detector and wants to perform only the recognition step with tesseract.
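Until something like this exists, one possible way to simulate it is to pass each crop from the external detector to Tesseract with page segmentation mode 7 ("treat the image as a single text line"), which should skip most of the layout analysis. The sketch below is only an illustration under that assumption: `summarize_ocr_data` and `recognize_crop` are hypothetical helper names, not part of pytesseract, and the `'eng'` default is assumed.

```python
def summarize_ocr_data(data):
    """Join the words from an image_to_data() dict and average their confidence.

    Hypothetical helper. Expects the 'text' and 'conf' lists that
    pytesseract.image_to_data(..., output_type=Output.DICT) returns.
    """
    texts = []
    conf_sum = 0.0
    for raw_text, raw_conf in zip(data['text'], data['conf']):
        text = str(raw_text).strip()
        conf = float(raw_conf)  # may arrive as str, int, or float
        # Entries without recognized text carry conf == -1; skip them
        if conf < 0 or not text:
            continue
        texts.append(text)
        conf_sum += conf
    if not texts:
        return {}
    return {
        'text': ' '.join(texts),
        # Tesseract reports 0-100; rescale the mean to 0-1
        'confidence': conf_sum / (100.0 * len(texts)),
    }


def recognize_crop(img, lang='eng'):
    """Recognize one pre-cropped text line (hypothetical wrapper).

    Assumes `img` is already cropped to a single line by your own detector.
    """
    import pytesseract  # imported lazily so the helper above works without it

    data = pytesseract.image_to_data(
        img,
        lang=lang,
        config='--psm 7',  # treat the image as a single text line
        output_type=pytesseract.Output.DICT,
    )
    return summarize_ocr_data(data)
```

Usage would look like `recognize_crop(Image.open('line.png').crop(box))`, with `box` coming from your own detector. The confidence here is a plain mean over words, so a long word and a short word count equally.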
