MacOS uses Tesseract and not Tesseract-OCR #4565

@avigoen

Description of the bug

pymupdf/__init__.py in ?(tessdata)
  17818     # Unix-like systems:
  17819     cp = subprocess.run("whereis tesseract-ocr", shell=1, capture_output=1, check=0, text=True)
  17820     response = cp.stdout.strip().split()
  17821     if cp.returncode or len(response) != 2:  # if not 2 tokens: no tesseract-ocr
> 17822         raise RuntimeError("No tessdata specified and Tesseract is not installed")
  17823 
  17824     # search tessdata in folder structure
  17825     dirname = response[1]  # contains tesseract-ocr installation folder

RuntimeError: No tessdata specified and Tesseract is not installed
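The check on line 17821 can be replayed in isolation to see why it trips on macOS. The sketch below is not PyMuPDF code: has_install_path is a hypothetical helper that only mirrors the token count from the traceback, fed with the two whereis lines reported under "Tesseract Installation Proof" in this issue.

```python
def has_install_path(whereis_line: str) -> bool:
    """Hypothetical helper mirroring the quoted check.

    The traceback requires exactly two whitespace-separated tokens:
    the "name:" label and one path.
    """
    return len(whereis_line.strip().split()) == 2


# Line reported for the name PyMuPDF searches for: label only, no path.
print(has_install_path("tesseract-ocr:"))  # False

# Line reported for the name Homebrew actually installs.
print(has_install_path("tesseract: /opt/homebrew/bin/tesseract"))  # True
```

Searching for tesseract-ocr yields a single token, so the RuntimeError is raised even though the tesseract binary is present.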

How to reproduce the bug

PyMuPDF installation command:
uv add pymupdf

Issue:

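import pymupdf

doc = pymupdf.open("example.pdf")  # hypothetical input file, not named in the report
for page in doc:
    textPage = page.get_textpage_ocr()
    print(textPage.extract_text())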

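Running the script above raises the RuntimeError shown in the traceback.

On macOS, Tesseract is installed with brew install tesseract, which provides an executable named tesseract. Homebrew has no tesseract-ocr package, so the whereis tesseract-ocr lookup finds nothing.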

Tesseract Installation Proof:
tesseract: /opt/homebrew/bin/tesseract
tesseract-ocr:
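Until the lookup accepts both names, one possible workaround is to locate the tessdata folder yourself and hand it to PyMuPDF. The following is a minimal sketch, not the library's own logic: find_tessdata is a hypothetical helper, and the assumption that the language data sits in share/tessdata next to the bin folder (as is typical for Homebrew and many Linux packages) may not hold on every system.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_tessdata(names=("tesseract-ocr", "tesseract")) -> Optional[str]:
    """Hypothetical helper: guess the tessdata folder of a Tesseract install.

    Assumption: tessdata lives in <prefix>/share/tessdata, where <prefix>
    is the parent of the directory that holds the executable.
    """
    # An explicitly configured folder takes priority.
    env = os.environ.get("TESSDATA_PREFIX")
    if env and Path(env).is_dir():
        return env

    for name in names:
        exe = shutil.which(name)  # searches PATH, unlike a fixed whereis name
        if exe is None:
            continue
        # Try both the symlink location and the resolved install location.
        for binary in (Path(exe), Path(exe).resolve()):
            candidate = binary.parent.parent / "share" / "tessdata"
            if candidate.is_dir():
                return str(candidate)
    return None


if __name__ == "__main__":
    print(find_tessdata())
```

If the helper returns a folder, it can be passed as page.get_textpage_ocr(tessdata=...) or exported as the TESSDATA_PREFIX environment variable before running the script, so that the failing whereis lookup is never reached.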

PyMuPDF version

1.26.1

Operating system

MacOS

Python version

3.12
