Stripped "\" from Windows Tessdata Path #356

@Nilint1

When running pytesseract in a Windows 10 environment, I found that the backslashes (\) in my tessdata path were being stripped from the config string.

I found that the fix is in pytesseract.py, line 251: pass posix=False to shlex.split():

if config:
    cmd_args += shlex.split(config, posix=False)

To confirm this was the cause, here is the config string before it reached this section of code:
--tessdata-dir E:\Projects\fhm-test-automation\tessdata --oem 3 --psm 3

And after shlex.split() ran in its default POSIX mode, cmd_args was:
['tesseract', 'C:\\Users\\nickr\\AppData\\Local\\Temp\\tess_ai78chdv.PNG', 'C:\\Users\\nickr\\AppData\\Local\\Temp\\tess_ai78chdv', '-l', 'eng', '--tessdata-dir', 'E:Projectsfhm-test-automationtessdata', '--oem', '3', '--psm', '3']

Please note that I trimmed the extra, unneeded information from the output.

As you can see, the --tessdata-dir value was altered (every backslash is gone), which produces the dreaded TESSDATA_PREFIX error.
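The behaviour can be reproduced outside pytesseract with the standard library alone. This is a minimal sketch, not the library's actual code path: the variable names `posix_args` and `windows_args` are made up for illustration, and the config string is the one from this report.

```python
import shlex

# Config string from this report; a raw string keeps the backslashes intact in Python.
config = r"--tessdata-dir E:\Projects\fhm-test-automation\tessdata --oem 3 --psm 3"

# Default POSIX mode: an unquoted backslash escapes the next character and is dropped.
posix_args = shlex.split(config)

# Non-POSIX mode: backslashes are treated as ordinary characters and kept.
windows_args = shlex.split(config, posix=False)

print(posix_args[1])    # E:Projectsfhm-test-automationtessdata
print(windows_args[1])  # E:\Projects\fhm-test-automation\tessdata

# Caveat: in non-POSIX mode, quote characters stay inside the token.
print(shlex.split('--tessdata-dir "C:\\Program Files\\tessdata"', posix=False)[1])
# "C:\Program Files\tessdata"   (quotes included)
```

One consequence worth weighing before changing the call: with posix=False, a quoted path keeps its surrounding quote characters in the resulting token, so configs that rely on quoting paths with spaces may need the quotes stripped separately.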

I would make a PR myself, but I do not know the codebase well enough to judge the consequences of this change.
