How to use tessdata_fast in pytesseract (Python)?

I am currently trying to use the Tesseract OCR engine in Python on macOS to detect the orientation of text (using image_to_osd).

Detecting the orientation currently takes a long time (300 ms), and my aim is to reduce it. I am trying to use the tessdata_fast models, as I believe they would be faster, and I am not too concerned about accuracy.

I downloaded eng.traineddata and osd.traineddata from https://github.com/tesseract-ocr/tessdata_fast into a tessdata_fast folder and added that folder to the tesseract folder. I then customised the configuration as custom_config = r'--oem 1 --tessdata-dir /usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast --psm 0'. However, the time taken does not seem to decrease, so I am unsure whether my configuration is using tessdata_fast or the tessdata that was downloaded previously.

I checked with the command tesseract --list-langs, and it seems to be reading the original tessdata directory:

"/usr/local/share/tessdata/" (2):
eng
osd

I also tried deleting the previously downloaded tessdata and running the command again, but the result is "/usr/local/share/tessdata/" (0):

Does anyone know where I am going wrong, or what steps I should take to run pytesseract with tessdata_fast?

Thank you!



Solution 1:[1]

According to the pytesseract documentation, you can use tesseract's --tessdata-dir argument to specify the path to your data. Add it to the pytesseract config as follows:

import pytesseract

# `image` is assumed to be an already loaded PIL image (or a path to an image file).
# Example config: r'--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata"'
# It's important to add double quotes around the dir path.
tessdata_dir_config = r'--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
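For the orientation-detection case in the question, the same idea can be applied to image_to_osd. The following is only a minimal sketch: the helper names build_osd_config and detect_orientation are hypothetical, and it assumes pytesseract and Pillow are installed and that the directory passed in contains osd.traineddata.

```python
def build_osd_config(tessdata_dir: str) -> str:
    """Build a pytesseract config string for orientation and script detection.

    Hypothetical helper: it only formats the string, it does not call Tesseract.
    """
    # --psm 0 requests orientation and script detection (OSD) only,
    # mirroring the config used in the question.
    # The directory is wrapped in double quotes so paths with spaces survive.
    return f'--psm 0 --tessdata-dir "{tessdata_dir}"'


def detect_orientation(image_path: str, tessdata_dir: str) -> dict:
    """Run OSD on one image using the models found in tessdata_dir."""
    # Imported lazily so build_osd_config works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_osd(
            image,
            config=build_osd_config(tessdata_dir),
            output_type=pytesseract.Output.DICT,
        )
```

With the path from the question, the call would look like detect_orientation('page.png', '/usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast'), where page.png is a placeholder file name.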

For more details see https://pypi.org/project/pytesseract/.
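To check which data directory is actually being read, one option is to confirm that the model files exist in the intended folder and to pass --tessdata-dir together with --list-langs. The sketch below is an illustration only: the function names are hypothetical, and it assumes the tesseract executable is on the PATH.

```python
import os
import subprocess


def missing_models(tessdata_dir, langs=("eng", "osd")):
    """Return the languages whose .traineddata file is absent from tessdata_dir."""
    return [
        lang
        for lang in langs
        if not os.path.isfile(os.path.join(tessdata_dir, f"{lang}.traineddata"))
    ]


def list_langs_command(tessdata_dir):
    """Build the command that lists the languages found in tessdata_dir."""
    return ["tesseract", "--tessdata-dir", tessdata_dir, "--list-langs"]


def list_langs(tessdata_dir):
    """Run the command and return whatever Tesseract printed."""
    result = subprocess.run(
        list_langs_command(tessdata_dir),
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract versions write the list to stderr, so return both streams.
    return result.stdout + result.stderr
```

If missing_models returns an empty list and the output of list_langs names the tessdata_fast folder, the fast models are the ones being picked up for that directory.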

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rachid Benouini