Pytesseract: Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata
I am trying to use pytesseract in a Jupyter Notebook.
- Windows 10 x64
- Running Jupyter Notebook (Anaconda3, Python 3.6.1) with administrative privileges
- The working directory containing the TIFF file is on a different drive (Z:)
 
When I run the following code:
    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
    tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en', config=tessdata_dir_config))
I get the following error:
TesseractError                            Traceback (most recent call last)
<ipython-input-37-c1dcbc33cde4> in <module>()
     11 # tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
     12 
---> 13 print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en'))
     14 # print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
C:\Users\cpcho\AppData\Local\Continuum\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, boxes, config)
    123         if status:
    124             errors = get_errors(error_string)
--> 125             raise TesseractError(status, errors)
    126         f = open(output_file_name, 'rb')
    127         try:
TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata')
I found these two references helpful but I am missing something: https://github.com/madmaze/pytesseract/issues/50 https://github.com/madmaze/pytesseract/issues/64
Thank you for your time on this!
Solution 1:[1]
From your post, I see two possible issues:
- All the trained language data should be saved in the location given by TESSDATA_PREFIX, a Windows environment variable, which is C:\Program Files (x86)\Tesseract-OCR\tessdata in your case.
- The tesseract trained English data is named eng.traineddata (i.e. lang='eng', not 'en') unless you modified its name. Refer to the Tesseract Data Files page for more information.
In addition, if Image.open() cannot locate the image file, you may need to pass the full file path (e.g. 'z:\\path\\to\\image').
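To check the second point before calling pytesseract, something like the following may help. It is only a sketch: `available_languages` and `check_language` are hypothetical helper names, not part of pytesseract, and the code assumes the language files sit directly in the tessdata folder as `<lang>.traineddata`.

```python
from pathlib import Path


def available_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def check_language(tessdata_dir, lang):
    """Return lang if <lang>.traineddata exists, otherwise raise a readable error."""
    langs = available_languages(tessdata_dir)
    if lang not in langs:
        raise FileNotFoundError(
            f"{lang}.traineddata not found in {tessdata_dir}; available: {langs}"
        )
    return lang
```

With the paths from the question, `check_language(r"C:\Program Files (x86)\Tesseract-OCR\tessdata", "en")` would raise if only eng.traineddata is installed, and the error message lists the codes that can actually be passed as `lang`.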
Hope this helps.
Solution 2:[2]
I faced the same problem and tried every solution I could find on Google, without success. Finally, I solved the problem by replacing
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
with
    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
Solution 3:[3]
If you don't want to set an environment variable, you can pass the tessdata directory as an argument instead.
For example:
First, do your imports:
    import pytesseract
    from PIL import Image
Then configure pytesseract:
    pytesseract.pytesseract.tesseract_cmd = "C:/path_to_your_tesseract.exe"
    tessdata_dir_config = '--tessdata-dir "C:/path_to_your_tessdata_folder"'
    image = Image.open("C:/path_to_your_image.tif")  # placeholder path
    pytesseract.image_to_string(image, config=tessdata_dir_config)
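If you would rather build that config string from a Windows path than type it by hand, a small helper along these lines might do. This is a sketch under assumptions: `tessdata_config` is a hypothetical name, and converting to forward slashes is simply a way to avoid backslash escaping inside the quoted value.

```python
from pathlib import PureWindowsPath


def tessdata_config(tessdata_dir):
    """Build a --tessdata-dir option string for pytesseract's config argument."""
    # as_posix() rewrites backslashes as forward slashes, so nothing in the
    # quoted directory needs escaping.
    posix_dir = PureWindowsPath(tessdata_dir).as_posix()
    return f'--tessdata-dir "{posix_dir}"'


config = tessdata_config("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata")
print(config)  # --tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata"
```

The result can then be passed as `config=` to `pytesseract.image_to_string`, exactly like the hand-written string above.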
Solution 4:[4]
On day 1 everything worked; on day 2 this error appeared, although everything still worked on a second computer. Five hours later, I found the answer:
Copy 'eng.traineddata' from "C:\Program Files\Tesseract-OCR\tessdata" to "C:\Program Files\Tesseract-OCR".
It works.
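The same copy can be scripted. The following is a rough sketch: `copy_traineddata` is a hypothetical helper, and writing into Program Files will normally need an elevated (administrator) session.

```python
import shutil
from pathlib import Path


def copy_traineddata(tessdata_dir, install_dir, lang="eng"):
    """Copy <lang>.traineddata from tessdata_dir into install_dir; return the new path."""
    source = Path(tessdata_dir) / f"{lang}.traineddata"
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(source, install_dir))
```

For the paths in this answer, the call would be `copy_traineddata(r"C:\Program Files\Tesseract-OCR\tessdata", r"C:\Program Files\Tesseract-OCR")`.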
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | thewaywewere | 
| Solution 2 | Isma | 
| Solution 3 | sam | 
| Solution 4 | ???????? ?????? | 
