Google Colab: How do I install a traineddata file for pytesseract?
After installing the pytesseract package with "pip install" on Google Colab, I need to install OCR trained data for another language, but I do not know where to copy it.
If I install a package myself with "pip install", where is it located on my Windows PC?
Solution 1:[1]
Installing a package on Google Colab does not install it on the local machine you are using. Starting a Colab session creates a remote runtime, and that is where all the project files live and can be inspected.
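One way to check which environment a cell is actually running in is sketched below. It assumes that the `google.colab` module is importable only inside a Colab runtime (the usual convention, not a guarantee), and it reports the interpreter's default site-packages directory via the standard-library `sysconfig` module; `in_colab` and `site_packages_dir` are hypothetical helper names.

```python
import sysconfig


def in_colab() -> bool:
    """Return True when this code runs inside a Google Colab runtime."""
    try:
        import google.colab  # noqa: F401  (normally importable only on Colab)
    except ImportError:
        return False
    return True


def site_packages_dir() -> str:
    """Default directory where `pip install` puts packages for this interpreter."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    where = "the remote Colab VM" if in_colab() else "this machine"
    print(f"pip installs packages into {site_packages_dir()} on {where}")
```

Run in Colab, the printed path is on the remote VM; run locally on Windows, it is a folder under your Python installation (or virtual environment). Packages installed with `pip install --user` go to a separate per-user directory instead.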
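To find where a specific pip package is installed, run:
`!pip show pytesseract` (in Colab; in a local terminal, drop the leading `!`)
The `Location:` field in the output shows where the package is installed, and package files can be added to that directory. For OCR language data, however, note that pytesseract only wraps the Tesseract executable: `.traineddata` files are read by Tesseract from its own tessdata directory, not from the pip package directory.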
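From Python, the same `Location:` value can be looked up with the standard-library `importlib.metadata` module. This is a minimal sketch; `package_location` is a hypothetical helper name, and it assumes a normal wheel install where the package's `.dist-info` folder sits directly in site-packages.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def package_location(dist_name: str) -> Optional[Path]:
    """Return the directory pip reports as 'Location:' for dist_name, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None  # not installed in this interpreter's environment
    # locate_file("") resolves relative to the directory that holds the
    # .dist-info folder, i.e. the site-packages directory.
    return Path(dist.locate_file(""))
```

`package_location("pytesseract")` should point to the same directory that `pip show pytesseract` lists. Keep in mind that Tesseract reads `.traineddata` files from its own tessdata directory, not from this one.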
Solution 2:[2]
For example, to add Arabic language data in Google Colab:
First, download the trained data file:
`!wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_best/master/ara.traineddata`
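The download step can also be done from Python instead of a shell `wget`, which also works outside Colab, for example on a local Windows machine. This sketch uses `urllib.request.urlretrieve`; the URL pattern is copied from the `wget` command above, and `traineddata_url` / `download_traineddata` are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL pattern copied from the wget command in the answer (tessdata_best repo).
TESSDATA_BEST_URL = (
    "https://raw.githubusercontent.com/tesseract-ocr/tessdata_best/master/"
    "{lang}.traineddata"
)


def traineddata_url(lang: str) -> str:
    """Build the download URL for one language code, e.g. 'ara'."""
    if not lang:
        raise ValueError("language code must be non-empty")
    return TESSDATA_BEST_URL.format(lang=lang)


def download_traineddata(lang: str, dest_dir: str = ".") -> Path:
    """Download <lang>.traineddata into dest_dir unless it is already there."""
    dest = Path(dest_dir) / f"{lang}.traineddata"
    if not dest.exists():
        urlretrieve(traineddata_url(lang), dest)
    return dest
```

Skipping the download when the file already exists keeps the cell cheap to re-run after a runtime restart that kept the working directory.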
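Then move it into Tesseract's tessdata directory:
`!mv "ara.traineddata" "/usr/share/tesseract-ocr/4.00/tessdata"`
The `4.00` segment of this path depends on the installed Tesseract version, so check that the directory exists first.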
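Rather than hard-coding the tessdata path, the directory can be read from the output of `tesseract --list-langs`. The sketch below assumes the header format printed by recent Tesseract releases (`List of available languages in "<dir>" (<n>):`); releases that omit the directory make the parser return `None`. The helper names are hypothetical, and the file is copied rather than moved so the download stays in place.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Assumed header line of recent Tesseract releases, e.g.
#   List of available languages in "/usr/share/tesseract-ocr/4.00/tessdata/" (3):
_HEADER_RE = re.compile(r'List of available languages in "(?P<dir>[^"]+)"')


def parse_tessdata_dir(list_langs_output: str) -> Optional[Path]:
    """Extract the tessdata directory from `tesseract --list-langs` output."""
    match = _HEADER_RE.search(list_langs_output)
    return Path(match.group("dir")) if match else None


def find_tessdata_dir() -> Optional[Path]:
    """Ask the installed tesseract binary where it reads language data from."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # Tesseract itself is not on PATH
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Read both streams in case a given version prints the list to stderr.
    return parse_tessdata_dir(result.stdout + result.stderr)


def install_traineddata(src: str) -> Path:
    """Copy a downloaded .traineddata file into Tesseract's tessdata directory."""
    tessdata = find_tessdata_dir()
    if tessdata is None:
        raise FileNotFoundError("could not determine Tesseract's tessdata directory")
    return Path(shutil.copy(src, tessdata))
```

In Colab, `install_traineddata("ara.traineddata")` would copy the downloaded file into whatever directory Tesseract reports. On a local install, writing to that directory may need elevated permissions.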
Finally, pass `lang='ara'` to pytesseract (with `import pytesseract` and `from PIL import Image` already run):
`image_path_in_colab = "/content/????-??????.jpg"`
`extract = pytesseract.image_to_string(Image.open(image_path_in_colab), lang='ara')`
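A more complete version of the last step, with the imports it needs and a check that the requested language data is actually installed before running OCR. It assumes a pytesseract release recent enough to provide `pytesseract.get_languages`, and uses Tesseract's `+` syntax to combine languages such as `ara+eng`; `missing_languages` and `ocr_image` are hypothetical helper names.

```python
from typing import Iterable, List


def missing_languages(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return requested language codes that Tesseract does not report as installed."""
    have = set(available)
    return [lang for lang in requested if lang not in have]


def ocr_image(image_path: str, langs: List[str]) -> str:
    # Imported lazily so missing_languages() works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    missing = missing_languages(langs, pytesseract.get_languages(config=""))
    if missing:
        raise RuntimeError(f"traineddata not installed for: {', '.join(missing)}")
    # Tesseract combines several languages with '+', e.g. 'ara+eng'.
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang="+".join(langs))
```

For example, `ocr_image(image_path_in_colab, ["ara"])` mirrors the call above, while `["ara", "eng"]` would recognize mixed Arabic and English text.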
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sundeep Pidugu |
| Solution 2 | Mahmoud A Zaher |
