Google Colab: How do I install a traineddata file for pytesseract?

After installing the pytesseract package with "pip install" on Google Colab, I need to install OCR trained data for another language, but I do not know where to copy it.

If I install a package myself using "pip install", where is the package located on my Windows PC?



Solution 1:[1]

Installing a package on Google Colab does not install it on the local drive of the machine you are using. Starting a Colab environment creates a remote machine with its own drive, where you can inspect all the project files.

If you want to know the installation path of a specific pip package, you can always use

!pip show pytesseract

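It prints a Location: field showing where the package is installed, and you can then add any necessary files to that directory. Note that pytesseract is only a Python wrapper: the Tesseract engine reads .traineddata files from its own tessdata directory, not from the Python package directory.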
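The same lookup can be done from Python. This is a minimal sketch, not part of the original answer: `find_package_dir` and `find_tessdata_dirs` are hypothetical helper names, and using `/usr/share` as the search root is an assumption based on a typical Debian/Ubuntu layout.

```python
import importlib.util
import os


def find_package_dir(name):
    """Return the directory of an importable package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def find_tessdata_dirs(root="/usr/share"):
    """Return sorted directories under root that contain *.traineddata files."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(f.endswith(".traineddata") for f in filenames):
            found.add(dirpath)
    return sorted(found)


if __name__ == "__main__":
    # Prints None if pytesseract is not installed in this environment.
    print("pytesseract package:", find_package_dir("pytesseract"))
    # Prints an empty list if no language data is found under /usr/share.
    print("tessdata directories:", find_tessdata_dirs())
```

The first value is where pip put the Python wrapper; the second is where language files would normally need to go.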

Solution 2:[2]

For example, to install Arabic in Google Colab:

First, download the file:

!wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_best/master/ara.traineddata

Then move it to the tessdata directory (the version directory, 4.00 here, depends on the installed Tesseract version):

!mv "ara.traineddata" "/usr/share/tesseract-ocr/4.00/tessdata"

Finally, pass the parameter lang='ara' to pytesseract:

import pytesseract
from PIL import Image

image_path_in_colab = "/content/????-??????.jpg"
extract = pytesseract.image_to_string(Image.open(image_path_in_colab), lang='ara')
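The steps above can also be sketched in plain Python, which avoids writing into a system directory. This is an illustrative sketch under assumptions, not the answerer's method: the helper names `traineddata_url`, `install_traineddata`, and `ocr_image` are hypothetical, the URL pattern is copied from the wget command above and may change, and the tessdata directory is any writable path you choose. The `--tessdata-dir` option passed through `config` tells Tesseract where to look for language files.

```python
import os
import urllib.request

# URL pattern taken from the wget command above; the repository layout may change.
BASE_URL = "https://raw.githubusercontent.com/tesseract-ocr/tessdata_best/master"


def traineddata_url(lang):
    """Build the download URL for a language code such as 'ara'."""
    return f"{BASE_URL}/{lang}.traineddata"


def install_traineddata(lang, tessdata_dir, fetch=urllib.request.urlretrieve):
    """Download <lang>.traineddata into tessdata_dir unless it already exists."""
    os.makedirs(tessdata_dir, exist_ok=True)
    target = os.path.join(tessdata_dir, f"{lang}.traineddata")
    if not os.path.exists(target):
        fetch(traineddata_url(lang), target)
    return target


def ocr_image(image_path, lang, tessdata_dir):
    """Run OCR in the given language, pointing Tesseract at tessdata_dir."""
    # Imported here so the helpers above work without pytesseract installed.
    import pytesseract
    from PIL import Image

    config = f'--tessdata-dir "{tessdata_dir}"'
    return pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)


# Example usage (paths are hypothetical placeholders):
# tessdata = "/content/tessdata"
# install_traineddata("ara", tessdata)
# text = ocr_image("/content/page.jpg", "ara", tessdata)
```

Keeping the language files in a directory of your own means nothing depends on the version-specific path under /usr/share.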

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sundeep Pidugu
Solution 2 Mahmoud A Zaher