Python Tesseract OCR takes ~13 seconds to read a 1000x14000 website screenshot. Is there a way to speed this up?

The whole question is essentially in the title: I have an extremely simple pytesseract script that runs on a website screenshot with somewhat non-standard dimensions of 1000x14000 px. The script below takes approximately 13 seconds to run. In all of the examples I've seen on YouTube, the Tesseract scripts are basically instantaneous.

My question: is the long execution time caused by the image dimensions, or is there a known issue that could be causing this? I have Tesseract 5.0.1 installed and run Python 3.10. I have 16 GB of RAM and a 4-core 3.5 GHz CPU (i5-6600K), so I don't think the hardware is the bottleneck. Is there a way to speed this up? I don't particularly need to use Python; would switching to C++ help here? Even so, I don't think this should take 13 seconds. What could be wrong?

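import time
import cv2
import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path).
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

start_time = time.time()
# Load the 1000x14000 px screenshot.
img = cv2.imread('myimage.png')

# Run OCR on the whole image and print the recognized text.
print(pytesseract.image_to_string(img))
# Report elapsed time, covering both image loading and OCR.
print("--- %s seconds ---" % (time.time() - start_time))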
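One thing that might be worth trying, as a rough sketch rather than a confirmed fix: cut the tall screenshot into horizontal strips and OCR them concurrently. pytesseract runs the tesseract executable as a separate process for each call, so plain threads can keep several cores busy. The helper names (strip_bounds, ocr_in_strips), the strip height, the overlap, and the --psm 6 page segmentation mode are all assumptions made for illustration. Whether this is actually faster for this image is untested, and a text line that falls on a strip boundary may come out duplicated or cut off.

```python
from concurrent.futures import ThreadPoolExecutor


def strip_bounds(height, strip_height=2000, overlap=40):
    """Return (top, bottom) row ranges that cover `height` rows with a small overlap."""
    if strip_height <= overlap:
        raise ValueError("strip_height must be larger than overlap")
    bounds = []
    top = 0
    while top < height:
        bottom = min(top + strip_height, height)
        bounds.append((top, bottom))
        if bottom == height:
            break
        # Step back by `overlap` rows so a line on the boundary appears whole in one strip.
        top = bottom - overlap
    return bounds


def _tesseract_ocr(strip):
    # Imported lazily so the helpers above can be used without Tesseract installed.
    import pytesseract
    # Assumption: --psm 6 (treat the strip as a single uniform block of text).
    return pytesseract.image_to_string(strip, config="--psm 6")


def ocr_in_strips(img, ocr=_tesseract_ocr, strip_height=2000, overlap=40, workers=4):
    """OCR an image strip by strip in a thread pool and join the text in order."""
    height = len(img)  # number of rows for a NumPy image; also works for a list
    strips = [img[top:bottom] for top, bottom in strip_bounds(height, strip_height, overlap)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in strip order, top to bottom.
        return "\n".join(pool.map(ocr, strips))
```

With the script above, this would be called as ocr_in_strips(img) in place of pytesseract.image_to_string(img). Timing both calls on the same image is the only way to tell whether the split helps.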

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
