Pytesseract wrong number detection

My code is:

import cv2
import numpy
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"  # For Windows OS

def scan(image):
    try:
        # `image` is an in-memory RGB image (e.g. from PIL): convert it to OpenCV's BGR layout
        img = cv2.cvtColor(numpy.array(image), cv2.COLOR_RGB2BGR)
    except Exception:
        # Otherwise treat `image` as a file path
        img = cv2.imread(image)

    # Apply OCR, restricting the output to digits
    data = pytesseract.image_to_string(
        img, config="-c tessedit_char_whitelist=1234567890 --psm 6")
    return data

When I scan this image, it just returns '' (an empty string). Nothing. I don't know what's wrong; it works on every other number. What should I change? If you have some other Python OCR that works on this image, you can also send it.



Solution 1:[1]

Using Tesseract, or any OCR, can get really tricky. The images that you say worked perfectly might have better quality, or might be closer to the fonts covered by the trained data your Tesseract installation is using.

Some basic steps you can take to improve this are:

  1. Add a trained data file for a font similar to the one you are trying to detect.
  2. Preprocess the image: sharpen it, change its resolution and color, and keep adjusting until you find the combination that works.
  3. Try a different OCR engine.

Let me know if this works!

Solution 2:[2]

Read the documentation, understand what you are doing, and you will get the correct result. Hint: --psm 6 tells Tesseract to assume a single uniform block of text, and pretending that a single character is a uniform block of text is not wise.
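Following that hint, a sketch that asks Tesseract to treat the image as a single character (--psm 10) instead of a uniform block of text (--psm 6). The helper names `tesseract_config` and `ocr_single_character` are hypothetical; only `pytesseract.image_to_string` and its `config` argument come from the real library. Modes 7 (single text line) and 8 (single word) may also be worth trying for short numbers.

```python
def tesseract_config(psm: int, whitelist: str = "0123456789") -> str:
    """Build a Tesseract config string: page segmentation mode plus a character whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def ocr_single_character(img, whitelist: str = "0123456789") -> str:
    """OCR an image containing exactly one character (hypothetical helper)."""
    # Imported here so tesseract_config() can be used without Tesseract installed
    import pytesseract

    # --psm 10: treat the image as a single character
    text = pytesseract.image_to_string(img, config=tesseract_config(10, whitelist))
    # Strip the trailing newline and any other whitespace Tesseract appends
    return text.strip()
```

For example, ocr_single_character(cv2.imread('niner.png')) would OCR the file from the answer below in single-character mode.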

Solution 3:[3]

Your picture works for me, so my guess is that you didn't successfully read the image. You can debug this with print(img.shape) or with if img is None: print('None'), because cv2.imread returns None instead of raising an error when it cannot read a file. Python might also be running in a different directory than you expect: os.getcwd() returns Python's current working directory, and os.path.isfile(image) tells you whether Python can find the file at the path you are giving it.

This is what I tried:

import cv2
import pytesseract

# ~ pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"  # For Windows OS

img = cv2.imread('niner.png')

# Apply OCR
data = pytesseract.image_to_string(img, config="-c tessedit_char_whitelist=1234567890 --psm 6")
print('tesseract version: ', pytesseract.get_tesseract_version())
print('=============================================')
print(data)

and the result is:

tesseract version:  4.0.0.20181030
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

=============================================
9
?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Ali Zahid Raja
Solution 2    user898678
Solution 3