How to perform OCR for several contours

I have code that identifies contours in a licence plate, but I don't know how to extract the letters with pytesseract for each individual contour. This is the original image:

enter image description here

This is the code:

        import cv2
        import numpy as np
        import pytesseract


        image = cv2.imread('c1.png')
        cv2.waitKey(0)


        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


        edged = cv2.Canny(gray, 30, 200)
        cv2.waitKey(0)
        blured = cv2.blur(gray, (5,5), 0)    
        img_thresh = cv2.adaptiveThreshold(blured, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
        rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (30, 10))
        threshed = cv2.morphologyEx(img_thresh, cv2.MORPH_CLOSE, rect_kernel)

        contours, hierarchy = cv2.findContours(edged,
            cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        for contour in contours:
            if cv2.contourArea(contour) > 170:
                [X, Y, W, H] = cv2.boundingRect(contour)
                cv2.rectangle(image, (X, Y), (X + W, Y + H), (0,0,255), 2)
                
                mask = np.zeros(gray.shape,np.uint8)
                new_image = cv2.drawContours(mask,[contour],0,255,-1,)
                new_image = cv2.bitwise_and(image,image,mask=mask)
                (x, y) = np.where(mask == 255)
                (topx, topy) = (np.min(x), np.min(y))
                (bottomx, bottomy) = (np.max(x), np.max(y))
                
                Cropped0 = gray[topx:bottomx+2, topy:bottomy+2]


        cv2.imshow('Canny Edges After Contouring', edged)
        cv2.waitKey(0)


        print("Number of Contours found = " + str(len(contours)))
        result_number = pytesseract.image_to_string(new_image, lang='eng')
        print("Detected Number is:",result_number)

        cv2.imshow('Contours', image)
        cv2.waitKey(0)
        cv2.imshow('new_image', Cropped0)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

The output of the code:

enter image description here

I tried cropping the image after identifying the contours for the OCR, but for some reason it only crops the number 5. I would also like to ignore the contours containing Arabic letters. Would it be possible to exclude these two from the recognition process? Can someone please help me with this?



Solution 1:[1]

Finding contours on the edge image gives you inconsistent contours.

In the following code I have:

  • Applied an Otsu threshold to the grayscale image and inverted the result (th)
  • Found contours on th and cropped them based on area

Code:

import cv2

# Function to show image:
def show(img):
    cv2.namedWindow('image', cv2.WINDOW_NORMAL)
    cv2.imshow('image', img)
    cv2.waitKey(0)

img = cv2.imread('image_path')
img1 = img.copy()
img_g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# With THRESH_OTSU the threshold is computed automatically, so the 127 is ignored
th = cv2.threshold(img_g, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours on the thresholded image th, not on the Canny edge image
contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
for contour in contours:
    if cv2.contourArea(contour) > 170:
        X, Y, W, H = cv2.boundingRect(contour)
        cv2.rectangle(img1, (X, Y), (X + W, Y + H), (0, 0, 255), 2)
        # Pad the crop by 2 px; clamp at 0 so a negative index cannot wrap around
        Cropped0 = th[max(Y - 2, 0):Y + H + 2, max(X - 2, 0):X + W + 2]
        show(Cropped0)
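
In the question's code, Cropped0 and new_image are overwritten on every pass of the loop and only read after it ends, so only the last contour reaches imshow and pytesseract. One way to OCR every contour is to run pytesseract on each crop as it is produced. The following is a minimal sketch, not a tested solution for this plate. The helper names padded_boxes and ocr_characters are made up for illustration, and the Tesseract options (--psm 10 for a single character, plus a digit whitelist) are assumptions that may need tuning. Depending on the Tesseract version, the whitelist may not be honoured.

```python
def padded_boxes(boxes, image_shape, pad=2):
    """Sort (x, y, w, h) boxes left to right and pad them, clamped to the image."""
    height, width = image_shape[:2]
    result = []
    for x, y, w, h in sorted(boxes, key=lambda b: b[0]):
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, width), min(y + h + pad, height)
        result.append((x0, y0, x1, y1))
    return result


def ocr_characters(th, min_area=170):
    """Return one recognised string per contour of the binary image th."""
    # Imported here so padded_boxes stays usable without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    contours, _ = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]

    # Assumption: single-character mode and a digit whitelist suit this plate.
    config = "--psm 10 -c tessedit_char_whitelist=0123456789"
    texts = []
    for x0, y0, x1, y1 in padded_boxes(boxes, th.shape):
        # th is white-on-black; invert so Tesseract sees dark text on a light background.
        crop = cv2.bitwise_not(th[y0:y1, x0:x1])
        texts.append(pytesseract.image_to_string(crop, config=config).strip())
    return texts
```

With th computed as above, print("".join(ocr_characters(th))) would print the characters in left-to-right order. Sorting by the x coordinate matters because findContours does not return contours in reading order.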

Results:

th result:

enter image description here

img1 result:

enter image description here

Some cropped digits:

enter image description here enter image description here enter image description here
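
To skip the contours that contain Arabic letters, one option is to filter the bounding boxes by shape before cropping. The sketch below assumes the Arabic letters differ noticeably in height from the digits; this has not been checked against the image in the question, and the keep_digit_like helper and its tolerance value are hypothetical. If the unwanted contours always sit in the same position, dropping them by index after sorting the boxes left to right may be simpler.

```python
from statistics import median


def keep_digit_like(boxes, tolerance=0.2):
    """Keep (x, y, w, h) boxes whose height is within tolerance of the median height."""
    if not boxes:
        return []
    typical = median(h for _, _, _, h in boxes)
    # Assumption: digits share a similar height, while other glyphs do not.
    return [b for b in boxes if abs(b[3] - typical) <= tolerance * typical]
```

The boxes here are the (X, Y, W, H) tuples returned by cv2.boundingRect in the loop above; only the boxes that survive the filter would then be cropped.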

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1