How to remove noise artifacts from an image for OCR with Python OpenCV?

I have subsets of images that contain digits. Each subset is read by Tesseract for OCR. Unfortunately, for some images the cropping from the original image isn't optimal.

[Image: cropped subset with leftover artifacts at the top and bottom]

Hence some artifacts remain at the top and bottom of the image and prevent Tesseract from recognizing the characters. I would like to get rid of these artifacts and obtain a result similar to this:

[Image: desired result with the artifacts removed]

First I considered a simple approach: take the first row of pixels as the reference, and wherever an artifact is found along the x-axis (i.e., a white pixel once the image is binarized), remove pixels along the y-axis until the next black pixel. The code for this approach is below:

import cv2

# Read, convert to grayscale and binarize (inverted: content becomes white on black)
inp = cv2.imread("testing_file.tif")
inp = cv2.cvtColor(inp, cv2.COLOR_BGR2GRAY)
_, inp = cv2.threshold(inp, 150, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

ay, ax = inp.shape  # image height and width

# For each column, walk down from the first row and erase white pixels
# until the first black pixel is reached
out = inp.copy()
for i in range(ax):
    for j in range(ay):
        if out[j, i] == 255:
            out[j, i] = 0
        else:
            break

# Invert back to black content on a white background and save
out = cv2.bitwise_not(out)
cv2.imwrite('output.png', out)

But the result isn't good at all:

[Image: unsatisfactory result of the row-scanning approach]

Then I stumbled across the flood_fill function from scipy (here), but it turned out to be too time-consuming and still not effective. A similar question was asked on SO here but didn't help much. Maybe a k-nearest-neighbor approach could be considered? I also found out that methods that merge neighboring pixels under some criterion are called region-growing methods, among which single linkage is the most common (here).
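
For illustration, here is a minimal sketch of that single-linkage / region-growing idea using OpenCV's connected components: 8-connected white pixels are merged into blobs, and any blob touching the top or bottom row is assumed to be an artifact and dropped. The file name and the border heuristic are assumptions, not part of the original code.

import cv2
import numpy as np

# Same preprocessing as above: inverted binary image, white content on black
inp = cv2.imread("testing_file.tif")
inp = cv2.cvtColor(inp, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(inp, 150, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Merge 8-connected white pixels into components (single linkage on pixel adjacency)
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

h = binary.shape[0]
cleaned = np.zeros_like(binary)
for label in range(1, num_labels):  # label 0 is the background
    top = stats[label, cv2.CC_STAT_TOP]
    bottom = top + stats[label, cv2.CC_STAT_HEIGHT]
    # Keep only components that do not touch the top or bottom border,
    # assuming the artifacts are the only blobs reaching those rows
    if top > 0 and bottom < h:
        cleaned[labels == label] = 255

# Invert back to black content on white and save
cv2.imwrite('cleaned.png', cv2.bitwise_not(cleaned))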

What would you recommend to remove the upper and lower artifacts?



Solution 1:[1]

Here's a simple approach:

  • Convert image to grayscale
  • Otsu's threshold to obtain binary image
  • Create special horizontal kernel and dilate
  • Detect horizontal lines, sort for largest contour, and draw onto mask
  • Bitwise-and

After converting to grayscale, we apply Otsu's threshold to obtain a binary image

[Image: binary image after Otsu's threshold]

# Read in image, convert to grayscale, and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Next we create a long horizontal kernel and dilate to connect the numbers together

[Image: dilated image with the digits connected together horizontally]

# Create special horizontal kernel and dilate 
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (70,1))
dilate = cv2.dilate(thresh, horizontal_kernel, iterations=1)

From here we detect horizontal lines and sort for the largest contour. The idea is that the largest contour will be the middle section of the numbers, where the numbers are all "complete". Any smaller contours will be partial or cut-off numbers, so we filter them out here. We draw this largest contour onto a mask

[Image: mask of the largest horizontal contour]

# Detect horizontal lines, sort for largest contour, and draw on mask
mask = np.zeros(image.shape, dtype=np.uint8)
detected_lines = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), -1)
    break

Now that we have the outline of the desired numbers, we simply bitwise-and with our original image and color the background white to get our result

[Image: final result with the top and bottom artifacts removed]

# Bitwise-and to get result and color background white
mask = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_and(image,image,mask=mask)
result[mask==0] = (255,255,255)

Full code for completeness

import cv2
import numpy as np

# Read in image, convert to grayscale, and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Create special horizontal kernel and dilate 
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (70,1))
dilate = cv2.dilate(thresh, horizontal_kernel, iterations=1)

# Detect horizontal lines, sort for largest contour, and draw on mask
mask = np.zeros(image.shape, dtype=np.uint8)
detected_lines = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), -1)
    break

# Bitwise-and to get result and color background white
mask = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_and(image,image,mask=mask)
result[mask==0] = (255,255,255)

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('result', result)
cv2.waitKey()
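
If the goal is to feed the cleaned image straight into Tesseract, a short example using pytesseract might look like the sketch below. This is not part of the original answer; it assumes pytesseract and the Tesseract binary are installed, and the PSM mode and digit whitelist are assumptions you may need to adjust.

import cv2
import pytesseract

# Tesseract expects RGB ordering, so convert from OpenCV's BGR first
rgb = cv2.cvtColor(result, cv2.COLOR_BGR2RGB)

# --psm 7 treats the image as a single text line; the whitelist restricts
# output to digits (both settings are assumptions, not from the original answer)
text = pytesseract.image_to_string(
    rgb, config='--psm 7 -c tessedit_char_whitelist=0123456789')
print(text.strip())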

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: nathancy