Character Segmentation and Recognition for Unevenly Spaced Digits
I have an image of a number, shown below.
I segmented the number above into its digits using adaptive thresholding and contour detection, restricting both the height and width of each bounding rectangle to be greater than 15 px, which produced the segmented digits below.
Instead of the output above, I would like to segment the number so that I get each digit individually. Each digit, after resizing to (28, 28), can then be fed to a CNN trained on MNIST for better prediction. So, is there a neater way of segmenting this number into individual digits?
One method mentioned here suggests sliding a fixed-size (green) window over the image and detecting digits with a trained neural net. How would this NN be trained to classify the digits? This method avoids the OpenCV approach of separating each individual digit, but won't sliding a window over the whole image be expensive? And how should positive and negative examples be handled during training (should I create a separate dataset? Positive examples can be MNIST digits, but what about negative examples)?
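One common way to build such a training set (a sketch under assumptions, not a prescribed recipe: 28x28 windows, MNIST-style digit images supplied by the caller, and the `make_training_set` name is hypothetical) is to use digit images as positives and, as negatives, random background crops plus partially visible digits, so the detector learns to reject blank and half-covered windows:

```python
import numpy as np

def make_training_set(digits, backgrounds, n_neg_per_bg=10, rng=None):
    """Build (X, y): y = 1 for digit windows, y = 0 for non-digit windows."""
    if rng is None:
        rng = np.random.default_rng(0)
    # positives: the digit images themselves, scaled to [0, 1]
    pos = [d.astype(np.float32) / 255.0 for d in digits]
    neg = []
    # negatives, part 1: random 28x28 crops from background images
    for bg in backgrounds:
        h, w = bg.shape
        for _ in range(n_neg_per_bg):
            y0 = int(rng.integers(0, h - 28 + 1))
            x0 = int(rng.integers(0, w - 28 + 1))
            neg.append(bg[y0:y0 + 28, x0:x0 + 28].astype(np.float32) / 255.0)
    # negatives, part 2: half-visible digits (digit shifted so only
    # its right half appears in the left of the window)
    for d in pos:
        shifted = np.zeros_like(d)
        shifted[:, :14] = d[:, 14:]
        neg.append(shifted)
    X = np.stack(pos + neg)
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return X, y
```

The half-digit negatives matter for a sliding window: without them, the net fires on every window that overlaps a digit at all, instead of only on well-centered ones.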
Segmentation:
import cv2
import numpy as np
import imutils
from skimage.segmentation import clear_border

img = cv2.imread('Image')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 7, 10)
thresh = clear_border(thresh)
# find contours in the thresholded image, then initialize the
# list of group locations
groupCnts = cv2.findContours(thresh.copy(), cv2.RETR_TREE,
                             cv2.CHAIN_APPROX_SIMPLE)
groupCnts = imutils.grab_contours(groupCnts)  # handles OpenCV 2/3/4 return formats
groupLocs = []
clone = np.dstack([gray.copy()] * 3)
# loop over the group contours
for (i, c) in enumerate(groupCnts):
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    # only accept the contour region as a grouping of characters if
    # the ROI is sufficiently large
    if w >= 15 and h >= 15:
        print(i, (x, y, w, h))
        cv2.rectangle(clone, (x, y), (x + w, y + h), (255, 0, 0), 1)
        groupLocs.append((x, y, w, h))
Sliding Window:
import time
import cv2
import joblib
import numpy as np
from skimage.feature import hog
from skimage.segmentation import clear_border

def sliding_window(image, stepSize, windowSize):
    # yield (x, y, window) for each window position
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

clf = joblib.load("digits_cls.pkl")  # MNIST-trained classifier
img = cv2.imread('Image', 0)
(winW, winH) = (22, 40)
cv2.imshow("Window0", img)
cv2.waitKey(1)
blur = cv2.GaussianBlur(img, (5, 5), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)
thresh = clear_border(thresh)
for (x, y, window) in sliding_window(img, stepSize=10, windowSize=(winW, winH)):
    # skip partial windows at the image border
    if window.shape[0] != winH or window.shape[1] != winW:
        continue
    roi = thresh[y:y + winH, x:x + winW]
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    roi = cv2.dilate(roi, (3, 3))
    cv2.imshow("Window1", roi)
    cv2.waitKey(1)
    roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14),
                     cells_per_block=(1, 1), visualise=False)
    nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
    print(nbr)
    # draw the current window position
    clone = img.copy()
    cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
    cv2.imshow("Window2", clone)
    cv2.waitKey(1)
    time.sleep(0.95)
Weird output (it predicts something even for a blank window): 522637753787357777722
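One way to suppress those blank-window predictions (a sketch, not part of the original code; the 2% threshold and the `has_enough_ink` name are assumptions to tune) is to skip any window whose fraction of ink pixels is too small before calling the classifier:

```python
import numpy as np

def has_enough_ink(roi_bin, min_ink_frac=0.02):
    """roi_bin: binarized window where ink pixels are non-zero.

    Returns True only if enough of the window is covered by ink,
    so near-empty windows never reach the classifier.
    """
    ink_frac = np.count_nonzero(roi_bin) / roi_bin.size
    return ink_frac >= min_ink_frac
```

Inside the sliding-window loop this would become `if not has_enough_ink(roi): continue` just before the HOG/predict step.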
Separating joined digits:
import os
import cv2

h, w = img.shape[:2]
count = 0
iw = 15  # assumed width of a single digit, in pixels
sw = 0   # start column of the current slice
while sw < w:
    new_img = img[:, sw:min(sw + iw, w)]
    new = os.path.join('amount/', 'amount_' + str(count) + '.png')
    cv2.imwrite(new, new_img)
    sw += iw
    count += 1
Having found a way to separate these joined digits and feed them to the MNIST-trained classifier, the output is still inaccurate.
Steps I used:
(i) Extract the first image.
(ii) Segment the first image into separate regions, i.e. get the 2nd image.
(iii) If a region's width exceeds some threshold, segment it further to yield separate digits (in case of joined digits, as above).
(iv) Feed all the separate digits obtained after step (iii) to the MNIST classifier to get a prediction for each digit from the reshaped image.
Lengthy, right? Is there a more efficient way to convert the first image to digits directly (yes, I tried pytesseract too!)?
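Step (iii) above can be sketched as a width check plus an equal-width split (assumptions: `max_w = 20` px as the widest a single digit can be, and the `split_wide_roi` name is hypothetical):

```python
import numpy as np

def split_wide_roi(roi, max_w=20):
    """Return a list of vertical slices of roi, each at most ~max_w wide.

    A ROI no wider than max_w is assumed to hold one digit and is kept
    whole; a wider ROI is assumed to hold ceil(w / max_w) joined digits
    and is cut into that many equal-width slices.
    """
    h, w = roi.shape[:2]
    if w <= max_w:
        return [roi]
    n = -(-w // max_w)   # ceil division: assumed number of joined digits
    slice_w = w // n
    parts = []
    for i in range(n):
        stop = (i + 1) * slice_w if i < n - 1 else w  # last slice takes remainder
        parts.append(roi[:, i * slice_w:stop])
    return parts
```

Equal-width cutting is crude for unevenly spaced digits; the projection-profile idea in the answer below would place the cuts at low-ink columns instead.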
Solution 1:[1]
Training a new neural net will be an elegant solution if you have the time and resources to do so.
To separate the digits individually, you can invert the intensity of the image so the handwriting is white and the background black. Then project the pixel values onto the horizontal axis (sum the pixel values in each column) and look for the peaks. Each peak location should indicate a character location.
An extra smoothing pass on the projection profile should refine the character locations.
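The projection idea can be sketched in pure NumPy (the smoothing width and the 5% threshold are illustrative assumptions, and `segment_by_projection` is a hypothetical name):

```python
import numpy as np

def segment_by_projection(gray, smooth=3, thresh_frac=0.05):
    """Return (start, end) column ranges of characters in a grayscale image.

    Inverts the image so ink is bright, sums each column, smooths the
    profile, and treats runs of columns above a small threshold as
    character locations.
    """
    inv = 255.0 - gray.astype(np.float32)          # ink -> bright
    profile = inv.sum(axis=0)                      # column-wise projection
    kernel = np.ones(smooth) / smooth
    profile = np.convolve(profile, kernel, mode='same')  # light smoothing
    mask = profile > thresh_frac * profile.max()
    spans, start = [], None
    for i, on in enumerate(mask):
        if on and start is None:
            start = i                              # run of ink begins
        elif not on and start is not None:
            spans.append((start, i))               # run of ink ends
            start = None
    if start is not None:
        spans.append((start, len(mask)))
    return spans
```

Each returned span can then be cropped, resized to (28, 28), and fed to the MNIST classifier; unlike equal-width slicing, the cuts fall in the low-ink gaps between digits.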
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | yapws87 |