Extracting contour bounding boxes for ROIs from an image using OpenCV [duplicate]

I am trying to extract bounding boxes from this form image. In my case, the bounding boxes are all of the boxes in the form. My approach is to find contours, obtain each bounding box, extract the ROI, and run OCR on those ROIs with pytesseract. However, I am not able to find the right contours. Is my approach the right way, or should I try a different solution? Thanks in advance.

My code so far looks as follows:

import cv2
import pytesseract


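image = cv2.imread('DocOrigin_Government_W2_2014_Red-ScanL.jpg')
if image is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError('Could not read DocOrigin_Government_W2_2014_Red-ScanL.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Adaptive (local) Gaussian threshold with blockSize=31 and C=2
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)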
canny = cv2.Canny(thresh, 100, 200)


# Find contours, obtain bounding box, extract and save ROI
ROI_number = 0
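# RETR_TREE returns every contour in the hierarchy, including the outlines of individual characters
cnts = cv2.findContours(canny, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns (contours, hierarchy)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]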
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    # x,y,w,h = 37, 625, 309, 28  
    ROI = thresh[y:y+h,x:x+w]
    data = pytesseract.image_to_string(ROI, lang='eng',config='--psm 6')
    print(data)
    # write contour images to disk
    # cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
    # cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
    # ROI_number += 1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
