SIFT match computer vision

I need to determine the locations of yogurts on a supermarket shelf. The source photo looks like this:

[source photo of the shelf]

With this template:

[template image]

I am using SIFT to extract and match keypoints between the template and the source photo:

import glob

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
from skimage.measure import ransac
from skimage.transform import AffineTransform

img1 = cv.imread('train.jpg')            # query image (shelf photo)
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)

paths = glob.glob("template.jpg")
l = 0

for path in paths:
    img2 = cv.imread(path)               # train image (template)

    # find the keypoints and descriptors with SIFT
    kp2, des2 = sift.detectAndCompute(img2, None)

    # FLANN parameters
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)      # or pass an empty dictionary
    flann = cv.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)

    # keep the template with the most raw matches
    if l < len(matches):
        l = len(matches)
        image = img2
        match = matches

    h_query, w_query, _ = img2.shape

    # ratio test as per Lowe's paper; the mask is only needed for drawing
    matchesMask = [[0, 0] for _ in range(len(match))]
    good_matches = []
    good_matches_indices = {}
    for i, (m, n) in enumerate(match):
        if m.distance < 0.7 * n.distance:
            matchesMask[i] = [1, 0]
            good_matches.append(m)
            good_matches_indices[len(good_matches) - 1] = i

    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 2)

    # robust affine fit with RANSAC (scikit-image)
    model, inliers = ransac(
        (src_pts, dst_pts),
        AffineTransform, min_samples=4,
        residual_threshold=4, max_trials=20000
    )

    n_inliers = np.sum(inliers)
    print(n_inliers)
    matched_indices = [good_matches_indices[idx] for idx in inliers.nonzero()[0]]
    print(matched_indices)

    # map the template extent back into the query image
    q_coordinates = np.array([(0, 0), (h_query, w_query)])
    coords = model.inverse(q_coordinates)
    print(coords)

    # alternative robust fit with OpenCV
    M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 2)

    draw_params = dict(matchColor=(0, 255, 0),
                       singlePointColor=(255, 0, 0),
                       matchesMask=matchesMask,
                       flags=cv.DrawMatchesFlags_DEFAULT)

    img3 = cv.drawMatchesKnn(img1, kp1, image, kp2, match, None, **draw_params)

    plt.imshow(img3)
    plt.show()

The result of the SIFT matching looks like this:

[SIFT match visualization]

The question is: what is the best way to cluster the matched points into rectangles, one per yogurt? I tried RANSAC, but that method doesn't work in this case.



Solution 1:[1]

I am proposing an approach based on what is discussed in this paper. I have modified it a bit because the use case is not entirely the same, but the authors do use SIFT feature matching to locate multiple objects in video frames. They use PCA to reduce computation time, but that may not be needed for still images.

Sorry, I could not write code for this as it would take a lot of time, but I believe the approach should locate all occurrences of the template object.

The modified approach is as follows:

Divide the template image into regions: left, middle and right along the horizontal axis, and top and bottom along the vertical axis.

Now, when you match features between the template and the source image, keypoints from these regions will match at multiple locations in the source image. You can use these matches to identify which region of the template is present at which location(s) in the source image. If regions overlap, i.e. keypoints from different template regions match keypoints that are close together in the source image, that indicates a wrong match.

Mark each set of matched keypoints within a neighborhood of the source image as left, center, right, top or bottom, depending on which template region the majority of its matches come from.

Starting from each left region in the source image, move towards the right; if a center region followed by a right region is found, the area of the source image between the left and right regions can be marked as the location of one template object.

Overlapping objects can result in a left region followed by another left region when moving to the right. In that case, the area between the two left regions can be marked as one template object.

To further refine the locations, each area of the source image marked as one template object can be cropped and re-matched with the template image.
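A minimal sketch of the scanning step, assuming the matched keypoints have already been grouped into neighborhood clusters, each labelled with its majority template region. The horizontal-thirds split, the cluster representation and the function names are my own assumptions, not from the paper, and the vertical top/bottom split is omitted for brevity:

    def template_region(pt, w_template):
        """Label a template keypoint as left / middle / right by horizontal thirds."""
        x = pt[0]
        if x < w_template / 3:
            return 'left'
        if x < 2 * w_template / 3:
            return 'middle'
        return 'right'

    def scan_for_objects(clusters):
        """clusters: list of (x_center, label) for keypoint neighborhoods on the
        source image, each labelled with its majority template region.
        Returns (x_start, x_end) spans, one per detected template object."""
        spans = []
        start_x, seen_middle = None, False
        for x, label in sorted(clusters):
            if label == 'left':
                if start_x is not None:
                    # left followed by another left: overlapping objects,
                    # the area between the two left groups is one object
                    spans.append((start_x, x))
                start_x, seen_middle = x, False
            elif label == 'middle':
                seen_middle = start_x is not None
            elif label == 'right' and start_x is not None and seen_middle:
                spans.append((start_x, x))
                start_x, seen_middle = None, False
        return spans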

Solution 2:[2]

Try working spatially: for each keypoint in img2, take some bounding box around it and consider only the points inside it for your RANSAC homography when checking for the best fit.

You can also work with overlapping windows and later discard similar resulting homographies.
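A minimal sketch of that idea, assuming the matched point pairs are available as two N x 2 arrays (template point and its matched shelf point). The window size, inlier threshold and de-duplication tolerance are arbitrary placeholder values:

    import numpy as np
    import cv2 as cv

    def local_homographies(pts_template, pts_shelf, win=150, min_inliers=10):
        """pts_template, pts_shelf: Nx2 float arrays of matched point pairs
        (template -> shelf). For each shelf point, fit a RANSAC homography
        using only the matches whose shelf point lies in a window around it."""
        results = []
        for cx, cy in pts_shelf:
            in_win = (np.abs(pts_shelf[:, 0] - cx) < win) & \
                     (np.abs(pts_shelf[:, 1] - cy) < win)
            if in_win.sum() < 4:
                continue
            H, mask = cv.findHomography(pts_template[in_win], pts_shelf[in_win],
                                        cv.RANSAC, 4.0)
            if H is not None and mask.sum() >= min_inliers:
                results.append((H, int(mask.sum())))
        return results

    def dedup_homographies(results, tol=30):
        """Keep the strongest homographies, discarding ones whose projected
        template origin lands close to an already-kept detection."""
        kept, centres = [], []
        for H, n in sorted(results, key=lambda r: -r[1]):
            c = H @ np.array([0.0, 0.0, 1.0])   # project template origin
            c = c[:2] / c[2]
            if all(np.linalg.norm(c - k) > tol for k in centres):
                kept.append(H)
                centres.append(c)
        return kept

Each surviving homography then corresponds to one candidate yogurt, whose corners you can obtain by projecting the template corners through it.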

Solution 3:[3]

Here is what you can do:

Base Image = Whole picture of shelf

Template Image = Single product image

  1. Get SIFT keypoints and descriptors from both images (base and template).
  2. Do feature matching.
  3. Get all the points in the base image that have matches (refer to the figure).
  4. Create clusters based on the size of the template image (here the threshold is 50 px).
  5. Get the bounding box of each cluster.
  6. Crop each bounding-box cluster and check its matches against the template image.
  7. Accept every cluster that has at least a minimum percentage of matched keypoints (here a minimum of 10% of keypoints).
    def plot_pts(img, pts):
        img_plot = img.copy() 
        for i in range(len(pts)):
            img_plot = cv2.circle(img_plot, (int(pts[i][0]), int(pts[i][1])), radius=7, color=(255, 0, 0), thickness=-1)
    
        plt.figure(figsize=(20, 10))
        plt.imshow(img_plot)
        
    def plot_bbox(img, bbox_list):
        img_plot = img.copy() 
    
        
        for i in range(len(bbox_list)):
            start_pt = bbox_list[i][0]
            end_pt = bbox_list[i][2]
            img_plot = cv2.rectangle(img_plot, pt1=start_pt, pt2=end_pt, color=(255, 0, 0), thickness=2)
    
        plt.figure(figsize=(20, 10))
        plt.imshow(img_plot)
        
    def get_distance(pt1, pt2):
        x1, y1 = pt1
        x2, y2 = pt2    
        return np.sqrt(np.square(x1 - x2) + np.square(y1 - y2))
    
    def check_centroid(pt, centroid):
        x, y = pt
        cx, cy = centroid
        
        distance = get_distance(pt1=(x, y), pt2=(cx, cy))
        if distance < max_distance:
            return True
        else:
            return False
        
    def update_centroid(pt, centroids_list):
        new_centroids_list = centroids_list.copy()
        
        flag_new_centroid = True
        
        for j, c in enumerate(centroids_list):
            temp_centroid = np.mean(c, axis=0)
            if_close = check_centroid(pt, temp_centroid)
            if if_close:
                new_centroids_list[j].append(pt)
                flag_new_centroid = False
                break
            
        if flag_new_centroid:
            new_centroids_list.append([pt])
    
        new_centroids_list = recheck_centroid(new_centroids_list)
        return new_centroids_list
    
    
    def recheck_centroid(centroids_list):
        new_centroids_list = [list(set(c)) for c in centroids_list]
        return new_centroids_list
    
    
    def get_bbox(pts):
        minn_x, minn_y = np.min(pts, axis=0)
        maxx_x, maxx_y = np.max(pts, axis=0)
        
        return [[minn_x, minn_y], [maxx_x, minn_y], [maxx_x, maxx_y], [minn_x, maxx_y]]


    class RotateAndTransform:
        def __init__(self, path_img_ref):
            self.path_img_ref = path_img_ref        
            self.ref_img = self._read_ref_image()
            
            #sift
            self.sift = cv2.SIFT_create()
            
            #feature matching
            self.bf = cv2.BFMatcher()
            
            # FLANN parameters
            FLANN_INDEX_KDTREE = 1
            index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
            search_params = dict(checks=50)   # or pass empty dictionary
            self.flann = cv2.FlannBasedMatcher(index_params,search_params)
    
        def _read_ref_image(self):
            ref_img = cv2.imread(self.path_img_ref, cv2.IMREAD_COLOR)  
            ref_img = cv2.cvtColor(ref_img, cv2.COLOR_BGR2RGB)
            return ref_img
    
        def read_src_image(self, path_img_src):
            self.path_img_src = path_img_src
            
            # read images
            # ref_img = cv2.imread(self.path_img_ref, cv2.IMREAD_COLOR)  
            src_img = cv2.imread(path_img_src, cv2.IMREAD_COLOR)
            
            
            src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)
    
            return src_img
        
        def convert_bw(self, img):
            img_bw = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
            return img_bw
        
        def get_keypoints_descriptors(self, img_bw):
            keypoints, descriptors = self.sift.detectAndCompute(img_bw,None)
            return keypoints, descriptors
        
        def get_matches(self, src_descriptors, ref_descriptors, threshold=0.6):
            matches = self.bf.knnMatch(ref_descriptors, src_descriptors, k=2)
            flann_matches = self.flann.knnMatch(ref_descriptors, src_descriptors,k=2)
    
            good_matches = []
            good_flann_matches = []
    
            # Apply ratio test for Brute Force
            for m,n in matches:
                if m.distance <threshold*n.distance:
                    good_matches.append([m])
    
            print(f'Number of BF matches: {len(matches)}, number of good BF matches: {len(good_matches)}')
    
            # Apply ratio test for FLANN
            for m,n in flann_matches:
                if m.distance < threshold*n.distance:
                    good_flann_matches.append([m])
    
            # matches = sorted(matches, key = lambda x:x.distance)
            print(f'Number of FLANN matches: {len(flann_matches)}, number of good FLANN matches: {len(good_flann_matches)}')
            
            return good_matches, good_flann_matches
        
        
        def get_src_dst_pts(self, good_flann_matches, ref_keypoints, src_keypoints):
            pts_src = []
            pts_ref = []
            n = len(good_flann_matches)
    
            for i in range(n):
                ref_index = good_flann_matches[i][0].queryIdx
                src_index = good_flann_matches[i][0].trainIdx
    
                pts_src.append(src_keypoints[src_index].pt)
                pts_ref.append(ref_keypoints[ref_index].pt)
    
            return np.array(pts_src), np.array(pts_ref)
        
    def extend_bbox(bbox, increment=0.1):
        bbox_new = bbox.copy()
        bbox_new[0] = [bbox_new[0][0] - int(bbox_new[0][0] * increment), bbox_new[0][1] - int(bbox_new[0][1] * increment)]
        bbox_new[1] = [bbox_new[1][0] + int(bbox_new[1][0] * increment), bbox_new[1][1] - int(bbox_new[1][1] * increment)]
        bbox_new[2] = [bbox_new[2][0] + int(bbox_new[2][0] * increment), bbox_new[2][1] + int(bbox_new[2][1] * increment)]
        bbox_new[3] = [bbox_new[3][0] - int(bbox_new[3][0] * increment), bbox_new[3][1] + int(bbox_new[3][1] * increment)]
        return bbox_new
    
    def crop_bbox(img, bbox):
        y, x = bbox[0]
        h, w = bbox[1][0] - bbox[0][0], bbox[2][1] - bbox[0][1]
        return img[x: x + w, y: y + h, :]

    base_img = cv2.imread(path_img_base)
    ref_img = cv2.imread(path_img_ref)
    
    rnt = RotateAndTransform(path_img_ref)
    ref_img_bw = rnt.convert_bw(img=rnt.ref_img)
    ref_keypoints, ref_descriptors = rnt.get_keypoints_descriptors(ref_img_bw)
    
    base_img = rnt.read_src_image(path_img_src = path_img_base)
    base_img_bw = rnt.convert_bw(img=base_img)
    
    base_keypoints, base_descriptors = rnt.get_keypoints_descriptors(base_img_bw)
    good_matches, good_flann_matches = rnt.get_matches(src_descriptors=base_descriptors, ref_descriptors=ref_descriptors, threshold=0.6)
    
    
    ref_points = []
    
    for gm in good_flann_matches:
        x, y = ref_keypoints[gm[0].queryIdx].pt
        x, y = int(x), int(y)
        ref_points.append((x, y))

    max_distance = 50
    
    centroids = [[ref_points[0]]]
    
    for i in tqdm(range(len(ref_points))):
        pt = ref_points[i]
        centroids = update_centroid(pt, centroids)
        
    bbox = [get_bbox(c) for c in centroids]
    centroids = [np.mean(c, axis=0) for c in centroids]
    print(f'Number of Points: {len(good_flann_matches)}, centroids: {len(centroids)}')

    data = []
    for i in range(len(bbox)):
        temp_crop_img = crop_bbox(ref_img, extend_bbox(bbox[i], 0.01))
        temp_crop_img_bw = rnt.convert_bw(img=temp_crop_img)
    
        temp_crop_keypoints, temp_crop_descriptors = rnt.get_keypoints_descriptors(temp_crop_img_bw)
    
        good_matches, good_flann_matches = rnt.get_matches(src_descriptors=base_descriptors, ref_descriptors=temp_crop_descriptors, threshold=0.6)
        
        temp_data = {'image': temp_crop_img,
                     'num_matched': len(good_flann_matches),
                     'total_keypoints' : len(base_keypoints),
                    }
        
        data.append(temp_data)


    filter_data = [{'num_matched' : i['num_matched'], 'image': i['image']} for i in data if i['num_matched'] > 25]
    
    for i in range(len(filter_data)):
        temp_num_match = filter_data[i]['num_matched']
        plt.figure()
        plt.title(f'num matched: {temp_num_match}')
        plt.imshow(filter_data[i]['image'])



Results of the method

Solution 4:[4]

First, you could detect every item on the shelf with a network like this one; it is pre-trained for this exact context and works pretty well. You should also rectify the image before feeding it to the network. You will obtain bounding boxes for every product (perhaps with some false positives/negatives, but that is another issue). Then you can match each box with the template using SIFT and compute a score (it is up to you to define which score works best), but I suggest another approach, such as a Siamese network, if you have a consistent dataset.
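A rough sketch of the SIFT-scoring step, assuming the detector has already produced a list of (x, y, w, h) boxes in the shelf image. The ratio threshold and the bare match count used as a score are placeholder choices, not part of the original suggestion:

    import cv2

    def sift_scores(template_bw, shelf_bw, boxes, ratio=0.7):
        """Score each detected box by its number of good SIFT matches against
        the template. template_bw / shelf_bw: grayscale images, boxes: list of
        (x, y, w, h) in shelf_bw coordinates."""
        sift = cv2.SIFT_create()
        bf = cv2.BFMatcher()
        _, des_t = sift.detectAndCompute(template_bw, None)

        scores = []
        for x, y, w, h in boxes:
            crop = shelf_bw[y:y + h, x:x + w]
            _, des_c = sift.detectAndCompute(crop, None)
            if des_c is None or len(des_c) < 2:
                scores.append(0)
                continue
            matches = bf.knnMatch(des_t, des_c, k=2)
            # Lowe's ratio test, then count the surviving matches as the score
            good = [m for m, n in matches if m.distance < ratio * n.distance]
            scores.append(len(good))
        return scores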

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 abggcv
Solution 2 YoniChechik
Solution 3 Manish Sahu
Solution 4 rok