Object Detection without annotations and labels
Problem Statement:
I am given two sets of images; none of the images in either set have annotations or labels.
First set : a set of images of the grocery store shelves (captured in the grocery stores).
Second set: a set of close-up images of the products kept on those store shelves.
What I am trying to achieve:
Given a product image (from the second set), I want to first locate that product in the grocery-shelf images (first set) and then predict a bounding box for it.
Visually:
[image: desired output — a product image alongside the corresponding shelf image with the predicted bounding box]
My approach:
- For each product image, first find all the shelf image(s) which contain that product.
- Then predict a bounding box by finding the location of the product in the shelf image.
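The first step above (deciding which shelf images contain a given product) can be framed as image retrieval over embeddings. A minimal sketch, assuming the embeddings come from some pretrained or self-supervised backbone that is not shown here (the function name `find_matching_shelves` and the threshold are my own choices, not an established API). In practice, because the product occupies only a small fraction of a shelf photo, comparing the product embedding against embeddings of shelf-image crops or tiles works better than comparing against the whole shelf image:

```python
import numpy as np

def find_matching_shelves(product_emb, shelf_embs, threshold=0.8):
    """Return indices of shelf images whose embedding has cosine
    similarity >= threshold with the product embedding.

    product_emb: (d,) vector for one product image.
    shelf_embs:  (n, d) matrix, one embedding per shelf image
                 (or per shelf-image crop).
    """
    p = product_emb / np.linalg.norm(product_emb)
    s = shelf_embs / np.linalg.norm(shelf_embs, axis=1, keepdims=True)
    sims = s @ p  # cosine similarity of each shelf embedding with the product
    return np.where(sims >= threshold)[0].tolist()
```

The threshold would need tuning on a small hand-checked subset, since no labels are available to calibrate it automatically.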
I am using YOLOv5 for this task, but I am not sure how to start, given that I have to do it without annotations or labels.
I have come across terms like zero-shot learning and self-supervised object detection, but I haven't been able to figure out how to use them as a starting point.
There is a similar question already asked, but I am not sure its answer solves this problem.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow