Is it possible for either Microsoft's Computer Vision API or Google's Cloud Vision API to return the location of objects?
I am trying to develop an application that needs to know the location of tagged objects in an image. Knowing that there is a "piano" in an image is not enough; I need to know where that piano is in the image.
Both Microsoft's Computer Vision API and Google's Cloud Vision API provide some form of cropping suggestion or smart thumbnail generation, which leads me to think that the location of certain objects is being detected. Is there a way to get that information (such as a bounding box around each detected object) from either API?
EDIT: I understand that both APIs can return the location of faces detected in an image. However, I am looking for the locations and sizes of every object in an image: cars, pianos, trees, people... anything.
Solution 1:[1]
Microsoft's Vision API offers no pixel coordinates for detected objects (see the returned features: https://dev.projectoxford.ai/docs/services/56f91f2d778daf23d8ec6739/operations/56f91f2e778daf14a499e1fa).
However, if you want to detect people, the Microsoft API can return the coordinates of face rectangles.
Solution 2:[2]
I don't know of any API that returns object coordinates at this time. I recommend YOLO, which gives you the coordinates of each detected object. You can use either a pre-trained model or train your own.
However, it is not a hosted API, so you have to write a bit of backend code to run it remotely.
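As a rough illustration of what "coordinates" means here: a minimal sketch, assuming the detector reports boxes in the Darknet-style YOLO convention (center x, center y, width, height, each normalized to the range 0 to 1). Some YOLO wrappers already return pixel corners, so check the output format of the implementation you pick. The function name `yolo_to_pixel_box` is made up for this example.

```python
def yolo_to_pixel_box(cx, cy, w, h, image_width, image_height):
    """Convert a normalized (center x, center y, width, height) box
    into pixel corners (left, top, right, bottom).

    Assumes the Darknet-style convention where all four values are
    fractions of the image size; this is an assumption, not something
    every YOLO implementation guarantees.
    """
    left = (cx - w / 2) * image_width
    top = (cy - h / 2) * image_height
    right = (cx + w / 2) * image_width
    bottom = (cy + h / 2) * image_height

    # Round to whole pixels and clamp to the image bounds, since a
    # predicted box can extend slightly past the edge of the image.
    return (
        max(0, round(left)),
        max(0, round(top)),
        min(image_width, round(right)),
        min(image_height, round(bottom)),
    )
```

For a 640x480 image, a box centered in the frame and covering half of each dimension would map to the corners (160, 120) and (480, 360).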
Solution 3:[3]
Hope this helps: https://azure.microsoft.com/en-in/services/cognitive-services/computer-vision/
The detect operation returns a bounding rectangle for each object it finds.
API:
url (POST): https://{yourvisionapp}.cognitiveservices.azure.com/vision/v2.0/detect
headers:
Content-Type: application/json
Ocp-Apim-Subscription-Key: {yourSubscriptionKey}
body: {"url":"yoururl"}
sample response:
{
  "objects": [
    {
      "rectangle": {
        "x": 460,
        "y": 79,
        "w": 141,
        "h": 258
      },
      "object": "window",
      "confidence": 0.508
    },
    {
      "rectangle": {
        "x": 180,
        "y": 240,
        "w": 299,
        "h": 182
      },
      "object": "Billiard table",
      "confidence": 0.635,
      "parent": {
        "object": "table",
        "confidence": 0.676
      }
    },
    {
      "rectangle": {
        "x": 8,
        "y": 11,
        "w": 497,
        "h": 416
      },
      "object": "room",
      "confidence": 0.547
    }
  ],
  "requestId": "f8aafd95-d17d-4088-a34b-ad616f9cde4a",
  "metadata": {
    "width": 640,
    "height": 427,
    "format": "Jpeg"
  }
}
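The request and response above could be handled from Python roughly as follows. This is a minimal sketch using only the standard library, and it assumes the response always has the shape of the sample shown (an `objects` list whose entries carry `rectangle`, `object`, and `confidence`). The function names are invented for this example, and the endpoint and key are placeholders you would supply yourself.

```python
import json
import urllib.request


def build_detect_request(endpoint, subscription_key, image_url):
    """Build the POST request for the v2.0 detect operation."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/vision/v2.0/detect",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def extract_boxes(response):
    """Turn a parsed detect response into a list of
    (label, confidence, left, top, right, bottom) tuples in pixels."""
    boxes = []
    for item in response.get("objects", []):
        rect = item["rectangle"]
        # x and y are the top-left corner; w and h are the box size.
        boxes.append((
            item["object"],
            item["confidence"],
            rect["x"],
            rect["y"],
            rect["x"] + rect["w"],
            rect["y"] + rect["h"],
        ))
    return boxes


def detect_objects(endpoint, subscription_key, image_url):
    """Send the request and return the extracted boxes.
    Needs network access and a valid subscription key."""
    request = build_detect_request(endpoint, subscription_key, image_url)
    with urllib.request.urlopen(request) as reply:
        return extract_boxes(json.load(reply))
```

Applied to the sample response, `extract_boxes` would report the "window" object as spanning from (460, 79) to (601, 337).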
Solution 4:[4]
2020 UPDATE:
This question is a few years old, but the Microsoft Azure Computer Vision API can now return bounding boxes for the objects it detects in an image. Samples are available in Python and other languages.
Computer Vision documentation: https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/
Computer Vision SDK: https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-vision-computervision/?view=azure-python
Computer Vision API: https://westus.dev.cognitive.microsoft.com/docs/services/5cd27ec07268f6c679a3e641/operations/56f91f2e778daf14a499f21b
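For the Python SDK linked above, object detection might look something like the sketch below. It assumes the `azure-cognitiveservices-vision-computervision` package is installed and that its `ComputerVisionClient.detect_objects` call returns objects exposing `rectangle`, `object_property`, and `confidence`, as in Microsoft's quickstart; verify this against the SDK version you install. The helper names `rectangle_to_corners` and `detect_with_sdk` are made up for this example.

```python
def rectangle_to_corners(rect):
    """Convert an SDK bounding rectangle (x, y, w, h in pixels)
    into (left, top, right, bottom) corners."""
    return rect.x, rect.y, rect.x + rect.w, rect.y + rect.h


def detect_with_sdk(endpoint, key, image_url):
    """Detect objects in a remote image and return
    (label, confidence, corners) for each one.

    Needs network access, a valid key, and the Azure SDK.
    """
    # Imported here so rectangle_to_corners stays usable
    # even when the SDK is not installed.
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials

    client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))
    result = client.detect_objects(image_url)
    return [
        (obj.object_property, obj.confidence, rectangle_to_corners(obj.rectangle))
        for obj in result.objects
    ]
```

The corner tuples can be passed to any drawing library to overlay boxes on the image.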
Solution 5:[5]
Azure Custom Vision would be the product in the Azure family that can do object detection and return the coordinates of an object.
https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/#overview
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | DaveStat |
| Solution 2 | david.r |
| Solution 3 | Dharmendra Prajapati |
| Solution 4 | Azurespot |
| Solution 5 | alexheat |

