What should the output layer of a deep learning network look like for multi-object bounding box regression?
I am building a neural network on top of MobileNet SSD v2, specifically for bounding box regression. I have had a difficult time finding clear resources on how the model's output should be shaped. My images generally contain 1 to 4 boxes, so I could simply concatenate the coordinates and make the output Dense(16) (4 boxes with 4 coordinates each), but what about images that contain more than 4 objects? I am unsure how to build an output layer that handles a dynamic number of objects. How can I do this, and are there any detailed resources on it?
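One common way around a variable-length output, and the one SSD-style detectors use, is to never make the output variable at all. The head predicts a fixed number of rows, one per anchor (default box), each holding box offsets plus a score. The variable count of objects only appears at inference, after score thresholding and non-maximum suppression (NMS). The plain-Python sketch below illustrates that idea under simplifying assumptions: a toy 4x4 grid of non-overlapping anchors, a single foreground class with one objectness score per anchor (in place of SSD's per-class softmax), threshold-only anchor matching, and no variance scaling of the offsets. The names `ANCHORS`, `encode_targets` and `decode` are hypothetical helpers, not part of any library.

```python
import math

# Hypothetical toy anchors in (cx, cy, w, h) form, normalised to [0, 1].
# A real SSD head generates thousands of these across several feature maps.
ANCHORS = [(x / 4 + 0.125, y / 4 + 0.125, 0.25, 0.25) for y in range(4) for x in range(4)]


def to_corners(b):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode_targets(gt_boxes, iou_threshold=0.5):
    """Map a variable-length list of ground-truth corner boxes onto a fixed
    list of len(ANCHORS) rows: (dcx, dcy, dw, dh, label), label 1 = object."""
    targets = []
    for anchor in ANCHORS:
        a_corners = to_corners(anchor)
        best = max(gt_boxes, key=lambda g: iou(a_corners, g), default=None)
        if best is None or iou(a_corners, best) < iou_threshold:
            # Background anchor: the offsets are ignored by the regression loss.
            targets.append((0.0, 0.0, 0.0, 0.0, 0))
            continue
        gx1, gy1, gx2, gy2 = best
        gcx, gcy, gw, gh = (gx1 + gx2) / 2, (gy1 + gy2) / 2, gx2 - gx1, gy2 - gy1
        acx, acy, aw, ah = anchor
        # Offsets relative to the anchor: centre shift scaled by anchor size, log size ratio.
        targets.append(((gcx - acx) / aw, (gcy - acy) / ah,
                        math.log(gw / aw), math.log(gh / ah), 1))
    return targets


def decode(predictions, score_threshold=0.5, nms_threshold=0.45):
    """predictions: one (dcx, dcy, dw, dh, score) row per anchor, i.e. a fixed
    (num_anchors, 5) output. Returns however many (score, box) pairs survive."""
    candidates = []
    for (dcx, dcy, dw, dh, score), (acx, acy, aw, ah) in zip(predictions, ANCHORS):
        if score < score_threshold:
            continue
        cx, cy = acx + dcx * aw, acy + dcy * ah
        w, h = aw * math.exp(dw), ah * math.exp(dh)
        candidates.append((score, to_corners((cx, cy, w, h))))
    # Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept one.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept = []
    for score, box in candidates:
        if all(iou(box, k) < nms_threshold for _, k in kept):
            kept.append((score, box))
    return kept


# Five objects, more than a Dense(16) output could hold, in a fixed 16-row target.
gt = [to_corners(ANCHORS[i]) for i in (0, 3, 5, 10, 15)]
targets = encode_targets(gt)
print(len(targets))  # 16
preds = [(dx, dy, dw, dh, float(label)) for dx, dy, dw, dh, label in targets]
print(len(decode(preds)))  # 5
```

In a network, this corresponds to a head whose output has shape (num_anchors, 5) instead of a flat Dense(16). During training the regression loss would be applied only to anchors labelled 1, and the score loss to all anchors, so the same fixed-size output can represent zero, four or fifty objects.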
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
