I want to feed videos as well as their annotations as training data to a TensorFlow model to hopefully get better results
I am training a model to detect drones in videos obtained from a security feed. The dataset consists of videos of drones flying in front of a camera, together with a file containing their annotations in the following format: (index, frame_number, no_of_objects, X_co-ordinate, Y_co-ordinate, width, height, class).
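Here is a minimal sketch of how I am currently pairing frames with their annotations (assuming the annotation file is plain CSV with one row per object, frame numbers are zero-based, and OpenCV is available; the helper names are just placeholders I made up):

```python
import csv
from collections import defaultdict

import cv2  # assuming OpenCV is available for frame extraction


def load_annotations(path):
    """Group annotation rows by frame number.

    Assumes a comma-separated file with columns:
    index, frame_number, no_of_objects, x, y, width, height, class
    """
    boxes_by_frame = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            _, frame_no, _, x, y, w, h, cls = row
            boxes_by_frame[int(frame_no)].append(
                (float(x), float(y), float(w), float(h), int(cls))
            )
    return boxes_by_frame


def iter_annotated_frames(video_path, annotation_path):
    """Yield (frame, boxes) pairs for every annotated frame in the video."""
    boxes_by_frame = load_annotations(annotation_path)
    cap = cv2.VideoCapture(video_path)
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video stream
            break
        if frame_no in boxes_by_frame:
            yield frame, boxes_by_frame[frame_no]
        frame_no += 1
    cap.release()
```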
I am aware that I could train the model using only the frames as data and the no_of_objects column as the label, but I want to use the coordinate data provided to tell the model where exactly in the frame the drone currently is.
Do I need to design a custom model, or is there an existing library that accepts coordinates as arguments? If the approach I am currently considering is not optimal, please let me know.
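For context, this is roughly the conversion I imagine would be needed to turn each annotated frame into detection targets. I am assuming the (X, Y) values are the top-left corner of the box in pixels, and I am normalizing to the [ymin, xmin, ymax, xmax] layout used by the TensorFlow Object Detection API; `to_training_example` is just an illustrative name:

```python
import tensorflow as tf


def to_training_example(frame, boxes):
    """Convert one annotated frame into tensors a detection model can consume.

    Boxes arrive as (x, y, width, height, class) tuples in pixel units and
    are converted to normalized [ymin, xmin, ymax, xmax] coordinates.
    """
    height, width = frame.shape[:2]
    # Scale pixel values to [0, 1] floats.
    image = tf.convert_to_tensor(frame, dtype=tf.float32) / 255.0
    norm_boxes, classes = [], []
    for x, y, w, h, cls in boxes:
        norm_boxes.append(
            [y / height, x / width, (y + h) / height, (x + w) / width]
        )
        classes.append(cls)
    return (
        image,
        tf.constant(norm_boxes, dtype=tf.float32),
        tf.constant(classes, dtype=tf.int32),
    )
```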
