Deep learning network to detect activity in videos
I'm currently working on a project to create a model that detects the type of activity two people in a video are performing. For example, consider a dataset of videos where the two people are boxing, wrestling, jumping, or just talking with each other. How can I create a model that distinguishes these activities from each other?
In particular, what kind of features should be extracted from the videos? Should I use a pre-trained model with ImageNet weights? I am not sure where to begin. Thank you for your help!
Solution 1:[1]
I suggest starting by reading academic papers to get a sense of the field. This might be a good start, though it could be a little overwhelming if you are a beginner.
If you want to explore a collection of related papers and repositories, this is a good place to look: link.
If you want to start hands-on, this repository might be a good choice; it is a curated collection of implementations for such tasks: link
Here are also some benchmarks that might be useful to follow: link
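To give you a feel for what a simple baseline could look like before diving into those resources, here is a minimal sketch of one common approach: run an ImageNet-pretrained CNN over each frame as a feature extractor and feed the per-frame features into an LSTM for temporal modelling. Everything specific here (MobileNetV2 as the backbone, 16 frames per clip, the four class labels) is an assumption for illustration, not something taken from the linked material.

```python
# Minimal CNN+LSTM sketch (assumes TensorFlow/Keras is installed and that
# clips have already been sampled to a fixed number of frames).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNetV2

NUM_FRAMES = 16          # frames sampled per clip (assumed)
FRAME_SIZE = (224, 224)  # spatial resolution expected by the backbone
NUM_CLASSES = 4          # boxing, wrestling, jumping, talking

# ImageNet-pretrained backbone used as a per-frame feature extractor.
backbone = MobileNetV2(include_top=False, weights="imagenet",
                       pooling="avg", input_shape=(*FRAME_SIZE, 3))
backbone.trainable = False  # freeze at first; optionally fine-tune later

clip_in = layers.Input(shape=(NUM_FRAMES, *FRAME_SIZE, 3))
# Apply the CNN to every frame, yielding one feature vector per frame.
frame_features = layers.TimeDistributed(backbone)(clip_in)
# Model the temporal evolution of the frame features.
x = layers.LSTM(128)(frame_features)
out = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(clip_in, out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Frames should be scaled the way the backbone expects (for MobileNetV2, `tf.keras.applications.mobilenet_v2.preprocess_input`). This is only a starting point; stronger results on action recognition typically come from 3D CNNs or two-stream architectures, which the papers and repositories above cover.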
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sadra |
