[D] How robust are action recognition neural networks to large variance in the sampling frequency of videos?
Usually, when training action recognition models (classifying sequences of frames), researchers use public datasets such as HMDB-51 or UCF-11. Those datasets are clean and have a well-known, constant frame rate (30 and 29.97 fps, respectively).
I am working on a project where I want to analyze “real world” videos. I am trying to keep the sampling frequency of my input at a constant fps, but because of different data sources and imperfect recording devices (frame skipping), the average fps per video clip varies to some extent.
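To make the setup concrete, here is a minimal sketch of one way to resample a clip to a constant frame rate by timestamp. It assumes per-frame timestamps (in seconds) are available and picks the nearest source frame for each target time, so dropped frames show up as duplicated indices. The function name `resample_indices` and the nearest-frame strategy are illustrative assumptions, not a description of any particular pipeline.

```python
import numpy as np

def resample_indices(timestamps, target_fps):
    """Return, for each tick of a constant-rate clock, the index of the
    nearest source frame. Hypothetical helper; needs at least two frames."""
    ts = np.asarray(timestamps, dtype=float)
    ts = ts - ts[0]  # measure time from the first frame
    # Number of target ticks that fit in the clip (small epsilon guards rounding).
    n_out = int(np.floor(ts[-1] * target_fps + 1e-9)) + 1
    target_times = np.arange(n_out) / target_fps
    # First source frame at or after each target time, kept in a valid range.
    right = np.clip(np.searchsorted(ts, target_times, side="left"), 1, len(ts) - 1)
    left = right - 1
    # Choose whichever neighbour is closer in time (ties go to the earlier frame).
    pick_right = (ts[right] - target_times) < (target_times - ts[left])
    return np.where(pick_right, right, left)

# A 4 fps clip where the frame at t = 0.75 s was skipped by the recorder.
idx = resample_indices([0.0, 0.25, 0.5, 1.0, 1.25], target_fps=4)
print(idx.tolist())
```

With this kind of resampling the network always sees a fixed nominal rate, but a skipped frame becomes a repeated frame, so the input still carries some temporal irregularity.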
Do you know of any work (or does anyone have practical experience with this problem) that could give me a clue as to how robust neural networks are to this issue?