I am currently researching practical applications of action recognition models with use of attention models. I have decided to share lessons learned from implementing several ideas from research papers in this field. The network learns to classify images from HMDB-51 dataset and creates attention heatmaps which focus on different parts on the image and thus justify model’s decision. Heatmaps can be very accurate, to the point that one could probably use them for tracking.
The tutorial contains brief overview of action recognition and visual attention mechanisms. Then I present the network architecture and discuss the results of my project. Additionally, I include github repo with my implementation.
I hope you guys find it interesting!