Time: Thursday 9-Jul-2020 23:30
Slides: see the event content.
Motivation / Abstract
The Action Transformer model recognizes and localizes human actions in video clips. It repurposes a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions it is classifying. The paper shows that with high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up semantic context from the actions of others.
- Learn about the Action Transformer architecture
- Learn about action detection in videos
- Learn about attention mechanisms in computer vision
- Learn about deep learning paradigms in action recognition
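The core idea in the abstract, a person-specific query vector attending over spatiotemporal context features, can be sketched with scaled dot-product attention. This is a minimal NumPy illustration, not the paper's implementation; the shapes and the `person_context_attention` helper are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def person_context_attention(query, context):
    """Aggregate spatiotemporal context for one person (illustrative sketch).

    query:   (d,)   person-specific feature (e.g. pooled from the person's box)
    context: (n, d) features from n space-time locations in the clip
    Returns the attention-weighted context feature and the weights.
    """
    d = query.shape[0]
    scores = context @ query / np.sqrt(d)   # (n,) affinity of each location to the person
    weights = softmax(scores)               # distribution over space-time locations
    return weights @ context, weights       # (d,) aggregated context, (n,) weights

# Toy example: 50 space-time positions with 64-dim features.
rng = np.random.default_rng(0)
query = rng.standard_normal(64)
context = rng.standard_normal((50, 64))
out, w = person_context_attention(query, context)
```

Because the query is derived from the person rather than from any action class, the weights `w` can concentrate on wherever that person (or relevant others) appear across frames, which is the behavior the paper reports emerging without tracking supervision.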