[Action Recognition] Video Action Transformer Network

Time: Thursday 9-Jul-2020 23:30 (This is a past event.)

Motivation / Abstract
The Action Transformer model recognizes and localizes human actions in video clips. It repurposes a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions are being classified. The paper shows that by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others.
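
To make the aggregation idea concrete, here is a minimal sketch of single-head scaled dot-product attention in which one person-specific query attends over every location of a spatiotemporal feature map. It is an illustration under stated assumptions, not the paper's implementation: the toy sizes (T, H, W, D), the random features, and the helper names softmax and attend are all hypothetical, and the learned projections, multiple heads, and stacked layers of a real Transformer are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one person query.

    query:  list of D floats (stand-in for a person-specific feature)
    keys:   N lists of D floats, one per spatiotemporal location
    values: N lists of D floats, one per spatiotemporal location
    Returns (context, weights): a D-dim weighted sum of the values
    and the N attention weights used to form it.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(d)
    ]
    return context, weights


# Hypothetical toy sizes: T frames, an H x W grid, D channels.
T, H, W, D = 4, 3, 3, 8
rng = random.Random(0)
n_locations = T * H * W  # the clip's feature map, flattened

# Random stand-ins for features that a video backbone would produce.
keys = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(n_locations)]
values = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(n_locations)]
person_query = [rng.gauss(0, 1) for _ in range(D)]

context, weights = attend(person_query, keys, values)
```

The weights list has one entry per (frame, row, column) location, so viewing it as a T x H x W grid would show which parts of the clip this query draws on. The context vector is the aggregated feature that a classifier head could consume.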

Questions Discussed
- Learn about the Action Transformer Architecture
- Learn about Action Detection in videos
- Learn about attention mechanisms in Computer Vision
- Learn about Deep Learning paradigms in Action Recognition
