Past Recording
Video Action Transformer Network
Thursday Jul 9 2020 23:30 GMT
Why This Is Interesting

The Action Transformer model recognizes and localizes human actions in video clips. It repurposes a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions are being classified. The paper shows that, given high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others.
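To make the idea of a person-specific query aggregating context more concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is an illustration only, not the paper's implementation: the function names (`softmax`, `person_attention`), the toy vectors, and the omission of learned projections, multiple heads, and stacked layers are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def person_attention(person_query, context_keys, context_values):
    """Attend from one person's query vector to all context locations.

    person_query:   feature vector for the person being classified
                    (hypothetical stand-in for a pooled person feature).
    context_keys:   one key vector per spatiotemporal location in the clip.
    context_values: one value vector per spatiotemporal location.
    Returns the attention-weighted sum of values and the weights themselves.
    """
    dim = len(person_query)
    # Similarity between the person query and every context location,
    # scaled by sqrt(dim) as in standard Transformer attention.
    scores = [
        sum(q * k for q, k in zip(person_query, key)) / math.sqrt(dim)
        for key in context_keys
    ]
    weights = softmax(scores)
    value_dim = len(context_values[0])
    # Weighted sum of the value vectors: the aggregated context feature.
    attended = [
        sum(w * value[j] for w, value in zip(weights, context_values))
        for j in range(value_dim)
    ]
    return attended, weights


# Toy example: three context locations with 2-D features (made-up numbers).
person_query = [1.0, 0.0]
context_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

attended, weights = person_attention(person_query, context_keys, context_values)
```

In this toy setup the first context location receives the largest weight because its key is most similar to the query. A full model would compute the query, keys, and values with learned layers and feed the aggregated feature to classification and box-regression heads.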

Discussion Points
  • Learn about the Action Transformer architecture
  • Learn about action detection in videos
  • Learn about attention mechanisms in computer vision
  • Learn about deep learning paradigms in action recognition
