Understand Self Attention

Total time needed: ~37 minutes
Learning Objectives
With this shortlist, you will understand how the self-attention mechanism relates different positions of a single sequence to compute a representation of that sequence.
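As a rough preview of what "relating different positions of a single sequence" can mean in practice, here is a minimal sketch in plain Python. It makes simplifying assumptions that the resources below do not: each token vector serves as its own query, key, and value (no learned projection matrices), and the three token vectors are made-up toy data. The names `softmax`, `self_attention`, and `tokens` are illustrative only.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Simplified self-attention over a sequence of vectors.

    Assumption: queries, keys, and values are the input vectors
    themselves, with no learned projection matrices.
    """
    d = len(x[0])
    output, weights = [], []
    for query in x:
        # Compare this position with every position in the same sequence
        # using a dot product scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in x
        ]
        w = softmax(scores)
        # New representation: weighted average of all positions' vectors.
        output.append([sum(wi * v[j] for wi, v in zip(w, x)) for j in range(d)])
        weights.append(w)
    return output, weights


# Hypothetical toy sequence of three 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens)
```

Each row of `weights` says how much one position attends to every position in the sequence, and each row of `output` is the resulting blended representation for that position. A full transformer layer would additionally learn separate query, key, and value projections, which this sketch leaves out.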
Potential Use Cases
Help Google better discern the context of words in search queries.
Target Audience
BEGINNER Beginners looking to understand the "Attention Is All You Need" research paper.
Go through the following annotated items in order:
VIDEO 1. Rasa Algorithm Whiteboard - Attention 1: Self Attention
  • What are attention mechanisms?
  • How is time series data used with self-attention?
14 minutes
ARTICLE 2. The Illustrated Transformer
  • What does a transformer look like at a high level?
  • What makes up an encoding component?
  • What makes up a decoding component?
8 minutes
VIDEO 3. The Narrated Transformer Language Model
  • How can you assign meaning to numbers via embeddings?
  • What are token embeddings?
  • How does the last hidden state predict the last word?
  • What does a softmax function do?
15 minutes

Concepts Covered