Understand Self Attention

Total time needed: ~37 minutes
With this shortlist you will understand how the self-attention mechanism relates different positions of a single sequence to compute a representation of that sequence.
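Before diving into the resources, it may help to see the core idea in miniature. The sketch below is an illustrative simplification (not code from any of the listed resources): each position's vector is compared to every other position via dot products, the scores are normalized with a softmax, and each output is a weighted mix of all positions. The function names and the toy input are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Minimal self-attention: every position attends to every other
    position in the same sequence, weighted by dot-product similarity.
    (Real transformers also learn query/key/value projections.)"""
    scores = X @ X.T                    # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # weighted mix of all positions

# A toy "sequence" of 3 positions, each a 4-dimensional vector.
X = np.random.randn(3, 4)
out = self_attention(X)
print(out.shape)  # (3, 4): one contextualized vector per position
```

Each row of `weights` says how much that position "looks at" every position in the sequence, which is exactly the relating-of-positions described above.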
Potential Use Cases
Help Google better discern the context of words in search queries.
Who Is This For?
BEGINNER: Beginners looking to understand the Attention Is All You Need research paper.
VIDEO 1. Rasa Algorithm Whiteboard - Attention 1: Self Attention
  • What are attention mechanisms?
  • How does time series data get used with self-attention?
14 minutes
ARTICLE 2. The Illustrated Transformer
  • What does a high-level view of a transformer look like?
  • What makes up an encoding component?
  • What makes up a decoding component?
8 minutes
VIDEO 3. The Narrated Transformer Language Model
  • How can you assign meaning to numbers via embeddings?
  • What are token embeddings?
  • How does the last hidden state predict the last word?
  • What does a softmax function do?
15 minutes
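The last video asks what a softmax function does. As a hedged preview: softmax turns a vector of raw scores (e.g., a language model's scores over its vocabulary) into a probability distribution. The scores below are made up for illustration.

```python
import numpy as np

def softmax(scores):
    # Shift by the max so large scores don't overflow exp().
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores over three words
probs = softmax(logits)
print(probs)        # all positive, largest score gets the largest probability
print(probs.sum())  # sums to 1, so it can be read as a distribution
```

This is how the last hidden state's scores become a prediction: the word with the highest probability is the model's best guess for the next word.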
