Covers: theory of Self-Attention
Estimated time needed to finish: 45 minutes
Questions this item addresses:
  • Why might we need to transform signals in sequences?
  • Can interactions between signals in a sequence be useful?
  • What is the high-level idea of self-attention?
  • What are Keys, Queries, and Values, and how do they interact with the information being processed in the self-attention architecture? (a minimal sketch follows this list)
  • What is the scheme of self-attention in a neural network?
  • What does multi-head self-attention look like?
  • How can information be processed through multi-head self-attention and still produce an output with the same dimension as the input? (see the multi-head sketch at the end of this item)
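To make the Query/Key/Value interaction concrete, here is a minimal sketch of single-head self-attention in NumPy. It assumes an input sequence X of shape (seq_len, d_model) and illustrative projection matrices W_q, W_k, W_v; these names are chosen for the example and are not taken from the videos.

```python
# Minimal single-head self-attention sketch (illustrative names, NumPy only).
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Project the input into queries, keys, and values.
    Q = X @ W_q                            # (seq_len, d_k)
    K = X @ W_k                            # (seq_len, d_k)
    V = X @ W_v                            # (seq_len, d_v)

    d_k = Q.shape[-1]
    # Each query is compared against every key: the score says how much
    # one position should attend to another.
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len)

    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each output position is a weighted mixture of the value vectors.
    return weights @ V                     # (seq_len, d_v)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (4, 8)
```

The key point: every output row mixes information from all positions in the sequence, with the mixing weights determined by how well that position's query matches every key.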
How to use this item?

Watch Video 8: Attention - Self-Attention. Develop an understanding of the self-attention schema presented there (compare with the sketch above).

Watch Video 9: Attention: Keys, Values, Queries. Develop an understanding of the accompanying schema: self-attention with differentiable Q, K, V parameters.
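As an illustration of the "differentiable Q, K, V parameters" idea in this schema, here is a minimal PyTorch-style sketch in which the three projections are nn.Linear layers, so their weights are learned by backpropagation along with the rest of the network. The class and argument names (SelfAttention, d_model) are assumptions made for this example, not taken from the video.

```python
# Self-attention with learnable (differentiable) Q, K, V projections.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Each projection is an nn.Linear layer, so its weights are
        # trained by gradient descent together with the rest of the model.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        Q, K, V = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = F.softmax(scores, dim=-1)   # attention over positions
        return weights @ V                    # (batch, seq_len, d_model)

# Usage: SelfAttention(d_model=8)(torch.randn(2, 4, 8)).shape -> (2, 4, 8)
```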

Develop an understanding of this schema: Self-attention NN.
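Finally, here is a minimal sketch of multi-head self-attention, showing how the output keeps the input dimension: d_model is split evenly across the heads, the per-head outputs are concatenated back to d_model, and an output projection W_o mixes them. All parameter names here are illustrative assumptions.

```python
# Multi-head self-attention sketch: output dimension equals input dimension.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads      # each head works in a smaller subspace

    # One big projection per role, then split into heads.
    Q = (X @ W_q).reshape(seq_len, num_heads, d_head)
    K = (X @ W_k).reshape(seq_len, num_heads, d_head)
    V = (X @ W_v).reshape(seq_len, num_heads, d_head)

    heads = []
    for h in range(num_heads):
        scores = Q[:, h] @ K[:, h].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, h])      # (seq_len, d_head)

    # Concatenating the heads restores d_model; W_o mixes them so the
    # final output has exactly the same dimension as the input.
    concat = np.concatenate(heads, axis=-1)          # (seq_len, d_model)
    return concat @ W_o                              # (seq_len, d_model)

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads).shape)  # (4, 8)
```

Because each head only produces d_model / num_heads features, concatenation plus the output projection brings the result back to the input dimension, which is what lets self-attention layers be stacked.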