Covers: theory of Transformers
Estimated time needed to finish: 20 minutes
Questions this item addresses:
  • What is the role of the self-attention mechanism in the Transformer architecture?
  • What are the main components of a Transformer?
  • What are the role and characteristics of the Transformer Encoder?
  • What is Positional Encoding, and what important limitation of the attention mechanism does it solve?
  • What is the advantage of multi-head self-attention over a recurrent neural network (RNN)?
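The first question, the self-attention mechanism, can be sketched as scaled dot-product attention in a few lines of NumPy. This is a minimal illustration only; the matrices, shapes, and random inputs below are made up for the example, not taken from the video:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns the attended values and the attention-weight matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    # Row-wise softmax (numerically stabilized)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w                                  # (seq_len, d_k), weights

# Illustrative usage: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a weighted mixture of all value vectors, which is exactly the "every token looks at every other token" behavior the questions above ask about.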
How to use this item?

This is whiteboard work by Rasa. He elaborates on and stitches together the self-attention concept, well explained in the Self-attention: Whiteboard video series asset. Everything is an explanation of the famous Attention Is All You Need paper. However, Rasa does not explain the Decoder part in detail, because the original work's application focuses explicitly on the language-translation task (there is a masked output, i.e. mask and embedding components!). The advantage of such an explanation is that it does not "constrain" the audience to think about the encoder in this specific way. If you want to learn about this specific application in the above-mentioned paper (and positional encoding), look here.
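On the positional encoding mentioned above: attention by itself is permutation-invariant, so the paper injects sinusoidal position signals into the embeddings. A minimal sketch of the formula from the paper (sequence length and dimension here are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need":
       PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model // 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dims: sine
    pe[:, 1::2] = np.cos(angles)               # odd dims: cosine
    return pe

pe = positional_encoding(10, 16)
print(pe.shape)  # (10, 16)
```

Adding `pe` to the token embeddings gives the otherwise order-blind attention layers a way to distinguish positions.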

Develop an understanding of the Transformer Encoder

  • Screen Shot 2021-09-24 at 10.48.14 AM (3).png
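One encoder layer can be sketched as self-attention plus a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. This is a simplified sketch: `uniform_attention` is a placeholder standing in for real learned self-attention, and all weight shapes are illustrative:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def uniform_attention(X):
    # Placeholder for self-attention: every position attends uniformly
    n = X.shape[0]
    return np.full((n, n), 1.0 / n) @ X

def encoder_layer(X, W1, b1, W2, b2):
    """One Transformer encoder layer (post-norm variant):
    attention sub-layer, then position-wise feed-forward sub-layer,
    each with a residual connection followed by LayerNorm."""
    a = layer_norm(X + uniform_attention(X))
    ff = np.maximum(0, a @ W1 + b1) @ W2 + b2    # ReLU feed-forward
    return layer_norm(a + ff)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 tokens, d_model = 8
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)  # inner dim 32 (illustrative)
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)
out = encoder_layer(X, W1, b1, W2, b2)
print(out.shape)  # (4, 8)
```

The key structural point for the schema: the layer maps a (seq_len, d_model) input to an output of the same shape, so encoder layers can be stacked.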

Play the video from 11:30 and understand this schema about "What is the advantage of multi-head self-attention over an RNN?":

  • RNN vs Multi-head self-attention
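The schema's point is that an RNN must consume tokens one sequential step at a time, while multi-head self-attention relates all positions in one parallel matrix operation, with each head learning its own attention pattern. A minimal multi-head sketch (names and shapes are illustrative, not from the video):

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head self-attention: all positions and all heads are
    computed at once (no step-by-step recurrence as in an RNN).
    No masking or dropout in this sketch."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    def split(M):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(Q), split(K), split(V)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax per head
    heads = w @ V                                    # (n_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, d_model = 8
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads=2)
print(out.shape)  # (4, 8)
```

Because nothing here depends on the output of a previous time step, the whole sequence is processed in parallel; an RNN would need seq_len sequential updates of its hidden state.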