This is the whiteboard work by Rasa. It elaborates and stitches together the self-attention concept well explained in the Self-attention: Whiteboard video series asset. Everything is an explanation of the famous Attention Is All You Need paper. However, Rasa does not explain the Decoder part in detail, because the original work's application explicitly focuses on the language-translation task (there is a masked output, i.e. a mask and embedding component!). The advantage of such an explanation is that it does not "constrain" the audience to think about the encoder in this specific way. If you want to learn about this specific application in the above-mentioned paper (and about positional encoding), look here.
Develop understanding of the Transformer Encoder
Play the video from 11:30 and understand this schema about "what is the advantage of multi-head self-attention over an RNN":
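To make the advantage concrete, here is a minimal NumPy sketch (my own illustration, not code from Rasa or the paper) of a single self-attention head: the whole sequence is transformed with a few matrix products, so all positions are processed in parallel, whereas an RNN must step through the sequence one position at a time. The weight matrices `Wq`, `Wk`, `Wv` are randomly initialized here purely for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the entire sequence at once: no recurrence, so every
    # position attends to every other position in one matrix product.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len)
    return softmax(scores) @ V               # (seq_len, d_head)

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))              # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention simply runs several such heads with independent weight matrices and concatenates their outputs, letting each head attend to different relationships in the sequence.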