ARTICLE Attention is all you need: Discovering the Transformer paper

Covers: implementation of Transformers
Estimated time needed: 60 minutes
Questions this item addresses:
  • What is another example implementation of the Attention Is All You Need paper?
How to use this item?

Try out the coding implementation in this article.
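If you want a quick feel for the core operation the article implements before diving in, here is a minimal sketch of scaled dot-product attention in plain NumPy. It is an illustration only, assuming a single attention head; the function and variable names are ours and are not taken from the article's code, which builds the full model.

import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # block masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                               # (..., seq_q, d_v)

# Toy usage: one sequence of 4 tokens, model dimension 8
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(1, 4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (1, 4, 8)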

Author(s) / creator(s) / reference(s)
Eduardo Muñoz
Shortlist

Learning Transformers by Creating Transformers

Yan Nusinovich
Total time needed: ~6 hours
Learning Objectives
Learn how to create the original transformers from the Attention Is All You Need paper
Potential Use Cases
This shortlist is for educational use; it will help you better understand how later transformer-based models like BERT and GPT-2 work
Target Audience
BEGINNER: Data scientists learning NLP
Go through the following annotated items in order:
VIDEO 1. Attention Is All You Need
  • What is an introductory overview of Attention Is All You Need in video format?
30 minutes
ARTICLE 2. Transformer — Attention is all you need
  • What is an accessible overview of Attention Is All You Need in article form?
15 minutes
ARTICLE 3. The Illustrated Transformer
  • What is a more thorough explanation of the theory behind Transformers?
20 minutes
ARTICLE 4. TRANSFORMERS FROM SCRATCH
  • How do I start to implement and understand the code for transformers?
60 minutes
ARTICLE 5. Attention is all you need: Discovering the Transformer paper
  • What is another example implementation of the Attention Is All You Need paper?
60 minutes
ARTICLE 6. Pytorch Transformers from Scratch (Attention is all you need)
  • What is one more implementation of the Attention Is All You Need paper?
60 minutes
ARTICLE 7. The Annotated Transformer
  • What is a more thorough explanation of the theory and implementation of transformers?
60 minutes

Concepts Covered