Covers: theory of Transformers
Estimated time needed to finish: 25 minutes
Questions this item addresses:
  • What are transformers?
How to use this item?

Read Sections 3, 4, 5, and 7

Author(s) / creator(s) / reference(s)
Ashish Vaswani et al.
Andrew Berry.
Wrong URL link to the paper. Needs to be fixed!
Sharvari Dhote.
Thank you.


Total time needed: ~2 hours
Understanding the characteristics and capabilities of GPT-3 and how it differs from previous transformer-based language models.
Potential Use Cases
Natural language generation, summarization, question answering, classification
Who is This For?
Click on each of the following annotated items to see details.
PAPER 1. Understanding the GPT-3
  • What is GPT-3?
  • How is GPT-3 different from previous transformer-based architectures?
  • How does GPT-3 use few-shot and zero-shot learning to eliminate fine-tuning and the need for large task-specific datasets?
25 minutes
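The few-shot idea in the paper can be pictured with a minimal sketch: instead of fine-tuning, the task is demonstrated by prepending a handful of input/output examples to the query, and the model is asked to complete the pattern. The sentiment task, example strings, and prompt format below are illustrative assumptions, not taken from the paper.

```python
def build_few_shot_prompt(demonstrations, query):
    """Concatenate (input, output) demonstrations and a final query.

    No gradient updates happen: the task is specified entirely
    in-context, which is the core of GPT-3-style few-shot learning.
    """
    lines = []
    for text, label in demonstrations:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The model would be asked to complete the text after "Sentiment:".
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

# Hypothetical demonstrations for a sentiment-classification task.
demos = [
    ("A wonderful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_few_shot_prompt(demos, "I loved every minute of it.")
print(prompt)
```

With zero demonstrations the same function produces a zero-shot prompt; the paper's distinction between zero-, one-, and few-shot is simply the number of in-context examples.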
PAPER 2. Transformers
  • What are transformers?
25 minutes
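The core operation introduced in the transformers paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A plain-Python sketch for tiny matrices (a real model would use a tensor library; the toy inputs are made up):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs (toy numbers).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Because the weights come from a softmax they sum to 1, so each output row is a convex combination of the value vectors, weighted by how well the query matches each key.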
ARTICLE 3. Introduction to Language Modelling
  • What is Language Modelling?
23 minutes
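A language model assigns probabilities to word sequences. The simplest count-based version, a bigram model with P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}), can be sketched in a few lines (the toy corpus is an illustrative assumption):

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real language models train on billions of tokens.
corpus = "the cat sat on the mat the cat ran".split()

# Count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_probs(word):
    """Maximum-likelihood estimate of P(next word | word)."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # "cat" is the most likely continuation
```

Neural language models such as GPT-3 do the same job, predicting a distribution over the next token, but condition on a long context window instead of a single preceding word.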
PAPER 4. Few Shot Learning
  • What is few-shot learning?
10 minutes
ARTICLE 5. Approaches and Applications of Few Shot Learning
  • Where is few-shot learning used?
15 minutes

Concepts Covered

Amir Feizpour.
What is the best resource for a GPT-3 implementation?
Amir Feizpour.
Never mind, I found GPT-Neo.