Perceivers: General Models For Any Data

Total time needed: ~3 hours
Understand how Perceivers, built on a Transformer-like architecture, can generalize across multi-domain applications while addressing the quadratic complexity bottleneck of self-attention.
Potential Use Cases
Multi-domain applications that need to understand and handle many data types at once. Perceivers can also potentially replace state-of-the-art Transformers (e.g. BERT) and ViT models, as no preprocessing (e.g. tokenization) is needed.
Who Is This For?
ADVANCED Machine Learning Scientists willing to experiment with pre-trained multi-domain models.
VIDEO 1. Self-attention: Whiteboard video series
  • Why might we need to transform signals in sequences?
  • Can interactions between signals in a sequence be useful?
  • What is the high-level idea of self-attention?
  • What are Keys, Queries, and Values, and how do they interact when processing information in the self-attention architecture?
  • What is the scheme of self-attention in a neural network?
  • What does multi-head self-attention look like?
  • How can information be processed through multi-head self-attention while still producing an output with the same dimension as the input?
45 minutes
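As a companion to the last two questions above, here is a minimal NumPy sketch of multi-head self-attention. It is an illustration under stated assumptions, not the implementation used in the videos: the function name, the random weight matrices, and the sizes (5 tokens, model dimension 8, 2 heads) are all made up for the example. The point it shows is that the model dimension is split across heads and then concatenated again, so the output has the same shape as the input.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical helper: x is (seq_len, d_model); all weights are (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads  # assumes d_model is divisible by num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Each head compares every query with every key: (num_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, seq_len, d_head)
    # Concatenate the heads back to (seq_len, d_model), then mix them with w_o.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 8, 2  # illustrative sizes
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(x.shape, out.shape)  # both are (seq_len, d_model)
```

Note that the attention weights have shape (num_heads, seq_len, seq_len): the cost grows with the square of the sequence length, which is the bottleneck the later items in this path return to.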
VIDEO 2. Transformer Encoder: Whiteboard video
  • What is the role of the self-attention mechanism in the Transformer architecture?
  • What are the main components of a Transformer?
  • What are the role and characteristics of the Transformer Encoder?
  • What is Positional Encoding, and which important limitation of the attention mechanism does it address?
  • What is the advantage of multi-head self-attention over a recurrent neural network (RNN)?
20 minutes
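For the Positional Encoding question above, the following sketch may help. It assumes the sinusoidal scheme from the original Transformer paper, which may or may not be the variant shown in the video; the function name and sizes are illustrative. Attention on its own treats the input as an unordered set, so a position-dependent vector is added to each token embedding to make order visible to the model.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Hypothetical helper; assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even feature indices use sine
    pe[:, 1::2] = np.cos(angles)  # odd feature indices use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=6, d_model=8)

# Two identical token embeddings at different positions become distinguishable
# once the positional encoding is added.
token = np.ones(8)
first, third = token + pe[0], token + pe[2]
print(np.allclose(first, third))
```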
ARTICLE 3. Paper summary: “Perceiver: General Perception with Iterative Attention”
  • Why use the Transformer architecture for tasks beyond NLP?
  • What are the obstacles to using Transformers outside the NLP domain?
  • What is quadratic complexity, and how does the Perceiver tackle it?
  • You may know self-attention, but what is cross-attention and why might it be useful?
40 minutes
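The quadratic-complexity and cross-attention questions above can be made concrete with a small sketch. This is a simplified single-head illustration, not DeepMind's implementation: the function name, the random weights, and the sizes (1000 input elements, 32 latents) are assumptions chosen for the example. Self-attention over M inputs needs an M x M score matrix, whereas cross-attention from N latent vectors to M inputs needs only N x M scores, which is linear in M when N is fixed.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, w_q, w_k, w_v):
    """Hypothetical helper: queries come from the latents, keys and values from the inputs."""
    q = latents @ w_q  # (num_latents, d_attn)
    k = inputs @ w_k   # (num_inputs, d_attn)
    v = inputs @ w_v   # (num_inputs, d_attn)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (num_latents, num_inputs)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
num_inputs, input_dim = 1000, 16    # e.g. flattened pixels or audio samples
num_latents, latent_dim = 32, 24    # small latent array, size chosen for illustration
d_attn = 8

inputs = rng.normal(size=(num_inputs, input_dim))
latents = rng.normal(size=(num_latents, latent_dim))
w_q = rng.normal(size=(latent_dim, d_attn))
w_k = rng.normal(size=(input_dim, d_attn))
w_v = rng.normal(size=(input_dim, d_attn))

out, weights = cross_attention(latents, inputs, w_q, w_k, w_v)

self_attention_scores = num_inputs * num_inputs  # scores full self-attention would need
cross_attention_scores = weights.size            # scores the latent bottleneck needs
print(self_attention_scores, cross_attention_scores)
```

After this step the inputs have been summarized into the latent array, so any further self-attention layers operate on 32 vectors rather than 1000.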
ARTICLE 4. Perceiver and Perceiver IO work as multi-purpose tools for AI
  • How can an architecture be designed for inputs and outputs of arbitrary size and semantics?
  • What is the role of Transformers and self-attention in domain-agnostic architectures?
  • How does Perceiver IO build on and improve its predecessor, the Perceiver?
40 minutes
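One way to picture the "outputs of arbitrary size" question above is the output-query decoding described for Perceiver IO: a set of output queries attends to the latent array, and the number of queries determines the number of outputs. The sketch below is a simplified single-head illustration with made-up names, random weights, and arbitrary sizes; it is not the library's API.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode(output_queries, latents, w_q, w_k, w_v):
    """Hypothetical helper: each output query reads from the shared latent array."""
    q = output_queries @ w_q  # (num_outputs, out_dim)
    k = latents @ w_k         # (num_latents, out_dim)
    v = latents @ w_v         # (num_latents, out_dim)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v        # (num_outputs, out_dim)

rng = np.random.default_rng(0)
num_latents, latent_dim, query_dim, out_dim = 32, 24, 12, 10  # illustrative sizes
latents = rng.normal(size=(num_latents, latent_dim))
w_q = rng.normal(size=(query_dim, out_dim))
w_k = rng.normal(size=(latent_dim, out_dim))
w_v = rng.normal(size=(latent_dim, out_dim))

# The same latents can serve one query (e.g. a single class label) ...
single_output = decode(rng.normal(size=(1, query_dim)), latents, w_q, w_k, w_v)
# ... or thousands of queries (e.g. one per pixel of a 64 x 64 map).
dense_output = decode(rng.normal(size=(64 * 64, query_dim)), latents, w_q, w_k, w_v)
print(single_output.shape, dense_output.shape)
```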
REPO 5. DeepMind Perceivers: GitHub Repository
  • What are the key differences between Perceiver and its successor Perceiver IO?
  • How to use pretrained Perceiver IO models in a Colab notebook?
  • How to use the training scripts?
15 minutes
OTHER 6. Perceivers: Performance Discussion on Reddit
  • What do ML practitioners think about the Perceiver's performance on lower volumes of training data?
5 minutes


