Understand how Perceivers, built on a Transformer-like architecture, can generalize to multi-domain applications while solving the quadratic attention bottleneck.
Potential Use Cases
Multi-domain applications that handle many data types at once. Perceivers can also potentially replace state-of-the-art Transformer models (e.g. BERT) and ViTs, since no modality-specific preprocessing (e.g. tokenization) is needed.
Who Is This For?
ADVANCED: Machine Learning Scientists willing to experiment with pre-trained multi-domain models.
Click on each of the following annotated items to see details.
VIDEO 1. Self-attention: Whiteboard video series
Why might we need to represent signals as sequences?
How can interactions between signals in a sequence be useful?
What is the high-level idea of self-attention?
What are keys, queries, and values, and how do they interact to process information in the self-attention architecture?
How does self-attention fit into a neural network?
What does multi-head self-attention look like?
How can information pass through multi-head self-attention while the output keeps the input dimension?
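The questions above can be made concrete with a minimal sketch of multi-head self-attention. This is an illustrative toy (random weights stand in for learned parameters, no masking or batching), not the reference implementation from the video; it shows how heads are split, attended, and concatenated so the output keeps the input dimension.

```python
import numpy as np

def multi_head_self_attention(x, num_heads, rng):
    """Toy multi-head self-attention: x has shape (seq_len, d_model)
    and the returned array has the same shape."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    # Random projections stand in for learned weight matrices.
    w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                          for _ in range(4))
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Split d_model into num_heads heads of size d_head.
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention per head: softmax(Q K^T / sqrt(d_head)) V
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    heads = weights @ v                             # (num_heads, seq_len, d_head)

    # Concatenate heads back to width d_model and project, so the
    # output dimension matches the input dimension.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 16))                    # 6 tokens, model width 16
y = multi_head_self_attention(x, num_heads=4, rng=rng)
print(y.shape)                                      # (6, 16), same as the input
```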
VIDEO 2. Transformer Encoder: Whiteboard video
What is the role of the self-attention mechanism in the Transformer architecture?
What are the main components of a Transformer?
What are the role and characteristics of the Transformer encoder?
What is positional encoding, and which important limitation of the attention mechanism does it solve?
What is the advantage of multi-head self-attention over a recurrent neural network (RNN)?
ARTICLE 3. Paper summary: “Perceiver: General Perception with Iterative Attention”
Why use the Transformer architecture beyond NLP tasks?
What are the obstacles to using Transformers outside the NLP domain?
What is quadratic complexity, and how does the Perceiver tackle it?
You may know self-attention, but what is cross-attention and why is it useful?
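To preview the cross-attention and quadratic-complexity questions: in the Perceiver, a small learned latent array queries a potentially huge input array, so the attention cost scales with N × M rather than M² over the raw input. Below is a stripped-down sketch (no learned projections, single head) illustrating only the shapes and the asymmetry between queries and keys/values.

```python
import numpy as np

def cross_attention(latents, inputs):
    """Cross-attention sketch: queries come from the small latent array,
    keys/values from the large input array (projections omitted)."""
    d = latents.shape[-1]
    scores = latents @ inputs.T / np.sqrt(d)          # (N_latent, M_input)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ inputs                           # (N_latent, d)

rng = np.random.default_rng(0)
inputs = rng.standard_normal((20_000, 32))            # e.g. 20k pixels or audio samples
latents = rng.standard_normal((256, 32))              # small latent array (N << M)

out = cross_attention(latents, inputs)
print(out.shape)                                      # (256, 32)
# Cost is O(N * M) instead of self-attention's O(M^2) on the raw input;
# subsequent self-attention runs only over the N latents, at O(N^2).
```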
ARTICLE 4. Perceiver and Perceiver IO work as multi-purpose tools for AI
How can an architecture be designed for inputs and outputs of arbitrary size and semantics?
What is the role of Transformers and self-attention in domain-agnostic architectures?
How does Perceiver IO extend and improve on its predecessor, the Perceiver?
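One way to see what Perceiver IO adds: besides encoding arbitrary-size inputs into a fixed latent array, it decodes via output queries that cross-attend the latents, so the output array can also have arbitrary size. The sketch below (unlearned projections, single attention step, made-up sizes) only illustrates this encode/decode shape flow, not the full architecture.

```python
import numpy as np

def attend(queries, keys_values):
    """Plain cross-attention for illustration (no learned projections)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ keys_values

rng = np.random.default_rng(0)
inputs = rng.standard_normal((10_000, 64))            # arbitrary-size input array
latents = rng.standard_normal((128, 64))              # fixed-size latent array
output_queries = rng.standard_normal((777, 64))       # arbitrary-size output spec

# Encode: latents cross-attend the inputs (as in the original Perceiver).
latents = attend(latents, inputs)
# Decode (the Perceiver IO addition): output queries cross-attend the
# latents, so the output size is set by the queries alone.
outputs = attend(output_queries, latents)
print(outputs.shape)                                  # (777, 64)
```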
REPO 5. DeepMind Perceivers: GitHub Repository
What are the key differences between Perceiver and its successor Perceiver IO?
How to use a pretrained Perceiver IO model in a Colab notebook?
How to use the training scripts?
OTHER 6. Perceivers: Performance Discussion on Reddit
What do ML practitioners think about the Perceiver's performance on smaller volumes of training data?