Covers: theory of domain-agnostic AI
Estimated time needed to finish: 40 minutes
Questions this item addresses:
  • Why use the Transformer architecture for tasks beyond NLP?
  • What are the obstacles to using Transformers outside the NLP domain?
  • What is quadratic complexity, and how does the Perceiver tackle it?
  • You may know self-attention, but what is cross-attention and why might it be useful?
How to use this item?

This is a short summary of the first Perceiver, i.e. Perceiver: General Perception with Iterative Attention by DeepMind. Read the article to get a high-level overview of the Perceiver.

To dig deeper, it helps when experienced researchers interpret complex new work: Yannic Kilcher breaks down the paper in this video.

To follow the video and brush up on Transformers at the same time (so you don't get lost), you can always refer back to blog posts on the Transformer architecture. A back-of-the-envelope comparison of the attention costs involved is also sketched below.

Author(s) / creator(s) / reference(s)
Chul Hyun (Chadrick Kwag) Kwag
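For a concrete feel of the quadratic bottleneck raised in the questions above, here is a minimal back-of-the-envelope sketch. The sequence lengths and latent size are illustrative assumptions, not the exact values from the paper's experiments:

# Back-of-the-envelope cost of full self-attention vs. Perceiver-style cross-attention.
def attention_matrix_entries(m: int, n: int) -> int:
    """Number of entries in an (m x n) attention score matrix."""
    return m * n

text_tokens = 512            # a typical BERT-style input length
image_pixels = 224 * 224     # a 224x224 image treated as a raw pixel sequence (50,176 positions)
latents = 512                # size of the Perceiver's learned latent array (illustrative choice)

# Vanilla self-attention scales quadratically with the input length M: O(M^2).
print(attention_matrix_entries(text_tokens, text_tokens))      # 262,144
print(attention_matrix_entries(image_pixels, image_pixels))    # ~2.5 billion -- impractical

# Perceiver cross-attention lets N latents attend to M inputs: O(M * N), linear in M.
print(attention_matrix_entries(latents, image_pixels))         # ~25.7 million -- tractable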
Recipe

Perceivers: General Models For Any Data

Contributors
Total time needed: ~3 hours
Objectives
Understand how Perceivers, built on a Transformer-like architecture, can generalize to multi-domain applications while solving the quadratic bottleneck.
Potential Use Cases
Multi-domain applications and understanding/handling many data types at once. Perceivers can also potentially replace state-of-the-art Transformer models (e.g. BERT) and ViTs, since no domain-specific preprocessing (e.g. tokenization) is needed.
Who is This For?
ADVANCED: Machine Learning Scientists willing to experiment with pre-trained multi-domain models.
Resources
VIDEO 1. Self-attention: Whiteboard video series
  • Why might we need to transform signals in sequences?
  • Can interactions between signals in a sequence be useful?
  • What is the high-level idea of self-attention?
  • What are Keys, Queries and Values, and how do they interact with the information being processed in the self-attention architecture? (See the sketch after this item.)
  • What is the scheme of self-attention in a neural network?
  • What does multi-head self-attention look like?
  • How can information be processed through multi-head self-attention and still end up with an output of the same dimension as the input?
45 minutes
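To make the Key/Query/Value mechanics above concrete, here is a minimal NumPy sketch of (multi-head) self-attention. The shapes and random weights are illustrative, not a re-implementation of any particular model:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (seq_len, seq_len) similarity matrix
    weights = softmax(scores, axis=-1)           # each position attends to every position
    return weights @ v                           # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 6, 16, 4
d_head = d_model // n_heads
x = rng.normal(size=(seq_len, d_model))

# Multi-head: run n_heads independent attentions on smaller d_head projections,
# then concatenate and project back, so the output keeps the input dimension d_model.
heads = []
for _ in range(n_heads):
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    heads.append(self_attention(x, w_q, w_k, w_v))
w_o = rng.normal(size=(d_model, d_model))
out = np.concatenate(heads, axis=-1) @ w_o       # (seq_len, d_model), same shape as x
print(out.shape)                                 # (6, 16)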
VIDEO 2. Transformer Encoder: Whiteboard video
  • What is the role of the self-attention mechanism in the Transformer architecture?
  • What are the main components of a Transformer?
  • What are the role and characteristics of the Transformer Encoder?
  • What is positional encoding, and which important limitation of the attention mechanism does it address? (See the sketch after this item.)
  • What is the advantage of multi-head self-attention over a recurrent neural network (RNN)?
20 minutes
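As a companion to the positional-encoding question above, here is a minimal sketch of the fixed sinusoidal encoding from the original Transformer paper (the sequence length and model dimension are arbitrary example values):

import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encodings, added to token embeddings because
    attention by itself is permutation-invariant and ignores token order."""
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # even embedding dimensions
    angles = positions / np.power(10000.0, dims / d_model)    # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                              # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)                              # cosine on odd dimensions
    return pe

# Added to the embeddings before the first encoder layer:
#   x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(sinusoidal_positional_encoding(seq_len=10, d_model=16).shape)   # (10, 16)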
ARTICLE 3. Paper summary: “Perceiver: General Perception with Iterative Attention”
  • Why use the Transformer architecture for tasks beyond NLP?
  • What are the obstacles to using Transformers outside the NLP domain?
  • What is quadratic complexity, and how does the Perceiver tackle it?
  • You may know self-attention, but what is cross-attention and why might it be useful? (See the sketch after this item.)
40 minutes
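The core trick covered in the paper summary, cross-attention from a small learned latent array to a large input array, can be sketched in a few lines of NumPy. The sizes below are illustrative; in the real model the latents are learned and this block is applied iteratively, interleaved with self-attention in the latent space:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, w_q, w_k, w_v):
    """Latents (N, d) attend to inputs (M, d): queries come from the latents,
    keys and values from the input array, so the score matrix is (N, M), not (M, M)."""
    q, k, v = latents @ w_q, inputs @ w_k, inputs @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (N, M): linear in the input size M
    return softmax(scores, axis=-1) @ v          # (N, d): input distilled into the latents

rng = np.random.default_rng(0)
M, N, d = 224 * 224, 128, 32                     # e.g. a flattened 224x224 image; N and d are illustrative
inputs = rng.normal(size=(M, d))
latents = rng.normal(size=(N, d))                # a learned latent array in the real model
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out = cross_attention(latents, inputs, w_q, w_k, w_v)
print(out.shape)                                 # (128, 32): cheap self-attention then runs in this small latent space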
ARTICLE 4. Perceiver and Perceiver IO work as multi-purpose tools for AI
  • How to design an architecture for inputs and outputs of arbitrary size and semantics?
  • What is the role of Transformers and self-attention in domain-agnostic architectures?
  • How does Perceiver IO extend and improve on its predecessor, the Perceiver? (See the decoding sketch after this item.)
40 minutes
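Perceiver IO's main addition, decoding outputs of arbitrary size by letting output queries cross-attend to the latent array, can be sketched similarly. The random queries here are purely illustrative; in the real model they are learned or constructed from output positions and metadata:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
N, d = 128, 32            # latent array produced by the Perceiver encoder
O = 1_000                 # desired number of output elements, e.g. one per class, pixel, or token

latents = rng.normal(size=(N, d))
output_queries = rng.normal(size=(O, d))         # one query per output element (illustrative)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

# Decoder cross-attention: output queries attend to the latents, so the output size O
# is decoupled from both the input size M and the latent size N.
q, k, v = output_queries @ w_q, latents @ w_k, latents @ w_v
scores = q @ k.T / np.sqrt(d)                    # (O, N)
outputs = softmax(scores, axis=-1) @ v           # (O, d): outputs of arbitrary size and semantics
print(outputs.shape)                             # (1000, 32)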
REPO 5. DeepMind Perceivers: GitHub Repository
  • What are the key differences between Perceiver and its successor Perceiver IO?
  • How to use pretrained Perceiver IO models in a Colab notebook?
  • How to use the training scripts?
15 minutes
OTHER 6. Perceivers: Performance Discussion on Reddit
  • What do ML practitioners think about the Perceiver's performance on lower volumes of training data?
5 minutes

Concepts Covered
