Covers: theory of AdaGrad
Estimated time needed to finish: 30 minutes
Questions this item addresses:
  • Where does this optimizer come from?
How to use this item?

This is an optional module for students who want to learn more about AdaGrad. It is the original paper that introduced this concept.



Total time needed: ~2 hours
Learn the theory behind AdaGrad as an optimizer and how to implement it in Python
Potential Use Cases
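AdaGrad is a gradient-based optimization algorithm that adapts the learning rate of each parameter individually. It is well-suited to problems with sparse data (for example, in NLP or image recognition), because parameters that are rarely updated keep larger step sizes.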
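The sparse-data point can be illustrated with a minimal NumPy sketch of the diagonal AdaGrad update. The helper `adagrad_step`, the learning rate of 0.1, and the two-feature toy gradients are all hypothetical choices for illustration. The small `eps` added to the denominator is a common numerical-stability term, not part of the idealized update. This is not the implementation from any particular library.

```python
import numpy as np


def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    """Apply one diagonal AdaGrad update and return (new_theta, new_accum).

    Hypothetical helper for illustration; eps is an assumed stability term.
    """
    # Accumulate squared gradients separately for each parameter.
    accum = accum + grad ** 2
    # Each parameter's step is scaled by 1 / sqrt(its own gradient history).
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum


# Toy setup (hypothetical): feature 0 is "dense" and receives a gradient on
# every step; feature 1 is "sparse" and only receives one on the last step.
theta = np.zeros(2)
accum = np.zeros(2)
steps = 10
for t in range(steps):
    grad = np.array([1.0, 1.0 if t == steps - 1 else 0.0])
    theta, accum = adagrad_step(theta, grad, accum)

# Per-feature step size used on the final update (ignoring eps).
effective_lr = 0.1 / np.sqrt(accum)
print(accum)
print(effective_lr)
```

After ten steps the dense feature has accumulated 10 while the sparse feature has accumulated 1, so the sparse feature's effective step size (0.1 / sqrt(1)) is about 3.16 times larger than the dense feature's (0.1 / sqrt(10)). The trade-off is that the accumulator only grows, so every parameter's step size shrinks monotonically over training.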
Who Is This For?
ARTICLE 1. Intro to mathematical optimization
  • What is mathematical optimization?
  • Why do we need to optimize a cost function in ML algorithms?
10 minutes
VIDEO 2. Gradient Descent
  • What is Gradient Descent (GD)?
  • How does GD work in Python?
10 minutes
ARTICLE 3. Learning Rate
  • What is learning rate?
  • How can I choose a better learning rate?
20 minutes
ARTICLE 4. AdaGrad: Introduction (No math!)
  • What is AdaGrad?
10 minutes
ARTICLE 5. Adaptive Gradient (AdaGrad): Introduction [With more advanced math concepts]
  • What is AdaGrad?
  • What is the math behind this optimizer?
30 minutes
ARTICLE 6. AdaGrad in Python
  • How do I implement AdaGrad in Python?
10 minutes
PAPER 7. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (optional)
  • Where does this optimizer come from?
30 minutes
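As a compact reference for the math covered in items 2 and 5, the two update rules can be written as below. This is a sketch of the commonly used diagonal form of AdaGrad, not a transcription of the paper in item 7 (which also describes a full-matrix variant). Here θ denotes the parameters, g_t the gradient of the loss at step t, η the learning rate, and ε a small constant that many implementations add for numerical stability.

```latex
\begin{aligned}
\text{Gradient descent:}\quad & \theta_{t+1} = \theta_t - \eta\, g_t \\[4pt]
\text{AdaGrad (per coordinate } i\text{):}\quad & G_{t,i} = \sum_{\tau=1}^{t} g_{\tau,i}^{2} \\
& \theta_{t+1,i} = \theta_{t,i} - \frac{\eta}{\sqrt{G_{t,i}} + \epsilon}\, g_{t,i}
\end{aligned}
```

The only difference from plain gradient descent is the per-coordinate denominator: coordinates with a large history of gradients take smaller steps, while rarely updated coordinates keep steps close to η.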

Concepts Covered
