Covers: theory of AdaGrad
Questions this item addresses:
  • How to implement AdaGrad in Python
How to use this item?

In this module, Alex Smola demonstrates AdaGrad in Python using a Jupyter notebook. It helps you understand the different parameters of AdaGrad and how to implement it in Python (from 4:20 to the end). The first 4 minutes of the video cover momentum.
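A minimal from-scratch sketch of AdaGrad in NumPy, written for this page rather than taken from the video. The function name `adagrad`, the test function f(x) = 0.1·x1² + 2·x2², and the values of `lr` (the global learning rate η) and `eps` (a small constant ε that prevents division by zero) are illustrative assumptions.

```python
import numpy as np


def adagrad(grad_fn, x0, lr=0.5, eps=1e-6, num_steps=100):
    """Minimize a function with AdaGrad, given a function that returns its gradient.

    grad_fn:   callable returning the gradient at x (same shape as x)
    x0:        starting point
    lr:        global learning rate (eta)
    eps:       small constant that avoids division by zero
    """
    x = np.asarray(x0, dtype=float).copy()
    s = np.zeros_like(x)  # running sum of squared gradients, one entry per coordinate
    for _ in range(num_steps):
        g = grad_fn(x)
        s += g * g                        # accumulate squared gradients
        x -= lr * g / (np.sqrt(s) + eps)  # per-coordinate scaled step
    return x


# Illustrative ill-conditioned quadratic: f(x) = 0.1 * x1^2 + 2 * x2^2
grad_f = lambda x: np.array([0.2 * x[0], 4.0 * x[1]])
x_final = adagrad(grad_f, x0=[-5.0, -2.0], lr=0.4, num_steps=20)
print(x_final)
```

Because each coordinate is divided by its own accumulated gradient magnitude, the steep x2 direction and the shallow x1 direction both take a step of size `lr` on the first iteration, even though their gradients differ by a factor of 8.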



Total time needed: ~2 hours
Learn the theory behind AdaGrad as an optimizer and how to implement it in Python
Potential Use Cases
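AdaGrad is an algorithm for gradient-based optimization. It is well suited to sparse data (for example, in NLP or image recognition) because it adapts the learning rate of each parameter, so parameters whose gradients are rarely nonzero keep taking larger steps.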
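To make the sparse-data point concrete, here is a small sketch with two hypothetical features: one whose gradient is nonzero on every step and one that is nonzero only on every 10th step. The gradient values, `lr`, and `eps` are made up for illustration. After 100 steps, the rarely active feature keeps a larger effective learning rate η/(√s + ε).

```python
import numpy as np

lr, eps = 0.1, 1e-8
s = np.zeros(2)  # accumulated squared gradients for [frequent, rare] feature

for t in range(100):
    # Hypothetical sparse gradients: feature 0 is nonzero on every step,
    # feature 1 only on every 10th step (t = 0, 10, 20, ...).
    g = np.array([1.0, 1.0 if t % 10 == 0 else 0.0])
    s += g * g

effective_lr = lr / (np.sqrt(s) + eps)  # per-feature step-size scale
print(effective_lr)  # approximately [0.01, 0.0316]
```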
Who Is This For?
ARTICLE 1. Intro to mathematical optimization
  • What is mathematical optimization?
  • Why do we need to optimize a cost function in ML algorithms?
10 minutes
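As one concrete instance (assuming linear regression with squared error, chosen here only as an example), training means choosing the weight w and bias b that minimize the average squared prediction error over n training pairs (x_i, y_i):

```latex
J(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right)^2,
\qquad
(w^\ast, b^\ast) = \arg\min_{w,\, b} \; J(w, b)
```

Optimizers such as gradient descent and AdaGrad are procedures for finding this minimizer iteratively.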
VIDEO 2. Gradient Descent
  • What is gradient descent (GD)?
  • How does GD work in Python?
10 minutes
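A minimal sketch of gradient descent in Python, assuming a one-dimensional function whose gradient is known in closed form. The function name `gradient_descent` and the example f(x) = (x - 3)² are illustrative choices, not taken from the linked video.

```python
def gradient_descent(grad_fn, x0, lr=0.1, num_steps=50):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(num_steps):
        x = x - lr * grad_fn(x)  # move downhill by lr times the slope
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to 3
```

Here every step shrinks the distance to the minimum by a factor of 1 - 2·lr = 0.8, so 50 steps bring x to within about 1e-4 of 3.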
ARTICLE 3. Learning Rate
  • What is the learning rate?
  • How do I choose a good one?
20 minutes
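The effect of the learning rate η can be seen on f(x) = x², where one gradient-descent step gives x ← x - η·2x = (1 - 2η)x. The iterates shrink toward the minimum only when |1 - 2η| < 1, that is, 0 < η < 1. The sketch below uses a hypothetical helper `run_gd` to compare a rate that is too small, a moderate one, and one that is too large.

```python
def run_gd(lr, x0=1.0, num_steps=20):
    """Gradient descent on f(x) = x^2 (gradient 2x); returns the final x."""
    x = x0
    for _ in range(num_steps):
        x -= lr * 2 * x  # each step multiplies x by (1 - 2 * lr)
    return x


for lr in (0.01, 0.4, 1.1):
    print(lr, run_gd(lr))
```

With η = 0.01 the iterate is still far from 0 after 20 steps, with η = 0.4 it is essentially 0, and with η = 1.1 it flips sign every step and grows.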
ARTICLE 4. AdaGrad: Introduction (No math!)
  • What is AdaGrad?
10 minutes
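One property is easy to see without any derivation: AdaGrad's accumulated sum of squared gradients only grows, so its step size only shrinks. With a constant gradient g (a hypothetical value, chosen for illustration), the sum after t steps is t·g², so the step is η/√t. The sketch below checks this numerically.

```python
import math

lr, eps = 0.5, 1e-8
g = 2.0   # hypothetical constant gradient, for illustration only
s = 0.0   # accumulated squared gradient
steps = []
for t in range(1, 101):
    s += g * g
    steps.append(lr * g / (math.sqrt(s) + eps))  # step size at iteration t

print(steps[0], steps[3], steps[99])  # about 0.5, 0.25, 0.05
```

This steady decay is why AdaGrad needs little learning-rate tuning early on, but it can also make training slow down too much in long runs.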
ARTICLE 5. Adaptive Gradient (AdaGrad): Introduction [With more advanced math concepts]
  • What is AdaGrad?
  • What is the math behind this optimizer?
30 minutes
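For reference, the diagonal form in which the AdaGrad update is usually presented is shown below. Here L is the loss, η the global learning rate, ε a small constant for numerical stability, g_t the gradient at step t, and ⊙ elementwise multiplication; the square root and division are also elementwise.

```latex
g_t = \nabla_\theta L(\theta_{t-1}), \qquad
s_t = s_{t-1} + g_t \odot g_t, \quad s_0 = 0, \qquad
\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{s_t} + \epsilon} \odot g_t
```

Parameters with a history of large gradients get smaller steps, while rarely updated parameters keep larger ones.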
ARTICLE 6. AdaGrad in Python
  • How to implement AdaGrad in Python
10 minutes
PAPER 7. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (optional)
  • Where does this optimizer come from?
30 minutes

Concepts Covered
