Covers: theory of learning rate
Estimated time needed to finish: 20 minutes
Questions this item addresses:
  • What is learning rate?
  • How can I make it better?
How to use this item?

In this last prerequisite module for AdaGrad, we are going to learn about learning rates. The first section of this article explains the definition of learning rate in machine learning (ML).
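As a quick preview (a minimal sketch, not taken from the article itself), the learning rate is the factor that scales each gradient descent step. The toy loss f(w) = w**2 and the two rates below are hypothetical values chosen only to contrast stable versus divergent behavior:

```python
# Gradient descent on the toy loss f(w) = w**2, whose gradient is 2*w.
# The learning rate lr scales every update step (example values are made up).
def descend(lr, w=5.0, steps=20):
    for _ in range(steps):
        w -= lr * 2 * w  # step against the gradient, scaled by lr
    return w

print(descend(0.1))  # well-chosen rate: w shrinks toward the minimum at 0
print(descend(1.1))  # too-large rate: each step overshoots and w blows up
```

The same loss and starting point behave completely differently depending on the rate, which is why choosing and adapting the learning rate matters.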



Total time needed: ~2 hours
Learn the theory behind AdaGrad as an optimizer and how to implement it in Python
Potential Use Cases
AdaGrad is an algorithm for gradient-based optimization. It is well suited to sparse data, such as in NLP or image recognition.
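As a hedged illustration of why AdaGrad fits this use case (a sketch, not part of the path's own materials), the core update scales each step by the accumulated squared gradients, so parameters with a long history of large gradients take smaller steps while rarely updated ones keep larger steps. The quadratic loss and all constants here are invented for the example:

```python
import math

# Toy loss f(w) = w**2 with gradient 2*w (a made-up example, not from the course).
def grad(w):
    return 2.0 * w

w = 5.0          # initial parameter
lr = 1.0         # base learning rate
eps = 1e-8       # small constant for numerical stability
g_sq_sum = 0.0   # running sum of squared gradients (AdaGrad's accumulator)

for _ in range(100):
    g = grad(w)
    g_sq_sum += g ** 2
    # AdaGrad update: the effective step shrinks as squared gradients accumulate
    w -= lr * g / (math.sqrt(g_sq_sum) + eps)

print(w)  # w has moved close to the minimum at 0
```

The per-parameter accumulator is what makes the method attractive for sparse features: dimensions that are rarely active retain a comparatively large effective learning rate. Articles 4-6 below develop this in detail.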
Who is This For?
ARTICLE 1. Intro to mathematical optimization
  • What is mathematical optimization?
  • Why do we need to optimize a cost function in ML algorithms?
10 minutes
VIDEO 2. Gradient Descent
  • What is Gradient Descent (GD)?
  • How does GD work in Python?
10 minutes
ARTICLE 3. Learning Rate
  • What is learning rate?
  • How can I make it better?
20 minutes
ARTICLE 4. AdaGrad: Introduction (No math!)
  • What is Adagrad?
10 minutes
ARTICLE 5. Adaptive Gradient (AdaGrad): Introduction [with more advanced math concepts]
  • What is AdaGrad?
  • What is the math behind this optimizer?
30 minutes
ARTICLE 6. AdaGrad in Python
  • How do I implement AdaGrad in Python?
10 minutes
PAPER 7. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (optional)
  • Where does this optimizer come from?
30 minutes

Concepts Covered
