
Optimizers such as AdaGrad and RMSProp are adaptive: rather than following a pre-defined schedule, they adjust the learning rate for each parameter based on the history of that parameter's gradients.

Since different dimensions of the parameter space have different impacts on the loss, adapting the learning rate per parameter can lead to better convergence.

Because AdaGrad adapts the learning rate using the accumulated sum of squared gradients, it can converge quickly at first. However, that sum only grows, so the effective learning rate shrinks monotonically; this scaling issue can stall the learning process and lead to a sub-optimal solution.
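The shrinking step size can be seen in a minimal single-parameter sketch of the AdaGrad update (the function name `adagrad_step` and the constants here are illustrative, not from the source):

```python
import math

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    """One AdaGrad update for a single parameter."""
    # Accumulate the running sum of squared gradients.
    cache += grad ** 2
    # The effective step size lr / sqrt(cache) shrinks monotonically.
    w -= lr * grad / (math.sqrt(cache) + eps)
    return w, cache

# With a constant gradient, each step is smaller than the last.
w, cache = 1.0, 0.0
step_sizes = []
for _ in range(3):
    w_prev = w
    w, cache = adagrad_step(w, 1.0, cache)
    step_sizes.append(w_prev - w)
```

Even though the gradient never changes, the steps decay (roughly 0.1, 0.071, 0.058), which is exactly the scaling issue described above.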

RMSProp is similar to AdaGrad, but it scales the learning rate using an exponentially decaying average of squared gradients, which keeps the effective learning rate from collapsing toward zero.
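The difference from AdaGrad is the one accumulation line: a decaying average instead of a growing sum. A minimal sketch (names and constants are illustrative):

```python
import math

def rmsprop_step(w, grad, avg, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp update for a single parameter."""
    # Exponentially decaying average of squared gradients
    # (bounded, unlike AdaGrad's ever-growing sum).
    avg = beta * avg + (1 - beta) * grad ** 2
    w -= lr * grad / (math.sqrt(avg) + eps)
    return w, avg

# With a constant gradient, the average converges toward grad**2,
# so the step size settles near lr instead of decaying to zero.
w, avg = 1.0, 0.0
for _ in range(200):
    w, avg = rmsprop_step(w, 1.0, avg)
```

Because `avg` is bounded by the largest recent squared gradient, the effective learning rate stays usable throughout training.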


The Adam optimizer often converges faster than stochastic gradient descent, but it does not always reach a better solution; plain SGD (especially with momentum) can match or exceed it in some settings.
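For comparison with the updates above, here is a minimal single-parameter sketch of the Adam update, which combines a decaying average of gradients (momentum) with RMSProp-style scaling and corrects both for their zero initialization (names and constants are illustrative):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single parameter (t starts at 1)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# On the very first step the bias correction makes the update
# magnitude roughly equal to lr, regardless of the gradient scale.
w, m, v = 0.0, 0.0, 0.0
w, m, v = adam_step(w, 1.0, m, v, t=1)
```

The bias correction is what makes early steps well-scaled; without it, `m` and `v` start near zero and the first updates would be far too small.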
