Question

3.4) Which of the following statements is/are true?

- Learning using optimizers such as AdaGrad or RMSProp is adaptive, as the learning rate is changed based on a pre-defined schedule after every epoch.
- Since different dimensions have different impact, adapting the learning rate per parameter could lead to a good convergence solution.
- Since the AdaGrad technique adapts the learning rate based on the gradients, it could converge faster, but it also suffers from an issue with the scaling of the learning rate, which impacts the learning process and could lead to a sub-optimal solution.
- The RMSProp technique is similar to the AdaGrad technique, but it scales the learning rate using an exponentially decaying average of squared gradients.
- The Adam optimizer always converges to a better solution than the stochastic gradient descent optimizer.
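For reference, here is a minimal NumPy sketch (not part of the original question) contrasting the per-parameter update rules of AdaGrad and RMSProp that the options above describe; the function names, the toy quadratic loss, and the hyperparameter defaults are illustrative assumptions, not taken from the question.

```python
import numpy as np

def adagrad_update(param, grad, cache, lr=0.1, eps=1e-8):
    """AdaGrad: accumulate the sum of squared gradients per parameter.
    The cache only grows, so the effective step size keeps shrinking,
    which is the learning-rate scaling issue the option refers to."""
    cache += grad ** 2
    param -= lr * grad / (np.sqrt(cache) + eps)
    return param, cache

def rmsprop_update(param, grad, cache, lr=0.1, decay=0.9, eps=1e-8):
    """RMSProp: exponentially decaying average of squared gradients,
    so the effective step size does not decay toward zero."""
    cache = decay * cache + (1 - decay) * grad ** 2
    param -= lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Toy quadratic loss L(w) = 0.5 * ||w||^2, for which grad = w.
w_ada = np.array([1.0, -2.0])
w_rms = w_ada.copy()
cache_ada = np.zeros_like(w_ada)
cache_rms = np.zeros_like(w_rms)

for _ in range(100):
    w_ada, cache_ada = adagrad_update(w_ada, w_ada.copy(), cache_ada)
    w_rms, cache_rms = rmsprop_update(w_rms, w_rms.copy(), cache_rms)

print("AdaGrad:", w_ada, "RMSProp:", w_rms)
```

The only difference between the two updates is how the squared-gradient cache is maintained: AdaGrad sums it without bound, while RMSProp replaces the sum with an exponential moving average.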
