When applying momentum optimization, setting a higher momentum coefficient does not
always lead to faster convergence: if the coefficient is too high, the accumulated
velocity can overshoot the minimum and oscillate for longer before settling.
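A minimal numerical sketch (hypothetical values, minimizing f(w) = w² with a fixed learning rate of 0.1) illustrates this: a moderate coefficient settles in fewer steps than a very high one.

```python
def steps_to_converge(beta, lr=0.1, tol=1e-3, max_iter=10_000):
    """Count momentum steps until f(w) = w**2 is minimized to within tol."""
    w, v = 5.0, 0.0                    # start far from the minimum at w = 0
    for t in range(1, max_iter + 1):
        v = beta * v - lr * (2.0 * w)  # gradient of w**2 is 2w
        w += v
        if abs(w) < tol and abs(v) < tol:
            return t
    return max_iter

moderate = steps_to_converge(beta=0.9)
very_high = steps_to_converge(beta=0.99)  # overshoots and oscillates longer
```

With these illustrative settings, `moderate` is far smaller than `very_high`, showing that a larger coefficient is not automatically faster.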
Unlike in gradient descent, in momentum optimization the update at each step
depends on the previous steps: the gradients are accumulated into a velocity
term that carries information forward from earlier iterations.
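This dependence can be sketched with the standard momentum update rule, v ← βv − η∇f(w) followed by w ← w + v; the function, starting point, and hyperparameter values below are illustrative, not from the source.

```python
def momentum_step(w, v, grad_w, lr=0.1, beta=0.9):
    """One momentum update: the new velocity blends the old velocity
    (information from previous steps) with the current gradient."""
    v = beta * v - lr * grad_w
    return w + v, v

# Minimize f(w) = w**2, whose gradient is 2w, starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, v, grad_w=2.0 * w)
```

Because `v` persists across iterations, each step's direction is shaped by the history of past gradients, which is exactly the difference from plain gradient descent.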
Unlike in stochastic gradient descent, in momentum optimization the path to
convergence is typically faster and smoother, since the accumulated velocity
averages out noisy gradients and dampens oscillations.
Fig. 1