Random initialization of the weights can make learning unstable, since the variance of a layer's outputs can differ from the variance of its inputs.
Initialization strategies such as 'Xavier' or 'He' completely address the vanishing or exploding gradient problem.
Weight initialization schemes that draw samples from a distribution with mean 0 and a standard deviation based on fan_in and fan_out keep the gradients stable.
Initialization parameters, such as the standard deviation of the weight initialization scheme, should be the same irrespective of the activation function used in the hidden layers.
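For reference, the fan_in/fan_out based schemes these statements describe can be sketched as follows. This is a minimal NumPy illustration, not part of the original question; the function names xavier_normal and he_normal are placeholders chosen for clarity.

import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    # Glorot/Xavier: zero mean, std = sqrt(2 / (fan_in + fan_out)),
    # chosen to keep forward- and backward-pass variance roughly equal
    # for tanh/sigmoid-style activations.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    # He/Kaiming: zero mean, std = sqrt(2 / fan_in); the larger scale
    # compensates for ReLU zeroing half of the pre-activations, so the
    # appropriate std does depend on the activation function.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_normal(fan_in=256, fan_out=128, rng=rng)
print(round(W.std(), 3))  # close to sqrt(2 / 384) ≈ 0.072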
3.8) Which of the following statements is/are true?
Fig. 1 (figure referenced by question 3.8; image not reproduced here)