For the sigmoid activation function, as the input becomes large and positive, the
activation value approaches 1. Similarly, as the input becomes very negative, the activation value
approaches 0.
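As a small illustrative sketch (not part of the original notes), this saturating behaviour can be checked numerically; the sigmoid helper below is written for this example only:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Large positive inputs saturate towards 1, large negative inputs towards 0
for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.5f}")
# sigmoid(-10.0) ~ 0.00005, sigmoid(0.0) = 0.50000, sigmoid(+10.0) ~ 0.99995
```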
For training MLPs, we do not use a step function as the activation function, because in its flat
regions there is no slope and hence no gradient to learn from.
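The gradient argument can be made concrete with a quick comparison (an illustrative sketch, not from the original notes): the step function has zero derivative wherever it is defined, while the sigmoid has a usable slope equal to sigmoid(x) * (1 - sigmoid(x)).

```python
import numpy as np

def step(x):
    # Heaviside step: 0 for x < 0, 1 for x >= 0; flat on both sides of the origin
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # non-zero slope, so weights can be updated

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(step(x))                    # [0. 0. 1. 1.] -> zero gradient everywhere it is defined
print(sigmoid_grad(x))            # roughly [0.105, 0.235, 0.235, 0.105]
```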
For the tanh activation function, as the input increases along the positive horizontal axis, the
activation value tends towards 1, and along the negative horizontal axis it tends towards -1,
with the centre value at 0 (tanh(0) = 0).
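A minimal numerical check of this behaviour (illustrative only, using NumPy's built-in np.tanh):

```python
import numpy as np

# tanh saturates at -1 and +1 and is exactly 0 at the origin
for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"tanh({x:+.1f}) = {np.tanh(x):+.5f}")
# tanh(-10.0) ~ -1.00000, tanh(0.0) = +0.00000, tanh(+10.0) ~ +1.00000
```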
The ReLU activation function is a piecewise linear function, with a flat region (output 0) for negative
input values and a linear (identity) region for positive input values.
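This piecewise-linear shape can be sketched in a couple of lines (the relu helper below is a hypothetical example, not from the original notes):

```python
import numpy as np

def relu(x):
    # Flat at 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
```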
Fig: 1