dataset, if the activation function applied to the hidden layer(s) of the Multi-layer Perceptron (MLP) network is a linear function, the model will converge to an optimal solution.
For a linearly separable dataset, an MLP network with a non-linear activation function such as sigmoid or tanh on its hidden layers can converge to a good solution.
All of the above
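The option about linear hidden activations is easier to reason about with one fact in hand: stacking layers whose activation is linear (here taken to be the identity) gives a network that is itself a single linear map, since W2(W1·x + b1) + b2 = (W2·W1)·x + (W2·b1 + b2). The sketch below checks this numerically in plain Python. The layer sizes, the random weights, and the helper names (`matvec`, `matmul`, `vadd`, `linear_mlp`) are assumptions chosen for illustration, not part of the original question.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def linear_mlp(x, W1, b1, W2, b2):
    """Two-layer MLP whose hidden activation is the identity (a linear function)."""
    hidden = vadd(matvec(W1, x), b1)      # no non-linearity applied here
    return vadd(matvec(W2, hidden), b2)

# Hypothetical sizes: 2 inputs, 3 hidden units, 1 output.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [rng.uniform(-1, 1) for _ in range(3)]
W2 = [[rng.uniform(-1, 1) for _ in range(3)]]
b2 = [rng.uniform(-1, 1)]

# Collapse the two layers into one equivalent linear layer.
W_eff = matmul(W2, W1)                    # shape 1 x 2
b_eff = vadd(matvec(W2, b1), b2)          # shape 1

x = [0.5, -1.25]
two_layer = linear_mlp(x, W1, b1, W2, b2)
one_layer = vadd(matvec(W_eff, x), b_eff)

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(two_layer, one_layer))
print(max_diff < 1e-12)
```

Because the collapsed model is just one linear layer, an MLP with linear hidden activations can only represent linear decision boundaries, which is the property the options about linearly separable data turn on.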
Fig: 1