1. By adding one or more hidden layers with non-linear activation functions to a perceptron network, non-linearly separable cases can be handled (a runnable sketch follows this list).
2. For a non-linearly separable dataset, if the activation function applied on the hidden layer(s) of a Multi-layer Perceptron network is a linear function, the model will not converge to an optimal solution, because stacked linear layers collapse into a single linear transformation (see the derivation after the list).
3. For a linearly separable dataset, an MLP network that applies a non-linear activation function such as sigmoid or tanh on its hidden layers can still converge to a good solution.
4. All of the above
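Why statement 2 holds follows from a one-line derivation. The weight matrices $W_1, W_2$ and biases $b_1, b_2$ below are generic symbols introduced for illustration:

\[
\hat{y} = W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W' x + b'
\]

With linear (identity) activations, the two-layer network is exactly one linear map $W' x + b'$, so it can only draw linear decision boundaries and cannot separate a non-linearly separable dataset, no matter how many layers are stacked.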
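As a minimal sketch of statements 1 and 3, the following example trains a small MLP on XOR, the classic non-linearly separable problem. It assumes scikit-learn is available; MLPClassifier and its parameters are standard scikit-learn API, but the hidden-layer size and other hyperparameters are illustrative choices, not prescribed by the text:

```python
# Minimal sketch: an MLP with a non-linear (tanh) hidden layer learns XOR,
# while the same architecture with a linear activation cannot.
import numpy as np
from sklearn.neural_network import MLPClassifier

# XOR: no single line separates class 0 from class 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# One hidden layer of 4 tanh units supplies the non-linearity
# that a single-layer perceptron lacks.
clf = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # expected: [0 1 1 0]

# With a linear ('identity') activation the network collapses to a
# linear model, so it typically fails to reproduce [0 1 1 0].
lin = MLPClassifier(hidden_layer_sizes=(4,), activation='identity',
                    solver='lbfgs', max_iter=2000, random_state=0)
lin.fit(X, y)
print(lin.predict(X))
```

The contrast between the two runs mirrors the statements above: the non-linear activation is what lets extra layers add representational power.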