hidden layer classification NN using the MNIST digits dataset² (M = 10 classes). This dataset contains
N = 60000 training examples and 10000 test examples, each consisting of a grayscale
image with F = 28 × 28 = 784 features (pixels).
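To make these dimensions concrete, here is a minimal NumPy sketch of how the data could be shaped for a one-hidden-layer classifier. Each 28 × 28 image becomes a row of F = 784 values, and each label becomes a one-hot vector of length M = 10. Several choices are assumptions not stated in the text: the hidden width H = 64, the ReLU hidden activation, the softmax output, the scaling of pixels to [0, 1], and the use of random synthetic images as a stand-in for the real MNIST files. The helper names (flatten_images, one_hot, forward) are hypothetical.

```python
import numpy as np

# Dimensions stated in the text
M = 10          # number of classes (digits 0-9)
F = 28 * 28     # 784 features (pixels) per image
N_TRAIN = 60000
N_TEST = 10000

H = 64  # hypothetical hidden-layer width (assumption, not given in the text)

def flatten_images(images: np.ndarray) -> np.ndarray:
    """Reshape (n, 28, 28) grayscale images into (n, F) rows scaled to [0, 1]."""
    n = images.shape[0]
    return images.reshape(n, F).astype(np.float64) / 255.0

def one_hot(labels: np.ndarray, num_classes: int = M) -> np.ndarray:
    """Turn integer labels of shape (n,) into an (n, num_classes) one-hot matrix."""
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

def forward(X, W1, b1, W2, b2):
    """Assumed architecture: ReLU hidden layer followed by a softmax output."""
    Z1 = X @ W1 + b1                          # (n, H)
    A1 = np.maximum(Z1, 0.0)                  # ReLU
    Z2 = A1 @ W2 + b2                         # (n, M)
    Z2 = Z2 - Z2.max(axis=1, keepdims=True)   # shift logits for numerical stability
    expZ = np.exp(Z2)
    return expZ / expZ.sum(axis=1, keepdims=True)  # each row sums to 1

rng = np.random.default_rng(0)
# Synthetic stand-in for a batch of 5 MNIST images (real data is loaded separately)
images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
labels = rng.integers(0, M, size=5)

X = flatten_images(images)
Y = one_hot(labels)
W1 = rng.normal(0.0, 0.01, size=(F, H))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.01, size=(H, M))
b2 = np.zeros(M)
P = forward(X, W1, b1, W2, b2)
print(X.shape, Y.shape, P.shape)  # (5, 784) (5, 10) (5, 10)
```

With the full training set, the same code would produce an input matrix of shape (60000, 784) and a label matrix of shape (60000, 10); flattening discards the 2-D pixel layout, which a fully connected network does not use anyway.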
[Fig. 1]
[Fig. 2]