INSTRUCTIONS

This assignment has three (3) parts:
• In part I, you will be assessed on the theoretical aspects of machine learning and deep learning. More specifically, you will be tested on activation functions and on how DNNs, forward propagation, and backward propagation work.
• In part II, you will be required to build, train, test, and improve a basic DNN for handwritten letter recognition. You will need to load and prepare data, visualize the data for inspection, build your DNN, tune model hyperparameters, and then further improve your DNN using label smoothing.
• In part III, you will be required to build a real-world image classifier with CNNs and improve the model's robustness against adversarial attacks.
Make sure you read the instructions carefully.
You need to submit, through the Moodle Assignment activity, a single ZIP file named xxx_A1_solution.zip, where xxx is your student ID. The ZIP should contain:
1) Jupyter notebooks with answers to the questions and your work. They should be named
A1_Part1_Solutions.ipynb, A1_Part2_Solutions.ipynb, and A1_Part3_Solutions.ipynb
corresponding to part 1, part 2, and part 3 respectively.
2) A copy of your solution notebooks exported in HTML format.
3) (Optional) Any extra files or folders needed to complete your assignment (e.g., images used in your answers).
Part 1: Question on theory and knowledge (30 points)
The first part of this assignment covers content from the lectures and lab sessions in Weeks 1 and 2. You are highly recommended to revise these materials before attempting this part.
In this part, you are expected to demonstrate your understanding of the concepts of activation
functions, forward propagation and backward propagation in DNNs.
Question 1.1 Activation Functions (8 points)
Activation functions play an important role in modern DNNs. In this question, we will explore
some of them for a deeper understanding of their characteristics and their advantages.
a) Given the Exponential Linear Unit activation function:
ELU(x) = { α(e^x − 1)   if x < 0
         { x            otherwise
State its output range, find its derivative (show your steps), and plot the activation function
and its derivative. (2 points)
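As a sketch only (not a substitute for showing your derivation), the piecewise definition above and its derivative can be checked numerically with NumPy; the choice α = 1.0 here is an arbitrary placeholder:

```python
import numpy as np

def elu(x, alpha=1.0):
    # alpha * (e^x - 1) for x < 0, x otherwise
    return np.where(x < 0, alpha * (np.exp(x) - 1), x)

def elu_derivative(x, alpha=1.0):
    # alpha * e^x for x < 0, 1 otherwise
    return np.where(x < 0, alpha * np.exp(x), 1.0)

xs = np.linspace(-5, 5, 11)
print(elu(xs))
print(elu_derivative(xs))
```

Plotting both curves (e.g., with matplotlib) over a range such as [−5, 5] then answers the "plot" part of the question.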
b) A wide range of activation functions has been proposed recently. Do your own research and select two (2) activation functions that have not been discussed in the lecture (i.e., other than ReLU, Sigmoid, and Tanh).
For each of the selected activation functions, you must (3 points total for each function):
• Identify the research paper that proposes the activation function.
• Write a summary of the authors' motivation that led to the activation function (max 150 words).
• Write a summary of the advantages of the activation function (max 150 words).
Question 1.2 Feed-Forward Neural Networks (8 points)
Assume that we feed a data point x with a ground-truth label y = 3 (with index starting from 1, as in the lecture) to the feed-forward neural network with the ReLU activation function and hidden layers shown in the following figure:

a) What is the numerical value of the latent representation h¹(x)? (1 point)
b) What is the numerical value of the latent representation h²(x)? (1 point)
c) What is the numerical value of the logit h³(x)? (1 point)
d) What is the corresponding prediction probability p(x)? (1 point)
e) What is the predicted class? Is it a correct or incorrect prediction? (1 point)
f) What is the cross-entropy loss value caused by the feed-forward neural network at (x, y)? (1 point)
g) Assume that we are applying the label smoothing technique¹ with α = 0.1. What is the relevant loss value caused by the feed-forward neural network at (x, y)? (2 points)
Note: You must show both formulas and numerical results to get full marks. Although it is optional, it is great if you show your NumPy code for your computation.
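Since the note above allows NumPy code, here is a minimal sketch of the softmax, cross-entropy, and label-smoothed loss computations. The logits vector `z` below is a made-up placeholder, not the one produced by the network in the figure, and the smoothing formula shown is the common uniform formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(p, y):
    # y is a 1-based class index, as in the lecture
    return -np.log(p[y - 1])

def smoothed_cross_entropy(p, y, alpha=0.1):
    # label smoothing: target = (1 - alpha) * one_hot(y) + alpha / K
    K = len(p)
    target = np.full(K, alpha / K)
    target[y - 1] += 1.0 - alpha
    return -(target * np.log(p)).sum()

z = np.array([2.0, 1.0, 0.5])        # placeholder logits, not from the figure
p = softmax(z)
print(cross_entropy(p, y=3), smoothed_cross_entropy(p, y=3))
```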
Question 1.3 Backpropagation (10 points)
Consider a multi-layered feed-forward neural network for a classification problem with three (3) classes, where the model parameters are initialized randomly.
The architecture is as follows:
Input Layer:    h⁰(x) = x
Hidden Layer 1: h¹(x)
Output Layer:   h²(x), with p(x) = softmax(h²(x))

We feed a feature vector x = [1, −1, 0] with ground-truth label y = 3 to the above network.
a) Using cross-entropy (CE) loss, what is the value of the CE loss l? (2 points)
b) What are the derivatives ∂l/∂h², ∂l/∂W², and ∂l/∂b²? (3 points)
c) What are the derivatives ∂l/∂h¹, ∂l/∂W¹, and ∂l/∂b¹? (3 points)
d) Assume that we use SGD with learning rate η = 0.01 to update the model parameters. What are the values of W², b² and W¹, b¹ after being updated? (2 points)
Note: You must show the formulas, numerical results, and your NumPy code to get full marks.
¹ Link for main paper from Geoff Hinton:
https://papers.nips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf
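The forward pass, gradient computations, and SGD updates asked for above can be sketched in NumPy. The layer sizes and random initialization below are assumptions for illustration (the figure fixes the real ones), and a ReLU hidden layer is assumed as in Question 1.2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 3 inputs, 4 hidden units, 3 classes (placeholders, not from the figure)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)

x = np.array([1.0, -1.0, 0.0])
y = 3                                  # 1-based label, as in the lecture

# Forward pass
hbar1 = W1 @ x + b1                    # pre-activation of hidden layer 1
h1 = np.maximum(hbar1, 0)              # ReLU (assumed activation)
h2 = W2 @ h1 + b2                      # logits
p = np.exp(h2 - h2.max()); p /= p.sum()
loss = -np.log(p[y - 1])               # cross-entropy loss l

# Backward pass: softmax + CE gives dl/dh2 = p - one_hot(y)
dh2 = p.copy(); dh2[y - 1] -= 1.0
dW2, db2 = np.outer(dh2, h1), dh2
dh1 = W2.T @ dh2
dhbar1 = dh1 * (hbar1 > 0)             # ReLU gradient
dW1, db1 = np.outer(dhbar1, x), dhbar1

# SGD update with learning rate eta = 0.01
eta = 0.01
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
print(loss)
```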
Question 1.4 Optimization with Gradient Descent (4 points)
This question assesses your understanding of gradient descent, one of the most important
optimization techniques in deep learning.
a) Write pseudo-code implementing the gradient descent algorithm and, using your own words, explain what each line of the code does. (2 points)
b) Using your own words, explain why the negative gradient direction is the direction that gives the fastest decrease in the loss value. (2 points)
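As an illustration of the loop asked for in part (a), here is a minimal gradient-descent sketch on a toy one-dimensional quadratic loss; the loss function, step count, and learning rate are arbitrary choices, not part of the question:

```python
def grad(w):
    # gradient of the toy loss l(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0                    # initialize the parameter
eta = 0.1                  # learning rate
for step in range(100):
    g = grad(w)            # compute the gradient of the loss at the current w
    w = w - eta * g        # step against the gradient direction
print(w)                   # converges toward the minimizer w = 3
```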