
INSTRUCTIONS

This assignment has three (3) parts:

In part I, you will be assessed on the theoretical aspects of machine learning and deep learning. More specifically, you will be tested on activation functions and on how DNNs, forward propagation, and backward propagation work.

In part II, you will be required to build, train, test, and improve a basic DNN for handwritten letter recognition. You will need to load and prepare the data, visualize the data for inspection, build your DNN, tune the model hyperparameters, and then further improve your DNN using label smoothing.

In part III, you will be required to build a real-world image classifier with CNNs and improve the model's robustness against adversarial attacks.

Make sure you read the instructions carefully.

You need to submit, through the Moodle Assignment activity, a single ZIP file named xxx_A1_solution.zip, where xxx is your student ID. The ZIP should contain:

1) Jupyter notebooks with answers to the questions and your work. They should be named A1_Part1_Solutions.ipynb, A1_Part2_Solutions.ipynb, and A1_Part3_Solutions.ipynb, corresponding to part 1, part 2, and part 3 respectively.

2) A copy of your solution notebooks exported in HTML format.

3) (Optional) Any extra files or folders needed to complete your assignment (e.g., images used in your answers).

Part 1: Questions on theory and knowledge (30 points)

The first part of this assignment covers content from the lectures and lab sessions in Weeks 1 and 2. You are strongly encouraged to revise these materials before attempting this part.

In this part, you are expected to demonstrate your understanding of activation functions, forward propagation, and backward propagation in DNNs.

Question 1.1 Activation Functions (8 points)

Activation functions play an important role in modern DNNs. In this question, we will explore some of them for a deeper understanding of their characteristics and advantages.

a) Given the Exponential Linear Unit (ELU) activation function:

ELU(x) = α(eˣ − 1)   if x < 0
ELU(x) = x           otherwise

state its output range, find its derivative (show your steps), and plot the activation function and its derivative. (2 points)
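If you choose to include code with your answer, a minimal NumPy/Matplotlib sketch such as the one below can produce the required plots. The parameter value α = 1.0 is an assumption for illustration, and the derivative is estimated numerically with np.gradient rather than derived analytically; the analytic derivation is still required in your written answer.

```python
import numpy as np
import matplotlib.pyplot as plt

alpha = 1.0  # assumed value of the ELU parameter; adjust as needed

def elu(x, alpha=1.0):
    # alpha * (exp(x) - 1) for x < 0, x otherwise
    return np.where(x < 0, alpha * (np.exp(x) - 1), x)

x = np.linspace(-5, 5, 1000)
y = elu(x, alpha)
dy = np.gradient(y, x)  # numerical estimate of the derivative

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, y)
ax1.set_title("ELU")
ax2.plot(x, dy)
ax2.set_title("Numerical derivative of ELU")
plt.show()
```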

b) A wide range of activation functions have been proposed recently. Do your own research and select two (2) activation functions that have not been discussed in the lecture (i.e., other than ReLU, Sigmoid, and Tanh).

For each of the selected activation functions, you must (3 points total for each function):

• Identify the research paper that proposes the activation function.

• Write a summary of the authors' motivation that led to the activation function (max 150 words).

• Write a summary of the advantages of the activation function (max 150 words).

Question 1.2 Feed-Forward Neural Networks (8 points)

Assume that we feed a data point x with ground-truth label y = 3 (with indexing starting from 1, as in the lecture) to the feed-forward neural network with the ReLU activation function and the hidden layers shown in the following figure:

a) What is the numerical value of the latent representation h¹(x)? (1 point)

b) What is the numerical value of the latent representation h²(x)? (1 point)

c) What is the numerical value of the logit h³(x)? (1 point)

d) What is the corresponding prediction probability p(x)? (1 point)

e) What is the predicted class? Is it a correct or incorrect prediction? (1 point)

f) What is the cross-entropy loss value incurred by the feed-forward neural network at (x, y)? (1 point)

g) Assume that we apply the label smoothing technique¹ with α = 0.1. What is the corresponding loss value incurred by the feed-forward neural network at (x, y)? (2 points)

Note: You must show both the formulas and the numerical results to get full marks. Although it is optional, it is helpful to show the NumPy code for your computations.
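Because the figure (and hence the actual weights) is not reproduced in this text, the sketch below only illustrates the kind of NumPy computation expected; all layer sizes, weights, biases, and the input are placeholder assumptions that you must replace with the values from the figure. The label-smoothing formula shown is the standard uniform smoothing over the classes.

```python
import numpy as np

# Placeholder architecture and parameters -- substitute the actual weights,
# biases, and dimensions from the figure in the assignment.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer 1 (assumed sizes)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # hidden layer 2 (assumed sizes)
W3, b3 = rng.normal(size=(3, 4)), np.zeros(3)   # output layer, 3 classes (assumed)

x = rng.normal(size=3)          # placeholder input; use the x from the figure
y = 3                           # ground-truth label, 1-indexed as in the lecture

relu = lambda z: np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max()             # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Forward propagation
h1 = relu(W1 @ x + b1)          # question (a)
h2 = relu(W2 @ h1 + b2)         # question (b)
h3 = W3 @ h2 + b3               # question (c): logits
p = softmax(h3)                 # question (d)
pred = np.argmax(p) + 1         # question (e): predicted class, 1-indexed

# Question (f): cross-entropy loss with a one-hot target
K = p.shape[0]
onehot = np.eye(K)[y - 1]
ce = -np.sum(onehot * np.log(p))

# Question (g): cross-entropy with uniform label smoothing, alpha = 0.1
alpha = 0.1
smoothed = (1 - alpha) * onehot + alpha / K
ce_smoothed = -np.sum(smoothed * np.log(p))

print(h1, h2, h3, p, pred, ce, ce_smoothed)
```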

Question 1.3 Backpropagation (10 points)

Consider a multi-layered feed-forward neural network for a classification problem with three (3) classes, where the model parameters are initialized randomly. The architecture is as follows:

Input Layer: h⁰(x) = x
Hidden Layer 1: h¹(x)
Output Layer: h²(x), with p(x) = softmax(h²(x))

We feed a feature vector x = [1, -1, 0] with ground-truth label y = 3 to the above network.

a) Using the cross-entropy (CE) loss, what is the value of the CE loss l? (2 points)

b) What are the derivatives ∂l/∂h², ∂l/∂W², and ∂l/∂b²? (3 points)

c) What are the derivatives ∂l/∂h¹, ∂l/∂W¹, and ∂l/∂b¹? (3 points)

d) Assume that we use SGD with learning rate η = 0.01 to update the model parameters. What are the values of W², b² and W¹, b¹ after being updated? (2 points)

Note: You must show the formulas, the numerical results, and your NumPy code to get full marks.
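As with Question 1.2, the randomly initialized parameters are not reproduced in this text, so the sketch below uses placeholder shapes and values; it only illustrates how the gradients and the SGD update could be computed in NumPy, assuming a ReLU hidden layer and a softmax output (adjust to the activation and dimensions given in the figure).

```python
import numpy as np

# Placeholder parameters -- replace with the randomly initialized values from
# the assignment figure.  Shapes assume a 3-d input, a hidden layer of size 4,
# and 3 output classes; adjust to match the actual architecture.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)

x = np.array([1.0, -1.0, 0.0])
y = 3                                    # ground-truth label, 1-indexed

# Forward pass (ReLU hidden layer assumed)
z1 = W1 @ x + b1
h1 = np.maximum(z1, 0.0)
h2 = W2 @ h1 + b2                        # logits
p = np.exp(h2 - h2.max()); p /= p.sum()  # softmax
onehot = np.eye(3)[y - 1]
loss = -np.sum(onehot * np.log(p))       # question (a)

# Backward pass
dh2 = p - onehot                         # dl/dh2 for softmax + CE
dW2 = np.outer(dh2, h1)                  # dl/dW2
db2 = dh2                                # dl/db2
dh1 = W2.T @ dh2                         # backprop into the hidden layer
dz1 = dh1 * (z1 > 0)                     # ReLU derivative
dW1 = np.outer(dz1, x)                   # dl/dW1
db1 = dz1                                # dl/db1

# SGD update, question (d)
eta = 0.01
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
```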

¹ Link to the main label smoothing paper by Geoff Hinton and colleagues: https://papers.nips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf

Question 1.4 Optimization with Gradient Descent (4 points)

This question assesses your understanding of gradient descent, one of the most important optimization techniques in deep learning.

a) Write pseudo-code that implements the gradient descent algorithm and, using your own words, explain what each line of the code does. (2 points)

b) Using your own words, explain why the negative gradient direction is the direction that gives the fastest decrease in the loss value. (2 points)
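For part a), the expected answer is your own pseudo-code with a line-by-line explanation; purely as an illustration of the expected level of detail, here is a minimal Python sketch of gradient descent on a toy objective. The objective, starting point, learning rate, and iteration count are all assumptions made for the example.

```python
import numpy as np

def grad_descent(grad_fn, theta0, lr=0.1, n_steps=100):
    # Start from the initial parameter vector.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        g = grad_fn(theta)       # gradient of the loss at the current point
        theta = theta - lr * g   # step in the negative gradient direction
    return theta

# Toy example: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta_star = grad_descent(lambda t: 2 * t, theta0=[3.0, -2.0], lr=0.1, n_steps=100)
print(theta_star)  # approaches [0, 0]
```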
