Adversarial Machine Learning

1.1 Generating Adversarial Examples [100 pts]

Fast Gradient Sign Method (FGSM)

In this first part of the assignment, you will implement the two most basic forms of gradient-based adversarial attacks on convolutional neural networks used as image classifiers. In this exercise, you are given a pre-trained model along with the training and test data. The pre-trained model achieves a test accuracy of about 98%. Your objective is to cause misclassification by adding varying degrees of noise to the test images.

The first (and the simplest) such technique is the Fast Gradient Sign Method (FGSM), where each input image $x$ with true label $y$ is modified in a single update as follows:

$$x \leftarrow x + \alpha \,\mathrm{sign}\big(\nabla_x \mathcal{L}(y, f(x))\big)$$

where $\mathcal{L}$ represents the loss function, $f$ the network's output, and $\alpha$ the maximum change allowed to each pixel. After adding the noise, we ensure that the resulting tensor is a valid RGB image by clipping its values to the range [0, 1]. For this section, you will be using the adversarial.py file, which defines the pre-trained convolutional neural network and loads the weights saved in the model.pth file.

1. Using the negative log-likelihood loss (NLLLoss in PyTorch), complete the function to modify each test image using the above update, and compute accuracy over these adversarial examples (fill out the FGSM() function in the adversarial.py file). Evaluate your network for $\alpha = 0.001, 0.01, 0.1$, and submit the accuracy on adversarial test images in your written submission for each value of $\alpha$. A sketch of this update appears after question 2 below. [Code: 15 pts, Accuracy: 3 pts]

2. For one test example in each class (apple, banana, orange), provide the original image along with three adversarial examples (as images), constructed using $\alpha = 0.001$, $0.01$, and $0.1$ respectively. Clearly label them with the corresponding value of $\alpha$ used and the class each of them was predicted to be. [15 pts]
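The FGSM() stub in adversarial.py is not reproduced here, so the following is only a minimal sketch of the single-step update, not the assignment's required implementation. The signature FGSM(model, images, labels, alpha) and the test_loader used in the commented evaluation loop are assumptions; it also assumes the model returns log-probabilities (e.g. log_softmax outputs), which is what NLLLoss expects.

```python
import torch.nn as nn

def FGSM(model, images, labels, alpha):
    """Single-step FGSM: move each pixel by alpha in the direction of the sign
    of the loss gradient, then clip back to a valid RGB image in [0, 1].
    Assumes `model` returns log-probabilities, matching NLLLoss.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = nn.NLLLoss()(model(images), labels)  # negative log-likelihood on the true labels
    loss.backward()                             # gradient of the loss w.r.t. the input pixels
    adv = images + alpha * images.grad.sign()   # step that increases the loss
    return adv.clamp(0.0, 1.0).detach()         # keep pixel values in [0, 1]


# Illustrative accuracy computation for one value of alpha (test_loader is assumed):
# correct = total = 0
# for x, y in test_loader:
#     pred = model(FGSM(model, x, y, alpha=0.01)).argmax(dim=1)
#     correct += (pred == y).sum().item()
#     total += y.numel()
# print(f"adversarial accuracy: {correct / total:.4f}")
```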

Projected Gradient Descent (PGD)

The Projected Gradient Descent (PGD) adversarial attack is a more powerful extension of FGSM, where the noise-addition step is repeated for a number of iterations, each time clipping the noise within an $\epsilon$-ball around the original image. Starting with $x_0 = x$, i.e. the original image, we update each test image as follows:

$$x_{t+1} = \mathrm{Clip}_{x,\epsilon}\big[x_t + \alpha \,\mathrm{sign}\big(\nabla_{x_t} \mathcal{L}(y, f(x_t))\big)\big]$$

3. Using the same loss function as before, complete the Python function to modify input images using PGD (fill out the PGD() function in the adversarial.py file). With $\alpha = 2/255$ and 50 iterations, for $\epsilon = 0.01, 0.05$, and $0.1$, submit the accuracy on adversarial test images in your written submission. A sketch of the iterative update appears after question 4 below. [Code: 20 pts, Accuracy: 3 pts]

4. For one test example in each class (apple, banana, orange), provide the original image along with three adversarial examples (as images), constructed using $\epsilon = 0.001$, $0.01$, and $0.1$ respectively, and label the class each of them was predicted to be. [15 pts]
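As with FGSM, the actual PGD() stub in adversarial.py is not shown, so this is only a minimal sketch of the iterative update under the same assumptions: a hypothetical signature PGD(model, images, labels, eps, alpha, iters) and a model that returns log-probabilities. The projection step keeps the accumulated perturbation inside the $\ell_\infty$ $\epsilon$-ball around the original image before clipping back to a valid RGB range.

```python
import torch
import torch.nn as nn

def PGD(model, images, labels, eps, alpha=2 / 255, iters=50):
    """Iterated FGSM steps; after each step the total perturbation is projected
    back into the L-infinity eps-ball around the original image, and the result
    is clipped to the valid RGB range [0, 1].
    Assumes `model` returns log-probabilities, matching NLLLoss.
    """
    original = images.clone().detach()
    adv = images.clone().detach()
    loss_fn = nn.NLLLoss()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]                 # d(loss)/d(adv)
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                      # FGSM-style step
            adv = original + (adv - original).clamp(-eps, eps)   # project into the eps-ball
            adv = adv.clamp(0.0, 1.0)                            # stay a valid image
    return adv.detach()
```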
