ELEC 4806/5806 Introduction to Deep Learning and PyTorch/TensorFlow Lab 3 Customize a Convolutional Neural Network for image classification tasks
1. Experiment goal:
· Understand the structure of Convolutional Neural Networks.
· Master the use of hyperparameters.
· Master various optimization, regularization, and weight initialization methods.
· Improve the performance of the model by fine-tuning the CNN network.
· TensorFlow code.
2. Introduction:
In this lab, you will customize a convolutional neural network for image classification tasks. You will use the provided flowers-6 dataset. It consists of 480 color images of different sizes in 6 classes (80 images per class). For each class, there are 73 training images and 7 test images.
· buttercup
· daisy
· fritillary
· snowdrop
· sunflower
· windflower
Please download the dataset "flowers6.zip" from Canvas and then upload it to the Google Colab runtime as shown below.
(Screenshot: the Colab Files panel, showing the sample_data folder and the uploaded flowers6.zip.)
Then extract the zip file using the following command:

!unzip flowers6.zip
You can find the "flowers6" folder under the current directory, alongside "sample_data" and "flowers6.zip". If you cannot see it, refresh the Files panel.
Its sub-folders are shown in the figure below. The first level contains two sub-folders, "train" and "test", which store the data for training and testing respectively. Each of them has 6 second-level sub-folders, one for each kind of flower image.
flowers6
├── test
│   ├── buttercup
│   ├── daisy
│   ├── fritillary
│   ├── snowdrop
│   ├── sunflower
│   └── windflower
└── train
    ├── buttercup
    ├── daisy
    ├── fritillary
    ├── snowdrop
    ├── sunflower
    └── windflower
You can use the following code to construct the train loader and the test loader. In this lab, the provided dataset is relatively small; however, training deep learning models on more data generally results in more skillful models. So here we introduce a powerful technique called "data augmentation". Image data augmentation artificially expands the size of a training dataset by creating modified versions of its images, which can improve the ability of the fitted model to generalize what it has learned to new images.
import torch
from torchvision import datasets, transforms

train_dir = 'flowers6/train'
test_dir = 'flowers6/test'

batch_size_train = 32  # placeholder values; choose your own batch sizes
batch_size_test = 32

# Augmentation for training; deterministic resize + center crop for testing.
train_transforms = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

test_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

# ImageFolder infers each image's label from its sub-folder name.
train_data = datasets.ImageFolder(train_dir, transform=train_transforms)
test_data = datasets.ImageFolder(test_dir, transform=test_transforms)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size_train, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size_test, shuffle=True)
PyTorch provides very convenient API functions: you only need to list the augmentations you want inside the transforms.Compose function. Here we use three forms: random rotation, random resized crop, and random horizontal flip.
The ImageFolder function assumes that all files are stored in folders, where each folder holds images of the same category and the folder name is the category name.
The CNN model requires the input images to have the same size, so the images here are all resized to 224x224.
Please refer to the CNN module on Canvas to complete this lab, but pay attention to the following:
· The minimum requirement for model performance is a test accuracy greater than 82% (students registered in ELEC 4806) or 85% (students registered in ELEC 5806).
· To achieve good performance, you can consider the following (a model sketch is given after this list):
o Add more convolution layers and pooling layers.
o Use only 2-3 fully connected layers at the end of your model.
o Try a relatively small learning rate and train for more epochs.
o Choose a smaller batch size, otherwise you may encounter out-of-memory issues.
o Try different combinations of activation function, optimization method, and regularization method.
o Please be patient when fine-tuning the model. This is the most basic requirement for engaging in deep learning work.
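For reference, here is a minimal sketch of a model that follows these suggestions. The layer counts, channel widths, and dropout rate are illustrative assumptions, not required or recommended values; you are expected to design and tune your own network.

import torch.nn as nn

# Illustrative sketch only -- all sizes below are assumptions, not requirements.
class FlowerCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),   # 112 -> 56
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),   # 56 -> 28
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2))   # 28 -> 14
        self.classifier = nn.Sequential(   # only two fully connected layers
            nn.Flatten(),
            nn.Linear(128 * 14 * 14, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))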
· Since the images in this lab are all color images, you also need to modify some other code, such as the "plot_image" function.
· When evaluating the model, you must first call the .eval() function as shown below; otherwise, batch normalization and dropout will remain in training mode during testing and the performance of the model will be very bad.
net.eval()  # switch batch normalization and dropout to inference mode
correct = 0
for data, target in test_loader:
    data, target = data.to(device), target.to(device)
    logits = net(data)
    pred = logits.argmax(dim=1)
    correct += pred.eq(target).float().sum().item()
total_num = len(test_loader.dataset)
acc = correct / total_num
print('test acc:', acc)
· Since the three channels (RGB) of the images are normalized when constructing the test_loader, it is necessary to inverse-transform the data before displaying the images. Please refer to the code below.
inv_normalize = transforms.Normalize(
    mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
    std=[1/0.229, 1/0.224, 1/0.225])  # note: 0.225, matching the normalization above

x, y = next(iter(test_loader))
x, y = x.to(device), y.to(device)
out = net(x)
pred = out.argmax(dim=1)
x = inv_normalize(x)  # undo the channel-wise normalization for display
plot_image(x.detach().cpu().numpy(), pred, y)
(Example output: a test image displayed with its predicted and true labels, e.g. "Prediction = 0, Label = 0".)
· Please use only what you learned in the previous lectures to complete this lab, including the methods you use to improve model performance. Plagiarizing code directly from other websites is not allowed; this restriction exists to prevent plagiarism.
· Please try to use Google Colab to complete this lab. Without a GPU, the training process may become unacceptably long.
Submission requirements (all team members need to submit):
1. Please submit the .ipynb file. If you use Google Colab, you can download the .ipynb file by clicking "File -> Download -> Download .ipynb". Please include the names of all team members in the file name.
2. Please submit a PDF version as well. You can use this website (https://htmtopdf.herokuapp.com/ipynbviewer/) to convert your .ipynb file to PDF.
3. Please record a video to demo your work. In the video, please indicate where you have changed the code and state the reason for each change. In addition, please describe in detail the process of modifying the model, adjusting the parameters, the problems encountered, etc. All team members must participate in the recording; if the team members cooperate remotely, they can submit multiple videos.
4. When you are recording the video, if you just read the comments you wrote in the code, or read from a script, you will not be able to prove that the work was done independently by you. In that case, the lab will be judged as plagiarism.

ELEC 4806/5806
Introduction to Deep Learning and PyTorch/TensorFlow
State-of-the-art CNNs
0. ILSVRC winners
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale.
1. LeNet
LeNet, which usually refers to LeNet-5, was one of the earliest convolutional neural networks and promoted the development of deep learning. LeNet possesses the basic units of a convolutional neural network, such as the convolutional layer, pooling layer, and fully connected layer, laying a foundation for the future development of convolutional neural networks.
2. AlexNet
AlexNet, which employed an 8-layer CNN, won the ImageNet Large Scale Visual Recognition Challenge 2012 by a phenomenally large margin. This network showed, for the first time, that the features obtained by learning can transcend manually-designed features, breaking the previous paradigm in computer vision.
(Figure: from LeNet (left) to AlexNet (right).)
3. VGG-16
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. It was one of the famous models submitted to ILSVRC-2014. It improves over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks using NVIDIA Titan Black GPUs.
4. GoogleNet
In this architecture, along with going deeper (it contains 22 layers, compared to VGG's 19), the researchers also introduced a novel building block called the Inception module.
Deep architectures, and specifically GoogLeNet (22 layers), are in danger of the vanishing gradient problem during training (back-propagation). The engineers of GoogLeNet addressed this issue by adding classifiers in the intermediate layers as well, such that the final loss is a combination of the intermediate losses and the final loss. This is why you see a total of three loss layers, unlike the usual single loss layer at the end of the network; a sketch of the combined loss is shown below.
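As a hedged sketch of how this combined loss can be computed with torchvision's GoogLeNet (which, in training mode with aux_logits=True, returns the final logits plus two auxiliary logits; the 0.3 weighting follows the original paper, and the class count and dummy batch below are just for illustration):

import torch
import torch.nn as nn
import torchvision.models as models

model = models.googlenet(aux_logits=True, init_weights=True, num_classes=10)
criterion = nn.CrossEntropyLoss()
model.train()

x = torch.randn(4, 3, 224, 224)   # dummy batch for illustration
y = torch.randint(0, 10, (4,))
out = model(x)  # in train mode: (logits, aux_logits2, aux_logits1)
# total loss = final loss + 0.3 * each intermediate (auxiliary) loss
loss = (criterion(out.logits, y)
        + 0.3 * criterion(out.aux_logits1, y)
        + 0.3 * criterion(out.aux_logits2, y))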
4. GoogleNet
An Inception module is an image model block that aims to approximate an optimal local sparse structure in a CNN. Put simply, it allows us to use multiple filter sizes in a single image block, instead of being restricted to a single filter size; the branch outputs are then concatenated and passed on to the next layer.
A 1x1 convolution simply maps an input pixel, with all its channels, to an output pixel, without looking at anything around itself. It is often used to reduce the number of depth channels, since it is often very slow to multiply volumes with extremely large depths.
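A minimal sketch of an Inception-style module in PyTorch (the branch layout follows the description above; the channel arguments are illustrative assumptions). Note how the 3x3 and 5x5 branches use a 1x1 convolution first to reduce depth:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Inception(nn.Module):
    # c1..c4: output channels of the four branches (chosen per layer)
    def __init__(self, in_channels, c1, c2, c3, c4):
        super().__init__()
        self.b1 = nn.Conv2d(in_channels, c1, kernel_size=1)            # 1x1 branch
        self.b2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)       # 1x1 reduce
        self.b2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)  # 3x3 branch
        self.b3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)       # 1x1 reduce
        self.b3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)  # 5x5 branch
        self.b4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)   # pool branch
        self.b4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

    def forward(self, x):
        b1 = F.relu(self.b1(x))
        b2 = F.relu(self.b2_2(F.relu(self.b2_1(x))))
        b3 = F.relu(self.b3_2(F.relu(self.b3_1(x))))
        b4 = F.relu(self.b4_2(self.b4_1(x)))
        return torch.cat((b1, b2, b3, b4), dim=1)  # concatenate along channels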
5. ResNet
ResNet, short for Residual Network, is a specific type of neural network that was introduced in 2015.
· Won 1st place in the ILSVRC 2015 classification competition with a top-5 error rate of 3.57%.
· Won 1st place in the ILSVRC and COCO 2015 competitions in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
· Replacing the VGG-16 layers in Faster R-CNN with ResNet-101 yielded relative improvements of 28%.
· Efficiently trained networks with 100 layers, and even 1000 layers.
5. ResNet
Usually, to solve a complex problem, we stack additional layers in a deep neural network, which results in improved accuracy and performance. The intuition behind adding more layers is that these layers progressively learn more complex features.
But it has been found that there is a maximum threshold for depth with the traditional convolutional neural network model: beyond it, performance degrades.
5. ResNet
The problem of training very deep networks has been alleviated by the introduction of ResNet, or residual networks, and these ResNets are made up of residual blocks.
(Figure: a regular block (left) and a residual block (right).)
We want the deep network to perform at least as well as the shallow network, rather than degrade performance as we saw in the case of plain neural networks (without residual blocks). One way of achieving this is if the additional layers in a deep network learn the identity function, so that their output equals their input; then the extra layers cannot degrade performance.
5. ResNet
ResNet follows VGG’s full 3×3 convolutional layer design. The residual block has two 3×3 convolutional layers with the same number of output channels. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function. Then, we skip these two convolution operations and add the input directly before the final ReLU activation function. This kind of design requires that the output of the two convolutional layers has to be of the same shape as the input, so that they can be added together.
If we want to change the number of channels, we need to introduce an additional 1×1 convolutional layer to transform the input into the desired shape for the addition operation.
(Figure: ResNet blocks without (left) and with (right) the 1×1 convolution.)
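A minimal sketch of such a residual block in PyTorch, directly following the description above (the use_1x1conv flag enables the extra 1×1 convolution on the shortcut when the shape changes):

import torch.nn as nn
import torch.nn.functional as F

class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, padding=1)
        # optional 1x1 convolution so the shortcut matches the new shape
        self.conv3 = (nn.Conv2d(in_channels, out_channels,
                                kernel_size=1, stride=stride)
                      if use_1x1conv else None)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        if self.conv3 is not None:
            x = self.conv3(x)    # transform the input for the addition
        return F.relu(y + x)     # skip connection, then the final ReLU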
6. Customize a ResNet-18 for CIFAR-10 classification task
6.1 ResNet-18 Architecture
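The architecture itself is shown in the figures on the slides. As a hedged sketch, one common way to adapt torchvision's ResNet-18 to 32×32 CIFAR-10 images is to shrink the stem and replace the classification head; this is an assumption about the approach, not necessarily the exact customization used in the module:

import torch.nn as nn
import torchvision.models as models

net = models.resnet18(num_classes=10)   # randomly initialized, 10-class head
# CIFAR-10 images (32x32) are far smaller than ImageNet images (224x224),
# so use a 3x3 stride-1 stem and drop the first max-pooling layer
net.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
net.maxpool = nn.Identity()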
6. Customize a ResNet-18 for CIFAR-10 classification task
6.2 Prepare a validation dataset
Training dataset:
· The sample of data used to fit the model.
· The actual dataset that we use to train the model (the weights and biases, in the case of a neural network).
· The model sees and learns from this data.
Validation dataset:
· The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
Test dataset:
· The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
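A minimal sketch of holding out a validation set from the CIFAR-10 training data (the 45000/5000 split, data directory, and batch size are illustrative assumptions):

import torch
from torchvision import datasets, transforms

full_train = datasets.CIFAR10('data', train=True, download=True,
                              transform=transforms.ToTensor())
# hold out part of the 50000 training images for validation
train_set, val_set = torch.utils.data.random_split(full_train, [45000, 5000])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False)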
6. Customize a ResNet-18 for CIFAR-10 classification task
6.3 Scheduling a learning rate decay strategy
Learning rate decay is a technique for training modern neural networks: training starts with a large learning rate, which is then slowly reduced/decayed until a local minimum is reached. It is empirically observed to help both optimization and generalization. A sketch of one such schedule is shown below.
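As a hedged sketch using PyTorch's built-in StepLR scheduler (the optimizer settings, step size, and decay factor are illustrative assumptions; net, train_loader, num_epochs, and train_one_epoch are hypothetical names for objects defined elsewhere in your notebook):

import torch

# `net`, `train_loader`, `num_epochs`, and `train_one_epoch` are hypothetical
# stand-ins for your model, data loader, epoch count, and training loop
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(num_epochs):
    train_one_epoch(net, train_loader, optimizer)
    scheduler.step()   # multiply the learning rate by gamma every step_size epochs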
6. Customize a ResNet-18 for CIFAR-10 classification task
6.4 Data Augmentation
Image data augmentation is a technique that artificially expands the size of a training dataset by creating modified versions of its images, in order to improve the performance of the model and its ability to generalize. A CIFAR-10 example is sketched below.
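A typical CIFAR-10 training pipeline might look like the following sketch (padding-based random crop plus horizontal flip; the normalization statistics are the commonly used CIFAR-10 channel means and standard deviations):

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad by 4, then crop back to 32x32
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),    # CIFAR-10 channel means
                         (0.2470, 0.2435, 0.2616))])  # CIFAR-10 channel stds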
7. Transfer learning
Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.
Transfer learning is an optimization that allows rapid progress or improved performance when modeling the second task.
7. Transfer learning and other state-of-the-art models
SqueezeNet
DenseNet
Inception v3
ShuffleNet v2
MobileNetV2
MobileNetV3
ResNeXt
Wide ResNet
MNASNet
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v2 = models.mobilenet_v2(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
mnasnet = models.mnasnet1_0(pretrained=True)
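For example, here is a hedged sketch of reusing a pretrained ResNet-18 for the 6-class flower task from the lab. Freezing the backbone and training only a new final layer is one common transfer learning strategy; the learning rate and the choice to freeze everything are illustrative assumptions:

import torch
import torch.nn as nn
import torchvision.models as models

net = models.resnet18(pretrained=True)
for param in net.parameters():
    param.requires_grad = False            # freeze the pretrained backbone
# replace the 1000-class ImageNet head with a new 6-class layer
net.fc = nn.Linear(net.fc.in_features, 6)
# only the new layer's parameters are passed to the optimizer
optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.001, momentum=0.9)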