Question
AAI/CPE/EE 695 Applied Machine Learning: Midterm Exam, Spring 2024

This midterm contains two questions, each of which requires you to solve a coding problem and then discuss the results. You are free to use library implementations of methods and tools from Scikit-Learn or PyTorch. Make sure to answer every part of each question. For this exam, submit a single Jupyter Notebook (*.ipynb) file via Canvas. In addition to your code, include answers to any discussion questions in markdown cells in the same file.

Question 1 (60 points):

Dataset: MidtermQ1Data.zip contains images of two types of fruit: green apples and oranges. The images are divided into two sets: a train set and a test set.

Setup: When applying machine learning methods to complex data, one of the most common first steps is to extract salient features. For images, one such feature could be a color histogram, which measures the occurrence rates of each color within an image:

    from PIL import Image
    import numpy as np

    def get_image_histogram(image_filename, nbins=5):
        im = Image.open(image_filename)
        im = im.resize((100, 100))
        # Rearrange (height, width, 3) into (3, n_pixels), then (n_pixels, 3)
        im_array = np.array(im).transpose(2, 0, 1)
        im_array = im_array.reshape(3, -1)
        im_array = np.transpose(im_array, (1, 0))
        # Joint 3-D histogram over the three channels, flattened to nbins**3 values
        return np.histogramdd(im_array, bins=nbins, range=[(0, 255) for _ in range(3)])[0].flatten()

Task: Use this color histogram feature to preprocess the images, then choose two of the following methods and apply them to classify the images as either a green apple or an orange:

    Method                                                    | Hyperparameter
    Logistic Regression                                       | Maximum Iterations
    Decision Tree                                             | Maximum Depth
    Support Vector Machine (SVM)                              | Maximum Iterations
    Random Forest                                             | Maximum Leaf Nodes
    Artificial Neural Network (a.k.a. Multilayer Perceptron)  | Learning Rate

1. For each of your chosen methods:
   a. Use Scikit-Learn's implementation to train the model on the train set, and employ a grid search using cross-validation to find an optimized value of the listed hyperparameter.
   b. Use the best hyperparameter value you found and train each model on the full train set.
2. Compare the performance of your two chosen methods:
   a. Plot the results of the grid searches, with the hyperparameter value on the x axis and the performance metric on the y axis. Discuss the results of the grid searches and possible reasons for the behavior exhibited by the systems.
   b. Measure each method's performance on the test set; report Accuracy and F1 Score on the test set.
   c. Discuss any difference in performance between the two methods and possible reasons for their relative performance.
3. Explain whether the color histogram is a good or bad choice of feature for this task and why.
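
The workflow in parts 1 and 2b can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes a Decision Tree as the chosen method, uses randomly generated stand-in data from `make_classification` in place of the real histogram features, and picks a hypothetical grid of depth values. With the actual dataset, `X_train` and `X_test` would instead be built by calling `get_image_histogram` on each image file.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data. 125 columns mirrors the 5 x 5 x 5
# histogram bins produced by nbins=5; the real features come from the images.
X, y = make_classification(
    n_samples=200, n_features=125, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# ASSUMPTION: an illustrative grid for the "Maximum Depth" hyperparameter.
depths = [1, 2, 3, 5, 8]

# 5-fold cross-validated grid search on the train set only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": depths},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Mean cross-validation score per grid value, in the same order as `depths`;
# these are the y values for the plot requested in part 2a.
mean_scores = search.cv_results_["mean_test_score"]
for depth, score in zip(depths, mean_scores):
    print(f"max_depth={depth}: mean CV accuracy={score:.3f}")

# With the default refit=True, best_estimator_ has already been retrained
# on the full train set using the best hyperparameter value.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"best max_depth={search.best_params_['max_depth']}")
print(f"test accuracy={acc:.3f}, test F1={f1:.3f}")
```

The same pattern would apply to a second method by swapping the estimator and the `param_grid` key (for example `max_iter` for `LogisticRegression`). The test set is held out of the search entirely so that the reported Accuracy and F1 Score are not influenced by hyperparameter selection.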