Machine Learning


1. Introduction
In this assignment you will build on your knowledge of classification; in particular, you will solve an image classification problem using a convolutional neural network. This assignment aims to guide you through the process by following the four fundamental principles:
• Data: Data import, preprocessing, and augmentation.
• Model: Designing a convolutional neural network model for classifying the images of the parts.
• Fitting: Training the model using stochastic gradient descent.
• Validation: Checking the model's accuracy on the reserved test data set and investigating where the most improvement could be found. Additionally, looking into the uncertainty in the predictions.
This is not necessarily a linear process: after you have fit and/or validated your model, you may need to go back to earlier steps and adjust your processing of the data or your model structure. This may need to be done several times to achieve a satisfactory result. This assignment is worth 35% of your course grade and is graded from 0 to 35 marks. An additional two bonus marks are available to the student whose model performs best on a previously unseen data set.
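As a sketch of how the four steps might fit together in PyTorch (the dataset, image shape, and class count below are placeholders, not the assignment's actual data):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 100 grayscale 28x28 images with 4 hypothetical classes.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 4, (100,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# Model: a small convolutional network.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 4),
)

# Fitting: stochastic gradient descent on the cross-entropy loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

Validation would then score the trained model on the held-out test set and inspect the per-class errors and the predicted class probabilities.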


Q1. Consider the problem where we want to predict the gender of a person from a set of input parameters, namely height, weight, and age.
a) Using Cartesian distance, Manhattan distance, and Minkowski distance of order 3 as the similarity measurements, show the results of the gender prediction for the evaluation data listed below, using the generated training data, for values of K of 1, 3, and 7. Include the intermediate steps (i.e., distance calculation, neighbor selection, and prediction).
b) Implement the KNN algorithm for this problem. Your implementation should work with different training data sets as well as different values of K, and allow a data point to be input for the prediction.
c) To evaluate the performance of the KNN algorithm (using the Euclidean distance metric), implement a leave-one-out evaluation routine for your algorithm. In leave-one-out validation, we repeatedly evaluate the algorithm by removing one data point from the training set, training the algorithm on the remaining data set, and then testing it on the point we removed to see if the label matches or not. Repeating this for each of the data points gives us an estimate of the percentage of erroneous predictions the algorithm makes, and thus a measure of the accuracy of the algorithm for the given data. Apply your leave-one-out validation with your KNN algorithm to the dataset for Question 1 c) for values of K of 1, 3, 5, 7, 9, and 11 and report the results. For which value of K do you get the best performance?
d) Repeat the prediction and validation you performed in Question 1 c) using KNN when the age data is removed (i.e., when only the height and weight features are used as part of the distance calculation in the KNN algorithm). Report the results and compare the performance without the age attribute with the one from Question 1 c). Discuss the results. What do the results tell you about the data?
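A minimal sketch of KNN with leave-one-out validation; the feature rows and labels below are invented placeholders for the assignment's generated data:

import numpy as np

def knn_predict(X_train, y_train, x, k, p=2):
    # Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean/Cartesian).
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]  # majority vote among the k neighbors

def leave_one_out_accuracy(X, y, k, p=2):
    # Hold out each point in turn and predict it from the remaining data.
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        correct += knn_predict(X[mask], y[mask], X[i], k, p) == y[i]
    return correct / len(X)

# Hypothetical (height, weight, age) rows with 'M'/'W' labels.
X = np.array([[1.70, 70, 30], [1.60, 55, 25], [1.80, 80, 40], [1.65, 60, 35]])
y = np.array(['M', 'W', 'M', 'W'])
for k in (1, 3):
    print(k, leave_one_out_accuracy(X, y, k))

Dropping the age column (X[:, :2]) gives the comparison asked for in part d).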


Q2. Using the data from Problem 2, build a Gaussian Naive Bayes classifier for this problem. For this you have to learn the Gaussian distribution parameters for each input data feature, i.e., for p(height|W), p(height|M), p(weight|W), p(weight|M), p(age|W), and p(age|M).
a) Learn/derive the parameters for the Gaussian Naive Bayes classifier for the data from Question 2 a) and apply them to the same target as in Problem 1 a).
b) Implement the Gaussian Naive Bayes classifier for this problem.
c) Repeat the experiment in part 1 c) with the Gaussian Naive Bayes classifier. Discuss the results, in particular with respect to the performance difference between using all features and using only height and weight.
d) Same as 1 d), but with the Gaussian Naive Bayes classifier.
e) Compare the results of the two classifiers (i.e., the results from 1 c) and 1 d) with the ones from 2 c) and 2 d)) and discuss reasons why one might perform better than the other.
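A sketch of estimating the per-class, per-feature Gaussian parameters and scoring a query; the data are the same invented placeholders used in the KNN sketch above:

import numpy as np

def fit_gaussian_nb(X, y):
    # For each class: prior, per-feature means, per-feature variances.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0))
    return params

def predict_gaussian_nb(params, x):
    # Choose the class maximizing log prior + sum of per-feature log densities.
    best, best_score = None, -np.inf
    for c, (prior, mean, var) in params.items():
        log_pdf = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        score = np.log(prior) + log_pdf.sum()
        if score > best_score:
            best, best_score = c, score
    return best

X = np.array([[1.70, 70, 30], [1.60, 55, 25], [1.80, 80, 40], [1.65, 60, 35]])
y = np.array(['M', 'W', 'M', 'W'])
print(predict_gaussian_nb(fit_gaussian_nb(X, y), np.array([1.75, 68, 28])))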


3. For the data shown in the attached figure (dark circles are one class, white circles another), solve the classification problem with a single neuron by hand. That is, find appropriate weights for the required linear discriminant.
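Although the exercise is meant to be done by hand, a few lines of code can verify a candidate answer. The points below are invented stand-ins for the figure (which is not reproduced here), and the weights are purely illustrative:

import numpy as np

# Hypothetical points standing in for the figure's two classes.
dark = np.array([[0, 0], [1, 0]])    # labeled -1
white = np.array([[2, 2], [3, 2]])   # labeled +1

# Hand-chosen discriminant w.x + b = 0; these weights are illustrative.
w, b = np.array([1.0, 1.0]), -2.5

for x in dark:
    assert np.sign(w @ x + b) == -1
for x in white:
    assert np.sign(w @ x + b) == +1
print("all points correctly classified")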


Question 1 [10 points]: Binary Classification
Consider the distribution commonly used in binary classification: the label y ∈ {0, 1} follows the Bernoulli distribution with parameter μ.
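For reference, the Bernoulli probability mass function in its standard form:

\[
p(y \mid \mu) = \mu^{y}(1 - \mu)^{1-y}, \qquad y \in \{0, 1\},
\]

so that \(p(y = 1 \mid \mu) = \mu\) and \(p(y = 0 \mid \mu) = 1 - \mu\).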


Assignment link: https://github.com/ajdillhoff/CSE6363/tree/main/assignments/assignment 3


Task 0: Naive Logistic Regression
Fit a logistic regression model and report the accuracy.
Task 1: Train Data Transformation
Perform preprocessing to transform the original data into a new feature space via feature engineering, so that the features are linear in the new space. Confirm the four assumptions required for a linear classifier.
Task 2: Linear Parametric Classification
Implement a logistic regression model using scikit-learn and optimize it with GridSearchCV (a sketch of this grid search follows after the task list).
1. Fit a logistic regression model. Report the weights and the accuracy of the model.
2. Using GridSearchCV over 100 values of α from 10^-5 to 10, build a logistic regression model. Visualize how the model accuracy behaves, then report the best model. If the accuracy is 100%, the model is overfitted; in this case, the model should be regularized.
3. Using the best model, classify the test data set.
Task 3: Transformation using the Kernel Method
Kernelize the original data into a kernel space using five different valid kernel functions. Then repeat Task 2.
Task 4: Non-parametric KNN Classification
1. Classify the original data with K values from 1 to 200. Then report the accuracy with visualization.
2. Repeat step 1 with the final training data sets from Tasks 1 and 3.
Report: Write a report summarizing the work. In the report, all steps must be explicitly explained, with visualizations.
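A sketch of Task 2's grid search in scikit-learn. The dataset below is a stand-in, and since scikit-learn's LogisticRegression is parameterized by C, the inverse regularization strength, the α grid is expressed through C = 1/α:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data; the task's actual training data would be loaded instead.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 log-spaced regularization values alpha from 1e-5 to 10.
alphas = np.logspace(-5, 1, 100)
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": 1.0 / alphas}, cv=5)
grid.fit(X_train, y_train)

print("best C:", grid.best_params_["C"])
print("weights:", grid.best_estimator_.coef_)
print("test accuracy:", grid.score(X_test, y_test))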


For this programming assignment you will implement the LeNet-5 CNN using either PyTorch or TensorFlow, but not Keras. You may look at other implementations on the internet, but please use your own personal coding style when writing your code, and add references to your sources.
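A sketch of the LeNet-5 architecture in PyTorch, following the layer sizes of the original network for 32x32 single-channel inputs; the tanh activations and average pooling match the classic design, while modern variants often substitute ReLU and max pooling:

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    # LeNet-5: two conv/pool stages followed by three fully connected layers.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Quick shape check on a dummy batch.
out = LeNet5()(torch.randn(2, 1, 32, 32))
print(out.shape)  # torch.Size([2, 10])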


In this problem, we will use a linear binary classifier (no activation function) with weight matrix W of size M × F to illustrate the use of the gradient descent algorithm for updating W at each iteration. We choose F = 5 and M = 2 to represent the number of features (per example) and the number of labels considered, respectively.
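A sketch of one possible update rule, assuming a squared-error loss against one-hot targets (the problem statement does not fix the loss, so this choice is illustrative):

import numpy as np

F, M = 5, 2            # number of features and number of labels
lr = 0.01              # learning rate (hypothetical)
rng = np.random.default_rng(0)

W = rng.normal(size=(M, F))   # weight matrix W, size M x F
x = rng.normal(size=F)        # a single training example
t = np.array([1.0, 0.0])      # one-hot target for label 0

for step in range(500):
    s = W @ x                  # linear scores, no activation
    grad = np.outer(s - t, x)  # dL/dW for L = 0.5 * ||W x - t||^2
    W -= lr * grad             # gradient descent update

print(W @ x)  # scores should now be close to the target [1, 0]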


Decision Trees:
1. Consider the problem from the previous assignments where we want to predict gender from information about height, weight, and age. We will use decision trees to make this prediction. Note that, as the data attributes are continuous numbers, you have to use binary splits on the attributes and determine a threshold for each node in the tree. As a result, you need to compute the information gain for each threshold that is halfway between two data points, and thus the complexity of the computations increases with the number of data items.
a) Implement a decision tree learner for this particular problem that can derive decision trees with an arbitrary, pre-determined depth (up to the maximum depth where all data sets at the leaves are pure) using the information gain criterion.
b) Divide the data set from Question 1 c) in Project 1 (the large training data set) into a training set comprising the first 50 data points and a test set consisting of the last 70 data elements. Use the resulting training set to derive trees of depths 1-5 and evaluate the accuracy of the resulting trees on the 50 training samples and on the test set containing the last 70 data items. Compare the classification accuracy on the test set with the one on the training set for each tree depth. For which depths does the result indicate overfitting?
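A sketch of the information-gain computation for a single candidate threshold, with entropy in bits; the attribute values and labels are placeholders:

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a label array.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(values, labels, threshold):
    # Gain from splitting one continuous attribute at the given threshold.
    left = labels[values <= threshold]
    right = labels[values > threshold]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Candidate thresholds halfway between consecutive sorted attribute values.
heights = np.array([1.60, 1.65, 1.70, 1.80])   # hypothetical attribute
labels = np.array(['W', 'W', 'M', 'M'])
sorted_vals = np.sort(np.unique(heights))
thresholds = (sorted_vals[:-1] + sorted_vals[1:]) / 2
for t in thresholds:
    print(t, information_gain(heights, labels, t))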


2. Perform K-means clustering with K = 2 using the Euclidean norm. Toss a coin 7 times to initialise the algorithm.
3. Cluster the data using hierarchical clustering with complete linkage and the Euclidean norm. Draw the resulting dendrogram.
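A sketch of both steps using scikit-learn and SciPy; the data points and the coin-toss sequence below are invented placeholders:

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans

# Hypothetical 1-D data standing in for the problem's 7 points.
X = np.array([[1.0], [2.0], [2.5], [8.0], [9.0], [9.5], [10.0]])

# K = 2: the 7 coin tosses fix each point's initial cluster (H=0, T=1),
# which we turn into initial centers for KMeans.
coin = np.array([0, 1, 0, 1, 1, 0, 1])  # placeholder toss outcomes
init_centers = np.array([X[coin == c].mean(axis=0) for c in (0, 1)])
km = KMeans(n_clusters=2, init=init_centers, n_init=1).fit(X)
print("k-means labels:", km.labels_)

# Hierarchical clustering with complete linkage and Euclidean distance.
Z = linkage(X, method="complete", metric="euclidean")
dendrogram(Z)  # plots to the current matplotlib axes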


(a) Suppose that, when using grid search with cross-validation to select the parameters C and gamma of the Support Vector Machine (SVM), you have obtained these results for the accuracy of the algorithm: (As usual, the accuracy is defined as 1 minus the error rate.) Is this a suitable grid for selecting the optimal values of the two parameters? Explain why. If it is not suitable, describe at least one way of improving it. [7 marks]
(b) Give an example of a grid that is too crude and thus does not allow an accurate estimate of the optimal values of the parameters C and gamma of the SVM. [7 marks]
(c) Give an example of a grid that clearly does not cover the optimal values of the parameters C and gamma of the SVM. Briefly explain why your example achieves its goal. [7 marks]
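For concreteness, a typical log-spaced grid over C and gamma in scikit-learn; the ranges below are a common starting point, not the grid from the question:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # stand-in data

# Each parameter spans several orders of magnitude on a log scale,
# which is what makes a grid informative for C and gamma.
param_grid = {
    "C": np.logspace(-2, 3, 6),      # 0.01 ... 1000
    "gamma": np.logspace(-4, 1, 6),  # 0.0001 ... 10
}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

A grid is too crude when neighbouring values differ by so much that the accuracy surface between them is unresolved, and it fails to cover the optimum when the best accuracy occurs on the grid's boundary.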


3. (a) What is regularization in the context of machine learning?
(b) Define and briefly discuss ridge regression.
(c) What is the LASSO in regression analysis? What is the main difference between ridge regression and the LASSO? [10 marks]
(d) What is "subset selection"? Discuss the two stepwise selection methods in the context of linear regression.
(e) What is meant by a maximal margin classifier? Name one example of a maximal margin classifier.
(f) What is the "kernel trick" in machine learning?
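For reference, the standard objectives behind parts (b) and (c), in the usual notation with coefficients β and penalty weight λ ≥ 0:

\[
\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2},
\qquad
\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2} + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert.
\]

The ℓ2 penalty shrinks coefficients smoothly toward zero, while the ℓ1 penalty can set coefficients exactly to zero and thus performs variable selection.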


(a) What is meant by data normalization in machine learning? (Remember that in this course "normalization" is understood in the wide sense and includes the transformations performed by Normalizer, StandardScaler, etc., in scikit-learn.) [2 marks]
(b) Briefly describe the class StandardScaler in scikit-learn, paying particular attention to its fit and transform methods. [5 marks]
(c) Compare and contrast the classes StandardScaler and RobustScaler in scikit-learn. [3 marks]
(d) Briefly describe the class MinMaxScaler in scikit-learn.
(e) Consider the following training set: What is its normalized version, in the sense of MinMaxScaler? Apply the same transformation to the test set.
(f) What is meant by data snooping in machine learning? Explain, briefly and in plain English, what the following code is doing (assuming that all functions that it uses have been loaded from the relevant libraries).
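A sketch of the fit/transform pattern that parts (b), (d), and (e) revolve around: the scaler is fitted on the training set only, and the same learned transformation is then applied to the test set (the arrays are placeholders):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])  # placeholder
X_test = np.array([[4.0, 15.0]])

# fit learns the per-column min and max from the training set only;
# transform applies them, so test values can fall outside [0, 1].
mm = MinMaxScaler().fit(X_train)
print(mm.transform(X_train))  # each training column mapped onto [0, 1]
print(mm.transform(X_test))   # first column maps to 1.5, outside [0, 1]

# StandardScaler: fit learns per-column mean and standard deviation;
# transform maps each training column to zero mean and unit variance.
ss = StandardScaler().fit(X_train)
print(ss.transform(X_test))

Fitting a scaler on the combined training and test data would be an instance of the data snooping asked about in part (f).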


(a) Give the definition of a nonconformity measure in the context of conformal prediction. [3 marks]
(b) Define the conformal predictor for a given nonconformity measure. Make sure to include the definition of the prediction set at a given significance level. [5 marks]
(c) Compare and contrast nonconformity measures and conformity measures. [4 marks]
(d) In the context of conformal prediction, what is the minimal possible p-value for a training set of size n?
(e) Define the average false p-value for a test set in the context of conformal prediction. [3 marks]
(f) Consider the following binary classification problem with two features. The training set is:
• positive samples: (0,0), (1,0), (0,1);
• negative samples: (4,4), (3,4), (4,3).
The test set consists of two samples, (1,1) and (3,3).
i. Find the 1-Nearest Neighbour predictions for the test samples.
ii. Using the distance to the nearest sample of the same class as the nonconformity measure, compute all the p-values for the test samples. For each test sample, compute the point prediction, confidence, and credibility. [13 marks]
iii. You are told that the true labels of the test samples (1,1) and (3,3) are +1 and -1, respectively. What is the average false p-value for the test set with these labels? [3 marks]
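For reference, the deterministic (non-smoothed) conformal p-value used in computations like part (f): with nonconformity scores α_1, …, α_n for the training samples and α_{n+1} for the test sample under a postulated label,

\[
p = \frac{\bigl|\{\, i = 1, \dots, n+1 : \alpha_i \ge \alpha_{n+1} \,\}\bigr|}{n + 1},
\]

so the minimal possible p-value is 1/(n+1). The point prediction is the label with the largest p-value, the confidence is one minus the second-largest p-value, and the credibility is the largest p-value.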


1. Give an example of a low-dimensional (approx. 20 dimensions), a medium-dimensional (approx. 1,000 dimensions), and a high-dimensional (approx. 100,000 dimensions) problem that you care about.


(a) What is meant by feature engineering in machine learning?
(b) You are given a classification problem with one feature and the following training set: As usual, y is the label. This is a multi-class classification problem with possible labels A, B, and C. The test samples are 0, 1, and -5. Find the 1-Nearest Neighbour prediction for each of the test samples. Use the standard Euclidean metric. If you have encountered any ties, discuss briefly your tie-breaking strategy. [5 marks]
(c) Engineer an additional feature for this dataset, namely x². Therefore, your new training set still has 6 labelled samples and your test set still has 3 unlabelled samples, but there are now two features, x and x². Find the 1-Nearest Neighbour prediction for each of the test samples in the new dataset. [16 marks]
(d) What is meant by a kernel in machine learning?
(e) How can the distance between the images of two samples in the feature space be expressed via the corresponding kernel? [2 marks]
(f) You are given the same training set as before, and only one test sample, 1. The learning problem is still multi-class classification with possible labels A, B, or C. Using the kernelized Nearest Neighbours algorithm with kernel K(x, x′) = (1 + x x′)², compute the 3-Nearest Neighbours prediction for the test sample. If applicable, describe your tie-breaking strategy. [10 marks]
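Part (e) has a standard closed form that also underpins part (f): for a feature map φ with kernel K(x, x′) = ⟨φ(x), φ(x′)⟩, the squared distance between two samples' images in feature space is

\[
\|\varphi(x) - \varphi(x')\|^{2} = K(x, x) - 2K(x, x') + K(x', x'),
\]

so the kernelized Nearest Neighbours algorithm can rank neighbours by distance without ever computing φ explicitly.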

