Machine Learning

Questions & Answers

(a) What is meant by feature engineering in machine learning?
(b) You are given a classification problem with one feature x and the following training set: As usual, y is the label. This is a multi-class classification problem with possible labels A, B, and C. The test samples are 0, 1, and -5. Find the 1-Nearest Neighbour prediction for each of the test samples. Use the standard Euclidean metric. If you encounter any ties, briefly discuss your tie-breaking strategy. [5 marks]
(c) Engineer an additional feature for this dataset, namely x². Your new dataset therefore still has 6 labelled samples in its training set and 3 unlabelled samples in its test set, but there are two features, x and x². Find the 1-Nearest Neighbour prediction for each of the test samples in the new dataset. [16 marks]
(d) What is meant by a kernel in machine learning?
(e) How can the distance between the images of two samples in the feature space be expressed via the corresponding kernel? [2 marks]
(f) You are given the same training set as before, and only one test sample, 1. The learning problem is still multi-class classification with possible labels A, B, or C. Using the kernelized Nearest Neighbours algorithm with kernel K(1,1)= (1-1¹)², compute the 3-Nearest Neighbours prediction for the test sample. If applicable, describe your tie-breaking strategy. [10 marks]
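
For part (e), the squared distance between the images of two samples in the feature space can be written with kernel values only: ||F(x) - F(x')||² = K(x, x) - 2K(x, x') + K(x', x'). The sketch below checks this identity numerically for one illustrative kernel, K(x, x') = (1 + xx')², whose explicit feature map for scalar inputs is F(x) = (1, √2·x, x²). The kernel, the sample pairs, and all function names are assumptions chosen for illustration; this is not necessarily the kernel intended in part (f).

```python
import math

def kernel(x, z):
    """Illustrative kernel (an assumption): K(x, z) = (1 + x*z)^2 for scalars."""
    return (1 + x * z) ** 2

def feature_map(x):
    """Explicit feature map for the kernel above: F(x) = (1, sqrt(2)*x, x^2)."""
    return (1.0, math.sqrt(2) * x, x ** 2)

def kernel_distance_sq(x, z, k=kernel):
    """Squared feature-space distance computed from kernel values only."""
    return k(x, x) - 2 * k(x, z) + k(z, z)

def explicit_distance_sq(x, z):
    """The same quantity computed directly from the feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(feature_map(x), feature_map(z)))

# Arbitrary sample pairs, used only to compare the two computations.
for x, z in [(0, 1), (1, -5), (2, 3)]:
    via_kernel = kernel_distance_sq(x, z)
    via_features = explicit_distance_sq(x, z)
    assert math.isclose(via_kernel, via_features)
    print(x, z, via_kernel)
```

A kernelized Nearest Neighbours algorithm only needs `kernel_distance_sq`, so the feature map never has to be computed explicitly.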


(a) Suppose that, when using grid search with cross-validation to select the parameters C and gamma of the Support Vector Machine (SVM), you have obtained these results for the accuracy of the algorithm: (As usual, the accuracy is defined as 1 minus the error rate.) Is this a suitable grid for selecting the optimal values of the two parameters? Explain why. If it is not suitable, describe at least one way of improving it. [7 marks]
(b) Give an example of a grid that is too crude and thus does not allow an accurate estimate of the optimal values of the parameters C and gamma of the SVM. [7 marks]
(c) Give an example of a grid that clearly does not cover the optimal values of the parameters C and gamma of the SVM. Briefly explain why your example achieves its goal. [7 marks]
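
One common diagnostic for questions like (a) and (c) is to look at where the best cross-validated accuracy sits: if it lies on the edge of the grid, the optimum may be outside the range tried, and the grid should be extended in that direction. The sketch below applies this check to a made-up accuracy table. The parameter values, the accuracies, and the helper name `best_cell` are all hypothetical; they are not the results from the question.

```python
def best_cell(acc, c_values, gamma_values):
    """Return (C, gamma, accuracy, on_edge) for the best cell of a results grid.

    acc[i][j] is the cross-validated accuracy for c_values[i], gamma_values[j].
    """
    best_i, best_j = max(
        ((i, j) for i in range(len(c_values)) for j in range(len(gamma_values))),
        key=lambda ij: acc[ij[0]][ij[1]],
    )
    # The best cell is "on the edge" if it uses the smallest or largest
    # value tried for either parameter.
    on_edge = (
        best_i in (0, len(c_values) - 1) or best_j in (0, len(gamma_values) - 1)
    )
    return c_values[best_i], gamma_values[best_j], acc[best_i][best_j], on_edge

# Hypothetical logarithmic grid and accuracies, for illustration only.
c_values = [0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1]
acc = [
    [0.60, 0.71, 0.70, 0.62],
    [0.66, 0.84, 0.88, 0.70],
    [0.70, 0.90, 0.93, 0.74],
    [0.72, 0.91, 0.95, 0.75],
]

c, gamma, accuracy, on_edge = best_cell(acc, c_values, gamma_values)
print(c, gamma, accuracy, on_edge)
```

In this made-up table the best cell is in the last row (the largest C tried), which suggests extending the grid towards larger values of C before trusting the selected parameters.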


Problem 2 - Gaussian Process
Consider a parametric model governed by the parameter vector w together with a data set of input values x1,...,xN and a nonlinear feature mapping


(a) Define the notion of a feature mapping in machine learning. Define the kernel corresponding to a given feature mapping.
(b) Give an advantage of using kernels over applying a feature mapping explicitly.
(c) Suppose K is a kernel.
i. What is the corresponding normalized kernel? Make sure to avoid any possibility of dividing by zero in your formula. [4 marks]
ii. Give an advantage of using normalized kernels.
(d) Describe in detail the kernel form of the 1-Nearest Neighbour algorithm.
(e) You are given the following training set: object (3, 1, 0) labelled as 1, object (0, 2, -1) labelled as -1. Using the polynomial kernel K(x, x') = (1 + x · x')², find the 1-Nearest Neighbour prediction for object (1, 1, 1), showing your calculations. Consider the learning problem as classification. [7 marks]
(f) Describe in detail the algorithm of Kernel Ridge Regression (KRR). Make sure to define your notation.
(g) You are given the following regression training set: object (0, 0, 0) labelled as 1, object (0, 1, 0) labelled as -1. Using the polynomial kernel K(x, x') = (1 + x · x')² and ridge parameter (also known as tuning parameter) λ = 1, find the KRR prediction for object (0, 0, 1). If you need to invert a 2 x 2 matrix, you may do so using the formula
\left(\begin{array}{ll} a & b \\ c & d \end{array}\right)^{-1}=\frac{1}{a d-b c}\left(\begin{array}{cc} d & -b \\ -c & a \end{array}\right).
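
The calculations in parts (e) and (g) can be sketched in a few lines. The code below assumes that the polynomial kernel in both parts is K(x, x') = (1 + x · x')², that the kernel 1-Nearest Neighbour rule uses the squared feature-space distance K(x, x) - 2K(x, x') + K(x', x'), and that the KRR prediction takes the dual form k'(K + λI)⁻¹y. All variable and function names are illustrative. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def poly_kernel(x, z):
    """Assumed polynomial kernel: K(x, z) = (1 + x . z)^2."""
    return (1 + sum(a * b for a, b in zip(x, z))) ** 2

def kernel_dist_sq(x, z):
    """Squared distance in feature space, from kernel values only."""
    return poly_kernel(x, x) - 2 * poly_kernel(x, z) + poly_kernel(z, z)

# Part (e): kernel 1-Nearest Neighbour on the training set from the question.
train_e = [((3, 1, 0), 1), ((0, 2, -1), -1)]
test_e = (1, 1, 1)
dists = [(kernel_dist_sq(test_e, x), y) for x, y in train_e]
nn_label = min(dists)[1]  # label of the closest training object
print(dists, nn_label)

# Part (g): Kernel Ridge Regression with two training objects and lambda = 1.
xs = [(0, 0, 0), (0, 1, 0)]
ys = [Fraction(1), Fraction(-1)]
test_g = (0, 0, 1)
lam = 1

# Entries of the 2 x 2 matrix K + lambda * I.
a = poly_kernel(xs[0], xs[0]) + lam
b = poly_kernel(xs[0], xs[1])
c = poly_kernel(xs[1], xs[0])
d = poly_kernel(xs[1], xs[1]) + lam

# 2 x 2 inverse via the formula quoted in the question.
det = Fraction(a * d - b * c)
inv = [[d / det, -b / det], [-c / det, a / det]]

# Dual coefficients alpha = (K + lambda I)^{-1} y.
alpha = [inv[0][0] * ys[0] + inv[0][1] * ys[1],
         inv[1][0] * ys[0] + inv[1][1] * ys[1]]

# Prediction: sum over i of alpha_i * K(x_i, x_test).
k_test = [poly_kernel(x, test_g) for x in xs]
prediction = alpha[0] * k_test[0] + alpha[1] * k_test[1]
print(alpha, prediction)
```

Under these assumptions, the squared distances from (1, 1, 1) are 87 and 44, so the nearest neighbour is (0, 2, -1) and the predicted label is -1; the KRR prediction for (0, 0, 1) is 1/3.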


(a) A new machine learning method was developed to predict whether a person has a particular disease using blood test results. The new method was then tested on 200 randomly selected people. Its confusion matrix is given below.
i. Calculate the accuracy of the machine learning method.
ii. Compute the sensitivity and specificity of the machine learning method.
iii. What is the false positive rate?
(b) A receiver operating characteristic (ROC) graph is a technique for visualizing and selecting classifiers based on their performance.
i. On a diagram, draw the ROC curves of an ideal classification method and of a "no information" classifier. [4 marks]
ii. Briefly explain the meaning of the area under the ROC curve (AUC). What is the AUC value of the "no information" classifier? [4 marks]
(c) Compare and contrast the linear regression method and the k-nearest neighbours regression method.
(d) How can you deal with a synergy effect in the linear regression method?
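
The quantities asked for in part (a) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, with "positive" meaning that the person has the disease. The counts are hypothetical (chosen only so that they sum to 200); they are not the confusion matrix from the question, and the function name is illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from true/false positive/negative counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,          # fraction of correct predictions
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }

# Hypothetical counts for 200 people, for illustration only.
metrics = binary_metrics(tp=40, fn=10, fp=30, tn=120)
for name, value in metrics.items():
    print(name, value)
```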


1. (a) Give a definition of learning in terms of tasks, experience, and performance measures. Give an example, making sure to identify the task, the experience, and the performance measure. [6 marks]
(b) Describe the K-Nearest Neighbours algorithm for classification.
(c) You are given the following training set: The problem is to predict the label of each object in the following test set: (1,0), (0,0).
i. Is this a regression or classification problem?
ii. Solve this problem using the K-Nearest Neighbours algorithm with Euclidean distance and K = 3. [6 marks]
(d) What is the computational complexity of the K-Nearest Neighbours algorithm? Explain briefly why. In your answer, you may assume that K is a fixed constant. [5 marks]
(e) How would you summarize the difference between inductive and transductive algorithms in machine learning? Which of these two classes does K-Nearest Neighbours belong to? [5 marks]
(f) Give two advantages and two disadvantages of the K-Nearest Neighbours algorithm. [8 marks]
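
A brute-force version of K-Nearest Neighbours classification, as asked about in parts (b) and (d), might look like the sketch below. The training set is hypothetical (the one from the question is not reproduced here); only the two test objects (1,0) and (0,0) and K = 3 come from the question. The tie-breaking rule, which picks the tied label whose sample is closest to the query, is one possible choice rather than a prescribed one.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training samples nearest to `query`.

    train is a list of (point, label) pairs. Computing all distances costs
    O(n * d) per query for n samples with d features; the full sort used here
    for simplicity adds O(n log n).
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    neighbours = by_distance[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Tie-break: among the most frequent labels, take the nearest sample's.
    for _, label in neighbours:
        if label in tied:
            return label

# Hypothetical training set, for illustration only.
train = [
    ((0, 1), "A"), ((1, 1), "A"), ((2, 0), "A"),
    ((-1, 0), "B"), ((0, -1), "B"), ((3, 3), "B"),
]

predictions = [knn_predict(train, q, k=3) for q in [(1, 0), (0, 0)]]
print(predictions)
```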


4. Give two examples of data where the Euclidean distance is not the right metric.


(a) What is meant by data normalization in machine learning? (Remember that in this course "normalization" is understood in the wide sense and includes the transformations performed by Normalizer, StandardScaler, etc., in scikit-learn.) [2 marks]
(b) Briefly describe the class StandardScaler in scikit-learn, paying particular attention to its fit and transform methods. [5 marks]
(c) Compare and contrast the classes StandardScaler and RobustScaler in scikit-learn. [3 marks]
(d) Briefly describe the class MinMaxScaler in scikit-learn.
(e) Consider the following training set: What is its normalized version, in the sense of MinMaxScaler? Apply the same transformation to the test set.
(f) What is meant by data snooping in machine learning? Explain, briefly and in plain English, what the following code is doing (assuming that all functions that it uses have been loaded from the relevant libraries).
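
The fit/transform pattern behind parts (b), (d) and (e) can be sketched in plain Python. The code below mimics what MinMaxScaler does with its default range [0, 1]: the per-feature minimum and range are learned from the training set only and then reused unchanged on the test set, which is also how data snooping is avoided. The data and the function names are hypothetical, and this is a simplified stand-in for the scikit-learn class, not its implementation.

```python
def fit_min_max(train_rows):
    """Learn the per-feature minimum and range from the training set only."""
    columns = list(zip(*train_rows))
    mins = [min(col) for col in columns]
    # A constant feature has range 0; use 1 instead to avoid dividing by zero.
    ranges = [(max(col) - min(col)) or 1 for col in columns]
    return mins, ranges

def transform(rows, mins, ranges):
    """Apply the learned scaling: (value - min) / range, feature by feature."""
    return [[(v - m) / r for v, m, r in zip(row, mins, ranges)] for row in rows]

# Hypothetical data, for illustration only.
train = [[1, 10], [3, 20], [5, 40]]
test = [[2, 10], [7, 55]]

mins, ranges = fit_min_max(train)
train_scaled = transform(train, mins, ranges)
test_scaled = transform(test, mins, ranges)  # no refitting on the test set
print(train_scaled)
print(test_scaled)
```

Scaled training values always fall in [0, 1], whereas scaled test values can fall outside that interval, as the second test row here does.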

