5. Does the accuracy of a kNN classifier using the Euclidean distance change if you (a) translate the data, (b) scale the data (i.e., multiply all the points by a constant), or (c) rotate the data? Explain. Answer the same for a kNN classifier using Manhattan distance¹.
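As an empirical sanity check (not a substitute for the explanation the question asks for), the sketch below compares kNN accuracy on original, translated, scaled, and rotated copies of a dataset under both metrics. The dataset, the transformation parameters, and the use of scikit-learn are illustrative assumptions.

```python
# Hypothetical sketch: how rigid transformations affect kNN accuracy under two metrics.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def knn_accuracy(Xtr, Xte, metric):
    clf = KNeighborsClassifier(n_neighbors=3, metric=metric).fit(Xtr, y_tr)
    return clf.score(Xte, y_te)

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by 45 degrees
transforms = {
    "original":   lambda Z: Z,
    "translated": lambda Z: Z + 10.0,             # shift every point by a constant vector
    "scaled":     lambda Z: Z * 3.0,              # multiply all points by a constant
    "rotated":    lambda Z: Z @ R.T,              # rotate about the origin
}
for metric in ("euclidean", "manhattan"):
    for name, f in transforms.items():
        print(metric, name, knn_accuracy(f(X_tr), f(X_te), metric))
```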
Decision Trees: 1. Consider the problem from the previous assignments where we want to predict gender from information about height, weight, and age. We will use decision trees to make this prediction. Note that since the data attributes are continuous numbers, each node in the tree has to test one attribute against a threshold (a binary split). As a result, you need to compute the information gain for each candidate threshold halfway between two adjacent data points, so the cost of the computation grows with the number of data items. A sketch of this threshold search follows after part b).
a) Implement a decision tree learner for this particular problem that can derive decision trees of an arbitrary, pre-determined depth (up to the maximum depth where all data sets at the leaves are pure) using the information gain criterion.
b) Divide the data set from Question 1c) in Project 1 (the large training data set) into a training set comprising the first 50 data points and a test set consisting of the last 70 data elements. Use the resulting training set to derive trees of depths 1-5 and evaluate the accuracy of the resulting trees on the 50 training samples and on the test set containing the last 70 data items. Compare the classification accuracy on the test set with the one on the training set for each tree depth. For which depths does the result indicate overfitting?
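One way to approach the threshold search in part a): for each attribute, sort the values, take candidate thresholds halfway between consecutive distinct values, and keep the split with the highest information gain. The sketch below covers only this per-node computation; the toy data and function names are assumptions, not part of the assignment text.

```python
# Hypothetical sketch of the threshold search for one node of the tree.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array (base 2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_threshold(x, y):
    """Return (threshold, information_gain) for one continuous feature x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    parent = entropy(y)
    best = (None, -1.0)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                          # no threshold between equal values
        t = (x[i] + x[i - 1]) / 2.0           # halfway between adjacent data points
        left, right = y[:i], y[i:]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = parent - child
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy usage with made-up heights (cm) and gender labels.
heights = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
gender  = np.array(["F", "F", "M", "M", "M"])
print(best_threshold(heights, gender))        # threshold 165.0, gain about 0.97
```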
2. Perform K-means clustering with K = 2 using the Euclidean norm. Toss a coin 7 times to initialise the algorithm.
3. Cluster the data using hierarchical clustering with complete linkage and the Euclidean norm. Draw the resulting dendrogram.
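A programmatic counterpart to these two tasks, useful for checking hand-worked answers, could look like the sketch below; the seven 2-D points and the "coin toss" initial assignment are placeholders, since the actual data are not reproduced here.

```python
# Hypothetical sketch: K-means (K = 2) and complete-linkage clustering on placeholder data.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[1, 1], [1, 2], [2, 2], [5, 5], [5, 6], [6, 5], [6, 6]], float)  # placeholder points
assign = np.array([0, 1, 0, 1, 1, 0, 1])         # coin-toss initial assignment (illustrative)

for _ in range(10):                               # Lloyd's algorithm with K = 2
    centers = np.array([X[assign == k].mean(axis=0) for k in (0, 1)])
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    new_assign = d.argmin(axis=1)                 # reassign each point to its nearest centre
    if np.array_equal(new_assign, assign):
        break
    assign = new_assign
print("final clusters:", assign)

Z = linkage(X, method="complete", metric="euclidean")   # complete-linkage hierarchy
dendrogram(Z)
plt.show()
```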
3. For the data shown in the attached figure (dark circles are one class, white circles another), solve the classification problem with a single neuron by hand. That is, find appropriate weights for the required linear discriminant.
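Since the figure is not reproduced here, the sketch below uses made-up linearly separable points to illustrate what "finding the weights" amounts to: a perceptron update rule that converges to a weight vector w and bias b with sign(w·x + b) matching the classes.

```python
# Hypothetical sketch: perceptron learning of a linear discriminant.
# The points below stand in for the dark/white circles of the figure.
import numpy as np

X = np.array([[2, 2], [3, 3], [2, 3],            # "dark" class, label +1
              [0, 0], [1, 0], [0, 1]], float)    # "white" class, label -1
y = np.array([1, 1, 1, -1, -1, -1])

w, b = np.zeros(2), 0.0
for _ in range(100):                              # repeat passes until no mistakes remain
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:                # misclassified (or on the boundary)
            w += yi * xi                          # perceptron update
            b += yi
            mistakes += 1
    if mistakes == 0:
        break
print("weights:", w, "bias:", b)
print("predictions:", np.sign(X @ w + b))
```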
2. (a) What is a linear scoring function? How can it be used for classifying test samples into positive and negative? [4 marks]
(b) Consider the linear scoring function with parameters b = 2 and w = (2, −1, 1), where w are the weights. Calculate the predicted label for the test sample
(c) What is the margin of a given separating hyperplane?
(d) What is meant by the maximum margin hyperplane (also known as the optimal separating hyperplane)?
(e) Define the notion of a support vector in the context of maximum margin classifiers. [3 marks]
(f) Consider the following training set with two features: the positive samples are (0, 2) and (1, 2); the negative samples are (0, 0), (0, 1), and (−1, 1).
(g) Give an example of a training set for the problem of binary classification with only one feature where no separating hyperplane exists. [3 marks]
(h) State an optimization problem whose solution is the maximum margin hyperplane. Give the geometric interpretation of each formula in this optimization problem. [6 marks]
(i) State the soft margin classifier as an optimization problem. Give the geometric interpretation of this problem.
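For parts (a), (b), and (f), a small numerical sketch can help check hand calculations. The test sample in (b) is not reproduced above, so the one used below is purely hypothetical; the SVC fit illustrates the maximum margin hyperplane for the training set in (f), with a large C approximating the hard-margin case.

```python
# Hypothetical sketch: linear scoring function and a (near) hard-margin SVM.
import numpy as np
from sklearn.svm import SVC

# (a)/(b): score(x) = w . x + b, classify by its sign.
w, b = np.array([2.0, -1.0, 1.0]), 2.0
x = np.array([1.0, 0.0, -1.0])                 # hypothetical test sample (not from the question)
score = w @ x + b
print("score:", score, "predicted label:", 1 if score >= 0 else -1)

# (f): maximum margin hyperplane for the given 2-D training set.
X = np.array([[0, 2], [1, 2],                  # positive samples
              [0, 0], [0, 1], [-1, 1]])        # negative samples
y = np.array([1, 1, -1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C approximates the hard margin
print("w:", clf.coef_[0], "b:", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
```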
(a) Give two examples of practical problems of supervised machine learning and identify samples and labels for them. [4 marks]
(b) What is meant by a classification problem in machine learning? When is a classification problem called binary? When is it called multi-class? [4 marks]
(c) What is meant by a regression problem in machine learning?
(d) What is meant by a feature in machine learning? What is the difference between discrete and continuous features? [4 marks]
(e) Compare and contrast batch and online learning protocols in machine learning. [6 marks]
(f) Consider the following regression problem. The training set is: The test set consists of two samples, (0, 1, 0) and (0, 0, 0).
i. Calculate the predicted labels for the test set using the K Nearest Neighbours algorithm with Euclidean distance for K = 2. [6 marks]
ii. Now you are told that the true labels of the test samples (0, 1, 0) and (0, 0, 0) are 1 and 0, respectively. Calculate the test TSS, test RSS, and test R² for your predictions. What does the value of test R² tell you about the quality of the predictions? [7 marks]
iii. Which method would you use to compute the test R² in scikit-learn? [1 mark]
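Part (f) can be checked numerically once the training set (omitted above) is filled in; the sketch below uses a placeholder training set and shows the K = 2 nearest-neighbour prediction, the TSS/RSS computation, and scikit-learn's r2_score, which is the kind of method asked about in iii.

```python
# Hypothetical sketch for part (f); the training data below are placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score

X_train = np.array([[0, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]], float)  # placeholder samples
y_train = np.array([1.0, 1.0, 0.0, 0.0])                                 # placeholder labels

X_test = np.array([[0, 1, 0], [0, 0, 0]], float)   # test samples from the question
y_test = np.array([1.0, 0.0])                       # true labels given in part ii

knn = KNeighborsRegressor(n_neighbors=2, metric="euclidean").fit(X_train, y_train)
y_pred = knn.predict(X_test)                        # i. prediction = mean label of the 2 neighbours

tss = ((y_test - y_test.mean()) ** 2).sum()         # total sum of squares
rss = ((y_test - y_pred) ** 2).sum()                # residual sum of squares
print("predictions:", y_pred)
print("TSS:", tss, "RSS:", rss, "R^2:", 1 - rss / tss)
print("r2_score:", r2_score(y_test, y_pred))        # iii. scikit-learn method
```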
Question 1 [10 points]: Binary Classification. Consider the setting in which the binary label y ∈ {0, 1}, as commonly used in binary classification, follows a Bernoulli distribution with parameter μ
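For reference, and assuming the standard parameterisation in which μ denotes the probability of the label 1, the Bernoulli distribution referred to here is

$$\mathrm{Bern}(y \mid \mu) = \mu^{\,y}(1-\mu)^{1-y}, \qquad y \in \{0, 1\},$$

with mean $\mathbb{E}[y] = \mu$ and variance $\mathrm{Var}[y] = \mu(1-\mu)$.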
1. Give an example of a low-dimensional (approx. 20 dimensions), a medium-dimensional (approx. 1,000 dimensions), and a high-dimensional (approx. 100,000 dimensions) problem that you care about.