3 Problem description
• Submit a binary classification model trained on the training data (variable name D) from 'data_final_report.py' in the 'Final Report' section of Moodle.
• Choose from four types of binary classification models: linear model, neural network, decision tree, or support vector machine (multiple selections allowed).
• The evaluation combines the remote lecture reports and the midterm checkpoint results, which together total 50 points. The final report is graded out of 50 points for each model. The final grade is the lower of the total score (maximum 250 points) and 100 points.
• Grading is based on the classification accuracy [%] on our test data, divided by 2. Errors result in 0 points. Additionally, points will be deducted progressively for a missing summary or explanation of the model, an insufficient description of the training, or deficiencies in the source code.

Neural network

How to submit in Moodle
• Complete three tasks: write a brief summary of the model you chose (linear model, neural network, decision tree, or support vector machine), describe the training method, and upload the source code of the function that performs classification.
• Refer to the next slide for the source code of the function that performs classification.
• If multiple people submit identical source code with the same parameters, those individuals will receive zero points.

Need to write code with screenshots of the output, and a report with comments. Also do one or two additional models for the sake of comparison.
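The grading rule can be written as a small calculation. This is a minimal sketch under two assumptions. First, the 50 points from the lecture reports and midterm checkpoint are added to the per-model report scores before the cap of 100 is applied. Second, each model scores accuracy/2. The function name `final_grade` and the example accuracies are hypothetical.

```python
def final_grade(prior_points, accuracies_percent):
    """Hypothetical helper combining prior points with per-model report scores.

    Assumes each submitted model scores accuracy[%] / 2 (at most 50 points)
    and that the overall total (at most 250) is capped at 100 points.
    """
    if not 0 <= prior_points <= 50:
        raise ValueError("prior points must lie in [0, 50]")
    if len(accuracies_percent) > 4:
        raise ValueError("at most four model types can be submitted")
    model_points = [acc / 2 for acc in accuracies_percent]
    total = prior_points + sum(model_points)  # at most 50 + 4 * 50 = 250
    return min(total, 100)


# Example: 40 prior points and two models with 92% and 88% test accuracy.
print(final_grade(40, [92.0, 88.0]))  # 40 + 46 + 44 = 130, capped at 100
```

Under this reading, a single model can already reach 50 + 50 = 100. Extra models mainly compensate for lower accuracy or fewer prior points.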

Fig: 1

3. For the data shown in the attached figure (dark circles are one class, white circles the other), solve the classification problem with a neuron by hand; that is, find appropriate weights for the required linear discriminant.
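The hand calculation usually follows the perceptron learning rule. Whenever a point is misclassified, the weights (including a bias weight on a constant input of 1) move by η·t·x. The sketch below mirrors those steps. The point coordinates are hypothetical because the figure's data is not reproduced here, and labels are assumed to be +1 for dark and -1 for white.

```python
def train_perceptron(points, labels, lr=1.0, max_epochs=100):
    """Perceptron rule for one neuron with a bias weight.

    points: list of (x1, x2); labels: +1 (dark) or -1 (white), an assumed coding.
    Returns (w0, w1, w2) for the discriminant w0 + w1*x1 + w2*x2 = 0.
    """
    w = [0.0, 0.0, 0.0]  # w[0] is the bias weight (its input is fixed at 1)
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), t in zip(points, labels):
            y = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1
            if y != t:
                # Move the decision line toward the misclassified point.
                w[0] += lr * t
                w[1] += lr * t * x1
                w[2] += lr * t * x2
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            return w
    raise RuntimeError("no separating line found; data may not be linearly separable")


def classify(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1


# Hypothetical points, NOT the ones in the figure.
pts = [(2.0, 2.0), (3.0, 3.0), (3.0, 1.5), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
lbl = [1, 1, 1, -1, -1, -1]
w = train_perceptron(pts, lbl)
print(w)  # [-2.0, 1.0, 1.0]
```

For these made-up points, the rule converges in two epochs to the line x1 + x2 = 2.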

Q1 Consider the problem where we want to predict the gender of a person from a set of input parameters, namely height, weight, and age.
a) Using Cartesian (Euclidean) distance, Manhattan distance, and Minkowski distance of order 3 as the similarity measures, show the results of the gender prediction for the evaluation data listed below, using the generated training data, for values of K of 1, 3, and 7. Include the intermediate steps (i.e., distance calculation, neighbor selection, and prediction).
b) Implement the KNN algorithm for this problem. Your implementation should work with different training data sets as well as different values of K, and should allow a data point to be input for prediction.
c) To evaluate the performance of the KNN algorithm (using the Euclidean distance metric), implement a leave-one-out evaluation routine for your algorithm. In leave-one-out validation, we repeatedly evaluate the algorithm by removing one data point from the training set, training the algorithm on the remaining data, and then testing it on the removed point to see whether the predicted label matches. Repeating this for each data point gives an estimate of the percentage of erroneous predictions the algorithm makes, and thus a measure of its accuracy on the given data. Apply your leave-one-out validation with your KNN algorithm to the dataset for Question 1 c) for values of K of 1, 3, 5, 7, 9, and 11, and report the results. For which value of K do you get the best performance?
d) Repeat the prediction and validation you performed in Question 1 c) using KNN with the age data removed (i.e., when only the height and weight features are used in the distance calculation). Report the results and compare the performance without the age attribute with that from Question 1 c). Discuss the results. What do the results tell you about the data?
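Parts a) to d) share three building blocks: a Minkowski distance (order 1 is Manhattan, order 2 Euclidean, order 3 the requested order-3 metric), a majority-vote KNN predictor, and a leave-one-out loop. The sketch below assumes this structure. The samples are hypothetical stand-ins, not the assignment's training or evaluation data.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p: p=1 Manhattan, p=2 Euclidean, p=3 order 3."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k, p=2):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Pair each training point's distance with its label, nearest first.
    ranked = sorted(
        (minkowski(x, query, p), label) for x, label in zip(train_X, train_y)
    )
    nearest = [label for _, label in ranked[:k]]
    # Counter.most_common keeps first-seen order for equal counts, so a tied
    # vote goes to the label whose nearest member is closest to the query.
    return Counter(nearest).most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k, p=2):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k, p) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical (height [m], weight [kg], age) samples, NOT the assignment data.
X = [(1.60, 55, 25), (1.65, 60, 30), (1.58, 50, 22),
     (1.80, 80, 35), (1.85, 85, 28), (1.75, 78, 40)]
y = ["W", "W", "W", "M", "M", "M"]

query = (1.82, 82, 30)
for name, p in [("Euclidean", 2), ("Manhattan", 1), ("Minkowski-3", 3)]:
    print(name, [knn_predict(X, y, query, k, p) for k in (1, 3)])

X_no_age = [x[:2] for x in X]  # part d): height and weight only
for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(X, y, k), leave_one_out_accuracy(X_no_age, y, k))
```

The raw features have very different ranges, so weight dominates these distances. Standardizing each feature before computing distances is a common alternative worth comparing.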

2. Perform K-means clustering with K = 2 using the Euclidean norm. Toss a coin 7 times to initialise the algorithm.
3. Cluster the data using hierarchical clustering with complete linkage and the Euclidean norm. Draw the resulting dendrogram.
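One reading of the coin-toss initialisation in part 2 is that each of the 7 points is randomly assigned to one of the two clusters before the first centroid update. The sketch below assumes that reading and simulates the coin with Python's `random` module. The data points are hypothetical.

```python
import random


def kmeans_two_clusters(points, seed=0, max_iter=100):
    """K-means with K = 2 and the Euclidean norm.

    Initialisation: one simulated coin toss per point assigns it to cluster 0 or 1.
    """
    rng = random.Random(seed)
    assign = [rng.randint(0, 1) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Update step: each centroid is the mean of the points assigned to it.
        centroids = []
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if not members:  # an empty cluster is re-seeded with a random point
                members = [rng.choice(points)]
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # Assignment step: squared Euclidean distance picks the same nearest centroid.
        new_assign = [
            min((0, 1), key=lambda c: sum((u - v) ** 2 for u, v in zip(p, centroids[c])))
            for p in points
        ]
        if new_assign == assign:  # no point changed cluster: converged
            break
        assign = new_assign
    return assign, centroids


# Hypothetical 2-D data with 7 points (not the assignment data).
data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
labels, centres = kmeans_two_clusters(data, seed=7)
print(labels, centres)
```

Different coin outcomes (seeds) can lead to different final clusters. Recording the tosses used makes a hand calculation reproducible.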

Q2. Using the data from Problem 2, build a Gaussian Naive Bayes classifier for this problem. For this you have to learn Gaussian distribution parameters for each input data feature, i.e. for p(height|W), p(height|M), p(weight|W), p(weight|M), p(age|W), p(age|M).
a) Learn/derive the parameters for the Gaussian Naive Bayes classifier for the data from Question 2 a) and apply them to the same target as in Problem 1 a).
b) Implement the Gaussian Naive Bayes classifier for this problem.
c) Repeat the experiment in parts 1 c) and 1 d) with the Gaussian Naive Bayes classifier. Discuss the results, in particular with respect to the performance difference between using all features and using only height and weight.
d) Same as 1 d), but with Naive Bayes.
e) Compare the results of the two classifiers (i.e., the results from 1 c) and 1 d) with those from 2 c) and 2 d)) and discuss reasons why one might perform better than the other.
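Learning the six conditional densities amounts to estimating a mean and a variance per feature and per class, plus a class prior. Prediction then picks the class with the largest log p(c) + Σ log p(x_j | c). The sketch below assumes maximum-likelihood variances (dividing by n). The samples are hypothetical, not the Problem 2 data.

```python
import math


def fit_gaussian_nb(X, y):
    """Learn a prior plus a per-feature mean and variance for every class."""
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum-likelihood variance (divide by n), floored to avoid division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        params[c] = (n / len(X), means, variances)
    return params


def log_gaussian(v, mean, var):
    """Log of the normal density N(v; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict_gaussian_nb(params, x):
    """Return the class maximising log p(c) + sum_j log p(x_j | c)."""
    def log_joint(c):
        prior, means, variances = params[c]
        return math.log(prior) + sum(
            log_gaussian(v, m, s2) for v, m, s2 in zip(x, means, variances))
    return max(params, key=log_joint)


# Hypothetical (height [m], weight [kg], age) samples, NOT the assignment data.
X = [(1.60, 55, 25), (1.65, 60, 30), (1.58, 50, 22),
     (1.80, 80, 35), (1.85, 85, 28), (1.75, 78, 40)]
y = ["W", "W", "W", "M", "M", "M"]

params = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(params, (1.82, 82, 30)))
# Part d): drop the age column and refit on height and weight only.
params_hw = fit_gaussian_nb([x[:2] for x in X], y)
print(predict_gaussian_nb(params_hw, (1.82, 82)))
```

Dividing by n − 1 instead gives the unbiased variance estimate. With only a few samples per class, this choice can noticeably change the learned densities.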

1. Introduction
In this assignment you will build on your knowledge of classification; in particular, you will solve an image classification problem using a convolutional neural network. This assignment aims to guide you through the process by following the four fundamental principles:
• Data: data import, preprocessing, and augmentation.
• Model: designing a convolutional neural network model for classifying the images of the parts.
• Fitting: training the model using stochastic gradient descent.
• Validation: checking the model's accuracy on the reserved test data set and investigating where the most improvement could be found; additionally, looking into the uncertainty in the predictions.
This is not necessarily a linear process: after you have fit and/or validated your model, you may need to go back to earlier steps and adjust your data processing or your model structure. This may need to be done several times to achieve a satisfactory result.
This assignment is worth 35% of your course grade and is graded from 0 to 35 marks. An additional two bonus marks are available to the student whose model performs best on a previously unseen data set.

(a) A new machine learning method was developed to predict whether a person has a particular disease using blood test results. The method was then tested on 200 randomly selected persons. Its confusion matrix is given below.
i. Calculate the accuracy of the machine learning method.
ii. Compute the sensitivity and specificity of the machine learning method.
iii. What is the false positive rate?
(b) A receiver operating characteristic (ROC) graph is a technique for visualizing and selecting classifiers based on their performance.
i. On a diagram, draw the ROC curves of an ideal classification method and of a "no information" classifier. [4 marks]
ii. Briefly explain the meaning of the area under the ROC curve (AUC). What is the AUC value of the "no information" classifier? [4 marks]
(c) Compare and contrast the linear regression method and the k-nearest neighbours regression method.
(d) How can you deal with the synergy (interaction) effect in the linear regression method?
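Parts (a) and (b) reduce to a few ratios of confusion-matrix counts, plus one common interpretation of AUC: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The counts below are hypothetical because the actual matrix is not reproduced here. "Positive" is assumed to mean "has the disease".

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard rates from a 2x2 confusion matrix (positive = has the disease)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate (recall)
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


def auc_pairwise(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 1/2.

    labels: 1 = positive, 0 = negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for 200 persons (not the matrix from the question).
m = confusion_metrics(tp=70, fn=10, fp=20, tn=100)
print(m)  # accuracy 0.85, sensitivity 0.875, specificity 100/120, FPR 20/120

print(auc_pairwise([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0, ideal ranking
print(auc_pairwise([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5, constant score
```

A classifier whose score carries no information traces the diagonal TPR = FPR in ROC space, giving an AUC of 0.5. An ideal classifier passes through the top-left corner (FPR 0, TPR 1), giving an AUC of 1.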

(a) What is meant by feature engineering in machine learning?
(b) You are given a classification problem with one feature and the following training set: As usual, y is the label. This is a multi-class classification problem with possible labels A, B, and C. The test samples are 0, 1, and -5. Find the 1-Nearest Neighbour prediction for each of the test samples. Use the standard Euclidean metric. If you encounter any ties, briefly discuss your tie-breaking strategy. [5 marks]
(c) Engineer an additional feature for this dataset, namely the square of the original feature. Your new training set therefore still has 6 labelled samples and your test set 3 unlabelled samples, but there are now two features: the original one and its square. Find the 1-Nearest Neighbour prediction for each of the test samples in the new dataset. [16 marks]
(d) What is meant by a kernel in machine learning?
(e) How can the distance between the images of two samples in the feature space be expressed via the corresponding kernel? [2 marks]
(f) You are given the same training set as before and only one test sample, 1. The learning problem is still multi-class classification with possible labels A, B, or C. Using the kernelized Nearest Neighbours algorithm with kernel K(1,1)= (1-1¹)², compute the 3-Nearest Neighbours prediction for the test sample. If applicable, describe your tie-breaking strategy. [10 marks]
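Parts (b), (c), (e), and (f) use one k-NN routine with different distances. Part (b) uses the plain feature, (c) adds the squared feature, and (e)/(f) use the kernel-induced distance ‖φ(a) − φ(b)‖² = K(a, a) − 2K(a, b) + K(b, b). The training table from the question is not reproduced, so the training set below is hypothetical. The kernel K(a, b) = (ab)² is only an example chosen because its feature map is a ↦ a². It is not necessarily the kernel intended in part (f).

```python
import math
from collections import Counter


def knn(train, query, k, dist):
    """k-NN vote over (sample, label) pairs using an arbitrary distance function.

    Tie-breaking: sorted() is stable, so equally distant samples keep their
    training-set order, and a tied vote goes to the label seen first (closest).
    """
    ranked = sorted(train, key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def kernel_distance(kernel, a, b):
    """||phi(a) - phi(b)|| computed from kernel values only."""
    squared = kernel(a, a) - 2 * kernel(a, b) + kernel(b, b)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


def quad(a, b):
    """EXAMPLE kernel (a*b)**2, whose implicit feature map is a -> a**2."""
    return (a * b) ** 2


# Hypothetical one-feature training set (the question's table is not shown).
train = [(-3, "A"), (-1, "B"), (0, "B"), (1, "C"), (2, "A"), (4, "C")]
tests = [0, 1, -5]

# (b) 1-NN on the original feature.
plain = [knn(train, q, 1, lambda a, b: abs(a - b)) for q in tests]
print(plain)  # ['B', 'C', 'A']

# (c) 1-NN after engineering the squared feature: each sample becomes (x, x**2).
train_sq = [((x, x * x), label) for x, label in train]
engineered = [knn(train_sq, (q, q * q), 1, math.dist) for q in tests]
print(engineered)  # ['B', 'C', 'C']

# (f)-style kernelized 3-NN for the test sample 1, using the example kernel.
kernelized = knn(train, 1, 3, lambda a, b: kernel_distance(quad, a, b))
print(kernelized)  # B
```

With these made-up points, the squared feature changes the prediction for -5 from A to C. In (x, x²) space, -5 lands nearer to 4 than to -3.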

For this programming assignment you will implement the Naive Bayes algorithm from scratch, together with the functions to evaluate it with k-fold cross-validation (also from scratch). You can use the code in the following tutorial to get started and to get ideas for your implementation of the Naive Bayes algorithm, but please enhance it as much as you can (there are many ways to enhance it, such as those mentioned at the end of the tutorial):
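The tutorial's code is not reproduced here. The sketch below is a generic from-scratch k-fold cross-validation under stated assumptions: indices are shuffled once with a fixed seed and split into k near-equal folds, and any `fit(X, y) -> model` / `predict(model, x) -> label` pair can be plugged in. The majority-class model is a hypothetical baseline standing in for Naive Bayes.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(X, y, k, fit, predict, seed=0):
    """Accuracy on each held-out fold; fit(X, y) -> model, predict(model, x) -> label."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical stand-in model: always predict the majority training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = [[float(i)] for i in range(10)]
y = ["a"] * 7 + ["b"] * 3
scores = cross_validate(X, y, 5, fit_majority, predict_majority)
print(scores, sum(scores) / len(scores))  # mean accuracy 0.7
```

Reporting the majority-class baseline alongside Naive Bayes shows how much the classifier actually improves on guessing the most common label.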
