This question walks you through the typical process of discovering association rules. We will use the market basket data in the Groceries.csv file to discover association rules. Here are the data contents.

1- Customer: Customer Identifier

2- Item: Name of Product Purchased

For your information, we have sorted the observations in ascending order first by Customer and then by Item. Also, we have removed duplicated items for each customer.

A- What is the number of items in the Universal Set? What is the maximum number of itemsets that we can find in theory from the data? What is the maximum number of association rules that we can generate in theory from the data?
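The counting in Part A can be sketched in Python as below. The sketch assumes the CSV header names are exactly `Customer` and `Item`, as listed above. It uses two standard results for a universe of n items. There are 2^n - 1 non-empty itemsets. There are 3^n - 2^(n+1) + 1 rules X → Y with X and Y non-empty and disjoint: each item goes to X, to Y, or to neither, and we subtract the assignments that leave X or Y empty. The function names are illustrative, not part of any library.

```python
import csv
from typing import Iterable


def count_distinct_items(lines: Iterable[str]) -> int:
    """Size of the Universal Set: number of distinct values in the Item column."""
    reader = csv.DictReader(lines)
    return len({row["Item"] for row in reader})


def max_itemsets(n_items: int) -> int:
    """Non-empty subsets of an n-item universe: 2^n - 1."""
    return 2 ** n_items - 1


def max_rules(n_items: int) -> int:
    """Rules X -> Y with X, Y non-empty and disjoint: 3^n - 2^(n+1) + 1."""
    # 3^n ways to place each item in X, Y, or neither; remove the 2^n placements
    # with X empty and the 2^n with Y empty, then add back the one with both empty.
    return 3 ** n_items - 2 ** (n_items + 1) + 1
```

With the file open as `f` (for example `with open("Groceries.csv", newline="") as f:`), `n = count_distinct_items(f)` gives the Universal Set size. `max_itemsets(n)` and `max_rules(n)` then give the two theoretical maxima.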

B- We are interested in the itemsets that can be found in the market baskets of at least seventy-five (75) customers. How many itemsets did we find? Also, what is the largest number of items among these itemsets?
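Part B asks for every itemset that appears in at least 75 baskets. A minimal from-scratch Apriori sketch follows. `load_baskets` and `frequent_itemsets` are hypothetical helper names, and the threshold is expressed as a customer count rather than a fraction. Library implementations such as `mlxtend.frequent_patterns.apriori` take a fractional `min_support` instead, which here would be 75 divided by the number of customers.

```python
import csv
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def load_baskets(lines: Iterable[str]) -> List[FrozenSet[str]]:
    """Group (Customer, Item) rows into one basket (set of items) per customer."""
    baskets = defaultdict(set)
    for row in csv.DictReader(lines):
        baskets[row["Customer"]].add(row["Item"])
    return [frozenset(items) for items in baskets.values()]


def frequent_itemsets(baskets: List[FrozenSet[str]],
                      min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: every itemset contained in at least min_count baskets."""
    # Level 1: count single items.
    counts = defaultdict(int)
    for basket in baskets:
        for item in basket:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets, then prune any candidate that has an
        # infrequent (k-1)-subset (the Apriori property).
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One pass over the baskets counts every surviving candidate.
        counts = defaultdict(int)
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        result.update(current)
        k += 1
    return result
```

For the Groceries data, the calls would be roughly `baskets = load_baskets(f)` on the open file and then `itemsets = frequent_itemsets(baskets, 75)`. `len(itemsets)` and `max(len(s) for s in itemsets)` then answer the two questions.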

C- We will use up to the largest itemset size we found in Part (b) and then generate the association rules whose Confidence metrics are greater than or equal to 1%. How many association rules can we find? Next, we plot the Support metrics on the vertical axis against the Confidence metrics on the horizontal axis for these association rules. We will use the Lift metrics to indicate the size of the marker. We will add a color gradient legend to the chart for the Lift metrics.
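For Part C, each frequent itemset can be split into every non-empty antecedent and consequent. The rules whose confidence reaches the threshold are kept. The sketch below assumes a dictionary of support counts for all frequent itemsets, for example the output of an Apriori pass, which contains every subset of each frequent itemset. `Rule`, `generate_rules`, and `plot_rules` are illustrative names. The plot follows the description above. It uses matplotlib's `scatter` with lift mapped to both marker size and color, and a color bar serves as the gradient legend. The size factor of 20 is an arbitrary choice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float     # fraction of baskets containing antecedent and consequent
    confidence: float  # support(A and C) / support(A)
    lift: float        # confidence / support(C)


def generate_rules(counts: Dict[FrozenSet[str], int], n_baskets: int,
                   min_confidence: float) -> List[Rule]:
    """Split each frequent itemset into every non-empty antecedent/consequent pair."""
    rules = []
    for itemset, count in counts.items():
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                a = frozenset(ante)
                c = itemset - a
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                confidence = count / counts[a]
                if confidence >= min_confidence:
                    expected = counts[c] / n_baskets
                    rules.append(Rule(a, c, count / n_baskets, confidence,
                                      confidence / expected))
    return rules


def plot_rules(rules: List[Rule]) -> None:
    """Support (y) against confidence (x); marker size and color both encode lift."""
    import matplotlib.pyplot as plt  # imported here so rule generation needs no plotting stack
    lifts = [r.lift for r in rules]
    points = plt.scatter([r.confidence for r in rules], [r.support for r in rules],
                         s=[20 * lift for lift in lifts], c=lifts, cmap="viridis")
    plt.colorbar(points, label="Lift")
    plt.xlabel("Confidence")
    plt.ylabel("Support")
    plt.grid(True)
    plt.show()
```

A typical call would be `rules = generate_rules(itemsets, len(baskets), 0.01)`, followed by `len(rules)` and `plot_rules(rules)`.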

D- Among the rules that you found in Part (c), list the rules whose Confidence metrics are greater than or equal to 60%. Please show the rules in a table that shows the Antecedent, the Consequent, the Support, the Confidence, the Expected Confidence, and the Lift.
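A possible way to build the Part D table is sketched below. It assumes a dictionary of support counts and a list of (antecedent, consequent) pairs. Expected Confidence is taken to be the support of the consequent, i.e. the confidence the rule would have if antecedent and consequent were independent. That gives Lift = Confidence / Expected Confidence. The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Row = Tuple[str, str, float, float, float, float]


def rule_table(counts: Dict[FrozenSet[str], int], n_baskets: int,
               rules: List[Tuple[FrozenSet[str], FrozenSet[str]]],
               min_confidence: float = 0.60) -> List[Row]:
    """Rows of (antecedent, consequent, support, confidence, expected confidence, lift)."""
    rows = []
    for ante, cons in rules:
        support = counts[ante | cons] / n_baskets
        confidence = counts[ante | cons] / counts[ante]
        expected_confidence = counts[cons] / n_baskets  # support of the consequent
        if confidence >= min_confidence:
            rows.append((", ".join(sorted(ante)), ", ".join(sorted(cons)),
                         support, confidence, expected_confidence,
                         confidence / expected_confidence))
    rows.sort(key=lambda row: row[3], reverse=True)  # highest confidence first
    return rows


def print_table(rows: List[Row]) -> None:
    header = ("Antecedent", "Consequent", "Support", "Confidence", "Expected Conf.", "Lift")
    print("{:<40} {:<20} {:>8} {:>10} {:>14} {:>6}".format(*header))
    for a, c, s, conf, exp, lift in rows:
        print(f"{a:<40} {c:<20} {s:>8.4f} {conf:>10.4f} {exp:>14.4f} {lift:>6.3f}")
```

If pandas is available, `pd.DataFrame(rows, columns=[...])` is an alternative to the hand-formatted table.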

Most Viewed Questions Of Machine Learning

3- For the data shown in the attached figure (dark circles are one class, white circles another), solve the classification problem with a neuron by hand. That is, find appropriate weights for the required linear discriminant.
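Hand-derived weights can be checked against the perceptron learning rule. The points in the figure are not reproduced here. The sketch below therefore assumes labels +1 (dark) and -1 (white) and works for any 2-D data. It only converges if the two classes are linearly separable.

```python
from typing import List, Sequence, Tuple


def step(value: float) -> int:
    """Threshold activation of the neuron."""
    return 1 if value >= 0 else -1


def train_perceptron(points: Sequence[Tuple[float, float]], labels: Sequence[int],
                     lr: float = 1.0, max_epochs: int = 100) -> List[float]:
    """Perceptron rule; returns [w0 (bias), w1, w2] for w0 + w1*x1 + w2*x2 = 0."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), t in zip(points, labels):
            if step(w[0] + w[1] * x1 + w[2] * x2) != t:
                # Shift the boundary toward the misclassified point's correct side.
                w[0] += lr * t
                w[1] += lr * t * x1
                w[2] += lr * t * x2
                errors += 1
        if errors == 0:  # every point is on its correct side
            break
    return w
```

For example, take hypothetical points (2, 2) and (3, 3) as dark and (0, 0) and (-1, 0) as white. The function then returns `[-1.0, 2.0, 2.0]`, i.e. the discriminant 2x1 + 2x2 - 1 = 0.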

Q1 Consider the problem where we want to predict the gender of a person from a set of input parameters, namely height, weight, and age.

a) Using Cartesian (Euclidean) distance, Manhattan distance, and Minkowski distance of order 3 as the similarity measurements, show the results of the gender prediction for the evaluation data listed below, using the generated training data, for values of K of 1, 3, and 7. Include the intermediate steps (i.e., distance calculation, neighbor selection, and prediction).

b) Implement the KNN algorithm for this problem. Your implementation should work with different training data sets as well as different values of K, and it should allow a data point to be entered for prediction.

c) To evaluate the performance of the KNN algorithm (using the Euclidean distance metric), implement a leave-one-out evaluation routine for your algorithm. In leave-one-out validation, we repeatedly evaluate the algorithm by removing one data point from the training set, training the algorithm on the remaining data, and then testing it on the removed point to see whether the predicted label matches. Repeating this for each data point gives an estimate of the percentage of erroneous predictions the algorithm makes, and thus a measure of its accuracy on the given data. Apply your leave-one-out validation with your KNN algorithm to the dataset for Question 1 c) for values of K of 1, 3, 5, 7, 9, and 11 and report the results. For which value of K do you get the best performance?

d) Repeat the prediction and validation you performed in Question 1 c) using KNN when the age data is removed (i.e., when only the height and weight features are used in the distance calculation). Report the results and compare the performance without the age attribute with the results from Question 1 c). Discuss the results. What do the results tell you about the data?
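Parts (a) to (d) can share one small KNN implementation, sketched below under a few assumptions. Samples are (feature tuple, label) pairs, with features in the order height, weight, age. A single Minkowski function covers all three metrics: p = 1 is Manhattan, p = 2 is Euclidean, and p = 3 is order-3 Minkowski. Features are not rescaled, so attributes with larger numeric ranges dominate the distance. The training and evaluation data are not included in this text, so no results are shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """p=1: Manhattan, p=2: Euclidean (Cartesian), p=3: Minkowski of order 3."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train: List[Sample], query: Sequence[float], k: int, p: float = 2) -> str:
    """Majority label among the k nearest training samples."""
    neighbours = sorted(train, key=lambda s: minkowski(s[0], query, p))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common orders equal counts by first appearance, i.e. the closer neighbour wins.
    return votes.most_common(1)[0][0]


def leave_one_out_error(data: List[Sample], k: int, p: float = 2) -> float:
    """Fraction of samples misclassified when each is predicted from all the others."""
    errors = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k, p) != label:
            errors += 1
    return errors / len(data)
```

Dropping the age attribute for part (d) amounts to `[(x[:2], y) for x, y in data]` before calling `leave_one_out_error`.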

2. Perform K-means clustering with K = 2 using the Euclidean norm. Toss a coin 7 times to initialise the algorithm.

3. Cluster the data using hierarchical clustering with complete linkage and the Euclidean norm. Draw the resulting dendrogram.
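One reading of "toss a coin 7 times" is that there are seven data points and each toss assigns one point to an initial cluster. The sketch below assumes that interpretation, with `random.Random` standing in for the coin, and it needs at least two points. It also gives a naive complete-linkage agglomeration that records each merge and its height, which is the information a dendrogram draws. In practice, `scipy.cluster.hierarchy.linkage(X, method="complete")` together with `scipy.cluster.hierarchy.dendrogram` does the same job.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans2(points: List[Point], seed: int = 0, max_iter: int = 100) -> List[int]:
    """K-means with K=2, initialised by one simulated coin toss per point (assumed reading)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in points]
    for _ in range(max_iter):
        centroids = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # empty cluster (e.g. all tosses alike): seed it with one point
                members = [points[c]]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Reassign every point to its nearest centroid (Euclidean norm).
        new_labels = [min((0, 1), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def complete_linkage(points: List[Point]) -> List[Tuple[List[int], List[int], float]]:
    """Agglomerative clustering; returns merges as (cluster A, cluster B, merge height)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                d = max(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = ([c for idx, c in enumerate(clusters) if idx not in (i, j)]
                    + [clusters[i] + clusters[j]])
    return merges
```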

Q2. Using the data from Problem 2, build a Gaussian Naive Bayes classifier for this problem. For this you have to learn Gaussian distribution parameters for each input data feature, i.e. for p(height|W), p(height|M), p(weight|W), p(weight|M), p(age|W), p(age|M). a) Learn/derive the parameters for the Gaussian Na ive Bayes Classifier for the data from Question 2 a) and apply them to the same target as in problem 1a). b) Implement the Gaussian Na ive Bayes Classifier for this problem. c) Repeat the experiment in part 1 c) and 1 d) with the Gaussian Native Bayes Classifier. Discuss the results, in particular with respect to the performance difference between using all features and using only height and weight. d) Same as 1d but with Naïve Bayes. e) Compare the results of the two classifiers (i.e., the results form 1 c) and 1d) with the ones from 2 c) 2d) and discuss reasons why one might perform better than the other.

1. Introduction

In this assignment you will build on your knowledge of classification; in particular, you will solve an image classification problem using a convolutional neural network. This assignment aims to guide you through the process by following the four fundamental principles:

• Data: Data import, preprocessing, and augmentation.
• Model: Designing a convolutional neural network model for classifying the images of the parts.
• Fitting: Training the model using stochastic gradient descent.
• Validation: Checking the model's accuracy on the reserved test data set and investigating where the most improvement could be found. Additionally, looking into the uncertainty in the predictions.

This is not necessarily a linear process; after you have fit and/or validated your model, you may need to go back to earlier steps and adjust your processing of the data or your model structure. This may need to be done several times to achieve a satisfactory result. This assignment is worth 35% of your course grade and is graded from 0 to 35 marks. An additional two bonus marks are available to the student whose model performs best on a previously unseen data set.

(a) What is meant by feature engineering in machine learning?

(b) You are given a classification problem with one feature and the following training set: As usual, y is the label. This is a multi-class classification problem with possible labels A, B, and C. The test samples are 0, 1, and -5. Find the 1-Nearest Neighbour prediction for each of the test samples. Use the standard Euclidean metric. If you encounter any ties, briefly discuss your tie-breaking strategy. [5 marks]

(c) Engineer an additional feature for this dataset, namely the square of the original feature. Your new training set therefore still has 6 labelled samples and your test set 3 unlabelled samples, but there are now two features: the original feature and its square. Find the 1-Nearest Neighbour prediction for each of the test samples in the new dataset. [16 marks]

(d) What is meant by a kernel in machine learning?

(e) How can the distance between the images of two samples in the feature space be expressed via the corresponding kernel? [2 marks]

(f) You are given the same training set as before, and only one test sample, 1. The learning problem is still multi-class classification with possible labels A, B, or C. Using the kernelized Nearest Neighbours algorithm with kernel K(1,1)= (1-1¹)², compute the 3-Nearest Neighbours prediction for the test sample. If applicable, describe your tie-breaking strategy. [10 marks]
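The nearest-neighbour parts can be explored with the sketch below. It relies on the standard identity for the squared distance between feature-space images, ||phi(a) - phi(b)||^2 = K(a, a) - 2K(a, b) + K(b, b), which also answers part (e). Ties are broken by training-set order for 1-NN and in favour of the closer neighbour in the 3-NN vote. That is one possible strategy, not the one the question requires. The training table is not reproduced in this text, so any data passed in is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Kernel = Callable[[float, float], float]


def nn_predict(train: List[Tuple[Tuple[float, ...], str]], query: Tuple[float, ...]) -> str:
    """1-NN with Euclidean distance; min() keeps the earliest sample on a tie."""
    return min(train, key=lambda s: math.dist(s[0], query))[1]


def kernel_distance(k: Kernel, a: float, b: float) -> float:
    """Distance between feature-space images: sqrt(K(a,a) - 2K(a,b) + K(b,b))."""
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return math.sqrt(max(k(a, a) - 2 * k(a, b) + k(b, b), 0.0))


def kernel_knn_predict(k: Kernel, train: List[Tuple[float, str]], query: float,
                       n: int = 3) -> str:
    """n-NN in the kernel-induced feature space; majority vote, ties to the closer neighbour."""
    neighbours = sorted(train, key=lambda s: kernel_distance(k, s[0], query))[:n]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

For the engineered dataset of part (c), the 1-NN call becomes `nn_predict([((x, x * x), y) for x, y in data], (q, q * q))`, where `data` and `q` are placeholders. As an illustration only, a polynomial kernel such as `(1 + a * b) ** 2` can be passed to `kernel_knn_predict`. It is not a reconstruction of the kernel in part (f).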

For this programming assignment you will implement the Naive Bayes algorithm from scratch, along with the functions to evaluate it with k-fold cross-validation (also from scratch). You can use the code in the following tutorial to get started and to get ideas for your implementation of the Naive Bayes algorithm, but please enhance it as much as you can (there are many things you can do to enhance it, such as those mentioned at the end of the tutorial):
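The k-fold part can be written without libraries, as sketched below. The sketch assumes rows are (features, label) pairs and that the data has at least k rows. `train_fn` and `predict_fn` stand for whatever Naive Bayes training and prediction functions you write; both names are hypothetical. Dealing the shuffled rows round-robin keeps fold sizes within one of each other.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], Any]  # (features, label)


def k_fold_split(data: Sequence[Any], k: int, seed: int = 1) -> List[List[Any]]:
    """Shuffle once, then deal the rows into k folds whose sizes differ by at most one."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    return [rows[i::k] for i in range(k)]


def cross_validate(data: Sequence[Row], k: int,
                   train_fn: Callable[[List[Row]], Any],
                   predict_fn: Callable[[Any, Sequence[float]], Any]) -> List[float]:
    """Accuracy on each held-out fold after training on the other k-1 folds."""
    folds = k_fold_split(data, k)
    scores = []
    for i, test_fold in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train_fn(train)
        correct = sum(predict_fn(model, features) == label for features, label in test_fold)
        scores.append(correct / len(test_fold))
    return scores
```

The mean of the returned list is the usual single cross-validated accuracy figure.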

Question 1: Download the SGEMM GPU kernel performance dataset from the link below.
https://archive.ics.uci.edu/ml/datasets/SGEMM+GPU+kernel+performance
Understand the dataset by performing exploratory analysis. Prepare the target parameter by taking the average of the THREE (3) runs with long performance times. Design a linear regression model to estimate the target using only THREE (3) attributes from the dataset. Discuss your results, the relevant performance metrics, and the impact of normalizing the dataset.
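A minimal modelling sketch with NumPy is shown below. The question does not name the three attributes or the run columns to average, so those choices are left open. The target would be built as the row-wise mean of the chosen run columns. `fit_ols`, `predict`, `standardize`, and `metrics` are illustrative names. Standardization uses training-set statistics only and assumes no attribute is constant.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients [intercept, b1, ..., bp] for y ~ b0 + X @ b."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


def standardize(X_train: np.ndarray, X_test: np.ndarray):
    """Z-score both sets with the training mean and standard deviation."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mean) / std, (X_test - mean) / std


def metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """Return (RMSE, R^2)."""
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    r2 = 1.0 - float(np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, r2
```

For ordinary least squares with an intercept, z-scoring the inputs changes the coefficients, making them comparable across attributes. It does not change the predictions, RMSE, or R^2. Normalization matters more for gradient-based fitting or regularized models.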
