Question

Analysis 2: Clustering

In this example, we will analyse the Wine dataset from the UCI Machine Learning Repository, which is described here:

https://archive.ics.uci.edu/ml/datasets/Wine

The CSV file containing the data is available here:

https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
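
As a rough sketch of how the file might be read, the snippet below parses text in the wine.data layout using only the Python standard library. It assumes the file has no header row and that the first column is the class label (the "region" variable in this exercise), with the remaining columns numeric. The function names `parse_wine` and `standardise` are made up for illustration, and rescaling the columns is an optional extra step that the question does not ask for.

```python
import csv
import io


def parse_wine(text):
    """Split wine.data-style text into (labels, features).

    Assumption: column 0 is the class label and every other column is a
    numeric measurement; there is no header row.
    """
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        labels.append(int(row[0]))
        features.append([float(v) for v in row[1:]])
    return labels, features


def standardise(features):
    """Rescale each column to zero mean and unit (population) variance.

    Optional: k-means uses Euclidean distance, so columns measured on
    large scales would otherwise dominate. Constant columns are left at 0.
    """
    n = len(features)
    cols = list(zip(*features))
    means = [sum(c) / n for c in cols]
    stds = [
        (sum((v - m) ** 2 for v in c) / n) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in features
    ]


# Made-up two-row sample in the same layout (NOT real wine data).
sample = "1,1.0,10.0\n2,3.0,10.0\n"
labels, features = parse_wine(sample)
scaled = standardise(features)
```

To use the real file, its text could be fetched with `urllib.request.urlopen(url).read().decode()` and passed to `parse_wine`; the labels are kept aside and only the features are passed on to the clustering.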

For this dataset:

• Perform a k-means clustering analysis for several values of k, running the analysis multiple times for each k with different initial centroid positions. Be careful to exclude the "region" variable, which labels the known origin of each wine.

• Make a scree plot of the lowest cluster distortion found for each value of k against k, and use it to determine the optimal number of clusters in the data.

• Does the number of clusters obtained above match the number of regions in the "region" data that was excluded from the clustering?
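
One possible way to approach the steps above is sketched below in plain Python. It is a minimal illustration, not a reference solution: it assumes "distortion" means the sum of squared Euclidean distances from each point to its nearest centroid, it picks initial centroids by sampling data points at random, and the names `kmeans` and `best_distortions` are hypothetical. The demo points are made up and stand in for the wine features with the "region" column already removed.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm; returns (centroids, distortion)."""
    centroids = rng.sample(points, k)  # random data points as initial centroids
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [
            [sum(c) / len(cl) for c in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    distortion = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, distortion


def best_distortions(points, ks, n_init=10, seed=0):
    """Lowest distortion over n_init random restarts, for each k in ks."""
    rng = random.Random(seed)
    return {
        k: min(kmeans(points, k, rng)[1] for _ in range(n_init))
        for k in ks
    }


# Made-up demo data: three tight groups of three points each.
demo = [
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
]
scree = best_distortions(demo, ks=range(1, 6), n_init=50)
for k, d in scree.items():
    print(k, round(d, 3))
```

Plotting the values in `scree` against k gives the scree plot; the "elbow" where the curve stops dropping sharply suggests the number of clusters, which can then be compared with the number of distinct values in the excluded "region" column.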