Question

Problem-solving activity: Solve the following set of problems using Python and submit the code as a file with the extension .ipynb in OnTrack as part of your pass activity.

SIT 307 and SIT 720

1. Load the data from the "Live_20210128.csv" file. Remove unwanted features if required.
2. Select the optimum k value using the Silhouette Coefficient and plot the optimum k values.
3. Create clusters using the K-means and K-means++ algorithms with the optimal k value found in the previous problem. Report performance using appropriate evaluation metrics. Compare the results.
4. Now repeat the K-means clustering 50 times and report the average performance. Again compare this with the results that you obtained in Q3 using K-means++ and explain the difference (if any).

SIT 720

5. Apply DBSCAN to this dataset ("Live_20210128.csv") and find the optimum "eps" and "min_samples" values. Is the number of clusters the same as the number found in Q2? Explain the similarities or differences that you have found between the two solutions.
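A minimal sketch of how the clustering steps in the problems could be approached with scikit-learn. It is not a reference solution: the CSV file is not available here, so synthetic blobs from make_blobs stand in for the data, and the helper names (silhouette_by_k, fit_and_score, dbscan_grid), the candidate k range, the eps/min_samples grid and the choice of silhouette score as the evaluation metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data. A real notebook would instead load
# "Live_20210128.csv" (e.g. with pandas.read_csv) and drop unwanted columns.
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X_raw, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0,
                      random_state=0)
X = StandardScaler().fit_transform(X_raw)


def silhouette_by_k(X, k_values, seed=0):
    """Return {k: silhouette score} for K-means fitted with each k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                        random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


def fit_and_score(X, k, init, seed):
    """Fit one K-means run and return (silhouette, inertia)."""
    model = KMeans(n_clusters=k, init=init, n_init=1, random_state=seed).fit(X)
    return silhouette_score(X, model.labels_), model.inertia_


def dbscan_grid(X, eps_values, min_samples_values):
    """Return (silhouette, eps, min_samples, n_clusters) for the best setting.

    Noise points (label -1) are excluded before scoring, and settings that
    yield fewer than two clusters are skipped. Returns None if none qualify.
    """
    best = None
    for eps in eps_values:
        for m in min_samples_values:
            labels = DBSCAN(eps=eps, min_samples=m).fit_predict(X)
            mask = labels != -1
            n_clusters = len(set(labels[mask]))
            if n_clusters < 2:
                continue
            score = silhouette_score(X[mask], labels[mask])
            if best is None or score > best[0]:
                best = (score, eps, m, n_clusters)
    return best


# Q2: pick the k with the highest silhouette score (candidate range assumed).
scores = silhouette_by_k(X, range(2, 9))
best_k = max(scores, key=scores.get)

# Q3: one run with random initialisation vs. one with k-means++ seeding.
rand_sil, rand_inertia = fit_and_score(X, best_k, "random", seed=0)
pp_sil, pp_inertia = fit_and_score(X, best_k, "k-means++", seed=0)

# Q4: average over 50 randomly initialised runs, each with a different seed.
random_runs = [fit_and_score(X, best_k, "random", seed=s) for s in range(50)]
avg_sil = float(np.mean([r[0] for r in random_runs]))
avg_inertia = float(np.mean([r[1] for r in random_runs]))

# Q5: small grid search over eps and min_samples (grid values assumed).
best_dbscan = dbscan_grid(X, [0.1, 0.2, 0.3, 0.5], [3, 5, 10])

print("best k:", best_k)
print("k-means++ silhouette:", pp_sil, "| 50-run random average:", avg_sil)
print("DBSCAN (score, eps, min_samples, n_clusters):", best_dbscan)
```

Setting n_init=1 in fit_and_score is deliberate: it exposes the run-to-run variation of random initialisation that Q4 asks about, which the default multi-restart behaviour would otherwise hide. Scoring DBSCAN only on non-noise points can favour settings that discard many points as noise, so the noise fraction is worth reporting next to the score.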

Fig: 1