Question

Problem 2. External Validation

When labels are available to distinguish groupings within a dataset, external cluster validation can be used to evaluate how well clustering re…

Use sklearn's make_blobs to create a synthetic dataset for clustering. Your data set should include 5 clusters with similar variance and numb…
Use Kmeans to fit your model, then save the predicted cluster assignments for each of your observations.
Select and employ an appropriate analytical method to assess the degree of agreement between your predicted vs. actual cluster assignments i…

Questions:
A. What method did you select to assess cluster agreement, and why?
B. What do the results of this assessment suggest?

Problem 1a. Kmeans

Please load the following dataset: 'x1_vals.npy'
Conduct an initial exploratory data analysis (EDA) to evaluate the data. Create at least two different types of visualizations to help you evaluate possible values for K (the number of clusters).
Implement two different analytical methods to narrow your choice of K prior to modeling.
Use scikit-learn to fit a basic kmeans clustering model with random initialization and reproducible results. Then create a plot of your results that distinguishes each of the clusters by color.
Extract values for your cluster centroids, the number of iterations to convergence, and a value that serves as a measure of cluster 'coherence'.

Questions:
A. What method(s) did you use to identify an appropriate value for K? Why did you select this method?
B. What value did you select for K? Does your EDA support this choice?
C. How many iterations were required before your model converged?
D. What were the values for each of your cluster centroids?
E. What kmeans measure serves as a proxy for cluster coherence? What value did your model return? Discuss your interpretation of this value.

Problem 1b. Silhouette Plot
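One possible way to approach Problem 2 and the modeling steps of Problem 1a is sketched below. It is a rough illustration under several assumptions, not the expected solution. The sample size (500), cluster_std (1.0), random_state (42), n_init (10) and the candidate range for K (2 to 8) are arbitrary choices that do not come from the assignment. The synthetic blobs also stand in for 'x1_vals.npy', which is not available here. The adjusted Rand index is used for agreement because it ignores how cluster labels are numbered and corrects for chance. Other measures, such as normalized mutual information, would also be reasonable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Problem 2 sketch: 5 blobs with similar variance.
# n_samples, cluster_std and random_state are assumed values.
X, y_true = make_blobs(n_samples=500, centers=5, cluster_std=1.0, random_state=42)

# Random initialization plus a fixed seed gives reproducible results.
km = KMeans(n_clusters=5, init="random", n_init=10, random_state=42)
y_pred = km.fit_predict(X)  # predicted cluster assignment per observation

# The adjusted Rand index compares two partitions regardless of label numbering.
ari = adjusted_rand_score(y_true, y_pred)
print("Adjusted Rand index:", ari)

# Problem 1a sketch: quantities the questions ask for.
print("Centroids:\n", km.cluster_centers_)
print("Iterations to convergence:", km.n_iter_)
print("Inertia (within-cluster sum of squares):", km.inertia_)

# Two analytical aids for choosing K: inertia (elbow method) and silhouette score.
inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, init="random", n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print("K with the highest silhouette score:", best_k)
```

An adjusted Rand index near 1 would indicate that the predicted clusters closely match the true groupings, while a value near 0 would indicate agreement no better than chance. Inertia is only meaningful relative to other values of K on the same data, which is why it is usually read from an elbow plot and not in isolation.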

Fig: 1

Fig: 2