Question

Problem 1a. Concepts: Interpreting SSE
The total SSE is the sum of the SSE for each separate attribute in the K-means algorithm. What does it mean if the SSE for one variable is low for all clusters? Low for just one cluster? High for all clusters? High for just one cluster? How could you use the per-variable SSE information to improve your clustering?

Problem 1b. Local and Global Objective Functions: K-means
For the following sets of two-dimensional points, (1) provide a sketch of how they would be split into clusters by K-means for the given number of clusters and (2) indicate approximately where the resulting centroids would be. Assume that we are using the squared error objective function. If you believe that there is more than one possible solution, indicate whether each solution is a global or local minimum (draw pictures to represent your responses). Darker areas indicate higher density. Assume a uniform density within each shaded area.

(a) k=3 (b) k=2 (c) k=2

Problem 1c. Density clustering
Suppose we apply DBSCAN to cluster the following dataset using Euclidean distance.

[Figure: scatter plot of the dataset]

A point is a core point if its density (the number of points within EPS) is > MinPts. Given that MinPts = 3 and EPS =, answer the following questions.
a) Label all points as 'core points', 'boundary points', or 'noise'.
b) What is the clustering result (i.e., how will the data cluster)?

Problem 1d. Entropy vs. SSE
Assume you are given a data set of objects, each of which is assigned to one of two classes, and suppose that C1 and C2 are two clusterings produced from this data set. If entropy judges C1 to be a more accurate clustering than C2, is it necessary that SSE will also judge C1 to be a more accurate clustering than C2?
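For Problem 1a, it may help to see how the total SSE decomposes into one term per cluster and per attribute. The following is a minimal sketch in Python with NumPy, not part of the original problem set; the function name `per_variable_sse`, the toy data, and the cluster labels are all hypothetical and chosen only for illustration.

```python
import numpy as np

def per_variable_sse(X, labels):
    """Return an array of shape (n_clusters, n_features) holding the SSE
    contributed by each attribute within each cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    sse = np.zeros((clusters.size, X.shape[1]))
    for row, c in enumerate(clusters):
        members = X[labels == c]
        centroid = members.mean(axis=0)  # K-means centroid = mean of members
        sse[row] = ((members - centroid) ** 2).sum(axis=0)
    return sse

# Hypothetical toy data: attribute 0 is tight within each cluster,
# attribute 1 is widely spread within each cluster.
X = np.array([[0.0, 0.0], [0.2, 4.0], [5.0, 1.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])

sse = per_variable_sse(X, labels)
total_sse = sse.sum()  # total SSE is the sum over clusters and attributes
print(sse)
print(total_sse)
```

In this toy example the second column of `sse` dominates in every cluster, which is the "high for all clusters" pattern the problem asks about.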
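For Problem 1c, the core/boundary/noise rule can be sketched as follows. This is an illustration under stated assumptions, not the solution to the exercise: the points are hypothetical (the figure's data is not reproduced here), the EPS value of 1.0 is a placeholder because the source does not give one, and a point's neighbourhood is assumed to include the point itself. The strict inequality (density > MinPts) follows the problem statement.

```python
import numpy as np

def label_points(points, eps, min_pts):
    """Label each point 'core', 'boundary', or 'noise' using the rule
    from the problem statement: core if density > min_pts."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    within = dists <= eps            # assumption: neighbourhood includes the point itself
    density = within.sum(axis=1)
    is_core = density > min_pts      # strict inequality, as stated in the problem
    # A non-core point is a boundary point if some core point lies within eps of it
    near_core = (within & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "boundary", "noise"))

# Hypothetical points and a placeholder EPS; neither comes from the exercise.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (5, 5)]
point_labels = label_points(points, eps=1.0, min_pts=3)
print(point_labels.tolist())
```

Changing the inequality to `>=`, or excluding the point itself from its own neighbourhood, changes which points count as core, so it is worth checking which convention a given course or library uses.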

Fig: 1

Fig: 2

Fig: 3

Fig: 4

Fig: 5