Q1.
a. Bias: Model 1 has higher bias because it assumes a linear relationship between X and Y. Model 2 has lower bias because its flexibility lets it fit higher-degree relationships.
b. Variance: Model 1 has lower variance because of its simplicity. Model 2 has higher variance because of its complexity and its sensitivity to changes in the training set.
c. Likelihood of overfitting: Model 1 is less likely to overfit because it is simpler. Model 2, being more complex, is more likely to overfit, particularly if the training set is small. (A small sketch illustrating this trade-off follows the Q4 table below.)

Q2.
a. The percentage of the data clustered: If MinPoints is raised from 6 to 8 while epsilon stays the same, we expect the proportion of clustered data to drop. Raising MinPoints makes the requirement for a point to be a core point stricter: more neighbours must fall within the epsilon radius, so fewer points satisfy the condition and end up belonging to a cluster.
b. The number of clusters: The number of clusters should go down or stay the same, but not rise. Because fewer points qualify as core points, smaller clusters may disintegrate, and their points either join larger clusters or are labelled as outliers.
c. The percentage of the data labeled as outliers: We expect this change to increase the proportion of data classified as outliers. As the requirements for being a core point or a border point become stricter (because MinPoints increased), more points fail to meet them and are classified as noise. (A DBSCAN sketch comparing the two settings also follows the Q4 table below.)

Q3.
Iteration 1: C1 = A, C2 = C

Point   Distance to C1   Distance to C2   Assignment
A       0                √2               C1
B       1                √5               C1
C       √2               0                C2
D       √10              √8               C2
E       √20              √18              C2

Updated centroids: C1 = (1, 0.5), C2 = (1.66, 3.66)

Iteration 2: C1 = (1, 0.5), C2 = (1.66, 3.66)

Point   Distance to C1   Distance to C2   Assignment
A       √0.25            √7.51            C1
B       √0.25            √13.83           C1
C       √3.25            √5.51            C1
D       √13.25           √0.23            C2
E       √24.25           √3.59            C2

Updated centroids: C1 = (0.66, 1), C2 = (2.5, 4.5)

Iteration 3: C1 = (0.66, 1), C2 = (2.5, 4.5)

Point   Distance to C1   Distance to C2   Assignment
A       √0.115           √14.5            C1
B       √1.115           √22.5            C1
C       √1.435           √12.5            C1
D       √10.79           √0.5             C2
E       √21.47           √0.5             C2

Updated centroids: C1 = (0.66, 1), C2 = (2.5, 4.5)

Iteration 4: C1 = (0.66, 1), C2 = (2.5, 4.5)

Point   Distance to C1   Distance to C2   Assignment
A       √0.115           √14.5            C1
B       √1.115           √22.5            C1
C       √1.435           √12.5            C1
D       √10.79           √0.5             C2
E       √21.47           √0.5             C2

Updated centroids: C1 = (0.66, 1), C2 = (2.5, 4.5)

Iterations 3 and 4 produce the same assignments and centroids, so the clusters will not change any further and the algorithm has converged.

Q4.

Point   a     b     SC
(0,0)   3     6.5   0.538
(0,1)   2.5   5.5   0.545
(2,3)   4.5   1.5   -0.667
(3,3)   4     1     -0.75
(3,4)   5     1     -0.8
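Going back to Q1, here is a minimal sketch of the bias/variance trade-off, assuming Model 1 is a plain linear fit and Model 2 a degree-9 polynomial (the question's exact model definitions are not restated here) and using synthetic data in place of the question's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n=30):
    # Synthetic, nonlinear ground truth with noise; purely illustrative, not the question's data.
    x = rng.uniform(-3, 3, n)
    return x, np.sin(x) + rng.normal(0, 0.3, n)

x_test, y_test = sample(200)
for degree, name in [(1, "Model 1 (linear)"), (9, "Model 2 (degree-9 polynomial)")]:
    test_mse = []
    for _ in range(50):                          # refit on many independent training sets
        x_tr, y_tr = sample()
        coeffs = np.polyfit(x_tr, y_tr, degree)  # least-squares polynomial fit
        pred = np.polyval(coeffs, x_test)
        test_mse.append(np.mean((pred - y_test) ** 2))
    print(f"{name}: mean test MSE={np.mean(test_mse):.3f}, "
          f"spread across training sets={np.std(test_mse):.3f}")
```

Typically the linear model's test error is stable but systematically high (bias), while the degree-9 fit fluctuates much more from one training set to the next (variance), which is the overfitting risk described in part c.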
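For Q2, a small DBSCAN comparison on synthetic data; the question's dataset and epsilon are not reproduced here, so the blobs and eps = 0.5 below are stand-ins chosen only to illustrate the direction of the change:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Stand-in data; the actual dataset from the question is not shown above.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

for min_points in (6, 8):
    labels = DBSCAN(eps=0.5, min_samples=min_points).fit_predict(X)  # eps is an arbitrary stand-in
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)       # label -1 marks noise/outliers
    pct_clustered = 100 * np.mean(labels != -1)
    print(f"MinPoints={min_points}: clusters={n_clusters}, "
          f"clustered={pct_clustered:.1f}%, outliers={100 - pct_clustered:.1f}%")
```

With the same epsilon, raising min_samples can only shrink the set of core points, so the clustered percentage cannot increase and the outlier percentage cannot decrease, matching parts a and c.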
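For Q3, the distance tables above are consistent with A = (1,1), B = (1,0), C = (0,2), D = (2,4), E = (3,5) and initial centroids C1 = A, C2 = C; those coordinates are inferred from the tables rather than restated from the question. A short sketch that re-runs the iterations under that assumption:

```python
import numpy as np

# Coordinates inferred from the distance tables above (assumption, not given explicitly here).
points = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}
X = np.array(list(points.values()), dtype=float)
names = list(points.keys())

# Initial centroids: C1 = A, C2 = C.
centroids = np.array([points["A"], points["C"]], dtype=float)

for it in range(1, 5):
    # Squared Euclidean distance from each point to each centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)                        # 0 -> C1, 1 -> C2
    print(f"Iteration {it}")
    for name, row, a in zip(names, d2, assign):
        print(f"  {name}: d(C1)=√{row[0]:.3g}  d(C2)=√{row[1]:.3g}  -> C{a + 1}")
    # Update each centroid to the mean of its assigned points.
    new_centroids = np.array([X[assign == k].mean(axis=0) for k in range(2)])
    print("  updated centroids:", np.round(new_centroids, 2))
    if np.allclose(new_centroids, centroids):         # converged: clusters stop changing
        break
    centroids = new_centroids
```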
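For Q4, the a and b columns above match average Manhattan distances to the points of C1 = {(0,0), (0,1), (2,3)} and C2 = {(3,3), (3,4)} respectively, with SC = (b - a) / max(a, b); that clustering and distance measure are an inferred reading, not restated from the question. A sketch reproducing the table under that reading:

```python
# Inferred clusters (assumption): they reproduce the a/b columns in the Q4 table when
# a = mean Manhattan distance to C1's points and b = mean Manhattan distance to C2's points.
C1 = [(0, 0), (0, 1), (2, 3)]
C2 = [(3, 3), (3, 4)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mean_dist(p, cluster):
    # Average distance from p to the cluster's points, excluding p itself.
    others = [q for q in cluster if q != p]
    return sum(manhattan(p, q) for q in others) / len(others)

for p in C1 + C2:
    a = mean_dist(p, C1)            # column "a": average distance to C1's points
    b = mean_dist(p, C2)            # column "b": average distance to C2's points
    sc = (b - a) / max(a, b)        # convention used in the table above
    print(f"{p}: a={a:g}, b={b:g}, SC={sc:.3f}")
```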
Q5.
a. Step 1: take K = 1. Create a table with the support count of each item in the dataset, called C1 (the candidate set):

Itemset   Sup_count
I1        2
I2        4
I3        4
I4        1

Removing rows whose support is below the minimum support of 2 gives the set of frequent 1-itemsets (C1 final):

Itemset   Sup_count
I1        2
I2        4
I3        4

Step 2: take K = 2. Create a table with the support count of each 2-itemset, called C2 (the candidate set):

Itemset   Sup_count
I2, I4    1
I2, I3    3
I1, I2    1
I1, I3    2

Removing rows whose support is below 2 gives the set of frequent 2-itemsets (C2 final):

Itemset   Sup_count
I2, I3    3
I1, I3    2

Step 3: take K = 3. Create a table with the support count of each 3-itemset, called C3 (the candidate set):

Itemset      Sup_count
I1, I2, I3   1

Removing rows whose support is below 2 leaves no frequent 3-itemsets (C3 final): none.

Therefore, the frequent itemsets are {I1}, {I2}, {I3}, {I1, I3}, and {I2, I3}.

b. From the frequent itemsets, we can generate the following association rules:
1. {I1} -> {I3}
2. {I3} -> {I1}
3. {I2} -> {I3}
4. {I3} -> {I2}

To find the strong rules, we calculate the confidence of each rule:
Confidence({I1} -> {I3}) = Support({I1, I3}) / Support({I1}) = 2/2 = 1, or 100%
Confidence({I3} -> {I1}) = Support({I1, I3}) / Support({I3}) = 2/4 = 0.5, or 50%
Confidence({I2} -> {I3}) = Support({I2, I3}) / Support({I2}) = 3/4 = 0.75, or 75%
Confidence({I3} -> {I2}) = Support({I2, I3}) / Support({I3}) = 3/4 = 0.75, or 75%

The rules {I1} -> {I3}, {I2} -> {I3}, and {I3} -> {I2} have a confidence greater than 60%, so they are considered strong. The strong rules for this dataset with a minimum confidence of 60% are therefore:
1. {I1} -> {I3}
2. {I2} -> {I3}
3. {I3} -> {I2}

c. Consider the rule {I2} -> {I3}, with a confidence of 75%. Suppose we are talking about a bookstore where items I1, I2, I3, and so on represent different genres of books: item I2 might be 'Science Fiction' books and item I3 might be 'Fantasy' books. The association rule {I2} -> {I3} can then be interpreted as: "75% of the time when a customer buys a 'Science Fiction' book (I2), they also buy a 'Fantasy' book (I3)." In other words, 3 out of 4 customers who bought a science fiction book also picked up a fantasy book. The bookstore could use this information to recommend fantasy books to customers who buy science fiction, and the same kind of rule could inform targeted marketing, store layout design, or bundle deals to promote sales.
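A sketch tying parts a and b together. The original transaction table is not reproduced above, so the five transactions below are hypothetical, chosen only so that their support counts match the ones used in the answer; brute-force enumeration stands in for the level-wise Apriori join, which gives the same result at this scale:

```python
from itertools import combinations

# Hypothetical transactions: NOT the original dataset (which is not reproduced here),
# just one set consistent with the support counts used in the answer above.
transactions = [
    {"I1", "I3"},
    {"I1", "I2", "I3"},
    {"I2", "I3"},
    {"I2", "I4"},
    {"I2", "I3"},
]
min_support = 2
items = sorted(set().union(*transactions))

def support(itemset):
    # Number of transactions containing every item of `itemset`.
    return sum(itemset <= t for t in transactions)

# Enumerate all itemsets and keep those meeting the minimum support.
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = support(set(cand))
        if s >= min_support:
            frequent[frozenset(cand)] = s
print("Frequent itemsets:", {tuple(sorted(fs)): s for fs, s in frequent.items()})

# Confidence of X -> Y is support(X ∪ Y) / support(X); keep rules with confidence >= 60%.
min_confidence = 0.6
for itemset in (fs for fs in frequent if len(fs) == 2):
    for antecedent in itemset:
        conf = frequent[itemset] / support({antecedent})
        if conf >= min_confidence:
            consequent = next(iter(itemset - {antecedent}))
            print(f"{{{antecedent}}} -> {{{consequent}}}: confidence={conf:.2f}")
```

With these stand-in transactions it reproduces the frequent itemsets {I1}, {I2}, {I3}, {I1, I3}, {I2, I3} and the three strong rules listed above, with their confidences.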