Q1.
a. Bias: Model 1 has higher bias because it assumes a linear relationship between X and Y. Model 2 has lower bias because its higher-degree terms give it the flexibility to fit more complex relationships.
b. Variance: Model 1 has lower variance because of its simplicity. Model 2 has higher
variance due to its complexity and sensitivity to changes in the training set.
c. Likelihood of Overfitting: Model 1 is less likely to overfit because it's simpler. Model 2,
being more complex, has a higher likelihood of overfitting, particularly if the training data is
not large enough.
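To make the trade-off concrete, here is a minimal sketch (not part of the original answer) that estimates bias and variance empirically: a degree-1 and a degree-9 polynomial are refit on many resampled training sets, standing in for Model 1 and Model 2. The sine-shaped true function, the noise level, and the sample sizes are all assumptions chosen purely for illustration.

```python
# Illustrative sketch: the true function, noise level, and degrees are
# assumptions, not taken from the question.
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(0, 1, 50)
true_f = lambda x: np.sin(2 * np.pi * x)

preds = {1: [], 9: []}                    # degree 1 ~ Model 1, degree 9 ~ Model 2
for _ in range(200):                      # 200 independent training sets
    x = rng.uniform(0, 1, 20)
    y = true_f(x) + rng.normal(0, 0.3, 20)
    for deg in preds:
        coefs = np.polyfit(x, y, deg)     # least-squares polynomial fit
        preds[deg].append(np.polyval(coefs, x_test))

for deg, p in preds.items():
    p = np.array(p)
    bias2 = np.mean((p.mean(axis=0) - true_f(x_test)) ** 2)   # squared bias
    var = np.mean(p.var(axis=0))                              # variance
    print(f"degree {deg}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

The degree-1 fit should show the larger squared bias and the degree-9 fit the larger variance, matching parts (a) and (b).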
Q2.
a. The percentage of the data clustered
We anticipate a drop in the proportion of clustered data if MinPoints is raised from 6 to 8 while the epsilon value stays the same. Raising MinPoints makes the requirement for a point to be deemed a core point stricter: since more points are required within the epsilon radius, fewer points will satisfy the requirement and become members of a cluster.
b. The number of clusters
The number of clusters should either go down or stay the same, but not rise. With a higher MinPoints, fewer points qualify as core points, so smaller clusters may disintegrate, with their points either absorbed into bigger clusters or classed as outliers.
c. The percentage of the data labeled as outliers
We anticipate that this change will increase the proportion of data classified as outliers. When the requirements for being a core point (and, in turn, a border point) get stricter as a result of the increase in MinPoints, more points fall short of them and are classified as noise/outliers.
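All three predictions can be checked empirically. Below is a minimal sketch (not part of the original answer) that runs scikit-learn's DBSCAN with min_samples set to 6 and then 8; the synthetic dataset and eps=0.4 are illustrative assumptions only. Note that scikit-learn's min_samples counts the point itself among its neighbours.

```python
# Illustrative sketch: the dataset and eps are assumptions, not taken
# from the question.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# two dense blobs plus sparse background noise
X = np.vstack([
    rng.normal(0, 0.3, (60, 2)),
    rng.normal(4, 0.3, (60, 2)),
    rng.uniform(-2, 6, (30, 2)),
])

for min_pts in (6, 8):
    labels = DBSCAN(eps=0.4, min_samples=min_pts).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
    outlier_pct = np.mean(labels == -1) * 100
    print(f"MinPoints={min_pts}: {n_clusters} clusters, "
          f"{100 - outlier_pct:.1f}% clustered, {outlier_pct:.1f}% outliers")
```

On data like this, raising min_samples typically shows exactly the pattern argued above: a lower clustered percentage, the same or fewer clusters, and a higher outlier percentage.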
Q3.
Iteration 1
C1 = A, C2 = C

Point | Distance to C1 | Distance to C2 | Assignment
A     | 0              | √2             | C1
B     | 1              | √5             | C1
C     | √2             | 0              | C2
D     | √10            | √8             | C2
E     | √20            | √18            | C2

Updated centroids: C1 = (1, 0.5), C2 = (1.66, 3.66)
Iteration 2
C1 = (1, 0.5), C2 = (1.66, 3.66)

Point | Distance to C1 | Distance to C2 | Assignment
A     | √0.25          | √7.51          | C1
B     | √0.25          | √13.83         | C1
C     | √3.25          | √5.51          | C1
D     | √13.25         | √0.23          | C2
E     | √24.25         | √3.59          | C2

Updated centroids: C1 = (0.66, 1), C2 = (2.5, 4.5)
Iteration 3
C1 = (0.66, 1), C2 = (2.5, 4.5)

Point | Distance to C1 | Distance to C2 | Assignment
A     | √0.115         | √14.5          | C1
B     | √1.115         | √22.5          | C1
C     | √1.435         | √12.5          | C1
D     | √10.79         | √0.5           | C2
E     | √21.47         | √0.5           | C2

Updated centroids: C1 = (0.66, 1), C2 = (2.5, 4.5)
Iteration 4
C1 = (0.66, 1), C2 = (2.5, 4.5)

Point | Distance to C1 | Distance to C2 | Assignment
A     | √0.115         | √14.5          | C1
B     | √1.115         | √22.5          | C1
C     | √1.435         | √12.5          | C1
D     | √10.79         | √0.5           | C2
E     | √21.47         | √0.5           | C2

Updated centroids: C1 = (0.66, 1), C2 = (2.5, 4.5)
The assignments and centroids are identical in iterations 3 and 4, so the algorithm has converged: the final clusters are C1 = {A, B, C} and C2 = {D, E}.
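As a cross-check, here is a minimal sketch that replays these iterations. The coordinates A = (1, 1), B = (1, 0), C = (0, 2), D = (2, 4), E = (3, 5) are not stated above; they are inferred from the distance values in the tables and should be verified against the original question (the centroids printed above are truncated, e.g. 1.66 rather than 1.67).

```python
# Inferred coordinates: verify against the original question.
import numpy as np

X = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)
names = list("ABCDE")
centroids = X[[0, 2]].copy()              # initial: C1 = A, C2 = C

for it in range(1, 5):
    # assign each point to its nearest centroid (Euclidean distance)
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    assign = d.argmin(axis=1)
    # recompute each centroid as the mean of its assigned points
    centroids = np.array([X[assign == k].mean(axis=0) for k in range(2)])
    clusters = {f"C{k + 1}": [names[i] for i in np.where(assign == k)[0]]
                for k in range(2)}
    print(f"iteration {it}: {clusters} centroids: {centroids.round(2)}")
```

The output reproduces the tables: after iteration 2 the assignments stop changing, with C1 = {A, B, C} and C2 = {D, E}.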
Q4.
Using the silhouette coefficient s = (b - a) / max(a, b), where a is a point's mean distance to its own cluster and b is its mean distance to the nearest other cluster:

Point | a   | b   | SC
(0,0) | 3   | 6.5 | 0.538
(0,1) | 2.5 | 5.5 | 0.545
(2,3) | 4.5 | 1.5 | -0.667
(3,3) | 4   | 1   | -0.75
(3,4) | 5   | 1   | -0.8
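The SC column can be reproduced with a short sketch (not part of the original answer) that applies s = (b - a) / max(a, b) to the a and b values above.

```python
# (point): (a, b) pairs copied from the table above
a_b = {(0, 0): (3, 6.5), (0, 1): (2.5, 5.5), (2, 3): (4.5, 1.5),
       (3, 3): (4, 1), (3, 4): (5, 1)}

for point, (a, b) in a_b.items():
    s = (b - a) / max(a, b)               # silhouette coefficient
    print(f"{point}: a={a}, b={b}, s={s:.3f}")
```

This prints 0.538, 0.545, -0.667, -0.750, and -0.800, matching the table.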
Q5.
a.
Step 1:
Take K = 1.
Create a table containing the support count of each item present in the dataset, called C1 (the candidate set):

Itemset | Sup_count
I1      | 2
I2      | 4
I3      | 4
I4      | 1

Remove the rows whose support count is below the minimum support of 2. This gives our set of frequent 1-itemsets (C1 final):

Itemset | Sup_count
I1      | 2
I2      | 4
I3      | 4
Step 2:
Take K = 2.
Create a table containing the support count of each candidate 2-itemset, called C2 (the candidate set):

Itemset | Sup_count
I2, I4  | 1
I2, I3  | 3
I1, I2  | 1
I1, I3  | 2

Remove the rows whose support count is below the minimum support of 2. This gives our set of frequent 2-itemsets (C2 final):

Itemset | Sup_count
I2, I3  | 3
I1, I3  | 2
Step 3:
Take K = 3.
Create a table containing the support count of each candidate 3-itemset, called C3 (the candidate set):

Itemset    | Sup_count
I1, I2, I3 | 1

Remove the rows whose support count is below the minimum support of 2. This leaves no frequent 3-itemsets (C3 final: none).
Therefore, the frequent itemsets are {I1}, {I2}, {I3}, {I1, I3}, and {I2, I3}.
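For reference, here is a minimal sketch of the Apriori join-and-prune loop carried out above. The transaction database is not given in the text; the five transactions below are a hypothetical reconstruction that reproduces every support count in the tables, so treat them as an assumption.

```python
# Hypothetical transactions consistent with the support counts above.
from itertools import combinations

transactions = [{'I1', 'I2', 'I3'}, {'I1', 'I3'}, {'I2', 'I3'},
                {'I2', 'I3'}, {'I2', 'I4'}]   # assumed, not from the question
MIN_SUPPORT = 2

def support(itemset):
    return sum(itemset <= t for t in transactions)

items = sorted(set().union(*transactions))
frequent, k_sets = [], [frozenset([i]) for i in items]
while k_sets:
    # prune: keep only candidates meeting the minimum support
    survivors = [s for s in k_sets if support(s) >= MIN_SUPPORT]
    frequent += survivors
    # join: merge survivors that differ by exactly one item
    k_sets = list({a | b for a, b in combinations(survivors, 2)
                   if len(a | b) == len(a) + 1})

for s in frequent:
    print(sorted(s), "support =", support(s))
```

Running it yields {I1}, {I2}, {I3}, {I1, I3}, and {I2, I3} with the same support counts as above.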
b.
From the frequent 2-itemsets, we can generate the following association rules:
1. {I1} -> {I3}
2. {I3} -> {I1}
3. {I2} -> {I3}
4. {I3} -> {I2}
To find the strong rules, we calculate the confidence of each rule:
Confidence({I1} -> {I3}) = Support({I1, I3}) / Support({I1}) = 2/2 = 1, or 100%
Confidence({I3} -> {I1}) = Support({I1, I3}) / Support({I3}) = 2/4 = 0.5, or 50%
Confidence({I2} -> {I3}) = Support({I2, I3}) / Support({I2}) = 3/4 = 0.75, or 75%
Confidence({I3} -> {I2}) = Support({I2, I3}) / Support({I3}) = 3/4 = 0.75, or 75%
The rules {I1} -> {I3}, {I2} -> {I3}, and {I3} -> {I2} have a confidence greater than 60%, so they are considered strong rules.
So, the strong rules for this dataset with a minimum confidence of 60% are:
1. {I1} -> {I3}
2. {I2} -> {I3}
3. {I3} -> {I2}
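The same confidence calculations in a short sketch (not part of the original answer), using the support counts from part (a):

```python
# Support counts copied from part (a)
support = {frozenset(['I1']): 2, frozenset(['I2']): 4, frozenset(['I3']): 4,
           frozenset(['I1', 'I3']): 2, frozenset(['I2', 'I3']): 3}
MIN_CONFIDENCE = 0.6

rules = [(['I1'], ['I3']), (['I3'], ['I1']),
         (['I2'], ['I3']), (['I3'], ['I2'])]
for lhs, rhs in rules:
    # confidence(lhs -> rhs) = support(lhs union rhs) / support(lhs)
    conf = support[frozenset(lhs + rhs)] / support[frozenset(lhs)]
    verdict = "strong" if conf >= MIN_CONFIDENCE else "weak"
    print(f"{lhs} -> {rhs}: confidence = {conf:.0%} ({verdict})")
```

This flags {I1} -> {I3}, {I2} -> {I3}, and {I3} -> {I2} as strong, in agreement with the list above.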
c.
Let's consider the rule {I2} -> {I3}, with a confidence of 75%.
Suppose we are talking about a bookstore where items I1, I2, I3, etc. represent different genres of books. Item I2 might be 'Science Fiction' books and item I3 might be 'Fantasy' books.
The association rule {I2} -> {I3} can then be interpreted as: "75% of the time, when a customer buys a 'Science Fiction' book (I2), they also buy a 'Fantasy' book (I3)."
This means that 3 out of 4 customers who bought a science fiction book also picked up a
fantasy book. The bookstore could use this information to make recommendations to
customers who buy science fiction books, suggesting they might also enjoy some fantasy
books. This kind of information could also be used in targeted marketing, store layout
design, or to create bundle deals to promote sales.