(This relies on the file iris.csv that can be found in LumiNUS. Use that file to avoid problems associated with different versions.)
p-norms are used to measure the distance between multi-dimensional data points and the origin. For an n-dimensional data point x = (x₁, x₂, ..., xₙ), the p-norm is
given by:
$$\|x\|_p = \left( \sum_{k=1}^{n} |x_k|^p \right)^{1/p}$$
See:
https://en.wikipedia.org/wiki/Lp_space#The_p-norm_in_finite_dimensions
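As a rough illustration of the definition, the p-norm can be sketched in a few lines of plain Python. The function name `p_norm` and the toy point below are assumptions made for this example only; they are not part of the assignment or of iris.csv.

```python
def p_norm(vector, p):
    """Return the p-norm of a sequence of numbers (intended for p >= 1)."""
    # Sum the p-th powers of the absolute components, then take the p-th root.
    return sum(abs(x) ** p for x in vector) ** (1.0 / p)


# Toy 2-dimensional point, chosen only for illustration.
point = [3.0, -4.0]

norm_1 = p_norm(point, 1)  # Manhattan distance from the origin: |3| + |-4|
norm_2 = p_norm(point, 2)  # straight-line distance: sqrt(3^2 + 4^2)
norm_3 = p_norm(point, 3)  # cube root of (|3|^3 + |-4|^3)
```

For a fixed point the p-norm does not increase as p grows, so here `norm_1 >= norm_2 >= norm_3`.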
Here we will, for each type of flower (setosa, versicolor, and virginica), measure the distance of each data point from the mean of its flower type in the 1-norm,
2-norm, and 3-norm. The mean is taken separately over each of the factors: Sepal Length, Sepal Width, Petal Length, and Petal Width. So each data point is
4-dimensional, and the difference between each data point and the mean is also 4-dimensional.
The number of data points where the (component-wise) difference of the data point from the mean for its flower type has a p-norm less than or equal to 1.5 is:
Type of Flower \ p | 1 | 2 | 3
setosa             |   |   |
versicolor         |   |   |
virginica          |   |   |
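One possible way to organise the counting is sketched below in plain Python. It is only a sketch under stated assumptions: the helper names `p_norm` and `count_within`, the `(label, features)` row layout, and the toy rows are all hypothetical. The toy rows are not values from iris.csv. The threshold is a parameter, so the same function can be reused with a different cut-off.

```python
from collections import defaultdict


def p_norm(vector, p):
    """Return the p-norm of a sequence of numbers (intended for p >= 1)."""
    return sum(abs(x) ** p for x in vector) ** (1.0 / p)


def count_within(rows, ps=(1, 2, 3), threshold=1.5):
    """Count, per label and per p, the points whose difference from the
    label's mean has a p-norm less than or equal to the threshold.

    rows: iterable of (label, features) pairs (hypothetical layout).
    Returns a dict of the form {label: {p: count}}.
    """
    # Group the feature vectors by label (flower type).
    groups = defaultdict(list)
    for label, features in rows:
        groups[label].append(list(features))

    counts = {}
    for label, points in groups.items():
        dim = len(points[0])
        # Component-wise mean of all points that share this label.
        mean = [sum(pt[i] for pt in points) / len(points) for i in range(dim)]
        counts[label] = {}
        for p in ps:
            counts[label][p] = sum(
                1
                for pt in points
                if p_norm([a - m for a, m in zip(pt, mean)], p) <= threshold
            )
    return counts


# Toy 4-dimensional rows with made-up labels, for illustration only.
toy_rows = [
    ("a", [1.0, 1.0, 1.0, 1.0]),
    ("a", [2.0, 2.0, 2.0, 2.0]),
    ("b", [0.0, 0.0, 0.0, 0.0]),
    ("b", [4.0, 0.0, 0.0, 0.0]),
]
toy_counts = count_within(toy_rows)
```

For the real data, the rows would be built from iris.csv (for example with the standard `csv` module), using whatever column names that file actually has.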
FYI: Depending on which items were selected for this assignment in this semester, there may or may not be another question that tells you to do the exact
same thing with a different threshold. So create your visual accordingly.
Note: The 1-norm is the Manhattan distance, which is quite relevant in transportation operations in cities. The 2-norm is the usual straight-line distance. In
analytics work, it is common to generalise well-known metrics.
Fig: 1