
Question 6

(This relies on the file iris.csv that can be found in LumiNUS. Use that file to avoid problems associated with different versions.)

p-norms are used to measure the distance between multi-dimensional data points and the origin. For an n-dimensional data point $x = (x_1, x_2, \ldots, x_n)$, the p-norm is given by:

$$\|x\|_p = \left( \sum_{k=1}^{n} |x_k|^p \right)^{1/p}$$
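As a quick check of the formula, here is a minimal Python sketch of the p-norm applied to the vector (3, -4). The function name `p_norm` is our own illustration, not something from the course materials.

```python
def p_norm(x, p):
    """Return (sum of |x_k|**p) ** (1/p) for a sequence of numbers x."""
    if p < 1:
        # For p < 1 the formula no longer satisfies the triangle inequality.
        raise ValueError("p must be >= 1 for this to be a norm")
    return sum(abs(xk) ** p for xk in x) ** (1.0 / p)


v = (3, -4)
print(p_norm(v, 1))  # 7.0 (Manhattan distance: 3 + 4)
print(p_norm(v, 2))  # 5.0 (straight-line distance: sqrt(9 + 16))
print(p_norm(v, 3))  # about 4.498 (cube root of 27 + 64 = 91)
```

Note how the same vector gets a smaller "length" as p grows; this is why the counts for a fixed threshold can differ between the 1-, 2- and 3-norm columns.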

See:

https://en.wikipedia.org/wiki/Lp_space#The_p-norm_in_finite_dimensions

Here we will, for each type of flower (setosa, versicolor, and virginica), measure the distance of each data point from the mean for its flower type, using the 1-norm, 2-norm, and 3-norm. The mean is taken for each of the factors: Sepal Length, Sepal Width, Petal Length, and Petal Width. So each data point is 4-dimensional, and the difference between each data point and its mean is also 4-dimensional.
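A minimal plain-Python sketch of this step is below. It assumes the CSV has a header row with R-style column names `Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width` and `Species`; that is an assumption, so check your copy of iris.csv and edit `FEATURES` and `GROUP` to match. The helper names are hypothetical.

```python
import csv
from collections import defaultdict
from statistics import fmean

# ASSUMPTION: R-style column names; edit these to match the header row
# of the iris.csv file you downloaded.
FEATURES = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
GROUP = "Species"


def load_rows(path="iris.csv"):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def species_means(rows):
    """Return {species: [mean of each of the 4 features]}."""
    by_species = defaultdict(list)
    for row in rows:
        by_species[row[GROUP]].append([float(row[f]) for f in FEATURES])
    # zip(*points) regroups the 4-d points into one tuple per feature.
    return {sp: [fmean(col) for col in zip(*points)]
            for sp, points in by_species.items()}


def differences_from_mean(rows):
    """Return one (species, 4-d difference vector) pair per row, where the
    difference is taken component-wise from that species' mean vector."""
    means = species_means(rows)
    return [
        (row[GROUP],
         [float(row[f]) - m for f, m in zip(FEATURES, means[row[GROUP]])])
        for row in rows
    ]
```

For example, `differences_from_mean(load_rows("iris.csv"))` returns one (species, difference vector) pair per flower; each difference vector is what the p-norm is applied to.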

The number of data points where the (component-wise) difference of the data point from the mean for its flower type has a p-norm less than or equal to 1.5 is:

Type of Flower \ p | 1 | 2 | 3
setosa             |   |   |
versicolor         |   |   |
virginica          |   |   |

FYI: Depending on which items were selected for this assignment this semester, there may or may not be another question that asks you to do exactly the same thing with a different threshold. So create your visual accordingly.
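Since the threshold may change between questions, one way to follow this advice in Python is to make the threshold a parameter. The sketch below is self-contained (it repeats a small p-norm helper) and again assumes R-style column names in iris.csv; `count_within` and `load_rows` are hypothetical names, not part of any course material.

```python
import csv
from collections import defaultdict
from statistics import fmean

# ASSUMPTION: R-style column names; change them to match your iris.csv.
FEATURES = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
GROUP = "Species"


def p_norm(x, p):
    """p-norm of a sequence: (sum of |x_k|**p) ** (1/p)."""
    return sum(abs(xk) ** p for xk in x) ** (1.0 / p)


def count_within(rows, threshold=1.5, ps=(1, 2, 3)):
    """Return {species: {p: count}}, counting rows whose component-wise
    difference from their species' mean vector has p-norm <= threshold."""
    points = defaultdict(list)
    for row in rows:
        points[row[GROUP]].append([float(row[f]) for f in FEATURES])

    table = {}
    for species, pts in points.items():
        mean = [fmean(col) for col in zip(*pts)]  # per-feature mean
        diffs = [[v - m for v, m in zip(pt, mean)] for pt in pts]
        table[species] = {
            p: sum(1 for d in diffs if p_norm(d, p) <= threshold) for p in ps
        }
    return table


def load_rows(path="iris.csv"):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))
```

For example, `count_within(load_rows(), threshold=1.5)` gives a nested dict that fills the table above, and rerunning it with another threshold answers the companion question. Because the measurements are stored as floating-point numbers, a distance that should equal the threshold exactly can land just above or below it, so points very close to the boundary may be worth inspecting.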

Note: The 1-norm is the Manhattan distance, which is quite relevant to transportation operations in cities. The 2-norm is the usual straight-line distance. In analytics work, it is common to generalise well-known metrics.

Fig: 1