Question

In this project you will implement gradient descent for linear regression on Spark using Scala. The gradient descent update for linear regression is $\mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_j \bigl(\mathbf{w}_i^{\top}\mathbf{x}_j - y_j\bigr)\,\mathbf{x}_j$.
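For context (standard least-squares calculus, not stated in the assignment text): this update is simply a gradient step on the squared-error objective, which is where the per-observation summand of Part 1 comes from.

```latex
% Squared-error objective (the 1/2 keeps the gradient free of a factor of 2)
\[
  f(\mathbf{w}) = \frac{1}{2}\sum_{j=1}^{n}\bigl(\mathbf{w}^{\top}\mathbf{x}_j - y_j\bigr)^{2}
\]
% Its gradient is the sum of the per-observation summands,
% so one descent step with learning rate \alpha_i is
\[
  \nabla f(\mathbf{w}) = \sum_{j=1}^{n}\bigl(\mathbf{w}^{\top}\mathbf{x}_j - y_j\bigr)\mathbf{x}_j,
  \qquad
  \mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i\,\nabla f(\mathbf{w}_i)
\]
```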

Part 1 (20 points). First, implement a function that computes the summand $(\mathbf{w}^{\top}\mathbf{x} - y)\,\mathbf{x}$, and test this function on two examples. Use Vectors to create the dense vector w and use LabeledPoint to create a training dataset with 3 features. You can also use Breeze to do the dot product.

Part 2 (20 points). Implement a function that takes in the vector w and an observation's LabeledPoint and returns a (label, prediction) tuple. Note that we can predict by computing the dot product between the weights and an observation's features. Test this function on a LabeledPoint RDD.

Part 3 (20 points). Implement a function to compute RMSE given an RDD of (label, prediction) tuples, $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. Test this function on an example RDD.

Part 4 (40 points). Implement a gradient descent function for linear regression. The function will take trainData (an RDD of LabeledPoint) as an argument and return a tuple of the weights and the training errors. Reuse the code that you have written in Parts 1 and 2. Initialize the elements of the vector w to 0 and α to 1. Update the weights using $\mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_j (\mathbf{w}_i^{\top}\mathbf{x}_j - y_j)\,\mathbf{x}_j$, where the value of α in the ith iteration is $\alpha_i = \frac{\alpha}{n\sqrt{i}}$. Run it for 5 iterations and print the results.

Bonus (20 points). Implement the closed form solution $\mathbf{w} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}$. Test the function on an example RDD. You can assume X is a DenseMatrix.
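A minimal sketch for Part 1, assuming a spark-shell session with MLlib's RDD API and Breeze on the classpath; the name gradientSummand and the two test points are illustrative choices, not prescribed by the assignment.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import breeze.linalg.{DenseVector => BDV}

// Computes the summand (w^T x - y) x for one observation,
// using Breeze for the dot product and the scalar-vector product.
def gradientSummand(w: Vector, lp: LabeledPoint): BDV[Double] = {
  val wB = BDV(w.toArray)            // weights as a Breeze dense vector
  val xB = BDV(lp.features.toArray)  // features as a Breeze dense vector
  xB * (wB.dot(xB) - lp.label)       // (w.x - y) * x
}

// Two small test examples with 3 features each.
val w = Vectors.dense(1.0, 1.0, 1.0)
val lp1 = LabeledPoint(2.0, Vectors.dense(3.0, 1.0, 4.0))
val lp2 = LabeledPoint(1.0, Vectors.dense(1.0, 2.0, 2.0))
println(gradientSummand(w, lp1))   // (8 - 2) * [3,1,4] = [18.0, 6.0, 24.0]
println(gradientSummand(w, lp2))   // (5 - 1) * [1,2,2] = [4.0, 8.0, 8.0]
```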
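A sketch for Part 2 along the same lines; getLabeledPrediction, the RDD exampleData, and the test weights are hypothetical names and values, and sc is the SparkContext that spark-shell provides.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import breeze.linalg.{DenseVector => BDV}

// Returns a (label, prediction) tuple, where the prediction is w^T x.
def getLabeledPrediction(w: BDV[Double], lp: LabeledPoint): (Double, Double) =
  (lp.label, w.dot(BDV(lp.features.toArray)))

// Test on a small LabeledPoint RDD with 3 features per observation.
val exampleData = sc.parallelize(Seq(
  LabeledPoint(0.5, Vectors.dense(0.1, 0.2, 0.3)),
  LabeledPoint(0.4, Vectors.dense(0.2, 0.1, 0.1)),
  LabeledPoint(0.3, Vectors.dense(0.0, 0.1, 0.2))
))
val wP = BDV(1.0, 1.0, 1.0)
exampleData.map(lp => getLabeledPrediction(wP, lp)).collect().foreach(println)
// approximately (0.5,0.6), (0.4,0.4), (0.3,0.3)
```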
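A sketch for Part 3; calcRMSE and the example tuples are illustrative.

```scala
import org.apache.spark.rdd.RDD

// RMSE = sqrt( (1/n) * sum_i (label_i - prediction_i)^2 )
def calcRMSE(labelsAndPreds: RDD[(Double, Double)]): Double = {
  val n = labelsAndPreds.count()
  math.sqrt(labelsAndPreds.map { case (l, p) => (l - p) * (l - p) }.sum() / n)
}

// Errors are 1, -1 and 2, so RMSE = sqrt((1 + 1 + 4) / 3) ≈ 1.414
val exampleLP = sc.parallelize(Seq((3.0, 2.0), (1.0, 2.0), (2.0, 4.0)))
println(calcRMSE(exampleLP))
```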
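A sketch for Part 4, assuming the Part 1-3 sketches above (gradientSummand, getLabeledPrediction, calcRMSE, exampleData) are already in scope; linregGradientDescent is a hypothetical name, and the learning-rate schedule follows the assignment's $\alpha_i = \alpha/(n\sqrt{i})$ with w initialized to zeros and α = 1.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import breeze.linalg.{DenseVector => BDV}

// Gradient descent for linear regression; returns (final weights, per-iteration training RMSE).
def linregGradientDescent(trainData: RDD[LabeledPoint],
                          numIters: Int): (BDV[Double], Array[Double]) = {
  val n = trainData.count()
  val d = trainData.first().features.size
  var w = BDV.zeros[Double](d)               // w initialized to all zeros
  val alpha = 1.0                            // base learning rate
  val trainingErrors = new Array[Double](numIters)

  for (i <- 1 to numIters) {
    // Training RMSE under the current weights (reuses Parts 2 and 3).
    val labelsAndPreds = trainData.map(lp => getLabeledPrediction(w, lp))
    trainingErrors(i - 1) = calcRMSE(labelsAndPreds)

    // Full gradient: sum of the Part 1 summands over the training set.
    val gradient = trainData
      .map(lp => gradientSummand(Vectors.dense(w.toArray), lp))
      .reduce(_ + _)

    // alpha_i = alpha / (n * sqrt(i)), then one descent step.
    val alphaI = alpha / (n * math.sqrt(i))
    w = w - gradient * alphaI
  }
  (w, trainingErrors)
}

// Run for 5 iterations on the example RDD and print the results.
val (weights, errors) = linregGradientDescent(exampleData, 5)
println(weights)
println(errors.mkString(", "))
```

On such a tiny, untuned example the errors shrink only slowly; the point of the sketch is the structure (map over the RDD for predictions and summands, reduce to the full gradient, update w on the driver), not the converged values.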
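A sketch for the Bonus, reusing exampleData from the Part 2 sketch; collecting to the driver and inverting $X^{\top}X$ directly is only reasonable because the example is tiny.

```scala
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, inv}

// Closed-form solution w = (X^T X)^{-1} X^T y, with X a Breeze DenseMatrix.
def closedFormSolution(X: BDM[Double], y: BDV[Double]): BDV[Double] =
  inv(X.t * X) * (X.t * y)

// Build X (one row per observation) and y from the example RDD.
val collected = exampleData.collect()
val d = collected.head.features.size
// Breeze's basic constructor is column-major, so build the d x n matrix and transpose it.
val X = new BDM(d, collected.length, collected.flatMap(_.features.toArray)).t
val y = BDV(collected.map(_.label))
println(closedFormSolution(X, y))   // approximately DenseVector(2.0, -3.0, 3.0)
```

Since this example's X happens to be square and invertible, the closed form coincides with solving X w = y exactly, which is a handy sanity check.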
