To judge the quality of each approach, split the dataset into a training set and a testing set. The training set should consist of 400 observations; use the remaining observations for testing. Before training any of these algorithms, it is a good idea to "standardize" the data. By this, I mean that you should take each feature (i.e., each column of the matrix X), subtract off its mean, and divide by its standard deviation so that it has zero mean and unit variance. Otherwise, the regularized methods will implicitly place bigger penalties on features that just happen to be scaled to have small variance. You should determine how to "standardize" your data by computing the appropriate shift and scale for each feature using only the training data, and then apply this same transformation to both the training data and the testing data so that your learned function can readily be applied to the test set.
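As a rough illustration of the split-then-standardize step, here is a minimal NumPy sketch. It assumes the data are already loaded as arrays X and y, and it simply takes the first 400 rows for training; the handout does not say how the 400 observations should be chosen, so shuffling first may be preferable if the rows are ordered. The function name split_and_standardize and the synthetic data are hypothetical, used only for demonstration.

```python
import numpy as np

def split_and_standardize(X, y, n_train=400):
    """Split into train/test and standardize with training statistics only.

    Assumption: the first n_train rows form the training set.
    """
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    # Shift and scale are computed from the training data alone.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant columns

    # The same transformation is applied to both sets.
    X_train_std = (X_train - mu) / sigma
    X_test_std = (X_test - mu) / sigma
    return X_train_std, y_train, X_test_std, y_test

# Synthetic stand-in for the real dataset (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1], size=(500, 3))
y = rng.normal(size=500)

Xtr, ytr, Xte, yte = split_and_standardize(X, y)
```

Note that the test columns will generally not have exactly zero mean and unit variance after this transformation, which is expected: the test data must not influence the shift or scale.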
1. First, I would like you to evaluate the performance of least squares. You should implement this yourself using the equation we derived in class. Report the performance of your algorithm in terms of mean-squared error on the test set, i.e., the average of the squared differences between the predicted and actual responses over the test observations.
2. Next, using the formula derived in class, implement your own version of ridge regression. You will need to set the free parameter λ. You should do this using the training data in whatever manner you like (e.g., via a holdout set), but you should not allow the testing dataset to influence your choice of λ. Report the value of λ selected and the performance of your algorithm in terms of mean-squared error on the test set.
3. Finally, I would like you to evaluate the performance of the LASSO. You do not need to
implement this yourself. Instead, you can use scikit-learn's built-in solver via
    from sklearn import linear_model
    reg = linear_model.Lasso(alpha=???)  # Fill in alpha
    reg.fit(Xtrain, ytrain)
    reg.predict(Xtest)
Above, alpha corresponds to the λ parameter from the lecture notes. As in part 2, you will
need to do something principled to choose a good value for this parameter. Report the value
of alpha used in your code, the performance of your algorithm in terms of mean-squared
error on the test set, and the number of nonzeros in the learned coefficient vector. (You can get this vector via reg.coef_.)