
In this problem we will compare the performance of traditional least squares, ridge regression, and the LASSO on a real-world dataset. We will use the "Boston House Prices" dataset, which contains the median sale price (as of some point in the 1970s, when the dataset was created) of owner-occupied homes in about 500 different neighborhoods in the Boston area, along with 13 features for each home that might be relevant. These features include factors such as measures of the crime rate; measures of school quality; various measures of density; proximity to things like highways, major employment centers, and the Charles River; pollution levels; etc.¹

To judge the quality of each approach, split the dataset into a training set and a testing set. The training set should consist of 400 observations, and use the remaining observations for testing. Before training any of these algorithms, it is a good idea to "standardize" the data. By this, I mean that you should take each feature (i.e., each column of the matrix X) and subtract off its mean and divide by the standard deviation to make it zero mean and unit variance. Otherwise, the regularized methods will implicitly be placing bigger penalties on using features which just happen to be scaled to have small variance. You should determine how to "standardize" your training data by appropriately shifting/scaling each feature using only the training data, and then apply this transformation to both the training data and the testing data so that your learned function can readily be applied to the test set.
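The train-only standardization described above might be sketched as follows. This is only an illustration with synthetic stand-in data (the array shapes mimic the Boston dataset; the names Xtrain/Xtest are assumptions, not part of the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(506, 13))  # synthetic stand-in for the 13 features
Xtrain, Xtest = X[:400], X[400:]

# Compute the shift/scale from the TRAINING data only...
mu = Xtrain.mean(axis=0)
sigma = Xtrain.std(axis=0)

# ...then apply the SAME transformation to both splits.
Xtrain_s = (Xtrain - mu) / sigma
Xtest_s = (Xtest - mu) / sigma

print(np.allclose(Xtrain_s.mean(axis=0), 0.0))  # True: training features are zero mean
print(np.allclose(Xtrain_s.std(axis=0), 1.0))   # True: training features are unit variance
```

Note that the test features will generally *not* have exactly zero mean and unit variance, which is expected: the point is that the learned function sees test data transformed the same way as the training data.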

1. First, I would like you to evaluate the performance of least squares. You should implement this yourself using the equation we derived in class. Report the performance of your algorithm in terms of mean-squared error on the test set, i.e.,

MSE = (1/N_test) Σ_{i ∈ test set} (ŷ_i − y_i)²
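As a sanity check for your implementation, the closed-form least-squares fit and test MSE might look like the following sketch. The data here is synthetic (the true weights and noise level are made up for illustration); the equation from class is assumed to be the normal equations w = (AᵀA)⁻¹Aᵀy, solved below via `lstsq` for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 13
Xtrain = rng.normal(size=(n, d))
w_true = rng.normal(size=d)                     # synthetic ground-truth weights
ytrain = Xtrain @ w_true + 0.1 * rng.normal(size=n)
Xtest = rng.normal(size=(100, d))
ytest = Xtest @ w_true + 0.1 * rng.normal(size=100)

# Append a column of ones so the model includes an intercept term.
A = np.hstack([Xtrain, np.ones((n, 1))])
# Least-squares solution of A w = y (equivalent to the normal equations).
w_hat, *_ = np.linalg.lstsq(A, ytrain, rcond=None)

Atest = np.hstack([Xtest, np.ones((len(Xtest), 1))])
mse = np.mean((Atest @ w_hat - ytest) ** 2)
print(mse)  # small, on the order of the noise variance
```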

2. Next, using the formula derived in class, implement your own version of ridge regression. You will need to set the free parameter λ. You should do this using the training data in whatever manner you like (e.g., via a holdout set), but you should not allow the testing dataset to influence your choice of λ. Report the value of λ selected and the performance of your algorithm in terms of mean-squared error on the test set.
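One principled way to choose λ without touching the test set is a holdout split of the training data, as the problem suggests. A minimal sketch, assuming the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy and a hand-picked candidate grid (the grid values are an assumption, not prescribed by the assignment):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 400, 13
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)       # synthetic training data

# Split the training data into a fit set and a holdout set for choosing lambda.
Xfit, yfit = X[:300], y[:300]
Xval, yval = X[300:], y[300:]

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

lams = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
best_lam, best_err = None, np.inf
for lam in lams:
    w = ridge_fit(Xfit, yfit, lam)
    err = np.mean((Xval @ w - yval) ** 2)       # holdout MSE, not test MSE
    if err < best_err:
        best_lam, best_err = lam, err

print(best_lam, best_err)
```

After selecting λ this way, you would refit on all 400 training observations with the chosen λ and only then evaluate on the test set.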

3. Finally, I would like you to evaluate the performance of the LASSO. You do not need to implement this yourself. Instead, you can use scikit-learn's built-in solver via

from sklearn import linear_model
reg = linear_model.Lasso(alpha = ???)  # Fill in alpha
reg.fit(Xtrain, ytrain)
reg.predict(Xtest)

Above, alpha corresponds to the λ parameter from the lecture notes. As in part 2, you will need to do something principled to choose a good value for this parameter. Report the value of alpha used in your code, the performance of your algorithm in terms of mean-squared error, and the number of nonzeros in the learned coefficient vector. (You can get it via reg.coef_.)