Problem 8: In this exercise, we will generate simulated data, and will then use this data to
perform best subset selection.
(a) Use the rnorm() function to generate a predictor $X$ of length $n = 100$, as well as a noise vector $\epsilon$ of
length $n = 100$.
(b) Generate a response vector $Y$ of length $n = 100$ according to the model
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$,
where $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are constants of your choice.
(c) Use the regsubsets() function to perform best subset selection in order to choose the best model
containing the predictors $X, X^2, \ldots, X^{10}$. What is the best model obtained according to $C_p$, BIC, and
adjusted $R^2$? Show some plots to provide evidence for your answer, and report the coefficients of the
best model obtained. Note you will need to use the data.frame() function to create a single data set
containing both $X$ and $Y$.
(d) Repeat (c), using forward stepwise selection and also using backwards stepwise selection. How
does your answer compare to the results in (c)?
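
One possible way to set up parts (a) through (d) is sketched below in R. It is a starting point under stated assumptions, not a reference solution: it assumes the leaps package (which provides regsubsets()) is installed, and the seed, the coefficient values in beta, and the helper name select_models are illustrative choices that do not come from the exercise.

```r
# Sketch only: assumes the leaps package is installed (install.packages("leaps")).
library(leaps)

set.seed(1)            # assumed seed, used only so the run is reproducible
n   <- 100
x   <- rnorm(n)        # (a) predictor X
eps <- rnorm(n)        # (a) noise vector

# (b) The coefficients are arbitrary illustrative values; pick your own.
beta <- c(2, 3, -1, 0.5)
y <- beta[1] + beta[2] * x + beta[3] * x^2 + beta[4] * x^3 + eps

# (c) A single data set holding both X and Y. poly(..., raw = TRUE) expands
# x into the raw powers x, x^2, ..., x^10.
dat  <- data.frame(x = x, y = y)
form <- y ~ poly(x, 10, raw = TRUE)

# Hypothetical helper: fit one selection method and record the model size
# preferred by each criterion (smallest Cp, smallest BIC, largest adjusted R^2).
select_models <- function(method) {
  fit <- regsubsets(form, data = dat, nvmax = 10, method = method)
  s   <- summary(fit)
  list(fit = fit, summary = s,
       best = c(cp    = which.min(s$cp),
                bic   = which.min(s$bic),
                adjr2 = which.max(s$adjr2)))
}

best_subset <- select_models("exhaustive")  # (c) best subset selection
forward     <- select_models("forward")     # (d) forward stepwise
backward    <- select_models("backward")    # (d) backward stepwise

# Evidence plots for (c): each criterion against the number of predictors.
par(mfrow = c(1, 3))
plot(best_subset$summary$cp,    type = "b", xlab = "Number of predictors", ylab = "Cp")
plot(best_subset$summary$bic,   type = "b", xlab = "Number of predictors", ylab = "BIC")
plot(best_subset$summary$adjr2, type = "b", xlab = "Number of predictors", ylab = "Adjusted R^2")

# Coefficients of the model size chosen by BIC under best subset selection.
coef(best_subset$fit, best_subset$best[["bic"]])
```

Comparing best_subset$best with forward$best and backward$best, and the corresponding coef() output, gives the comparison asked for in (d). The three criteria need not agree on a model size, so it is worth reporting each one rather than only BIC.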