Question

5.4 Exercise, Problem 8

8. We will now perform cross-validation on a simulated data set.

(a) Generate a simulated data set as follows:

> set.seed(1)

> x=rnorm(100)

> y=x-2*x^2+rnorm(100)

In this data set, what is n and what is p? Write out the model used to generate the data in equation

form.

(b) Create a scatterplot of X against Y. Comment on what you find.

(c) Set a random seed, and then compute the LOOCV errors that result from fitting the following four

models using least squares:

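i. $Y = \beta_0 + \beta_1 X + \epsilon$

ii. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$

iii. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$

iv. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 X^4 + \epsilon$.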

Note you may find it helpful to use the data.frame() function to create a single data set containing

both X and Y.
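
One possible way to set up part (c) is sketched below in base R. It is only an illustration, not a prescribed solution: the data frame name `sim` and the helper `loocv_error` are hypothetical, and the sketch relies on the leverage shortcut for least squares, in which the LOOCV error equals the mean of the squared residuals after each is divided by one minus its leverage. Using `cv.glm()` from the `boot` package on a `glm()` fit would be an alternative route.

```r
# Sketch only: LOOCV error for polynomial least squares fits of degree 1 to 4.
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
sim <- data.frame(x = x, y = y)  # 'sim' is a hypothetical name

# For a least squares fit, the LOOCV estimate equals
# mean(((y_i - yhat_i) / (1 - h_i))^2), where h_i is the leverage of
# observation i, so the model does not have to be refitted n times.
loocv_error <- function(degree, data) {
  fit <- lm(y ~ poly(x, degree), data = data)
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

cv_errors <- sapply(1:4, loocv_error, data = sim)
names(cv_errors) <- paste0("degree_", 1:4)
print(cv_errors)
```

Because this shortcut involves no random splitting, changing the seed set before the call affects only data that is generated afterwards, which is worth keeping in mind for part (d).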

(d) Repeat (c) using another random seed and report your results. Are your results the same as what

you got in (c)? Why?

(e) Which of the models in (c) had the smallest LOOCV error? Is this what you expected? Explain your

answer.

(f) Comment on the statistical significance of the coefficient estimates that result from fitting each of

the models in (c) using least squares. Do these results agree with the conclusions drawn based on the

cross-validation results?
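
For part (f), the p-values of the coefficient estimates could be collected along the following lines. This is a minimal sketch under assumptions: the names `sim` and `p_values` are hypothetical, and `raw = TRUE` is chosen so that the coefficients correspond to the powers of X written in (c) rather than to orthogonal polynomials.

```r
# Sketch only: p-values of the least squares coefficients for degrees 1 to 4.
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
sim <- data.frame(x = x, y = y)  # 'sim' is a hypothetical name

# raw = TRUE keeps the coefficients on the X, X^2, ... scale used in (c).
p_values <- lapply(1:4, function(degree) {
  fit <- lm(y ~ poly(x, degree, raw = TRUE), data = sim)
  coef(summary(fit))[, "Pr(>|t|)"]  # one p-value per coefficient
})
names(p_values) <- paste0("degree_", 1:4)
print(p_values)
```

Comparing which terms have small p-values in each fit with the ordering of the LOOCV errors is one way to approach the question of whether the two sets of results agree.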
