
Question

In this exercise we are going to generate artificial data in order to understand the properties of the OLS estimator. Consider the following linear model: $Y_{i}=\beta_{0}+\beta_{1} X_{i}+u_{i}$, where $\beta_0 = 3$ and $\beta_1 = 1.7$. Fix the sample size to $N = 100$. Using R, generate a random sample of size $N$ where each $X_i$ is drawn independently from a normal distribution with mean 3 and variance 4. Similarly, generate a random sample of size $N$ where each $u_i$ is drawn independently from a normal distribution with mean 0 and variance 1. Use the linear model above to compute the values of $Y_i$ from the artificially generated values $\{X_i, u_i\}$. At the end of the process, you have a random sample of $N = 100$ observations $\{Y_i, X_i\}$.

a) Produce a scatter plot of $Y_i$ against $X_i$. Do you observe any relationship between the two variables? If so, is it surprising?
b) Does assumption 1 of OLS hold in your sample? Prove it formally.
c) Compute the OLS estimate of $\beta_1$ for your sample (remember $\hat{\beta}_1 = s_{XY} / s_X^2$). What value do you obtain? What is the error of the OLS estimator (i.e. $\hat{\beta}_1 - \beta_1$)?
d) What is the source of the error found in point (c)? Can you get rid of it?
e) Generate 200 other random samples of size $N$ from the same model. For each one, estimate $\beta_1$ with OLS and save the value obtained. Plot the distribution of $\hat{\beta}_1$ across the 200 samples using a histogram (make sure you have at least 10 bins in your histogram). Can you use that histogram as a way to think about the sampling distribution of $\hat{\beta}_1$? Where does the dispersion in the estimated values of $\beta_1$ come from?
f) Does $\hat{\beta}_1$ appear to be unbiased? If so, why?
g) Now, repeat point (e) increasing the sample size from $N = 100$ to $N = 1000$. What happens to the distribution of $\hat{\beta}_1$ with a larger sample size? Comment on what happens to its mean and to its variance.
h) Are the findings in point (g) related to the consistency property of the OLS estimator?
i) Prove formally that the OLS estimator of $\beta_1$ is consistent when the data are generated in the same way as we did in this exercise.
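
One possible way to set up the simulation in R is sketched below. It is only a starting point under stated assumptions: the seed, the helper name `simulate_slope`, the variable names, and the choice of 20 histogram bins are all illustrative and not part of the exercise. Note that `rnorm()` is parameterized by the standard deviation, so a variance of 4 corresponds to `sd = 2`.

```r
# Illustrative sketch; names and the seed are assumptions, not given in the exercise.
set.seed(123)  # arbitrary seed, used only so the draws are reproducible

beta0 <- 3
beta1 <- 1.7

# One sample of size N = 100 from the model Y = beta0 + beta1 * X + u
N <- 100
x <- rnorm(N, mean = 3, sd = sqrt(4))  # variance 4 means sd = 2
u <- rnorm(N, mean = 0, sd = 1)
y <- beta0 + beta1 * x + u

# Part (a): scatter plot of Y against X
plot(x, y, xlab = "X", ylab = "Y", main = "Simulated sample, N = 100")

# Part (c): OLS slope as sample covariance over sample variance
b1_hat <- cov(x, y) / var(x)
b1_err <- b1_hat - beta1  # estimation error in this particular sample

# Sanity check: the formula agrees with the slope reported by lm()
stopifnot(isTRUE(all.equal(b1_hat, unname(coef(lm(y ~ x))[2]))))

# Hypothetical helper: draw a fresh sample of size n and return its OLS slope
simulate_slope <- function(n) {
  xs <- rnorm(n, mean = 3, sd = 2)
  us <- rnorm(n, mean = 0, sd = 1)
  ys <- beta0 + beta1 * xs + us
  cov(xs, ys) / var(xs)
}

# Parts (e) and (g): 200 replications for each sample size
slopes_100  <- replicate(200, simulate_slope(100))
slopes_1000 <- replicate(200, simulate_slope(1000))

# Passing an explicit vector of 21 break points yields exactly 20 bins
hist(slopes_100,
     breaks = seq(min(slopes_100), max(slopes_100), length.out = 21),
     xlab = "Estimated slope", main = "OLS slope estimates, N = 100")
hist(slopes_1000,
     breaks = seq(min(slopes_1000), max(slopes_1000), length.out = 21),
     xlab = "Estimated slope", main = "OLS slope estimates, N = 1000")

# Summaries to compare across sample sizes
c(mean_100 = mean(slopes_100), var_100 = var(slopes_100),
  mean_1000 = mean(slopes_1000), var_1000 = var(slopes_1000))
```

Because $Y_i$ is built from $X_i$ and $u_i$, the error stored in `b1_err` is algebraically equal to `cov(x, u) / var(x)`, which may be a useful quantity to inspect when thinking about parts (d) and (i).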