Question

Part 2: REGRESSION ANALYSIS
(a) Run a regression to determine the impact of the 2013 unemployment rate (UnempRate2013) on per capita income (PerCapitaInc) in a county. What is the estimated slope? Explain what this number means in words in terms of the unemployment rate and in terms of per capita income. Also indicate whether the relationship is statistically significant at the 10%, 5%, and 1% levels. For this first pass, use homoskedasticity-only standard errors.
(b) Re-run the regression from part (a), but this time use heteroskedasticity-robust standard errors. Are your coefficients the same as in part (a)? Why? Are the standard errors of your betas the same as in part (a)? Why?
(c) Run the same regression as in part (b), but now also include the following additional regressors: percentage of the population that is college-educated (Ed5CollegePlusPct), percentage of the population that is black (BlackNonHispanicPct2010), and percentage of the population that is Hispanic (HispanicPct2010). Now, what is the estimated impact of the unemployment rate in 2013 on per capita income? Also indicate whether the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedasticity-robust standard errors.
(d) Provide economic/econometric intuition as to why the estimated impact of the unemployment rate on per capita income changed between parts (b) and (c). Note that I am asking you to think about the context (and hence the "story" behind these data).
(e) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 found in Part 2(c). Write out your calculations. Clearly indicate how this confidence interval relates to whether UnempRate2013 is statistically significant or not in this context by relating your answer to your constructed confidence interval.
(f) You recall from Part 1 that the means of both per capita income and the unemployment rate in 2013 are quite different across metro and nonmetro areas. You therefore want to explore this in more detail. Run the regression from Part 2(c) using only metro areas in 2013 (i.e., Metro2013==1). [Hint: You need to restrict the data based on a criterion before running the regression.] Now, what is the estimated effect of the 2013 unemployment rate on per capita income? Also indicate whether the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedasticity-robust standard errors.
(g) Now, run the regression from Part 2(c) using only nonmetro areas in 2013 (Metro2013==0). [Hint: You need to restrict the data based on a criterion before running the regression.] Now, what is the estimated effect of the 2013 unemployment rate on per capita income? Also indicate whether the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedasticity-robust standard errors.
(h) What did you learn from the comparison between the results in parts (f) and (g)? Explain your answer. Note that I am again asking you to think about the context (and hence the "story" behind these data).
(i) Return to the full sample. Now, run a regression to determine the impact of changing the percentage of the population that is college-educated (Ed5CollegePlusPct) on per capita income (PerCapitaInc) in a county. Include controls for the unemployment rate in 2013 (UnempRate2013), percentage of the population that is black (BlackNonHispanicPct2010), and percentage of the population that is Hispanic (HispanicPct2010), and now also include a dummy variable for metro status (Metro2013). Now, what is the estimated impact of the percentage with a college education on per capita income? Also indicate whether the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedasticity-robust standard errors.
(j) It is quite common in econometrics to model income variables nonlinearly. Construct a new variable and call it "loginc" or whatever you prefer, where loginc = ln(PerCapitaInc). Provide summary statistics for this new variable. (Hint: Think back to how you constructed summary statistics in Part 1.)
(k) Now run a regression model with loginc as the dependent variable (and we are also going to start controlling for metro status in addition to the other controls). In other words, use the unemployment rate in 2013 (UnempRate2013) as the main regressor, while also including the other regressors: percentage college-educated (Ed5CollegePlusPct), percentage non-Hispanic black in 2010 (BlackNonHispanicPct2010), percentage Hispanic in 2010 (HispanicPct2010), and metro status in 2013 (Metro2013). Now, what is the estimated effect of UnempRate2013 in words? Also indicate whether the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedasticity-robust standard errors. [Be careful not to leave out any variables in your regression specification in Stata.]
(l) What is the null hypothesis corresponding to the F-statistic reported in the output for the regression in part (k)? What is the conclusion of the reported F-test? Explain (i.e., do you reject or fail to reject the null hypothesis stated above, and how do you know this?).
(m) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 in Part 2(k). As usual, write out your calculations. Clearly indicate how this confidence interval relates to whether UnempRate2013 is statistically significant or not in this context by relating your answer to your constructed confidence interval.
(n) Discuss what the standard error of the regression (SER), R-squared, and adjusted R-squared in part (k) are telling you in terms of the numbers that you have found. Using what you know about the difference between the two formulas, explain specifically why the R² and adjusted R² statistics are so similar in this case.
(o) Use an F-test to test the joint significance of the additional regressors: Ed5CollegePlusPct, BlackNonHispanicPct2010, HispanicPct2010, and Metro2013. Find this test statistic and clearly indicate the conclusions of the test.
(p) If you had more time to study this question and/or more or different data, what would you suggest doing next? Propose additional variables to add and/or different specifications to try and give specific reasons why you are suggesting these. Answers will vary for this part of the problem.
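
The mechanics behind parts (a) and (b), and the confidence-interval arithmetic asked for in parts (e) and (m), can be sketched roughly as follows. This is only an illustration under stated assumptions, not the assignment's solution: it uses plain Python rather than Stata, the data are synthetic with made-up parameter values (not the county dataset), the helper names simple_ols and conf_int_95 are hypothetical, and it covers only the one-regressor case. The robust standard error uses the HC1 small-sample correction, which is the variant Stata's robust option applies, and the interval uses the large-sample normal critical value 1.96.

```python
import math
import random


def simple_ols(x, y):
    """Regress y on a constant and x.

    Returns (slope, classical SE of slope, HC1 robust SE of slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # The point estimates do not depend on which standard errors are used.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only SE: sqrt(s^2 / Sxx), with s^2 = SSR / (n - 2).
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # Heteroskedasticity-robust SE (HC1): each squared residual is weighted
    # by its own squared deviation of x from the mean, then scaled by n/(n-2).
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)

    return slope, se_classical, se_robust


def conf_int_95(estimate, se, critical_value=1.96):
    """Large-sample 95% confidence interval: estimate +/- 1.96 * SE."""
    return estimate - critical_value * se, estimate + critical_value * se


if __name__ == "__main__":
    # Synthetic "counties" (assumption: all numbers below are invented).
    random.seed(0)
    unemp = [random.uniform(2.0, 15.0) for _ in range(500)]
    # Noise spread grows with the unemployment rate, so errors are heteroskedastic.
    income = [40000.0 - 1000.0 * u + random.gauss(0.0, 500.0 * u) for u in unemp]

    slope, se_c, se_r = simple_ols(unemp, income)
    print("slope:", slope)
    print("classical SE:", se_c, "robust (HC1) SE:", se_r)
    print("95% CI with robust SE:", conf_int_95(slope, se_r))
```

The slope is identical under both choices because the standard-error option changes only the estimated variance, not the OLS formula for the coefficients. If the resulting 95% interval excludes zero, the coefficient is statistically significant at the 5% level; if it includes zero, it is not.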