(a) Run a regression to determine the impact of the 2013 unemployment rate (UnempRate2013) on the per
capita income (PerCapitaInc) in a county. What is the estimated slope? Explain what this number
means in words in terms of the unemployment rate and in terms of per capita income. Also indicate
if the relationship is statistically significant at the 10%, 5%, and 1% levels. For this first pass, use
homoskedastic standard errors.
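The Python below sketches what part (a) computes. It fits a one-regressor OLS line by hand and uses homoskedasticity-only standard errors. It assumes the real answer comes from Stata (`regress PerCapitaInc UnempRate2013`). The data are simulated, not the county file, and `ols_simple` and `significance` are hypothetical helpers. The significance check uses large-sample two-sided normal critical values (1.645, 1.96, 2.576), which approximate the t critical values Stata uses.

```python
import math
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns intercept, slope, and the
    homoskedasticity-only standard error of the slope."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(u * u for u in resid) / (n - 2)  # squared SER; assumes constant error variance
    return b0, b1, math.sqrt(s2 / sxx)

# Large-sample two-sided critical values from the standard normal distribution.
CRITICAL = {"10%": 1.645, "5%": 1.960, "1%": 2.576}

def significance(t_stat):
    """True at a level if |t| exceeds that level's critical value."""
    return {level: abs(t_stat) > c for level, c in CRITICAL.items()}

# Simulated data (NOT the county file): unemployment in percent, income in dollars,
# generated with a true slope of -800 dollars per percentage point.
random.seed(0)
unemp = [random.uniform(2, 15) for _ in range(500)]
income = [35000 - 800 * u + random.gauss(0, 5000) for u in unemp]

b0, b1, se_b1 = ols_simple(unemp, income)
t_stat = b1 / se_b1
print(f"slope = {b1:.1f} dollars per point, SE = {se_b1:.1f}, t = {t_stat:.2f}")
print(significance(t_stat))
```

Read the slope as the change in per capita income, in dollars, associated with a one-percentage-point higher unemployment rate.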
(b) Re-run the regression from part (a) but this time use heteroskedastic standard errors. Are your
coefficients the same as in part (a)? Why? Are your standard errors (of your betas) the same as in part
(a)? Why?
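The sketch below illustrates the idea behind part (b). The OLS coefficients do not depend on which variance formula you pick. The HC1 robust variance, which Stata's `, robust` option reports with its n/(n-k) correction, weights each squared residual by that observation's own squared deviation of x. The data are simulated so that the error spread grows with x, and `slope_with_two_ses` is a hypothetical helper.

```python
import math
import random

def slope_with_two_ses(x, y):
    """Return the OLS slope with homoskedasticity-only and HC1 robust SEs."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx  # same slope either way
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Homoskedasticity-only: one pooled error variance for every observation.
    se_homo = math.sqrt(sum(ui * ui for ui in u) / (n - 2) / sxx)
    # HC1 robust: each squared residual weighted by its own squared deviation of x.
    se_hc1 = math.sqrt(n / (n - 2) * sum((d * ui) ** 2 for d, ui in zip(dx, u)) / sxx ** 2)
    return b1, se_homo, se_hc1

# Simulated heteroskedastic data: the error SD grows with x.
random.seed(1)
x = [random.uniform(2, 15) for _ in range(500)]
y = [35000 - 800 * xi + random.gauss(0, 600 * xi) for xi in x]

b1, se_homo, se_hc1 = slope_with_two_ses(x, y)
print(f"slope = {b1:.1f}, homoskedastic SE = {se_homo:.1f}, robust SE = {se_hc1:.1f}")
```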
(c) Run the same regression as in part (b) but now also include the following additional regressors: percentage
of the population that is college-educated (Ed5CollegePlusPct), percentage of the population that is
black (BlackNonHispanicPct2010), and percentage of the population that is Hispanic (HispanicPct2010).
Now, what is the estimated impact of the 2013 unemployment rate on per capita income? Also indicate
whether the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are
using heteroskedastic standard errors.
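With several regressors, as in part (c), robust standard errors come from the sandwich (X'X)^-1 [sum of u_i^2 x_i x_i'] (X'X)^-1, scaled by n/(n-k), where k counts the intercept. The sketch below does this in pure Python so each step is visible. In Stata, the counterpart is `regress PerCapitaInc UnempRate2013 Ed5CollegePlusPct BlackNonHispanicPct2010 HispanicPct2010, robust`. The data and coefficients are simulated, and `ols_hc1` and `mat_inv` are hypothetical helpers, not library functions.

```python
import math
import random

def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols_hc1(rows, y):
    """OLS of y on an intercept plus the columns in rows; returns
    (coefficients, HC1 robust standard errors), intercept first."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    inv = mat_inv(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    u = [yi - sum(b * xj for b, xj in zip(beta, x)) for x, yi in zip(X, y)]
    # Diagonal of the sandwich: sum_t u_t^2 * ((X'X)^-1 x_t)_i^2, then the n/(n-k) scaling.
    h = [[sum(inv[i][j] * x[j] for j in range(k)) for i in range(k)] for x in X]
    se = [math.sqrt(n / (n - k) * sum(ut * ut * ht[i] ** 2 for ut, ht in zip(u, h)))
          for i in range(k)]
    return beta, se

# Simulated county-like data (hypothetical coefficients, not estimates from the real file).
random.seed(2)
rows, y = [], []
for _ in range(400):
    college = random.uniform(10, 40)
    unemp = 12 - 0.15 * college + random.gauss(0, 1.5)
    black, hisp = random.uniform(0, 30), random.uniform(0, 30)
    rows.append((unemp, college, black, hisp))
    y.append(15000 - 300 * unemp + 700 * college - 50 * black - 40 * hisp
             + random.gauss(0, 3000))

beta, se = ols_hc1(rows, y)
names = ["const", "UnempRate2013", "Ed5CollegePlusPct", "BlackNonHispanicPct2010", "HispanicPct2010"]
for name, b, s in zip(names, beta, se):
    print(f"{name:>24}: {b:10.1f}  (robust SE {s:.1f})")
```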
(d) Provide economic/econometric intuition as to why the estimated impact of the unemployment rate on
per capita income changed between parts (b) and (c). Note that I am asking you to think about the
context (and hence the "story" behind these data).
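Omitted variable bias is one standard econometric lens for part (d). In-sample, the slope from the short regression equals the long-regression slope plus (the coefficient on the omitted regressor) times (the slope from regressing the omitted regressor on the included one). The sketch below checks that identity with two regressors. The simulated data have college share lowering unemployment and raising income, with invented magnitudes. It illustrates the mechanism only and says nothing about the real estimates.

```python
import random

def centered_moment(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))

def short_and_long(y, x, z):
    """Slope on x from y ~ x (short) and coefficients on x, z from y ~ x + z (long)."""
    sxx, szz, sxz = centered_moment(x, x), centered_moment(z, z), centered_moment(x, z)
    sxy, szy = centered_moment(x, y), centered_moment(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return sxy / sxx, b_x, b_z, sxz / sxx  # last value: slope of z on x

# Simulated data: counties with a larger college share have lower unemployment
# and higher income (hypothetical magnitudes).
random.seed(3)
college = [random.uniform(10, 40) for _ in range(500)]
unemp = [12 - 0.15 * c + random.gauss(0, 1.5) for c in college]
income = [15000 - 300 * u + 700 * c + random.gauss(0, 3000) for u, c in zip(unemp, college)]

b_short, b_long, b_college, delta = short_and_long(income, unemp, college)
print(f"short slope {b_short:.1f}, long slope {b_long:.1f}")
print(f"long + b_college * delta = {b_long + b_college * delta:.1f}")  # equals the short slope
```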
(e) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 found in Part 2(c). Write
out your calculations. Clearly indicate what your constructed confidence interval implies about whether
UnempRate2013 is statistically significant in this context.
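The 95% interval in part (e) is the estimate plus or minus 1.96 times its standard error. The Python below does only that arithmetic. The coefficient and standard error are hypothetical placeholders for your own robust output. It uses the standard normal critical value, whereas Stata's reported interval uses a t critical value with n - k degrees of freedom; in a large sample the two are essentially equal.

```python
from statistics import NormalDist

def conf_int(beta_hat, se, level=0.95):
    """Large-sample confidence interval beta_hat +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return beta_hat - z * se, beta_hat + z * se

# Hypothetical placeholders, not results from the county data.
beta_hat, se = -250.0, 80.0
lo, hi = conf_int(beta_hat, se)
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")  # prints [-406.8, -93.2]
if lo <= 0 <= hi:
    print("0 is inside the interval: not significant at the 5% level")
else:
    print("0 is outside the interval: significant at the 5% level")
```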
(f) Recall from Part 1 that the means of both per capita income and the 2013 unemployment rate are
quite different across metro and nonmetro areas. You therefore want to explore this in more detail. Run
the regression from Part 2(c) using only metro areas in 2013 (i.e., Metro2013==1). [Hint: You need to
restrict the data based on a criterion before running the regression.] Now, what is the estimated effect
of the 2013 unemployment rate on per capita income? Also indicate whether the relationship is statistically
significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard
errors.
(g) Now, run the regression from Part 2(c) using only nonmetro areas in 2013 (i.e., Metro2013==0). [Hint:
You need to restrict the data based on a criterion before running the regression.] Now, what is the
estimated effect of the 2013 unemployment rate on per capita income? Also indicate whether the relationship
is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic
standard errors.
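Parts (f) and (g) restrict the sample before estimating. In Stata this is an `if` qualifier, e.g. `regress ... if Metro2013==1, robust`. The Python sketch below applies the same filter to simulated records. For brevity it fits only a one-regressor slope in each subsample, not the full Part 2(c) specification. The group-specific slopes in the simulation are invented, and `subsample_slope` is a hypothetical helper.

```python
import random

def ols_slope(x, y):
    """OLS slope of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return num / sum((a - xbar) ** 2 for a in x)

# Simulated county records with hypothetical metro and nonmetro slopes.
random.seed(4)
counties = []
for _ in range(600):
    metro = 1 if random.random() < 0.4 else 0
    unemp = random.uniform(3, 12)
    slope_true, base = (-900, 40000) if metro else (-300, 30000)
    counties.append({"Metro2013": metro, "UnempRate2013": unemp,
                     "PerCapitaInc": base + slope_true * unemp + random.gauss(0, 3000)})

def subsample_slope(rows, metro_value):
    """Keep only rows with Metro2013 == metro_value, then fit the slope."""
    sub = [r for r in rows if r["Metro2013"] == metro_value]
    return len(sub), ols_slope([r["UnempRate2013"] for r in sub],
                               [r["PerCapitaInc"] for r in sub])

n_metro, slope_metro = subsample_slope(counties, 1)
n_non, slope_non = subsample_slope(counties, 0)
print(f"metro (n={n_metro}): {slope_metro:.1f}; nonmetro (n={n_non}): {slope_non:.1f}")
```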
(h) What did you learn from the comparison between results in parts (f) and (g)? Explain your answer.
Note that I again am asking you to think about the context (and hence the "story" behind these data).
(i) Return to the full sample. Now, run a regression to determine the impact of changing the percentage of
the population that is college-educated (Ed5CollegePlusPct) on per capita income (PerCapitaInc)
in a county. Include controls for the unemployment rate in 2013 (UnempRate2013), percentage of the
population that is black (BlackNonHispanicPct2010), percentage of the population that is Hispanic
(HispanicPct2010), and now also include a dummy variable for metro status (Metro2013). Now, what is
the estimated impact of the percentage with a college education on per capita income? Also indicate whether the
relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using
heteroskedastic standard errors.
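In part (i), the coefficient on the 0/1 dummy Metro2013 shifts the intercept. Holding the other regressors fixed, it is the estimated metro minus nonmetro difference in per capita income. In the special case with no other regressors, that coefficient is exactly the difference in group means. The sketch below checks this on a handful of hypothetical values.

```python
def ols_slope(x, y):
    """OLS slope of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return num / sum((a - xbar) ** 2 for a in x)

# Hypothetical per capita incomes (dollars) and metro status for seven counties.
income = [41000.0, 38000.0, 45000.0, 29000.0, 31000.0, 27000.0, 33000.0]
metro = [1, 1, 1, 0, 0, 0, 0]

b_metro = ols_slope(metro, income)
mean_metro = sum(y for y, d in zip(income, metro) if d == 1) / metro.count(1)
mean_non = sum(y for y, d in zip(income, metro) if d == 0) / metro.count(0)
print(f"dummy coefficient {b_metro:.2f} = {mean_metro:.2f} - {mean_non:.2f}")
```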
(j) It is quite common in econometrics to model income variables nonlinearly. Construct a new variable
and call it "loginc" (or whatever you prefer), where loginc = ln(PerCapitaInc). Provide summary statistics
for this new variable. (Hint: Think back to how you constructed summary statistics in Part 1.)
(k) Now run a regression model with loginc as the dependent variable (as in part (i), we also control
for metro status in addition to the other controls). In other words, the main regressor is the
unemployment rate in 2013 (UnempRate2013), and the other regressors are percentage college-educated
(Ed5CollegePlusPct), percentage non-Hispanic black in 2010 (BlackNonHispanicPct2010), percentage
Hispanic in 2010 (HispanicPct2010), and metro status in 2013 (Metro2013). Now, what is the estimated
effect of UnempRate2013, in words? Also indicate whether the relationship is statistically significant at
the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard errors. [Be careful
not to leave out any variables in your regression specification in STATA.]
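Parts (j) and (k) map to Stata's `generate loginc = ln(PerCapitaInc)` followed by `summarize loginc`. The Python sketch below reproduces the quantities `summarize` reports (observations, mean, standard deviation, minimum, maximum) on hypothetical incomes. It then converts a hypothetical log-level coefficient into a percentage effect. Holding the other regressors fixed, 100 times the coefficient approximates the percent change in per capita income for a one-percentage-point rise in the unemployment rate. The exact version is 100 times (e^b - 1).

```python
import math
import statistics

def summarize(values):
    """Obs, mean, sample SD, min, and max, as in Stata's summarize output."""
    return {"Obs": len(values), "Mean": statistics.mean(values),
            "Std. Dev.": statistics.stdev(values), "Min": min(values), "Max": max(values)}

# Hypothetical per capita incomes in dollars (not the county data).
per_capita_inc = [18500.0, 22750.0, 25000.0, 31200.0, 47800.0]
loginc = [math.log(v) for v in per_capita_inc if v > 0]  # ln is undefined for v <= 0
print(summarize(loginc))

def pct_effect(beta):
    """Approximate and exact percent change in income per one-unit change in x
    when the dependent variable is ln(income)."""
    return 100 * beta, 100 * (math.exp(beta) - 1)

approx, exact = pct_effect(-0.02)  # hypothetical coefficient on UnempRate2013
print(f"approximate {approx:.2f}%, exact {exact:.2f}%")  # approximate -2.00%, exact -1.98%
```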
(l) What is the null hypothesis corresponding to the F-statistic reported in the output for the regression
in part (k)? What is the conclusion of the reported F-test? Explain: do you reject or fail to reject
the stated null hypothesis, and how do you know?
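The F-statistic reported with the part (k) regression tests the null that all five slope coefficients equal zero. A minimal sketch of the decision rule follows: reject when F exceeds the critical value, or equivalently when Stata's reported Prob > F is below the significance level. The critical values are large-sample values of F(5, infinity), i.e. chi-square(5) quantiles divided by 5, and the F-statistic is a hypothetical placeholder. With robust standard errors, Stata reports the heteroskedasticity-robust Wald version of this F, so use the reported number rather than recomputing it from R².

```python
# Large-sample critical values of F(5, infinity), i.e. chi-square(5) quantiles / 5.
F5_CRIT = {"10%": 1.85, "5%": 2.21, "1%": 3.02}

def f_test_decision(f_stat, crit=F5_CRIT):
    """Reject H0 at a level when the F-statistic exceeds that level's critical value."""
    return {level: "reject H0" if f_stat > c else "fail to reject H0"
            for level, c in crit.items()}

print(f_test_decision(2.5))  # hypothetical F-statistic
```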
(m) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 in Part 2(k). As
usual, write out your calculations. Clearly indicate what your constructed confidence interval implies
about whether UnempRate2013 is statistically significant in this context.
(n) Discuss what the standard error of the regression (SER), R-squared and adjusted R-squared in part (k)
are telling you in terms of the numbers that you have found. Using what you know about the difference
between the two formulas, explain specifically why R² and adjusted R² (R̄²) are so similar in this case.
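The formulas behind part (n), where k counts the regressors excluding the intercept, are:

- R² = 1 - SSR/TSS
- adjusted R² = 1 - [(n - 1)/(n - k - 1)] * SSR/TSS
- SER = sqrt(SSR/(n - k - 1))

The two R² measures differ only by the factor (n - 1)/(n - k - 1), which is close to 1 when n is large relative to k. The SSR, TSS, and n in the sketch below are hypothetical placeholders.

```python
import math

def fit_stats(ssr, tss, n, k):
    """R-squared, adjusted R-squared, and SER; k excludes the intercept."""
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    ser = math.sqrt(ssr / (n - k - 1))
    return r2, adj_r2, ser

# Hypothetical values: SSR and TSS of loginc, 3,000 observations, 5 regressors.
r2, adj_r2, ser = fit_stats(ssr=180.0, tss=300.0, n=3000, k=5)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, SER = {ser:.4f}")
```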
(o) Use an F-test to test the joint significance of the additional regressors: Ed5CollegePlusPct,
BlackNonHispanicPct2010, HispanicPct2010, and Metro2013. Find this test statistic and clearly state
the conclusions of the test.
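In Stata, part (o) is the `test` command run after the robust regression, e.g. `test Ed5CollegePlusPct BlackNonHispanicPct2010 HispanicPct2010 Metro2013`, which uses the robust covariance matrix. The Python sketch below shows the Wald form of that statistic, F = b' V^-1 b / q, for q coefficients b with robust covariance block V. The coefficients and covariance are hypothetical, and `solve` and `wald_f` are hypothetical helpers. With a diagonal V the statistic reduces to the average squared t-statistic, which gives a convenient check. Large-sample critical values of F(4, infinity) are 1.94, 2.37, and 3.32 at the 10%, 5%, and 1% levels.

```python
def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def wald_f(b, v):
    """Wald F-statistic for H0: every coefficient in b equals zero."""
    q = len(b)
    w = solve(v, b)  # w = V^{-1} b
    return sum(bi * wi for bi, wi in zip(b, w)) / q

# Hypothetical coefficients on the four added regressors and a diagonal robust covariance.
b = [0.010, -0.002, 0.003, 0.050]
se = [0.002, 0.001, 0.001, 0.010]
v = [[se[i] ** 2 if i == j else 0.0 for j in range(4)] for i in range(4)]

F = wald_f(b, v)
F4_CRIT = {"10%": 1.94, "5%": 2.37, "1%": 3.32}  # large-sample F(4, infinity)
print(f"F = {F:.2f}")  # t-stats 5, -2, 3, 5 give F = 63 / 4 = 15.75
print({lvl: ("reject H0" if F > c else "fail to reject H0") for lvl, c in F4_CRIT.items()})
```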
(p) If you had more time to study this question and/or more or different data, what would you suggest
doing next? Propose additional variables to add and/or different specifications to try, and give specific
reasons for each suggestion. Answers will vary for this part of the problem.