Instructions:
The objective of computer assignments is to learn how to analyze real data. They are not
necessarily difficult, but they can be time intensive because they require careful coding, especially
if you are not familiar with Stata. So plan ahead. NO late work will be accepted. You
may discuss the questions with your classmates, but you are required to hand in your own
independently digested written solutions, do-files, and log-files. (USE STATA).
What to hand in:
(1) Write-up answering the assigned questions
(2) Do-file
(3) Log-file (must be in .smcl format).
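As a minimal sketch of how a do-file can produce a log in .smcl format, the commands below open and close a log around the analysis. The file name assignment.smcl is a hypothetical placeholder, not a required name.

```stata
* Close any log left open from an earlier run (no error if none is open).
capture log close

* Open a new log in SMCL format; "assignment.smcl" is a hypothetical name.
log using "assignment.smcl", replace smcl

* ... analysis commands go here ...

* Close the log so the file is complete.
log close
```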
In this assignment, you will use the Annual Social and Economic Supplement (ASEC) of the Current
Population Survey (CPS) downloaded from IPUMS. The CPS is one of the most important sources of data on the labor
force characteristics of the U.S. population. The government relies on the CPS to report the national
unemployment rate to the public. For example, the government's unemployment rates reported in the media are
based on the CPS. The U.S. Census Bureau and U.S. Bureau of Labor Statistics collect the basic data every
month, but in March, they collect more detailed information than in other months. We will use the ASEC of
the CPS from 1990, 2000, 2010 and 2020. You will use only a randomly selected 10 percent of the sample
from each year. [Hint: Use the sample command in Stata.] From IPUMS CPS, download educ, sex, empstat,
labforce, uhrsworkly, wkswork1, incwage, age, cpi99.
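One possible way to draw the 10 percent sample within each year is sketched below. The file name cps_asec.dta and the seed value are assumptions for illustration; year is the survey-year variable that IPUMS includes in every extract.

```stata
* Load the IPUMS extract; "cps_asec.dta" is a hypothetical file name.
use "cps_asec.dta", clear

* Fix the random-number seed so the same sample is drawn on every run.
set seed 12345

* Keep a random 10 percent of observations within each survey year.
sample 10, by(year)

* Check how many observations remain in each year.
tabulate year
```

Setting the seed before sampling makes the do-file and log-file reproducible.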
1. Create categorical variables that indicate less than high school, high school, some college, college,
and more than college. [Hint: Read the codebook carefully. The values for educ change, and the values
for respondents not in universe should be coded as missing. Cross tabulate the variables to check your
work.] Report the share of individuals in each group by year using a table and briefly describe the
findings. You do not need to make tables using Stata.
a. Choose and make appropriate graphs to report the results in addition to the table.
2. Generate hourly earnings using the variables you downloaded. For each year, report the sample
mean, median, and standard deviation of hourly earnings. [Hint: Be careful with the values and how
IPUMS CPS reports missing values.]
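The sketch below shows one way the hint about missing values might be handled before computing hourly earnings. The special codes used here (99999998 and 99999999 for incwage, 999 for uhrsworkly) are assumptions that must be verified against the IPUMS CPS codebook, and hourly_wage is a hypothetical variable name. It assumes the extract is already loaded.

```stata
* ASSUMED special codes; confirm each one in the IPUMS CPS codebook.
replace incwage    = . if inlist(incwage, 99999998, 99999999)
replace uhrsworkly = . if uhrsworkly == 999

* Hourly earnings = annual wage income / (usual weekly hours * weeks worked).
* Stata returns missing when the denominator is zero or missing.
generate hourly_wage = incwage / (uhrsworkly * wkswork1)

* Mean, median, standard deviation, and sample size by survey year.
tabstat hourly_wage, by(year) statistics(mean median sd n)
```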
3. You want to explore the gender earnings gap. Write down the econometric model that will measure
the average difference in hourly earnings between men and women. [Hint: This is a theoretical model. It
should be written in a proper format to receive full credit.]
4. Estimate the model above separately by survey year. Interpret the coefficient. Is the gender gap
statistically significant? How did the difference change over time?
5. The dollar amounts in each survey year are nominal. Using the IPUMS variable cpi99, convert the dollar
amounts to 1999 dollars and re-estimate the gender earnings gap with the age and education controls.
[Hint: https://cps.ipums.org/cps/cpi99.shtml] How did your result change and why was this step
important? Discuss briefly.
a. Convert to 2020 dollars and re-estimate the model. Briefly discuss the difference.
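As a rough sketch of the conversion to 1999 dollars: IPUMS documents cpi99 as a factor that nominal dollar amounts are multiplied by. The variable names hourly_wage, female, and educ_cat are hypothetical and assumed to have been created in earlier steps.

```stata
* Assumes hourly_wage (nominal dollars), female (0/1), and educ_cat
* (education categories) already exist; these names are hypothetical.
* cpi99 comes from the IPUMS extract.
generate hourly_wage_99 = hourly_wage * cpi99
label variable hourly_wage_99 "Hourly earnings, 1999 dollars"

* Re-estimate the gap separately by survey year with age and education controls.
bysort year: regress hourly_wage_99 female age i.educ_cat
```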
6. Write down the econometric model whose coefficient will give you the gender gap as a percentage
difference.
7. Estimate your model above separately by survey year and report the results in a table format. Use the
hourly wage measured in 1999 real dollars. Interpret the coefficient on the female dummy variable. Are the
coefficients statistically significant? How did the gender gap change over time?
a. Calculate the exact percentage difference in the gender gap in earnings and add this to the table
above. [Hint: You can do this in Stata using the display command, or you can calculate it manually using a
calculator and report the results.]
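To illustrate the display hint, the sketch below uses simulated data rather than the CPS, so it runs on its own. The variable names lnwage and female and all numeric values are made up for illustration. With a log dependent variable, the exact percentage difference implied by a dummy coefficient b is 100*(exp(b) - 1).

```stata
* Simulated data for illustration only; not CPS data.
clear
set seed 2024
set obs 1000
generate female = runiform() < 0.5
generate lnwage = 2.5 - 0.20*female + rnormal(0, 0.5)

regress lnwage female

* Approximate percentage gap read directly off the coefficient.
display "Approximate % gap: " 100*_b[female]

* Exact percentage gap implied by the log specification.
display "Exact % gap:       " 100*(exp(_b[female]) - 1)
```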
8. Construct a 99% confidence interval for the coefficient on the female dummy variable. [Hint: The default in Stata is a 95%
confidence interval. You need to change this.]
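A minimal sketch of changing the confidence level, again on simulated data with hypothetical variable names so that it runs on its own:

```stata
* Simulated data for illustration only; not CPS data.
clear
set seed 2024
set obs 1000
generate female = runiform() < 0.5
generate lnwage = 2.5 - 0.20*female + rnormal(0, 0.5)

* The level() option changes the interval for this one command.
regress lnwage female, level(99)
```

Alternatively, set level 99 changes the default for all later commands in the session.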
9. Add age, age squared, and education categories you created above to your regression model. Report
the results in a table format. How did the gender gap change? Discuss your results.
10. Interpret the coefficient on college. [Hint: Which category is the omitted group?]
11. Calculate the heteroskedasticity-robust standard errors. Did the coefficient estimates from
question #9 change? How about the standard errors? When do we need to calculate
heteroskedasticity-robust standard errors?
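The sketch below shows one way to request robust standard errors in Stata, using simulated data in which the error variance differs by group. All names and values are made up for illustration.

```stata
* Simulated data with heteroskedastic errors; not CPS data.
clear
set seed 2024
set obs 1000
generate female = runiform() < 0.5
* The error standard deviation depends on female (0.3 vs. 0.7).
generate lnwage = 2.5 - 0.20*female + rnormal(0, 0.3 + 0.4*female)

* Conventional (homoskedasticity-only) standard errors.
regress lnwage female

* Heteroskedasticity-robust standard errors; coefficients are unchanged.
regress lnwage female, vce(robust)
```

Comparing the two outputs side by side shows which parts of the regression table the vce(robust) option affects.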
12. Does the coefficient on college suffer from omitted variable bias? Why or why not? If you argue that it
suffers from bias, what is the direction of the bias?