Instructions: The objective of the computer assignments is to learn how to analyze real data. They are not necessarily difficult, but they can be time-intensive because they require careful coding, especially if you are not familiar with Stata. So plan ahead. NO late work will be accepted. You may discuss the questions with your classmates, but you are required to hand in your own independently written solutions, do-files, and log-files. (USE STATA.)

What to hand in:
(1) Write-up answering the assigned questions
(2) Do-file
(3) Log-file (must be in .smcl format)

In this assignment, you will use the Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS) downloaded from IPUMS. The CPS is one of the most important data sources on the labor force characteristics of the U.S. population. The government relies on the CPS to report the national unemployment rate to the public. For example, the government's unemployment rates reported in the media are based on the CPS. The U.S. Census Bureau and the U.S. Bureau of Labor Statistics collect the basic data every month, but in March they collect more detailed information than in other months.

We will use the ASEC of the CPS from 1990, 2000, 2010, and 2020. You will use only a randomly selected 10 percent of the sample from each year. [Hint: Use the sample command in Stata.] From IPUMS CPS, download educ, sex, empstat, labforce, uhrsworkly, wkswork1, incwage, age, cpi99.

1. Create a categorical variable that indicates less than high school, high school, some college, college, and more than college. [Hint: Read the codebook carefully. The coding of educ changes across years, and the values for respondents not in universe should be coded as missing. Cross-tabulate the variables to check your work.] Report the share of individuals in each group by year using a table and briefly describe the findings. You do not need to make the tables using Stata.
   a. Choose and make appropriate graphs to report the results in addition to the table.

2. Generate hourly earnings using the variables you downloaded. For each year, report the sample mean, median, and standard deviation of hourly earnings. [Hint: Be careful with the values and how IPUMS CPS reports missing values.]

3. You want to explore the gender earnings gap. Write down the econometric model that will measure the average difference in hourly earnings between males and females. [Hint: This is a theoretical model. It should be written in a proper format to receive full credit.]

4. Estimate the model above separately by survey year. Interpret the coefficient. Is the gender gap statistically significant? How did the difference change over time?

5. The dollar amounts in each survey year are nominal. Using the IPUMS variable cpi99, convert the dollar amounts to 1999 dollars and re-estimate the gender earnings gap with the age and education controls. [Hint: https://cps.ipums.org/cps/cpi99.shtml] How did your result change, and why was this step important? Discuss briefly.
   a. Convert to 2020 dollars and re-estimate the model. Briefly discuss the difference.

6. Write down the econometric model in which your coefficient gives the gender gap as a percentage difference.

7. Estimate your model above separately by survey year and report the results in a table. Use the hourly wage measured in real 1999 dollars. Interpret the coefficient on the female dummy variable. Are the coefficients statistically significant? How did the gender gap change over time?
   a. Calculate the exact percentage difference in the gender gap in earnings and add this to the table above. [Hint: You can do this in Stata using the display command, or you can calculate it manually with a calculator and report the results.]

8. Construct 99% confidence intervals for the coefficients on the female dummy variable. [Hint: The default in Stata is a 95% confidence interval. You need to change this.]

9. Add age, age squared, and the education categories you created above to your regression model. Report the results in a table. How did the gender gap change? Discuss your results.

10. Interpret the coefficient on college. [Hint: Which category is the omitted group?]

11. Calculate the heteroskedasticity-robust standard errors. Did the coefficient estimates from question 9 change? How about the standard errors? When do we need to calculate heteroskedasticity-robust standard errors?

12. Does the coefficient on college suffer from omitted variable bias? Why or why not? If you argue that it suffers from bias, what is the direction of the bias?
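
For the percentage interpretation asked about in questions 6 and 7a, one common approach is a log-linear specification. The block below is only a sketch of that approach, not the required answer. The notation is assumed here and does not come from the assignment: $w_i$ is the real hourly wage, $\mathrm{Female}_i$ is a dummy equal to 1 for women, and $u_i$ is the error term.

```latex
\begin{align}
  % Log-linear model: the dependent variable is the log of the real hourly wage
  \ln(w_i) &= \beta_0 + \beta_1\,\mathrm{Female}_i + u_i \\
  % Approximate percentage gap, reasonable when |\hat\beta_1| is small
  \%\Delta w_{\text{approx}} &= 100 \cdot \hat\beta_1 \\
  % Exact percentage gap implied by the same coefficient
  \%\Delta w_{\text{exact}} &= 100 \cdot \left( e^{\hat\beta_1} - 1 \right)
\end{align}
```

As an illustration with a made-up value (not a result from the data), $\hat\beta_1 = -0.20$ gives an approximate gap of $-20$ percent and an exact gap of $100\,(e^{-0.20} - 1) \approx -18.1$ percent. In Stata, assuming the dummy variable is named female, the exact figure could be shown right after the regression with display 100*(exp(_b[female]) - 1).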