
This problem set uses 2019 data, primarily for Denver County, accessed through IPUMS-USA, University of Minnesota, www.ipums.org:

Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas and Matthew Sobek. IPUMS USA: Version 10.0 [dataset]. Minneapolis, MN: IPUMS, 2020. https://doi.org/10.18128/D010.V10.0

The PUMA-to-county restriction was done using MABLE: http://mcdc.missouri.edu/websas/geocorr12.html

This problem set uses a subsample of demographic data for Denver.

The sample was drawn according to the values in the variable "perwt". This is a weight provided by the US Census Bureau to correct for differences between the sampled population and the target population; it is called a sample weight or an expansion weight. It can be thought of as the number of people in Colorado whom a single observation represents in terms of demographic characteristics. For example, if you add all the weights in the original sample for all of Colorado, you get an approximation of the population of Colorado in the sample year. If you multiply the "age" variable by "perwt", sum these products, and then divide by the sum of the "perwt" values, you get an approximation of the average age in the state, whether or not the ages of the cases appear in the sample in the same proportion as in the population.

The category "educ"=7 corresponds to 1 year of college. The category "educ"=10 corresponds to 4 years of college.
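The weighted-average recipe for "perwt" can be checked on a tiny example. This is a minimal sketch in R; the three respondents and their weights are made-up values for illustration, not IPUMS data. It computes the weighted mean by hand, confirms that base R's weighted.mean() gives the same number, and contrasts it with the unweighted mean.

```r
# Hypothetical toy data: three respondents with ages and person weights.
# The values are invented for illustration; they are not IPUMS data.
toy <- data.frame(age = c(25, 40, 70), perwt = c(100, 300, 50))

# Weighted average: sum of age * perwt, divided by the sum of perwt.
wavg_manual <- sum(toy$age * toy$perwt) / sum(toy$perwt)

# Base R's weighted.mean() computes the same quantity.
wavg_builtin <- weighted.mean(toy$age, w = toy$perwt)

# Unweighted mean for comparison: every row counts as one person.
avg_unweighted <- mean(toy$age)

# The sum of the weights approximates the population the rows represent.
pop_estimate <- sum(toy$perwt)

print(c(weighted = wavg_manual, unweighted = avg_unweighted))
# weighted = 40, unweighted = 45
```

Here the heavily weighted 40-year-old pulls the weighted mean to 40, while the unweighted mean (45) is dominated by the lightly weighted 70-year-old.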

Samples of size 40 are drawn from the responses with "educ"=7 and with "educ"=10 according to the weights in the data set and saved in "dat_7_10.RData".
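A weight-proportional draw of this kind might look like the following R sketch. It is not the procedure actually used to build "dat_7_10.RData": the full extract is simulated, the helper draw_weighted() is hypothetical, and sampling with replacement is an assumption because the problem set does not say how the rows were drawn.

```r
# Hypothetical stand-in for the full Denver extract. 'incwage', 'educ' and
# 'perwt' are the IPUMS variable names used here, but the values are simulated.
set.seed(1)
full <- data.frame(
  incwage = round(rlnorm(500, meanlog = 10.5, sdlog = 0.8)),
  educ    = sample(c(7, 10), 500, replace = TRUE),
  perwt   = sample(20:200, 500, replace = TRUE)
)

# Hypothetical helper: draw n rows from one education group with probability
# proportional to perwt. Sampling with replacement is an assumption.
draw_weighted <- function(df, educ_value, n = 40) {
  grp <- df[df$educ == educ_value, ]
  idx <- sample(nrow(grp), size = n, replace = TRUE, prob = grp$perwt)
  grp[idx, ]
}

# Simulated analogue of dat.7.10: 40 rows per education group.
dat.7.10.sim <- rbind(draw_weighted(full, 7), draw_weighted(full, 10))
table(dat.7.10.sim$educ)
```

Because rows are chosen with probability proportional to "perwt", the resulting sample can be analyzed without weights, which is why the later tests use the rows of dat.7.10 directly.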

IPUMS Data

Read in the subsample of the IPUMS data.

load("dat_7_10.RData")

2.a

(10 points)

Please run and interpret a Mann-Whitney U test comparing "incwage" for the observations with "educ" equal to 7 and with "educ" equal to 10. In your interpretation, please consider the case in which you treat the distributions of the two populations as related by translation and the case in which you don't make this assumption.
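The two interpretations correspond to different summaries of the same test. The R sketch below uses simulated wages as a stand-in for dat.7.10 (an assumption; the real values come from the loaded file). Under the translation (location-shift) model, the Hodges-Lehmann estimate from conf.int = TRUE estimates the shift. Without that assumption, the statistic W divided by n1 * n2 estimates P(X7 > X10) + 0.5 * P(X7 = X10), and the test is about whether that probability differs from 1/2.

```r
# Simulated stand-in for dat.7.10 (assumption: 40 positive wages per group).
set.seed(2)
dat.sim <- data.frame(
  incwage = c(round(rlnorm(40, meanlog = 10.3, sdlog = 0.8)),
              round(rlnorm(40, meanlog = 10.8, sdlog = 0.8))),
  educ    = rep(c(7, 10), each = 40)
)

x7  <- dat.sim$incwage[dat.sim$educ == 7]
x10 <- dat.sim$incwage[dat.sim$educ == 10]

# Two-sided Wilcoxon rank-sum (Mann-Whitney U) test. conf.int = TRUE adds the
# Hodges-Lehmann estimate of the location shift, which is only meaningful
# if the two distributions differ by a translation.
mw <- wilcox.test(x7, x10, conf.int = TRUE)

# R's statistic W counts pairs (i, j) with x7[i] > x10[j], ties counting 1/2.
# W / (n1 * n2) estimates P(X7 > X10) + 0.5 * P(X7 = X10), the quantity the
# test addresses when no translation assumption is made.
n1 <- length(x7)
n2 <- length(x10)
prob_7_greater <- unname(mw$statistic) / (n1 * n2)

mw$p.value        # evidence against "same distribution"
mw$estimate       # shift estimate (translation model only)
prob_7_greater    # below 0.5 means educ = 10 wages tend to be larger
```

Replacing x7 and x10 with the "incwage" columns of dat.7.10 applies the same steps to the actual sample. If the two density curves differ in shape as well as location, the shift estimate is harder to defend, and the probability interpretation is the safer reading.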

library(ggplot2)
ggplot(dat.7.10, aes(x = incwage, color = factor(educ))) + geom_density()

[Figure: estimated densities of incwage for factor(educ) = 7 and 10; x axis "incwage" from 0e+00 to 5e+05, y axis "density" from 0 to 1.5e-05]

2.b

(10 points)

Please run a Mann-Whitney U test comparing log(incwage) for the observations with "educ" equal to 7 and with "educ" equal to 10, and compare the result to that in part 2.a. Please explain what you observe about the two tests.
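One way to see what happens is to compare the ranks directly. The Mann-Whitney U test depends only on the ordering of the pooled observations, and log() is strictly increasing on positive values, so it leaves that ordering unchanged. The R sketch below uses simulated positive wages as an assumption in place of dat.7.10. A zero wage would become -Inf under log() and would need separate handling.

```r
# Simulated positive wages standing in for the two education groups.
set.seed(3)
w7  <- round(rlnorm(40, meanlog = 10.3, sdlog = 0.8))
w10 <- round(rlnorm(40, meanlog = 10.8, sdlog = 0.8))

raw    <- wilcox.test(w7, w10)
logged <- wilcox.test(log(w7), log(w10))

# Same pooled ranks, so the same W and the same p-value.
c(W_raw = unname(raw$statistic), W_log = unname(logged$statistic))
c(p_raw = raw$p.value, p_log = logged$p.value)

# The ranks themselves are identical before and after the transformation.
identical(rank(c(w7, w10)), rank(log(c(w7, w10))))
```

By contrast, a test based on means, such as a t-test, would generally give different results on incwage and log(incwage), because it uses the values themselves rather than only their order.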

Fig: 1

Fig: 2

Fig: 3