Question 3

The data loaded below are sampled from IPUMS, https://ipums.org/, an interface for

accessing survey and census data. These are drawn from U.S. Census microdata in a way

that approximates a simple random sample from Colorado households in 2017 that are

headed by unmarried men and a simple random sample from Colorado households in 2017

that are headed by unmarried women.

Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek.

Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of

Minnesota, 2015. http://doi.org/10.18128/D010.V6.0.

The cases with HHTYPE equal to 2 make up the sample of male-headed households. The

cases with HHTYPE equal to 3 make up the sample of female-headed households.

HHINCOME

3.a.

(5 points)

Are the household incomes for the male-headed households approximately Normally

distributed? Are the household incomes for the female-headed households approximately

Normally distributed? Please provide visualizations to support your response./n3.b.

(5 points)

Please carry out a Mann-Whitney U-test on the two data sets, the household incomes for

the male-headed households and the household incomes for the female-headed

households.

What can you conclude from the results? In particular, can this test be interpreted as a test

of center in this case?

3.c.

(0 points)

The code below carries out a bootstrap test of the difference in means of the household

incomes for the male-headed households and the household incomes for the female-headed

households. Please study this and be prepared to ask questions about it in class.

Basic bootstrap samples are samples with replacement of cases from the data. They are

used to estimate confidence intervals on statistics non-parametrically.

A data vector s defines an empirical probability distribution as follows. The sample space is

the set of distinct values in s. The set of events is the power set of the sample space. The

probability function is defined by the density function f(x) = k/n, where k is the number of

occurrences of the value x in s and n is the length of s.
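
The definition above can be sketched in a few lines. This is only an illustration in Python, since the course code is not reproduced here; the function name `empirical_pmf` and the data vector are hypothetical and are not part of the assignment.

```python
from collections import Counter
from fractions import Fraction


def empirical_pmf(s):
    """Map each distinct value x in s to k/n, where k is the number of
    occurrences of x in s and n is the length of s."""
    n = len(s)
    if n == 0:
        raise ValueError("s must be non-empty")
    counts = Counter(s)
    # Fractions keep the probabilities exact, so they sum to exactly 1.
    return {value: Fraction(k, n) for value, k in counts.items()}


# Hypothetical data vector, for illustration only (not the IPUMS data).
s = [10, 20, 20, 50]
pmf = empirical_pmf(s)
```

The keys of `pmf` are the sample space (the distinct values in s), and the probability of any event is the sum of the entries for the values it contains.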

If the empirical distribution is close to the population distribution, then a bootstrap sample

from the empirical distribution simulates a new sample. Computing the range of the

statistic of interest for a large number of bootstrap samples gives an indication of the range

of values that would be produced if the population were actually resampled.
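
As a rough sketch of this idea, and not the code referred to in the question, the following Python resamples each group with replacement and collects the difference in means. It assumes a percentile interval is wanted; the function name `bootstrap_mean_diff`, its parameters, and the income values are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Assumption: each group is resampled independently, with replacement,
    at its original size.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        a_star = rng.choices(a, k=len(a))  # bootstrap sample of group a
        b_star = rng.choices(b, k=len(b))  # bootstrap sample of group b
        diffs.append(mean(a_star) - mean(b_star))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the differences.
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return mean(a) - mean(b), (lo, hi)


# Hypothetical household incomes, for illustration only (not the IPUMS data).
male = [32000, 45000, 51000, 28000, 76000, 39000, 60000, 43000]
female = [30000, 41000, 38000, 27000, 52000, 35000, 47000, 33000]
observed, (lo, hi) = bootstrap_mean_diff(male, female)
```

Under these assumptions, an interval `(lo, hi)` that excludes 0 would suggest that the two group means differ; an interval that includes 0 would not.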
