The data loaded below are sampled from IPUMS, https://ipums.org/, an interface for
accessing survey and census data. These are drawn from U.S. Census microdata in a way
that approximates a simple random sample from Colorado households in 2017 that are
headed by unmarried men and a simple random sample from Colorado households in 2017
that are headed by unmarried women.
Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek.
Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of
Minnesota, 2015. http://doi.org/10.18128/D010.V6.0.
The cases with HHTYPE equal to 2 make up the sample of male-headed households. The
cases with HHTYPE equal to 3 make up the sample of female-headed households.
HHINCOME
3.a.
(5 points)
Are the household incomes for the male-headed households approximately Normally
distributed? Are the household incomes for the female-headed households approximately
Normally distributed? Please provide visualizations to support your response.
3.b.
(5 points)
Please carry out a Mann-Whitney U-test on the two data sets, the household incomes for
the male-headed households and the household incomes for the female-headed
households.
What can you conclude from the results? In particular, can this test be interpreted as a test
of center in this case?
3.c.
(0 points)
The code below carries out a bootstrap test of the difference in means of the household
incomes for the male-headed households and the household incomes for the female-headed
households. Please study this and be prepared to ask questions about it in class.
Basic bootstrap samples are samples with replacement of cases from the data. They are
used to estimate confidence intervals on statistics non-parametrically.
A data vector s defines an empirical probability distribution as follows. The sample space is
the set of distinct values in s. The set of events is the power set of the sample space. The
probability function is defined by the density function $f(x) = k/n$, where k is the number
of occurrences of the value x in s and n is the length of s.
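As a small illustration of this definition, the sketch below builds the empirical distribution of a data vector in Python, assuming the density assigns each distinct value its relative frequency k/n. The function name empirical_pmf and the example vector are made up for illustration and are not part of the course code.

```python
from collections import Counter
from fractions import Fraction

def empirical_pmf(s):
    """Map each distinct value x in s to k/n, where k counts x in s and n = len(s)."""
    n = len(s)
    counts = Counter(s)  # k for every distinct value
    return {x: Fraction(k, n) for x, k in counts.items()}

# Made-up data vector, used only to illustrate the definition.
s = [3, 1, 3, 2, 3, 1]
pmf = empirical_pmf(s)

for value in sorted(pmf):
    print(value, pmf[value])
```

The probabilities are exact fractions, so they sum to exactly 1 over the distinct values in s.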
If the empirical distribution is close to the population distribution, then a bootstrap sample
from the empirical distribution simulates a new sample. Computing the range of the
statistic of interest for a large number of bootstrap samples gives an indication of the range
of values that would be produced if the population were actually resampled.
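The resampling idea described above can be sketched in Python as follows. This is not the course's bootstrap code: the two samples are synthetic lognormal draws standing in for the household incomes, the helper bootstrap_mean_diffs is a hypothetical name, and a simple percentile interval is assumed as the way to summarize the range of the bootstrapped statistic.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-ins for the two income samples (NOT the IPUMS data).
male = [rng.lognormvariate(10.8, 0.8) for _ in range(200)]
female = [rng.lognormvariate(10.6, 0.8) for _ in range(250)]

def bootstrap_mean_diffs(x, y, n_boot, rng):
    """Return the difference in means for n_boot bootstrap resamples of x and y."""
    diffs = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # sample cases with replacement
        yb = rng.choices(y, k=len(y))
        diffs.append(fmean(xb) - fmean(yb))
    return diffs

diffs = sorted(bootstrap_mean_diffs(male, female, n_boot=2000, rng=rng))

# Simple percentile interval: cut 2.5% off each tail of the sorted differences.
lo = diffs[int(0.025 * len(diffs))]
hi = diffs[int(0.975 * len(diffs)) - 1]
observed = fmean(male) - fmean(female)

print(f"observed difference in means: {observed:.0f}")
print(f"approximate 95% percentile interval: ({lo:.0f}, {hi:.0f})")
```

Under these assumptions, an interval that excludes 0 would suggest that the two population means differ, while an interval that contains 0 would not support that conclusion.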