Question 3

The data loaded below are sampled from IPUMS, https://ipums.org/, an interface for accessing survey and census data. These are drawn from U.S. Census microdata in a way that approximates a simple random sample from Colorado households in 2017 that are headed by unmarried men and a simple random sample from Colorado households in 2017 that are headed by unmarried women.

Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of Minnesota, 2015. http://doi.org/10.18128/D010.V6.0.

The cases with HHTYPE equal to 2 make up the sample of male-headed households. The cases with HHTYPE equal to 3 make up the sample of female-headed households.

HHINCOME

3.a. (5 points) Are the household incomes for the male-headed households approximately Normally distributed? Are the household incomes for the female-headed households approximately Normally distributed? Please provide visualizations to support your response.

3.b. (5 points) Please carry out a Mann-Whitney U-test on the two data sets, the household incomes for the male-headed households and the household incomes for the female-headed households. What can you conclude from the results? In particular, can this test be interpreted as a test of center in this case?

3.c. (0 points) The code below carries out a bootstrap test of the difference in means of the household incomes for the male-headed households and the household incomes for the female-headed households. Please study this and be prepared to ask questions about it in class.

Basic bootstrap samples are samples with replacement of cases from the data. They are used to estimate confidence intervals on statistics non-parametrically. A data vector s defines an empirical probability distribution as follows. The sample space is the set of distinct values in s. The set of events is the power set of the sample space. The probability function is defined by the density function $f(x) = k/n$, where k is the number of occurrences of the value x in s and n is the length of s.

If the empirical distribution is close to the population distribution, then a bootstrap sample from the empirical distribution simulates a new sample. Computing the range of the statistic of interest for a large number of bootstrap samples gives an indication of the range of values that would be produced if the population were actually resampled.
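The course's own bootstrap code is not reproduced in this text, so the following is only a minimal sketch of the idea described above, written in Python with the standard library. The function names (`empirical_pmf`, `bootstrap_diff_in_means`), the percentile-interval approach, and the toy income values are all assumptions made for illustration; they are not the IPUMS data and not necessarily the method used in class.

```python
import random
from collections import Counter


def empirical_pmf(s):
    """Return the empirical distribution of s: each distinct value x
    maps to k/n, where k counts occurrences of x and n = len(s)."""
    n = len(s)
    return {x: k / n for x, k in Counter(s).items()}


def bootstrap_diff_in_means(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Each replicate resamples the cases of a and of b with replacement,
    which is the same as drawing from their empirical distributions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        a_star = rng.choices(a, k=len(a))
        b_star = rng.choices(b, k=len(b))
        diffs.append(sum(a_star) / len(a_star) - sum(b_star) / len(b_star))
    diffs.sort()
    # Approximate alpha/2 and 1 - alpha/2 percentiles of the replicates.
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = sum(a) / len(a) - sum(b) / len(b)
    return observed, (lo, hi)


# Hypothetical toy incomes, NOT drawn from the IPUMS sample.
male_income = [32000, 45000, 51000, 60000, 75000, 120000]
female_income = [28000, 39000, 42000, 55000, 61000, 98000]

print(empirical_pmf([1, 1, 2, 4]))
print(bootstrap_diff_in_means(male_income, female_income, n_boot=2000))
```

Under these assumptions, an interval that excludes 0 would suggest a difference in mean household income between the two groups, while an interval that contains 0 would not.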
