Stat 151 - Assignment 1

Part B: Lab Component

Answer the following questions using R and R Commander.

1. The survey data set STATISTICSSTUDENTSSURVEYFORR contains several columns, including NUMVINYL (the number of vinyl records a student owns), WKHRSHWK (a student's weekly hours of homework per course), LIKEENGLISH (how much a student likes the subject English, on a scale from 1 = dislike very much to 5 = like very much) and IMPISSUE (the most important election issue a student feels we face today: criminal, domestic policy, education, electoral, environmental, foreign policy, healthcare, immigration or social). For each of these four columns, state whether the variable is qualitative (non-ordinal), qualitative (ordinal), quantitative (discrete) or quantitative (continuous). (4 marks)

2. The data file HIGHLANDS contains the assessed dwelling values of ALL residential Edmonton addresses in the Highlands. (FYI: the file HIGHLANDS is a subset of https://data.edmonton.ca/City-Administration/Property-Assessment-Data-Current-Calendar-Year-/07d6-ambg)

a) Create and paste a histogram that summarizes the assessed residential dwelling values of the HIGHLANDS dwellings. You will need 20 bins to get a meaningful picture. Comment on the distribution of the assessed dwelling values in terms of overall shape, modality, and symmetry or skewness, if applicable. (4 marks)

b) Create and paste a boxplot that summarizes the assessed residential dwelling values of the HIGHLANDS dwellings. What does it tell you about the distribution of the data? (4 marks)

c) Find and paste a full set of descriptive measures for the entire population of assessed HIGHLANDS residential dwelling values. Choose the most appropriate descriptive measures and explain your choice with reference to the shape of the data distribution. (4 marks)

d) You will notice that the histogram of assessed residential dwelling values you found for the Highlands from the sample of size 15 taken in Part A does not match the shape of the histogram you found in Part B using the HIGHLANDS data file, which contains the assessed values of all residential dwellings in the Highlands. Likewise, the boxplot from the Part A sample of size 15 does not match the shape of the boxplot you found using the HIGHLANDS data file. State how the shapes differ and explain why this may have happened. (4 marks)

3. The data set HEARTFAILUREPREDICTION contains several variables that play a role in heart failure prediction for 918 people in the United States. (For the curious, this open data set can be found at https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction)

a) For each of the numerical variables Age (in years) and MaxHR (maximum heart rate achieved, in bpm), find and paste two suitable graphs that summarize its distribution. Comment on each distribution in terms of overall shape, modality, and symmetry or skewness, if applicable. (6 marks)

b) For each of the numerical variables Age and MaxHR, find and paste a full set of descriptive measures. Choose the most appropriate descriptive measures in each case and justify your choices with reference to the shapes of their distributions. (6 marks)

c) Find and paste the most suitable graph to show the relationship between Age and MaxHR among the 918 respondents in this data set. Briefly describe the relationship you see. (4 marks)

4. The data set HENSANDBEES summarizes the locations in Edmonton where residents are keeping hens or bees. (FYI: this data set was taken from https://data.edmonton.ca/Community-Services/Hens-and-Bees/trz2-qkzs)

a) Find and paste a suitable graph that summarizes the distribution of the numerical column giving the number of hens for households that were granted a hen license. Explore the Graphs menu in R Commander and choose the graph you think best accommodates the small number of distinct possible values for the number of hens (2, 3, 4, 5, 6, 7, 8). Be sure to explain why you chose this graph. Think carefully about this one; the best choice may be something not covered in the lab. (4 marks)

5. a) Draw side-by-side histograms to compare the maximum heart rates of male and female respondents in the HEARTFAILUREPREDICTION data set. Decide whether it is more appropriate to use counts or relative frequencies (percents) on the histograms, and explain why you made that choice. Describe your chosen graphs. After looking at these two histograms, what further graphs might be of interest? (7 marks)

b) Find appropriate numerical summaries to compare the maximum heart rates of male and female respondents in the HEARTFAILUREPREDICTION data set, basing your choices on the distribution shapes obtained in part a). Briefly explain how your numerical summaries elucidate your graphs. (5 marks)

c) Consider the variable MaxHR in the HEARTFAILUREPREDICTION data set.
i) Calculate the interquartile range of MaxHR for females. (2 marks)
ii) What percentage of the data falls within the interquartile range of MaxHR for females? (1 mark)
iii) Calculate the interquartile range of MaxHR for males. (2 marks)
iv) What percentage of the data falls within the interquartile range of MaxHR for males? (1 mark)
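The descriptive-measure questions (2c, 3b, 5b) and the interquartile-range questions in 5c rest on the same few computations. The assignment expects R and R Commander; what follows is only a minimal sketch of that arithmetic in Python (standard library only), under stated assumptions. The function names describe and percent_within_iqr are hypothetical. The MaxHR values are invented toy numbers, not data from HEARTFAILUREPREDICTION. Quartiles are computed with statistics.quantiles(..., method="inclusive"), which uses the same linear-interpolation rule as R's default quantile() (type 7). Other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics


def describe(values):
    """Return a dict of common descriptive measures for a list of numbers.

    Quartiles use method="inclusive", the same linear-interpolation rule
    as R's default quantile() (type 7).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "mean": statistics.fmean(data),
        "sd_sample": statistics.stdev(data),       # n - 1 denominator, like R's sd()
        "sd_population": statistics.pstdev(data),  # n denominator (whole population)
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
    }


def percent_within_iqr(values):
    """Percentage of observations x with Q1 <= x <= Q3 (endpoints included)."""
    s = describe(values)
    inside = sum(1 for x in values if s["q1"] <= x <= s["q3"])
    return 100 * inside / len(values)


if __name__ == "__main__":
    # Hypothetical toy MaxHR values (bpm), NOT taken from HEARTFAILUREPREDICTION.
    max_hr_by_sex = {
        "F": [120, 130, 135, 140, 150, 155, 160, 170, 185],
        "M": [98, 110, 118, 125, 132, 140, 150, 162],
    }
    for sex, values in max_hr_by_sex.items():
        s = describe(values)
        print(f"{sex}: Q1={s['q1']}, median={s['median']}, Q3={s['q3']}, "
              f"IQR={s['iqr']}, within IQR={percent_within_iqr(values):.1f}%")
```

With these toy values the share inside [Q1, Q3] is exactly 50% for one group but about 55.6% for the other. When Q1 and Q3 land exactly on observed values, or values are tied, counting the endpoints as inside pushes the share above one half, so "about 50%" is the expectation rather than a guarantee. The population standard deviation is included because 2c treats HIGHLANDS as an entire population, whereas R's sd() uses the n - 1 denominator.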

Fig: 1

Fig: 2