Question

Answer the following questions. Submit all answers and graphs in a single Word or PDF file, together with all of the RStudio code. Every graph needs its own title and axis labels (x- and y-axes).

Question 1
Use the built-in dataset oscars to answer the following questions. (oscars is available in the package "openintro".)
1. Based on this dataset, what was the first year of the Oscar Awards?
2. Create two histograms to show the distribution of ages: one for best actor and one for best actress.
3. Find the means and medians of the best actors' and best actresses' ages.
4. Compare the age distributions of best actor and best actress winners based on #1-3.

Question 2
Use the built-in dataset nyc_marathon to answer the following questions. (nyc_marathon is also available in the package "openintro".)
1. Plot a histogram of time_hrs.
2. Plot a boxplot of time_hrs.
3. Based on the graphs from #1 and #2, what features of the distribution are apparent in the histogram but not in the boxplot? What features are apparent in the boxplot but not in the histogram?
4. Plot a boxplot of marathon times for men and women. Hint: You can plot a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).
5. Compare the distributions of marathon times for men and women based on the boxplot from #4.
6. Considering all three graphs so far, what may be the reason for the bimodal distribution? Explain.
7. Create a time series plot comparing men's and women's marathon times between 1970 and 2020. Try to add a legend here! [Hint: You may want to show men's and women's marathon times in two different colors.]
8. Describe what is visible in the plot from #7 but not in the others.
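As a starting point for the plotting and summary steps in both questions, here is a minimal base R sketch. It is not a model answer. The column names used (`award`, `age`, `oscar_yr`, `division`, `year`) are assumptions about the openintro data frames and should be checked with `str(oscars)` and `str(nyc_marathon)` first; only `time_hrs` is named in the question itself. The helper `group_summary` is a hypothetical name introduced for illustration, and the colors and titles are arbitrary choices.

```r
# Sketch only: award, age, oscar_yr, division and year are ASSUMED column
# names; confirm them with str() before relying on this code.

# Hypothetical helper: mean and median of a numeric column within each group.
group_summary <- function(df, value, group) {
  means   <- tapply(df[[value]], df[[group]], mean,   na.rm = TRUE)
  medians <- tapply(df[[value]], df[[group]], median, na.rm = TRUE)
  data.frame(group  = names(means),
             mean   = as.numeric(means),
             median = as.numeric(medians))
}

# Everything below runs only if the openintro package is installed.
if (requireNamespace("openintro", quietly = TRUE)) {
  oscars       <- openintro::oscars
  nyc_marathon <- openintro::nyc_marathon

  # Question 1: earliest year, one histogram per award, then means and medians.
  print(min(oscars$oscar_yr, na.rm = TRUE))
  for (a in na.omit(unique(as.character(oscars$award)))) {
    hist(oscars$age[oscars$award == a],
         main = paste("Ages of winners:", a),
         xlab = "Age (years)", ylab = "Number of winners")
  }
  print(group_summary(oscars, "age", "award"))

  # Question 2: overall histogram and boxplot of finishing times.
  hist(nyc_marathon$time_hrs,
       main = "NYC Marathon winning times",
       xlab = "Time (hours)", ylab = "Number of winners")
  boxplot(nyc_marathon$time_hrs,
          main = "NYC Marathon winning times", ylab = "Time (hours)")

  # Grouped boxplot using the formula interface y ~ grp from the hint.
  boxplot(time_hrs ~ division, data = nyc_marathon,
          main = "Winning times by division",
          xlab = "Division", ylab = "Time (hours)")

  # Time series: one line per division, restricted to 1970-2020.
  in_range <- subset(nyc_marathon, year >= 1970 & year <= 2020)
  in_range <- in_range[order(in_range$year), ]
  men   <- subset(in_range, division == "Men")
  women <- subset(in_range, division == "Women")
  plot(men$year, men$time_hrs, type = "l", col = "blue",
       ylim = range(in_range$time_hrs, na.rm = TRUE),
       main = "Winning times by year, 1970-2020",
       xlab = "Year", ylab = "Time (hours)")
  lines(women$year, women$time_hrs, col = "red")
  legend("topright", legend = c("Men", "Women"),
         col = c("blue", "red"), lty = 1)
}
```

The `ylim` argument is set from both divisions so that the second line is not clipped by the axis range chosen for the first. Missing times, if any, are skipped by `na.rm = TRUE` in the summaries and appear as gaps in the line plot.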