R Programming Homework Help

Questions & Answers

Please prepare your submission in a document (Word or PDF) and clearly label all answers and output with their corresponding question number and part.

Download "llo Lab.zip" from Blackboard, rename it with your name, and open the R project file (double-click it). Run the R script "llo Run.R", which contains all the code you need. It calls the function "Forecast Electric.Demand()" in the script "Project Functions.R", which includes the calculation of the R-squared measure. Plot the results (note: to plot, type "p" in the console). Run or debug "llo Run.R" to see how it works (install any needed packages if necessary). Note that the CSV data does not contain the day and hour columns. In the function "Forecast Electric.Demand()" these fields are set to 1, so the fit (R-squared) is poor. This information can be extracted from the time stamp.
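A minimal sketch of how the day and hour could be pulled out of a time stamp in base R, plus a generic R-squared helper. The time stamp format, the example values, and the helper name r_squared() are assumptions for illustration; the lab's actual column names and the internals of Forecast Electric.Demand() are not shown in the assignment.

```r
# Hypothetical time stamps; the lab CSV's real column name and format may differ.
ts <- as.POSIXct(c("2020-01-06 00:00:00", "2020-01-06 13:00:00",
                   "2020-01-07 23:00:00"), tz = "UTC")

day  <- as.integer(format(ts, "%u"))  # ISO weekday: 1 = Monday ... 7 = Sunday
hour <- as.integer(format(ts, "%H"))  # hour of day, 0-23

# Hypothetical helper: R-squared = 1 - SS_residual / SS_total
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

day   # 1 1 2
hour  # 0 13 23
```

Columns like these could then replace the constant 1s as predictors (possibly as factors) in the demand model; whether that is how the lab intends it is an assumption.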

Question 1. Consider a population of perennial plants that breed in early spring and suffer high drought-related mortality late in the summer. Field monitoring experiments suggest that drought causes a 50% decline in the population during the late summer (d = 0.5). Given this degree of mortality, use the model to calculate how many offspring each individual would, on average, have to produce during the breeding season to prevent the population from declining over time. In other words, calculate the minimum value of b that would be compatible with population growth. Scoring: Full credit for providing the correct answer and showing how the answer was obtained (i.e., show your work).

Suppose that you are monitoring an island-endemic cricket population that has recently become threatened by an invasive parasitoid wasp species that attacks its members. From observations of birth and death rates, you estimate the intrinsic growth rate of the cricket population to be r = -0.05, with a 95% confidence interval of [-0.1, -0.01]. Since the entire confidence interval for your estimate of r is negative, your data imply that the population will decline over time.
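The plant question does not restate "the model". Assuming it is the common discrete-time model in which breeding (per-capita offspring b) is followed by mortality (fraction d dying), the threshold can be worked out as follows:

```latex
N_{t+1} = (1+b)(1-d)\,N_t

\text{No decline: } (1+b)(1-d) \ge 1
\;\Longrightarrow\; b \ge \frac{d}{1-d} = \frac{0.5}{1-0.5} = 1
```

Under this assumption b = 1 keeps the population constant and any b > 1 gives growth. If the course model is instead the additive form N_{t+1} = (1 + b - d)N_t, the same reasoning gives b ≥ d = 0.5, so check which form the lecture uses.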

Part 1: Create an R script that computes the measures of central tendency, the measures of variability, and the relationships (correlations) for each of the seven variables in the attitude dataset. Use the functions var(), sd(), cor(), mean(), median(), max(), min(), range(), quantile(), and IQR(), and also compute the mode. Check your work by using the summary() and/or describe() functions.
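A possible sketch using the built-in attitude data frame. Base R has no statistical mode function (mode() returns an object's storage mode), so stat_mode() below is a hypothetical helper. The describe() mentioned above is presumably psych::describe(), which is not used here.

```r
data(attitude)  # built-in 'datasets' package: 30 observations of 7 variables

# Hypothetical helper: returns every most-frequent value (there may be ties)
stat_mode <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[tab == max(tab)])
}

# One column of statistics per variable
summ <- sapply(attitude, function(x) c(
  mean = mean(x), median = median(x), min = min(x), max = max(x),
  range = diff(range(x)),  # range() returns c(min, max); diff() gives the width
  IQR = IQR(x), var = var(x), sd = sd(x)
))
round(summ, 2)

lapply(attitude, stat_mode)   # mode(s) of each variable
sapply(attitude, quantile)    # 0%, 25%, 50%, 75%, 100% quantiles
round(cor(attitude), 2)       # 7 x 7 correlation matrix
summary(attitude)             # cross-check against the values above
```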

Problem 6.3.2 Use the "PimaIndiansDiabetes2" dataset and ggplot to draw two histograms of "pressure": one with counts and one with density.
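A sketch assuming the dataset is the one shipped with the mlbench package (where pressure is diastolic blood pressure in mm Hg and missing values are coded as NA). The bin width of 5 is an arbitrary choice.

```r
library(ggplot2)
library(mlbench)  # provides PimaIndiansDiabetes2

data(PimaIndiansDiabetes2)

# Histogram with counts; na.rm = TRUE drops the missing pressure values quietly
p_count <- ggplot(PimaIndiansDiabetes2, aes(x = pressure)) +
  geom_histogram(binwidth = 5, na.rm = TRUE, fill = "grey70", colour = "black") +
  labs(x = "Diastolic blood pressure (mm Hg)", y = "Count")

# Same data on the density scale, so the total bar area is 1
p_density <- ggplot(PimaIndiansDiabetes2, aes(x = pressure)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5, na.rm = TRUE,
                 fill = "grey70", colour = "black") +
  labs(x = "Diastolic blood pressure (mm Hg)", y = "Density")

print(p_count)
print(p_density)
```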

A researcher has a set of numbers whose mean is 13.8. The researcher wants to know whether that set of numbers is likely to come from the uniform distribution on the interval 1 to 16. a. Determine the theoretical expected value of the uniform distribution on the interval 1 to 16 using the equation method. b. With reference to the lecture slides, create the distribution of means from 99 random simulated draws from the uniform distribution on the interval 1 to 16. c. Plot the histogram (function hist()) of the simulated distribution of means and place a vertical line on that plot at the location of the researcher's mean (abline(v=13.8)) and another line showing the theoretically expected value. d. Determine the probability that the researcher's mean comes from that distribution (the Monte Carlo p-value). e. Explain your conclusion.
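A sketch of parts a through d. The size of the researcher's set is not given, so n_obs = 10 is an assumption; the one-sided p-value formula (count + 1) / (sims + 1) is one common Monte Carlo convention and may differ from the lecture slides.

```r
set.seed(1)                 # fixed only for reproducibility
n_sims   <- 99              # number of simulated means (part b)
n_obs    <- 10              # ASSUMPTION: size of the researcher's set is not given
obs_mean <- 13.8
a <- 1; b <- 16

expected <- (a + b) / 2     # part a: E[X] = (a + b) / 2 = 8.5

# part b: each simulated value is the mean of n_obs uniform draws
sim_means <- replicate(n_sims, mean(runif(n_obs, min = a, max = b)))

# part c: histogram with the observed and theoretical means marked
hist(sim_means, xlim = range(c(sim_means, obs_mean, expected)),
     main = "Simulated means, U(1, 16)", xlab = "Sample mean")
abline(v = obs_mean, col = "red", lwd = 2)
abline(v = expected, col = "blue", lty = 2, lwd = 2)

# part d: one-sided Monte Carlo p-value, counting the observed mean as one draw
p_value <- (sum(sim_means >= obs_mean) + 1) / (n_sims + 1)
p_value
```

With 99 simulations and this formula, the smallest attainable p-value is 1/100 = 0.01.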

a. With reference to the lecture slides (Lecture 4), determine the mean center and standard distance for each of the above point datasets. b. Create a plot showing the events for each dataset, with the mean center and standard distance overlaid. NOTE: see symbols() for plotting the standard distance, in particular its "inches" argument, and see points() for plotting the centroid.
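A sketch for a single dataset, using a hypothetical four-point pattern in place of the assignment's data. It uses the root-mean-square definition of standard distance; some texts divide by n - 2 instead of n, so check the form given in Lecture 4.

```r
# Hypothetical point pattern; replace x and y with one of the assignment's datasets.
x <- c(0, 2, 0, 2)
y <- c(0, 0, 2, 2)

# Mean center: the average of the x and y coordinates
mc_x <- mean(x)
mc_y <- mean(y)

# Standard distance: root-mean-square distance of the events from the mean center
std_dist <- sqrt(sum((x - mc_x)^2 + (y - mc_y)^2) / length(x))

# Axis limits wide enough to show the whole circle
xlim <- range(c(x, mc_x - std_dist, mc_x + std_dist))
ylim <- range(c(y, mc_y - std_dist, mc_y + std_dist))

plot(x, y, asp = 1, pch = 19, xlim = xlim, ylim = ylim,
     main = "Events, mean center and standard distance")
points(mc_x, mc_y, pch = 4, col = "red", cex = 2, lwd = 2)  # mean center
# inches = FALSE draws the circle radius in x-axis units instead of inches
symbols(mc_x, mc_y, circles = std_dist, inches = FALSE, add = TRUE, fg = "red")
```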

1. Shipments of Household Appliances: Line Graphs. The file ApplianceShipments.csv contains the series of quarterly shipments (in millions of dollars) of US household appliances between 1985 and 1989.
a. Create a well-formatted time plot of the data using the ggplot2 package. Add a smoothed line to the graph. For a closer view of the patterns, zoom in to the range 3500-5000 on the y-axis. Hint: to convert Quarter into a date format, use the zoo library's as.Date utility: as.Date(as.yearqtr(appship.df$Quarter, format = "Q%q-%Y")).
b. Does there appear to be a quarterly pattern?
c. Using ggplot2 in R, create one chart with four separate lines, one for each of Q1, Q2, Q3, and Q4. In R, this can be achieved by generating a data.frame for each quarter (use seq(1,20,4), seq(2,20,4), etc. to create indexes for the different quarters) and then plotting them as separate series on the line graph. Does there appear to be a difference between quarters? Hint: for ggplot() to display the legend, the color aesthetic must be included inside the aes() specification.
d. Using ggplot2, create a chart with one line of average shipments in each quarter. Hint: use the quarter() function of the lubridate package to create a new column in the shipments data frame, and use tapply() to average shipments across quarters.
e. Using ggplot2, create a line graph of the series at a yearly aggregated level (i.e., the total shipments in each year) and comment on what happened to shipments over the years. Hint: use the year() function of the lubridate package to extract the years from the shipments data frame.
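A sketch of the date conversion and the quarterly and yearly aggregations. The data frame below is a hypothetical stand-in with placeholder shipment values, not the real series; replace it with read.csv("ApplianceShipments.csv"). For part c it colours lines by a quarter column instead of building four data frames with seq(), which is an alternative to the approach in the hint.

```r
library(zoo)        # as.yearqtr()
library(lubridate)  # quarter(), year()
library(ggplot2)

# Hypothetical stand-in: 20 quarters, Q1-1985 .. Q4-1989, placeholder values
appship.df <- data.frame(
  Quarter   = paste0("Q", rep(1:4, times = 5), "-", rep(1985:1989, each = 4)),
  Shipments = 4000 + 100 * seq_len(20)
)
appship.df$Date <- as.Date(as.yearqtr(appship.df$Quarter, format = "Q%q-%Y"))
appship.df$Qtr  <- quarter(appship.df$Date)  # 1-4
appship.df$Year <- year(appship.df$Date)

# (a) time plot with a smoothed line, zoomed to 3500-5000 on the y-axis
p <- ggplot(appship.df, aes(x = Date, y = Shipments)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x) +
  coord_cartesian(ylim = c(3500, 5000)) +
  labs(x = "Quarter", y = "Shipments (millions of USD)")

# (c) one line per quarter; colour inside aes() so a legend is drawn
p_qtr <- ggplot(appship.df, aes(x = Year, y = Shipments, colour = factor(Qtr))) +
  geom_line() +
  labs(colour = "Quarter")

# (d) average shipments per quarter; (e) total shipments per year
qtr_means  <- tapply(appship.df$Shipments, appship.df$Qtr, mean)
year_total <- tapply(appship.df$Shipments, appship.df$Year, sum)
```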

2. Sales of Riding Mowers: Scatter Plots. A company that manufactures riding mowers wants to identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is interested in classifying households as prospective owners or nonowners on the basis of Income (in $1000s) and Lot Size (in 1000 ft²). The marketing expert looked at a random sample of 24 households, given in the file Riding Mowers.csv.
a. Using ggplot() in R, create a scatter plot of Lot Size vs. Income, color-coded by the outcome variable owner/nonowner. Make sure to obtain a well-formatted plot (legible labels, a legend, etc.).

3. Laptop Sales at a London Computer Chain: Bar Charts and Boxplots. The file LaptopSales- January 2008.csv contains data for all sales of laptops at a computer chain in London in January 2008. This is a subset of the full dataset that includes data for the entire year.
a. Using ggplot() in R, create a histogram and a density plot of the average retail price, and overlay a normal density curve on them. Does the price data look normally distributed?
b. Create a Q-Q plot of the price data. Does the Q-Q plot confirm your finding in part a about the normality of the data? Are there any outliers?
c. Create a bar chart showing the average retail price by store postcode (StorePostcode). Which store postcode has the highest average retail price? Which has the lowest? Hint: for better readability, feel free to rotate the x-axis labels by adding + theme(axis.text.x = element_text(angle = 90)) to the ggplot() call. To zoom in closer to the price range, also add + coord_cartesian(ylim = c(480, 500)).
d. Using the filter() function of the dplyr package, reduce your laptop data frame to only these two store postcodes (highest and lowest average price). Using ggplot2, create a side-by-side violin plot of the retail prices of the two stores. Be sure to jitter the markers for better visibility. Does there seem to be a huge difference between their prices?
e. To better compare retail prices across postcodes, create side-by-side boxplots of retail prices for the two postcodes and compare their price distributions. Does there seem to be a difference between them?
f. Suppose you are interested in which technical features most affect computer prices. Using the cut() function of the base package, create a new categorical variable in your main laptop sales data frame that contains 3 RetailPrice categories: "low", "medium", and "high". Call the variable PriceCat and make sure that its class is factor. Next, create another data frame that contains PriceCat and all the columns that describe laptop features (such as BatteryLife_Hrs, ScreenSize In, etc.). Finally, create a boxplot-enhanced parallel coordinate plot with all the features on the horizontal axis and PriceCat on the vertical axis. Which feature(s) seem to be the most important determinants of PriceCat?
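A sketch of parts c, d, and f for the laptop data. The data frame is a hypothetical stand-in (placeholder postcodes and random prices) that only reuses the column names StorePostcode and RetailPrice from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the laptop sales data
set.seed(42)
laptop.df <- data.frame(
  StorePostcode = rep(c("PC1", "PC2", "PC3"), each = 20),
  RetailPrice   = round(runif(60, min = 300, max = 700))
)

# (c) average retail price per postcode, with rotated x-axis labels
avg_price <- laptop.df %>%
  group_by(StorePostcode) %>%
  summarise(AvgPrice = mean(RetailPrice), .groups = "drop")
p_bar <- ggplot(avg_price, aes(x = StorePostcode, y = AvgPrice)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90))

# (d) keep only the highest- and lowest-priced postcodes, then violin + jitter
extremes <- avg_price$StorePostcode[c(which.max(avg_price$AvgPrice),
                                      which.min(avg_price$AvgPrice))]
two.df <- filter(laptop.df, StorePostcode %in% extremes)
p_violin <- ggplot(two.df, aes(x = StorePostcode, y = RetailPrice)) +
  geom_violin() +
  geom_jitter(width = 0.1, alpha = 0.5)

# (f) three equal-width price bands; cut() already returns a factor
laptop.df$PriceCat <- cut(laptop.df$RetailPrice, breaks = 3,
                          labels = c("low", "medium", "high"))
```

breaks = 3 gives equal-width bands; passing quantile(RetailPrice, probs = seq(0, 1, 1/3)) with include.lowest = TRUE would give roughly equal-count bands instead. Which one the assignment expects is not stated.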

2. Consider a continuous random variable X with the probability density function defined by $f(x) = \frac{3}{4}(1 - x^2)$ for $x \in [-1, 1]$ and $f(x) = 0$ otherwise. What is the probability of the event consisting of the interval $[-3, 0]$? (10 points)
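Taking the constant in front of $(1 - x^2)$ to be 3/4 (the value that makes the density integrate to 1, as the first line checks), the probability of $[-3, 0]$ reduces to an integral over $[-1, 0]$ because the density is zero below $-1$:

```latex
\int_{-1}^{1} \tfrac{3}{4}(1-x^2)\,dx
  = \tfrac{3}{4}\left[x - \tfrac{x^3}{3}\right]_{-1}^{1}
  = \tfrac{3}{4}\cdot\tfrac{4}{3} = 1

P(X \in [-3,0]) = \int_{-1}^{0} \tfrac{3}{4}(1-x^2)\,dx
  = \tfrac{3}{4}\left[x - \tfrac{x^3}{3}\right]_{-1}^{0}
  = \tfrac{3}{4}\cdot\tfrac{2}{3} = \tfrac{1}{2}
```

The same answer follows from symmetry: the density is even, so half of the probability mass lies on each side of 0.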

3. Construct a reasonable model for rolling a fair die twice and recording the results in order. You don't have to explain the model; just provide the information requested below. Let X be the event that the first number is odd. Let Y be the event that the second number is even. Let Z be the event that both numbers have the same parity, that is, both are even or both are odd. Are the events X and Z independent? (5 points) Are the events X ∩ Y and Z independent? (5 points)
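One possible model is the 36 equally likely ordered outcomes. The sketch below enumerates them and checks the product rule P(A ∩ B) = P(A)P(B) numerically; the helper names p() and indep() are hypothetical.

```r
# Sample space: 36 equally likely ordered pairs (first roll, second roll)
outcomes <- expand.grid(first = 1:6, second = 1:6)

X <- outcomes$first %% 2 == 1                     # first number is odd
Y <- outcomes$second %% 2 == 0                    # second number is even
Z <- outcomes$first %% 2 == outcomes$second %% 2  # same parity

p <- function(event) mean(event)  # probability = share of outcomes in the event
indep <- function(A, B) isTRUE(all.equal(p(A & B), p(A) * p(B)))

indep(X, Z)      # TRUE:  P(X & Z) = 1/4 = (1/2)(1/2)
indep(X & Y, Z)  # FALSE: P(X & Y & Z) = 0, but P(X & Y) P(Z) = (1/4)(1/2) = 1/8
```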

4. Consider a continuous random variable X with the probability density function defined by $f(x) = c(x + 2)$ for $x \in [-2, 0]$, $f(x) = c(2 - x)$ for $x \in [0, 2]$, and $f(x) = 0$ otherwise. What is the value of c? (5 points)
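The constant follows from requiring the density to integrate to 1 over its support:

```latex
\int_{-2}^{0} c(x+2)\,dx + \int_{0}^{2} c(2-x)\,dx
  = c\left[\tfrac{x^2}{2} + 2x\right]_{-2}^{0} + c\left[2x - \tfrac{x^2}{2}\right]_{0}^{2}
  = 2c + 2c = 4c = 1
\;\Longrightarrow\; c = \tfrac{1}{4}
```

Geometrically, the density is a triangle with base 4 and height 2c, so its area is 4c.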

6. If you model the data below as a sample from a Normal distribution, what is the maximum likelihood estimate for $\sigma^2$ in the density $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$? (5 points)
v <- c(0.59, -0.55, 1.95, 1.02, 0.30)
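Under a Normal model, the maximum likelihood estimate of the variance divides the sum of squared deviations by n, whereas R's var() divides by n - 1. A short sketch showing both routes:

```r
v <- c(0.59, -0.55, 1.95, 1.02, 0.30)

# MLE: mean squared deviation from the sample mean (divides by n)
mu_hat     <- mean(v)
sigma2_mle <- mean((v - mu_hat)^2)

# Equivalent: rescale the unbiased sample variance from var()
sigma2_alt <- var(v) * (length(v) - 1) / length(v)

sigma2_mle  # 0.678456
```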

10. In the data set "demog_data.csv" what is the mean value of INCTOT for women in each category of EDUC? (For full credit, please do this without looping through the EDUC values. You can use "summarize" from dplyr, with group_by.) (5 points)
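A sketch with dplyr's group_by() and summarize(). It assumes sex is stored in a column SEX coded as in IPUMS (1 = male, 2 = female); check this against the file. IPUMS income variables may also use sentinel codes for missing values, so consult the codebook before averaging. The toy data only illustrates the shape of the result.

```r
library(dplyr)

# Mean INCTOT by EDUC for women, with no explicit loop over EDUC values.
# ASSUMPTION: SEX == 2 means female, as in IPUMS coding.
mean_inctot_women <- function(df) {
  df %>%
    filter(SEX == 2) %>%
    group_by(EDUC) %>%
    summarize(mean_INCTOT = mean(INCTOT, na.rm = TRUE), .groups = "drop")
}

# Toy data; with the real file use:
# demog <- read.csv("demog_data.csv"); mean_inctot_women(demog)
toy <- data.frame(SEX    = c(1, 2, 2, 2),
                  EDUC   = c(6, 6, 6, 10),
                  INCTOT = c(100, 200, 400, 50))
res <- mean_inctot_women(toy)
res
```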

11. Please restrict the data set "demog_data.csv" to cases in which "INCTOT" is greater than 0 and "EDUC" is greater than 4. What is the mean value of "INCTOT" for the remaining observations? (5 points)
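A base R sketch of the restriction. subset() is used because it drops rows where the condition evaluates to NA, whereas plain logical indexing would keep them as all-NA rows. The helper name and toy data are hypothetical.

```r
# Keep cases with INCTOT > 0 and EDUC > 4, then average INCTOT
restricted_mean <- function(df) {
  kept <- subset(df, INCTOT > 0 & EDUC > 4)
  mean(kept$INCTOT)
}

# Toy data; with the real file use:
# demog <- read.csv("demog_data.csv"); restricted_mean(demog)
toy <- data.frame(EDUC   = c(3, 6, 6, 10),
                  INCTOT = c(500, 0, 200, 400))
restricted_mean(toy)  # 300: only (6, 200) and (10, 400) remain
```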

14. Set the random seed to 12345. Create a vector of 60 samples from the exponential distribution ("rexp") with rate equal to 1/1,000,000. Create a matrix of these values with 10 rows and 6 columns such that the first 10 values in the sample form the first column of the matrix, the next 10 values form the second column, and so on. Use the "apply" function to find the median value in each row. (5 points)
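A short sketch. The key point is that matrix() fills column by column by default (byrow = FALSE), which is exactly the layout the question asks for.

```r
set.seed(12345)
samples <- rexp(60, rate = 1 / 1e6)

# Column-major fill: column 1 is samples[1:10], column 2 is samples[11:20], ...
m <- matrix(samples, nrow = 10, ncol = 6)

# MARGIN = 1 applies median() to each row
row_medians <- apply(m, 1, median)
row_medians
```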

Question 3
The data loaded below are sampled from IPUMS, https://ipums.org/, an interface for accessing survey and census data. They are drawn from U.S. Census microdata in a way that approximates a simple random sample of Colorado households in 2017 headed by unmarried men and a simple random sample of Colorado households in 2017 headed by unmarried women.
Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of Minnesota, 2015. http://doi.org/10.18128/D010.V6.0.
The cases with HHTYPE equal to 2 make up the sample of male-headed households. The cases with HHTYPE equal to 3 make up the sample of female-headed households. Household income is recorded in HHINCOME.
3.a. (5 points) Are the household incomes for the male-headed households approximately Normally distributed? Are the household incomes for the female-headed households approximately Normally distributed? Please provide visualizations to support your response.
3.b. (5 points) Please carry out a Mann-Whitney U-test on the two data sets, the household incomes for the male-headed households and the household incomes for the female-headed households. What can you conclude from the results? In particular, can this test be interpreted as a test of center in this case?
3.c. (0 points) The code below carries out a bootstrap test of the difference in means of the household incomes for the male-headed households and the household incomes for the female-headed households. Please study this and be prepared to ask questions about it in class. Basic bootstrap samples are samples of cases drawn with replacement from the data. They are used to estimate confidence intervals for statistics non-parametrically. A data vector s defines an empirical probability distribution as follows. The sample space is the set of distinct values in s. The set of events is the power set of the sample space. The probability function is defined by the density function $f(x) = k/n$, where k is the number of occurrences of the value x in s and n is the length of s. If the empirical distribution is close to the population distribution, then a bootstrap sample from the empirical distribution simulates a new sample. Computing the range of the statistic of interest for a large number of bootstrap samples indicates the range of values that would be produced if the population were actually resampled.
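A sketch covering 3.a to 3.c on hypothetical right-skewed stand-in incomes (log-normal draws), since the loaded IPUMS data are not reproduced here. The bootstrap below is a generic percentile interval for the difference in means, not the course's own code. Note that the Mann-Whitney test compares whole distributions; it can be read as a test of a shift in center only if the two distributions have roughly the same shape.

```r
set.seed(2017)
# Hypothetical stand-ins for HHINCOME with HHTYPE == 2 (male) and HHTYPE == 3 (female)
male_inc   <- rlnorm(100, meanlog = 10.8, sdlog = 0.8)
female_inc <- rlnorm(120, meanlog = 10.6, sdlog = 0.8)

# 3.a: histograms and normal Q-Q plots as visual normality checks
op <- par(mfrow = c(2, 2))
hist(male_inc, main = "Male-headed", xlab = "Household income")
qqnorm(male_inc, main = "Male-headed"); qqline(male_inc)
hist(female_inc, main = "Female-headed", xlab = "Household income")
qqnorm(female_inc, main = "Female-headed"); qqline(female_inc)
par(op)

# 3.b: Mann-Whitney U test (R reports the statistic as W)
mw <- wilcox.test(male_inc, female_inc)
mw$p.value

# 3.c-style percentile bootstrap for the difference in means
boot_diffs <- replicate(5000,
  mean(sample(male_inc, replace = TRUE)) - mean(sample(female_inc, replace = TRUE)))
ci <- quantile(boot_diffs, c(0.025, 0.975))
ci
```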

2. Wilcoxon signed rank test. Please perform a Wilcoxon signed rank test of the null hypothesis that dat_one_sample is drawn from a population symmetric around its mean, with mean (and hence median) equal to 0.1. Please give an interpretation of the result. (5 points)
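A sketch using base R's wilcox.test() with mu = 0.1. The vector below is a hypothetical stand-in, since dat_one_sample is defined in the course material. Because the null hypothesis combines symmetry with a center of 0.1, a small p-value could reflect asymmetry as well as a shifted center.

```r
set.seed(7)
# Hypothetical stand-in; replace with the course's dat_one_sample
dat_one_sample <- rnorm(30, mean = 0.3, sd = 0.5)

# H0: the population is symmetric about 0.1 (so its median is 0.1)
res <- wilcox.test(dat_one_sample, mu = 0.1)
res            # statistic V = sum of ranks of positive differences from 0.1
res$p.value
```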
