This question does not require any software. Suppose the random variables X1, X2 have the

Figure 1


Most Viewed Questions Of R Programming

Team Paper
This is a team assignment. Each team must use R for the project. Your team will have to find a problem to solve that deals with data and data sources. The team should meet before the residency weekend and agree on which problem to solve. There should be data available online to pull from.
On Friday evening, teams will meet for the residency weekend and put together a one-page proposal, to be reviewed and approved by the professor, that states:
1. The problem to solve.
2. The data sources to pull from.
3. The tool that will be used (R).
   a. Note the high-level graphics that will be used to solve the problem and how they will be used.
On Saturday, teams will reconvene and complete the following:
1. A thorough data plan, which includes:
   a. Where the data is online.
   b. How you know the data is accurate, and the plan for ensuring accuracy.
   c. An import of the data into the selected tool.
2. A paper that includes:
   a. The data plan mentioned above.
   b. The problem: its description, why it is a problem, and how you are going to make a recommendation with the data presented.
   c. The analysis of why the data will solve the issue.
   d. Graphical representations and formulas. Screenshots of the formulas in the tool must be included.
   e. A summary of the consideration and evaluation of results.
      i. This includes your team's final analysis of the problem and the resolution.
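One way to make item 1b of the data plan ("how you know the data is accurate") concrete is to run a few automated checks right after the import in 1c. The sketch below uses a hypothetical helper, `check_data_quality()`, that reports row and column counts, column types, missing values per column, and duplicate rows. It runs on the built-in `airquality` data only as a stand-in. A real accuracy plan would also compare values against the source and check their ranges.

```r
# Hypothetical helper: a minimal data-quality report for an imported
# data frame (a sketch, not a complete accuracy plan).
check_data_quality <- function(df) {
  list(
    n_rows     = nrow(df),
    n_cols     = ncol(df),
    col_types  = vapply(df, function(col) class(col)[1], character(1)),
    na_counts  = colSums(is.na(df)),   # missing values per column
    duplicates = sum(duplicated(df))   # fully repeated rows
  )
}

# Stand-in example with a built-in dataset; for the project, replace it
# with the imported data, e.g. read.csv("<URL or path of your source>").
report <- check_data_quality(airquality)
print(report)
```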


Download "llo Lab.zip" from Blackboard, rename it with your name, and open (double-click) the R project file. Run the R script "llo Run.R", which contains all the code you need: the call to the function "Forecast Electric. Demand" in the script "Project Functions.R", the calculation of the R-squared measure, and a plot of the results (note: to plot, type "p" in the console). Run or debug "llot Run.R" to see how it works (download the needed packages if necessary). Note that the CSV data does not contain the day and hour columns. In the function "Forecast Electric.Demand()" these fields are set to 1, so the fit (R-squared) is poor. This information can be extracted from the timestamp.
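Below is a sketch of how the missing day and hour fields could be derived from the timestamp, together with the R-squared formula the lab reports. It makes several assumptions that are not in the lab files: the helper names `add_time_features()` and `r_squared()` are hypothetical, the timestamp column is assumed to be called `timestamp` and formatted as "YYYY-MM-DD HH:MM:SS" in UTC, and "day" is read as day of the week. Adapt all of these to the actual CSV and to the lab's own function.

```r
# Hypothetical helper: derive hour-of-day and day-of-week from a timestamp.
# Assumed: column "timestamp", format "YYYY-MM-DD HH:MM:SS", time zone UTC.
add_time_features <- function(df, ts_col = "timestamp", tz = "UTC") {
  ts <- as.POSIXlt(df[[ts_col]], tz = tz)
  df$hour <- ts$hour          # 0-23
  df$day  <- ts$wday + 1L     # 1 = Sunday, ..., 7 = Saturday
  df
}

# R-squared = 1 - SS_residual / SS_total
r_squared <- function(actual, predicted) {
  ss_res <- sum((actual - predicted)^2)
  ss_tot <- sum((actual - mean(actual))^2)
  1 - ss_res / ss_tot
}

demo <- data.frame(timestamp = c("2020-01-15 13:45:00", "2020-01-18 00:10:00"))
demo <- add_time_features(demo)
print(demo)
```

With real hour and day values in place of the constant 1, the regression has actual variation in those predictors to fit, which is why the lab expects a better R-squared.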


1. Shipments of Household Appliances: Line Graphs. The file ApplianceShipments.csv contains the series of quarterly shipments (in millions of dollars) of US household appliances between 1985 and 1989.
a. Create a well-formatted time plot of the data using the ggplot2 package. Add a smoothed line to the graph. For a closer view of the patterns, zoom in to the range of 3500-5000 on the y-axis. Hint: to convert Quarter into a date format, use the zoo library's as.Date utility: as.Date(as.yearqtr(appship.df$Quarter, format = "Q%q-%Y")).
b. Does there appear to be a quarterly pattern?
c. Using ggplot2 in R, create one chart with four separate lines, one line for each of Q1, Q2, Q3, and Q4. In R, this can be achieved by generating a data.frame for each quarter (use seq(1,20,4), seq(2,20,4), etc. to create indexes for the different quarters) and then plotting them as separate series on the line graph. Does there appear to be a difference between quarters? Hint: for ggplot() to display the legend, the color aesthetic must be included inside the aes() specification.
d. Using ggplot2, create a chart with one line of average shipments in each quarter. Hint: use the quarter() function of the lubridate package to create a new column in the shipments data frame, and use tapply to average shipments across quarters.
e. Using ggplot2, create a line graph of the series at a yearly aggregated level (i.e., the total shipments in each year) and comment on what happened to shipments over the years. Hint: use the year() function of the lubridate package to extract the years from the shipments data frame.
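Here is a minimal sketch of the data preparation and plots for parts a, c, d, and e. It assumes the CSV has a `Quarter` column like "Q1-1985", as the zoo hint implies, and a numeric `Shipments` column. The second column name is an assumption. To avoid extra dependencies, it swaps in two base-R substitutes. The `quarter_to_date()` helper replaces `as.Date(as.yearqtr(...))`, and a `Qtr` grouping column replaces the `seq()` indexing and lubridate's `quarter()`/`year()`. All helper names are hypothetical.

```r
# Sketch for the ApplianceShipments exercise. Assumed columns: "Quarter"
# (e.g. "Q1-1985") and numeric "Shipments".

# Base-R stand-in for as.Date(zoo::as.yearqtr(x, format = "Q%q-%Y")):
# each quarter maps to its first day.
quarter_to_date <- function(x) {
  q  <- as.integer(sub("^Q([1-4])-.*$", "\\1", x))
  yr <- as.integer(sub("^Q[1-4]-([0-9]{4})$", "\\1", x))
  as.Date(sprintf("%d-%02d-01", yr, 3L * q - 2L))
}

prepare_shipments <- function(df) {
  df$Date <- quarter_to_date(df$Quarter)
  lt <- as.POSIXlt(df$Date)
  df$Qtr  <- factor(paste0("Q", lt$mon %/% 3L + 1L))   # quarter label
  df$Year <- lt$year + 1900L
  df
}

# (d) average shipments per quarter; (e) total shipments per year
quarter_means <- function(df) tapply(df$Shipments, df$Qtr, mean)
yearly_totals <- function(df) tapply(df$Shipments, df$Year, sum)

# (a) time plot with a smoother, zoomed to 3500-5000 on the y-axis
plot_series <- function(df) {
  ggplot2::ggplot(df, ggplot2::aes(x = Date, y = Shipments)) +
    ggplot2::geom_line() +
    ggplot2::geom_smooth() +
    ggplot2::coord_cartesian(ylim = c(3500, 5000))
}

# (c) one line per quarter; colour sits inside aes() so a legend is drawn
plot_by_quarter <- function(df) {
  ggplot2::ggplot(df, ggplot2::aes(x = Year, y = Shipments, colour = Qtr)) +
    ggplot2::geom_line()
}
```

With the real file, this might be used as `appship.df <- prepare_shipments(read.csv("ApplianceShipments.csv"))`, followed by `plot_series(appship.df)` and `plot_by_quarter(appship.df)` once ggplot2 is installed. The named vectors returned by `quarter_means()` and `yearly_totals()` can be put into a small data frame for the line charts in d and e. `coord_cartesian()` zooms without dropping points, so the smoother is still fitted to all the data.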


Part 1: Create an R script that computes the measures of central tendency, the measures of variability, and the relationships for each of the seven variables in the attitude dataset. Use the functions var(), sd(), and cor(), as well as mean, median, mode, max, min, range, quantile, and IQR. Check your work using the summary() and/or describe() functions.
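A sketch of such a script follows, using the built-in `attitude` data (30 observations of 7 numeric variables). Base R has no statistical-mode function, because `mode()` returns an object's storage mode. The script therefore uses a small hypothetical helper, `stat_mode()`, which returns the first most frequent value when there are ties. `describe()` is not base R (it is commonly taken from the psych package), so only `summary()` is used for the cross-check here.

```r
# Sketch of the Part 1 script on the built-in attitude data.
data(attitude)

# Most frequent value (first one on ties); base mode() is storage mode.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

describe_var <- function(x) {
  c(mean = mean(x), median = median(x), mode = stat_mode(x),
    min = min(x), max = max(x), range = diff(range(x)),
    var = var(x), sd = sd(x), IQR = IQR(x))
}

stats_table  <- t(sapply(attitude, describe_var))  # one row per variable
quartiles    <- sapply(attitude, quantile)         # 0%, 25%, 50%, 75%, 100%
correlations <- cor(attitude)                      # 7 x 7 correlation matrix

print(round(stats_table, 2))
print(quartiles)
print(round(correlations, 2))
summary(attitude)   # cross-check of min, quartiles, median, mean, max
```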


(4) Wooldridge, Chapter 7, Exercise 1. Using the data in SLEEP75 (see also Problem 3 in Chapter 3), we obtain the estimated equation. The variable sleep is total minutes per week spent sleeping at night, totwrk is total weekly minutes spent working, educ and age are measured in years, and male is a gender dummy.
(i) All other factors being equal, is there evidence that men sleep more than women? How strong is the evidence?
(ii) Is there a statistically significant tradeoff between working and sleeping? What is the estimated tradeoff?
(iii) What other regression do you need to run to test the null hypothesis that, holding other factors fixed, age has no effect on sleeping?
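For (iii), the general recipe is to fit the restricted regression without the age terms and compare it with the unrestricted one using an F test of the exclusion restrictions. The sketch below computes that F statistic in its R-squared form from two nested `lm()` fits. This form is valid when both models share the same dependent variable and sample. The helper name `f_exclusion()` is hypothetical. The commented usage assumes the wooldridge package's `sleep75` data and assumes the model contains both age and age squared; the estimated equation itself is not shown above.

```r
# F test of q exclusion restrictions from two nested lm() fits on the same
# sample, in R-squared form:
#   F = ((R2_ur - R2_r) / q) / ((1 - R2_ur) / df_ur)
f_exclusion <- function(unrestricted, restricted) {
  r2_ur <- summary(unrestricted)$r.squared
  r2_r  <- summary(restricted)$r.squared
  df_ur <- df.residual(unrestricted)
  q     <- df.residual(restricted) - df_ur   # number of restrictions
  f     <- ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
  c(F = f, q = q, p_value = pf(f, q, df_ur, lower.tail = FALSE))
}

# Hypothetical usage (assumes the wooldridge package and a quadratic in age):
# data("sleep75", package = "wooldridge")
# ur <- lm(sleep ~ totwrk + educ + age + I(age^2) + male, data = sleep75)
# r  <- lm(sleep ~ totwrk + educ + male, data = sleep75)
# f_exclusion(ur, r)   # same F as anova(r, ur)
```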


a. With reference to the lecture slides (Lecture 4), determine the mean center and standard distance for each of the above point datasets.
b. Create a plot showing the events for each dataset, with the mean center and standard distance overlaid. NOTE: see symbols() for plotting the standard distance, in particular its "inches" argument, and see points() for plotting the centroid.
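The sketch below computes the mean center as the coordinate means and uses this definition of standard distance: SD = sqrt(sum((x - x̄)^2 + (y - ȳ)^2) / n). Texts differ in the exact convention (for example in the denominator), so check it against the Lecture 4 slides. It then overlays both on a scatterplot with `points()` and `symbols()`. Setting `inches = FALSE` makes the circle radius use x-axis units, and `asp = 1` keeps the circle round. The helper names are hypothetical.

```r
# Mean centre and standard distance for a point pattern (assumed definition:
# divide by n; adjust to the lecture's convention if it differs).
mean_center <- function(x, y) c(x = mean(x), y = mean(y))

standard_distance <- function(x, y) {
  sqrt(sum((x - mean(x))^2 + (y - mean(y))^2) / length(x))
}

plot_centrography <- function(x, y, main = "") {
  mc <- mean_center(x, y)
  sdist <- standard_distance(x, y)
  plot(x, y, asp = 1, pch = 20, main = main)
  points(mc["x"], mc["y"], pch = 3, cex = 2, col = "red")   # mean centre
  # inches = FALSE draws the circle radius in x-axis units
  symbols(mc["x"], mc["y"], circles = sdist, inches = FALSE,
          add = TRUE, fg = "red")
  invisible(list(center = mc, sd = sdist))
}
```

Each dataset would then be passed as `plot_centrography(pts$x, pts$y)`, where `pts` stands for one of the point datasets. The coordinate column names `x` and `y` are assumptions.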


A researcher has a set of numbers whose mean is 13.8. The researcher wants to know whether that set of numbers likely comes from the uniform distribution on the interval from 1 to 16.
a. Determine the theoretical expected value of the uniform distribution on the interval from 1 to 16 using the equation method.
b. With reference to the lecture slides, create the distribution of means from 99 random simulated draws from the uniform distribution on the interval from 1 to 16.
c. Plot the histogram (function hist()) of the simulated distribution of means and place a vertical line on that plot at the location of the researcher's mean (abline(v=13.8)) and another line showing the theoretical expected value.
d. Determine the probability that the researcher's mean comes from that distribution (the Monte Carlo p-value).
e. Explain your conclusion.
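For part a, the expected value of Uniform(a, b) is (a + b) / 2, which gives (1 + 16) / 2 = 8.5. The sketch below also covers parts b-d. The size of the researcher's sample is not stated, so `n_obs` is a placeholder assumption that should be replaced. The p-value uses a common Monte Carlo convention, (number of simulated means at least as extreme + 1) / (number of simulations + 1). It is one-sided because the observed mean lies above 8.5. The lecture slides may define it slightly differently.

```r
a <- 1; b <- 16
expected <- (a + b) / 2          # E[X] for Uniform(a, b): 8.5

set.seed(42)                     # reproducible simulation
n_obs   <- 10                    # ASSUMPTION: placeholder sample size
n_sims  <- 99
sim_means <- replicate(n_sims, mean(runif(n_obs, min = a, max = b)))

observed <- 13.8
hist(sim_means, main = "Simulated means, Uniform(1, 16)",
     xlim = range(c(sim_means, observed)))
abline(v = observed, col = "red", lwd = 2)       # researcher's mean
abline(v = expected, col = "blue", lty = 2)      # theoretical mean

# One-sided Monte Carlo p-value:
# (number of simulated means >= observed + 1) / (n_sims + 1)
p_mc <- (sum(sim_means >= observed) + 1) / (n_sims + 1)
p_mc
```

With 99 simulations, the smallest attainable p-value under this convention is 1/100 = 0.01.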


Equipment Precision Comparison
Suppose you are trying to make a difficult measurement. Fortunately, there is commercial equipment available for this purpose, although it is expensive. Your company has a large budget and wants to obtain the best equipment, but it also does not want to waste money needlessly. You are responsible for performing some tests to guide its decision. You have ordered two trial samples of metering equipment to test which one is better: Equipment A (which costs £60,000) and Equipment B (which costs £30,000). You take 10 measurements with each in a controlled environment.
Equipment A gives the following readings: 128.00, 125.04, 125.17, 128.62, 126.06, 124.54, 128.80, 129.98, 126.49, 127.16
Equipment B gives: 122.16, 127.35, 124.73, 129.51, 123.60, 132.67, 131.07, 126.20, 132.44, 126.91
You may assume that measurement errors are normally distributed.
1. The "correct" value for the measurement is supposed to be 127. Verify with an appropriate statistical test that both tools are properly "calibrated" (i.e., that they provide measurements that on average are consistent with this value).
2. Suppose you did not know that the true value was 127, or that there was a possibility that the true value was not 127. Use a statistical test to evaluate whether the two tools produce measurements that are, on average, consistent with each other.
3. Company specifications require that the calibration accuracy of the tool (the absolute difference between the average of a very large number of measurements and the "correct" value) be better (less) than 5. Show that both tools meet this requirement with better than 99% confidence under the assumptions above.
4. The most important consideration in your decision is precision: you want the tool that produces measurements with the least variance (lowest standard deviation). Can you tell (using an appropriate statistical test) whether one tool is significantly more precise than the other? If so, which tool? Quote a p-value, and use a confidence interval to quantify how much more precise one tool is (or isn't) than the other.
5. Tool A is much more expensive, and your company might not want to spend the extra money if it cannot be shown to be clearly superior. Conduct a modified version of the above hypothesis test with this information in mind, and quote a new p-value.
6. Would you recommend purchasing tool A or tool B, or would you run more tests (at a cost of £5,000 in overheads plus £500 per test)? If you run more tests, how many more would you run? Explain the basis for your decision in a few sentences or fewer.
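One reasonable mapping of parts 1-5 onto base R is `t.test()` for the means and `var.test()`, an F test that relies on the stated normality assumption, for the variances. For part 3, the sketch checks whether each tool's entire 99% confidence interval for the mean lies within 5 of 127. That is one conservative reading of the requirement, not the only one. Part 5 becomes a one-sided F test whose alternative is that A has the smaller variance. Part 6 is a judgment call and is not coded.

```r
# Readings from the exercise
a <- c(128.00, 125.04, 125.17, 128.62, 126.06, 124.54, 128.80, 129.98, 126.49, 127.16)
b <- c(122.16, 127.35, 124.73, 129.51, 123.60, 132.67, 131.07, 126.20, 132.44, 126.91)

# 1. Calibration: one-sample t tests of H0: mu = 127
calib_a <- t.test(a, mu = 127)
calib_b <- t.test(b, mu = 127)

# 2. Do the tools agree on average? Welch two-sample t test
agree <- t.test(a, b)

# 3. 99% confidence intervals for each mean; the requirement holds if
#    both interval endpoints are within 5 of 127
ci_a <- t.test(a, conf.level = 0.99)$conf.int
ci_b <- t.test(b, conf.level = 0.99)$conf.int
meets_spec <- c(A = all(abs(ci_a - 127) < 5), B = all(abs(ci_b - 127) < 5))

# 4. Precision: two-sided F test of var(A) / var(B) = 1; the square root
#    of the variance-ratio interval is an interval for sd(A) / sd(B)
prec_two <- var.test(a, b)
sd_ratio_ci <- sqrt(prec_two$conf.int)

# 5. One-sided version: H1 is that A has the smaller variance
prec_less <- var.test(a, b, alternative = "less")

c(calib_a = calib_a$p.value, calib_b = calib_b$p.value,
  agree = agree$p.value, prec_two = prec_two$p.value,
  prec_less = prec_less$p.value)
meets_spec
sd_ratio_ci
```

Because the observed variance ratio is below 1, the one-sided p-value in part 5 is exactly half of the two-sided p-value in part 4. That halving is the whole effect of building the "A must be shown superior" framing into the test.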


9. Make a scatterplot with the "AGE" variable on the horizontal axis and the "INCTOT" variable on the vertical axis. Include the line computed in question 8. (If you used the built-in function for question 8, please extract the values of the slope and intercept from the fitted model object, rather than using copy-paste.) (10 points)
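The sketch below shows the "extract, don't copy-paste" pattern. It assumes a data frame with numeric `AGE` and `INCTOT` columns, and that question 8 was answered with `lm()`. The helper name `plot_age_income()` is hypothetical. `coef()` returns the intercept and slope from the fitted model, and `abline()` draws the line from those values.

```r
# Sketch for question 9: scatterplot of INCTOT against AGE plus the fitted
# line, with intercept and slope taken from the model object.
plot_age_income <- function(df) {
  fit <- lm(INCTOT ~ AGE, data = df)
  b <- coef(fit)                        # b[1] = intercept, b[2] = slope
  plot(df$AGE, df$INCTOT, xlab = "AGE", ylab = "INCTOT", pch = 20)
  abline(a = b[1], b = b[2], col = "red", lwd = 2)
  invisible(b)
}
```

`abline(fit)` would draw the same line directly from the model object. Passing `b[1]` and `b[2]` explicitly just makes the extracted values visible.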


Instructions
You may use web searches, but not interactive methods such as asking others online or in person. In questions with code blocks, full credit will be reserved for effective use of R to reach a correct solution.
Questions
1. A team has 8 members. Denote them by {1,2,3,4,5,6,7,8}. Construct a reasonable, standard model for selecting a team member in such a way that any member is equally likely to be selected, recording the member selected, and repeating this process one more time using the remaining seven team members. Thus outcomes will be pairs of values (a, b) with a, b ∈ {1,2,3,4,5,6,7,8} and a ≠ b. You don't have to explain the model; just provide the values requested below.
What is the probability of the outcome (5,3)? (5 points)
What is the probability of the event {(a, b) | a < b}? (5 points)
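Under the standard model of equally likely ordered draws without replacement, there are 8 × 7 = 56 outcomes, each with probability 1/56. The sketch below enumerates the sample space with `expand.grid()` and counts outcomes rather than reasoning by hand. By symmetry, exactly half of the pairs with a ≠ b have a < b.

```r
# Equally likely ordered draws without replacement from {1, ..., 8}.
team <- 1:8
outcomes <- subset(expand.grid(a = team, b = team), a != b)  # 56 pairs
p_outcome <- 1 / nrow(outcomes)

# P((5, 3)): that pair occurs exactly once in the sample space
p_53 <- sum(outcomes$a == 5 & outcomes$b == 3) * p_outcome

# P({(a, b) : a < b}): fraction of equally likely outcomes with a < b
p_less <- mean(outcomes$a < outcomes$b)

c(n_outcomes = nrow(outcomes), p_53 = p_53, p_less = p_less)
```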