2. Sales of Riding Mowers: Scatter Plots. A company that manufactures riding mowers wants to

identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is

interested in classifying households as prospective owners or nonowners on the basis of Income (in

$1000s) and Lot Size (in 1000 ft²). The marketing expert looked at a random sample of 24 households,

given in the file Riding Mowers.csv.

a. Using ggplot() in R, create a scatter plot of Lot Size vs. Income, color-coded by the outcome variable

owner/nonowner. Make sure to obtain a well-formatted plot (create legible labels and a legend, etc.).
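
As a rough illustration of part a, the sketch below builds a color-coded scatter plot with ggplot2. The data frame and its column names (Income, Lot_Size, Ownership) are made up for the example and are assumptions, not the contents of the actual file.

```r
library(ggplot2)

# Made-up stand-in rows; the column names are assumed, not taken from the real file.
mowers.df <- data.frame(
  Income    = c(60.0, 85.5, 110.1, 33.0, 51.0, 75.0),
  Lot_Size  = c(18.4, 16.8, 19.2, 18.8, 14.0, 19.6),
  Ownership = c("Owner", "Owner", "Owner", "Nonowner", "Nonowner", "Nonowner")
)

# Lot Size vs. Income: Income on the x-axis, Lot Size on the y-axis,
# with the outcome variable mapped to color so a legend is generated.
p <- ggplot(mowers.df, aes(x = Income, y = Lot_Size, color = Ownership)) +
  geom_point(size = 3) +
  labs(
    title = "Lot size vs. income by ownership",
    x     = "Income ($1000s)",
    y     = "Lot size (1000 sq ft)",
    color = "Ownership"
  ) +
  theme_minimal()

print(p)
```

With the real data, the data.frame() call would be replaced by a read.csv() of the provided file, and the column names adjusted to match it.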

3. Laptop Sales at a London Computer Chain: Bar Charts and Boxplots. The file

LaptopSalesJanuary2008.csv contains data for all sales of laptops at a computer chain in London in January 2008.

This is a subset of the full dataset that includes data for the entire year.

a. Using ggplot() in R, create a histogram and density plot of the average retail price. Overlay the

histogram and density plot with a normal density curve. Does the price data look normally distributed?

b. Create a Q-Q plot of the price data. Does the Q-Q plot confirm your finding (in part a.) about the

normality of the data? Are there any outliers?

c. Create a bar chart, showing the average retail price by store postcode (StorePostcode). Which store

postcode has the highest average retail price? Which has the lowest? Hint: For better readability, feel free

to rotate the x axis labels. You can do it by adding the following statement to the ggplot() statement:

+ theme(axis.text.x = element_text(angle = 90)). Also, in order to zoom in on the relevant price

range, add the following statement to the ggplot() call: + coord_cartesian(ylim = c(480, 500)).

d. Using the filter() function of the dplyr package, reduce your laptop data frame to only these two

store postcodes. Using ggplot2, create a side-by-side violin plot of retail prices of the two stores. Be

sure to jitter the markers for better visibility. Does there seem to be a huge difference between their prices?

e. To better compare retail prices across post codes, create side-by-side boxplots of retail prices of the two

postcodes and compare the price distribution in the two postcodes. Does there seem to be a difference

between their price distributions?

f. Suppose you are interested in what specific technical features greatly impact computer prices. Using

the cut() function of the base package, create a new categorical variable in your main laptop sales

data frame that contains 3 RetailPrice categories: "low", "medium", and "high." Call the variable

PriceCat and make sure that its class is factor. Subsequently, create another data frame that contains

this PriceCat variable and all the columns that describe laptop features (such as BatteryLife_Hrs,

ScreenSize_In, etc.). Finally, create a box-plot enhanced parallel coordinate plot with all the features

on the horizontal axis and PriceCat on the vertical axis. Which feature(s) seem to be the most

important determinants of PriceCat?
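
The binning step in part f can be sketched with base R's cut(). The prices below are made up for illustration, and using three equal-width bins is an assumption; the exercise does not say where the category boundaries should fall.

```r
# Made-up retail prices; the real values would come from the laptop sales file.
laptop.df <- data.frame(RetailPrice = c(300, 450, 480, 495, 520, 700, 890))

# With a single number for `breaks`, cut() splits the observed range
# into that many equal-width intervals and returns a factor.
laptop.df$PriceCat <- cut(
  laptop.df$RetailPrice,
  breaks = 3,
  labels = c("low", "medium", "high")
)

class(laptop.df$PriceCat)   # "factor"
table(laptop.df$PriceCat)   # counts per price category
```

If roughly equal group sizes are preferred, quantile-based break points, for example quantile(RetailPrice, probs = c(0, 1/3, 2/3, 1)) together with include.lowest = TRUE, are a possible alternative.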

Most Viewed Questions Of R Programming

Team Paper

This is a team assignment. Each team must use R for the project. Your team will have to find a problem to solve that deals with data and data sources. The team should meet prior to the residency weekend and agree on which problem they will solve. There should be data available online to pull from.

On Friday evening, teams will meet for the residency weekend and put together a one-page proposal, to be reviewed and approved by the professor, that states:

1. The problem to solve.
2. The data sources to pull from.
3. The tool that will be used (R).
   a. Note the high-level graphics that will be used to solve the problem and how they will be used.

On Saturday, teams will reconvene and complete the following:

1. There must be a thorough data plan. This includes:
   a. Where the data is online.
   b. How you know the data is accurate and the plan for ensuring accuracy.
   c. An import of the data into the selected tool.
2. A paper that includes:
   a. The data plan mentioned above.
   b. The problem: note the description, why it's a problem, and how you are going to make a recommendation with the data presented.
   c. The analysis of why the data will solve the issue.
   d. Graphical representation and formulas. The screenshots of the formulas in the tool must be present.
   e. A summary of the consideration and evaluation of results.
      i. This includes your team's final analysis of the problem and the resolution.


Download "llo Lab.zip" from Blackboard, rename it with your name, and open (double-click) the R project file. Run the R script "llo Run.R", which contains all the code you need: the call to the function "Forecast Electric. Demand" in the script "Project Functions.R", the calculation of the R-squared measure, and a plot of the results (note: to plot, type "p" in the console). Run or debug "llot Run.R" to see how it works (install any needed packages if necessary). Note that the CSV data does not contain the day and hour columns. In the function "Forecast Electric.Demand()" these fields are set to 1, so the fit (R-squared) is poor. This information can be extracted from the time stamp.


1. Shipments of Household Appliances: Line Graphs. The file ApplianceShipments.csv contains the series of quarterly shipments (in millions of dollars) of US household appliances between 1985 and 1989.

a. Create a well-formatted time plot of the data using the ggplot2 package. Add a smoothed line to the graph. For a closer view of the patterns, zoom in to the range of 3500-5000 on the y-axis. Hint: In order to convert Quarter into a date format, use the zoo library's as.Date utility: as.Date(as.yearqtr(appship.df$Quarter, format = "Q%q-%Y")).

b. Does there appear to be a quarterly pattern?

c. Using ggplot2 in R, create one chart with four separate lines, one line for each of Q1, Q2, Q3, and Q4. In R, this can be achieved by generating a data.frame for each quarter Q1, Q2, Q3, Q4 (use seq(1,20,4), seq(2,20,4), etc. to create indexes for different quarters), and then plotting them as separate series on the line graph. Does there appear to be a difference between quarters? Hint: For ggplot() to display the legend, the color aesthetics must be included inside the aes() specification.

d. Using ggplot2, create a chart with one line of average shipments in each quarter. Hint: Use the quarter() command of the lubridate package to create a new column in the shipments data frame and use tapply to average shipments across quarters.

e. Using ggplot2, create a line graph of the series at a yearly aggregated level (i.e., the total shipments in each year) and comment on what happened to shipments over the years. Hint: Use the year() function of the lubridate package to extract the years from the shipments data frame.
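
The indexing and aggregation ideas in parts c to e can be sketched in base R. The shipment values below are made up, and the quarter and year labels are extracted with substr() purely to keep the example dependency-free; this is a swapped-in stand-in for the lubridate quarter() and year() calls that the hints recommend.

```r
# Made-up quarterly series: 5 years x 4 quarters = 20 rows (not the real data).
appship.df <- data.frame(
  Quarter   = paste0("Q", rep(1:4, times = 5), "-", rep(1985:1989, each = 4)),
  Shipments = 4000 + rep(c(0, 300, 400, 100), times = 5) +
              rep(seq(0, 200, by = 50), each = 4)
)

# Part c: every fourth row, starting at row 1, belongs to Q1 (rows 1, 5, 9, 13, 17).
q1.idx <- seq(1, 20, 4)
q1.df  <- appship.df[q1.idx, ]

# Part d: average shipments per quarter via tapply.
qtr        <- substr(appship.df$Quarter, 1, 2)            # "Q1" ... "Q4"
avg.by.qtr <- tapply(appship.df$Shipments, qtr, mean)

# Part e: total shipments per year.
yr            <- as.integer(substr(appship.df$Quarter, 4, 7))
total.by.year <- tapply(appship.df$Shipments, yr, sum)

print(avg.by.qtr)
print(total.by.year)
```

Either result can be turned into a data frame, for example data.frame(Quarter = names(avg.by.qtr), Avg = as.vector(avg.by.qtr)), and passed to ggplot() with geom_line().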


Part 1: Create an R script that computes the measures of central tendency, the measures of variability, and the relationships for each of the seven variables in the attitude dataset. Use the functions var(), sd(), cor(), mean, median, mode, max, min, range, quantile, and IQR. Check your work by using the summary and/or describe functions.
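
A minimal sketch of such a script follows. One caveat: R's built-in mode() returns an object's storage mode, not the most frequent value, so the helper stat_mode() below is a hypothetical stand-in written for this example.

```r
# attitude ships with base R (datasets package): 30 observations, 7 numeric variables.

# Hypothetical helper: most frequent value (ties resolve to the first one encountered).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# One row per variable, one column per statistic.
summary.tbl <- data.frame(
  mean   = sapply(attitude, mean),
  median = sapply(attitude, median),
  mode   = sapply(attitude, stat_mode),
  min    = sapply(attitude, min),
  max    = sapply(attitude, max),
  var    = sapply(attitude, var),
  sd     = sapply(attitude, sd),
  IQR    = sapply(attitude, IQR)
)

# Pairwise relationships between the seven variables.
cor.mat <- cor(attitude)

print(round(summary.tbl, 2))
print(round(cor.mat, 2))

# Cross-check against the built-in summary.
print(summary(attitude))
```

range() and quantile() return several values per variable, so they are easier to inspect separately, for example with sapply(attitude, range) and sapply(attitude, quantile).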


(4) Wooldridge Chapter 7 Exercise 1. Using the data in SLEEP75 (see also Problem 3 in Chapter 3), we obtain the estimated equation [equation not reproduced]. The variable sleep is total minutes per week spent sleeping at night, totwrk is total weekly minutes spent working, educ and age are measured in years, and male is a gender dummy.

(i) All other factors being equal, is there evidence that men sleep more than women? How strong is the evidence?

(ii) Is there a statistically significant tradeoff between working and sleeping? What is the estimated tradeoff?

(iii) What other regression do you need to run to test the null hypothesis that, holding other factors fixed, age has no effect on sleeping?


A researcher has a set of numbers whose mean is equal to 13.8. The researcher wants to know if that set of numbers likely comes from the uniform distribution on the interval of 1 to 16.

a. Determine the theoretical expected value for the uniform distribution on the interval of 1 to 16 using the equation method.

b. With reference to the lecture slides, create the distribution of means from 99 random simulated draws from the uniform distribution on the interval from 1 to 16.

c. Plot the histogram (function hist()) of the simulated distribution of means and place a vertical line on that plot at the location of the researcher's mean (abline(v=13.8)) and another line showing the theoretically expected value.

d. Determine the probability that the researcher's mean comes from that distribution (the Monte Carlo p-value).

e. Explain your conclusion.
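
The simulation can be sketched as follows. Two things here are assumptions rather than part of the question: the size of the researcher's set (n.obs, which the text does not give) and the one-sided p-value convention (k + 1) / (N + 1), which may differ from the one in the lecture slides.

```r
set.seed(1)                   # for reproducibility

a <- 1
b <- 16
obs.mean <- 13.8

# Part a: the mean of a uniform distribution on [a, b] is (a + b) / 2.
expected <- (a + b) / 2

# Part b: 99 simulated means, each from n.obs uniform draws.
n.obs <- 10                   # ASSUMPTION: the size of the researcher's set is not stated
n.sim <- 99
sim.means <- replicate(n.sim, mean(runif(n.obs, min = a, max = b)))

# Part d: one-sided Monte Carlo p-value, counting the observed mean as one more draw.
p.value <- (sum(sim.means >= obs.mean) + 1) / (n.sim + 1)

# Part c: histogram with the observed and expected means marked.
draw_hist <- function() {
  hist(sim.means, xlim = c(a, b), main = "Simulated means, U(1, 16)", xlab = "Mean")
  abline(v = obs.mean, col = "red")   # researcher's mean
  abline(v = expected, lty = 2)       # theoretical expected value
}

print(expected)
print(p.value)
```

Calling draw_hist() in an interactive session draws the plot. With 99 simulations the smallest p-value this convention can produce is 1/100.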


a. With reference to the lecture slides (Lecture 4), determine the mean center and standard distance for each of the above point datasets.

b. Create a plot showing the events for each dataset as well as the location of the mean center and standard distance overlaid on that plot. NOTE: see "symbols()" for plotting the standard distance, in particular the argument "inches" for that function, and see "points()" for plotting the centroid.
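
Since the point datasets are not reproduced here, the sketch below uses four made-up points. It assumes the standard distance is the root mean squared distance from the mean center with a divisor of n; the lecture slides may use a different divisor.

```r
# Made-up event locations (not one of the assignment's datasets).
pts <- data.frame(x = c(2, 4, 4, 6), y = c(1, 3, 5, 3))

# Mean center: the average x and the average y.
mean.center <- c(x = mean(pts$x), y = mean(pts$y))

# Standard distance: root mean squared distance to the mean center (divisor n assumed).
std.dist <- sqrt(mean((pts$x - mean.center[["x"]])^2 +
                      (pts$y - mean.center[["y"]])^2))

# Plot: events, centroid, and a circle whose radius is the standard distance.
# inches = FALSE makes the radius use the x-axis units; asp = 1 keeps the circle round.
draw_pattern <- function() {
  plot(pts$x, pts$y, asp = 1, pch = 16, xlab = "x", ylab = "y")
  points(mean.center[["x"]], mean.center[["y"]], pch = 3, col = "red")
  symbols(mean.center[["x"]], mean.center[["y"]],
          circles = std.dist, inches = FALSE, add = TRUE)
}

print(mean.center)   # x = 4, y = 3
print(std.dist)      # 2
```

Calling draw_pattern() in an interactive session produces the overlay plot.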


Equipment Precision Comparison

Suppose you are trying to make a difficult measurement. Fortunately there is commercial equipment available for this purpose, although it is expensive. Your company has a large budget and wants to obtain the best equipment, but it also does not want to waste money needlessly. You are responsible for performing some tests to guide their decision. You have ordered two trial samples of metering equipment to test which one is better: Equipment A (which costs £60,000) and Equipment B (which costs £30,000). You take 10 measurements using each in a controlled environment.

Equipment A gives the following readings: 128.00, 125.04, 125.17, 128.62, 126.06, 124.54, 128.80, 129.98, 126.49, 127.16

Equipment B gives: 122.16, 127.35, 124.73, 129.51, 123.60, 132.67, 131.07, 126.20, 132.44, 126.91

You may assume that measurement errors are normally distributed.

1. The "correct" value for the measurement is supposed to be 127. Verify that both tools are properly "calibrated" (i.e., that they provide measurements that on average are consistent with this value) with an appropriate statistical test.

2. Suppose you did not know that the true value was 127, or there was a possibility that the true value was not 127. Use a statistical test to evaluate whether the two tools produce measurements that are, on average, consistent with each other.

3. Company specifications require that the calibration accuracy (the absolute difference between the average of a very large number of measurements and the "correct" value) of the tool must be better (less) than 5. Show that both tools meet this requirement to better than 99% confidence under the assumptions above.

4. The most important consideration in your decision is precision: you want the tool that produces measurements with the least variance (lowest standard deviation). Can you tell (using an appropriate statistical test) if one tool is significantly more precise than the other? If so, which tool? Quote a p-value, and use a confidence interval to quantify how much more precise one tool is (or isn't) than the other.

5. Tool A is much more expensive, and your company might not want to spend the extra money if it cannot be shown to be clearly superior. Conduct a modified version of the above hypothesis test with this information in mind, and quote a new p-value.

6. Would you recommend purchasing tool A, tool B, or would you run more tests (at a cost of £5,000 in overheads plus £500 per test)? If you run more tests, how many more tests would you run? Explain the basis for your decision in a few sentences or less.
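
One possible way to set up the tests in R is sketched below, using the readings given in the question. The choice of tests (one-sample and Welch t-tests for the means, an F test for the variances) is one reasonable reading of the questions under the stated normality assumption, not an official solution.

```r
equip.a <- c(128.00, 125.04, 125.17, 128.62, 126.06,
             124.54, 128.80, 129.98, 126.49, 127.16)
equip.b <- c(122.16, 127.35, 124.73, 129.51, 123.60,
             132.67, 131.07, 126.20, 132.44, 126.91)

# Q1: calibration, one-sample t-tests against the nominal value 127.
cal.a <- t.test(equip.a, mu = 127)
cal.b <- t.test(equip.b, mu = 127)

# Q2: do the two tools agree on average? (Welch two-sample t-test, R's default.)
agree <- t.test(equip.a, equip.b)

# Q3: 99% confidence intervals for each mean, to compare against 127 +/- 5.
ci99.a <- t.test(equip.a, conf.level = 0.99)$conf.int
ci99.b <- t.test(equip.b, conf.level = 0.99)$conf.int

# Q4: two-sided F test on the ratio var(A) / var(B); $conf.int covers that ratio.
prec.two <- var.test(equip.a, equip.b)

# Q5: one-sided version, alternative hypothesis var(A) / var(B) < 1.
prec.one <- var.test(equip.a, equip.b, alternative = "less")

print(c(cal.a = cal.a$p.value, cal.b = cal.b$p.value, agree = agree$p.value))
print(c(two.sided = prec.two$p.value, one.sided = prec.one$p.value))
print(prec.two$conf.int)
```

The one-sided alternative in Q5 encodes the idea that tool A only justifies its price if it is the more precise of the two.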


9. Make a scatterplot with the "AGE" variable on the horizontal axis and the "INCTOT" variable on the vertical axis. Include the line computed in question 8. (If you used the built-in function for question 8, please extract the values of the slope and intercept from the fitted model object, rather than using copy-paste.) (10 points)
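
The point about extracting the coefficients from the fitted model object can be illustrated as follows. The data are simulated purely for the example (the real AGE and INCTOT values are not shown here), and using lm() for question 8 is an assumption.

```r
set.seed(42)

# Simulated stand-in data; the real AGE / INCTOT columns are not reproduced here.
census.df <- data.frame(AGE = sample(18:80, 50, replace = TRUE))
census.df$INCTOT <- 5000 + 600 * census.df$AGE + rnorm(50, sd = 8000)

# Fit the line (assumed to be the "built-in function" route for question 8).
fit <- lm(INCTOT ~ AGE, data = census.df)

# Pull the intercept and slope from the model object instead of copy-pasting numbers.
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["AGE"]]

# Scatterplot with AGE on the horizontal axis and the fitted line overlaid.
draw_scatter <- function() {
  plot(census.df$AGE, census.df$INCTOT, xlab = "AGE", ylab = "INCTOT",
       main = "INCTOT vs. AGE with fitted line")
  abline(a = b0, b = b1, col = "red")
}

print(c(intercept = b0, slope = b1))
```

Calling draw_scatter() in an interactive session draws the plot; abline(fit) would give the same line, but passing b0 and b1 explicitly shows that the values come from the model object.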


Instructions

You may use web searches, but not interactive methods such as asking others online or in person. In questions with code blocks, full credit will be reserved for effective use of R to reach a correct solution.

Questions

1. A team has 8 members. Denote them by {1,2,3,4,5,6,7,8}. Construct a reasonable, standard model for selecting a team member in such a way that any member is equally likely to be selected, recording the member selected, and repeating this process one more time using the remaining set of seven team members. Thus outcomes will be pairs of values (a, b) with a, b ∈ {1,2,3,4,5,6,7,8} and a ≠ b. You don't have to explain the model, just provide the values requested below.

What is the probability of the outcome (5,3)? (5 points)

What is the probability of the event {(a, b) | a < b}? (5 points)
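
One way to model this in R is to enumerate all ordered pairs of distinct members and treat them as equally likely, since each pair has probability (1/8)(1/7) under the selection process described. The sketch below takes that approach; the variable names are illustrative.

```r
members <- 1:8

# All ordered pairs (a, b), then drop the pairs where the same member appears twice.
outcomes <- expand.grid(a = members, b = members)
outcomes <- outcomes[outcomes$a != outcomes$b, ]

n.out <- nrow(outcomes)   # 8 * 7 = 56 equally likely outcomes

# Probability of the single outcome (5, 3).
p.53 <- sum(outcomes$a == 5 & outcomes$b == 3) / n.out

# Probability of the event a < b.
p.less <- sum(outcomes$a < outcomes$b) / n.out

print(c(n.out = n.out, p.53 = p.53, p.less = p.less))
```

By symmetry, exactly half of the 56 ordered pairs have a < b, which the enumeration confirms.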
