Question

2. Sales of Riding Mowers: Scatter Plots. A company that manufactures riding mowers wants to identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is interested in classifying households as prospective owners or nonowners on the basis of Income (in $1000s) and Lot Size (in 1000 ft²). The marketing expert looked at a random sample of 24 households, given in the file RidingMowers.csv.

a. Using ggplot() in R, create a scatter plot of Lot Size vs. Income, color-coded by the outcome variable (owner/nonowner). Make sure the plot is well formatted (legible axis labels, a legend, etc.).

3. Laptop Sales at a London Computer Chain: Bar Charts and Boxplots. The file LaptopSalesJanuary2008.csv contains data for all sales of laptops at a computer chain in London in January 2008. This is a subset of the full dataset, which covers the entire year.

a. Using ggplot() in R, create a histogram and a density plot of the retail price (RetailPrice). Overlay a normal density curve on the histogram and density plot. Does the price data look normally distributed?

b. Create a Q-Q plot of the price data. Does the Q-Q plot confirm your finding in part a about the normality of the data? Are there any outliers?

c. Create a bar chart showing the average retail price by store postcode (StorePostcode). Which store postcode has the highest average retail price? Which has the lowest? Hint: For better readability, feel free to rotate the x-axis labels by adding the following to the ggplot() call: + theme(axis.text.x = element_text(angle = 90)). Also, to zoom in on the relevant price range, add: + coord_cartesian(ylim = c(480, 500)).

d. Using the filter() function of the dplyr package, reduce your laptop data frame to only the two store postcodes identified in part c (highest and lowest average price). Using ggplot2, create a side-by-side violin plot of the retail prices at these two stores. Be sure to jitter the markers for better visibility. Does there seem to be a huge difference between their prices?

e. To better compare retail prices across postcodes, create side-by-side boxplots of the retail prices in the two postcodes and compare their price distributions. Does there seem to be a difference between them?

f. Suppose you are interested in which technical features most strongly affect laptop prices. Using the cut() function of the base package, create a new categorical variable in your main laptop sales data frame that contains three RetailPrice categories: "low", "medium", and "high". Call the variable PriceCat and make sure its class is factor. Next, create another data frame that contains PriceCat and all the columns that describe laptop features (such as BatteryLife_Hrs, ScreenSize In, etc.). Finally, create a boxplot-enhanced parallel coordinate plot with all the features on the horizontal axis and PriceCat on the vertical axis. Which feature(s) seem to be the most important determinants of PriceCat?
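A minimal ggplot2 sketch for problem 2a follows. It assumes the file's columns are named Income, Lot_Size and Ownership (check names() on the real data and adjust). So that it runs on its own, it replaces the CSV with 24 synthetic households drawn with rnorm(); these are placeholder values, not the actual sample.

```r
library(ggplot2)

# Synthetic placeholder data (NOT the real sample). With the real file use:
# mowers.df <- read.csv("RidingMowers.csv")
set.seed(1)
mowers.df <- data.frame(
  Income    = round(c(rnorm(12, 80, 15), rnorm(12, 60, 12)), 1),  # $1000s
  Lot_Size  = round(c(rnorm(12, 20, 2), rnorm(12, 17.5, 2)), 1),  # 1000 sq ft
  Ownership = factor(rep(c("Owner", "Nonowner"), each = 12))
)

# Lot Size on the y axis, Income on the x axis, one color per ownership class
p_mowers <- ggplot(mowers.df, aes(x = Income, y = Lot_Size, color = Ownership)) +
  geom_point(size = 3) +
  labs(title = "Lot Size vs. Income of Sampled Households",
       x = "Income ($1000s)",
       y = "Lot Size (1000 sq. ft)",
       color = "Ownership") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top")

print(p_mowers)
```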
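For parts 3a and 3b, one possible approach is sketched below. The histogram is drawn on the density scale via after_stat(density) so that the kernel density curve and a normal curve (dnorm() evaluated at the sample mean and standard deviation) share its y axis, and stat_qq() with stat_qq_line() builds the Q-Q plot. The prices are synthetic stand-ins drawn from a normal distribution, so this toy data looks normal by construction; the column name RetailPrice is taken from the assignment text, and the linewidth argument requires ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Synthetic placeholder prices (NOT the real sales). With the real file use:
# laptops.df <- read.csv("LaptopSalesJanuary2008.csv")
set.seed(42)
laptops.df <- data.frame(RetailPrice = round(rnorm(500, mean = 490, sd = 70)))

mu    <- mean(laptops.df$RetailPrice)
sigma <- sd(laptops.df$RetailPrice)

# Histogram on the density scale, overlaid with the kernel density
# and a normal density with the same mean and standard deviation
p_hist <- ggplot(laptops.df, aes(x = RetailPrice)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "grey80", color = "white") +
  geom_density(aes(color = "Kernel density"), linewidth = 1) +
  stat_function(aes(color = "Normal density"), fun = dnorm,
                args = list(mean = mu, sd = sigma),
                linewidth = 1, linetype = "dashed") +
  labs(title = "Distribution of Laptop Retail Prices",
       x = "Retail price", y = "Density", color = NULL)

# Normal Q-Q plot: points near the reference line suggest approximate normality
p_qq <- ggplot(laptops.df, aes(sample = RetailPrice)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "Normal Q-Q Plot of Retail Price",
       x = "Theoretical quantiles", y = "Sample quantiles")

print(p_hist)
print(p_qq)
```

With the real file, points that bend away from the Q-Q reference line at either end indicate skewness, heavy tails, or outliers.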
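Parts 3c to 3e can be chained as in the sketch below: dplyr's group_by() and summarise() compute the per-postcode mean, which.max() and which.min() pick the extreme postcodes, and filter() keeps only those two stores for the violin and box plots. The four postcodes (PC_A to PC_D) and all prices are hypothetical placeholders; only the column names StorePostcode and RetailPrice come from the assignment.

```r
library(ggplot2)
library(dplyr)

# Synthetic placeholder data; the postcode labels are hypothetical
set.seed(7)
postcodes  <- c("PC_A", "PC_B", "PC_C", "PC_D")
laptops.df <- data.frame(
  StorePostcode = sample(postcodes, 400, replace = TRUE),
  RetailPrice   = round(rnorm(400, mean = 490, sd = 60))
)

# c. Average retail price per store postcode, shown as a bar chart
avg_price <- laptops.df %>%
  group_by(StorePostcode) %>%
  summarise(AvgPrice = mean(RetailPrice), .groups = "drop")

p_bar <- ggplot(avg_price, aes(x = StorePostcode, y = AvgPrice)) +
  geom_col(fill = "steelblue") +
  coord_cartesian(ylim = c(480, 500)) +            # zoom as in the hint
  theme(axis.text.x = element_text(angle = 90)) +  # rotated x labels
  labs(x = "Store postcode", y = "Average retail price")

# Postcodes with the highest and lowest average price
top_pc <- avg_price$StorePostcode[which.max(avg_price$AvgPrice)]
low_pc <- avg_price$StorePostcode[which.min(avg_price$AvgPrice)]

# d. Keep only those two stores
two_stores <- laptops.df %>% filter(StorePostcode %in% c(top_pc, low_pc))

# d. Violin plot with horizontally jittered points
p_violin <- ggplot(two_stores, aes(x = StorePostcode, y = RetailPrice)) +
  geom_violin(fill = "lightblue") +
  geom_jitter(width = 0.15, height = 0, alpha = 0.4) +
  labs(x = "Store postcode", y = "Retail price")

# e. Side-by-side boxplots of the same two stores
p_box <- ggplot(two_stores, aes(x = StorePostcode, y = RetailPrice)) +
  geom_boxplot() +
  labs(x = "Store postcode", y = "Retail price")

print(p_bar)
print(p_violin)
print(p_box)
```

Setting height = 0 in geom_jitter() spreads the points horizontally only, so the jitter never alters the plotted prices.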
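For part 3f, the sketch below assumes cut(..., breaks = 3), which splits the price range into three equal-width intervals (breaks taken from quantile() would give equal-count groups instead). It builds the boxplot-enhanced parallel coordinate plot with plain ggplot2: the features are standardized with scale(), reshaped to long format, and drawn as one line per laptop colored by PriceCat, with a boxplot per feature on top. This puts standardized feature values on the vertical axis and encodes PriceCat as color, which is one reading of the assignment; GGally::ggparcoord() with groupColumn and boxplot = TRUE produces a similar layout. ScreenSize_In and RAM_GB are hypothetical column names, and all values are synthetic.

```r
library(ggplot2)

# Synthetic placeholder data; ScreenSize_In and RAM_GB are hypothetical names
set.seed(3)
n <- 300
laptops.df <- data.frame(
  RetailPrice     = round(rnorm(n, 490, 60)),
  BatteryLife_Hrs = sample(c(4, 5, 6), n, replace = TRUE),
  ScreenSize_In   = sample(c(15, 17), n, replace = TRUE),
  RAM_GB          = sample(c(1, 2, 4), n, replace = TRUE)
)

# Three equal-width price bins; supplying labels makes the result a factor
laptops.df$PriceCat <- cut(laptops.df$RetailPrice, breaks = 3,
                           labels = c("low", "medium", "high"))

# Data frame with PriceCat plus the feature columns
features <- c("BatteryLife_Hrs", "ScreenSize_In", "RAM_GB")
price.df <- laptops.df[, c(features, "PriceCat")]

# Long format: scale() standardizes each feature column, and as.vector()
# flattens the matrix column by column, matching rep(features, each = ...)
long <- data.frame(
  id       = rep(seq_len(nrow(price.df)), times = length(features)),
  PriceCat = rep(price.df$PriceCat, times = length(features)),
  feature  = factor(rep(features, each = nrow(price.df)), levels = features),
  value    = as.vector(scale(price.df[, features]))
)

# One line per laptop colored by price category, boxplots drawn on top
p_parcoord <- ggplot(long, aes(x = feature, y = value)) +
  geom_line(aes(group = id, color = PriceCat), alpha = 0.2) +
  geom_boxplot(fill = NA, outlier.shape = NA, width = 0.3) +
  labs(x = "Laptop feature", y = "Standardized value", color = "PriceCat")

print(p_parcoord)
```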