Question

2. Sales of Riding Mowers: Scatter Plots. A company that manufactures riding mowers wants to identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is interested in classifying households as prospective owners or nonowners on the basis of Income (in $1000s) and Lot Size (in 1000 ft²). The marketing expert looked at a random sample of 24 households, given in the file Riding Mowers.csv.

a. Using ggplot() in R, create a scatter plot of Lot Size vs. Income, color-coded by the outcome variable owner/nonowner. Make sure to obtain a well-formatted plot (create legible labels and a legend, etc.).
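
One possible starting point for part a is sketched below. The column names Income, Lot_Size, and Ownership are assumptions about the CSV header, and the six rows are placeholder values for illustration only, not the 24 households in the file. In practice the data frame would come from read.csv().

```r
library(ggplot2)

# Placeholder rows for illustration only; NOT data from Riding Mowers.csv.
# Column names Income, Lot_Size, and Ownership are assumed, not confirmed.
mowers <- data.frame(
  Income    = c(60, 85, 65, 110, 43, 75),
  Lot_Size  = c(18, 17, 21, 19, 20, 22),
  Ownership = factor(c("Owner", "Owner", "Owner",
                       "Nonowner", "Nonowner", "Nonowner"))
)

# Lot Size on the y axis, Income on the x axis, color mapped to the outcome.
p <- ggplot(mowers, aes(x = Income, y = Lot_Size, color = Ownership)) +
  geom_point(size = 3) +
  labs(
    title = "Lot size vs. income by ownership",
    x     = "Income ($1000s)",
    y     = "Lot size (1000 sq ft)",
    color = "Ownership"
  ) +
  theme_minimal()

print(p)
```

Mapping Ownership to color inside aes() is what makes ggplot2 generate the legend automatically; labs() then supplies the readable axis and legend titles.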

3. Laptop Sales at a London Computer Chain: Bar Charts and Boxplots. The file LaptopSales-January 2008.csv contains data for all sales of laptops at a computer chain in London in January 2008. This is a subset of the full dataset that includes data for the entire year.

a. Using ggplot() in R, create a histogram and a density plot of the retail price. Overlay a normal density curve on the histogram and density plot. Does the price data look normally distributed?
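
A minimal sketch of the overlay in part a follows. It assumes the price column is named RetailPrice and uses simulated placeholder prices rather than the real file; after_stat() requires ggplot2 3.3.0 or later. The normal curve uses the sample mean and standard deviation, which is one reasonable choice of parameters.

```r
library(ggplot2)

# Simulated placeholder prices for illustration only; NOT the laptop sales data.
set.seed(1)
laptops <- data.frame(RetailPrice = rnorm(200, mean = 500, sd = 50))

# Parameters of the reference normal curve, estimated from the sample.
mu    <- mean(laptops$RetailPrice)
sigma <- sd(laptops$RetailPrice)

p <- ggplot(laptops, aes(x = RetailPrice)) +
  # Histogram on the density scale so it is comparable to the curves.
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "grey80", color = "white") +
  # Kernel density estimate of the observed prices.
  geom_density(color = "blue") +
  # Normal density with the sample mean and standard deviation.
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                color = "red", linetype = "dashed") +
  labs(x = "Retail price", y = "Density")

print(p)
```

Putting the histogram on the density scale matters here: with the default count scale, the two density curves would be squashed against the x axis.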

b. Create a Q-Q plot of the price data. Does the Q-Q plot confirm your finding (in part a) about the normality of the data? Are there any outliers?

c. Create a bar chart showing the average retail price by store postcode (StorePostcode). Which store postcode has the highest average retail price? Which has the lowest? Hint: For better readability, feel free to rotate the x-axis labels by adding the following to the ggplot() statement: + theme(axis.text.x = element_text(angle = 90)). Also, to zoom in on the relevant price range, add the following to the ggplot() call: + coord_cartesian(ylim = c(480, 500)).
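
The two hint statements from part c can be combined roughly as follows. This is a sketch, not a reference solution: the postcode labels PC1 to PC3 and the prices are hypothetical placeholders, and the averages are computed with base R aggregate() as one of several possible approaches.

```r
library(ggplot2)

# Placeholder rows for illustration only; NOT values from the laptop sales file.
# PC1..PC3 are hypothetical postcode labels.
laptops <- data.frame(
  StorePostcode = c("PC1", "PC1", "PC2", "PC2", "PC3", "PC3"),
  RetailPrice   = c(485, 495, 480, 490, 492, 498)
)

# Average retail price per store postcode.
avg_price <- aggregate(RetailPrice ~ StorePostcode, data = laptops, FUN = mean)

p <- ggplot(avg_price, aes(x = StorePostcode, y = RetailPrice)) +
  geom_col() +
  labs(x = "Store postcode", y = "Average retail price") +
  # Rotate the x-axis labels so long postcodes do not overlap.
  theme(axis.text.x = element_text(angle = 90)) +
  # Zoom the y axis without dropping any data from the bars.
  coord_cartesian(ylim = c(480, 500))

print(p)
```

coord_cartesian() only changes the visible window, so the bars are still drawn from zero and simply clipped. Setting limits through scale_y_continuous() instead would remove bars that extend outside the range.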

d. Using the filter() function of the dplyr package, reduce your laptop data frame to only the two store postcodes identified in part c (highest and lowest average retail price). Using ggplot2, create a side-by-side violin plot of retail prices of the two stores. Be sure to jitter the markers for better visibility. Does there seem to be a large difference between their prices?
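
The filtering and violin plot in part d might look like the sketch below. The postcode labels and prices are hypothetical placeholders, and "PC1" and "PC3" merely stand in for whichever two postcodes part c identifies.

```r
library(dplyr)
library(ggplot2)

# Placeholder rows for illustration only; NOT values from the laptop sales file.
laptops <- data.frame(
  StorePostcode = rep(c("PC1", "PC2", "PC3"), each = 4),
  RetailPrice   = c(470, 485, 495, 510,
                    460, 480, 490, 505,
                    480, 492, 498, 520)
)

# Keep only the two postcodes of interest (hypothetical labels here).
# dplyr::filter is called explicitly to avoid confusion with stats::filter.
two_stores <- dplyr::filter(laptops, StorePostcode %in% c("PC1", "PC3"))

p <- ggplot(two_stores, aes(x = StorePostcode, y = RetailPrice)) +
  geom_violin() +
  # Jitter horizontally only, so the plotted prices themselves are unchanged.
  geom_jitter(width = 0.1, height = 0, alpha = 0.6) +
  labs(x = "Store postcode", y = "Retail price")

print(p)
```

Setting height = 0 in geom_jitter() keeps the vertical positions exact, which matters when the y axis is the quantity being compared.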

e. To better compare retail prices across postcodes, create side-by-side boxplots of retail prices of the two postcodes and compare the price distribution in the two postcodes. Does there seem to be a difference between their price distributions?

f. Suppose you are interested in which specific technical features strongly affect computer prices. Using the cut() function of the base package, create a new categorical variable in your main laptop sales data frame that contains 3 RetailPrice categories: "low", "medium", and "high". Call the variable PriceCat and make sure that its class is factor. Subsequently, create another data frame that contains this PriceCat variable and all the columns that describe laptop features (such as BatteryLife_Hrs, ScreenSize_In, etc.). Finally, create a boxplot-enhanced parallel coordinate plot with all the features on the horizontal axis and PriceCat on the vertical axis. Which feature(s) seem to be the most important determinants of PriceCat?
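
The binning step in part f can be sketched with base R alone. The prices below are placeholders, not values from the file, and breaks = 3 (three equal-width intervals over the observed range) is an assumption, since the exercise does not say how the categories should be defined. Quantile-based breaks would be an equally plausible choice.

```r
# Placeholder prices for illustration only; NOT values from the laptop sales file.
laptops <- data.frame(RetailPrice = c(300, 450, 510, 620, 800, 900))

# breaks = 3 splits the observed price range into three equal-width intervals;
# labels names them from the lowest interval to the highest.
laptops$PriceCat <- cut(laptops$RetailPrice,
                        breaks = 3,
                        labels = c("low", "medium", "high"))

class(laptops$PriceCat)   # cut() returns a factor
table(laptops$PriceCat)   # number of rows per category
```

Because cut() already returns a factor, no separate conversion with as.factor() is needed to satisfy the class requirement.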