Question

# Homework
# Use the following code to read dataset "WestRoxbury.csv".
# The dataset is on Canvas.
d = read.csv("datasets/WestRoxbury.csv")
View(d)
# Use ggplot2 package to answer the following questions.

library(ggplot2)

# Questions

# 1) YR.BUILT column has a typo in it. Find it.
#    HINT: Boxplot or scatter plot the YR.BUILT variable to identify the outlier/typo.

# 2) When you identify what it is, impute it with median YR.BUILT.

# 3) Create a histogram for the YR.BUILT variable. Use 20 bins.
#    Comment on the shape of the distribution.

# 4) We want to investigate the relationship between TOTAL.VALUE and YR.BUILT.
#    Create a scatter plot using the two variables. Color is "navy" and transparency is 20%.
#    Discuss any noticeable patterns in the plot.

# 5) Create a scatter plot using total value, lot sqft, and remodel as the color parameter.
#    Alpha is 0.40, color is navy. Discuss any noticeable patterns in the plot.

# 6) By modifying your code in Question 5, introduce the rooms variable into the plot
#    in the size parameter.

# 7) Create a bar plot in which the x-axis shows ROOMS and the y-axis shows average TOTAL.VALUE.
#    HINT: Use dplyr group_by ROOMS & summarize total value, then ggplot().
library(dplyr)

# 8) Create a 4-panel chart in which the following plots show up:
#    a) scatter plot of total value vs tax
#    b) scatter plot of total value vs lot sqft
#    c) box plot of total value and remodel
#    d) box plot of lot sqft and remodel

# 9) In this question, you will find the count of properties built before 1900, and after 1900.
#    Although there are much easier techniques to answer this question, you will write code
#    that uses a for() loop and an if..else statement.
#    HINT: Create a loop to take each year from YR.BUILT. Then use if() to see if the year is
#    before 1900. If so, increase an index called before1900 by 1. Otherwise, increase an index
#    called after1900 by 1. Print both indexes right after the loop.
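
The hints in Questions 1, 2 and 9 can be sketched in base R as below. This is only an illustration under stated assumptions, not the expected solution. The vector `yr_built` is made-up toy data standing in for `d$YR.BUILT`. The cutoff of 1600 for spotting the typo is an arbitrary assumption (the real outlier should be found from the boxplot). Taking the median over the plausible years only is one reading of "median YR.BUILT".

```r
# Toy stand-in for d$YR.BUILT (hypothetical values, not from WestRoxbury.csv).
# The 0 plays the role of the typo.
yr_built <- c(1880, 1895, 1920, 1935, 1950, 0, 1965, 1990)

# Questions 1-2: flag the implausible year, then impute it with the median.
# Assumption: any year below 1600 counts as a typo.
is_typo <- yr_built < 1600
# Assumption: the median is computed from the plausible years only.
yr_built[is_typo] <- median(yr_built[!is_typo])

# Question 9: count properties built before 1900 and in or after 1900
# with a for() loop and if/else, as the hint describes.
before1900 <- 0
after1900 <- 0
for (year in yr_built) {
  if (year < 1900) {
    before1900 <- before1900 + 1
  } else {
    # A year of exactly 1900 lands here, following the hint's "otherwise".
    after1900 <- after1900 + 1
  }
}

# Print both counters right after the loop.
print(before1900)
print(after1900)
```

With the real data, `yr_built` would be replaced by `d$YR.BUILT`, and the imputed values would need to be written back into `d` before plotting.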

Fig: 1