Question

DATA220 Linear Regression 1 Assignment

Let's check linearities! Use R to check the linearity of the following datasets. Answer each question. Type up all the answers, together with all the R code, in one Word/PDF file.

For each dataset, I want you to
1) check the correlation and interpret the result.
2) try to fit the linear regression.
   a. Check the first 3 conditions.
   b. Use lm() and fit a line.
   c. Observe R² and interpret its value.
   d. Check the last condition, constant variability, by analyzing a residual plot.
   e. Draw a conclusion about whether it is reasonable to fit a linear regression to the dataset. Explain why.

1. Data: Ice Cream Sales - temperatures.csv
This data has two variables.
• Temperature: temperature in Fahrenheit
• Ice Cream Profits: revenue from ice cream sales in USD
Let Temperature be the predictor variable and Ice Cream Profits be the response variable.

2. Data: built-in dataset, iris
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
This data has four measurement variables, but I want you to use the following two.
• sepal length as a predictor variable
• sepal width as a response variable
Note: The correlation between those two variables is weak. Please go through all the required steps for this assignment anyway to see what happens, and write them up in the report.

3. Data: diabetes.csv
There are 2 types of diabetes, viz. insulin-dependent diabetes mellitus (IDDM)/Type-I diabetes and non-insulin-dependent diabetes mellitus (NIDDM)/Type-II diabetes. Type-I is a disorder of carbohydrate metabolism due to insufficient insulin secretion, which could be hereditary or acquired. Type-II diabetes is a condition in which the sensitivity of body cells to insulin is reduced. The dataset contains information about Pima Indian women.
It contains many variables, but I want you to use the following two.
• Glucose: Plasma glucose concentration in an oral glucose tolerance test.
• Outcome: The target variable; 0 for no diabetes, 1 for diabetes.
As we learned in Section 8.2, a categorical variable can also be used in linear regression. Although some of the required steps do not really make sense here, please go over all of them to see what happens, and write them up in the report.
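
The required steps can be sketched in R on the built-in iris data, which needs no file. This is only a minimal sketch, not a model answer. The assignment does not name the first three conditions, so the code assumes they are linearity, nearly normal residuals, and independent observations. The object names (fit, r, r2) are illustrative.

```r
# Minimal sketch of steps 1) and 2a-2d on the built-in iris data.
# Assumption: the "first 3 conditions" are linearity, nearly normal
# residuals, and independent observations (not stated in the assignment).
data(iris)

# 1) Correlation between predictor (sepal length) and response (sepal width)
r <- cor(iris$Sepal.Length, iris$Sepal.Width)
print(r)

# 2a) Linearity: look at the scatterplot for a roughly straight-line trend
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

# 2b) Fit the least-squares line with lm() and overlay it on the scatterplot
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
abline(fit, col = "red")
print(coef(fit))

# 2c) R-squared: the proportion of variability in the response
#     explained by the fitted line
r2 <- summary(fit)$r.squared
print(r2)

# 2a, continued) Nearly normal residuals: histogram of the residuals
hist(resid(fit), main = "Residuals", xlab = "Residual")

# 2d) Constant variability: residuals against fitted values;
#     look for a fan shape or any other pattern around the zero line
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Independence cannot be checked from a plot alone and has to be argued from how the data were collected. For the two CSV datasets the same pattern should apply after loading the file with read.csv(). The exact column names are not given in the assignment, so they would need to be checked with names() before writing the lm() formula.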