Big Data and Machine Learning for Economics and Finance
Provide your answers in a document generated by RMarkdown. For each answer,
provide the R code, the R output and your comments on the output. Comment each
line of your R code as well. Give thorough explanations throughout.
Exercise 1. (10 points) For this exercise, the only extra package allowed is ISLR2. The dataset
Default is used throughout the exercise and is accessible through the ISLR2 package.
I. Consider the following figure constructed from the dataset Default.
[Figure 1. Two box plots: balance on the vertical axis (ticks from 500 to 2500), one box for each value of default (No, Yes).]
a) Write the R code to reproduce that plot.
b) What is the conditioning variable in that plot? Give a thorough interpretation.
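As a starting point for part I, here is a minimal sketch. It assumes the figure is a pair of box plots of balance split by the two levels of default, which is what the caption "Two box plots" and the No/Yes labels suggest. The axis labels and the object name group_summary are illustrative choices, not taken from the original figure.

```r
# Load ISLR2, the only extra package allowed; it provides the Default data set
library(ISLR2)

# Draw one box plot of balance for each level of default.
# The formula reads "balance conditional on default", so default is the
# grouping (conditioning) variable under the assumption stated above.
boxplot(balance ~ default, data = Default,
        xlab = "default", ylab = "balance")

# Numerical counterpart of the plot: summary statistics of balance
# computed separately within the "No" group and the "Yes" group
group_summary <- tapply(Default$balance, Default$default, summary)

# Print both conditional summaries so they can be compared with the boxes
print(group_summary)
```

Comparing the two conditional summaries with the boxes gives a basis for the interpretation asked for in b): each box describes the distribution of balance given one value of default.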
II. Consider another figure constructed from the same dataset.
[Scatter plot: balance on the vertical axis against default on a numeric horizontal axis running from 1.0 to 2.0.]
a) Write the R code to reproduce that plot.
Figure 2. A scatter plot.
b) Carry out a regression exercise where you are attempting to predict balance
given only the variable default.
1. Write the R code to train that model.
2. Modify the plot in Figure 2 to add the predicted regression line.
3. Give predictions of balance for all possible values of default. Show how
to do the calculations directly in R and by using the regression output.
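The steps of part II can be sketched as follows. This is one possible approach, not the official solution. It assumes that the horizontal axis of Figure 2 is the factor default converted to its integer codes (No = 1, Yes = 2), which matches the 1.0 to 2.0 axis range. The names d, default_num, fit, fit_num, group_means, pred_by_hand and pred_by_model are hypothetical.

```r
# Load ISLR2, the only extra package allowed; it provides the Default data set
library(ISLR2)

# Work on a copy so the original data set is left untouched
d <- Default

# Convert the factor default to its integer codes: "No" -> 1, "Yes" -> 2
d$default_num <- as.numeric(d$default)

# a) Scatter plot of balance against the numeric coding of default
plot(d$default_num, d$balance, xlab = "default", ylab = "balance")

# b.1) Regress balance on the factor default (R creates the dummy defaultYes)
fit <- lm(balance ~ default, data = d)

# Display the regression output (coefficients, standard errors, R-squared)
print(summary(fit))

# b.2) Refit on the 1/2 coding so the line matches the plot's horizontal axis
fit_num <- lm(balance ~ default_num, data = d)

# Add the fitted regression line to the existing scatter plot
abline(fit_num, col = "red")

# b.3) Direct calculation: the average balance within each default group
group_means <- tapply(d$balance, d$default, mean)

# Extract the estimated coefficients from the regression output
b <- coef(fit)

# Prediction for "No" is the intercept; for "Yes" it is intercept + dummy coefficient
pred_by_hand <- c(No = b[["(Intercept)"]],
                  Yes = b[["(Intercept)"]] + b[["defaultYes"]])

# Same predictions obtained through predict() for the two possible values
pred_by_model <- predict(fit, newdata = data.frame(default = c("No", "Yes")))

# Show the three sets of numbers side by side; they should coincide
print(cbind(group_means, pred_by_hand, pred_by_model))
```

With a single binary regressor, least squares fits the two group means exactly, which is why the direct calculation and the regression output agree.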
III. Consider another figure from the same dataset.
[Scatter plot: balance on the vertical axis; the surviving horizontal-axis tick labels appear to read 0.2, 0.8 and 1.0.]
Figure 3. Another scatter plot
a) What are the differences between this plot and the previous one?
b) Would you obtain the same regression results as with the previous figure? Illustrate
everything with R code and conceptual justifications if necessary.
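One way to explore part III is sketched below. The content of Figure 3 cannot be fully recovered here, so the sketch rests on an explicit assumption: Figure 3 codes default as 0/1 (No = 0, Yes = 1) instead of the 1/2 coding of Figure 2. If the actual figure differs in some other way, the comparison would need to be adapted. The names d, x12, x01, fit12 and fit01 are hypothetical.

```r
# Load ISLR2, the only extra package allowed; it provides the Default data set
library(ISLR2)

# Work on a copy so the original data set is left untouched
d <- Default

# Coding assumed for Figure 2: "No" -> 1, "Yes" -> 2
d$x12 <- as.numeric(d$default)

# Coding ASSUMED for Figure 3: "No" -> 0, "Yes" -> 1 (a shift of the axis by one)
d$x01 <- d$x12 - 1

# Scatter plot under the assumed 0/1 coding
plot(d$x01, d$balance, xlab = "default", ylab = "balance")

# Regression of balance on the 1/2 coding
fit12 <- lm(balance ~ x12, data = d)

# Regression of balance on the 0/1 coding
fit01 <- lm(balance ~ x01, data = d)

# Print both sets of coefficients for comparison
print(rbind(coding_1_2 = coef(fit12), coding_0_1 = coef(fit01)))

# The slope is unchanged, because shifting x by a constant does not change it
same_slope <- isTRUE(all.equal(unname(coef(fit12)[2]), unname(coef(fit01)[2])))

# The intercept moves by one slope: a + b * x12 = (a + b) + b * x01
shifted_intercept <- isTRUE(all.equal(unname(coef(fit01)[1]),
                                      unname(coef(fit12)[1] + coef(fit12)[2])))

# Fitted values, and hence predictions and residuals, are identical
same_fit <- isTRUE(all.equal(fitted(fit12), fitted(fit01)))

# Report the three checks
print(c(same_slope = same_slope,
        shifted_intercept = shifted_intercept,
        same_fit = same_fit))
```

Under this assumption, only the intercept depends on the coding: it is the predicted balance at x = 0, which corresponds to the "No" group in the 0/1 coding and to no observed group in the 1/2 coding.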