10. (12 pts) Commercial banks receive a lot of applications for credit cards. Many are rejected
for reasons such as high loan balances, low income levels, or too many inquiries on an
individual's credit report. Manually analyzing these applications is mundane, error-prone, and
time-consuming (and time is money!). Luckily, this task can be automated with machine learning,
and pretty much every commercial bank does so nowadays. In this question, you are required to do
basic data processing on a credit approval dataset, step by step.

You are required to complete the following data visualization tasks using the Matplotlib or
seaborn library.

a) Use the Pandas library to read the file credit_approval_data.csv. Remove unusable attributes
from the dataset. Note that we will use the following 6 attributes: Age, Debt,
YearsEmployed, CreditScore, Income, ApprovalStatus. (1 pt)

b) Use NumPy or Pandas to compute the statistics (i.e., mean, standard deviation, minimum,
25th percentile, 50th percentile, 75th percentile, maximum) of the 6 attributes (the same as
in Q10 (a)). Then, show the boxplot of the 6 attributes. Note that you should plot the
original attributes before standardization. (3 pts)

c) Compute the Pearson correlation coefficient matrix of the same 6 attributes. You are
required to use the NumPy or Pandas library to implement this. Then, plot the correlation
matrix that you have computed. Note that you are required to use the attribute names as the
labels of the x and y axes and to use color to represent the correlation coefficient. Your
figure should also include a color bar. (3 pts)

d) Standardize 5 attributes (the same attributes as in Q10 (c), except ApprovalStatus).
Standardization means rescaling each feature so that its mean is 0 and its standard
deviation is 1. You are required to use NumPy or Pandas to implement it. (2 pts)

e) Update the 5 attributes with your results from Q10 (d) and divide the dataset randomly into
training (80%) and testing (20%) sets. (1 pt)

f) Plot figures to compare the distribution of the 5 attributes in the training and testing
sets (the same attributes as in Q10 (e)). (1 pt)

g) Please plot one figure (including two sub-figures, one for the training set and one for the
testing set) for each attribute. Note that you are recommended to use the subplots function
in Matplotlib to implement this. (1 pt)
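One possible way to approach steps (a) through (g) is sketched below. It is a minimal sketch under assumptions that the question does not confirm: that credit_approval_data.csv has columns named exactly Age, Debt, YearsEmployed, CreditScore, Income and ApprovalStatus, that all six are numeric (ApprovalStatus encoded as 0/1), and that there are no missing values. The function names, the fixed random seed, the use of histograms for (f) and (g), and the choice of the population standard deviation (ddof=0) in (d) are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show figures interactively
import matplotlib.pyplot as plt

# Column names are assumed to match the attribute names listed in (a).
COLS = ["Age", "Debt", "YearsEmployed", "CreditScore", "Income", "ApprovalStatus"]
FEATURES = COLS[:-1]  # the five attributes standardized in (d)


def load_data(path="credit_approval_data.csv"):
    """(a) Read the CSV and keep only the six attributes used in the question."""
    return pd.read_csv(path)[COLS]


def summarize(df):
    """(b) Mean, standard deviation, min, quartiles and max of each attribute."""
    rows = ["mean", "std", "min", "25%", "50%", "75%", "max"]
    return df[COLS].describe().loc[rows]


def plot_boxplots(df):
    """(b) One boxplot per raw attribute, each on its own axis since scales differ."""
    fig, axes = plt.subplots(1, len(COLS), figsize=(3 * len(COLS), 4))
    for ax, col in zip(axes, COLS):
        ax.boxplot(df[col].to_numpy())
        ax.set_title(col)
        ax.set_xticks([])
    fig.tight_layout()
    return fig


def plot_correlation(df):
    """(c) Pearson correlation matrix drawn as a labelled heatmap with a color bar."""
    corr = df[COLS].corr(method="pearson")
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(COLS)))
    ax.set_xticklabels(COLS, rotation=45, ha="right")
    ax.set_yticks(range(len(COLS)))
    ax.set_yticklabels(COLS)
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    return corr, fig


def standardize(df):
    """(d) Rescale the five features to mean 0 and (population) std 1."""
    z = (df[FEATURES] - df[FEATURES].mean()) / df[FEATURES].std(ddof=0)
    # ApprovalStatus is carried over unchanged.
    return pd.concat([z, df[["ApprovalStatus"]]], axis=1)


def split(df, test_frac=0.2, seed=0):
    """(e) Shuffle the rows, then split them into (train, test)."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_test = int(round(test_frac * len(df)))
    return shuffled.iloc[n_test:], shuffled.iloc[:n_test]


def plot_train_test(train, test):
    """(f, g) One figure per feature: training histogram left, testing right."""
    figs = []
    for col in FEATURES:
        fig, (ax_tr, ax_te) = plt.subplots(1, 2, figsize=(8, 3), sharex=True)
        ax_tr.hist(train[col].to_numpy(), bins=20)
        ax_tr.set_title(f"{col} (train)")
        ax_te.hist(test[col].to_numpy(), bins=20)
        ax_te.set_title(f"{col} (test)")
        fig.tight_layout()
        figs.append(fig)
    return figs
```

With the file in the working directory, the steps would be chained as load_data(), then summarize() and plot_boxplots() on the raw frame, plot_correlation(), standardize(), split(), and finally plot_train_test() on the two resulting sets. If the real file stores ApprovalStatus as text or contains missing values, those would need to be handled before any of the statistics are computed.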

11. (12 pts) You are working as Assignment Filed Immigration Canada (CIC),
