Question

2. Download the data file P2PLendingdata.xlsx and read the document LC-case-student-handout.pdf. Consider the P2P lending example with the following data structure:

Data instances: 50000
Features: id, loan_amnt, funded_amnt, term, int_rate, grade, emp_length, home_ownership, annual_inc, verification_status, purpose, open_acc, pub_rec, fico_range_high, fico_range_low, revol_bal, revol_util, total_pymnt, recoveries, pub_rec, delinq-2yrs, dti (total: 22 features)
Target: outcome

The description of the variables appears in P2PLendingdatadescription.pdf. A file with the full description of all the variables Lending Club collects for each loan is also posted on Canvas just for your information - giving you an idea how extensive the list of features is. The data you will be working with involves a smaller set of features to reduce the computational complexity of this assignment.

2.a (3 points) Use the Orange software to compute a decision tree model for classification with the following specifications:

Training sample: 75% of the data, randomly selected as the training/estimation sample
Parameters: at least two instances in leaves, do not split subsets smaller than 5, maximum depth 6
Splitting: stop splitting when majority reaches 95%

Use the remaining 25% of the sample to compute predictions and appraise those with a Confusion Matrix.

2.b (2 points) Can you explain why you have such a successful model?
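The 75/25 random split and the confusion-matrix appraisal asked for in 2.a can be sketched outside Orange in plain Python. This is only an illustration of the bookkeeping, not the Orange workflow itself: the helper names (train_test_split_indices, confusion_matrix, accuracy), the seed, and the label values "paid" and "default" are assumptions made for the example, and the short label lists are made up rather than taken from P2PLendingdata.xlsx.

```python
import random
from collections import Counter


def train_test_split_indices(n_rows, train_fraction=0.75, seed=0):
    """Shuffle the row indices and cut them into a training and a test part."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed (assumed) for repeatability
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of cases on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / total if total else 0.0


# 50000 data instances, 75% for training and the remaining 25% for testing.
train_idx, test_idx = train_test_split_indices(50000)

# Hypothetical labels, standing in for the test-set outcome and the tree's predictions.
actual = ["paid", "paid", "default", "default", "paid"]
predicted = ["paid", "default", "default", "default", "paid"]

matrix = confusion_matrix(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a:8s} predicted={p:8s} count={count}")
print("accuracy:", accuracy(matrix))
```

In Orange itself, the same steps are carried out by the sampling, tree, prediction and Confusion Matrix widgets; the sketch only shows what the resulting table counts.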

Fig: 1

Fig: 2