Question

MAT1008 HOMEWORK

The aim of this homework is to predict customer churn from historical customer data using machine learning methods. The homework follows the conventional workflow of a machine learning project: data preprocessing, model building, evaluation, reporting, and presenting your work.

DATA SET GENERATION

First, generate your INDIVIDUAL customer data using your Student ID. The data consists of 1000 data instances and 10 features, each storing a different type of information about the customers.

• CreditScore: The customer's credit score at the time of data collection.
• Location: The customer's country or region.
• Age: The customer's age.
• EducationLevel: The customer's education level.
• CustomerName: The customer's name.
• Tenure: The number of years the customer has been with the bank.
• Balance: The customer's account balance.
• IsActiveMember: Indicates whether the customer is an active member (yes) or not (no).
• EstimatedSalary: The customer's estimated salary in thousand dollars.
• Churn: The target variable, indicating whether the customer has churned (yes) or not (no).

    # Import libraries
    from faker import Faker
    import random
    import pandas as pd

    fake = Faker()

    # Define a function to generate a data instance
    def generate_instance():
        return {
            'CreditScore': random.gauss(550, 100),
            'Tenure': random.randint(0, 15),
            'EducationLevel': random.choice(['HighSchool', 'College', 'Graduate']),
            'Balance': random.gauss(125000, 1000),
            'EstimatedSalary': random.uniform(10, 25000),
            'Age': random.randint(18, 90),
            'IsActiveMember': random.choice(['yes', 'no']),
            'Location': fake.city(),
            'CustomerName': fake.name(),
            'Churn': random.choice(['yes', 'no'])
        }

    # Customize your data set
    random.seed(*)  # plug your student ID in *
    Faker.seed(*)   # plug your student ID in *

    # Generate a data set consisting of 1000 instances
    values = [generate_instance() for _ in range(1000)]

    # Convert the data set to a dataframe
    data = pd.DataFrame(values)

REPORT

Once you have generated the data, apply AT LEAST TWO machine learning methods covered in the lectures to predict which customers will churn. Compare and discuss the results in a Report. The Report may include the following parts:

• Problem Definition (describing the problem at hand and the aim of the work)
• Solution Methodology (a description of your machine learning models, why you chose them, how you applied them, and the results of your experiments, including numbers, visualizations, and interpretations as appropriate)
• Comparison of the models (describing how you measure the goodness of the models and how you reach your final decision)
• Conclusion (summarizing your work with a result)

PRESENTATION

Present your work in class with a 5-minute talk in the REVERSE ORDER of submission. That is, if you submit early, you will present in the 2nd round; if you submit late, you will present in the 1st round. I will announce the presentation list when submission is closed.

Due Dates

Report & Presentation Submission (via Itslearning): May 15th, 2024
Presentation (1st round): May 16th/17th, 2024
Presentation (2nd round): May 23rd/24th, 2024

Important Notes

• You need to submit BOTH the Report and the Presentation Slides.
• LATE SUBMISSION is not allowed.
• Submitting but NOT attending the Presentation results in ZERO POINTS for the overall homework score.
• There is NO MAKE-UP for the Presentations.
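
For the "Comparison of the models" part of the Report, one possible way to measure goodness is to compute accuracy, precision, recall, and F1 for each model on the same held-out labels, treating Churn = 'yes' as the positive class. The sketch below is only an illustration under stated assumptions: the helper names `confusion_counts` and `evaluate` are hypothetical, and the label and prediction lists are made-up toy values, not results from any real model or from the generated data set.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="yes"):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def evaluate(y_true, y_pred, positive="yes"):
    """Return accuracy, precision, recall and F1 (0.0 when undefined)."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels and predictions, invented purely for illustration.
y_true = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]
model_a = ["yes", "no", "no", "no", "yes", "yes", "no", "yes"]
model_b = ["yes", "yes", "yes", "no", "no", "yes", "no", "no"]

# Score both hypothetical models on the same labels so they are comparable.
for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    scores = evaluate(y_true, preds)
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In an actual submission, `y_true` would be the Churn column of a test split and each prediction list would come from one of the trained models. Which metric drives the final decision is a design choice to justify in the Report.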