Principles of Business Data Mining Project

This group project offers you an opportunity to apply your data mining knowledge to real-life data and to mine managerially relevant insights. The objective of this assignment is for your team to implement the data mining process using real-life data that is of interest to you. Your project should be driven by a relevant and important question of business or social value.

It is essential that the dataset you select has an output (also called target) variable. Note, however, that a target variable alone is not enough: your dataset should also contain an adequate number of input variables that can help explain or predict the target variable. Data can be obtained from a publicly available source such as kaggle.com or through a real business (for which you need to have appropriate Non-Disclosure Agreements in place). You will need to set up an account to download datasets from Kaggle, which are usually in csv format.

Pay attention to the FIVE project deadlines: project proposal report, meeting to discuss project, project progress report, project presentation slides/presentation, and final project report. All four submissions (the proposal, progress report, presentation slides, and final report; one copy per team) should be submitted in Canvas. All late submissions will receive a zero. Each report should be a single Microsoft Word or pdf document with your group number and all group member names. Messy or hard-to-read reports will be penalized.

You have to implement the following data mining techniques in your project:
• At least 2 data visualization techniques using Tableau to understand and draw conclusions on the data.
• At least 2 prediction techniques to address your business or social question. I would recommend using a linear or logistic regression and classification trees, since they are the most straightforward to translate to a relevant business question.
Regardless of what data you will be mining, ensure that there is an appropriate match between the dataset you plan to mine, the data mining technique you plan to use, and the business question.

Deliverables:

Project Proposal Report:
The proposal should address the following:
○ Source of the dataset (e.g. kaggle.com).
○ A brief description of the data so you know what you are dealing with. You should include a list of all variables in the dataset.
○ A short paragraph describing your objectives when mining this data. Explain what business problem/question this project will address.
○ What data visualization and prediction techniques you will use.
○ Any pre-processing steps you think you need to take.
○ Any initial results you expect or may have obtained. I strongly recommend that you run some of your data visualization techniques by the proposal due date and have a good view of what prediction techniques you will use.

INSY5339, Dr. A.C. Sahoo, Spring 2022

Project Progress Report:
○ At the beginning of this report, describe your business/social question and the data that support addressing this question. Data should be available and submitted as a separate file along with the proposal.
○ The report should contain all the data visualization techniques (at least two) successfully completed and documented (these could change in the final report). Include your initial draft findings (these could change in the final report).
○ Your report should contain at least one successful trial of the data prediction techniques (out of two techniques). You should report your draft findings (these could change in the final report).

Project Presentation slides and presentation in the class:
○ Your group will make a 15 minute presentation during class on December 1, 2021. Detailed guidelines will be presented prior to the final presentation. A copy of the presentation slides has to be submitted by the due date.
○ Every member should speak as part of the presentation.
○ Professional-quality presentation slides are required and should be updated based on feedback received.

Final Project Report:
○ This should be a professionally prepared report that addresses the following parts: cover page, executive summary, project motivation/background (business/social question), data description that supports addressing this question, data analysis using visualization and findings, your prediction models and findings, and managerial or policy implications and conclusions. Include all diagrams, graphs and tables to support your conclusions. Feel free to add any other sections if needed.
○ What really matters is whether you successfully discovered useful knowledge from a dataset, and whether you presented it well to the reader.

Each report builds on the previous one. Feel free to reuse material from your earlier reports.

Your Project Proposal Report submission should also contain the following completed table:
○ How many observations are in the dataset?
○ How many binary/categorical variables?
○ How many continuous variables?
○ What is the outcome/target variable?
○ If binary or categorical: What percentage of the observations belong to each class?
○ If continuous: What is the mean value of the target variable?
○ Before doing any further processing, what would your prediction of the target variable be?
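
The quantities requested in the proposal table can be computed directly from a csv file. The following is a minimal sketch in Python using only the standard library, not a required method. The toy data, the column names, and the `summarize` helper are all hypothetical. It also assumes that a column is continuous when every value parses as a number, which would misclassify categories coded as 0/1, and that the "prediction before any further processing" is the majority class for a categorical target or the mean for a continuous one. The variable counts here include the target column.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical toy data standing in for a downloaded csv file; the column
# names and values are invented for illustration only.
SAMPLE_CSV = """age,income,owns_home,churned
34,52000,yes,no
45,61000,no,no
23,38000,no,yes
51,75000,yes,no
29,41000,no,yes
"""


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize(rows, target):
    """Compute the quantities asked for in the proposal table."""
    columns = list(rows[0].keys())
    # Assumption: a column is continuous if every value is numeric,
    # and binary/categorical otherwise.
    continuous = [c for c in columns if all(is_number(r[c]) for r in rows)]
    categorical = [c for c in columns if c not in continuous]
    summary = {
        "observations": len(rows),
        "continuous": continuous,
        "categorical": categorical,
        "target": target,
    }
    values = [r[target] for r in rows]
    if target in continuous:
        # Continuous target: the naive prediction is the mean.
        target_mean = mean(float(v) for v in values)
        summary["target_mean"] = target_mean
        summary["naive_prediction"] = target_mean
    else:
        # Categorical target: the naive prediction is the majority class.
        counts = Counter(values)
        summary["class_percentages"] = {
            label: 100 * n / len(values) for label, n in counts.items()
        }
        summary["naive_prediction"] = counts.most_common(1)[0][0]
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows, target="churned")
for key, value in summary.items():
    print(f"{key}: {value}")
```

To use this on a real dataset, replace `io.StringIO(SAMPLE_CSV)` with an opened csv file and pass the name of your own target column. The naive prediction gives a baseline that any prediction model in the project should be able to beat.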