Principles of Business Data Mining
Project
This group project offers you an opportunity to apply your data mining knowledge to real-life data and to
mine managerially relevant insights. The objective of this assignment is for your team to implement the
data mining process using real-life data that is of interest to you. Your project should be driven by a
relevant and important question of business or social value. It is essential that the dataset you select have
an output (also called target) variable. Note, however, that it is not enough to just have a target variable. Your
dataset should also contain an adequate number of input variables which can help explain or predict the
target variable.
Data can be obtained from a publicly available source such as kaggle.com or through a real business (for
which you need to have appropriate Non-Disclosure Agreements in place). You will need to set up an
account to download datasets from Kaggle; these are usually in CSV format.
Pay attention to the FIVE project deadlines: project proposal report, meeting to discuss
project, project progress report, project presentation slides/presentation and final project report.
All four reports/slides (one copy per team) -- the proposal, progress report, presentation slides and the
final report -- should be submitted in Canvas. All late submissions will receive a zero. Each report should
be a single Microsoft Word or pdf document with your group number and all group member names.
Messy or hard-to-read reports will be penalized.
You have to implement the following data mining techniques in your project:
• At least 2 data visualization techniques using Tableau to understand and draw conclusions on
the data
• At least 2 prediction techniques to address your business or social question. I would recommend
using a linear or logistic regression and classification trees since they are the most
straightforward to translate to a relevant business question.
Regardless of what data you will be mining, ensure that there is an appropriate match between the
dataset you plan to mine and the data mining technique you plan to use and the business question.
Deliverables:
Project Proposal Report:
The proposal should address the following:
○ Source of the dataset (e.g. kaggle.com)
○ A brief description of the data so you know what you are dealing with. You should include
a list of all variables in the dataset.
○ A short paragraph describing your objectives when mining this data. Explain what
business problem/question this project will address.
○ What data visualization and prediction techniques you will use, and any pre-processing steps
you think you need to take.
1 INSY5339
Dr. A.C. Sahoo
Spring 2022
○ Any initial results you expect or may have obtained. I strongly recommend that you run
some of your data visualization techniques by the proposal due date and have a good
view of what prediction techniques you will use.
Project Progress Report:
○ At the beginning of this report, describe your business/social question and the data that
support addressing this question. Data should be available and submitted as a separate file
along with the proposal.
○ The report should contain all the data visualization techniques (at least two) successfully
completed and documented (these could change in the final report). Include your initial draft
findings (these could change in the final report).
○ Your report should contain at least one successful trial of the data prediction techniques
(out of two techniques). You should report your draft findings (these could change in the final
report).
Project Presentation slides and presentation in the class:
○ Your group will make a 15-minute presentation during class on December 1, 2021.
Detailed guidelines will be presented prior to the final presentation. A copy of the
presentation slides has to be submitted by the due date.
○ Every member should speak as part of the presentation. Professional quality presentation
slides are required and should be updated based on feedback received.
Final Project Report:
○ This should be a professionally prepared report that addresses the following parts: cover
page, executive summary, project motivation/background (business/social question),
data description that supports addressing this question, data analysis using visualization
and findings, your prediction models and findings, and managerial or policy implications and
conclusions. Include all diagrams, graphs and tables needed to support your conclusions.
○ Feel free to add any other sections if needed. What really matters is whether you
successfully discovered useful knowledge from a dataset, and whether you presented it
well to the reader.
Each report builds on the previous one. Feel free to reuse material in your earlier reports.
Your Project Proposal Report submission should also contain the following completed table:
How many observations in the dataset?
How many binary/categorical variables?
How many continuous variables?
What is the outcome / target variable?
If binary or categorical: What percentage of observations belongs to each class?
If continuous: What is the mean value of the target variable?
Before doing any further processing, what would your prediction of the target variable be?
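As an illustration only, the table entries above could be computed with a short script like the one below. This is a minimal sketch under stated assumptions, not a required method: the toy data, the column names, and the helper names `is_number` and `summarize` are all hypothetical, and the rule used to separate continuous from binary/categorical variables (numeric with more than two distinct values) is a simplification you may need to adjust for your dataset. The counts cover input variables only, which is also an assumption about how the table is meant to be read. The prediction "before any further processing" is taken here to be the most frequent class for a categorical target, or the mean for a continuous one.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical toy data standing in for a downloaded CSV file.
SAMPLE_CSV = """age,income,owns_home,churned
34,52000,yes,no
45,61000,no,yes
29,48000,no,no
52,75000,yes,no
41,58000,yes,yes
"""


def is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize(rows, target):
    """Compute the proposal-table figures for a list of dict rows."""
    inputs = [c for c in rows[0].keys() if c != target]

    def looks_continuous(column):
        # Simplifying assumption: numeric with more than two distinct values.
        values = [r[column] for r in rows]
        return all(is_number(v) for v in values) and len(set(values)) > 2

    continuous = [c for c in inputs if looks_continuous(c)]
    summary = {
        "observations": len(rows),
        "categorical_inputs": len(inputs) - len(continuous),
        "continuous_inputs": len(continuous),
        "target": target,
    }

    if looks_continuous(target):
        # Continuous target: the mean is the naive prediction.
        target_mean = mean(float(r[target]) for r in rows)
        summary["target_mean"] = target_mean
        summary["naive_prediction"] = target_mean
    else:
        # Binary/categorical target: the majority class is the naive prediction.
        counts = Counter(r[target] for r in rows)
        summary["class_percentages"] = {
            label: 100 * n / len(rows) for label, n in counts.items()
        }
        summary["naive_prediction"] = counts.most_common(1)[0][0]
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows, target="churned")
for key, value in summary.items():
    print(f"{key}: {value}")
```

To run this on a real dataset, you would replace `io.StringIO(SAMPLE_CSV)` with an opened CSV file and pass the name of your own target column. The naive prediction gives a baseline that your prediction models should improve on.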