Instructions Data Mining
The purpose of this assignment is to demonstrate understanding and application of the k-Nearest Neighbor algorithm for classification.
You have been asked to build a classification model to help predict machine failure. Using the "Failure Rate" dataset, build a classification model using the k-Nearest Neighbor technique in KNIME. Then, using the model, predict whether the machines in 50 records of input data will fail.
Training/Test Model
Use "hours_run" and "avg_hours_between_maint" as the input variables. Use "failure" as the target variable (note that 0 = no failure and 1 = failure).
For the Excel Reader node, exclude the "model_version" field from the data import operation.
Use a Normalizer node to normalize the "hours_run" and "avg_hours_between_maint" fields to be
between 0 and 1.
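Min-max normalization, which is what rescaling to the 0-1 range amounts to, maps each value x to (x - min) / (max - min). A minimal Python sketch under that assumption, with invented values rather than data from the "Failure Rate" file:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map every value to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical "hours_run" values, for illustration only.
hours_run = [120.0, 480.0, 300.0, 840.0]
print(min_max_normalize(hours_run))  # -> [0.0, 0.5, 0.25, 1.0]
```

One design point: min and max are taken from whatever list is passed in, so if new records (such as 301-350) were scaled on their own, the same raw value could map to a different normalized value. Reusing the training minimum and maximum avoids that.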
Use a Partition node with a 70/30 partition (i.e., 70% Training; 30% Test) for records 1-300.
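A random split of records 1-300 can be sketched as follows. The sketch assumes a 70/30 split, which yields 210 training records and the 90 test records that the Scorer instructions refer to; the function name, the fixed seed, and the use of Python's random module are illustrative choices, and KNIME's Partition node has its own sampling settings.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Randomly split records into (training, test) lists; the seed makes the split repeatable."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


record_ids = list(range(1, 301))  # records 1-300
train_ids, test_ids = partition(record_ids)
print(len(train_ids), len(test_ids))  # -> 210 90
```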
Then, create the model using the Training data from the Partition node. Use the k-Nearest Neighbor node.
Attach a Scorer node to the k-Nearest Neighbor node in order to evaluate the model's accuracy (note that this node evaluates only the results for the n=90 Test data). Attach a Table node to the second output port of the Scorer node. Take note of the overall accuracy value. Run the model for k values from 3 to 6, and select the best k value, i.e., the one with the highest accuracy.
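The classification and k-selection steps can be pictured in a few lines of Python. This is a sketch, not KNIME's implementation: it assumes Euclidean distance on the two normalized inputs, an unweighted majority vote among the k nearest training records, and an invented six-record training set; KNIME's node may handle ties or weighting differently. The hypothetical best_k function mirrors the instruction to try k = 3 to 6 and keep the most accurate value.

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training rows.

    `train` is a list of ((hours_run, avg_hours_between_maint), failure) pairs,
    with both inputs already normalized to 0-1. Distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie (possible for even k), most_common keeps first-seen order, so the
    # tied label that appears first among the distance-sorted neighbors wins.
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the actual label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def best_k(train, test, k_values=range(3, 7)):
    """Try each k (3 to 6 by default) and return (k, accuracy) with the highest accuracy."""
    scores = {k: accuracy(train, test, k) for k in k_values}
    k = max(scores, key=scores.get)  # the first k in k_values wins if accuracies tie
    return k, scores[k]


# Invented, already-normalized rows: 0 = no failure, 1 = failure.
train = [((0.1, 0.1), 0), ((0.2, 0.1), 0), ((0.1, 0.2), 0),
         ((0.9, 0.9), 1), ((0.8, 0.9), 1), ((0.9, 0.8), 1)]
test = [((0.15, 0.15), 0), ((0.85, 0.85), 1)]
print(best_k(train, test))  # -> (3, 1.0)
```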
Predictions on the n=50 Data
After running the Training/Test model and ascertaining the optimal k value, create another workflow (in the same KNIME file) to predict machine failure for the rest of the records in the dataset, i.e., records 301-350. Set up the workflow as shown in the attached "K-Nearest Neighbor Algorithm, Prediction Workflow" document.
Attach a Table node to the k-Nearest Neighbor node to view the predicted results. Sort by the appropriate column so that records whose machines are predicted to fail appear at the top of the list. Also, attach an Excel Writer node to the k-Nearest Neighbor node to export the prediction results to an Excel file.
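Sorting the prediction column in descending order puts predicted failures (1) above non-failures (0). A sketch assuming the predictions are available as simple records with hypothetical "record" and "prediction" fields (the actual column name in KNIME's output may differ); CSV text from Python's standard csv module stands in for the Excel Writer node here, and the predictions are invented.

```python
import csv
import io

def sort_failures_first(rows):
    """Order rows so predicted failures (1) come before non-failures (0).

    sorted() is stable even with reverse=True, so record order is kept within each group.
    """
    return sorted(rows, key=lambda row: row["prediction"], reverse=True)


def to_csv_text(rows):
    """Serialize the rows as CSV text (a stand-in for writing an Excel file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["record", "prediction"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical predictions for a few of records 301-350.
predictions = [
    {"record": 301, "prediction": 0},
    {"record": 302, "prediction": 1},
    {"record": 303, "prediction": 0},
    {"record": 304, "prediction": 1},
]
ranked = sort_failures_first(predictions)
print([row["record"] for row in ranked])  # -> [302, 304, 301, 303]
print(to_csv_text(ranked))
```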
In a 250-word document, provide the following information. Assume you are providing this information
to an audience that has limited knowledge of data mining concepts.
1. Summarize your approach to the problem.
2. Clearly state the optimal k value for the model.
3. Screenshot the results from the second output port of the Scorer node.
4. Screenshot the predicted results, sorted by the appropriate column so that records whose machines are predicted to fail appear at the top of the list.
5. Include a conclusion based on the results of the analysis. Specify which machines in records 301-350 are predicted to fail, and speculate on why these particular records are predicted to fail.
Note that you are required to submit the completed KNIME *.knwf file to your instructor. Specifically,
export your KNIME model to a KNIME workflow file. To perform this task in KNIME, ensure that your
KNIME model is active (i.e., displayed). Then, go to File -> Export KNIME Workflow. In the "Destination
workflow file name (.knwf)" area, browse to a specific location on your computer. Click "Save" and then
click "Finish."
Instructions
Using specified data files, chapter example files, and templates from the "Topic 4 Student Data,
Template, and Example Files" resource, complete Chapter 13 Problems 20, 26, 28, 50, and 52 in the
textbook. Use MAPE (mean absolute percentage error) to evaluate the forecasting performance for each
problem. Use the Palisade DecisionTools Excel software to complete these problems where requested
and applicable.
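MAPE averages the absolute forecast errors expressed as a percentage of the actual values: MAPE = (100/n) Σ |(y_t - ŷ_t) / y_t|. A minimal Python sketch of that formula, assuming every actual value is nonzero (MAPE is undefined otherwise); the numbers are illustrative only.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent: 100/n * sum(|(a - f) / a|)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative numbers: each forecast is off by 25% of the actual value.
print(mape([100, 200, 400], [75, 250, 300]))  # -> 25.0
```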
To receive full credit on the assignment, ensure that the Excel files include either the associated cell formulas (if formulas are used) or Excel-generated output, depending on the nature of the analyses.
Place each problem in its own Excel file. Ensure that your first and last name are in your Excel filenames.
THEN
The purpose of this assignment is to conduct analyses and present your findings and supporting
documentation in a professional PowerPoint presentation designed to summarize the information for
senior leadership within the organization.
Assume that you are delivering this presentation to the senior leadership in an organization.
Therefore, please be sure to create a professional presentation. Begin by reading the "13.2 Forecasting
Overhead at Wagner Printers" case, found at the end of Chapter 13 in the textbook. For the case, you
will perform a multiple regression analysis. You can perform additional analyses on each data set to
gain greater insight into the data set. You must be able to justify each of the approaches and methods
you selected for analyzing the data sets. Use the Palisade DecisionTools Excel software to perform the
regression analysis. Evaluate the regression model by performing and responding to all parts of the
"Multiple Regression Analysis Checklist."
Use the "BIT-435-RS-Predictive Case Template and Support Files" to complete the assignment and
submit answers. Prior to submission, rename the file to include your first name and last name in the
filename.
You will submit the completed template file along with the PowerPoint presentation.
Results of each analysis must be included in your presentation. The use of graphs, charts, supporting data, and spreadsheets is encouraged. Interpret the results of each analysis and draw general conclusions from the results. Make recommendations for the organization and address the organizational challenges that may be encountered based upon your recommendations. The PowerPoint presentation should include the following information:
1. Introduction and case background.
2. Objectives for each analysis.
3. Approach or method of analysis for each data set and justification for selecting the approach or method.
4. Results of each analysis.
5. Supporting graphs, charts, data, and spreadsheets for each analysis.
6. Interpretation of the results for each analysis.
7. General conclusion of each analysis and recommendation to the organization, including addressing organizational challenges that may be encountered based upon the recommendation.
8. In the Speaker Notes section of each slide, include your talking points. This information should align with the results of your analyses and be supported by the accompanying Excel files. In addition to your PowerPoint file, submit the completed template file that contains the supporting Excel files showing all data analyses performed. Submission of your Excel files is required to obtain full credit for this assignment.
50. The file P13_50.xlsx contains five years of monthly data for a company. The first variable is Time (1-60). The second variable, Sales1, has data on sales of a product. Note that Sales1 increases linearly throughout the period, with only a minor amount of noise. (The third variable, Sales2, will be used in the next problem.) For this problem, use the Sales1 variable to see how the following forecasting methods are able to track a linear trend.
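A noise-free line makes it easy to see how the three methods in parts a to c respond to a trend. The sketch is illustrative only: it assumes a hypothetical series y_t = 10 + 2t in place of the real Sales1 data, one-step-ahead forecasts, and simple initializations (the first observation as the starting level; for Holt's method, the first difference as the starting trend), which may differ from the textbook software. Moving averages and simple exponential smoothing both trail a rising trend, whereas Holt's method also updates a trend term and can keep up.

```python
def moving_average_forecasts(y, span):
    """Forecast y[t] as the mean of the `span` values before it; covers y[span:]."""
    return [sum(y[t - span:t]) / span for t in range(span, len(y))]


def ses_forecasts(y, alpha):
    """Simple exponential smoothing one-step-ahead forecasts for y[1:] (level starts at y[0])."""
    level, out = y[0], []
    for value in y[1:]:
        out.append(level)                      # forecast made before `value` is seen
        level = alpha * value + (1 - alpha) * level
    return out


def holt_forecasts(y, alpha, beta):
    """Holt's method one-step-ahead forecasts for y[2:] (level y[1], trend y[1] - y[0])."""
    level, trend, out = y[1], y[1] - y[0], []
    for value in y[2:]:
        out.append(level + trend)
        new_level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return out


def mae(actual, forecasts):
    """Mean absolute error between aligned actual and forecast lists."""
    return sum(abs(a - f) for a, f in zip(actual, forecasts)) / len(forecasts)


# Hypothetical noise-free stand-in for Sales1: y_t = 10 + 2t, t = 1..60.
y = [10 + 2 * t for t in range(1, 61)]
for span in (3, 6, 12):
    # With slope 2, every moving-average forecast lags by 2 * (span + 1) / 2 = span + 1.
    print("MA", span, mae(y[span:], moving_average_forecasts(y, span)))  # 4.0, 7.0, 13.0
for alpha in (0.1, 0.3, 0.5, 0.7):
    print("SES", alpha, round(mae(y[1:], ses_forecasts(y, alpha)), 2))  # error shrinks as alpha grows
print("Holt", round(mae(y[2:], holt_forecasts(y, 0.3, 0.3)), 2))  # -> 0.0
```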
a. Forecast this series with the moving averages method with various spans such as 3, 6, and 12. What can you conclude?
b. Forecast this series with simple exponential smoothing with various smoothing constants such as 0.1, 0.3, 0.5, and 0.7. What can you conclude?
c. Repeat part b with Holt's method, again for various smoothing constants. Can you do much better than in parts a and b?
52. The file P13_52.xlsx contains data on a motel chain's revenue and advertising.
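The hint in part a can be made concrete as a design matrix: an intercept, advertising lagged one quarter, and dummy variables for three of the four quarters (the fourth serves as the baseline, because including all four alongside the intercept would make the columns perfectly collinear). The sketch fits that model by ordinary least squares in plain Python; the quarter numbering, the choice of quarter 4 as baseline, the helper names, and the data are all assumptions for illustration, not the contents of P13_52.xlsx.

```python
def design_rows(advertising, quarters):
    """Rows [1, advertising lagged one quarter, Q1, Q2, Q3] for periods 2..n.

    Period 1 is dropped because it has no lagged advertising value.
    Quarter 4 is the baseline, so it has no dummy column.
    """
    return [[1.0, advertising[t - 1]] + [1.0 if quarters[t] == q else 0.0 for q in (1, 2, 3)]
            for t in range(1, len(advertising))]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y, by Gauss-Jordan elimination."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]          # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(m[i][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: check for collinear columns")
        for i in range(p):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def predict(coef, adv_lag, quarter):
    """Revenue forecast for one quarter (1-4) given the previous quarter's advertising."""
    x = [1.0, adv_lag] + [1.0 if quarter == q else 0.0 for q in (1, 2, 3)]
    return sum(c * v for c, v in zip(coef, x))


# Invented quarterly data built from known coefficients, to check the fit recovers them.
adv = [10, 12, 9, 15, 11, 14, 13, 8, 16, 10, 12, 14]
quarters = [1, 2, 3, 4] * 3
effect = {1: 20, 2: -10, 3: 5, 4: 0}
revenue = [100 + 3 * adv[t - 1] + effect[quarters[t]] for t in range(1, len(adv))]
coef = ols(design_rows(adv, quarters), revenue)
print([round(c, 6) for c in coef])  # approximately [100, 3, 20, -10, 5]
```

For part a, the first of the four future quarters would use the last observed advertising value as adv_lag, and the following three would use the assumed $50,000 (expressed in the same units as the file).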
a. Use these data and multiple regression to make predictions of the motel chain's revenues during the next four quarters. Assume that advertising during each of the next four quarters is $50,000. (Hint: Try using advertising, lagged by one period, as an explanatory variable. See Problem 60 for an explanation of a lagged variable. Also, use dummy variables for the quarters to account for possible seasonality.)
b. Use simple exponential smoothing to make predictions for the motel chain's revenues during the next four quarters. Experiment with the smoothing constant.
c. Use Holt's method to make forecasts for the motel chain's revenues during the next four quarters. Experiment with the smoothing constants.
d. Use Winters' method to determine predictions for the motel chain's revenues during the next four quarters. Experiment with the smoothing constants.
e. Which forecasts from parts a to d would you expect to be the most reliable?
20. The file P13_20.xlsx contains the monthly sales of iPod cases at an electronics store for a two-year period. Use the moving averages method, with spans of your choice, to forecast sales for the next six months. Does this method appear to track sales well? If not, what might be the reason?
26. The file P13_26.xlsx contains the monthly number of airline tickets sold by the CareFree Travel Agency.
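Part c asks for the smoothing constant that makes RMSE as small as possible. One straightforward way to picture that search is a grid over α from 0.01 to 1.00, scoring each candidate by the RMSE of its one-step-ahead forecasts. This is a sketch under stated assumptions: the level starts at the first observation, the grid step is 0.01, and the ticket counts are invented; a spreadsheet optimizer may initialize and search differently. Swapping rmse for a MAPE function would give the MAPE-minimizing constant instead, matching the assignment's evaluation measure.

```python
import math

def ses_forecasts(y, alpha):
    """One-step-ahead simple exponential smoothing forecasts for y[1:], starting the level at y[0]."""
    level, out = y[0], []
    for value in y[1:]:
        out.append(level)
        level = alpha * value + (1 - alpha) * level
    return out


def rmse(actual, forecasts):
    """Root mean squared error of aligned lists."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecasts)) / len(forecasts))


def best_alpha(y, step=0.01):
    """Grid-search alpha in (0, 1] and return (alpha, RMSE) with the smallest RMSE."""
    grid = [round(i * step, 10) for i in range(1, round(1 / step) + 1)]
    scores = {a: rmse(y[1:], ses_forecasts(y, a)) for a in grid}
    alpha = min(scores, key=scores.get)
    return alpha, scores[alpha]


# Invented series with a level shift partway through.
tickets = [10, 10, 10, 20, 20, 20, 20]
print(round(rmse(tickets[1:], ses_forecasts(tickets, 0.1)), 3))  # fixed alpha = 0.1, as in part b
print(best_alpha(tickets))  # a single level shift favors alpha = 1.0 here
```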
a. Create a time series chart of the data. Based on what you see, which of the exponential smoothing models do you think will provide the best forecasting model? Why?
b. Use simple exponential smoothing to forecast these data, using a smoothing constant of 0.1.
c. Repeat part b, but search for the smoothing constant that makes RMSE as small as possible. Does it make much of an improvement over the model in part b?
28. The file P13_28.xlsx contains monthly retail sales of U.S. liquor stores.
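Part a asks whether seasonality is present. A quick, rough check is to divide each calendar month's average by the overall average: months well above 1 are seasonally strong, and months well below 1 are weak. The sketch assumes the series starts in January and has little trend; with a strong trend, ratios to a centered moving average are the more usual basis. The data are invented.

```python
def monthly_indices(sales, start_month=1):
    """Each calendar month's average divided by the overall average (1.0 = typical month)."""
    overall = sum(sales) / len(sales)
    by_month = {}
    for i, value in enumerate(sales):
        month = (start_month - 1 + i) % 12 + 1   # 1 = January, ..., 12 = December
        by_month.setdefault(month, []).append(value)
    return {m: (sum(v) / len(v)) / overall for m, v in sorted(by_month.items())}


# Invented two-year series: flat at 100 except for a December peak of 200.
sales = ([100] * 11 + [200]) * 2
indices = monthly_indices(sales)
print({m: round(x, 2) for m, x in indices.items()})  # December about 1.85, other months about 0.92
```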
a. Is seasonality present in these data? If so, characterize the seasonality pattern.
b. Use Winters' method to forecast this series with smoothing constants α = β = 0.1 and γ = 0.3. Does the forecast series seem to track the seasonal pattern well? What are your forecasts for the next 12 months?
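Winters' method adds a seasonal factor for each month to Holt's level and trend. The sketch uses the multiplicative form with the part b constants (α = β = 0.1, γ = 0.3) as defaults. The initialization (first-year mean as the level, zero starting trend, first-year ratios as seasonal factors) and the invented test series are assumptions of this sketch; the textbook's software may initialize differently, so its numbers need not match.

```python
def winters(y, m=12, alpha=0.1, beta=0.1, gamma=0.3, horizon=12):
    """Multiplicative Winters' method.

    Returns (one_step, future): one_step[i] is the one-step-ahead forecast of y[m + i],
    and future holds forecasts for the `horizon` periods after the data end.
    Initialization (an assumption): level = first-season mean, trend = 0,
    seasonal factors = first-season values divided by that mean.
    """
    if len(y) <= m:
        raise ValueError("need more than one full season of data")
    level = sum(y[:m]) / m
    trend = 0.0
    seasonals = [v / level for v in y[:m]]    # seasonals[t % m] is the factor for period t
    one_step = []
    for t in range(m, len(y)):
        s = seasonals[t % m]
        one_step.append((level + trend) * s)
        new_level = alpha * y[t] / s + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonals[t % m] = gamma * y[t] / new_level + (1 - gamma) * s
        level = new_level
    n = len(y)
    future = [(level + h * trend) * seasonals[(n + h - 1) % m] for h in range(1, horizon + 1)]
    return one_step, future


# Invented monthly series: the same seasonal shape repeated for three years, no trend.
pattern = [90, 80, 100, 110, 120, 130, 140, 130, 110, 100, 95, 150]
one_step, next_12 = winters(pattern * 3)
print([round(f, 1) for f in next_12])  # should reproduce the pattern
```

With the real series, comparing one_step against y[12:] (for example with MAPE) is one way to judge how well the seasonal pattern is tracked.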