
Instructions

Data Mining

The purpose of this assignment is to demonstrate understanding and application of the k-Nearest Neighbor algorithm for classification. You have been asked to build a classification model to help predict machine failure. Using the "Failure Rate" dataset, build a classification model using the k-Nearest Neighbor technique in KNIME. Then, using the model, predict whether machines will fail for 50 records of input data.

Training/Test Model

Use "hours_run" and "avg_hours_between_maint" as the input variables. Use "failure" as the target variable (note that 0 = no failure and 1 = failure). For the Excel Reader node, exclude the "model_version" field from the data import operation. Use a Normalizer node to normalize the "hours_run" and "avg_hours_between_maint" fields to be between 0 and 1. Use a Partition node with a 70/30 partition (i.e., 70% Training; 30% Test) for records 1-300. Then, create the model using the Training data from the Partition node. Use the k-Nearest Neighbor node. Attach a Scorer node to the k-Nearest Neighbor node to evaluate the model's accuracy (note that this node only evaluates the results for the n=90 Test records). Attach a Table node to the second output port of the Scorer node. Take note of the overall accuracy value. Run the model for k values from 3 to 6. Select the best k value based on the highest accuracy value.

Predictions on the n=50 Data

After running the Training/Test model and determining the optimal k value, create another workflow (in the same KNIME file) to predict machine failure for the rest of the records in the dataset, i.e., records 301-350. Set up the workflow as shown in the attached "K-Nearest Neighbor Algorithm, Prediction Workflow" document. Attach a Table node to the k-Nearest Neighbor node to view the predicted results. Sort by the appropriate column so that the records whose machines are predicted to fail appear at the top of the list.
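The Training/Test and prediction steps above are done through KNIME's node interface. The plain-Python sketch below is only meant to show what those steps compute. It rests on several assumptions:

- Distances are Euclidean and the prediction is an unweighted majority vote.
- Min-max scaling is fitted once on the labeled records, which mirrors normalizing before partitioning.
- The handful of records is made up; they are not the "Failure Rate" dataset.
- The helper names (`fit_min_max`, `apply_min_max`, `knn_predict`, `accuracy`) are hypothetical and are not KNIME functions.

```python
import math
from collections import Counter

# Hypothetical toy records: (hours_run, avg_hours_between_maint, failure).
# These are NOT the "Failure Rate" dataset; they only illustrate the mechanics.
LABELED = [
    (100, 10, 0), (120, 12, 0), (150, 15, 0),
    (800, 90, 1), (850, 95, 1), (900, 100, 1),
    (130, 14, 0), (880, 97, 1),
]
N_TRAIN = 6  # first 6 rows act as the "Training" partition, the rest as "Test"


def fit_min_max(rows):
    """Return per-column minimums and maximums (the normalization 'model')."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lows, highs):
    """Rescale each column to [0, 1] using previously fitted min/max values."""
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Unweighted majority vote among the k nearest training rows.

    Assumes Euclidean distance. On a tied vote, the label that appears first
    in distance order wins, because Counter.most_common keeps first-encountered
    order for equal counts.
    """
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Share of test rows whose predicted label matches the actual label."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y
               for q, y in zip(test_x, test_y))
    return hits / len(test_y)


features = [row[:2] for row in LABELED]
labels = [row[2] for row in LABELED]
lows, highs = fit_min_max(features)
scaled = apply_min_max(features, lows, highs)

train_x, train_y = scaled[:N_TRAIN], labels[:N_TRAIN]
test_x, test_y = scaled[N_TRAIN:], labels[N_TRAIN:]

# Try k = 3..6 and keep the k with the highest test accuracy (first one wins ties).
scores = {k: accuracy(train_x, train_y, test_x, test_y, k) for k in range(3, 7)}
best_k = max(scores, key=scores.get)

# Score new, unlabeled (hypothetical) records with the same scaling and chosen k,
# then sort so predicted failures (label 1) come first.
new_records = [(110, 11), (870, 92)]
new_scaled = apply_min_max(new_records, lows, highs)
predictions = [(rec, knn_predict(train_x, train_y, q, best_k))
               for rec, q in zip(new_records, new_scaled)]
predictions.sort(key=lambda p: p[1], reverse=True)
print(scores, best_k, predictions)
```

In this toy set, every k from 3 to 6 classifies both test rows correctly, so the first k tried (3) is kept. With real data, accuracy would usually differ across k. Sorting with `reverse=True` puts predicted failures first, which is the same ordering the prediction step asks for.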
Also, attach an Excel Writer node to the k-Nearest Neighbor node to export the prediction results to an Excel file.

In a 250-word document, provide the following information. Assume you are providing this information to an audience that has limited knowledge of data mining concepts.

1. Summarize your approach to the problem.
2. Clearly state the optimal k value for the model.
3. Screenshot the results from the second output port of the Scorer node.
4. Screenshot the predicted results, sorted by the appropriate column so that the records whose machines are predicted to fail appear at the top of the list.
5. Include a conclusion based on the results of the analysis. Specify which machines in records 301-350 are predicted to fail. Speculate on why these particular records are predicted to fail.

Note that you are required to submit the completed KNIME *.knwf file to your instructor. Specifically, export your KNIME model to a KNIME workflow file. To perform this task in KNIME, ensure that your KNIME model is active (i.e., displayed). Then, go to File -> Export KNIME Workflow. In the "Destination workflow file name (.knwf)" area, browse to a specific location on your computer. Click "Save" and then click "Finish."

Instructions

Using the specified data files, chapter example files, and templates from the "Topic 4 Student Data, Template, and Example Files" resource, complete Chapter 13 Problems 20, 26, 28, 50, and 52 in the textbook. Use MAPE (mean absolute percentage error) to evaluate the forecasting performance for each problem. Use the Palisade DecisionTools Excel software to complete these problems where requested and applicable. To receive full credit on the assignment, ensure that the Excel files include the associated cell formulas (if formulas are used) or the Excel-generated output, depending on the nature of the analyses. Place each problem in its own Excel file. Ensure that your first and last name are in your Excel filenames.
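MAPE averages each period's absolute error as a percentage of the actual value. Over the n periods that have a forecast, MAPE = (100/n) x sum of |A_t - F_t| / |A_t|, where A_t is the actual value and F_t the forecast.

The assignment expects this calculation in Excel. The Python sketch below only makes the arithmetic explicit. The function name, skipping periods that have no forecast, and rejecting zero actuals are choices made for this sketch; they do not describe Palisade's behavior.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, expressed in percent.

    Periods with no forecast (None), such as the first periods of a moving
    average, are skipped. An actual value of 0 makes the percentage error
    undefined, so it raises ValueError.
    """
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if f is not None]
    if not pairs:
        raise ValueError("no forecasts to evaluate")
    if any(a == 0 for a, _ in pairs):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Hypothetical numbers: two forecasts that are each off by 10% give a MAPE of 10%.
print(mape([100, 200, 400], [None, 180, 440]))
```

A lower MAPE means forecasts that are closer, on average, in relative terms. That makes it convenient for comparing methods on series of different scale.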
THEN

The purpose of this assignment is to conduct analyses and present your findings and supporting documentation in a professional PowerPoint presentation designed to summarize the information for senior leadership within the organization. Assume that you are delivering this presentation to the senior leadership of an organization; therefore, be sure to create a professional presentation.

Begin by reading the "13.2 Forecasting Overhead at Wagner Printers" case, found at the end of Chapter 13 in the textbook. For the case, you will perform a multiple regression analysis. You can perform additional analyses on each data set to gain greater insight into the data. You must be able to justify each of the approaches and methods you select for analyzing the data sets. Use the Palisade DecisionTools Excel software to perform the regression analysis. Evaluate the regression model by performing and responding to all parts of the "Multiple Regression Analysis Checklist."

Use the "BIT-435-RS-Predictive Case Template and Support Files" to complete the assignment and submit answers. Prior to submission, rename the file to include your first and last name in the filename. You will submit the completed template file along with the PowerPoint presentation. Results of each analysis must be included in your presentation. The use of graphs, charts, supporting data, and spreadsheets is encouraged. Interpret the results of each analysis and draw general conclusions from the results. Make recommendations for the organization and address the organizational challenges that may be encountered based upon your recommendations.

The PowerPoint presentation should include the following information:

1. Introduction and case background.
2. Objectives for each analysis.
3. Approach or method of analysis for each data set and justification for selecting the approach or method.
4. Results of each analysis.
5. Supporting graphs, charts, data, and spreadsheets for each analysis.
6. Interpretation of the results for each analysis.
7. General conclusion of each analysis and recommendation to the organization, including how to address organizational challenges that may be encountered based upon the recommendation.
8. In the Speaker Notes section of each slide, include your talking points. This information should align with the results of your analyses and be supported in the accompanying Excel files.

In addition to your PowerPoint file, submit the completed template file that contains the supporting Excel files showing all data analyses performed. Submission of your Excel files is required to obtain full credit for this assignment.

50. The file P13_50.xlsx contains five years of monthly data for a company. The first variable is Time (1-60). The second variable, Sales1, has data on sales of a product. Note that Sales1 increases linearly throughout the period, with only a minor amount of noise. (The third variable, Sales2, will be used in the next problem.) For this problem, use the Sales1 variable to see how the following forecasting methods are able to track a linear trend.
a. Forecast this series with the moving averages method with various spans such as 3, 6, and 12. What can you conclude?
b. Forecast this series with simple exponential smoothing with various smoothing constants such as 0.1, 0.3, 0.5, and 0.7. What can you conclude?
c. Repeat part b with Holt's method, again for various smoothing constants. Can you do much better than in parts a and b?

52. The file P13_52.xlsx contains data on a motel chain's revenue and advertising.
a. Use these data and multiple regression to make predictions of the motel chain's revenues during the next four quarters. Assume that advertising during each of the next four quarters is $50,000. (Hint: Try using advertising, lagged by one period, as an explanatory variable. See Problem 60 for an explanation of a lagged variable.
Also, use dummy variables for the quarters to account for possible seasonality.)
b. Use simple exponential smoothing to make predictions for the motel chain's revenues during the next four quarters. Experiment with the smoothing constant.
c. Use Holt's method to make forecasts for the motel chain's revenues during the next four quarters. Experiment with the smoothing constants.
d. Use Winters' method to determine predictions for the motel chain's revenues during the next four quarters. Experiment with the smoothing constants.
e. Which forecasts from parts a to d would you expect to be the most reliable?

20. The file P13_20.xlsx contains the monthly sales of iPod cases at an electronics store for a two-year period. Use the moving averages method, with spans of your choice, to forecast sales for the next six months. Does this method appear to track sales well? If not, what might be the reason?

26. The file P13_26.xlsx contains the monthly number of airline tickets sold by the CareFree Travel Agency.
a. Create a time series chart of the data. Based on what you see, which of the exponential smoothing models do you think will provide the best forecasting model? Why?
b. Use simple exponential smoothing to forecast these data, using a smoothing constant of 0.1.
c. Repeat part b, but search for the smoothing constant that makes RMSE as small as possible. Does it make much of an improvement over the model in part b?

28. The file P13_28.xlsx contains monthly retail sales of U.S. liquor stores.
a. Is seasonality present in these data? If so, characterize the seasonality pattern.
b. Use Winters' method to forecast this series with smoothing constants α = β = 0.1 and γ = 0.3. Does the forecast series seem to track the seasonal pattern well? What are your forecasts for the next 12 months?
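Several of these problems ask how moving averages, simple exponential smoothing and Holt's method track a series, and Problem 26 asks for the RMSE-minimizing smoothing constant. The sketch below compares these methods on a hypothetical, noise-free linear series, y = 10 + 2t for t = 1..60. This is not the P13_50.xlsx data.

Initialization conventions differ between textbooks and software such as the Palisade tools. The starting level and trend here are assumptions, so the numbers will not match the textbook's output exactly. The function names are illustrative. Winters' method (Problems 28 and 52d) adds a seasonal index to Holt's level and trend; it is not sketched here.

```python
import math


def moving_average(y, span):
    """One-step-ahead forecasts: F[t] = mean of the previous `span` actuals."""
    return [None if t < span else sum(y[t - span:t]) / span
            for t in range(len(y))]


def simple_exp_smoothing(y, alpha):
    """One-step-ahead forecasts from simple exponential smoothing (level only)."""
    forecasts = [None]
    level = y[0]  # assumed initialization: the first observation
    for t in range(1, len(y)):
        forecasts.append(level)
        level = alpha * y[t] + (1 - alpha) * level
    return forecasts


def holt(y, alpha, beta):
    """One-step-ahead forecasts from Holt's method (level plus linear trend)."""
    forecasts = [None, None]
    level, trend = y[1], y[1] - y[0]  # assumed initialization
    for t in range(2, len(y)):
        forecasts.append(level + trend)
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return forecasts


def rmse(actuals, forecasts):
    """Root mean squared error over the periods that have a forecast."""
    errs = [(a - f) ** 2 for a, f in zip(actuals, forecasts) if f is not None]
    return math.sqrt(sum(errs) / len(errs))


# Hypothetical, noise-free linear series (NOT the P13_50 data): y = 10 + 2t.
series = [10 + 2 * t for t in range(1, 61)]

for span in (3, 6, 12):
    print("MA", span, rmse(series, moving_average(series, span)))
for alpha in (0.1, 0.3, 0.5, 0.7):
    print("SES", alpha, rmse(series, simple_exp_smoothing(series, alpha)))
print("Holt", rmse(series, holt(series, 0.3, 0.1)))

# Problem-26-style grid search: the smoothing constant with the smallest RMSE.
best_alpha = min((a / 100 for a in range(1, 100)),
                 key=lambda a: rmse(series, simple_exp_smoothing(series, a)))
print("best alpha", best_alpha)
```

On this line, the three methods behave differently:

- **Moving averages** lag the trend by a constant amount that grows with the span: errors of 4, 7 and 13 for spans 3, 6 and 12.
- **Simple exponential smoothing** also lags, by less as the smoothing constant grows, which is why the grid search lands at the largest constant tried.
- **Holt's method** carries an explicit trend term and reproduces the line up to floating-point rounding.

The same comparison can be run with MAPE in place of RMSE.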