
2. In PC Tech's product mix problem, assume there is another PC model, the VXP, that the company can produce in addition to Basics and XPs. Each VXP requires eight hours for assembling, three hours for testing, $275 for component parts, and sells for $560. At most 50 VXPs can be sold.
a. Modify the spreadsheet model to include this new product, and use Solver to find the optimal product mix.

4. Again continuing Problem 2, suppose that you want to force the optimal solution to be integers. Do this in Solver by adding a new constraint. Select the decision variable cells for the left side of the constraint, and in the middle dropdown list, select the "int" option. How does the optimal integer solution compare to the optimal noninteger solution in Problem 2? Are the decision variable cell values rounded versions of those in Problem 2? Is the objective value more or less than in Problem 2?

26. A furniture company manufactures desks and chairs. Each desk uses four units of wood, and each chair uses three units of wood. A desk contributes $250 to profit, and a chair contributes $145. Marketing restrictions require that the number of chairs produced be at least four times the number of desks produced. There are 2000 units of wood available.
a. Use Solver to maximize the company's profit.

46. During each four-hour period, the Smalltown police force requires the following number of on-duty police officers: four from midnight to 4 A.M.; four from 4 A.M. to 8 A.M.; seven from 8 A.M. to noon; seven from noon to 4 P.M.; eight from 4 P.M. to 8 P.M.; and ten from 8 P.M. to midnight. Each police officer works two consecutive four-hour shifts.
a. Determine how to minimize the number of police officers needed to meet Smalltown's daily requirements.

66. United Steel manufactures two types of steel at three different steel mills. During a given month, each steel mill has 240 hours of blast furnace time available. Because of differences in the furnaces at each mill, the time and cost to produce a ton of steel differ for each mill, as listed in the file P04_66.xlsx. Each month, the company must manufacture at least 700 tons of steel 1 and 600 tons of steel 2. Determine how United Steel can minimize the cost of manufacturing the desired steel.
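These problems are posed for Excel's Solver, but the same linear programs can be checked in code. Here is a minimal sketch of Problem 26 solved with SciPy's linprog instead of Solver (the variable names are mine, and SciPy is a stand-in for the spreadsheet model, not the assignment's required tool):

```python
# Problem 26 as a linear program, solved with SciPy instead of Excel Solver.
# Variables: d = desks produced, c = chairs produced.
# Maximize 250*d + 145*c  subject to  4*d + 3*c <= 2000  and  c >= 4*d.
from scipy.optimize import linprog

# linprog minimizes, so negate the profit coefficients.
objective = [-250, -145]

# Inequalities in the form A_ub @ x <= b_ub:
#   wood:       4*d + 3*c <= 2000
#   marketing:  4*d - c   <= 0      (equivalent to c >= 4*d)
A_ub = [[4, 3],
        [4, -1]]
b_ub = [2000, 0]

result = linprog(objective, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
desks, chairs = result.x
print(f"desks={desks:.1f}, chairs={chairs:.1f}, profit=${-result.fun:.2f}")
```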


Most Viewed Questions of Data Mining

Assignment #3: DBSCAN, OPTICS, and Clustering Evaluation 1. If Epsilon is 2 and minPts is 2 (including the point itself), what are the clusters that DBSCAN would discover with the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)? Use the Euclidean distance. Draw the 10 by 10 space and illustrate the discovered clusters. What if Epsilon is increased to sqrt(10)? (30 pts)
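One way to check a hand-worked answer is with scikit-learn's DBSCAN; this is my own verification sketch, not part of the assignment. Note that sklearn's min_samples counts the point itself, matching the question's convention:

```python
# Verify the hand-computed DBSCAN clusters for both epsilon values.
import numpy as np
from sklearn.cluster import DBSCAN

points = np.array([(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)])

for eps in (2.0, np.sqrt(10)):
    labels = DBSCAN(eps=eps, min_samples=2, metric="euclidean").fit_predict(points)
    print(f"eps={eps:.3f}: labels={labels}")  # -1 marks noise points
```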


Programming Assignment Explanation

• Fortune Cookie Classifier¹

You will build a binary fortune cookie classifier. This classifier will be used to classify fortune cookie messages into two classes: messages that predict what will happen in the future (class 1) and messages that just contain a wise saying (class 0). For example, "Never go in against a Sicilian when death is on the line" would be a message in class 0. "You will get an A in Machine learning class" would be a message in class 1.

Files Provided
There are three sets of files. All words in these files are lower case and punctuation has been removed.
1) The training data:
traindata.txt: This is the training data consisting of fortune cookie messages.
trainlabels.txt: This file contains the class labels for the training data.
2) The testing data:
testdata.txt: This is the testing data consisting of fortune cookie messages.
testlabels.txt: This file contains the class labels for the testing data.
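A minimal sketch of one way to approach this, assuming the file layout described above (one message or label per line). The perceptron and bag-of-words features are my choices for illustration; the assignment may prescribe a different model:

```python
# Bag-of-words fortune cookie classifier (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron

def read_lines(path):
    with open(path) as f:
        return [line.strip() for line in f]

train_msgs = read_lines("traindata.txt")
train_labels = [int(y) for y in read_lines("trainlabels.txt")]
test_msgs = read_lines("testdata.txt")
test_labels = [int(y) for y in read_lines("testlabels.txt")]

# Build bag-of-words features over the training vocabulary.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_msgs)
X_test = vectorizer.transform(test_msgs)

model = Perceptron()
model.fit(X_train, train_labels)
print("test accuracy:", model.score(X_test, test_labels))
```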


2. Use the OPTICS algorithm to output the reachability distance and the cluster ordering for the dataset provided, starting from Instance 1. Use the following parameters for discovering the cluster ordering: minPts = 2 and epsilon = 2. Use epsilonPrime = 1.2 to generate clusters from the cluster ordering and their reachability distances. Don't forget to record the core distance of a data point if it has a dense neighborhood. You don't need to include the core distances in your result, but you may need to use them when generating clusters. (45 pts)

[Figure: dataset visualization — scatter plot of the instances on a 0-16 by 0-16 grid]

Below are the first few lines of the calculation. You need to complete the remaining lines and generate clusters based on the given epsilonPrime value:

Instance (X, Y)      | Reachability Distance
Instance 1: (1, 1)   | Undefined (or infinity)
Instance 2: (0, 1)   | 1.0
Instance 3: (1, 0)   | 1.0
Instance 16: (5, 9)  | Undefined
Instance 13: (9, 2)  | Undefined
Instance 12: (8, 2)  | 1.0
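The ordering and reachability distances can be cross-checked with scikit-learn's OPTICS. This is a verification aid, not the required hand calculation, and the coordinates below are only the six instances visible in the question; the full dataset comes from the assignment handout:

```python
# Cross-check the OPTICS reachability ordering (partial data, illustrative only).
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

points = np.array([(1, 1), (0, 1), (1, 0), (5, 9), (9, 2), (8, 2)])

optics = OPTICS(min_samples=2, max_eps=2.0, metric="euclidean")
optics.fit(points)

# ordering_ is the cluster ordering; reachability_ holds reachability distances
# (np.inf corresponds to "Undefined" in the table above).
for idx in optics.ordering_:
    print(f"Instance {idx + 1}: reachability = {optics.reachability_[idx]}")

# Extract clusters at epsilonPrime = 1.2 from the ordering, as the question asks.
labels = cluster_optics_dbscan(reachability=optics.reachability_,
                               core_distances=optics.core_distances_,
                               ordering=optics.ordering_, eps=1.2)
print("clusters at epsilonPrime=1.2:", labels)
```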


Please create a K-means clustering and a hierarchical clustering with the code provided. The code should include a merge of the Excel files, which will also be provided. A sketch of one possible approach is shown below.
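Since the Excel files are not included here, this sketch assumes two files that share a key column; the file names, the join key "id", and k = 3 are all placeholders to be replaced with the provided materials:

```python
# Merge the Excel files, then run K-means and hierarchical clustering (sketch).
import pandas as pd
from sklearn.cluster import KMeans, AgglomerativeClustering

df1 = pd.read_excel("file1.xlsx")   # placeholder file name
df2 = pd.read_excel("file2.xlsx")   # placeholder file name
data = pd.merge(df1, df2, on="id")  # "id" is an assumed join key

# Cluster on the numeric columns only.
features = data.select_dtypes("number").dropna()

# K-means with an assumed k = 3.
data.loc[features.index, "kmeans"] = KMeans(n_clusters=3, n_init=10).fit_predict(features)

# Agglomerative (hierarchical) clustering with the same number of clusters.
data.loc[features.index, "hier"] = AgglomerativeClustering(n_clusters=3).fit_predict(features)

print(data[["kmeans", "hier"]].value_counts())
```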


3. Use the F-measure and the pairwise measures (TP, FN, FP, TN) to measure the agreement between a clustering result (C1, C2, C3) and the ground-truth partitions (T1, T2, T3) as shown below. Show the details of your calculation. (25 pts)

[Contingency table of clusters C1, C2, C3 versus ground-truth partitions T1, T2, T3]
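As a reference for how the pairwise measures are defined, here is a small sketch that counts pairs of points agreeing or disagreeing between a clustering and the ground truth. The label vectors at the bottom are made up for illustration; they are not the question's table:

```python
# Pairwise agreement between a clustering and the ground truth.
# TP: pair in same cluster and same partition; FP: same cluster, different
# partition; FN: different cluster, same partition; TN: different in both.
from itertools import combinations

def pairwise_measures(clusters, truth):
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_c = clusters[i] == clusters[j]
        same_t = truth[i] == truth[j]
        if same_c and same_t:
            tp += 1
        elif same_c:
            fp += 1
        elif same_t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Made-up example labels, not the question's data.
tp, fp, fn, tn = pairwise_measures([1, 1, 2, 2, 3], [1, 1, 1, 2, 2])
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn, f_measure)
```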


Discussion - Data Mining, Text Mining, and Sentiment Analysis. Respond to the following in a minimum of 230 words: Explain the relationship between data mining, text mining, and sentiment analysis. Provide situations where you would use each of the three techniques.


Q4. You were just hired by MetaMind. MetaMind is expanding rapidly, and you decide to use your machine learning skills to assist them in their attempts to hire the best. To do so, you have the following available to you for each candidate i in the pool of candidates I: (i) their GPA, (ii) whether they took a Data Mining course and achieved an A, (iii) whether they took an Algorithms course and achieved an A, (iv) whether they have a job offer from Google, (v) whether they have a job offer from Facebook, (vi) the number of misspelled words on their resume. You decide to represent each candidate i ∈ I by a corresponding 6-dimensional feature vector f(i). You believe that if you just knew the right weight vector w ∈ R^6, you could reliably predict the quality of a candidate i by computing w · f(i). To determine w, your boss lets you sample pairs of candidates from the pool. For a pair of candidates (k, l), you can have them face off in a "DataMining-fight." The result is score(k > l), which tells you that candidate k is at least score(k > l) better than candidate l. Note that the score will be negative when l is a better candidate than k. Assume you collected scores for a set of pairs of candidates P. Describe how you could use a perceptron-based algorithm to learn the weight vector w. Make sure to describe the basic intuition; how the weight updates will be done; and pseudo-code for the entire algorithm.
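One possible reading of the pairwise setup, sketched in code (my own interpretation, not a provided solution): treat each scored pair as a constraint w · (f(k) - f(l)) >= score(k > l) and make a perceptron-style additive update whenever the constraint is violated.

```python
# Perceptron-style learner for the pairwise setting (illustrative sketch).
# Each scored pair (k, l, s) with s = score(k > l) is treated as a constraint
#   w . (f(k) - f(l)) >= s ;
# when it is violated, w is nudged toward the difference vector.
import numpy as np

def pairwise_perceptron(pairs, features, dim=6, epochs=100, lr=1.0):
    """pairs: iterable of (k, l, score); features: dict candidate -> 6-d vector."""
    w = np.zeros(dim)
    for _ in range(epochs):
        mistakes = 0
        for k, l, score in pairs:
            diff = features[k] - features[l]
            if np.dot(w, diff) < score:   # pairwise constraint violated
                w += lr * diff            # additive perceptron update
                mistakes += 1
        if mistakes == 0:                 # all pairwise constraints satisfied
            break
    return w
```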


1. We will use the flower classification dataset.
a. https://www.kaggle.com/competitions/tpu-getting-started
2. Your goal is to improve the average accuracy of classification.
a. You SHOULD use Google Colab as the main computing environment. (Using Kaggle is okay.)
b. You SHOULD create a GitHub repository for the source code.
i. Include a readme file explaining how to execute it.
c. You SHOULD explain your source code in the blog.
d. Try experimenting with various hyperparameters (see the sketch after this list).
i. Network topology:
1. Number of neurons per layer (for example, 100 x 200 x 100, 200 x 300 x 100, ...)
2. Number of layers (for example, 2 vs. 3 vs. 4, ...)
3. Shape of the Conv2D filters
ii. While doing experiments, make sure you record your performance so that you can create a bar chart of it.
iii. An additional graph idea might be a training-time comparison.
iv. Do some research on ideas for improving this.
e. You can refer to code or tutorials from the internet, but the main question you have to answer is what improvement you made over the existing reference.
i. Make sure it is very clear which lines of code are yours and which are not. When you copy source code, add a reference.
3. Documentation is half of your work. Write a good blog post for your work and a step-by-step how-to guide.
a. A good example is https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/
4. Add references.
a. Add a citation number in the contents and put the reference in a separate reference section.
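A minimal sketch of the kind of hyperparameter experiment described in item 2.d, using Keras. The 192x192 input size and 104 classes are assumptions based on the linked Kaggle competition; the dataset itself would be loaded with the competition's starter code:

```python
# Parameterized CNN so topology knobs (filters, layers, neurons) can be varied.
import tensorflow as tf
from tensorflow.keras import layers

def build_model(num_filters=(32, 64), dense_units=(128,), num_classes=104):
    """num_filters: Conv2D filters per block; dense_units: neurons per dense layer."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(192, 192, 3))])
    for f in num_filters:
        model.add(layers.Conv2D(f, kernel_size=3, activation="relu"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    for units in dense_units:
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example of varying topology while recording accuracy for the bar chart:
# for topology in [(32, 64), (32, 64, 128)]:
#     model = build_model(num_filters=topology)
#     history = model.fit(train_ds, validation_data=val_ds, epochs=5)
```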


Q2. (10 points) Consider the following setting. You are provided with n training examples: (x₁, y₁, h₁), (x₂, y₂, h₂), ..., (xₙ, yₙ, hₙ), where xᵢ is the input example, yᵢ is the class label (+1 or -1), and hᵢ > 0 is the importance weight of the example. The teacher gave you some additional information by specifying the importance of each training example. How will you modify the perceptron algorithm to be able to leverage this extra information? Please justify your answer.
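One natural modification, sketched below under my own assumption that the right answer is to scale each mistake-driven update by the example's importance weight:

```python
# Importance-weighted perceptron (illustrative sketch of one natural answer:
# scale each mistake-driven update by the example's importance weight h_i).
import numpy as np

def weighted_perceptron(X, y, h, epochs=100):
    """X: (n, d) inputs; y: labels in {+1, -1}; h: positive importance weights."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i, h_i in zip(X, y, h):
            if y_i * (np.dot(w, x_i) + b) <= 0:   # mistake on example i
                w += h_i * y_i * x_i              # update scaled by importance
                b += h_i * y_i
    return w, b
```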


1. Use the insurance fraud dataset. Consider the data quality issues (e.g., missing data) and preprocess the data. Split the data into a 10% train and 90% test set using random_state = 1. Create a decision tree with a max depth of 3 using the gini measure. Print the accuracy on the test set and the tree. Is this a good approach? Why or why not?
2. Create a decision tree on the same data with a max depth of 3 and the entropy measure. Does the accuracy change? Does the tree change? Discuss which measure you think is better.
3. Now split the data into 70% train and 30% test using random_state = 1. Redo 2 and 3. Have the trees and accuracy changed? Are the trees more or less similar now? Discuss which split you think is better and why.
4. Evaluate how the accuracy changes with the depth of the tree with the 70-30 data. Look at the accuracy for a max depth of 1, 2, 3, ..., 10, 15, 20. Plot the curve of accuracy against depth. Do you see underfitting? Do you see overfitting?
5. What variable provides the most information gain in the insurance fraud data (for the 70-30 split)?
6. Decision trees are a "white box" method. What do you observe about the insurance fraud data using decision trees?
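A sketch of the workflow for items 1-4, assuming the insurance fraud dataset is a CSV with a "fraud" label column (the file name, the label column, and the dropna/get_dummies preprocessing are placeholders for whatever the actual dataset requires):

```python
# Decision tree workflow sketch for the insurance fraud exercises.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("insurance_fraud.csv").dropna()   # simple missing-data handling
X = pd.get_dummies(data.drop(columns=["fraud"]))     # encode categorical features
y = data["fraud"]

# Item 1: 10% train / 90% test; items 3-4 would use train_size=0.7 instead.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.1, random_state=1)

# Items 1 and 2: compare the gini and entropy split measures at depth 3.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(max_depth=3, criterion=criterion, random_state=1)
    tree.fit(X_train, y_train)
    print(criterion, accuracy_score(y_test, tree.predict(X_test)))

# Item 4: accuracy versus depth (plot these values to look for under/overfitting).
for depth in list(range(1, 11)) + [15, 20]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    print(depth, tree.score(X_test, y_test))
```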