tutorbin

Data Mining Homework Help

Boost your journey with 24/7 access to skilled experts, offering unmatched data mining homework help

Trusted by 1.1 M+ Happy Students

Recently Asked Data Mining Questions

Expert help when you need it
  • Q1:Use the dataset for Airplanes, Motorbikes, and Schooners. Your goal is to improve the average classification accuracy.
  • Q2:Programming Assignment Explanation: Fortune Cookie Classifier. You will build a binary fortune cookie classifier. This classifier will be used to classify fortune cookie messages into two classes: messages that predict what will happen in the future (class 1) and messages that just contain a wise saying (class 0). For example, "Never go in against a Sicilian when death is on the line" would be a message in class 0. "You will get an A in Machine learning class" would be a message in class 1. Files provided: There are three sets of files. All words in these files are lower case and punctuation has been removed. 1) The training data: traindata.txt: This is the training data consisting of fortune cookie messages. trainlabels.txt: This file contains the class labels for the training data. 2) The testing data: testdata.txt: This is the testing data consisting of fortune cookie messages. testlabels.txt: This file contains the class labels for the testing data.
  • Q3:Q1. (10 points) Answer the following with a yes or no along with proper justification. a. Is the decision boundary of voted perceptron linear? b. Is the decision boundary of averaged perceptron linear?
  • Q4:Q2. (10 points) Consider the following setting. You are provided with n training examples: (x₁, y₁, h₁), (x₂, y₂, h₂), ..., (xₙ, yₙ, hₙ), where xᵢ is the input example, yᵢ is the class label (+1 or -1), and hᵢ > 0 is the importance weight of the example. The teacher gave you some additional information by specifying the importance of each training example. How will you modify the perceptron algorithm to be able to leverage this extra information? Please justify your answer.
  • Q5:Q3. (10 points) Consider the following setting. You are provided with n training examples: (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), where xᵢ is the input example and yᵢ is the class label (+1 or -1). However, the training data is highly imbalanced (say 90% of the examples are negative and 10% of the examples are positive) and we care more about the accuracy of positive examples. How will you modify the perceptron algorithm to solve this learning problem? Please justify your answer.
  • Q6:Q4. You were just hired by MetaMind. MetaMind is expanding rapidly, and you decide to use your machine learning skills to assist them in their attempts to hire the best. To do so, you have the following available to you for each candidate i in the pool of candidates I: (i) Their GPA, (ii) Whether they took Data Mining course and achieved an A, (iii) Whether they took Algorithms course and achieved an A, (iv) Whether they have a job offer from Google, (v) Whether they have a job offer from Facebook, (vi) The number of misspelled words on their resume. You decide to represent each candidate i ∈ I by a corresponding 6-dimensional feature vector f(x⁽ⁱ⁾). You believe that if you just knew the right weight vector w ∈ ℝ⁶ you could reliably predict the quality of a candidate i by computing w · f(x⁽ⁱ⁾). To determine w your boss lets you sample pairs of candidates from the pool. For a pair of candidates (k, l) you can have them face off in a "DataMining-fight." The result is score(k > l), which tells you that candidate k is at least score(k > l) better than candidate l. Note that the score will be negative when l is a better candidate than k. Assume you collected scores for a set of pairs of candidates P. Describe how you could use a perceptron based algorithm to learn the weight vector w. Make sure to describe the basic intuition; how the weight updates will be done; and pseudo-code for the entire algorithm.
  • Q7:Please create K-means clustering and hierarchical clustering models with the lines of code provided. The code should include a merge of the Excel files, which will also be provided.
  • Q8:Discussion - Data Mining, Text Mining, and Sentiment Analysis Explain the relationship between data mining, text mining, and sentiment analysis. Provide situations where you would use each of the three techniques. Respond to the following in a minimum of 230 words:
  • Q9:Assignment #3: DBSCAN, OPTICS, and Clustering Evaluation 1. If Epsilon is 2 and MinPts is 2 (including the centroid itself), what are the clusters that DBSCAN would discover with the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9). Use the Euclidean distance. Draw the 10 by 10 space and illustrate the discovered clusters. What if Epsilon is increased to sqrt(10)? (30 pts)
  • Q10:2. Use the OPTICS algorithm to output the reachability distance and the cluster ordering for the dataset provided, starting from Instance 1. Use the following parameters for discovering the cluster ordering: minPts = 2 and epsilon = 2. Use epsilonprime = 1.2 to generate clusters from the cluster ordering and their reachability distance. Don't forget to record the core distance of a data point if it has a dense neighborhood. You don't need to include the core distance in your result but you may need to use them in generating clusters. (45 pts) [Figure: Dataset visualization] Below are the first few lines of the calculation. You need to complete the remaining lines and generate clusters based on the given epsilonprime value: Instance (X, Y): Reachability Distance. Instance 1: (1, 1): Undefined (or infinity). Instance 2: (0, 1): 1.0. Instance 3: (1, 0): 1.0. Instance 16: (5, 9): Undefined. ... Instance 13: (9, 2): Undefined. Instance 12: (8, 2): 1.
  • Q11:3. Use F-measure and the Pairwise measures (TP, FN, FP, TN) to measure the agreement between a clustering result (C1, C2, C3) and the ground truth partitions (T1, T2, T3) as shown below. Show details of your calculation. (25 pts) [Figure: ground truth partitions T1, T2, T3 and clusters C1, C2, C3]
  • Q12:1. We will use the Flower classification dataset. a. https://www.kaggle.com/competitions/tpu-getting-started 2. Your goal is to improve the average classification accuracy. a. You SHOULD use Google Colab as the main computing environment. (Using Kaggle is okay.) b. You SHOULD create a GitHub repository for the source code. i. Include a README file with execution instructions. c. You SHOULD explain your source code in the BLOG. d. Try experimenting with various hyperparameters. i. Network topology: 1. Number of neurons per layer (for example, 100 x 200 x 100, 200 x 300 x 100, ...) 2. Number of layers (for example, 2 vs. 3 vs. 4 ...) 3. Shape of conv2d ii. While doing experiments, make sure you record your performance so that you can create a bar chart of the performance. iii. An additional graph idea might be a training time comparison. iv. Do some research on ideas for improving this. e. You can refer to code or tutorials on the internet, but the main question you have to answer is what improvement you made over the existing reference. i. Make sure it is very clear which lines of code are yours and which are not. When you copy source code, add a reference. 3. Documentation is half of your work. Write a good blog post about your work and a step-by-step how-to guide. a. A good example is https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/ 4. Add references. a. Add a citation number in the content and put the reference in a separate reference section.
  • Q13:This tutorial will guide you through how to do homework in this course. 1. Go to https://www.kaggle.com/c/titanic and follow the walkthrough at https://www.kaggle.com/alexisbcook/titanic-tutorial 2. Submit your result to the Kaggle challenge. 3. Post your Jupyter notebook to your homepage as a blog post. A good example of a blog post is https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/ 4. Submit your homepage link and a screenshot PDF in Canvas. 5. Doing 1-4 will give you 8 points. To get an additional 2 points, create a section called "Contribution" and try to improve the performance. I expect one or two paragraphs minimum (the longer the better). Show the original score and the improved score.
  • Q14:1. Use the insurance fraud dataset. Consider the data quality issues (e.g., missing data) and preprocess the data. Split the data into a 10% train and 90% test set using random_state = 1. Create a decision tree with a max depth of 3 using a gini measure. Print the accuracy on the test set and the tree. Is this a good approach? Why or why not? 2. Create a decision tree on the same data with max depth of 3 and an entropy measure. Does the accuracy change? Does the tree change? Discuss which measure you think is better. 3. Now split the data into 70% train and 30% test using random_state = 1. Redo 2 and 3. Have the trees and accuracy changed? Are the trees more or less similar now? Discuss which split you think is better and why. 4. Evaluate how the accuracy changes with the depth of the tree with the 70-30 data. Look at the accuracy for a max depth of 1, 2, 3, ... 10, 15, 20. Plot the curve of changing. Do you see underfitting? Do you see overfitting? 5. What variable provides the most information gain in the insurance fraud data (for the 70-30 split)? 6. Decision trees are a "white box" method. What do you observe about the insurance fraud data using decision trees?
  • Q15:You are required to write a 1 page proposal for your project as a pdf. Your proposal must include the following pieces of information: 1. Data Mining Task: What is your data mining task? This task could be a series of exploratory questions that you want to investigate or analyze. What is your motivation behind choosing this task for your project? 2. Dataset: What is the source of your data? Provide a link to your data source if you acquired it online. 3. Methodology: How will you solve the data mining task? You should have some idea of the algorithms or software tools you plan to investigate. Please feel free to use existing data mining and machine learning tool kits (e.g., Weka, Scikit-Learn) as needed for your project. 4. Final product: What will be the outcome of this project? How will you measure the success of your course project? Will this project help you explore or learn something new?
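
As an illustration of the kind of step-wise working these questions call for, here is a minimal sketch of the DBSCAN setup in Q9. It is not the assignment's reference solution. The names `POINTS`, `dbscan`, and `eps_sq` are made up for this example, and the function takes the squared radius (an assumption made here so that the Epsilon = sqrt(10) case can be compared exactly with integer arithmetic). MinPts counts the point itself, as the question states.

```python
from collections import deque

# The eight examples from Q9, keyed by their labels.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def dbscan(points, eps_sq, min_pts):
    """Minimal DBSCAN sketch.

    points:  dict mapping a label to an (x, y) tuple
    eps_sq:  squared Euclidean radius (Epsilon ** 2), hypothetical parameter
    min_pts: minimum neighborhood size, counting the point itself
    Returns (clusters, noise) as sorted lists of labels.
    """
    def neighbors(name):
        px, py = points[name]
        return [other for other, (qx, qy) in points.items()
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps_sq]

    labels = {}      # label -> cluster id, or -1 for noise
    cluster_id = 0
    for name in points:
        if name in labels:
            continue
        seeds = neighbors(name)
        if len(seeds) < min_pts:
            labels[name] = -1            # provisionally noise
            continue
        labels[name] = cluster_id        # name is a core point
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels.get(q) == -1:
                labels[q] = cluster_id   # noise reached from a core point
            if q in labels:
                continue
            labels[q] = cluster_id
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is core, keep expanding
        cluster_id += 1

    clusters = {}
    for name, cid in labels.items():
        if cid >= 0:
            clusters.setdefault(cid, []).append(name)
    noise = sorted(n for n, cid in labels.items() if cid == -1)
    return sorted(sorted(c) for c in clusters.values()), noise


if __name__ == "__main__":
    for eps_sq in (4, 10):               # Epsilon = 2 and Epsilon = sqrt(10)
        clusters, noise = dbscan(POINTS, eps_sq=eps_sq, min_pts=2)
        print(f"eps^2={eps_sq}: clusters={clusters}, noise={noise}")
```

Under these assumptions, Epsilon = 2 gives the clusters {A3, A5, A6} and {A4, A8} with A1, A2, and A7 left as noise. Raising Epsilon to sqrt(10) gives {A1, A4, A8}, {A2, A7}, and {A3, A5, A6} with no noise points.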

TutorBin Testimonials

I got my Data Mining homework done on time. My assignment is proofread and edited by professionals. Got zero plagiarism as experts developed my assignment from scratch. Feel relieved and super excited.

Joey Dip

I found TutorBin Data Mining homework help when I was struggling with complex concepts. Experts provided step-wise explanations and examples to help me understand concepts clearly.

Rick Jordon

TutorBin experts resolve your doubts without making you wait for long. Their experts are responsive & available 24/7 whenever you need Data Mining subject guidance.

Andrea Jacobs

I trust TutorBin for assisting me in completing Data Mining assignments with quality and 100% accuracy. Experts are polite, listen to my problems, and have extensive experience in their domain.

Lilian King

TutorBin helping students around the globe

TutorBin believes that distance should never be a barrier to learning. With more than 500,000 orders and 100,000 happy customers, TutorBin has become the name that keeps learning fun in the UK, USA, Canada, Australia, Singapore, and UAE.