Question

1. Design and develop a text classifier that can be used as an Amazon review categorizer. Your classifier must be trainable to classify reviews into one of two classes: positive or negative.

A description of the dataset can be found in the readme file. Note that we are using only the test set, because the full dataset is huge. This test set contains 400k data points.

a. The dataset can be found on Canvas.

b. Use the TfidfVectorizer from the scikit-learn library in Python to vectorize the dataset.

c. Use GaussianNB as the classifier.

d. Calculate the accuracy of the model. Use data partitioning to create a train set and a test set from the given dataset.

e. Input a sample text and determine the class of the text provided.
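A minimal sketch for reading the data. It assumes the file is a header-less CSV in which each row holds a class label ("1" for negative, "2" for positive), a review title, and the review body. That layout is an assumption (it matches the widely used Amazon review polarity release), so check it against the readme first. `load_reviews` and `LABEL_NAMES` are hypothetical names chosen for illustration.

```python
import csv
import io
from typing import Iterable, List, Tuple

# ASSUMPTION: header-less CSV rows of (label, title, body), where
# "1" = negative and "2" = positive. Verify against the dataset readme.
LABEL_NAMES = {"1": "negative", "2": "positive"}


def load_reviews(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return parallel lists (texts, labels) from CSV rows of the assumed layout."""
    texts, labels = [], []
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        label, title, body = row[0], row[1], row[2]
        # Join title and body so words from both feed the TF-IDF features.
        texts.append(f"{title} {body}".strip())
        labels.append(LABEL_NAMES[label])
    return texts, labels


if __name__ == "__main__":
    # Two made-up rows standing in for the real file.
    sample = io.StringIO(
        '"2","Great product","Works exactly as described."\n'
        '"1","Broke fast","Stopped working after two days."\n'
    )
    texts, labels = load_reviews(sample)
    print(labels)
```

For the real file, pass an open file handle instead of the `StringIO`, for example `with open("test.csv", newline="", encoding="utf-8") as f: texts, labels = load_reviews(f)`, where `test.csv` is a hypothetical file name.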
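Steps b to e can be sketched end to end as follows. This is one possible arrangement, not a reference solution. `train_and_evaluate` and `classify` are hypothetical helper names. The 80/20 split, `random_state=42`, and `max_features=2000` are assumed values. `texts` and `labels` are assumed to be parallel lists of review strings and class names. `GaussianNB` does not accept the sparse matrix that `TfidfVectorizer` returns, so the features are converted to dense arrays with `.toarray()`. Capping `max_features` (and, if memory is still tight, training on a random subsample of the 400k reviews) keeps that dense matrix manageable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def train_and_evaluate(texts, labels, max_features=2000, test_size=0.2, seed=42):
    """Vectorize with TF-IDF, fit GaussianNB, and return (vectorizer, model, accuracy)."""
    # (d) Partition the data into a train set and a held-out test set,
    # keeping the positive/negative ratio the same in both (stratify).
    X_train_text, X_test_text, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # (b) Learn the vocabulary and IDF weights from the training split only.
    vectorizer = TfidfVectorizer(max_features=max_features)
    X_train = vectorizer.fit_transform(X_train_text)
    X_test = vectorizer.transform(X_test_text)
    # (c) GaussianNB rejects sparse input, so densify before fitting.
    model = GaussianNB()
    model.fit(X_train.toarray(), y_train)
    # (d) Accuracy = fraction of test reviews whose predicted class is correct.
    accuracy = accuracy_score(y_test, model.predict(X_test.toarray()))
    return vectorizer, model, accuracy


def classify(vectorizer, model, text):
    """(e) Predict the class of one piece of input text."""
    features = vectorizer.transform([text]).toarray()
    return model.predict(features)[0]


if __name__ == "__main__":
    # Toy stand-in data; replace with the real review texts and labels.
    texts = ["great product love it"] * 10 + ["terrible product hate it"] * 10
    labels = ["positive"] * 10 + ["negative"] * 10
    vec, clf, acc = train_and_evaluate(texts, labels)
    print(f"accuracy: {acc:.3f}")
    print(classify(vec, clf, input("Enter a review: ")))
```

Fitting the vectorizer on the training split only keeps the test reviews out of the vocabulary and IDF statistics, so the reported accuracy is an honest held-out estimate.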
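`GaussianNB` needs a dense feature matrix, so memory grows as rows × features. Here is a quick back-of-the-envelope check. It assumes `TfidfVectorizer`'s default `float64` output (8 bytes per value), and `dense_matrix_gib` is a hypothetical helper:

```python
def dense_matrix_gib(n_rows: int, n_features: int, bytes_per_value: int = 8) -> float:
    """Return the size in GiB of a dense n_rows x n_features matrix.

    bytes_per_value=8 assumes float64, TfidfVectorizer's default dtype.
    """
    return n_rows * n_features * bytes_per_value / 2**30


if __name__ == "__main__":
    # Compare a few candidate vocabulary caps for the full 400k-review set.
    for n_features in (1_000, 2_000, 5_000, 20_000):
        size = dense_matrix_gib(400_000, n_features)
        print(f"{n_features:>6} features: {size:6.1f} GiB")
```

With all 400k reviews and 5,000 features, the dense matrix alone is about 14.9 GiB. For this reason, you usually need to cap `max_features` or subsample the reviews before calling `.toarray()`.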