Question

1. Design and develop a text classifier that can be used as an Amazon review categorizer. The classifier must be trainable to classify reviews into one of two classes: positive and negative reviews.

A description can be found in the readme file. Please note that we are using only the test set, as the full dataset is huge. This test set contains 400k data points.

a. The data set can be found in Canvas.
b. Use the TfidfVectorizer from the scikit-learn library in Python to vectorize the data set.
c. Use GaussianNB for the classifier.
d. Calculate the accuracy of the model. You need to use data partitioning to create a train set and a test set from the given data set.
e. Input a sample text and determine the class of the text provided.
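
The steps in parts b to e can be sketched roughly as follows. This is a minimal illustration, not a reference solution. The review texts and labels below are made-up placeholders standing in for the real data set, and the `max_features` cap, the 75/25 split, and the helper name `classify` are assumptions chosen for the example. GaussianNB expects dense input, so the sparse TF-IDF matrix is converted with `.toarray()`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Placeholder data (hypothetical): replace with the texts and labels
# loaded from the course data set.
texts = [
    "great product works perfectly", "love it highly recommend",
    "excellent quality and fast shipping", "best purchase this year",
    "amazing sound very happy", "wonderful book could not put it down",
    "terrible quality broke in a day", "waste of money do not buy",
    "awful product very disappointed", "worst purchase ever made",
    "poor sound and bad battery", "boring book could not finish it",
]
labels = ["positive"] * 6 + ["negative"] * 6

# d. Partition the data into a train set and a test set.
train_txt, test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# b. Vectorize with TF-IDF. Fit on the train set only, then reuse the
# fitted vocabulary for the test set. max_features is an assumed cap
# that keeps the dense matrix small.
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_txt).toarray()
X_test = vectorizer.transform(test_txt).toarray()

# c. Train the Gaussian Naive Bayes classifier on the dense features.
clf = GaussianNB()
clf.fit(X_train, y_train)

# d. Accuracy on the held-out test set.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.3f}")


# e. Classify a single sample text (hypothetical helper).
def classify(text):
    features = vectorizer.transform([text]).toarray()
    return clf.predict(features)[0]


print(classify("very happy with this great product"))
```

With the full 400k reviews, a dense matrix of 5000 features per row would likely need many gigabytes of memory, so lowering `max_features` or training on a subsample may be necessary when GaussianNB is required.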