Question

Part 4: Text classification model (25%)

Build a text classifier in Python for the NLTK movie review corpus (2,000 reviews). You should write a short report of 600 (+/- 50) words that summarizes:

1: Your proposed text classification method (using relevant NLTK text classification functions or other text classification functions). You can use any classification method EXCEPT the Naïve Bayes classifier.

2: Pre-processing methods, such as removing stop words and punctuation.

3: Feature selection methods, such as selecting the 1000 most important words or eliminating some of the least important words.

4: You should train your model on 80 percent of the movie review corpus, fine-tune your model on 10 percent of the corpus (as an evaluation set), and then test your model on another 10 percent of the corpus. Please report the precision, recall, accuracy, and F-score of each class. In addition, you can also use the AUC-ROC score to evaluate your model.
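
The 80/10/10 split and the per-class scores in item 4 could be sketched as below. This is a minimal illustration in plain Python, not a required implementation: the function names `split_80_10_10` and `per_class_metrics` and the fixed `seed` are assumptions made for the example.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of `items` and split it 80/10/10 (train/evaluation/test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * 8 // 10
    n_eval = len(shuffled) // 10
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test


def per_class_metrics(gold, predicted):
    """Return overall accuracy and precision/recall/F1 for each label."""
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    report = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report
```

With 2,000 reviews this split yields 1,600 training, 200 evaluation, and 200 test items. AUC-ROC needs a score per review rather than a hard label, so it is left out of this sketch.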

My evaluation will be based on whether you can demonstrate your method in text classification, pre-processing, feature selection, and performance evaluation.
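
For the pre-processing and feature selection parts, one possible starting point is sketched below. It makes several assumptions: `STOP_WORDS` is a tiny hand-written list standing in for a full stop list such as `nltk.corpus.stopwords.words("english")`, "most important" is approximated by raw word frequency, and the helper names are hypothetical.

```python
import string
from collections import Counter

# Tiny illustrative stop list (an assumption); a real run would more likely
# use a full list such as nltk.corpus.stopwords.words("english").
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "this"}


def preprocess(tokens, stop_words=STOP_WORDS):
    """Lowercase tokens, then drop stop words and punctuation-only tokens."""
    cleaned = []
    for token in tokens:
        token = token.lower()
        if token in stop_words:
            continue
        if all(ch in string.punctuation for ch in token):
            continue
        cleaned.append(token)
    return cleaned


def select_vocabulary(documents, size=1000):
    """Keep the `size` most frequent words across the given documents."""
    counts = Counter(word for doc in documents for word in doc)
    return [word for word, _ in counts.most_common(size)]


def to_features(tokens, vocabulary):
    """Binary bag-of-words features as a dict, the form NLTK classifiers take."""
    present = set(tokens)
    return {f"contains({word})": word in present for word in vocabulary}
```

The vocabulary should be built from the training portion only, so that the evaluation and test reviews do not influence which words are kept. Frequency is just one ranking; chi-square or information gain scores are common alternatives.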

Please check the following official NLTK documentation for using the NLTK movie review corpus:

https://www.nltk.org/_modules/nltk/corpus/reader/reviews.html
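
As a rough illustration of reading the corpus and training a classifier other than Naïve Bayes, the sketch below uses the `movie_reviews` object from `nltk.corpus` and NLTK's `DecisionTreeClassifier`. The choice of a decision tree is only an example, the helper names are hypothetical, and the corpus must be downloaded once with `nltk.download("movie_reviews")` before `load_labeled_reviews` can run.

```python
from nltk.classify import DecisionTreeClassifier
from nltk.corpus import movie_reviews  # loaded lazily; needs the corpus data


def load_labeled_reviews():
    """Return [(tokens, label), ...] for every review in the corpus."""
    return [
        (list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()  # 'neg' and 'pos'
        for fileid in movie_reviews.fileids(category)
    ]


def train_decision_tree(train_set):
    """Train on [(feature_dict, label), ...] pairs with default settings."""
    return DecisionTreeClassifier.train(train_set)


def predict(classifier, feature_dicts):
    """Predict one label per feature dict."""
    return [classifier.classify(features) for features in feature_dicts]
```

NLTK's decision tree can be slow with around 1,000 features. `nltk.classify.MaxentClassifier`, or a scikit-learn model wrapped in `nltk.classify.scikitlearn.SklearnClassifier`, are other options that accept the same feature dicts.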

You should submit both code and report.
