Build a text classifier in Python for the 2,000 reviews in the NLTK movie review
corpus, and write a short report of 600 (+/- 50) words summarizing:
1: Your proposed text classification method. You may use relevant NLTK text classification
functions or other text classification libraries, and any classification method
EXCEPT the Naïve Bayes classifier.
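As one possible non-Naïve-Bayes choice, the sketch below implements binary logistic regression trained by stochastic gradient descent in plain Python, so it runs without extra dependencies. It assumes each review has already been turned into a sparse `{feature: value}` dict and that labels are encoded as 1 for `pos` and 0 for `neg`. The class name, learning rate, L2 strength and epoch count are illustrative assumptions, not values taken from the assignment.

```python
import math
import random
from typing import Dict, List, Tuple

# A labelled example: (sparse feature dict, label) with 1 = "pos", 0 = "neg".
Example = Tuple[Dict[str, float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Binary logistic regression trained with SGD on sparse {feature: value} dicts."""

    def __init__(self, lr: float = 0.1, l2: float = 1e-4, epochs: int = 20, seed: int = 0):
        self.lr, self.l2, self.epochs = lr, l2, epochs
        self.rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.w: Dict[str, float] = {}
        self.b = 0.0

    def score(self, x: Dict[str, float]) -> float:
        # Estimated probability that x belongs to the positive class.
        z = self.b + sum(self.w.get(f, 0.0) * v for f, v in x.items())
        return sigmoid(z)

    def fit(self, data: List[Example]) -> "LogisticRegression":
        data = list(data)  # shuffle a copy, not the caller's list
        for _ in range(self.epochs):
            self.rng.shuffle(data)
            for x, y in data:
                err = self.score(x) - y  # gradient of log-loss w.r.t. the logit
                # L2 shrinkage is applied only to features present in this example,
                # a common approximation for sparse inputs.
                for f, v in x.items():
                    w = self.w.get(f, 0.0)
                    self.w[f] = w - self.lr * (err * v + self.l2 * w)
                self.b -= self.lr * err
        return self

    def predict(self, x: Dict[str, float]) -> int:
        return int(self.score(x) >= 0.5)
```

NLTK also provides `nltk.classify.MaxentClassifier`, and `nltk.classify.scikitlearn.SklearnClassifier` can wrap scikit-learn models such as `LogisticRegression` or `LinearSVC`; either is a reasonable library-based alternative to this hand-written version. The learning rate, L2 strength and number of epochs are the kind of hyperparameters to tune on the 10 percent validation split.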
2: Pre-processing methods, such as removing stop words and punctuation.
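A minimal pre-processing sketch, assuming the input is a list of tokens such as those returned by `movie_reviews.words(fileid)`: it lower-cases each token, then drops stop words and tokens made up only of punctuation characters. The stop-word set here is a deliberately small, hypothetical subset so the example runs offline; in practice `nltk.corpus.stopwords.words("english")` (after `nltk.download("stopwords")`) supplies NLTK's full English list.

```python
import string
from typing import AbstractSet, Iterable, List

# HYPOTHETICAL: a small hand-picked stop-word subset for illustration only.
# NLTK's full list comes from nltk.corpus.stopwords.words("english").
STOP_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were",
    "of", "to", "in", "on", "it", "this", "that", "with", "as", "for",
})
PUNCT = frozenset(string.punctuation)


def preprocess(tokens: Iterable[str], stop_words: AbstractSet[str] = STOP_WORDS) -> List[str]:
    """Lower-case tokens and drop stop words and punctuation-only tokens."""
    cleaned = []
    for tok in tokens:
        tok = tok.lower()
        if tok in stop_words:
            continue
        if all(ch in PUNCT for ch in tok):  # e.g. ".", "--", "'" (also drops "")
            continue
        cleaned.append(tok)
    return cleaned
```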
3: Feature selection methods, such as selecting the 1000 most important words or
eliminating the least important words.
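One simple reading of "the 1000 most important words" is "the 1000 most frequent words after pre-processing". Under that assumption, the sketch below counts tokens with `collections.Counter`, keeps the top `k`, and encodes each document as sparse binary features named `contains(word)`. The function names and the feature naming scheme are illustrative; statistical scores such as chi-square or information gain are common alternatives for ranking words by importance.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def select_top_words(docs: Iterable[Sequence[str]], k: int = 1000) -> List[str]:
    """Return the k most frequent tokens across already pre-processed documents."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(tokens)
    # most_common orders ties by first occurrence, so the result is deterministic.
    return [word for word, _ in counts.most_common(k)]


def document_features(tokens: Sequence[str], vocab: Sequence[str]) -> Dict[str, float]:
    """Sparse binary bag-of-words: only selected words present in the document appear."""
    present = set(tokens)
    return {f"contains({w})": 1.0 for w in vocab if w in present}
```

Build the vocabulary from the 80 percent training split only, so that the validation and test reviews do not influence which features are kept.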
4: You should train your model on 80 percent of the movie review corpus, fine-tune it on
another 10 percent (as a validation set), and then test it on the remaining 10 percent.
Please report the precision, recall, accuracy, and F-score of each class. In addition,
you can also use the AUC-ROC score to evaluate your model.
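The evaluation protocol can be sketched with the standard library alone: shuffle the 2,000 labelled reviews once with a fixed seed, cut them 80/10/10, and compute per-class precision, recall and F1 plus overall accuracy from the test predictions. AUC-ROC is computed here as the probability that a randomly chosen positive review receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve. The function names and the seed are assumptions for illustration; scikit-learn's `classification_report` and `roc_auc_score` compute the same quantities.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy once with a fixed seed, then cut it into 80% train, 10% dev, 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10  # integer arithmetic avoids float rounding
    n_dev = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_dev], items[n_train + n_dev:]


def per_class_report(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Return ({label: {precision, recall, f1}}, overall accuracy)."""
    pairs = list(zip(y_true, y_pred))
    report: Dict[str, Dict[str, float]] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2.

    Pairwise O(n_pos * n_neg), which is fine for a 200-review test split.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With two classes, the one-vs-rest accuracy of each class equals the overall accuracy, so a single accuracy figure is returned. The split is not stratified; because the corpus contains 1,000 positive and 1,000 negative reviews, a random split is roughly balanced, but splitting each class separately would guarantee it. Hyperparameters should be chosen on the dev split, and the test split used only once, at the end.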
My evaluation will be based on whether you can demonstrate your method in
text classification, pre-processing, feature selection, and performance evaluation.
Please refer to the following official NLTK documentation on using the NLTK movie review
corpus:
https://www.nltk.org/_modules/nltk/corpus/reader/reviews.html
You should submit both code and report.
Fig. 1