Your code should include:
1: Read the file and load its instances into the training set and the test set.
2: Pre-process the text. You can choose whether to apply stemming, stop-word removal, and removal of non-alphabetical words. (Not all classification models need this step. It is fine to skip it if you think your model performs better without it, and you can justify that choice in the report.)
3: Analyse the training set and report its linguistic features.
4: Build a text classification model, train it on the training set, and test it on the test set.
5: Summarize the performance of your model. (You can gain additional marks for including graph visualizations.)
6: (Optional) Speculate on how you could improve your work, based on your proposed model.
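As a rough illustration of how steps 2 to 5 can fit together, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and not a required design: the tiny stop-word list, the function names (`preprocess`, `describe`, `train_nb`, `predict_nb`, `accuracy`), the choice of a multinomial Naive Bayes classifier with add-one smoothing, and the toy sentences, which stand in for the real data because the file format is not specified here. File loading (step 1) and plotting are left out.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; a real list would be longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "was", "and", "of", "to"}

def preprocess(text):
    """Step 2: lowercase, keep alphabetical tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def describe(docs):
    """Step 3: report a few simple features of a tokenised corpus."""
    counts = Counter(t for doc in docs for t in doc)
    return {
        "documents": len(docs),
        "vocabulary_size": len(counts),
        "mean_tokens_per_doc": sum(len(d) for d in docs) / len(docs),
        "most_common": counts.most_common(3),
    }

def train_nb(docs, labels):
    """Step 4: collect the counts needed by multinomial Naive Bayes."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    vocab = {t for counts in word_counts.values() for t in counts}
    return class_docs, word_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log-probability for one document."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for t in doc:
            if t in vocab:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log(
                    (word_counts[label][t] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs, labels):
    """Step 5: fraction of documents classified correctly."""
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)

# Toy stand-in data (assumption): (text, label) pairs.
train = [
    ("This movie was great and fun", "pos"),
    ("A great, moving story", "pos"),
    ("It was a boring film", "neg"),
    ("Terrible plot and boring acting", "neg"),
]
test = [("A fun and great film", "pos"), ("Boring and terrible", "neg")]

train_docs = [preprocess(text) for text, _ in train]
train_labels = [label for _, label in train]
test_docs = [preprocess(text) for text, _ in test]
test_labels = [label for _, label in test]

stats = describe(train_docs)
model = train_nb(train_docs, train_labels)
test_accuracy = accuracy(model, test_docs, test_labels)

print(stats)
print(f"test accuracy: {test_accuracy:.2f}")
```

A single accuracy figure on a dataset this small says very little. For the report you would probably also want per-class precision and recall, or a confusion matrix, computed on the real test set.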