Problem Comment Classification
- Valerie Dobrelya
- Feb 12, 2018
- 1 min read
Updated: Feb 13, 2018

This is my capstone project for the Data Science Immersive course at General Assembly, Sydney. I decided to take on the Kaggle Toxic Comment Classification challenge, where the aim is to detect toxic, hateful, obscene or otherwise inappropriate user comments from Wikipedia talk page edits.
The specific academic context and the prevalence of jargon, swear words (often misspelled) and online speech meant that an existing model such as Google's word2vec, which was trained on news articles, would not be a good fit, as it doesn't recognise many of the problem words.
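To illustrate the out-of-vocabulary issue, here is a minimal sketch, assuming gensim and the pre-trained GoogleNews vectors (the file path and sample tokens are hypothetical), that checks whether typical comment tokens exist in the word2vec vocabulary:

```python
from gensim.models import KeyedVectors

# Load Google's pre-trained News vectors (path is an assumption).
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Misspelled slang and wiki jargon are often missing from the vocabulary.
for token in ["article", "vandalism", "stoopid", "idi0t", "wikipedian"]:
    print(token, "in vocabulary:", token in kv)
```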

After extensive data cleaning, lower-casing everything and removing any non-ASCII symbols, and after modifying a swear word filter I found on GitHub to include various common misspellings, I endeavoured to use neural networks to find patterns that might help identify problem comments.
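A minimal sketch of that cleaning step (the misspelling map here is a hypothetical stand-in for the modified GitHub filter):

```python
import re

def clean_comment(text):
    # Lower-case and strip any non-ASCII characters.
    text = text.lower()
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Collapse whitespace left behind by the stripped symbols.
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical fragment of the misspelling map added to the filter.
MISSPELLINGS = {"stoopid": "stupid", "idi0t": "idiot"}

def normalise_swears(text):
    for variant, canonical in MISSPELLINGS.items():
        text = text.replace(variant, canonical)
    return text

print(normalise_swears(clean_comment("You are SO stoopid!! \u00e9")))
# -> "you are so stupid!!"
```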
My initial attempt at building a model using word2vec returned an accuracy of 91%, which sounded great until I realised that the baseline accuracy was 89%. Only ~11% of the comments were problematic, so simply guessing that every comment is “ok” already returns a very high accuracy.
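That baseline is easy to verify with a majority-class “model”; here is a sketch using a simulated label array (the real labels come from the Kaggle training set):

```python
import numpy as np

# Simulated labels: 1 = problematic (~11%), 0 = ok.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.11, size=100_000)

# Always predicting the majority class ("ok") scores ~89%.
baseline_accuracy = (y == 0).mean()
print(f"baseline accuracy: {baseline_accuracy:.1%}")
```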
After many attempts to improve the word2vec model, I tried writing my own neural network, which was fun but led to similar results.
Eventually I built a model using CountVectorizer and a scikit-learn supervised neural network, which, in combination with a modified stop word filter, started producing improved results. The final model's accuracy is over 95%, which I was happy with, as it is a significant improvement over the baseline.
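A minimal sketch of that final setup, assuming the scikit-learn neural network was MLPClassifier and using the built-in English stop word list as a stand-in for the modified filter:

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Stand-in for the modified stop word filter mentioned above.
stop_words = list(ENGLISH_STOP_WORDS)

model = Pipeline([
    ("vectorise", CountVectorizer(lowercase=True, stop_words=stop_words)),
    ("classify", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                               random_state=42)),
])

# Toy comments standing in for the cleaned Kaggle training data.
comments = [
    "thanks for the helpful edit",
    "you are a stupid vandal",
    "please discuss on the talk page",
    "go away you idiot",
]
labels = [0, 1, 0, 1]  # 1 = problematic

model.fit(comments, labels)
print(model.predict(["what an idiot edit"]))
```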