Abstract
The paper introduces a deep learning-based Twitter hate-speech text classification system. The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism), and non-hate-speech. Four Convolutional Neural Network models were trained on, respectively, character 4-grams, word vectors based on semantic information built using word2vec, randomly generated word vectors, and word vectors combined with character n-grams. The feature set was down-sized in the networks by max-pooling, and a softmax function was used to classify tweets. Tested by 10-fold cross-validation, the model based on word2vec embeddings performed best, with higher precision than recall, and a 78.3% F-score.

- Anthology ID: W17-3013
- Volume: Proceedings of the First Workshop on Abusive Language Online
- Month: August
- Year: 2017
- Address: Vancouver, BC, Canada
- Editors: Zeerak Waseem, Wendy Hui Kyong Chung, Dirk Hovy, Joel Tetreault
- Venue: ALW
- Publisher: Association for Computational Linguistics
- Pages: 85–90
- URL: https://aclanthology.org/W17-3013
- DOI: 10.18653/v1/W17-3013
- Cite (ACL): Björn Gambäck and Utpal Kumar Sikdar. 2017. Using Convolutional Neural Networks to Classify Hate-Speech. In Proceedings of the First Workshop on Abusive Language Online, pages 85–90, Vancouver, BC, Canada. Association for Computational Linguistics.
- Cite (Informal): Using Convolutional Neural Networks to Classify Hate-Speech (Gambäck & Sikdar, ALW 2017)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/W17-3013.pdf
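
The pipeline outlined in the abstract (character 4-gram features, max-pooling, a softmax over four classes, and evaluation by F-score) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the helper names, the toy feature maps, and the weight matrix are hypothetical, and the numbers are made up purely to show how the pieces fit together.

```python
import math

LABELS = ["racism", "sexism", "both", "non-hate-speech"]


def char_ngrams(text, n=4):
    """Return all overlapping character n-grams of `text` (4-grams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def max_pool(feature_map):
    """Max-over-time pooling: keep only the largest activation of one filter."""
    return max(feature_map)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical activations of two convolution filters over one tweet.
feature_maps = [[0.1, 0.7, 0.3], [0.5, 0.2, 0.4]]
pooled = [max_pool(fm) for fm in feature_maps]

# Hypothetical output-layer weights: one row per class, one column per filter.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]
scores = [sum(w * p for w, p in zip(row, pooled)) for row in weights]

probs = softmax(scores)
predicted = LABELS[probs.index(max(probs))]
```

In a trained network the filter activations and output weights would be learned from the data; here they are fixed by hand so that the pooling and classification steps can be followed line by line.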