Abstract
In this paper, we examine methods to detect hate speech in social media while distinguishing it from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods to a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams, and word skip-grams. We obtain 78% accuracy in classifying posts into three classes. The results show that the main challenge lies in discriminating between profanity and hate speech. We also discuss several directions for future work.
- Anthology ID: R17-1062
- Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
- Month: September
- Year: 2017
- Address: Varna, Bulgaria
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 467–472
- URL: https://doi.org/10.26615/978-954-452-049-6_062
- DOI: 10.26615/978-954-452-049-6_062
- Cite (ACL): Shervin Malmasi and Marcos Zampieri. 2017. Detecting Hate Speech in Social Media. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 467–472, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal): Detecting Hate Speech in Social Media (Malmasi & Zampieri, RANLP 2017)
- PDF: https://doi.org/10.26615/978-954-452-049-6_062
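The three lexical feature types named in the abstract (character n-grams, word n-grams, and word skip-grams) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The lowercasing, whitespace tokenization, n-gram orders, skip distances, and helper names (`char_ngrams`, `word_ngrams`, `skip_bigrams`, `extract_features`) are all assumptions chosen for illustration. Skip-grams are also assumed here to be k-skip bigrams, meaning ordered word pairs with at most k intervening words.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Contiguous character n-grams of the string (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Contiguous word n-grams over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens: list[str], k: int) -> list[tuple[str, str]]:
    """k-skip bigrams: ordered word pairs with at most k words skipped between them.

    With k=0 this reduces to ordinary word bigrams.
    """
    pairs = []
    for i in range(len(tokens)):
        # j ranges over the next k+1 positions (0..k skipped words), clipped at the end.
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


def extract_features(
    text: str,
    char_n: tuple[int, ...] = (2, 3, 4),   # hypothetical n-gram orders
    word_n: tuple[int, ...] = (1, 2),      # hypothetical n-gram orders
    skip_k: tuple[int, ...] = (1, 2),      # hypothetical skip distances
) -> Counter:
    """Return sparse feature counts keyed by a type prefix plus the gram itself.

    Assumption: lowercased text, whitespace tokenization. Because a k-skip set
    includes all pairs with fewer skips, the s1 and s2 feature sets overlap.
    """
    lowered = text.lower()
    tokens = lowered.split()
    feats: Counter = Counter()
    for n in char_n:
        feats.update(f"c{n}:{g}" for g in char_ngrams(lowered, n))
    for n in word_n:
        feats.update(f"w{n}:{' '.join(g)}" for g in word_ngrams(tokens, n))
    for k in skip_k:
        feats.update(f"s{k}:{a} {b}" for a, b in skip_bigrams(tokens, k))
    return feats


if __name__ == "__main__":
    for name, count in sorted(extract_features("You are SO dumb").items()):
        print(name, count)
```

Keeping each feature type under its own prefix (`c3:`, `w2:`, `s1:`) keeps the three groups separable. That makes it possible to train on one group at a time or on all of them combined. The resulting mapping of feature name to count can be passed to any supervised classifier that accepts sparse count features, for example via scikit-learn's `DictVectorizer`. Which classifier the paper used is not stated in this abstract.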