Learning Representations for Detecting Abusive Language
Abstract
This paper addresses the question of whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches: one based on lexica, and one based on data-specific n-grams. Our experiments show that learned representations do contain useful information that can be used to improve detection performance when training data is limited.
- Anthology ID:
- W18-5115
- Volume:
- Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
- Month:
- October
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, Jacqueline Wernimont
- Venue:
- ALW
- Publisher:
- Association for Computational Linguistics
- Pages:
- 115–123
- URL:
- https://aclanthology.org/W18-5115
- DOI:
- 10.18653/v1/W18-5115
- Cite (ACL):
- Magnus Sahlgren, Tim Isbister, and Fredrik Olsson. 2018. Learning Representations for Detecting Abusive Language. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 115–123, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Learning Representations for Detecting Abusive Language (Sahlgren et al., ALW 2018)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/W18-5115.pdf
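The abstract compares learned representations against two standard baselines: a lexicon-based approach and data-specific n-grams. A minimal sketch of those two baseline feature types is shown below; this is not the authors' code, and the mini-lexicon and example texts are purely illustrative assumptions.

```python
# Sketch of two baseline feature types mentioned in the abstract:
# (1) lexicon matching and (2) data-specific character n-grams.
from collections import Counter

# Hypothetical mini-lexicon; real systems use curated abuse lexica.
ABUSE_LEXICON = {"idiot", "awful", "stupid"}

def lexicon_score(text: str) -> int:
    """Count tokens from the text that appear in the abuse lexicon."""
    return sum(tok in ABUSE_LEXICON for tok in text.lower().split())

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams, the data-specific baseline feature."""
    padded = f" {text.lower()} "  # pad so word boundaries form n-grams
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

print(lexicon_score("What an idiot"))   # one lexicon hit
print(char_ngrams("abuse"))
```

Either feature set can feed a standard linear classifier; the paper's point is that a representation learned from related abusive-language data can supplement such features when labeled training data is scarce.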