Abstract
This paper describes a method and system for detecting offensive language in social media using anti-adversarial features. Our submission to the SemEval-2020 Task 12 challenge was generated by a stacked ensemble of neural networks fine-tuned on the OLID dataset and additional external sources. For Task-A (English), text normalisation filters were applied at both the graphical and lexical levels. The normalisation step mitigates not only the natural presence of lexical variants but also intentional attempts to bypass moderation by introducing out-of-vocabulary words. Our approach achieves strong F1 scores on both the 2020 (0.9134) and 2019 (0.8258) challenges.
- Anthology ID:
- 2020.semeval-1.250
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Editors:
- Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 1898–1905
- URL:
- https://aclanthology.org/2020.semeval-1.250
- DOI:
- 10.18653/v1/2020.semeval-1.250
- Cite (ACL):
- Alejandro Mosquera. 2020. Amsqr at SemEval-2020 Task 12: Offensive Language Detection Using Neural Networks and Anti-adversarial Features. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1898–1905, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- Amsqr at SemEval-2020 Task 12: Offensive Language Detection Using Neural Networks and Anti-adversarial Features (Mosquera, SemEval 2020)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2020.semeval-1.250.pdf
- Data:
- IMDb Movie Reviews, OLID
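The abstract describes graphical and lexical normalisation filters without listing them. The following is a minimal sketch of what such filters could look like, assuming a graphical stage that folds Unicode compatibility forms, strips diacritics and zero-width characters, and squeezes repeated letters, followed by a lexical stage that undoes leetspeak and maps known variants to canonical words. The tables `LEET` and `LEXICON` and all function names are hypothetical illustrations, not the paper's actual resources or code.

```python
import re
import unicodedata

# Zero-width code points sometimes inserted to split words and evade filters.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
# Runs of three or more identical characters ("sooooo").
REPEATS = re.compile(r"(.)\1{2,}")
# Word tokens (letters, digits, underscore, '@', '$') or single punctuation marks.
TOKEN = re.compile(r"[\w@$]+|[^\w\s]")

# Hypothetical leetspeak table; illustrative only, not taken from the paper.
LEET = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
LEET_TABLE = str.maketrans(LEET)
# Hypothetical lexical-variant dictionary; a real system would use a far larger resource.
LEXICON = {"u": "you", "r": "are", "ur": "your", "b4": "before", "stoopid": "stupid"}


def graphical_normalise(text: str) -> str:
    """Fold compatibility forms, drop diacritics and zero-width chars, lowercase, squeeze repeats."""
    text = unicodedata.normalize("NFKD", text)  # e.g. fullwidth letters -> ASCII
    text = "".join(c for c in text if not unicodedata.combining(c))  # strip accents
    text = ZERO_WIDTH.sub("", text).lower()
    return REPEATS.sub(r"\1\1", text)  # "sooooo" -> "soo"


def deobfuscate(token: str) -> str:
    """Undo leetspeak only in tokens that mix letters with substitution characters."""
    if token.startswith("@"):  # keep user mentions such as "@user" intact
        return token
    has_alpha = any(c.isalpha() for c in token)
    has_leet = any(c in LEET for c in token)
    return token.translate(LEET_TABLE) if has_alpha and has_leet else token


def lexical_normalise(token: str) -> str:
    """Map a known variant to its canonical form, also trying the de-obfuscated token."""
    if token in LEXICON:  # check first so entries like "b4" are not mangled
        return LEXICON[token]
    token = deobfuscate(token)
    return LEXICON.get(token, token)


def normalise(text: str) -> str:
    """Apply the graphical stage to the whole string, then the lexical stage per token."""
    tokens = TOKEN.findall(graphical_normalise(text))
    return " ".join(lexical_normalise(t) for t in tokens)


if __name__ == "__main__":
    print(normalise("U r sooooo st00pid!!!"))  # you are soo stupid ! !
    print(normalise("What an 1d10t, see you b4 2020"))  # what an idiot , see you before 2020
```

Restricting leetspeak reversal to tokens that already contain letters keeps plain numbers such as "2020" unchanged, and checking the lexicon before de-obfuscation protects variants like "b4" that themselves contain substitution characters. The normalised string would then be passed to the classifier in place of the raw text.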