Abstract
This paper presents our contribution to the Offensive Language Classification Task (English SubTask A) of SemEval-2020. We propose different BERT models trained on several offensive language classification and profanity datasets, and combine their output predictions in an ensemble model. We experimented with different ensemble approaches, such as SVMs, gradient boosting, AdaBoost and logistic regression. We further propose an under-sampling approach for the SOLID dataset, which removes its most uncertain partitions and increases recall. Our best model, an average ensemble of four different BERT models, achieved 11th place out of 82 participants with a macro F1 score of 0.91344 in the English SubTask A.
- Anthology ID:
- 2020.semeval-1.207
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Editors:
- Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 1587–1597
- URL:
- https://aclanthology.org/2020.semeval-1.207
- DOI:
- 10.18653/v1/2020.semeval-1.207
- Cite (ACL):
- Susan Wang and Zita Marinho. 2020. Nova-Wang at SemEval-2020 Task 12: OffensEmblert: An Ensemble of Offensive Language Classifiers. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1587–1597, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- Nova-Wang at SemEval-2020 Task 12: OffensEmblert: An Ensemble of Offensive Language Classifiers (Wang & Marinho, SemEval 2020)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2020.semeval-1.207.pdf
- Data
- OLID
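The abstract's best system averages the predictions of four BERT models. A minimal sketch of that averaging step, assuming each model emits a per-example probability of the "offensive" class (the numbers below are illustrative stand-ins, not the paper's outputs):

```python
import numpy as np

def average_ensemble(prob_matrix: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Average-ensemble sketch: prob_matrix has shape (n_models, n_examples),
    holding each model's P(offensive); the mean probability is thresholded
    to produce the final binary label (1 = offensive)."""
    mean_probs = prob_matrix.mean(axis=0)
    return (mean_probs >= threshold).astype(int)

# Four hypothetical models scoring three tweets.
probs = np.array([
    [0.90, 0.20, 0.60],
    [0.80, 0.10, 0.40],
    [0.70, 0.30, 0.55],
    [0.95, 0.25, 0.45],
])
print(average_ensemble(probs))  # means 0.8375, 0.2125, 0.5 -> [1 0 1]
```

The paper also tried learned combiners (SVM, gradient boosting, AdaBoost, logistic regression) over the same model outputs; plain averaging was the variant that placed 11th.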