Abstract
We present our submission and results for SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020), in which we participated in the offensive tweet classification tasks for English, Arabic, Greek, Turkish and Danish. Our approach combines classical machine learning models, such as support vector machines and logistic regression, in an ensemble with a multilingual transformer-based model (XLM-R). The transformer model is trained on all languages combined, yielding a fully multilingual model that can leverage knowledge across languages. The hyperparameters of the machine learning models are fine-tuned, and the statistically best-performing configurations are included in the final ensemble.
- Anthology ID:
- 2020.semeval-1.252
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Venue:
- SemEval
- SIGs:
- SIGLEX | SIGSEM
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 1916–1924
- URL:
- https://aclanthology.org/2020.semeval-1.252
- DOI:
- 10.18653/v1/2020.semeval-1.252
- Cite (ACL):
- Kathryn Chapman, Johannes Bernhard, and Dietrich Klakow. 2020. CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1916–1924, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling (Chapman et al., SemEval 2020)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2020.semeval-1.252.pdf
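The abstract does not say how the SVM, logistic regression and XLM-R outputs are combined. One common option is hard majority voting over each model's predicted label. The sketch below shows that technique, assuming OffensEval-style `OFF`/`NOT` labels. The function name `majority_vote`, the tie-breaking rule, and the example predictions are all hypothetical, chosen for illustration. They are not the authors' method or data.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label sequences by hard majority voting.

    predictions[m][i] is model m's label for example i.
    Hypothetical tie-break (assumption): prefer the first model's label
    if it is among the most-voted labels.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must label the same examples")

    combined: List[str] = []
    for i in range(n):
        # Count how many models voted for each label on example i.
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        first = predictions[0][i]
        if votes[first] == top:
            combined.append(first)
        else:
            # Otherwise take the first label, in model order, that reached the top count.
            combined.append(next(p[i] for p in predictions if votes[p[i]] == top))
    return combined


# Hypothetical predictions from three models on four tweets.
svm = ["OFF", "NOT", "NOT", "OFF"]
logreg = ["OFF", "OFF", "NOT", "NOT"]
xlmr = ["NOT", "OFF", "NOT", "OFF"]
print(majority_vote([svm, logreg, xlmr]))  # ['OFF', 'OFF', 'NOT', 'OFF']
```

With three models and two labels, a tie cannot occur, so the tie-break only matters for an even number of ensemble members. Soft voting (averaging predicted probabilities) is an alternative design that would require each model to expose calibrated scores.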