GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models

Davide Colla, Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer


Abstract
We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach that exploits the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).
Anthology ID:
2020.semeval-1.202
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Venues:
COLING | SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
International Committee for Computational Linguistics
Pages:
1546–1554
URL:
https://aclanthology.org/2020.semeval-1.202
DOI:
10.18653/v1/2020.semeval-1.202
Cite (ACL):
Davide Colla, Tommaso Caselli, Valerio Basile, Jelena Mitrović, and Michael Granitzer. 2020. GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1546–1554, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models (Colla et al., SemEval 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.semeval-1.202.pdf