Abstract
This paper describes the system submitted by the PRHLT-UPV team for Task 12 of SemEval-2020: OffensEval 2020. The task, officially titled Multilingual Offensive Language Identification in Social Media, aims to identify offensive language in texts. The languages included in the task are English, Arabic, Danish, Greek and Turkish. We propose a model based on the BERT architecture for the analysis of texts in English. The approach leverages the knowledge within a pre-trained model and fine-tunes it for the particular task. For the other languages, Multilingual BERT is used, which has been pre-trained on a large number of languages. In the experiments, the proposed method for English texts is compared with other approaches to analyze the relevance of the architecture used. Furthermore, simple models for the other languages are evaluated and compared with the proposed one. The experimental results show that the model based on BERT outperforms the other approaches. The main contribution of this work lies in this study, although the system did not reach the top positions of the competition ranking in most cases.
- Anthology ID:
- 2020.semeval-1.209
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Editors:
- Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 1605–1614
- URL:
- https://aclanthology.org/2020.semeval-1.209
- DOI:
- 10.18653/v1/2020.semeval-1.209
- Cite (ACL):
- Gretel Liz De la Peña Sarracén and Paolo Rosso. 2020. PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1605–1614, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection (De la Peña Sarracén & Rosso, SemEval 2020)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2020.semeval-1.209.pdf