UJNLP at SemEval-2020 Task 12: Detecting Offensive Language Using Bidirectional Transformers

Yinnan Yao, Nan Su, Kun Ma


Abstract
In this paper, we build several pre-trained models to participate in SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media. In the shared task of offensive language identification in social media, pre-trained models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved good results. We preprocess the dataset according to the language habits of users on social networks. Considering the data imbalance in OffensEval, we screen the newly provided machine-annotated samples to construct a new dataset, which we use to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa). For English subtask B, we adopt the method of adding Auxiliary Sentences (AS) to transform the single-sentence classification task into a sentence-pair relationship recognition task. Our team UJNLP ranked 16th of 85 in English subtask A (offensive language identification).
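The two ideas in the abstract can be illustrated with a minimal sketch. The exact preprocessing rules and the wording of the auxiliary sentence are not specified in the abstract, so the steps below (mention anonymization, URL normalization, and the sample auxiliary sentence) are assumptions for illustration only, in the style of BERT-pair input construction:

```python
import re

def preprocess_tweet(text):
    # Hypothetical preprocessing reflecting social-media language habits;
    # the authors' actual steps are not given in the abstract.
    text = re.sub(r"@\w+", "@USER", text)        # anonymize user mentions
    text = re.sub(r"https?://\S+", "URL", text)  # normalize links
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text

def to_sentence_pair(tweet, aux="It is offensive."):
    # Auxiliary-sentence (AS) construction: the single tweet becomes the
    # first segment and the auxiliary sentence the second, turning
    # single-sentence classification into sentence-pair recognition.
    return (preprocess_tweet(tweet), aux)
```

Each resulting pair would then be fed to a sentence-pair encoder (e.g., RoBERTa with two input segments) for fine-tuning.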
Anthology ID:
2020.semeval-1.293
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Venues:
COLING | SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
International Committee for Computational Linguistics
Pages:
2203–2208
URL:
https://aclanthology.org/2020.semeval-1.293
DOI:
10.18653/v1/2020.semeval-1.293
Cite (ACL):
Yinnan Yao, Nan Su, and Kun Ma. 2020. UJNLP at SemEval-2020 Task 12: Detecting Offensive Language Using Bidirectional Transformers. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2203–2208, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
UJNLP at SemEval-2020 Task 12: Detecting Offensive Language Using Bidirectional Transformers (Yao et al., SemEval 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.semeval-1.293.pdf
Code
 yaoyinnan/offenseval
Data
OLID