BNU-HKBU UIC NLP Team 2 at SemEval-2019 Task 6: Detecting Offensive Language Using BERT model

Zhenghao Wu, Hao Zheng, Jianming Wang, Weifeng Su, Jefferson Fong


Abstract
In this study, we address the problem of identifying and categorizing offensive language in social media. Our group, BNU-HKBU UIC NLP Team 2, uses supervised classification along with multiple versions of the data generated by different pre-processing methods. We then use the state-of-the-art model Bidirectional Encoder Representations from Transformers, or BERT (Devlin et al., 2018), to capture linguistic, syntactic, and semantic features. Long-range dependencies between the parts of a sentence can be captured by BERT's bidirectional encoder representations. Our results show 85.12% accuracy and an 80.57% F1 score in Subtask A (offensive language identification), 87.92% accuracy and a 50% F1 score in Subtask B (categorization of offense types), and 69.95% accuracy and a 50.47% F1 score in Subtask C (offense target identification). Analysis of the results shows that distinguishing between targeted and untargeted offensive language is not a simple task. More work needs to be done on the imbalanced data problem in Subtasks B and C. Some future work is also discussed.
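The abstract does not specify an implementation, so the following is only a minimal sketch of the kind of pipeline it describes: fine-tuning BERT as a supervised classifier on pre-processed tweets for Subtask A. It assumes the Hugging Face transformers library and PyTorch; the example tweets, labels, and hyperparameters are illustrative placeholders, not the authors' actual settings.

```python
# Sketch: fine-tune BERT for binary offensive-language classification (Subtask A).
# Assumes the Hugging Face "transformers" package and PyTorch; data is made up.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical pre-processed tweets (e.g. after normalizing @USER mentions and URLs).
texts = ["have a nice day everyone", "@USER you are an idiot"]
labels = torch.tensor([0, 1])  # 0 = NOT offensive, 1 = OFFensive

batch = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors="pt")

# One illustrative training step: cross-entropy loss over the two classes.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# Inference: argmax over the logits gives the predicted class per tweet.
model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())
```

In practice the same recipe applies to Subtasks B and C by changing `num_labels` and the label set; the class imbalance the abstract mentions would show up in those subtasks' training data.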
Anthology ID:
S19-2099
Volume:
Proceedings of the 13th International Workshop on Semantic Evaluation
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota, USA
Editors:
Jonathan May, Ekaterina Shutova, Aurelie Herbelot, Xiaodan Zhu, Marianna Apidianaki, Saif M. Mohammad
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
551–555
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/S19-2099/
DOI:
10.18653/v1/S19-2099
Cite (ACL):
Zhenghao Wu, Hao Zheng, Jianming Wang, Weifeng Su, and Jefferson Fong. 2019. BNU-HKBU UIC NLP Team 2 at SemEval-2019 Task 6: Detecting Offensive Language Using BERT model. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 551–555, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
Cite (Informal):
BNU-HKBU UIC NLP Team 2 at SemEval-2019 Task 6: Detecting Offensive Language Using BERT model (Wu et al., SemEval 2019)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/S19-2099.pdf