NUIG at SemEval-2020 Task 12: Pseudo Labelling for Offensive Content Classification

Shardul Suryawanshi, Mihael Arcan, Paul Buitelaar

doi:10.18653/v1/2020.semeval-1.208

Abstract

This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We used a semi-supervised approach to classify tweets as offensive (OFF) or not offensive (NOT). Because the OffensEval 2020 dataset is only loosely labelled, with confidence scores produced by unsupervised models, we used last year’s Offensive Language Identification Dataset (OLID) to label it through pseudo-labelling. We trained four text classifiers on OLID, and the classifier with the highest macro-averaged F1-score was used to pseudo-label the OffensEval 2020 dataset. That best-performing model was then trained on the combined OLID and pseudo-labelled OffensEval 2020 data. We evaluated the classifiers on the OLID and OffensEval 2020 datasets using precision, recall and macro-averaged F1-score, with macro-averaged F1 as the primary evaluation metric.

This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/.
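The pipeline described in the abstract (train several classifiers on labelled data, keep the one with the highest macro-averaged F1, use it to pseudo-label the unlabelled tweets, then train that model on the combined data) can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: `MajorityClassifier`, `NaiveBayesClassifier`, `macro_f1` and `pseudo_label_pipeline` are hypothetical names, the two toy classifiers stand in for the paper's four text classifiers, the toy tweets stand in for OLID and OffensEval 2020, and selecting on a held-out labelled split is an assumption.

```python
import math
from collections import Counter, defaultdict

# Label set of sub-task A: offensive (OFF) vs. not offensive (NOT).
LABELS = ("OFF", "NOT")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label for _ in texts]


class NaiveBayesClassifier:
    """Hypothetical bag-of-words multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.priors = Counter(labels)
        self.n = len(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.priors}
        return self

    def _log_score(self, words, c):
        # log P(c) + sum of log P(word | c), Laplace-smoothed over the vocabulary.
        score = math.log(self.priors[c] / self.n)
        denom = self.totals[c] + len(self.vocab)
        for w in words:
            score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, texts):
        preds = []
        for text in texts:
            words = text.lower().split()
            preds.append(max(self.priors, key=lambda c: self._log_score(words, c)))
        return preds


def pseudo_label_pipeline(candidates, train, dev, unlabelled):
    """Pick the candidate with the best dev macro-F1, pseudo-label the
    unlabelled texts with it, then train a fresh instance of the same
    model on labelled + pseudo-labelled data."""
    train_x, train_y = train
    dev_x, dev_y = dev
    scored = []
    for name, factory in candidates.items():
        model = factory().fit(train_x, train_y)
        scored.append((macro_f1(dev_y, model.predict(dev_x)), name, model))
    best_f1, best_name, best_model = max(scored, key=lambda s: s[0])
    pseudo_y = best_model.predict(unlabelled)
    final = candidates[best_name]().fit(train_x + unlabelled, train_y + pseudo_y)
    return best_name, best_f1, final


if __name__ == "__main__":
    # Toy stand-ins for labelled (OLID-like) and unlabelled (OffensEval-like) tweets.
    train = (["you are an idiot", "have a nice day", "shut up idiot", "lovely weather today"],
             ["OFF", "NOT", "OFF", "NOT"])
    dev = (["what an idiot", "nice day"], ["OFF", "NOT"])
    unlabelled = ["idiot idiot", "have a lovely day"]
    candidates = {"majority": MajorityClassifier, "naive_bayes": NaiveBayesClassifier}
    name, f1, model = pseudo_label_pipeline(candidates, train, dev, unlabelled)
    print(f"selected={name} dev_macro_f1={f1:.2f}")
    print(model.predict(["what an idiot", "have a nice day"]))
```

Macro-averaging weights the OFF and NOT classes equally, so a classifier that always predicts the majority class scores poorly even when its accuracy looks acceptable; that is one reason macro-F1 is a sensible selection criterion for imbalanced offensive-language data. Training a fresh instance on the combined data is one reading of "the same model ... has been trained on the combined dataset"; the abstract does not say whether the selected model was reinitialised or further fine-tuned.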

Anthology ID: 2020.semeval-1.208
Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month: December
Year: 2020
Address: Barcelona (online)
Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue: SemEval
SIG: SIGLEX
Publisher: International Committee for Computational Linguistics
Pages: 1598–1604
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.semeval-1.208/
DOI: 10.18653/v1/2020.semeval-1.208
Cite (ACL): Shardul Suryawanshi, Mihael Arcan, and Paul Buitelaar. 2020. NUIG at SemEval-2020 Task 12: Pseudo Labelling for Offensive Content Classification. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1598–1604, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal): NUIG at SemEval-2020 Task 12: Pseudo Labelling for Offensive Content Classification (Suryawanshi et al., SemEval 2020)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.semeval-1.208.pdf
Data: OLID