Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying

Nazia Nafis; Diptesh Kanojia; Naveen Saini; Rudra Murthy

doi:10.18653/v1/2023.woah-1.3

Abstract

Cyberbullying is a serious societal issue that is widespread across channels and platforms, particularly social networking sites, which have proven to be exceptionally fertile ground for such behavior. The dearth of high-quality training data for multilingual and low-resource scenarios, data that can accurately capture the nuances of social media conversations, often poses a roadblock to detecting it. This paper attempts to tackle cyberbullying, specifically its two most common manifestations: aggression and offensiveness. We present a novel dataset of 10,000 English and Hindi-English code-mixed tweets, manually annotated for the aggression detection and offensive language detection tasks. Our annotations are supported by inter-annotator agreement scores of 0.67 and 0.74 for the two tasks, respectively, indicating substantial agreement. We perform comprehensive fine-tuning of pre-trained language models (PTLMs) using this dataset to check its efficacy. On our challenging test sets, the best models achieve macro F1-scores of 67.87 and 65.45 on the two tasks, respectively. Further, we perform cross-dataset transfer learning to benchmark our dataset against existing aggression and offensive language datasets. We also present a detailed quantitative and qualitative analysis of prediction errors, and with this paper, we publicly release the novel dataset, code, and models.

Anthology ID: 2023.woah-1.3
Volume: The 7th Workshop on Online Abuse and Harms (WOAH)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Yi-ling Chung, Paul Röttger, Debora Nozza, Zeerak Talat, Aida Mostafazadeh Davani
Venue: WOAH
Publisher: Association for Computational Linguistics
Pages: 29–41
URL: https://aclanthology.org/2023.woah-1.3
DOI: 10.18653/v1/2023.woah-1.3
Cite (ACL): Nazia Nafis, Diptesh Kanojia, Naveen Saini, and Rudra Murthy. 2023. Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying. In The 7th Workshop on Online Abuse and Harms (WOAH), pages 29–41, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying (Nafis et al., WOAH 2023)
PDF: https://preview.aclanthology.org/dois-2013-emnlp/2023.woah-1.3.pdf
