In the digital age, cyberbullying (CB) poses a significant concern, impacting individuals as early as primary school and leading to severe or lasting consequences, including an increased risk of self-harm. CB incidents, are not limited to bullies and victims, but include bystanders with various roles, and usually have numerous sub-categories and variations of online harms. This position paper emphasises the complexity of CB incidents by drawing on insights from psychology, social sciences, and computational linguistics. While awareness of CB complexities is growing, existing computational techniques tend to oversimplify CB as a binary classification task, often relying on training datasets that capture peripheries of CB behaviours. Inconsistent definitions and categories of CB-related online harms across various platforms further complicates the issue. Ethical concerns arise when CB research involves children to role-play CB incidents to curate datasets. Through multi-disciplinary collaboration, we propose strategies for consideration when developing CB detection systems. We present our position on leveraging large language models (LLMs) such as Claude-2 and Llama2-Chat as an alternative approach to generate CB-related role-playing datasets. Our goal is to assist researchers, policymakers, and online platforms in making informed decisions regarding the automation of CB incident detection and intervention. By addressing these complexities, our research contributes to a more nuanced and effective approach to combating CB especially in young people.
Cyberbullying is bullying perpetrated via the medium of modern communication technologies like social media networks and gaming platforms. Unfortunately, most existing datasets focusing on cyberbullying detection or classification are i) limited in number ii) usually targeted to one specific online social networking (OSN) platform, or iii) often contain low-quality annotations. In this study, we fine-tune and benchmark state of the art neural transformers for the binary classification of cyberbullying in social media texts, which is of high value to Natural Language Processing (NLP) researchers and computational social scientists. Furthermore, this work represents the first step toward building neural language models for cross OSN platform cyberbullying classification to make them as OSN platform agnostic as possible.
Automated textual cyberbullying detection is known to be a challenging task. It is sometimes expected that messages associated with bullying will either be a) abusive, b) targeted at a specific individual or group, or c) have a negative sentiment. Transfer learning by fine-tuning pre-trained attention-based transformer language models (LMs) has achieved near state-of-the-art (SOA) precision in identifying textual fragments as being bullying-related or not. This study looks closely at two SOA LMs, BERT and HateBERT, fine-tuned on real-life cyberbullying datasets from multiple social networking platforms. We intend to determine whether these finely calibrated pre-trained LMs learn textual cyberbullying attributes or syntactical features in the text. The results of our comprehensive experiments show that despite the fact that attention weights are drawn more strongly to syntactical features of the text at every layer, attention weights cannot completely account for the decision-making of such attention-based transformers.