2024
pdf
abs
Beyond Binary: Towards Embracing Complexities in Cyberbullying Detection and Intervention - a Position Paper
Kanishk Verma
|
Kolawole John Adebayo
|
Joachim Wagner
|
Megan Reynolds
|
Rebecca Umbach
|
Tijana Milosevic
|
Brian Davis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In the digital age, cyberbullying (CB) poses a significant concern, impacting individuals as early as primary school and leading to severe or lasting consequences, including an increased risk of self-harm. CB incidents, are not limited to bullies and victims, but include bystanders with various roles, and usually have numerous sub-categories and variations of online harms. This position paper emphasises the complexity of CB incidents by drawing on insights from psychology, social sciences, and computational linguistics. While awareness of CB complexities is growing, existing computational techniques tend to oversimplify CB as a binary classification task, often relying on training datasets that capture peripheries of CB behaviours. Inconsistent definitions and categories of CB-related online harms across various platforms further complicates the issue. Ethical concerns arise when CB research involves children to role-play CB incidents to curate datasets. Through multi-disciplinary collaboration, we propose strategies for consideration when developing CB detection systems. We present our position on leveraging large language models (LLMs) such as Claude-2 and Llama2-Chat as an alternative approach to generate CB-related role-playing datasets. Our goal is to assist researchers, policymakers, and online platforms in making informed decisions regarding the automation of CB incident detection and intervention. By addressing these complexities, our research contributes to a more nuanced and effective approach to combating CB especially in young people.
2023
pdf
abs
DCU at SemEval-2023 Task 10: A Comparative Analysis of Encoder-only and Decoder-only Language Models with Insights into Interpretability
Kanishk Verma
|
Kolawole Adebayo
|
Joachim Wagner
|
Brian Davis
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
We conduct a comparison of pre-trained encoder-only and decoder-only language models with and without continued pre-training, to detect online sexism. Our fine-tuning-based classifier system achieved the 16th rank in the SemEval 2023 Shared Task 10 Subtask A that asks to distinguish sexist and non-sexist texts. Additionally, we conduct experiments aimed at enhancing the interpretability of systems designed to detect online sexism. Our findings provide insights into the features and decision-making processes underlying our classifier system, thereby contributing to a broader effort to develop explainable AI models to detect online sexism.
2022
pdf
bib
Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference
Kolawole Adebayo
|
Rohan Nanda
|
Kanishk Verma
|
Brian Davis
Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference
pdf
abs
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts
Kanishk Verma
|
Tijana Milosevic
|
Keith Cortis
|
Brian Davis
Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference
Cyberbullying is bullying perpetrated via the medium of modern communication technologies like social media networks and gaming platforms. Unfortunately, most existing datasets focusing on cyberbullying detection or classification are i) limited in number ii) usually targeted to one specific online social networking (OSN) platform, or iii) often contain low-quality annotations. In this study, we fine-tune and benchmark state of the art neural transformers for the binary classification of cyberbullying in social media texts, which is of high value to Natural Language Processing (NLP) researchers and computational social scientists. Furthermore, this work represents the first step toward building neural language models for cross OSN platform cyberbullying classification to make them as OSN platform agnostic as possible.
pdf
abs
Can Attention-based Transformers Explain or Interpret Cyberbullying Detection?
Kanishk Verma
|
Tijana Milosevic
|
Brian Davis
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)
Automated textual cyberbullying detection is known to be a challenging task. It is sometimes expected that messages associated with bullying will either be a) abusive, b) targeted at a specific individual or group, or c) have a negative sentiment. Transfer learning by fine-tuning pre-trained attention-based transformer language models (LMs) has achieved near state-of-the-art (SOA) precision in identifying textual fragments as being bullying-related or not. This study looks closely at two SOA LMs, BERT and HateBERT, fine-tuned on real-life cyberbullying datasets from multiple social networking platforms. We intend to determine whether these finely calibrated pre-trained LMs learn textual cyberbullying attributes or syntactical features in the text. The results of our comprehensive experiments show that despite the fact that attention weights are drawn more strongly to syntactical features of the text at every layer, attention weights cannot completely account for the decision-making of such attention-based transformers.
2021
pdf
abs
Fine-tuning Neural Language Models for Multidimensional Opinion Mining of English-Maltese Social Data
Keith Cortis
|
Kanishk Verma
|
Brian Davis
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
This paper presents multidimensional Social Opinion Mining on user-generated content gathered from newswires and social networking services in three different languages: English —a high-resourced language, Maltese —a low-resourced language, and Maltese-English —a code-switched language. Multiple fine-tuned neural classification language models which cater for the i) English, Maltese and Maltese-English languages as well as ii) five different social opinion dimensions, namely subjectivity, sentiment polarity, emotion, irony and sarcasm, are presented. Results per classification model for each social opinion dimension are discussed.