Arshad Jhumka


2025

SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations
Xingwei Tan | Chen Lyu | Hafiz Muhammad Umer | Sahrish Khan | Mahathi Parvatham | Lois Arthurs | Simon Cullen | Shelley Wilson | Arshad Jhumka | Gabriele Pergola
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)

Detecting toxic language, including sexism, harassment, and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce *SafeSpeech*, a comprehensive platform for toxic content detection and analysis that bridges message-level and conversation-level insights. The platform integrates fine-tuned classifiers and large language models (LLMs) to enable multi-granularity detection, toxic-aware conversation summarization, and persona profiling. *SafeSpeech* also incorporates explainability mechanisms, such as perplexity gain analysis, to highlight the linguistic elements driving predictions. Evaluations on benchmark datasets, including EDOS, OffensEval, and HatEval, demonstrate the reproduction of state-of-the-art performance across multiple tasks, including fine-grained sexism detection.

2024

U-BERTopic: An Urgency-Aware BERT-Topic Modeling Approach for Detecting CyberSecurity Issues via Social Media
Majed Albarrak | Gabriele Pergola | Arshad Jhumka
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security

For computer systems to remain secure, timely information about system vulnerabilities and security threats is vital. Such information can be garnered from various sources, most notably from social media platforms. However, such information may often lack context and structure and, more importantly, is often unlabelled. For such media to act as alert systems, it is important to be able to first distinguish among the topics being discussed. Subsequently, identifying the nature of the threat or vulnerability is of importance as this will influence the remedial actions to be taken, e.g., is the threat imminent? In this paper, we propose U-BERTopic, an urgency-aware BERT-topic modelling approach for detecting cybersecurity issues through social media, by integrating sentiment analysis with contextualized topic modelling like BERTopic. We compare U-BERTopic against three other topic modelling techniques using four different evaluation metrics for topic modelling and cybersecurity classification by running on a 2018 cybersecurity-related Twitter dataset. Our results show that (i) for topic modelling and under certain settings (e.g., number of topics), U-BERTopic often outperforms all other topic modelling techniques and (ii) for attack classification, U-BERTopic performs better for some attacks such as vulnerability identification in some settings.