U-BERTopic: An Urgency-Aware BERT-Topic Modeling Approach for Detecting CyberSecurity Issues via Social Media

Majed Albarrak, Gabriele Pergola, Arshad Jhumka


Abstract
For computer systems to remain secure, timely information about system vulnerabilities and security threats is vital. Such information can be garnered from various sources, most notably from social media platforms. However, such information often lacks context and structure and, more importantly, is often unlabelled. For such media to act as alert systems, it is important first to be able to distinguish among the topics being discussed. Identifying the nature of the threat or vulnerability is then important, as it influences the remedial actions to be taken, e.g., is the threat imminent? In this paper, we propose U-BERTopic, an urgency-aware BERT-topic modelling approach for detecting cybersecurity issues through social media, which integrates sentiment analysis with contextualized topic modelling such as BERTopic. We compare U-BERTopic against three other topic modelling techniques, using four different evaluation metrics for topic modelling and cybersecurity classification, on a 2018 cybersecurity-related Twitter dataset. Our results show that (i) for topic modelling and under certain settings (e.g., number of topics), U-BERTopic often outperforms all other topic modelling techniques and (ii) for attack classification, U-BERTopic performs better for some attacks, such as vulnerability identification, in some settings.
Anthology ID:
2024.nlpaics-1.22
Volume:
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month:
July
Year:
2024
Address:
Lancaster, UK
Editors:
Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue:
NLPAICS
Publisher:
International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages:
196–211
URL:
https://preview.aclanthology.org/moar-dois/2024.nlpaics-1.22/
Cite (ACL):
Majed Albarrak, Gabriele Pergola, and Arshad Jhumka. 2024. U-BERTopic: An Urgency-Aware BERT-Topic Modeling Approach for Detecting CyberSecurity Issues via Social Media. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 196–211, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal):
U-BERTopic: An Urgency-Aware BERT-Topic Modeling Approach for Detecting CyberSecurity Issues via Social Media (Albarrak et al., NLPAICS 2024)
PDF:
https://preview.aclanthology.org/moar-dois/2024.nlpaics-1.22.pdf