Deciphering Cyber Threats: A Unifying Framework with GPT-3.5, BERTopic and Feature Importance

Chun Man Tsang, Tom Bell, Antonios Gouglidis, Mo El-Haj


Abstract
This paper presents a methodology for categorising cyber threats and quantifying their attributes. The data was sourced from Common Weakness Enumeration (CWE) entries covering 503 hardware and software vulnerabilities. For each entry, GPT-3.5 generated detailed descriptions of 12 key threat attributes. Using BERTopic for topic modelling, we cluster the cyber threats and evaluate several dimensionality reduction and clustering algorithms, finding that UMAP combined with HDBSCAN, with tuned parameters, outperforms the other configurations. We then examine feature importance by recasting the topic modelling results as a classification problem, achieving classification accuracies between 60% and 80% with algorithms such as Random Forest, XGBoost and Linear SVM. This analysis quantifies the importance of each threat attribute; among the methods compared, SHAP proved the most effective for this calculation.
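To make the feature-importance step more concrete, the following is a minimal sketch of the Shapley-value computation that SHAP approximates. It is not the authors' code and does not use the `shap` library: the scoring function `toy_model`, its three inputs, and the single all-zero baseline are hypothetical stand-ins for a trained classifier's output over threat-attribute features. Features outside a coalition are replaced by their baseline value (a simplification; SHAP typically averages over a background dataset), and every coalition is enumerated exactly.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A coalition S keeps the features of x whose indices are in S and
    replaces every other feature with its baseline value.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def mean_abs_importance(model: Model, rows: Sequence[Sequence[float]],
                        baseline: Sequence[float]) -> List[float]:
    """Global importance per feature: mean |Shapley value| over rows."""
    totals = [0.0] * len(baseline)
    for row in rows:
        for i, phi in enumerate(shapley_values(model, row, baseline)):
            totals[i] += abs(phi)
    return [t / len(rows) for t in totals]


# Hypothetical scorer over three attribute features, with one interaction term.
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


x = [1.0, 1.0, 1.0]
base = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, base)
print([round(p, 6) for p in phis])  # [3.5, 1.0, 1.5]
```

The interaction term `3.0 * z[0] * z[2]` is split evenly between features 0 and 2, and the three values sum to `toy_model(x) - toy_model(base)`. Exact enumeration visits 2^(n-1) coalitions per feature, which is still manageable for 12 attributes (2,048 per attribute per prediction) but grows quickly beyond that; for tree ensembles such as Random Forest and XGBoost, the `shap` library's `TreeExplainer` uses a tree-specific algorithm instead. Averaging absolute values over many predictions, as `mean_abs_importance` does, is one common way to turn per-prediction attributions into a global ranking of attributes.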
Anthology ID: 2024.nlpaics-1.20
Volume: Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month: July
Year: 2024
Address: Lancaster, UK
Editors: Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue: NLPAICS
Publisher: International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages: 175–185
URL: https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.20/
Cite (ACL): Chun Man Tsang, Tom Bell, Antonios Gouglidis, and Mo El-Haj. 2024. Deciphering Cyber Threats: A Unifying Framework with GPT-3.5, BERTopic and Feature Importance. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 175–185, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal): Deciphering Cyber Threats: A Unifying Framework with GPT-3.5, BERTopic and Feature Importance (Tsang et al., NLPAICS 2024)
PDF: https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.20.pdf