A Proposal Framework Security Assessment for Large Language Models

Daniel Mendonça Colares, Raimir Holanda Filho, Luis Borges Gouveia


Abstract
Large Language Models (LLMs), despite their numerous applications and the significant benefits they offer, have proven to be extremely susceptible to attacks of various natures. Due to their large number of vulnerabilities, often unknown, and which consequently become potential targets for attacks, investing in the implementation of this technology becomes a gamble. Ensuring the security of LLMs is of utmost importance, but unfortunately, providing effective security for so many different vulnerabilities is a costly task, especially for companies seeking rapid growth. Many studies focus on analyzing the security of LLMs for specific types of vulnerabilities, such as prompt inject or jailbreaking, but they rarely assess the security of the model as a whole. Therefore, this study aims to facilitate the evaluation of vulnerabilities across various models and identify their main weaknesses. To achieve this, our work sought to develop a comprehensive framework capable of utilizing various scanners to assess the security of LLMs, allowing for a detailed analysis of their vulnerabilities. Through the use of the framework, we tested and evaluated multiple models, and with the results collected from these assessments of various vulnerabilities for each model tested, we analyzed the obtained data. Our results not only demonstrated potential weaknesses in certain models but also revealed a possible relationship between model security and the number of parameters for similar models.
Anthology ID:
2024.nlpaics-1.23
Volume:
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month:
July
Year:
2024
Address:
Lancaster, UK
Editors:
Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue:
NLPAICS
SIG:
Publisher:
International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Note:
Pages:
212–219
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.23/
DOI:
Bibkey:
Cite (ACL):
Daniel Mendonça Colares, Raimir Holanda Filho, and Luis Borges Gouveia. 2024. A Proposal Framework Security Assessment for Large Language Models. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 212–219, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal):
A Proposal Framework Security Assessment for Large Language Models (Colares et al., NLPAICS 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.23.pdf