Fighting Cyber-malice: A Forensic Linguistics Approach to Detecting AI-generated Malicious Texts

Rui Sousa-Silva


Abstract
Technology has long been used for criminal purposes, but the technological developments of recent decades have allowed users to remain anonymous online, which in turn has increased the volume and heterogeneity of cybercrimes and made it more difficult for law enforcement agencies to detect and fight them. However, because they ignore the very nature of language, cybercriminals tend to overlook the potential of linguistic analysis to positively identify them by the language that they use. Forensic linguistics research and practice have therefore proven reliable in fighting cybercrime, either by analysing authorship to confirm or reject law enforcement agents’ suspicions, or by sociolinguistically profiling the author of cybercriminal communications to provide investigators with sociodemographic information that helps guide the investigation. However, large language models and generative AI have raised new challenges: not only has cybercrime increased as a result of AI-generated texts, but generative AI also makes it more difficult for forensic linguists to attribute the authorship of texts to the perpetrators. This paper argues that, although a shift of focus is required, forensic linguistics plays a core role in detecting and fighting cybercrime. A focus on deep linguistic features, rather than low-level and purely stylistic elements, has the potential to discriminate between human- and AI-generated texts and provide the investigation with vital information. We conclude by discussing the foreseeable future limitations, especially those resulting from the developments expected from language models.
Anthology ID:
2024.nlpaics-1.19
Volume:
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month:
July
Year:
2024
Address:
Lancaster, UK
Editors:
Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue:
NLPAICS
Publisher:
International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages:
164–174
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.19/
Cite (ACL):
Rui Sousa-Silva. 2024. Fighting Cyber-malice: A Forensic Linguistics Approach to Detecting AI-generated Malicious Texts. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 164–174, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal):
Fighting Cyber-malice: A Forensic Linguistics Approach to Detecting AI-generated Malicious Texts (Sousa-Silva, NLPAICS 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.19.pdf