Alberto José Gutiérrez Megías
2024
The Influence of the Perplexity Score in the Detection of Machine-generated Texts
Alberto José Gutiérrez Megías, L. Alfonso Ureña-López, Eugenio Martínez Cámara
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
The high performance of large language models (LLMs) in generating natural language represents a real threat, since they can be leveraged to generate any kind of deceptive content. Because there are still disparities between machine-generated and human language, we claim that perplexity may be used as a classification signal to distinguish machine text from human text. We propose a classification model based on XLM-RoBERTa, and we evaluate it on the M4 dataset. The results show that the perplexity score is useful for identifying machine-generated text, but its usefulness is constrained by the differences among the LLMs used in the training and test sets.
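As a point of reference for the perplexity score named in the abstract, the sketch below computes perplexity in its standard form: the exponential of the average negative log-likelihood per token. It is only an illustration under stated assumptions. The function name `perplexity` and the sample probabilities are hypothetical, and the abstract does not say which language model supplies the token probabilities or how the score is combined with the XLM-RoBERTa classifier.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities that some scoring
    language model assigned to each token of a text (assumed input).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities, made up for illustration only.
predictable_text = [math.log(p) for p in (0.9, 0.8, 0.85, 0.9)]
surprising_text = [math.log(p) for p in (0.2, 0.05, 0.1, 0.3)]

print(perplexity(predictable_text))  # lower value: tokens were easy to predict
print(perplexity(surprising_text))   # higher value: tokens were hard to predict
```

A text whose tokens the scoring model finds easy to predict gets a low perplexity, which is the kind of signal the abstract proposes for separating machine text from human text. The score depends on the scoring model, which is consistent with the abstract's caveat about differences among the LLMs used in the training and test sets.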