Alberto José Gutiérrez Megías

2024

The Influence of the Perplexity Score in the Detection of Machine-generated Texts
Alberto José Gutiérrez Megías | L. Alfonso Ureña-López | Eugenio Martínez Cámara
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security

The high performance of large language models (LLMs) in generating natural language represents a real threat, since they can be leveraged to generate any kind of deceptive content. Since there are still disparities between machine-generated language and human language, we claim that perplexity may be used as a classification signal to discern between machine and human text. We propose a classification model based on XLM-RoBERTa, and we evaluate it on the M4 dataset. The results show that the perplexity score is useful for identifying machine-generated text, but its usefulness is constrained by the differences among the LLMs used in the training and test sets.
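
For readers unfamiliar with the signal, the following is a minimal sketch of the standard perplexity definition, the exponential of the negative mean token log-probability. The abstract does not say how the authors compute perplexity or how they combine it with XLM-RoBERTa, so the function name `perplexity` and the example log-probabilities below are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities that a scoring language model
# might assign to two texts (made-up values, for illustration only).
predictable = [math.log(0.5)] * 4   # every token was fairly expected
surprising = [math.log(0.1)] * 4    # every token was fairly unexpected

print(perplexity(predictable))
print(perplexity(surprising))
```

Lower perplexity means the scoring model found the text more predictable. In practice the log-probabilities would come from a language model scoring the text, and the resulting value could be used as one input feature to a classifier.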

Venues

nlpaics1
