SINAI at SemEval-2024 Task 8: Fine-tuning on Words and Perplexity as Features for Detecting Machine Written Text

Alberto Gutiérrez Megías, L. Alfonso Ureña-lópez, Eugenio Martínez Cámara


Abstract
This work presents the systems proposed by the SINAI team for Subtask A of Task 8 at SemEval 2024. We present the evaluation of two disparate systems, and our final submitted system. We claim that the perplexity value of a text may be used as a classification signal. Accordingly, we conduct a study on the utility of perplexity for discerning text authorship, and we perform a comparative analysis of the results obtained on the datasets of the task. This comparative evaluation includes results from the systems evaluated, such as fine-tuning an XLM-RoBERTa-Large transformer or using perplexity as a classification criterion. In addition, we discuss the results reached on the test set, where we show that there are large differences between the language probability distributions of the training and test sets. This analysis allows us to open new research lines to improve the detection of machine-generated text.
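To illustrate what using perplexity as a classification criterion can look like, the following is a minimal sketch and not the authors' implementation. It assumes per-token probabilities have already been obtained from some language model; the function names, the sample probabilities, the threshold value, and the choice to label low-perplexity text as machine-written are all hypothetical choices made for illustration.

```python
import math


def perplexity(token_probs):
    """Return exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def classify_by_perplexity(token_probs, threshold):
    """Label a text by comparing its perplexity with a threshold.

    Assumption: text the model finds more predictable (lower perplexity)
    is labelled 'machine'. Both the direction and the threshold are
    hypothetical and would need to be tuned on training data.
    """
    return "machine" if perplexity(token_probs) < threshold else "human"


# Hypothetical per-token probabilities under some language model.
predictable = [0.5, 0.5, 0.5, 0.5]
surprising = [0.1, 0.1, 0.1, 0.1]

print(perplexity(predictable))
print(classify_by_perplexity(predictable, threshold=5.0))
print(classify_by_perplexity(surprising, threshold=5.0))
```

A fixed threshold of this kind depends on the probability distribution of the data it was tuned on, so it may transfer poorly when the training and test distributions differ, as the abstract reports for this task.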
Anthology ID:
2024.semeval-1.216
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1505–1510
URL:
https://aclanthology.org/2024.semeval-1.216
Cite (ACL):
Alberto Gutiérrez Megías, L. Alfonso Ureña-lópez, and Eugenio Martínez Cámara. 2024. SINAI at SemEval-2024 Task 8: Fine-tuning on Words and Perplexity as Features for Detecting Machine Written Text. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1505–1510, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
SINAI at SemEval-2024 Task 8: Fine-tuning on Words and Perplexity as Features for Detecting Machine Written Text (Gutiérrez Megías et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/naacl-24-ingestion-2/2024.semeval-1.216.pdf
Supplementary material:
 2024.semeval-1.216.SupplementaryMaterial.txt