iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification

Andric Valdez, Fernando Márquez, Jorge Pantaleón, Helena Gómez, Gemma Bel-enguix


Abstract
Large language models (LLMs) are artificial intelligence systems that can generate text, translate languages, and answer questions in a human-like way. While these advances are impressive, there is concern that LLMs could also be used to generate fake or misleading content. In this work, as part of our participation in SemEval-2024 Task 8, we investigate the ability of LLMs to identify whether a given text was written by a human or by a specific AI model. We believe that human and machine writing styles follow different patterns, so integrating features at different language levels can help in this classification task. For this reason, we evaluate several LLMs that aim to extract valuable multilevel information (lexical, semantic, and syntactic) from the text during training. Our best scores on Subtask A (monolingual) and Subtask B were 71.5% and 38.2% accuracy, respectively (both using the ConvBERT LLM); for both subtasks, the baseline (RoBERTa) achieved an accuracy of 74%.
Anthology ID:
2024.semeval-1.161
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1110–1114
URL:
https://aclanthology.org/2024.semeval-1.161
Cite (ACL):
Andric Valdez, Fernando Márquez, Jorge Pantaleón, Helena Gómez, and Gemma Bel-enguix. 2024. iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1110–1114, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification (Valdez et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.semeval-1.161.pdf
Supplementary material:
 2024.semeval-1.161.SupplementaryMaterial.txt