Andric Valdez


2024

pdf
iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification
Andric Valdez | Fernando Márquez | Jorge Pantaleón | Helena Gómez | Gemma Bel-enguix
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Large language models (LLMs) are artificial intelligence systems that can generate text, translate languages, and answer questions in a human-like way. While these advances are impressive, there is concern that LLMs could also be used to generate fake or misleading content. In this work, as a part of our participation in SemEval-2024 Task-8, we investigate the ability of LLMs to identify whether a given text was written by a human or by a specific AI. We believe that human and machine writing style patterns are different from each other, so integrating features at different language levels can help in this classification task. For this reason, we evaluate several LLMs that aim to extract valuable multilevel information (such as lexical, semantic, and syntactic) from the text in their training processing. Our best scores on Sub- taskA (monolingual) and SubtaskB were 71.5% and 38.2% in accuracy, respectively (both using the ConvBERT LLM); for both subtasks, the baseline (RoBERTa) achieved an accuracy of 74%.