Andric Valdez

2026

LATE-IIMAS at Semeval-2026 Task 13: Evaluating GNNs, PLMs, LLMs, and Stylometry for Automatic Code Identification
Andric Valdez | Emmanuel Ancona | Sebastián Bernardino | Helena Gomez-Adorno | Fazlourrahman Balouchzahi | Fabian Herrera
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

The generation of source code via Artificial Intelligence has become a prevalent practice in both academia and industry, posing significant challenges to academic integrity and authorship attribution. In this work, we address SemEval-2026 Task 13: Detecting Machine-Generated Code by evaluating the effectiveness of four distinct methodologies: Graph Neural Networks (GNNs), Pre-trained Language Models (PLMs), Large Language Models (LLMs), and Stylometric Feature Engineering using XGBoost. Our approach focuses on three specific scenarios: Subtask A (Binary Detection), Subtask B (Multi-Class Authorship), and Subtask C (Hybrid Code Detection). While our models achieved high performance during the validation phase, the transition to the final test set revealed substantial challenges in generalization, likely due to the increased diversity of programming languages and generators in the unseen data. This work serves as a foundational first step, identifying critical gaps in model robustness and highlighting the need for more sophisticated methodologies to bridge the performance gap in complex, real-world environments.

2024

iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification
Andric Valdez | Fernando Márquez | Jorge Pantaleón | Helena Gómez | Gemma Bel-enguix
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Large language models (LLMs) are artificial intelligence systems that can generate text, translate languages, and answer questions in a human-like way. While these advances are impressive, there is concern that LLMs could also be used to generate fake or misleading content. In this work, as part of our participation in SemEval-2024 Task 8, we investigate the ability of LLMs to identify whether a given text was written by a human or by a specific AI. We believe that human and machine writing style patterns are different from each other, so integrating features at different language levels can help in this classification task. For this reason, we evaluate several LLMs that aim to extract valuable multilevel information (such as lexical, semantic, and syntactic) from the text during their training process. Our best scores on Subtask A (monolingual) and Subtask B were 71.5% and 38.2% in accuracy, respectively (both using the ConvBERT LLM); for both subtasks, the baseline (RoBERTa) achieved an accuracy of 74%.

Co-authors

Helena Gomez Adorno 1

Fabian Herrera 1

Fernando Márquez 1

Jorge Pantaleón 1

Venues

SemEval 2
WS 1
