LATE-IIMAS at Semeval-2026 Task 13: Evaluating GNNs, PLMs, LLMs, and Stylometry for Automatic Code Identification

Andric Valdez, Emmanuel Ancona, Sebastián Bernardino, Helena Gomez-Adorno, Fazlourrahman Balouchzahi, Fabian Herrera


Abstract
The generation of source code via Artificial Intelligence has become a prevalent practice in both academia and industry, posing significant challenges to academic integrity and authorship attribution. In this work, we address SemEval-2026 Task 13: Detecting Machine-Generated Code by evaluating the effectiveness of four distinct methodologies: Graph Neural Networks (GNNs), Pre-trained Language Models (PLMs), Large Language Models (LLMs), and Stylometric Feature Engineering using XGBoost. Our approach focuses on three specific scenarios: Subtask A (Binary Detection), Subtask B (Multi-Class Authorship), and Subtask C (Hybrid Code Detection). While our models achieved high performance during the validation phase, the transition to the final test set revealed substantial challenges in generalization, likely due to the increased diversity of programming languages and generators in the unseen data. This work serves as a foundational first step, identifying critical gaps in model robustness and highlighting the need for more sophisticated methodologies to bridge the performance gap in complex, real-world environments.
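The abstract names stylometric feature engineering with XGBoost as one of the four approaches but does not list the features used. The sketch below shows the general idea under stated assumptions. The feature set (line-length statistics, blank-line and comment-line ratios, whitespace ratio, identifier length, tab indentation) and the names `stylometric_features`, `to_vector`, and `FEATURE_NAMES` are hypothetical illustrations, not the authors' pipeline. It turns one code snippet into a fixed-length numeric vector, which is the kind of input a gradient-boosted tree classifier such as `xgboost.XGBClassifier` expects.

```python
import re
from statistics import mean

# Hypothetical, illustrative feature set; NOT the feature set used in the paper.
FEATURE_NAMES = (
    "n_lines",
    "blank_line_ratio",
    "comment_line_ratio",
    "avg_line_length",
    "max_line_length",
    "whitespace_ratio",
    "avg_identifier_length",
    "tab_indent_ratio",
)

# Identifier-like tokens; keywords and words inside comments are counted too.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def stylometric_features(source: str) -> dict:
    """Compute simple, language-agnostic layout and naming statistics for one snippet."""
    lines = source.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    n_lines = max(len(lines), 1)   # avoid division by zero on empty input
    n_chars = max(len(source), 1)
    # Line comments in common languages (Python/shell "#", C-family "//").
    comment_lines = [ln for ln in non_empty if ln.lstrip().startswith(("#", "//"))]
    tab_indented = [ln for ln in non_empty if ln.startswith("\t")]
    tokens = TOKEN_RE.findall(source)
    return {
        "n_lines": float(len(lines)),
        "blank_line_ratio": (len(lines) - len(non_empty)) / n_lines,
        "comment_line_ratio": len(comment_lines) / n_lines,
        "avg_line_length": float(mean(len(ln) for ln in non_empty)) if non_empty else 0.0,
        "max_line_length": float(max((len(ln) for ln in lines), default=0)),
        "whitespace_ratio": sum(ch.isspace() for ch in source) / n_chars,
        "avg_identifier_length": float(mean(len(t) for t in tokens)) if tokens else 0.0,
        "tab_indent_ratio": len(tab_indented) / max(len(non_empty), 1),
    }


def to_vector(features: dict) -> list:
    """Order features consistently so every snippet maps to the same columns."""
    return [features[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    snippet = "def add(a, b):\n    # sum two numbers\n    return a + b\n\n"
    print(to_vector(stylometric_features(snippet)))
```

Stacking one `to_vector(...)` row per snippet yields the feature matrix `X` that would be passed, together with per-snippet labels for the relevant subtask (for example human vs. machine in Subtask A), to the classifier's `fit` method. Hand-crafted layout features like these are cheap and interpretable, but they are tied to formatting habits, which is one plausible reason such models can struggle when unseen languages and generators appear at test time.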
Anthology ID: 2026.semeval-1.339
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2689–2696
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.339/
Cite (ACL): Andric Valdez, Emmanuel Ancona, Sebastián Bernardino, Helena Gomez-Adorno, Fazlourrahman Balouchzahi, and Fabian Herrera. 2026. LATE-IIMAS at Semeval-2026 Task 13: Evaluating GNNs, PLMs, LLMs, and Stylometry for Automatic Code Identification. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2689–2696, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): LATE-IIMAS at Semeval-2026 Task 13: Evaluating GNNs, PLMs, LLMs, and Stylometry for Automatic Code Identification (Valdez et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.339.pdf
Supplementary material: 2026.semeval-1.339.SupplementaryMaterial.zip