MALTO at SemEval-2026 Task 13: Detecting Human, AI, and Hybrid Code via Hard Negative Mining and Curriculum-Driven Ensembles

Hüseyin Arslan, Evren Ayberk Munis, Timofei Khudonogov, Mert Akgun, Murat Besli, Ayhan Meherrem, Claudio Savelli, Flavio Giobergia


Abstract
The rapid advancement of Large Language Models (LLMs) has significantly impacted software engineering, posing challenges for determining the origin and authenticity of source code. This paper presents the MALTO team’s submission for SemEval-2026 Task 13, explicitly focusing on Subtask B (Authorship Attribution among 11 classes) and Subtask C (Hybrid Code Detection). To address severe class imbalance and the complex boundaries of mixed human-machine code, we propose a unified framework that leverages an ensemble of UniXcoder and CodeT5. Our approach integrates a robust Tree-sitter-based Universal Canonicalization strategy, Data Augmentation, and a novel 3-Phase Curriculum Training schedule enhanced by Hard Negative Mining. Specifically, UniXcoder’s cross-modal representations excel at distinguishing among semantically overlapping LLM families (Subtask B), whereas CodeT5’s identifier-aware architecture is superior at detecting subtle structural anomalies in hybrid and adversarial snippets (Subtask C). By aggregating these complementary strengths, our soft-voting ensemble overcomes the limitations of individual models, demonstrating strong robustness against imbalanced distributions and effectively discriminating between purely human, purely machine, hybrid, and adversarial code snippets.
Anthology ID:
2026.semeval-1.304
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2413–2420
Language:
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.304/
DOI:
Bibkey:
Cite (ACL):
Hüseyin Arslan, Evren Ayberk Munis, Timofei Khudonogov, Mert Akgun, Murat Besli, Ayhan Meherrem, Claudio Savelli, and Flavio Giobergia. 2026. MALTO at SemEval-2026 Task 13: Detecting Human, AI, and Hybrid Code via Hard Negative Mining and Curriculum-Driven Ensembles. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2413–2420, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
MALTO at SemEval-2026 Task 13: Detecting Human, AI, and Hybrid Code via Hard Negative Mining and Curriculum-Driven Ensembles (Arslan et al., SemEval 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.304.pdf