YNU-HPCC at SemEval-2026 Task 13: Robust Machine-Generated Code Detection under Distribution Shifts

Lixian Xing, Jin Wang, Xuejie Zhang


Abstract
As Large Language Models (LLMs) become prevalent in software development, distinguishing machine-generated from human-written code is increasingly important. This paper describes the system developed by the YNU-HPCC team for SemEval-2026 Task 13, which evaluates detection under cross-language, multi-generator, and hybrid settings. Three modeling paradigms are systematically examined: encoder-based fine-tuning, feature-based machine learning, and task-specific robustness strategies. For Subtask A (Binary Detection), frozen pre-trained encoders and shallow stylometric features exhibit stronger cross-domain robustness than full fine-tuning, with indentation entropy identified as a key discriminative signal. For Subtask B (Multi-Class Attribution), a hierarchical two-stage framework is adopted to decouple human–machine discrimination from fine-grained generator attribution, alleviating severe class imbalance. For Subtask C (Hybrid Detection), a token-level splicing augmentation strategy combined with Supervised Contrastive Learning and Focal Loss is employed to model intra-sample stylistic variation. According to the official leaderboard, our system ranked 12th out of 81 teams in Subtask A, 14th out of 34 in Subtask B, and 8th out of 32 in Subtask C.
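The abstract names indentation entropy as a key stylometric signal for Subtask A but does not define it. The sketch below shows one plausible reading, offered as an assumption rather than the authors' formulation: the Shannon entropy, in bits, of the distribution of leading-whitespace widths over non-blank lines. The function name `indentation_entropy` and the `tab_width` parameter are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def indentation_entropy(code: str, tab_width: int = 4) -> float:
    """Shannon entropy (bits) of leading-whitespace widths over non-blank lines.

    Assumed definition for illustration only; the paper's exact feature
    may be computed differently.
    """
    widths = []
    for line in code.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation signal
        expanded = line.expandtabs(tab_width)  # normalize tabs to spaces
        widths.append(len(expanded) - len(expanded.lstrip(" ")))

    if not widths:
        return 0.0

    counts = Counter(widths)
    total = len(widths)
    # H = sum_i p_i * log2(1 / p_i), with p_i the share of lines at width i
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Under this definition, a snippet whose lines all share one indentation width scores 0, and the score grows as lines spread more evenly across more distinct widths. Such a value could serve as one column in a feature vector for a shallow classifier.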
Anthology ID:
2026.semeval-1.205
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
1582–1590
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.205/
Cite (ACL):
Lixian Xing, Jin Wang, and Xuejie Zhang. 2026. YNU-HPCC at SemEval-2026 Task 13: Robust Machine-Generated Code Detection under Distribution Shifts. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1582–1590, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
YNU-HPCC at SemEval-2026 Task 13: Robust Machine-Generated Code Detection under Distribution Shifts (Xing et al., SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.205.pdf
Supplementary material:
2026.semeval-1.205.SupplementaryMaterial.tex