Codexa at SemEval-2026 Task 13: Loss Engineering and Diverse Ensemble Strategies for Multi-Class Code Authorship Attribution

Anıl Dervişoğlu; Atakan Site

Abstract

We describe our system for SemEval-2026 Task 13, Subtask B, which classifies code into 11 categories (human-written or generated by one of 10 LLM families). The task combines extreme class imbalance with distribution shift across generators: the training data covers 31 generators and the test data 59, of which 36 are unseen during training. Our approach has two components: (1) a UniXcoder encoder trained with Label-Distribution-Aware Margin (LDAM) loss to handle class imbalance, which yields a +7% absolute improvement over the cross-entropy baseline; and (2) a diverse ensemble of 12 models trained with different objectives and architectures (detailed in the appendix), combined by hard voting. Our system achieves 41.28% Macro F1 on the official test set. We find that loss engineering and ensemble diversity matter more than domain adaptation techniques, which consistently degraded test performance.
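The two components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard LDAM formulation, in which each class margin is proportional to n_j^(-1/4) and is subtracted from the true-class logit before a scaled cross-entropy. The function names, the `max_margin=0.5` and `s=30.0` defaults, and the tie-breaking rule in the vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4).

    Margins are rescaled so the rarest class receives max_margin
    (an assumed value, not taken from the paper).
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def ldam_loss(logits, target, margins, s=30.0):
    """Cross-entropy after subtracting the true-class margin.

    s is an assumed logit scale; the log-sum-exp is computed in a
    numerically stable way by factoring out the largest value.
    """
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    scaled = [s * z for z in adjusted]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return log_sum - scaled[target]


def hard_vote(predictions):
    """Majority label among per-model predictions for one sample.

    Ties go to the smallest label (an assumed tie-breaking rule).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)
```

With hypothetical class counts of 10000 and 16, `ldam_margins` assigns the rarer class the larger margin, so a misclassified rare-class example incurs a higher loss than under plain cross-entropy. `hard_vote` would be called once per test sample with one predicted label from each ensemble member.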

Anthology ID: 2026.semeval-1.441
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 3602–3607
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.441/
Cite (ACL): Anıl Dervişoğlu and Atakan Site. 2026. Codexa at SemEval-2026 Task 13: Loss Engineering and Diverse Ensemble Strategies for Multi-Class Code Authorship Attribution. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3602–3607, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Codexa at SemEval-2026 Task 13: Loss Engineering and Diverse Ensemble Strategies for Multi-Class Code Authorship Attribution (Dervişoğlu & Site, SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.441.pdf
Supplementary material: 2026.semeval-1.441.SupplementaryMaterial.zip
