NUST CodeIntel at SemEval-2026 Task 13: Cross-Domain Detection of Machine-Generated Code via Stylometric Features and Transformer Models

Azher Ali, Mehwish Fatima


Abstract
We present our submission to SemEval-2026 Task 13 on cross-language and cross-domain detection of machine-generated code. We compare TF-IDF-based models with stylometric features against LoRA-tuned transformer encoders. While transformers achieve near-perfect in-distribution performance, they degrade sharply on unseen languages and domains. In contrast, a TF-IDF + Logistic Regression model attains the best test Macro-F1 and shows greater robustness. These results highlight the limitations of neural models under distribution shift and the strength of lexical baselines for cross-domain generalization.
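The abstract reports Macro-F1 as the evaluation metric. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the unweighted mean of per-class F1 scores. It is not the task's official scorer, and the label names `"human"` and `"machine"` are placeholders assumed for illustration; the abstract does not state the label scheme.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the paper.
y_true = ["human", "human", "machine", "machine"]
y_pred = ["human", "machine", "machine", "machine"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally regardless of its frequency, a detector that does well only on the majority class is penalized, which is why the metric suits comparisons under distribution shift.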
Anthology ID:
2026.semeval-1.61
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
426–433
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.61/
Cite (ACL):
Azher Ali and Mehwish Fatima. 2026. NUST CodeIntel at SemEval-2026 Task 13: Cross-Domain Detection of Machine-Generated Code via Stylometric Features and Transformer Models. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 426–433, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
NUST CodeIntel at SemEval-2026 Task 13: Cross-Domain Detection of Machine-Generated Code via Stylometric Features and Transformer Models (Ali & Fatima, SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.61.pdf
Supplementary material:
2026.semeval-1.61.SupplementaryMaterial.zip