MIUN BiasPatrol at SemEval-2026 Task 13: Why TF-IDF can Beat Transformers for OOD Code Detection

Loviza Sahlen; Thomas Springfeldt; Mehwish Fatima; Raja Khurram Shahzad

Abstract

The increasing use of AI-generated code underscores the need for effective detection systems, yet detector performance often deteriorates under distribution shift. This paper presents our system for Subtask A of SemEval-2026 Task 13, which focuses on binary classification of human-written versus machine-generated code across various programming languages and domains. We systematically compare traditional classifiers, such as Random Forest and XGBoost, which use statistical and TF-IDF features, against pipelines built on frozen embeddings from code transformers such as UniXcoder and GraphCodeBERT. Our results reveal a notable trade-off: while transformer-based pipelines excel in in-distribution validation (reaching up to 0.89 Macro F1), they suffer severe performance drops of up to 77% when applied to out-of-distribution languages and domains. In contrast, models that combine TF-IDF with tree-based classifiers are significantly more stable. We attribute this fragility to a bias toward superficial formatting, particularly whitespace, which transformer embeddings accentuate. Simple space normalization improves the out-of-distribution robustness of the traditional models, but it also highlights that the embeddings continue to depend on these non-semantic features. Our findings suggest that, for building generalizable code detection systems, straightforward, well-normalized lexical features may be more reliable than complex, unrefined embeddings.
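
The space normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact scheme, so the function below is an assumption rather than the authors' implementation: the name `normalize_spaces` is hypothetical, and the choice to collapse runs of spaces and tabs, strip indentation, and drop blank lines is one plausible reading of "simple space normalization".

```python
import re


def normalize_spaces(code: str) -> str:
    """Remove formatting-only whitespace differences (assumed scheme, not the paper's)."""
    # Unify Windows and old Mac line endings to "\n".
    code = code.replace("\r\n", "\n").replace("\r", "\n")
    lines = []
    for line in code.split("\n"):
        # Collapse runs of spaces/tabs to one space, then strip both ends
        # (this discards indentation, which is fine for features but not for execution).
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # drop blank lines
            lines.append(line)
    return "\n".join(lines)


# Two snippets that differ only in formatting.
tab_style = "def f(x):\n\treturn  x + 1\n\n"
space_style = "def f(x):\r\n    return x + 1"

print(normalize_spaces(tab_style) == normalize_spaces(space_style))
```

After this step, snippets that differ only in indentation or spacing map to the same string, so a lexical feature extractor such as scikit-learn's `TfidfVectorizer` would see identical input for both and could not key on whitespace style.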

Anthology ID: 2026.semeval-1.312
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2469–2474
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.312/
Cite (ACL): Loviza Sahlen, Thomas Springfeldt, Mehwish Fatima, and Raja Khurram Shahzad. 2026. MIUN BiasPatrol at SemEval-2026 Task 13: Why TF-IDF can Beat Transformers for OOD Code Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2469–2474, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): MIUN BiasPatrol at SemEval-2026 Task 13: Why TF-IDF can Beat Transformers for OOD Code Detection (Sahlen et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.312.pdf
Supplementary material: 2026.semeval-1.312.SupplementaryMaterial.zip
