Königsberg at SemEval-2026 Task 13: Beyond Language Models: A Low-Resource Feature-Driven and Data-Flow Embedding Approach for Machine-Generated Code Detection

Shahir Habib


Abstract

The rise of Large Language Models (LLMs) has increased the need for reliable detection of machine-generated code. This paper presents a low-resource, hybrid detection framework developed for SemEval-2026 Task 13, designed to operate efficiently without the computational overhead of end-to-end fine-tuning of large models. Our approach combines (i) a comprehensive feature extraction pipeline that calculates interpretable software metrics capturing stylistic and structural properties of code, and (ii) frozen embeddings extracted from the pre-trained GraphCodeBERT encoder, which model semantic and data-flow information while preserving generalizability. This fusion enables efficient detection of machine-generated code across multiple programming languages (Python, C++, Java, and Go) and improves robustness under out-of-distribution settings. The resulting feature-driven approach offers a competitive, computationally efficient alternative to fully fine-tuned, purely LLM-based models, achieving an F1-score of 38.26.
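
The fusion of interpretable metrics with frozen embeddings can be illustrated with a minimal sketch, assuming early fusion by simple concatenation. The helpers `stylistic_features` and `fuse`, the specific metrics, and the three-value embedding are hypothetical illustrations, not the paper's actual feature set or code. In the described system, the embedding part would come from GraphCodeBERT's frozen pre-trained encoder rather than a placeholder list.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical line-comment markers for the four task languages.
COMMENT_PREFIX = {"python": "#", "cpp": "//", "java": "//", "go": "//"}
# Matches identifier-like tokens (language keywords are included).
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _entropy(tokens: List[str]) -> float:
    """Shannon entropy (in bits) of the token frequency distribution."""
    if not tokens:
        return 0.0
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in Counter(tokens).values())


def stylistic_features(code: str, language: str) -> Dict[str, float]:
    """Compute a few interpretable stylistic/structural metrics (illustrative only).

    Raises KeyError if `language` is not one of the keys in COMMENT_PREFIX.
    """
    prefix = COMMENT_PREFIX[language]
    lines = code.splitlines()
    n_lines = max(len(lines), 1)  # avoid division by zero on empty input
    non_empty = [ln for ln in lines if ln.strip()]
    comments = [ln for ln in non_empty if ln.lstrip().startswith(prefix)]
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in non_empty]
    identifiers = IDENTIFIER_RE.findall(code)

    return {
        "num_lines": float(len(lines)),
        "blank_line_ratio": (len(lines) - len(non_empty)) / n_lines,
        "comment_ratio": len(comments) / n_lines,
        "avg_line_length": sum(len(ln) for ln in lines) / n_lines,
        "avg_indent": sum(indents) / len(indents) if indents else 0.0,
        "avg_identifier_length": (
            sum(len(t) for t in identifiers) / len(identifiers) if identifiers else 0.0
        ),
        "identifier_entropy": _entropy(identifiers),
    }


def fuse(features: Dict[str, float], embedding: List[float]) -> List[float]:
    """Early fusion by concatenation: [handcrafted metrics | frozen embedding].

    Keys are sorted so the feature order is identical for every sample.
    """
    return [features[k] for k in sorted(features)] + list(embedding)


# Usage: the embedding below is a placeholder for a frozen encoder output.
snippet = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
feats = stylistic_features(snippet, "python")
vector = fuse(feats, [0.1, -0.2, 0.3])
print(len(feats), len(vector))
```

Concatenation keeps the handcrafted metrics individually inspectable while letting a downstream classifier also use the semantic signal. The encoder itself is never updated, which is what avoids the cost of end-to-end fine-tuning.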

Anthology ID:: 2026.semeval-1.169
Volume:: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:: July
Year:: 2026
Address:: San Diego, California, USA
Editors:: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:: SemEval | WS
Publisher:: Association for Computational Linguistics
Pages:: 1288–1297
URL:: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.169/
Cite (ACL):: Shahir Habib. 2026. Königsberg at SemEval-2026 Task 13: Beyond Language Models: A Low-Resource Feature-Driven and Data-Flow Embedding Approach for Machine-Generated Code Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1288–1297, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):: Königsberg at SemEval-2026 Task 13: Beyond Language Models: A Low-Resource Feature-Driven and Data-Flow Embedding Approach for Machine-Generated Code Detection (Habib, SemEval 2026)
PDF:: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.169.pdf
Supplementary material:: 2026.semeval-1.169.SupplementaryMaterial.zip