FMISUYotkovaKastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals
Elitsa Yotkova, Violeta Kastreva, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov
Abstract
SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our participation in Subtask A (binary classification) and explores both pretrained code encoders and lightweight feature-based methods. We design ratio-based features that are less sensitive to snippet length. To support the extraction of descriptiveness-related signals, we use parsing engines and a programming-language classifier. Additionally, we train a separate code-vs-text line classifier to identify raw natural language segments embedded within samples. We combine a shallow decision tree with heuristic rules derived from data analysis to produce the final predictions. Our approach is computationally efficient, requires only CPU resources for training, and achieves near-instant inference time, offering a lightweight alternative to large pretrained models.
- Anthology ID: 2026.semeval-1.275
- Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
- Month: July
- Year: 2026
- Address: San Diego, California, USA
- Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
- Venues: SemEval | WS
- Publisher: Association for Computational Linguistics
- Pages: 2179–2186
- URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.275/
- Cite (ACL): Elitsa Yotkova, Violeta Kastreva, Dimitar Dimitrov, Ivan Koychev, and Preslav Nakov. 2026. FMISUYotkovaKastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2179–2186, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal): FMISUYotkovaKastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals (Yotkova et al., SemEval 2026)
- PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.275.pdf
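
The abstract mentions ratio-based features that are less sensitive to snippet length. As a rough illustration of that idea only, the sketch below computes a few length-normalized stylometric ratios. The function name, the specific features, and the comment-prefix heuristic are assumptions made for this example; they are not the authors' actual feature set, which is described in the paper itself.

```python
def ratio_features(code: str) -> dict[str, float]:
    """Compute illustrative length-normalized stylometric ratios.

    Hypothetical feature set: each value is divided by a size measure
    (line count or character count), so repeating a snippet leaves the
    ratios unchanged.
    """
    lines = code.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    n_lines = max(len(lines), 1)   # guard against empty input
    n_chars = max(len(code), 1)

    # Crude assumption: a line is a comment if it starts with "#" or "//".
    comment_lines = sum(
        1 for ln in non_empty if ln.lstrip().startswith(("#", "//"))
    )

    return {
        "blank_line_ratio": (len(lines) - len(non_empty)) / n_lines,
        "comment_line_ratio": comment_lines / n_lines,
        "whitespace_char_ratio": sum(ch.isspace() for ch in code) / n_chars,
        "mean_line_length": sum(len(ln) for ln in non_empty) / max(len(non_empty), 1),
    }
```

Because every value is a ratio or a mean, a snippet and the same snippet concatenated with itself yield identical features, which is the length-insensitivity property the abstract refers to. Features like these could then be fed to a shallow decision tree of the kind the abstract mentions.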