Stylometry at SemEval-2026 Task 13: Clustered Stylometric Modeling for Machine-Generated Code Detection

Sruthi Santhanam; Parthib Sarkar; Yashvardhan Sharma

Abstract

Machine-generated code detection is examined under out-of-distribution conditions where robust generalization is required. A hybrid feature representation is used in which code snippets are encoded through character-level TF–IDF patterns together with explicit structural indicators capturing properties such as verbosity and formatting behavior. Variability across generators is handled through clustering-based expert specialization, and predictions are produced using an ensemble of logistic regression and Naïve Bayes models with calibrated thresholds. Experimental results show that the proposed approach performs competitively despite relying on simple linear classifiers. The findings suggest that persistent structural patterns in code provide reliable cross-domain signals for identifying machine-generated programs.
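The abstract names the pipeline stages but not their settings. The sketch below illustrates those stages using only the Python standard library. It is a minimal illustration, not the authors' configuration. The following are all hypothetical choices: the specific structural indicators (line count, mean line length, blank-line, comment, and trailing-whitespace ratios), the 2–4 character n-gram range, nearest-centroid routing to a cluster expert, equal-weight probability averaging, and F1-maximizing threshold search. Every function and class name is also hypothetical. The logistic regression and Naïve Bayes models are assumed to be trained elsewhere, so only their output probabilities enter the sketch.

```python
import math
from collections import Counter


def structural_features(code: str) -> dict[str, float]:
    """Hypothetical structural indicators of verbosity and formatting."""
    lines = code.splitlines() or [""]
    n = len(lines)
    nonblank = [ln for ln in lines if ln.strip()]
    comments = [ln for ln in nonblank if ln.lstrip().startswith(("#", "//"))]
    return {
        "struct:n_lines": float(n),  # raw count; would normally be scaled
        "struct:mean_line_len": sum(len(ln) for ln in lines) / n,
        "struct:blank_ratio": (n - len(nonblank)) / n,
        "struct:comment_ratio": len(comments) / max(len(nonblank), 1),
        "struct:trailing_ws_ratio": sum(ln != ln.rstrip() for ln in lines) / n,
    }


def char_ngrams(code: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every character n-gram with n_min <= n <= n_max."""
    grams: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(code) - n + 1):
            grams[code[i:i + n]] += 1
    return grams


class CharTfidf:
    """Character n-gram TF-IDF with L2 normalization (sparse dict output)."""

    def __init__(self) -> None:
        self.idf: dict[str, float] = {}

    def fit(self, docs: list[str]) -> "CharTfidf":
        df: Counter = Counter()
        for doc in docs:
            df.update(set(char_ngrams(doc)))  # document frequency per n-gram
        n_docs = len(docs)
        # Smoothed idf of the same form as scikit-learn's default (smooth_idf=True).
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
        return self

    def transform(self, doc: str) -> dict[str, float]:
        tf = char_ngrams(doc)
        vec = {"tfidf:" + g: c * self.idf[g] for g, c in tf.items() if g in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {k: v / norm for k, v in vec.items()}


def hybrid_features(vectorizer: CharTfidf, code: str) -> dict[str, float]:
    """Concatenate TF-IDF n-gram weights with the structural indicators."""
    feats = vectorizer.transform(code)
    feats.update(structural_features(code))
    return feats


def route_to_expert(struct: dict[str, float], centroids: list[dict[str, float]]) -> int:
    """Index of the nearest centroid (squared Euclidean) over the given features."""
    def dist(c: dict[str, float]) -> float:
        return sum((struct[k] - c.get(k, 0.0)) ** 2 for k in struct)
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))


def ensemble_proba(p_lr: list[float], p_nb: list[float], w: float = 0.5) -> list[float]:
    """Weighted average of the two models' P(machine-generated)."""
    return [w * a + (1 - w) * b for a, b in zip(p_lr, p_nb)]


def calibrate_threshold(probs: list[float], labels: list[int]) -> tuple[float, float]:
    """Grid-search the cutoff that maximizes F1 on held-out data."""
    best_t, best_f1 = 0.5, -1.0
    for t in (i / 100 for i in range(1, 100)):
        preds = [p >= t for p in probs]
        tp = sum(p and y == 1 for p, y in zip(preds, labels))
        fp = sum(p and y == 0 for p, y in zip(preds, labels))
        fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

On a toy validation set, `calibrate_threshold([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])` returns `(0.41, 1.0)`. The sketch tunes the cutoff on held-out data instead of fixing it at 0.5. Under generator or domain shift, the averaged probabilities need not be centered on the default cutoff.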

Anthology ID: 2026.semeval-1.172
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 1319–1325
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.172/
Cite (ACL): Sruthi Santhanam, Parthib Sarkar, and Yashvardhan Sharma. 2026. Stylometry at SemEval-2026 Task 13: Clustered Stylometric Modeling for Machine-Generated Code Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1319–1325, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Stylometry at SemEval-2026 Task 13: Clustered Stylometric Modeling for Machine-Generated Code Detection (Santhanam et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.172.pdf
Supplementary material: 2026.semeval-1.172.SupplementaryMaterial.zip