Sruthi Santhanam

2026

Stylometry at SemEval-2026 Task 13: Clustered Stylometric Modeling for Machine-Generated Code Detection
Sruthi Santhanam | Parthib Sarkar | Yashvardhan Sharma
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Machine-generated code detection is examined under out-of-distribution conditions where robust generalization is required. A hybrid feature representation is used in which code snippets are encoded through character-level TF–IDF patterns together with explicit structural indicators capturing properties such as verbosity and formatting behavior. Variability across generators is handled through clustering-based expert specialization, and predictions are produced using an ensemble of logistic regression and Naïve Bayes models with calibrated thresholds. Experimental results show that the proposed approach performs competitively despite relying on simple linear classifiers. The findings suggest that persistent structural patterns in code provide reliable cross-domain signals for identifying machine-generated programs.
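As an illustration of the hybrid representation the abstract describes, the following is a minimal sketch, not the authors' implementation. It combines character n-gram TF-IDF weights with a handful of structural indicators, then averages two classifier probabilities against a threshold. All function names, the trigram size, the smoothed IDF formula, the particular verbosity and formatting features, and the threshold value are assumptions made for illustration. The clustering step and the logistic regression and Naïve Bayes models themselves are omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(snippets, n=3):
    """Smoothed inverse document frequency per n-gram (assumed formula)."""
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(set(char_ngrams(snippet, n)))
    total = len(snippets)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(snippet, idf, n=3):
    """L2-normalised TF-IDF weights; n-grams unseen at fit time are dropped."""
    counts = Counter(g for g in char_ngrams(snippet, n) if g in idf)
    weights = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {g: w / norm for g, w in weights.items()}


def structural_features(snippet):
    """Hypothetical verbosity and formatting indicators (illustrative choice)."""
    lines = snippet.splitlines() or [""]
    non_empty = [ln for ln in lines if ln.strip()]
    denom = max(len(non_empty), 1)
    comments = [ln for ln in non_empty if ln.lstrip().startswith(("#", "//"))]
    return {
        "num_lines": float(len(lines)),
        "mean_line_length": sum(len(ln) for ln in lines) / len(lines),
        "blank_line_ratio": (len(lines) - len(non_empty)) / len(lines),
        "comment_line_ratio": len(comments) / denom,
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in non_empty) / denom,
    }


def hybrid_features(snippet, idf, n=3):
    """Merge both feature families into one sparse dict with prefixed keys."""
    feats = {f"tfidf::{g}": w for g, w in tfidf_vector(snippet, idf, n).items()}
    feats.update({f"struct::{k}": v for k, v in structural_features(snippet).items()})
    return feats


def ensemble_predict(p_lr, p_nb, threshold=0.5):
    """Average two model probabilities and compare against a tuned threshold."""
    return (p_lr + p_nb) / 2.0 >= threshold
```

In a fuller pipeline, the dictionaries from `hybrid_features` would be vectorised and passed to the two classifiers, with one threshold tuned per cluster on held-out data. That arrangement is likewise an assumption about how the pieces might fit together.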
