Sruthi Santhanam


2026

Machine-generated code detection is examined under out-of-distribution conditions, where robust generalization is required. Code snippets are encoded with a hybrid feature representation that combines character-level TF–IDF patterns with explicit structural indicators of properties such as verbosity and formatting behavior. Variability across generators is handled through clustering-based expert specialization, and predictions are produced by an ensemble of logistic regression and Naïve Bayes models with calibrated decision thresholds. Experimental results show that the proposed approach performs competitively despite relying on simple linear classifiers. The findings suggest that persistent structural patterns in code provide reliable cross-domain signals for identifying machine-generated programs.
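
As an illustration of the hybrid representation described above, the following minimal sketch pairs character-level TF–IDF weights with a few structural indicators, using only the Python standard library. Everything specific in it is an assumption made for illustration and is not taken from the reported system: the function names, the n-gram length of 3, the smoothed IDF weighting, and the five structural indicators (average line length, blank-line ratio, comment-line ratio, trailing-whitespace ratio, tab-indentation ratio). The clustering-based expert specialization and the calibrated ensemble are not covered.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a snippet."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(snippets, n=3):
    """Character-level TF-IDF, one sparse dict per snippet.

    Assumption: relative term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. The source does not specify a weighting.
    """
    counts = [Counter(char_ngrams(s, n)) for s in snippets]
    doc_freq = Counter(g for c in counts for g in c)  # one count per snippet
    n_docs = len(snippets)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append({
            g: (k / total) * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1)
            for g, k in c.items()
        })
    return vectors


def structural_features(code):
    """Hypothetical verbosity and formatting indicators for one snippet."""
    lines = code.splitlines()
    n_lines = max(len(lines), 1)  # avoid division by zero on empty input
    stripped = [ln.strip() for ln in lines]
    return {
        "avg_line_length": sum(len(ln) for ln in lines) / n_lines,
        "blank_line_ratio": sum(1 for s in stripped if not s) / n_lines,
        "comment_line_ratio": sum(
            1 for s in stripped if s.startswith(("#", "//"))
        ) / n_lines,
        "trailing_ws_ratio": sum(1 for ln in lines if ln != ln.rstrip()) / n_lines,
        "tab_indent_ratio": sum(1 for ln in lines if ln.startswith("\t")) / n_lines,
    }


def hybrid_features(snippets, n=3):
    """Merge both feature families into one sparse dict per snippet."""
    rows = []
    for tfidf, code in zip(tfidf_vectors(snippets, n), snippets):
        row = {f"tfidf:{g}": w for g, w in tfidf.items()}
        row.update(
            {f"struct:{k}": v for k, v in structural_features(code).items()}
        )
        rows.append(row)
    return rows
```

The prefixed keys keep the two feature families distinguishable, so a downstream linear model could weight or ablate them separately. In practice the structural indicators would likely need scaling before being combined with TF–IDF weights.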