Nithila R
2026
Segmentation Fault at SemEval-2026 Task 13: A Regularization-First Approach with Generator-Based Out-of-Distribution Splits for Detecting AI-Generated Code
Lakshmi Priya Swaminatha Rao | Dhannya Santhakumari Madhavan | Sreya Kodeswaran | Nithila R | Kanmani R
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our submission to SemEval-2026 Task 13 (Subtask A) on detecting AI-generated code. We fine-tune CodeBERT-base using a generator-aware out-of-distribution (OOD) validation split to better simulate unseen test generators. Strong regularization techniques, including stochastic data augmentation, dropout, weight decay, and label smoothing, are applied to prevent overfitting to generator-specific patterns. Experiments with logistic regression, UniXcoder, and vanilla CodeBERT reveal that evaluation design has a larger impact on generalization than model scale or training data volume. Our final system achieves a macro F1 score of 0.439 on the hidden test set, representing a 62% relative improvement over unregularized baselines.
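The generator-aware validation split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the field names (`label`, `generator`), the generator names (`gen_a`, `gen_b`, `gen_c`), the `"human"` marker, and the rule that sends every fifth human-written sample to validation are all assumptions made for illustration. The idea shown is only that every AI generator chosen for validation is excluded from training entirely, so validation scores reflect performance on unseen generators.

```python
def generator_ood_split(samples, held_out, human_val_every=5):
    """Split samples so that held-out generators appear only in validation.

    Assumed (hypothetical) schema: each sample is a dict with a
    "generator" key, which is "human" for human-written code.
    """
    train, val = [], []
    human_seen = 0
    for sample in samples:
        gen = sample["generator"]
        if gen == "human":
            # Assumption: human code has no generator to hold out, so every
            # k-th human sample goes to validation to keep both classes present.
            if human_seen % human_val_every == 0:
                val.append(sample)
            else:
                train.append(sample)
            human_seen += 1
        elif gen in held_out:
            # Held-out generators are never seen during training.
            val.append(sample)
        else:
            train.append(sample)
    return train, val


# Toy data: 10 human samples plus 3 samples from each hypothetical generator.
samples = [{"label": 0, "generator": "human"} for _ in range(10)]
for name in ("gen_a", "gen_b", "gen_c"):
    samples += [{"label": 1, "generator": name} for _ in range(3)]

train, val = generator_ood_split(samples, held_out={"gen_c"})
train_gens = {s["generator"] for s in train if s["generator"] != "human"}
val_gens = {s["generator"] for s in val if s["generator"] != "human"}
```

In this toy run, `train_gens` and `val_gens` are disjoint, which is the property that distinguishes this split from a random one, where the same generators would appear on both sides.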