MedHastra at SemEval-2026 Task 13: Stylometric Ensembles and Transformer Fine-Tuning for Robust AI Code Detection, Attribution, and Adversarial Analysis

Shruti Chandrasekar; Vedajanaani R S; Vijayalakshmi P

Abstract

This paper describes Team MedHastra’s submission to SemEval-2026 Task 13 on detecting machine-generated code across diverse programming languages, generators, and application scenarios. We participated in all three subtasks: (A) binary detection of AI-generated code under out-of-distribution conditions, (B) multi-class attribution across ten large language model families, and (C) classification of human, fully AI-generated, hybrid, and adversarial code. For Subtask A, we implemented a stylometric ensemble combining structural formatting features with word- and character-level TF-IDF representations, trained using Random Forest, Gradient Boosting, and Logistic Regression with soft voting. For Subtasks B and C, we fine-tuned CodeBERT to leverage contextual code representations, incorporating class balancing strategies such as downsampling and weighted cross-entropy. Our results demonstrate that handcrafted stylometric features struggle under strong distribution shift, while transformer-based contextual modeling is more effective for fine-grained attribution and hybrid/adversarial detection. The study highlights the importance of robust contextual representations for realistic AI-assisted programming scenarios.
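
As a rough illustration of two ingredients the abstract names, soft voting (Subtask A) and weighted cross-entropy (Subtasks B and C), the sketch below uses plain Python only. It is not the authors' implementation: the function names, the inverse-frequency weighting formula, and all numbers are assumptions chosen for illustration, and the paper may use different weights or library routines.

```python
import math
from collections import Counter


def soft_vote(prob_rows):
    """Average per-class probabilities from several classifiers.

    prob_rows holds one probability vector per model (hypothetical inputs).
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight[c] = N / (K * count[c])."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted mean of -log p(true class), normalised by total weight."""
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Made-up probabilities for [human, AI] from three hypothetical models.
rf, gb, lr = [0.6, 0.4], [0.3, 0.7], [0.45, 0.55]
winner, averaged = soft_vote([rf, gb, lr])

# Made-up imbalanced label set: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy([[0.5, 0.5]] * 4, labels, weights)
```

Soft voting lets a confident model outweigh two uncertain ones, which hard majority voting cannot do, and the class weights make errors on rare classes cost more in the loss.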

Anthology ID: 2026.semeval-1.264
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2098–2103
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.264/
Cite (ACL): Shruti Chandrasekar, Vedajanaani R S, and Vijayalakshmi P. 2026. MedHastra at SemEval-2026 Task 13: Stylometric Ensembles and Transformer Fine-Tuning for Robust AI Code Detection, Attribution, and Adversarial Analysis. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2098–2103, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): MedHastra at SemEval-2026 Task 13: Stylometric Ensembles and Transformer Fine-Tuning for Robust AI Code Detection, Attribution, and Adversarial Analysis (Chandrasekar et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.264.pdf