Droid: A Resource Suite for AI-Generated Code Detection

Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov


Abstract
We present DroidCollection, the most extensive open data suite for training and evaluating machine-generated code detectors, comprising over a million code samples, seven programming languages, outputs from 43 coding models, and three real-world coding domains. Alongside fully AI-generated examples, our collection includes human-AI co-authored code, as well as adversarial examples explicitly crafted to evade detection. Subsequently, we develop DroidDetect, a suite of encoder-only detectors trained using a multi-task objective over DroidCollection. Our experiments show that existing detectors’ performance fails to generalise to diverse coding domains and programming languages outside of their narrow training data. We further demonstrate that while most detectors are easily compromised by humanising the output distributions using superficial prompting and alignment approaches, this problem can be easily remedied by training on a small number of adversarial examples. Finally, we demonstrate the effectiveness of metric learning and uncertainty-based resampling as a way to enhance detector training on possibly noisy distributions.
Anthology ID:
2025.emnlp-main.1593
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
31251–31277
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1593/
Cite (ACL):
Daniil Orel, Indraneil Paul, Iryna Gurevych, and Preslav Nakov. 2025. Droid: A Resource Suite for AI-Generated Code Detection. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31251–31277, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Droid: A Resource Suite for AI-Generated Code Detection (Orel et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1593.pdf
Checklist:
2025.emnlp-main.1593.checklist.pdf