CIC-NLP at GenAI Detection Task 1: Leveraging DistilBERT for Detecting Machine-Generated Text in English
Tolulope Olalekan Abiola, Tewodros Achamaleh Bizuneh, Oluwatobi Joseph Abiola, Temitope Olasunkanmi Oladepo, Olumide Ebenezer Ojo, Grigori Sidorov, Olga Kolesnikova
Abstract
As machine-generated texts (MGT) become increasingly similar to human writing, the distinctions between them become harder to identify. In this paper, we, the CIC-NLP team, present our submission to the Gen-AI Content Detection Workshop at COLING 2025 for Task 1 Subtask A, which involves distinguishing between text generated by LLMs and text authored by humans, with an emphasis on detecting English-only MGT. We applied the DistilBERT model to this binary classification task using the dataset provided by the organizers. The fine-tuned model effectively differentiated between the two classes, achieving a micro-average F1-score of 0.70 on the evaluation test set. We provide a detailed explanation of the fine-tuning parameters and the steps involved in our analysis.
- Anthology ID:
- 2025.genaidetect-1.29
- Volume:
- Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Firoj Alam, Preslav Nakov, Nizar Habash, Iryna Gurevych, Shammur Chowdhury, Artem Shelmanov, Yuxia Wang, Ekaterina Artemova, Mucahid Kutlu, George Mikros
- Venues:
- GenAIDetect | WS
- Publisher:
- International Conference on Computational Linguistics
- Pages:
- 271–277
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.genaidetect-1.29/
- Cite (ACL):
- Tolulope Olalekan Abiola, Tewodros Achamaleh Bizuneh, Oluwatobi Joseph Abiola, Temitope Olasunkanmi Oladepo, Olumide Ebenezer Ojo, Grigori Sidorov, and Olga Kolesnikova. 2025. CIC-NLP at GenAI Detection Task 1: Leveraging DistilBERT for Detecting Machine-Generated Text in English. In Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect), pages 271–277, Abu Dhabi, UAE. International Conference on Computational Linguistics.
- Cite (Informal):
- CIC-NLP at GenAI Detection Task 1: Leveraging DistilBERT for Detecting Machine-Generated Text in English (Abiola et al., GenAIDetect 2025)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.genaidetect-1.29.pdf