Minh Ngoc Ta
2026
FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning
Minh Ngoc Ta | Dong Cao Van | Duc-Anh Hoang | Minh Le-Anh | Truong Nguyen | My Anh Tran Nguyen | Yuxia Wang | Preslav Nakov | Dinh Viet Sang
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Minh Ngoc Ta | Dong Cao Van | Duc-Anh Hoang | Minh Le-Anh | Truong Nguyen | My Anh Tran Nguyen | Yuxia Wang | Preslav Nakov | Dinh Viet Sang
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
The growing collaboration between humans and AI models in generative tasks has introduced new challenges in distinguishing between *human-written*, *LLM-generated*, and *human-LLM collaborative* texts. In this work, we collect a multilingual, multi-domain, multi-generator dataset *FAIDSet*. We further introduce a fine-grained detection framework *FAID* to classify text into these three categories, and also to identify the underlying LLM family of the generator. Unlike existing binary classifiers, FAID is built to capture both authorship and model-specific characteristics. Our method combines multi-level contrastive learning with multi-task auxiliary classification to learn subtle stylistic cues. By modeling LLM families as distinct stylistic entities, we incorporate an adaptation to address distributional shifts without retraining for unseen data. Our experimental results demonstrate that FAID outperforms several baselines, particularly enhancing the generalization accuracy on unseen domains and new LLMs, thus offering a potential solution for improving transparency and accountability in AI-assisted writing. Our data and code are available at [https://github.com/mbzuai-nlp/FAID](https://github.com/mbzuai-nlp/FAID)
2025
GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human
Yuxia Wang | Artem Shelmanov | Jonibek Mansurov | Akim Tsvigun | Vladislav Mikhailov | Rui Xing | Zhuohan Xie | Jiahui Geng | Giovanni Puccetti | Ekaterina Artemova | Jinyan Su | Minh Ngoc Ta | Mervat Abassy | Kareem Ashraf Elozeiri | Saad El Dine Ahmed El Etter | Maiya Goloburda | Tarek Mahmoud | Raj Vardhan Tomar | Nurkhan Laiyk | Osama Mohammed Afzal | Ryuto Koike | Masahiro Kaneko | Alham Fikri Aji | Nizar Habash | Iryna Gurevych | Preslav Nakov
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)
Yuxia Wang | Artem Shelmanov | Jonibek Mansurov | Akim Tsvigun | Vladislav Mikhailov | Rui Xing | Zhuohan Xie | Jiahui Geng | Giovanni Puccetti | Ekaterina Artemova | Jinyan Su | Minh Ngoc Ta | Mervat Abassy | Kareem Ashraf Elozeiri | Saad El Dine Ahmed El Etter | Maiya Goloburda | Tarek Mahmoud | Raj Vardhan Tomar | Nurkhan Laiyk | Osama Mohammed Afzal | Ryuto Koike | Masahiro Kaneko | Alham Fikri Aji | Nizar Habash | Iryna Gurevych | Preslav Nakov
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)
We present the GenAI Content Detection Task 1 – a shared task on binary machine generated text detection, conducted as a part of the GenAI workshop at COLING 2025. The task consists of two subtasks: Monolingual (English) and Multilingual. The shared task attracted many participants: 36 teams made official submissions to the Monolingual subtask during the test phase and 27 teams – to the Multilingual. We provide a comprehensive overview of the data, a summary of the results – including system rankings and performance scores – detailed descriptions of the participating systems, and an in-depth analysis of submissions.
2024
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection
Mervat Abassy | Kareem Elozeiri | Alexander Aziz | Minh Ngoc Ta | Raj Vardhan Tomar | Bimarsha Adhikari | Saad El Dine Ahmed | Yuxia Wang | Osama Mohammed Afzal | Zhuohan Xie | Jonibek Mansurov | Ekaterina Artemova | Vladislav Mikhailov | Rui Xing | Jiahui Geng | Hasan Iqbal | Zain Muhammad Mujahid | Tarek Mahmoud | Akim Tsvigun | Alham Fikri Aji | Artem Shelmanov | Nizar Habash | Iryna Gurevych | Preslav Nakov
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Mervat Abassy | Kareem Elozeiri | Alexander Aziz | Minh Ngoc Ta | Raj Vardhan Tomar | Bimarsha Adhikari | Saad El Dine Ahmed | Yuxia Wang | Osama Mohammed Afzal | Zhuohan Xie | Jonibek Mansurov | Ekaterina Artemova | Vladislav Mikhailov | Rui Xing | Jiahui Geng | Hasan Iqbal | Zain Muhammad Mujahid | Tarek Mahmoud | Akim Tsvigun | Alham Fikri Aji | Artem Shelmanov | Nizar Habash | Iryna Gurevych | Preslav Nakov
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The ease of access to large language models (LLMs) has enabled a widespread of machine-generated texts, and now it is often hard to tell whether a piece of text was human-written or machine-generated. This raises concerns about potential misuse, particularly within educational and academic domains. Thus, it is important to develop practical systems that can automate the process. Here, we present one such system, LLM-DetectAIve, designed for fine-grained detection. Unlike most previous work on machine-generated text detection, which focused on binary classification, LLM-DetectAIve supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished. Category (iii) aims to detect attempts to obfuscate the fact that a text was machine-generated, while category (iv) looks for cases where the LLM was used to polish a human-written text, which is typically acceptable in academic writing, but not in education. Our experiments show that LLM-DetectAIve can effectively identify the above four categories, which makes it a potentially useful tool in education, academia, and other domains.LLM-DetectAIve is publicly accessible at https://github.com/mbzuai-nlp/LLM-DetectAIve. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.
Search
Fix author
Co-authors
- Preslav Nakov 3
- Yuxia Wang 3
- Mervat Abassy 2
- Osama Mohammed Afzal 2
- Alham Fikri Aji 2
- Ekaterina Artemova 2
- Jiahui Geng 2
- Iryna Gurevych 2
- Nizar Habash 2
- Tarek Mahmoud 2
- Jonibek Mansurov 2
- Vladislav Mikhailov 2
- Artem Shelmanov 2
- Raj Vardhan Tomar 2
- Akim Tsvigun 2
- Zhuohan Xie 2
- Rui Xing 2
- Bimarsha Adhikari 1
- Saad El Dine Ahmed 1
- Alexander Aziz 1
- Saad El Dine Ahmed El Etter 1
- Kareem Elozeiri 1
- Kareem Ashraf Elozeiri 1
- Maiya Goloburda 1
- Duc-Anh Hoang 1
- Hasan Iqbal 1
- Masahiro Kaneko 1
- Ryuto Koike 1
- Nurkhan Laiyk 1
- Minh Le-Anh 1
- Zain Muhammad Mujahid 1
- Truong Nguyen 1
- My Anh Tran Nguyen 1
- Giovanni Puccetti 1
- Dinh Viet Sang 1
- Jinyan Su 1
- Dong Cao Van 1