Nhi Hoai Doan


2025

Grape at GenAI Detection Task 1: Leveraging Compact Models and Linguistic Features for Robust Machine-Generated Text Detection
Nhi Hoai Doan | Kentaro Inui
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)

In this project, we aim to address two subtasks of Task 1: Binary Multilingual Machine-Generated Text (MGT) Detection (Human vs. Machine), part of the COLING 2025 Workshop on MGT Detection (Wang et al., 2025), using two different approaches. The first method involves separately fine-tuning small language models tailored to the specific subtask. The second approach builds on this methodology by incorporating linguistic, syntactic, and semantic features, leveraging ensemble learning to integrate these features with model predictions for more robust classification. By evaluating and comparing these approaches, we aim to identify the most effective techniques for detecting machine-generated content across languages, providing insights into improving automated verification tools amidst the rapid growth of LLM-generated text in digital spaces.

Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning
Nhi Hoai Doan | Tatsuya Hiraoka | Kentaro Inui
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

This paper investigates the relationship between large language models’ (LLMs) ability to recognize repetitive input patterns and their performance on in-context learning (ICL). In contrast to prior work that has primarily focused on attention heads, we examine this relationship from the perspective of skill neurons, specifically repetition neurons. Our experiments reveal that the impact of these neurons on ICL performance varies depending on the depth of the layer in which they reside. By comparing the effects of repetition neurons and induction heads, we further identify strategies for reducing repetitive outputs while maintaining strong ICL capabilities.