Jesse Atuhurra



2025

HLU: Human Vs LLM Generated Text Detection Dataset for Urdu at Multiple Granularities
Iqra Ali | Jesse Atuhurra | Hidetaka Kamigaito | Taro Watanabe
Proceedings of the 31st International Conference on Computational Linguistics

The rise of large language models (LLMs) generating human-like text has raised concerns about misuse, especially in low-resource languages like Urdu. To address this gap, we introduce the HLU dataset, which consists of three subsets: document, paragraph, and sentence level. The document-level subset contains 1,014 instances of human-written and LLM-generated articles across 13 domains, while the paragraph- and sentence-level subsets each contain 667 instances. We conducted both human and automatic evaluations. In the human evaluation, average accuracy at the document level was 35%, while at the paragraph and sentence levels accuracies were 75.68% and 88.45%, respectively. For automatic evaluation, we fine-tuned the XLM-RoBERTa model in both monolingual and multilingual settings, achieving consistent results in both. Additionally, we assessed the performance of GPT-4 and Claude 3 Opus using zero-shot prompting. Our experiments and evaluations indicate that distinguishing between human- and machine-generated text is challenging for both humans and LLMs, marking a significant step toward addressing this issue for Urdu.
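As a rough illustration of the automatic-evaluation setup described in the abstract (not the authors' released code), the sketch below shows how a binary human-vs-LLM text classifier could be built on XLM-RoBERTa with the Hugging Face transformers library. The checkpoint name, the predict helper, and the placeholder input strings are assumptions for illustration only; a real run would first fine-tune the classification head on the HLU training split.

```python
# Minimal sketch, assuming the Hugging Face transformers API and the
# "xlm-roberta-base" checkpoint; not the paper's actual implementation.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2  # label 0 = human-written, 1 = LLM-generated
)

def predict(texts):
    """Hypothetical helper: return 0/1 labels for a batch of Urdu texts."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()

# Placeholder usage; outputs are meaningless until the model is fine-tuned.
print(predict(["human-written Urdu sample", "LLM-generated Urdu sample"]))
```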