Prompt-driven Detection of Offensive Urdu Language using Large Language Models

Iffat Maab, Usman Haider, Junichi Yamagishi


Abstract
Offensive language detection poses a significant challenge in modern social spaces and calls for advanced solutions. Online media platforms have been known to escalate acts of violence and broader conflicts, so an automated system to help counter offensive content is essential. Traditional NLP models have typically dominated the field of hate speech detection, but they require careful model design and extensive tuning. Moreover, a notable resource gap exists for addressing offensive language, particularly in languages transcribed in non-native scripts, such as Roman Urdu and Urdu. This study explores the potential of pre-trained LLMs with prompt-based methods across different transcriptions of the Urdu language, focusing on their efficacy in detecting offensive content in diverse linguistic contexts. We employ state-of-the-art open-source LLMs, including advanced variants of Llama, Qwen, and Lughaat, as well as the proprietary GPT-4, and evaluate them with prompting strategies in different under-resourced languages. Our findings show that pre-trained LLMs achieve performance comparable to traditional fine-tuned benchmarks in detecting hateful and offensive content.
Anthology ID:
2026.eacl-long.201
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
4302–4327
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.201/
Cite (ACL):
Iffat Maab, Usman Haider, and Junichi Yamagishi. 2026. Prompt-driven Detection of Offensive Urdu Language using Large Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4302–4327, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Prompt-driven Detection of Offensive Urdu Language using Large Language Models (Maab et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.201.pdf