Usman Haider
2026
Prompt-driven Detection of Offensive Urdu Language using Large Language Models
Iffat Maab | Usman Haider | Junichi Yamagishi
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Iffat Maab | Usman Haider | Junichi Yamagishi
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Offensive language detection poses a significant challenge in modern social spaces, necessitating advanced solutions. Online media platforms have been known to escalate acts of violence and broader conflicts, and thus, an automated system to help counter offensive content is essential. Traditional NLP models have typically dominated the field of hate speech detection, but require careful model design and extensive tuning. Moreover, a notable resource gap exists for addressing offensive languages, particularly those transcribed in non-native scripts, such as Roman Urdu and Urdu. This study explores the potential of pre-trained LLMs in using prompt-based methods using different transcriptions of the Urdu language, particularly their efficacy in detecting offensive content in diverse linguistic contexts. Our study employs state-of-the-art open-source LLMs, including advanced variants of Llama, Qwen, Lughaat, and proprietary GPT-4, which are evaluated through prompting strategies in different under-resourced languages. Our findings show that pre-trained LLMs achieve performance comparable to traditional fine-tuned benchmarks in detecting hateful and offensive content.