@inproceedings{kazi-etal-2025-crossing,
title = "Crossing Language Boundaries: Evaluation of Large Language Models on {U}rdu-{E}nglish Question Answering",
author = "Kazi, Samreen and
Rahim, Maria and
Khoja, Shakeel Ahmed",
editor = "Weerasinghe, Ruvan and
Anuradha, Isuri and
Sumanathilaka, Deshan",
booktitle = "Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages",
month = jan,
year = "2025",
address = "Abu Dhabi",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.indonlp-1.17/",
pages = "141--151",
abstract = "This study evaluates the question-answering capabilities of Large Language Models (LLMs) in Urdu, addressing a critical gap in low-resource language processing. Four models GPT-4, mBERT, XLM-R, and mT5 are assessed across monolingual, cross-lingual, and mixed-language settings using the UQuAD1.0 and SQuAD2.0 datasets. Results reveal significant performance gaps between English and Urdu processing, with GPT-4 achieving the highest F1 scores (89.1{\%} in English, 76.4{\%} in Urdu) while demonstrating relative robustness in cross-lingual scenarios. Boundary detection and translation mismatches emerge as primary challenges, particularly in cross-lingual settings. The study further demonstrates that question complexity and length significantly impact performance, with factoid questions yielding 14.2{\%} higher F1 scores compared to complex questions. These findings establish important benchmarks for enhancing LLM performance in low-resource languages and identify key areas for improvement in multilingual question-answering systems."
}