@inproceedings{kazi-etal-2025-crossing,
    title = "Crossing Language Boundaries: Evaluation of Large Language Models on {U}rdu-{E}nglish Question Answering",
    author = "Kazi, Samreen  and
      Rahim, Maria  and
      Khoja, Shakeel Ahmed",
    editor = "Weerasinghe, Ruvan  and
      Anuradha, Isuri  and
      Sumanathilaka, Deshan",
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages",
    month = jan,
    year = "2025",
    address = "Abu Dhabi",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.indonlp-1.17/",
    pages = "141--151",
    abstract = "This study evaluates the question-answering capabilities of Large Language Models (LLMs) in Urdu, addressing a critical gap in low-resource language processing. Four models GPT-4, mBERT, XLM-R, and mT5 are assessed across monolingual, cross-lingual, and mixed-language settings using the UQuAD1.0 and SQuAD2.0 datasets. Results reveal significant performance gaps between English and Urdu processing, with GPT-4 achieving the highest F1 scores (89.1{\%} in English, 76.4{\%} in Urdu) while demonstrating relative robustness in cross-lingual scenarios. Boundary detection and translation mismatches emerge as primary challenges, particularly in cross-lingual settings. The study further demonstrates that question complexity and length significantly impact performance, with factoid questions yielding 14.2{\%} higher F1 scores compared to complex questions. These findings establish important benchmarks for enhancing LLM performance in low-resource languages and identify key areas for improvement in multilingual question-answering systems."
}