Indic-QA Benchmark: A Multilingual Benchmark to Evaluate Question Answering Capability of LLMs for Indic Languages
Abhishek Kumar Singh | Vishwajeet Kumar | Rudra Murthy | Jaydeep Sen | Ashish Mittal | Ganesh Ramakrishnan
Findings of the Association for Computational Linguistics: NAACL 2025
Large Language Models (LLMs) perform well on unseen tasks in English, but their abilities in non-English languages remain underexplored due to limited benchmarks and training data. To bridge this gap, we introduce the Indic-QA Benchmark, a large dataset for context-grounded question answering in 11 major Indian languages, covering both extractive and abstractive tasks. Evaluations of multilingual LLMs, including instruction-fine-tuned versions, reveal weak performance in low-resource languages, owing to a strong English-language bias in their training data. We also investigate the Translate-Test paradigm, in which inputs are translated into English for processing and the results are translated back into the source language. This approach outperformed direct inference with multilingual LLMs, particularly in low-resource settings. By releasing Indic-QA, we aim to promote further research into the question-answering capabilities of LLMs in low-resource languages. The benchmark offers a critical resource for addressing existing limitations and fostering multilingual understanding.
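To make the Translate-Test paradigm described above concrete, the following is a minimal sketch of the pipeline. The paper does not fix a particular MT system or LLM, so `mt_translate` and `llm_answer` are hypothetical stubs standing in for whatever translation and question-answering backends are used; only the round-trip structure (source language → English → QA → source language) is taken from the abstract.

```python
def mt_translate(text: str, src: str, tgt: str) -> str:
    """Stub for a machine-translation call (hypothetical; plug in any MT system)."""
    raise NotImplementedError("replace with a real MT call")


def llm_answer(context: str, question: str) -> str:
    """Stub for an English context-grounded QA call to an LLM (hypothetical)."""
    raise NotImplementedError("replace with a real LLM call")


def translate_test_qa(context: str, question: str, lang: str) -> str:
    """Translate-Test: translate the Indic input into English, answer the
    question in English, then translate the answer back into the source language."""
    context_en = mt_translate(context, src=lang, tgt="en")
    question_en = mt_translate(question, src=lang, tgt="en")
    answer_en = llm_answer(context_en, question_en)
    return mt_translate(answer_en, src="en", tgt=lang)
```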