Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis

Anusha Kamath, Kanishk Singla, Rakesh Paul, Raviraj Bhuminand Joshi, Utkarsh Vaidya, Sanjay Singh Chauhan, Niranjan Wartikar


Abstract
Evaluating instruction-tuned Large Language Models (LLMs) in Hindi is challenging due to a lack of high-quality benchmarks, as direct translation of English datasets fails to capture crucial linguistic and cultural nuances. To address this, we introduce a suite of five Hindi LLM evaluation datasets: IFEval-Hi, MT-Bench-Hi, GSM8K-Hi, ChatRAG-Hi, and BFCL-Hi. These were created using a methodology that combines from-scratch human annotation with a translate-and-verify process. We use this suite to extensively benchmark open-source LLMs that support Hindi, providing a detailed comparative analysis of their current capabilities. Our curation process also serves as a replicable methodology for developing benchmarks in other low-resource languages.
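The translate-and-verify process mentioned in the abstract can be pictured as a two-stage loop: machine-translate an English source item, then have a human annotator approve it, post-edit it, or reject it for from-scratch Hindi authoring. The sketch below is purely illustrative; the helpers machine_translate and human_review are hypothetical placeholders, not the paper's actual tooling, and the routing of rejected items to re-authoring is an assumption based on the hybrid methodology the abstract describes.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkItem:
    source_en: str      # original English item (e.g., a GSM8K problem)
    target_hi: str      # human-verified Hindi version
    from_scratch: bool  # True if re-authored rather than translated

def machine_translate(text_en: str) -> str:
    # Placeholder for any en->hi MT system; hypothetical, not the
    # paper's actual translation setup.
    return f"<hi draft of: {text_en}>"

def human_review(text_en: str, draft_hi: str) -> str | None:
    # Placeholder for the human verification pass: an annotator may
    # approve the draft, post-edit it, or return None to flag an item
    # whose linguistic or cultural content does not survive translation.
    return draft_hi

def build_dataset(english_items: list[str]) -> tuple[list[BenchmarkItem], list[str]]:
    """Translate-and-verify loop: items that fail human verification
    are routed to a from-scratch Hindi annotation queue instead."""
    verified, needs_reauthoring = [], []
    for src in english_items:
        final = human_review(src, machine_translate(src))
        if final is None:
            needs_reauthoring.append(src)
        else:
            verified.append(BenchmarkItem(src, final, from_scratch=False))
    return verified, needs_reauthoring
```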
Anthology ID: 2025.bhasha-1.5
Volume: Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
Month: December
Year: 2025
Address: Mumbai, India
Editors: Arnab Bhattacharya, Pawan Goyal, Saptarshi Ghosh, Kripabandhu Ghosh
Venues: BHASHA | WS
Publisher: Association for Computational Linguistics
Pages: 52–68
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.5/
Cite (ACL): Anusha Kamath, Kanishk Singla, Rakesh Paul, Raviraj Bhuminand Joshi, Utkarsh Vaidya, Sanjay Singh Chauhan, and Niranjan Wartikar. 2025. Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), pages 52–68, Mumbai, India. Association for Computational Linguistics.
Cite (Informal): Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis (Kamath et al., BHASHA 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.5.pdf