Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis
Anusha Kamath, Kanishk Singla, Rakesh Paul, Raviraj Bhuminand Joshi, Utkarsh Vaidya, Sanjay Singh Chauhan, Niranjan Wartikar
Abstract
Evaluating instruction-tuned Large Language Models (LLMs) in Hindi is challenging due to a lack of high-quality benchmarks, as direct translation of English datasets fails to capture crucial linguistic and cultural nuances. To address this, we introduce a suite of five Hindi LLM evaluation datasets: IFEval-Hi, MT-Bench-Hi, GSM8K-Hi, ChatRAG-Hi, and BFCL-Hi. These were created using a methodology that combines from-scratch human annotation with a translate-and-verify process. We leverage this suite to conduct an extensive benchmarking of open-source LLMs supporting Hindi, providing a detailed comparative analysis of their current capabilities. Our curation process also serves as a replicable methodology for developing benchmarks in other low-resource languages.
- Anthology ID: 2025.bhasha-1.5
- Volume: Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
- Month: December
- Year: 2025
- Address: Mumbai, India
- Editors: Arnab Bhattacharya, Pawan Goyal, Saptarshi Ghosh, Kripabandhu Ghosh
- Venues: BHASHA | WS
- Publisher: Association for Computational Linguistics
- Pages: 52–68
- URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.5/
- Cite (ACL): Anusha Kamath, Kanishk Singla, Rakesh Paul, Raviraj Bhuminand Joshi, Utkarsh Vaidya, Sanjay Singh Chauhan, and Niranjan Wartikar. 2025. Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), pages 52–68, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal): Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis (Kamath et al., BHASHA 2025)
- PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.5.pdf