Nikunj Kotecha


2026

Indic languages, spoken by over 1.5 billion people, pose unique challenges for NLP due to their cultural richness, linguistic diversity, and structural complexity. We present IndicMMLU-Pro, a comprehensive benchmark extending the MMLU-Pro framework to nine major Indic languages: Hindi, Bengali, Gujarati, Marathi, Kannada, Punjabi, Tamil, Telugu, and Urdu. Covering a wide range of tasks in comprehension, reasoning, and generation, IndicMMLU-Pro offers a standardized evaluation framework to advance AI model development in Indic contexts. This paper details the benchmark’s design, taxonomy, and data curation, and establishes baseline results using state-of-the-art multilingual models. As an open resource, IndicMMLU-Pro aims to accelerate progress in Indic language technologies and support inclusive research in multilingual NLP.