TeluguEval: A Comprehensive Benchmark for Evaluating LLM Capabilities in Telugu

Revanth Kumar Gundam; Radhika Mamidi


Abstract

Large Language Models (LLMs) excel at English reasoning tasks but falter on morphologically rich, low-resource languages such as Telugu, Tamil, and Kannada. We present TeluguEval, a human-curated reasoning benchmark created by translating GSM8K (math), Winogrande (commonsense), ARC (science), CaseHOLD (law), and Hendrycks Ethics into Telugu. We evaluate eight models spanning global (Llama-3.1-8B, Llama-2-7B, Qwen-8B, Gemma-7B, Gemini-2.0) and regional (Telugu-Llama2-7B, Indic-Gemma-7B, Sarvam-m-24B) systems. While the strongest models, such as Gemini and Sarvam-m, largely retain their performance in Telugu, most English-centric models suffer severe accuracy drops, often of 30 to 40 points or more, particularly on mathematical and scientific reasoning. We further observe systematic failure modes, including script sensitivity, option-selection bias, repetition loops, and unintended code-switching. Our results show that surface-level fluency in Telugu does not imply robust reasoning capability, underscoring the need for Telugu-specific data, tokenization, and pretraining. TeluguEval provides a standardized testbed for driving progress on reasoning in low-resource Indian languages.

Anthology ID:: 2026.loreslm-1.20
Volume:: Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Editors:: Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:: LoResLM
Publisher:: Association for Computational Linguistics
Pages:: 212–224
URL:: https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.20/
Cite (ACL):: Revanth Kumar Gundam and Radhika Mamidi. 2026. TeluguEval: A Comprehensive Benchmark for Evaluating LLM Capabilities in Telugu. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 212–224, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: TeluguEval: A Comprehensive Benchmark for Evaluating LLM Capabilities in Telugu (Gundam & Mamidi, LoResLM 2026)
PDF:: https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.20.pdf