Systematic Performance Degradation in Indic Vision-Language Models: Evidence from Hindi and Telugu

Rishikant Chigrupaatii, Ponnada Sai Tulasi Kanishka, Lalit Chandra Routhu, Martin Patel, Sama Supratheek Reddy, Divyam Gupta, Rajiv Misra, Rohun Tripathi


Abstract
With 1.5 billion people speaking over 120 major languages, India exemplifies the challenges of multilingual AI evaluation. Current multilingual VLM benchmarks suffer from unverified auto-translations, narrow task coverage, small sample sizes, and a lack of culturally grounded content. We present HinTel-AlignBench, a comprehensive evaluation framework and benchmark for Hindi and Telugu vision-language models with English-aligned samples. Our framework combines semi-automated translation with human verification to generate 4k QA pairs per language across five domains: adapted English datasets (VQAv2, RealWorldQA, CLEVR-Math) and native Indic sets (JEE for STEM, VAANI for cultural grounding). Evaluation of state-of-the-art open- and closed-source VLMs reveals a consistent performance regression from English to Indic languages on four of the five tasks, with average drops of 8.3 points for Hindi and 5.5 points for Telugu. We identify key failure modes and establish reproducible baselines for multilingual multimodal evaluation.
Anthology ID:
2026.alvr-main.26
Volume:
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Qianqi Yan, Syrielle Montariol, Yue Fan, Jing Gu, Jiayi Pan, Manling Li, Parisa Kordjamshidi, Alane Suhr, Xin Eric Wang
Venues:
ALVR | WS
Publisher:
Association for Computational Linguistics
Pages:
272–277
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.alvr-main.26/
Cite (ACL):
Rishikant Chigrupaatii, Ponnada Sai Tulasi Kanishka, Lalit Chandra Routhu, Martin Patel, Sama Supratheek Reddy, Divyam Gupta, Rajiv Misra, and Rohun Tripathi. 2026. Systematic Performance Degradation in Indic Vision-Language Models: Evidence from Hindi and Telugu. In Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR), pages 272–277, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Systematic Performance Degradation in Indic Vision-Language Models: Evidence from Hindi and Telugu (Chigrupaatii et al., ALVR 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.alvr-main.26.pdf