Divyam Gupta
2026
Systematic Performance Degradation in Indic Vision-Language Models: Evidence from Hindi and Telugu
Rishikant Chigrupaatii | Ponnada Sai Tulasi Kanishka | Lalit Chandra Routhu | Martin Patel | Sama Supratheek Reddy | Divyam Gupta | Rajiv Misra | Rohun Tripathi
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
With 1.5 billion people speaking over 120 major languages, India exemplifies the challenges of multilingual AI evaluation. Current multilingual VLM benchmarks suffer from unverified auto-translations, narrow task coverage, small sample sizes, and a lack of culturally grounded content. We present HinTel-AlignBench, a comprehensive evaluation framework and benchmark for Hindi and Telugu vision-language models with English-aligned samples. Our framework combines semi-automated translation with human verification to generate 4k QA pairs per language across five domains: three adapted English datasets (VQAv2, RealWorldQA, CLEVR-Math) and two native Indic sets (JEE for STEM, VAANI for cultural grounding). Evaluation of state-of-the-art open- and closed-source VLMs reveals consistent performance regression from English to Indic languages, with average drops of 8.3 points for Hindi and 5.5 points for Telugu across four of five tasks. We identify key failure modes and establish reproducible baselines for multilingual multimodal evaluation.