Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
Somraj Gautam, Abhirama Subramanyam Penamakuri, Abhishek Bhandari, Gaurav Harit
Abstract
We introduce MMCRICBENCH-3K, a benchmark for Visual Question Answering (VQA) on cricket scorecards, designed to evaluate large vision-language models (LVLMs) on complex numerical and cross-lingual reasoning over semi-structured tabular images. MMCRICBENCH-3K comprises 1,463 synthetically generated scorecard images from ODI, T20, and Test formats, accompanied by 1,500 English QA pairs. It includes two subsets: MMCRICBENCH-E-1.5K, featuring English scorecards, and MMCRICBENCH-H-1.5K, containing visually similar Hindi scorecards, with all questions and answers kept in English to enable controlled cross-script evaluation. The task demands reasoning over structured numerical data, multi-image context, and implicit domain knowledge. Empirical results show that even state-of-the-art LVLMs, such as GPT-4o and Qwen2.5VL, struggle on the English subset despite it being their primary training language, and exhibit a further drop in performance on the Hindi subset. This reveals key limitations in structure-aware visual text understanding, numerical reasoning, and cross-lingual generalization. The dataset is publicly available via Hugging Face at https://huggingface.co/datasets/DIALab/MMCricBench, to promote LVLM research in this direction.
- Anthology ID:
- 2025.mrl-main.38
- Volume:
- Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- David Ifeoluwa Adelani, Catherine Arnett, Duygu Ataman, Tyler A. Chang, Hila Gonen, Rahul Raja, Fabian Schmidt, David Stap, Jiayi Wang
- Venues:
- MRL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 568–584
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.38/
- Cite (ACL):
- Somraj Gautam, Abhirama Subramanyam Penamakuri, Abhishek Bhandari, and Gaurav Harit. 2025. Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs. In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), pages 568–584, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs (Gautam et al., MRL 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.38.pdf