Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals

Shruti Singh Baghel, Yash Pratap Singh Rathore, Anurag Pradhan, Sushovan Jena, Arnav Bhavsar, Amit Shukla, Pawan Goyal


Abstract
Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters on two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. We further conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
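The abstract compares FP32 and INT8 precision variants of the SmolVLM2 models. As a rough illustration only, the sketch below loads a SmolVLM2 checkpoint in FP32 and in 8-bit and compares the parameter memory footprint; the Hugging Face model ID and the bitsandbytes 8-bit path are assumptions for illustration, not the authors' on-device smartphone deployment pipeline.

```python
# Hypothetical sketch (not the paper's pipeline): compare parameter memory
# of a SmolVLM2 variant loaded in FP32 vs. INT8 (bitsandbytes, GPU required).
import torch
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

MODEL_ID = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"  # assumed HF model id

# FP32 baseline
model_fp32 = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.float32
)

# INT8 variant, used here as a server-side proxy for a mobile INT8 build
model_int8 = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, quantization_config=BitsAndBytesConfig(load_in_8bit=True)
)

def footprint_mb(model: torch.nn.Module) -> float:
    """Approximate parameter memory in megabytes."""
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 2**20

print(f"FP32: {footprint_mb(model_fp32):.0f} MB")
print(f"INT8: {footprint_mb(model_int8):.0f} MB")
```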
Anthology ID:
2025.mmloso-1.8
Volume:
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Ankita Shukla, Sandeep Kumar, Amrit Singh Bedi, Tanmoy Chakraborty
Venues:
MMLoSo | WS
Publisher:
Association for Computational Linguistics
Pages:
86–94
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.8/
Cite (ACL):
Shruti Singh Baghel, Yash Pratap Singh Rathore, Anurag Pradhan, Sushovan Jena, Arnav Bhavsar, Amit Shukla, and Pawan Goyal. 2025. Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals. In Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025), pages 86–94, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals (Baghel et al., MMLoSo 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.8.pdf