Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals
Shruti Singh Baghel, Yash Pratap Singh Rathore, Anurag Pradhan, Sushovan Jena, Arnav Bhavsar, Amit Shukla, Pawan Goyal
Abstract
Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks designed specifically for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
- Anthology ID:
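The INT8 precision variants mentioned above are typically produced by post-training quantization, where float weights are mapped to 8-bit integers with a learned or computed scale. The sketch below is a minimal, illustrative example of symmetric per-tensor INT8 quantization; it is not the paper's actual deployment pipeline, and the function names are hypothetical.

```python
# Minimal sketch of symmetric per-tensor INT8 weight quantization,
# the kind of post-training step used to derive an INT8 model variant
# for mobile deployment. Names are illustrative, not from the paper.

def quantize_int8(weights):
    """Map float weights to int8 codes with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # 127 is the largest positive int8 value
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

# Toy example: a handful of float weights round-trip through int8.
weights = [0.31, -1.27, 0.05, 0.88]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

Real deployments (e.g. via a mobile inference runtime) also quantize activations and use calibration data to choose scales, but the memory saving comes from exactly this 4x reduction from 32-bit floats to 8-bit integers.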
- 2025.mmloso-1.8
- Volume:
- Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Ankita Shukla, Sandeep Kumar, Amrit Singh Bedi, Tanmoy Chakraborty
- Venues:
- MMLoSo | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 86–94
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.8/
- Cite (ACL):
- Shruti Singh Baghel, Yash Pratap Singh Rathore, Anurag Pradhan, Sushovan Jena, Arnav Bhavsar, Amit Shukla, and Pawan Goyal. 2025. Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals. In Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025), pages 86–94, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals (Baghel et al., MMLoSo 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.8.pdf