Sushovan Jena
2025
Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals
Shruti Singh Baghel | Yash Pratap Singh Rathore | Anurag Pradhan | Sushovan Jena | Arnav Bhavsar | Amit Shukla | Pawan Goyal
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we systematically evaluate four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
2024
OdiaGenAI’s Participation in WMT2024 English-to-Low Resource Multimodal Translation Task
Shantipriya Parida | Shashikanta Sahoo | Sambit Sekhar | Upendra Jena | Sushovan Jena | Kusum Lata
Proceedings of the Ninth Conference on Machine Translation
This paper describes the system submitted by team “ODIAGEN” to the WMT 2024 English-to-Low-Resource Multimodal Translation Task. We participated in two of its tracks: Text-only Translation and Multi-modal Translation. For Text-only Translation, we trained the Mistral-7B model for English-to-multilingual translation (Hindi, Bengali, Malayalam, Hausa). For Multi-modal Translation (using both image and text), we trained the PaliGemma-3B model for English-to-Hindi translation.
Co-authors
- Shruti Singh Baghel 1
- Arnav Bhavsar 1
- Pawan Goyal 1
- Upendra Jena 1
- Kusum Lata 1