Amit Shukla


2025

Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). In this work, we introduce two novel evaluation frameworks designed specifically for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
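
To make the FP32-versus-INT8 comparison concrete, below is a minimal sketch that loads a SmolVLM2 checkpoint at both precisions and compares memory footprints. The checkpoint name, the AutoModelForImageTextToText class, and the bitsandbytes 8-bit loading path are assumptions for illustration; the paper's actual on-device deployment stack is not reproduced here.

    # A minimal sketch, assuming a Hugging Face SmolVLM2 checkpoint and the
    # bitsandbytes 8-bit loading path; the paper's actual smartphone pipeline
    # (e.g., a dedicated mobile runtime) is not reproduced here.
    import torch
    from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

    MODEL_ID = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed checkpoint name

    # Full-precision (FP32) baseline.
    fp32_model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype=torch.float32
    )

    # INT8 variant via 8-bit weight quantization (a server-side stand-in
    # for an on-device INT8 build).
    int8_model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, quantization_config=BitsAndBytesConfig(load_in_8bit=True)
    )

    # Compare the memory footprints that drive mobile deployment feasibility.
    print(f"FP32: {fp32_model.get_memory_footprint() / 1e9:.2f} GB")
    print(f"INT8: {int8_model.get_memory_footprint() / 1e9:.2f} GB")

Since FP32 stores four bytes per parameter and INT8 one, quantization should shrink the 2.2B model's weights from roughly 8.8 GB toward about 2.2 GB (plus runtime overhead), which is the margin that matters on a memory-constrained smartphone.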

2024

This paper examines implicature resolution in Natural Language Processing (NLP), highlighting its significance for understanding indirect communication. Drawing on foundational theories by Austin, Searle, and Grice, we discuss how implicature extends beyond literal language to convey nuanced meanings; for example, the reply “Some students passed” typically implicates that not all of them did. We review existing datasets, including the Pragmatic Understanding Benchmark (PUB), that assess models’ capabilities in recognizing and interpreting implicatures. Despite recent advances in large language models (LLMs), challenges remain in processing implicature effectively, owing to limitations in training data and the complexities of contextual interpretation. We propose future directions for research, including the enhancement of datasets and the integration of pragmatic reasoning tasks, to improve LLMs’ understanding of implicature and facilitate better human-computer interaction.