Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity

Ming Ze Tang, Jubal Chandy Jacob


Abstract
This paper investigates how the specificity of natural language prompts influences zero-shot classification performance in modern vision-language models (VLMs) under severe data scarcity. Using a curated 285-image subset of MS COCO containing three everyday postures (sitting, standing, and walking/running), we evaluate OpenCLIP, MetaCLIP2, and SigLIP alongside unimodal and pose-based baselines. We introduce a three-tier prompt design (minimal labels, action cues, and compact geometric descriptions) and systematically vary only the linguistic detail. Our results reveal a counterintuitive trend in which simpler prompts consistently outperform more detailed ones, a phenomenon we term prompt overfitting. Grad-CAM attribution further shows that prompt specificity shifts attention between contextual and pose-relevant regions, explaining the model-dependent behaviour. The study provides a controlled analysis of prompt granularity in low-resource, image-based posture recognition and highlights the need for careful prompt design when labels are scarce.
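As a rough illustration of the zero-shot setup the abstract describes, the sketch below scores one image embedding against one text embedding per posture class and picks the closest class by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wordings in `PROMPT_TIERS`, the function names, and the 3-dimensional toy embeddings are all hypothetical, and in practice the embeddings would come from a VLM's image and text encoders.

```python
import math

# Hypothetical prompt wordings for the three tiers named in the abstract.
# The paper's actual prompts are not reproduced here.
PROMPT_TIERS = {
    "minimal": {
        "sitting": "sitting",
        "standing": "standing",
        "walking/running": "walking or running",
    },
    "action": {
        "sitting": "a person sitting on a chair",
        "standing": "a person standing still",
        "walking/running": "a person walking or running",
    },
    "geometric": {
        "sitting": "a person with bent knees and hips, torso upright",
        "standing": "a person with straight legs, torso upright",
        "walking/running": "a person with one leg forward, mid-stride",
    },
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, prompt_embs):
    """Return the label whose prompt embedding is most similar to the image."""
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    return max(scores, key=scores.get), scores


# Toy embeddings standing in for encoder outputs (assumed values, not real data).
toy_prompt_embs = {
    "sitting": [1.0, 0.1, 0.0],
    "standing": [0.1, 1.0, 0.0],
    "walking/running": [0.0, 0.2, 1.0],
}
toy_image_emb = [0.9, 0.2, 0.1]

predicted, scores = zero_shot_classify(toy_image_emb, toy_prompt_embs)
print(predicted, scores)
```

Comparing tiers would then amount to re-encoding the prompts of each tier in `PROMPT_TIERS` while keeping the image embeddings fixed, so that only the linguistic detail changes.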
Anthology ID:
2025.mmloso-1.5
Volume:
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Ankita Shukla, Sandeep Kumar, Amrit Singh Bedi, Tanmoy Chakraborty
Venues:
MMLoSo | WS
Publisher:
Association for Computational Linguistics
Pages:
48–57
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.5/
Cite (ACL):
Ming Ze Tang and Jubal Chandy Jacob. 2025. Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity. In Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025), pages 48–57, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity (Tang & Jacob, MMLoSo 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.5.pdf