VAQUUM: Are Vague Quantifiers Grounded in Visual Data?

Hugh Mee Wong, Rick Nouwen, Albert Gatt


Abstract
Vague quantifiers such as “a few” and “many” are influenced by various contextual factors, including the number of objects present in a given context. In this work, we evaluate the extent to which vision-and-language models (VLMs) are compatible with humans when producing or judging the appropriateness of vague quantifiers in visual contexts. We release a novel dataset, VAQUUM, containing 20,300 human ratings on quantified statements across a total of 1,089 images. Using this dataset, we compare human judgments and VLM predictions using three different evaluation methods. Our findings show that VLMs, like humans, are influenced by object counts in vague quantifier use. However, we find significant inconsistencies across models in different evaluation settings, suggesting that judging and producing vague quantifiers rely on two different processes. We release our dataset and code at https://github.com/hughmee/vaquum.
Anthology ID:
2025.findings-acl.619
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11966–11982
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.619/
DOI:
10.18653/v1/2025.findings-acl.619
Cite (ACL):
Hugh Mee Wong, Rick Nouwen, and Albert Gatt. 2025. VAQUUM: Are Vague Quantifiers Grounded in Visual Data?. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11966–11982, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
VAQUUM: Are Vague Quantifiers Grounded in Visual Data? (Wong et al., Findings 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.619.pdf