WINOVIZ: Probing Visual Properties of Objects Under Different States

Woojeong Jin, Tejas Srinivasan, Jesse Thomason, Xiang Ren


Abstract
Humans interpret the visual properties of objects in context. For example, a banana appears brown when rotten and green when unripe. Previous studies have focused on language models’ grasp of typical object properties. We introduce WINOVIZ, a text-only dataset of 1,380 examples that probes language models’ reasoning about diverse visual properties of objects under different contexts. The task demands both pragmatic reasoning and visual knowledge reasoning. We also present multi-hop data, a more challenging version that requires multi-step reasoning chains. Experimental findings include: a) GPT-4 excels overall but struggles with multi-hop data. b) Large models perform well in pragmatic reasoning but struggle with visual knowledge reasoning. c) Vision-language models outperform language-only models.
Anthology ID:
2024.insights-1.14
Volume:
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Venues:
insights | WS
Publisher:
Association for Computational Linguistics
Pages:
110–123
URL:
https://aclanthology.org/2024.insights-1.14
Cite (ACL):
Woojeong Jin, Tejas Srinivasan, Jesse Thomason, and Xiang Ren. 2024. WINOVIZ: Probing Visual Properties of Objects Under Different States. In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 110–123, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
WINOVIZ: Probing Visual Properties of Objects Under Different States (Jin et al., insights-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.insights-1.14.pdf