Large Language Models Are Challenged by Habitat-Centered Reasoning

Sadaf Ghaffari, Nikhil Krishnaswamy


Abstract
In this paper, we perform a novel in-depth evaluation of text-only and multimodal LLMs’ abilities to reason about object *habitats*, or conditions on how objects are situated in their environments that affect the types of behaviors (or *affordances*) that can be enacted upon them. We present a novel curated multimodal dataset of questions about object habitats and affordances, which are formally grounded in the underlying lexical semantics literature, with multiple images from various sources that depict the scenario described in the question. We evaluate 16 text-only and multimodal LLMs on this challenging data. Our findings indicate that while certain LLMs can perform reasonably well on reasoning about affordances, there appears to be a consistent low upper bound on habitat-centered reasoning performance. We discuss how the formal semantics of habitats in fact predicts this behavior and propose this as a challenge to the community.
Anthology ID:
2024.findings-emnlp.763
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13047–13059
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.763/
DOI:
10.18653/v1/2024.findings-emnlp.763
Cite (ACL):
Sadaf Ghaffari and Nikhil Krishnaswamy. 2024. Large Language Models Are Challenged by Habitat-Centered Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13047–13059, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Large Language Models Are Challenged by Habitat-Centered Reasoning (Ghaffari & Krishnaswamy, Findings 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.763.pdf
Data:
 2024.findings-emnlp.763.data.zip