This One or That One? A Study on Accessibility via Demonstratives with Multimodal Large Language Models

Yu Wang; Emmanuele Chersoni; Chu-Ren Huang

Abstract

Accessibility refers to the ease with which a speaker can acquire an object, and it is often conveyed through demonstrative pronouns like "this" and "that", indicating proximal or distal objects. Most importantly, accessibility also involves perspective shifts, which are essential for understanding differing viewpoints. In this case study, we adopt an evaluation dataset with a pair-to-pair question structure for referent identification based on demonstratives. Our experiments show that current Multimodal Large Language Models (MLLMs) exhibit markedly low performance in accessibility tasks requiring perspective shifts, with accuracies around 2.33% (Chinese) and 1.83% (English). Moreover, models struggle with qualitative characteristics and frame-based reasoning, often failing to apply implicit contextual rules unless explicitly encoded in training data. These limitations suggest that MLLMs rely heavily on surface co-occurrence instead of truly grounded, embodied experience. Our evaluation framework provides a robust lens revealing that MLLMs lack both self-other distinction—an essential aspect of self-awareness—and the embodied cognition necessary for reliable performance in practical embodied AI applications.

Anthology ID: 2026.lrec-main.763
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 9722–9732
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.763/
Cite (ACL): Yu Wang, Emmanuele Chersoni, and Chu-Ren Huang. 2026. This One or That One? A Study on Accessibility via Demonstratives with Multimodal Large Language Models. International Conference on Language Resources and Evaluation, main:9722–9732.
Cite (Informal): This One or That One? A Study on Accessibility via Demonstratives with Multimodal Large Language Models (Wang et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.763.pdf
