ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR Environments

Shiyi Ding, Shaoen Wu, Ying Chen


Abstract
Recent advances in multimodal large language models (MLLMs) offer a promising approach for natural language-based scene change queries in virtual reality (VR). Prior work on applying MLLMs for object state understanding has focused on egocentric videos that capture the camera wearer’s interactions with objects. However, object state changes may occur in the background without direct user interaction, lacking explicit motion cues and making them difficult to detect. Moreover, no benchmark exists for evaluating this challenging scenario. To address these challenges, we introduce ObjChangeVR-Dataset, specifically for benchmarking the question-answering task of object state change. We also propose ObjChangeVR, a framework that combines viewpoint-aware and temporal-based retrieval to identify relevant frames, along with cross-view reasoning that reconciles inconsistent evidence from multiple viewpoints. Extensive experiments demonstrate that ObjChangeVR significantly outperforms baseline approaches across multiple MLLMs.
Anthology ID:
2026.eacl-long.272
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5795–5812
Language:
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.272/
DOI:
Bibkey:
Cite (ACL):
Shiyi Ding, Shaoen Wu, and Ying Chen. 2026. ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR Environments. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5795–5812, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR Environments (Ding et al., EACL 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.272.pdf